Understanding the Contributions of Paleo‐Informed Natural Variability and Climate Changes to Hydroclimate Extremes in the San Joaquin Valley of California

To aid California's water sector to better understand and manage future climate extremes, we present a method for creating a regionally consistent ensemble of plausible daily future climate and streamflow scenarios that represent natural climate variability captured in a network of tree‐ring chronologies, and then embed anthropogenic climate change trends within those scenarios. We use 600 years of paleo‐reconstructed weather regimes to force a stochastic weather generator, which we develop for five subbasins in the San Joaquin Valley of California. To assess the compound effects of climate change, we create temperature series that reflect projected scenarios of warming and precipitation series that have been scaled to reflect thermodynamically driven shifts in the distribution of daily precipitation. We then use these weather scenarios to force hydrologic models for each of the five subbasins. The paleo‐forced streamflow scenarios highlight periods in the region's past that produce flood and drought extremes that surpass those in the modern record and exhibit large non‐stationarity through the reconstruction. Variance decomposition is employed to characterize the contribution of natural variability and climate change to variability in decision‐relevant metrics related to floods and drought. Our results show that a large portion of variability in individual subbasin and spatially compounding extreme events can be attributed to natural variability, but that anthropogenic climate changes become more influential at longer planning horizons. The joint importance of climate change and natural variability in shaping extreme floods and droughts is critical to resilient water systems planning and management in the San Joaquin.

The historic droughts and floods above, independent of anthropogenic-related warming, confirm the strong influence of natural climate variability in California and more broadly across the Western U.S.However, recent studies show that climate change is amplifying the frequency and severity of these extremes.Warming due to anthropogenic radiative forcing has intensified recent droughts in the region, primarily through enhanced atmospheric moisture demand and soil moisture depletion (Williams et al., 2020).As noted above, the recent cumulative drought conditions in California and the rest of the Western U.S. over the past two decades now ranks as the driest 22-year period in at least 1,200 years (Williams et al., 2022).Similarly, climate change is increasing the risk of extreme precipitation events via an increase in the strength of cool-season AR events associated with a rise in atmospheric moisture content (Kirchmeier-Young & Zhang, 2020;Kunkel, 2003).A recent study by X. Huang and Swain (2022) found that climate change has already doubled the likelihood of AR-driven megastorms similar to that which caused the Great Flood of 1861-1862, and that megastorm sequences of increased frequency and larger magnitude are likely with continued warming.Thus, the present and evolving risks posed by hydrologic extremes in California is defined by the combined influence of natural climate variability and anthropogenic climate change.Yet the degree to which these two factors will control the future frequency and magnitude of extremes remains uncertain (Bass et al., 2022;Hamlet & Lettenmaier, 2007;Siler et al., 2019).From the perspective of water resource decision-makers who are charged with planning and managing large-scale infrastructure to mitigate the impacts of extreme events, reducing some of the ambiguity on how natural variability and climate changes shape the evolution of extremes is critical to inform planners on how to jointly manage for these drivers' effects.For example, if climate change is identified as the dominant factor that will determine the future magnitude, frequency, and duration of extreme events, then resources and attention will need to be concentrated on identifying and narrowing the uncertainty of the most prominent climate change signals and propagating them into updated design event estimates used for planning.However, if natural variability plays an equal or larger role in determining the properties of hydrologic extremes relevant to engineering design, then research into the plausible range of extremes due to natural variability should be equally prioritized (e.g., see Koutsoyiannis, 2021).To appropriately manage for both drivers, a mix of dynamic and reversible adaptations along with irreversible investments must be utilized.It is thus critically important to quantify the relative and joint roles of climate change versus natural variability in shaping the characteristics of hydrologic extremes, to help balance the allocation of attention and resources in a way that best serves the water sector to prepare for plausible changes in future extreme events.
A growing body of work has sought to partition the relative effects of climate change and natural variability, with a focus on climate variables and in the context of multi-model ensemble simulations (Hawkins & Sutton, 2009;Knutti et al., 2017;Lehner & Deser, 2023;Lehner et al., 2020;Rowell, 2012;Yip et al., 2011).These studies primarily attribute variability in projected global and regional temperature and precipitation to climate change scenario uncertainty, global climate change model (GCM) uncertainty, and natural variability.Lehner et al. (2020) shows that scenario and model uncertainty are the dominant drivers of global decadal mean annual temperature and precipitation, but that natural variability dominates projections of GUPTA ET AL.

10.1029/2023EF003909
3 of 29 regional temperatures (in Southern Europe) and precipitation (in the U.S. Pacific Northwest and Sahel region), particularly at shorter time scales (i.e., those critical to informing near-term management and infrastructure investment).Lehner and Deser (2023) similarly demonstrates how natural variability becomes the increasingly dominant driver of variability in winter mean air temperature projections at smaller spatial scales.Fewer studies have explicitly considered the role of natural climate variability when partitioning variance in projections of hydrologic and water systems variables (Cai et al., 2021;I.-W. Jung et al., 2011;Kay et al., 2009;Schlef et al., 2018;Vidal et al., 2015;Whateley & Brown, 2016).Kay et al. (2009) found that flood frequency and wintertime runoff in Europe are mostly influenced by choice of GCM, although they quantified natural climate variability using a limited number of GCM integrations with different initial conditions.Vidal et al. (2015) found that natural variability highly influences low flows in snow-dominated catchments in the French Alps, and Cai et al. (2021) found that natural variability is a dominant driver of rainy season runoff in Northeastern China.I.-W.Jung et al. (2011) quantified natural variability using a block bootstrap on the historical record and found it to have the largest impact on the variance of large floods, as compared to GCM structure, emission scenario, land use change scenario, and hydrologic model parameter uncertainty.Similarly, Schlef et al. (2018) and Whateley and Brown (2016) created ensembles of future streamflow projections with a stochastic weather generator and rainfall-runoff model and found that the variance of reservoir storage as well as water system performance measures is mostly driven by natural climate variability, particularly in the first few decades of the projections.
The relative roles that natural variability and climate change play on influencing the variance of hydrologic variables of interest often depends on how natural variability is quantified and propagated into an ensemble of projections.In a majority of the climate studies (Hawkins & Sutton, 2009;Knutti et al., 2017;Lehner et al., 2020;Rowell, 2012;Yip et al., 2011) and three hydrologic studies (I.-W.Jung et al., 2011;Kay et al., 2009;Vidal et al., 2015) referenced above, natural variability was determined using multi-member ensembles of GCMs run with different initial conditions.However, the degree to which initial-condition ensembles can represent true natural climate variability is unclear (Deser et al., 2020).For instance, these models poorly represent regional precipitation and drought persistence (Moon et al., 2018;Rocheta et al., 2014) and underestimate AR moisture flux and frequency (Zhou & Kim, 2018); all of which are important for regional planning and management of water systems.While the recent generation of models in CMIP6 better represents key features of natural climate variability (e.g., blocking; major climate modes) compared to older generations, significant biases remain (Schiemann et al., 2020;Tatebe et al., 2019).
An alternative way to estimate the relative influence of natural variability and climate change on regional hydrologic response is through bottom-up approaches that employ stochastically generated ensembles (Dessai & Hulme, 2004;Nazemi & Wheater, 2014;Wilby & Dessai, 2010).These methods require synthetic generators trained on observed weather or hydrologic records, which can generate large ensembles of scenarios that extrapolate beyond the observation limits of the historical record, maintain physical plausibility, and embed climate changes into the ensemble.The generation and partitioning of variability in the resulting hydroclimate metrics could provide a more robust way to quantify how much variance in regional hydrologic extremes is driven by climate changes versus natural variability.However, the availability of stochastic models to support these analyses is limited, particularly when interested in the variance decomposition of multiple properties of different hydrologic extremes (i.e., magnitude, duration, frequency, and spatial coherence of floods and droughts).Furthermore, the ways in which flood and drought events are defined, and particularly the time horizon (moving window) over which they are defined, can influence how the relative influences of climate variability and change are perceived.As time horizon shortens, it becomes increasingly difficult to identify clear climate change signals amongst the noise of natural climate variability.For example, Lehner et al. (2020) attributed the vast majority of variance in winter precipitation projections over the U.S. Pacific Northwest to natural climate variability, but this was based on a 10-year moving window (i.e., decadal average).It is possible that any climate change impacts on mean winter precipitation, even if present, are not discernible from the noise within such a short moving window.This issue is especially true for the properties of extreme events, because there are so few samples available from which to estimate signal from noise (even with many ensemble members).To date, it remains unclear how the choice of time horizon influences our understanding of the relative roles of natural climate variability and climate change on the uncertainty in hydrologic extremes.
Based on the above knowledge gaps, this study addresses the following questions: To answer these questions, we contribute a framework for creating a regionally consistent ensemble of plausible daily future climate and streamflow scenarios that represent natural climate variability captured in a network of tree-ring chronologies, and then embed anthropogenic climate change trends within those scenarios.A key contribution of this study is a method to utilize 600 years of paleo-informed reconstructions of annual weather regimes (WRs; Gupta et al., 2023a) to force a weather-regime based daily-scale stochastic weather generator (Najibi et al., 2021;Steinschneider et al., 2019), which we utilize to force hydrologic models of five watersheds in the San Joaquin River basin.This novel component of the framework allows for the generation of regional precipitation and temperature that represents the substantial natural atmospheric variability that characterizes the San Joaquin region, addressing the need to integrate such variability into data sets used to inform robust water resources planning and management (Lehner & Deser, 2023).To assess the compound effect of climate change, we create temperature series that reflect projected scenarios of warming and precipitation series that have been scaled to reflect thermodynamically driven shifts in the distribution of daily precipitation.We then use these weather scenarios to force hydrologic models for each basin, generating ensembles of streamflow across the region.Decision relevant hydrologic metrics for characterizing flood and drought conditions are defined and calculated across San Joaquin subbasins and across the paleo-period using time horizons of varying scale (see Appendix B).Variance decomposition is then employed to characterize the relative contributions of natural variability and climate changes as drivers of flood and drought hazards in individual subbasins and for spatially compounding extremes that emerge across groups of subbasins.

Data and Methods
This study focuses on five subbasins within the San Joaquin River basin (Figure 1): the Tuolumne River, the Merced River, the San Joaquin River, the Stanislaus River, and the Calaveras River.The ultimate goal of this study is to partition the effects of natural climate variability and climate change on different properties of floods and droughts across these watersheds.We contribute a six-step methodology in order to achieve this goal (Figure 2).We first create a novel method to incorporate reconstructed WR dynamics (Gupta et al., 2023a) into the generation of daily weather through the paleo period (Sections 2.1 and 2.2).Then, we create 600 years of surface weather ensembles across five subbasins of the San Joaquin conditioned upon these reconstructed dynamics.We also create additional ensembles of surface weather layered with thermodynamic climate changes, such as temperature trends and precipitation scaling (Section 2.3).These ensembles are forced through hydrologic models (Sacramento Soil and Moisture Accounting Model [SAC-SMA] and the HYMOD conceptual hydrologic model [HYMOD]) calibrated for each subbasin to generate ensembles of daily streamflow (Section 2.4).From these streamflow ensembles, we calculate flood and drought metrics, including copula-based metrics to quantify joint flood hazard across basins (Section 2.5).Finally, analysis of variance (ANOVA) is used to partition the contribution of natural variability and the imposed climate changes to variability in the different flood and drought metrics considered (Section 2.6).

Previous Reconstruction of Annual Weather Regime Dynamics
The foundation of generating paleo-based weather and streamflow in this study rests on utilizing reconstructions of annual WR dynamics that are created in Gupta et al. (2023a).Readers are asked to refer to our prior published paper for the full details on the methodology and our diagnostic evaluation of the reconstruction.In summary, we contributed a multi-objective optimization and regression-based framework to reconstruct the annual frequency of five WRs that influence regional weather in California.The framework utilizes a gridded tree-ring based reconstruction of the standardized precipitation index developed by Williams et al. (2021) and extended in Borkotoky et al. (2021) that is based on a network of 1620 detrended chronologies of ring width index across western North America.The approach selects the optimal usage of tree-ring information to balance the preservation of low and high frequency modes of variability in the WR dynamics.Using a set of selected regression hyperparameters, we reconstruct four principal components of WR occurrence as well as the annual frequency of WRs back to the year 1400 CE.We demonstrate the skill of the reconstruction in two ways: (a) showing the ability of the reconstruction to capture well-documented megadrought and pluvial periods and (b) comparing our reconstructed frequency of WRs in select periods that overlap with reconstructed geopotential height anomalies from a similar study, Wise and Dannenberg (2014).In many of these overlapping periods, we see consistency across our identified highest frequency WRs and the geopotential height anomaly maps created in Wise and Dannenberg (2014).Further, our reconstruction reveals periods where the annual frequency and variability of WRs extends far beyond the dynamics observed in the instrumental record.One of the products from Gupta et al. (2023a), the first four reconstructed principal components of annual WR occurrence (termed PC WR in our prior paper and in this study), effectively contains all the information on the annual frequencies of the five WRs and is used to drive the generation of daily WR sequences discussed in Section 2.2.Thus, building on Gupta et al. (2023a), the present study highlights the importance of using paleodata to (a) better capture the significant natural atmospheric variability that characterizes the San Joaquin region and (b) develop a new quantitative framework to incorporate this information into daily-scale synthetic weather generation.

Generation of Daily Weather Regime Dynamics
Ensembles of plausible future climate are generated using extensions of the WR-based stochastic weather generator presented in Najibi et al. (2021) and Steinschneider et al. (2019) to incorporate paleo-reconstructions of WR dynamics.The generator is comprised of a three-step hierarchical structure (Figure 3): (a) identification   from Gupta et al. (2023a).The NHMM is fit to the first nine principal components of daily 500 hPa geopotential height from NOAA-CIRES-DOE Twentieth Century Reanalysis (V3) data set (Slivinski et al., 2019) between 180-100°W and 30-60°N (i.e., the Pacific/North American sector) from 1950 to 2017.The NHMM is forced with the four reconstructed principal components (PC WR ) that overlap the same time period, defining a time-varying transition probability matrix shown in Equation 1: Here, the transition probability from WR i to WR j at time t is conditioned on a vector of daily covariates developed by repeating the annual values of each PC WR for each day of the year.These covariates (Level 1 in Figure 3) are used within a multinomial logistic regression with intercepts β 0j,i and coefficients β j,i to define the transition probabilities, with a prime denoting the vector transpose.The fitted multinomial regression can be used to estimate the time-varying transition probabilities and simulate WRs across the entire 600-year period over which reconstructed values of PC WR are available.More information on the NHMM can be found in Text S1 in Supporting Information S1.We use this method to create a 50-member ensemble of daily, 600-year WR time series (Level 2 in Figure 3; convergence plots of corresponding streamflow available in Figure S1 in Supporting Information S1).

Generation of Local Surface Weather Conditioned on WRs
Time series of daily surface weather are generated based on the simulated time series of WRs (Level 3 in Figure 3).Here, observed daily precipitation, minimum, and maximum temperature are taken from the 1/16° resolution gridded meteorological data set of Livneh et al. (2015) for water years (WY) 1950-2013.These historical weather data are block bootstrapped based on the sequence of simulated WRs to create new sequences of weather.For example, if the NHMM simulates a sequence of n consecutive days in WR i, an n-sized block of surface weather is resampled from the historical period that is also in WR i and that meets two other criteria: (a) the chosen historical block falls into a 2-week window around the simulated day of the year; and (b) the day prior to the historical block is in the same precipitation state as the simulated day (wet or dry).The weather generator is run simultaneously across the five basins to create internally consistent (i.e., spatially correlated) weather across the region.Given their large synoptic scale, we use the occurrence of an AR (taken from Gershunov et al., 2017) to represent a common precipitation state across basins.The process is repeated for each ensemble until a full sequence of 600 years of daily minimum and maximum temperature and precipitation has been generated.
Thermodynamic changes, in the form of shifts in temperature and precipitation scaling with warming, are imposed after surface weather is generated.
Step changes in temperature between 0 and 4°C are added to each grid cell's simulated temperature.Quantile mapping is used to scale the precipitation distribution with warming, whereby the upper tail (99.9th percentile) of the non-zero precipitation distribution is made more intense, the lower tail of non-zero precipitation is suppressed downward, but the mean of daily precipitation is left unchanged.This scaling reflects an intensification of the precipitation regime and is consistent with GCM-based projections of future precipitation in California (Michaelis et al., 2022).We consider 5 different scaling rates equivalent to 0X (0% °C −1 ), 0.5X (3.5% °C −1 ), 1X (7%°C −1 ), 1.5X (10.5% °C −1 ), and 2X (14% °C −1 ) the Clausius-Clapeyron (CC) scaling rate, which dictates how the moisture holding capacity of the atmosphere scales with warming.That is, the 99.9th percentile of non-zero precipitation is scaled up by either 0% °C −1 , 3.5% °C −1 , 7% °C −1 , 10.5% °C −1 , or 14% °C −1 , while the lower body of the distribution is scaled down accordingly to maintain the same distribution mean.The range of selected scaling rates are derived from observational and model-based studies that most often indicate extreme precipitation-temperature scaling at the 1XCC rate, but occasionally suggest the possibility for sub-CC (0X, 0.5X) or super-CC (1.5X, 2X) scaling rates due to interactive effects between enhanced specific humidity and storm dynamics (Ali et al., 2022;Martinkova & Kysely, 2020;Michaelis et al., 2022;Sun & Wang, 2022;Wasko et al., 2018).
These scaling rates are combined with the different scenarios of warming, so that precipitation scaling is tied to the imposed temperature scenario and respects the thermodynamic mechanism that drives precipitation change.
We consider five scenarios of warming, including 0, 1, 2, 3, and 4°C above the climatological average.This range of warming was inferred from an ensemble of CMIP6 mid-century (2015-2050) projections over central California under the SSP2-4.5 scenario, taken from CarbonPlan (see Figure S2 in Supporting Information S1; Chegwidden et al., 2022).All together, we develop 25 different scenarios of climate change (five temperature scenarios and five scaling scenarios), with each scenario containing 50 ensemble members (i.e., 50 stochastic 600-year time series of precipitation and temperature), in addition to a baseline ensemble with no changes imposed.Technical details on the quantile mapping procedure, and other details of the stochastic weather generator, are provided in Najibi et al. (2021) and Steinschneider et al. (2019).

Generation of Regional Streamflow Through Process-Based Hydrologic Models
Surface weather ensembles are used to simulate daily streamflow ensembles at the mouth of each of the five San Joaquin subbasins using the SAC-SMA (Burnash, 1995) coupled with a SNOW-17 model (Anderson, 1976).The models, documented in Wi and Steinschneider (2022), are spatially distributed and utilize a Lohmann routing model (Lohmann et al., 1998) to trace runoff from hydrologic response units through each river channel.The SAC-SMA models are calibrated using a pooled calibration approach (Wi et al., 2015) based on the average Nash Sutcliffe Efficiency across the five subbasins simultaneously.Calibration and evaluation was based on historical Full Natural Flows (FNF) between WY 1989-2013, acquired from California Data Exchange Center (CDEC) FNF stations that lie at the outlet of reservoirs shown in Figure 1: Don Pedro Reservoir (Tuolumne River; CDEC ID: TLG), Friant Dam on Millerton Lake (San Joaquin River; CDEC ID: MIL), New Exchequer Dam on Lake McClure (Merced River; CDEC ID: MRC), New Hogan Dam on New Hogan Lake (Calaveras River; CDEC ID: NHG), and New Melones Dam on New Melones Lake (Stanislaus River; CDEC ID: NML) (G.Huang & Kadir, 2016).FNF is unimpaired and representative of natural water production of a river basin unaltered by upstream human modifications.Thus any behavior related to the operations of the dams is effectively removed from this analysis.The models are calibrated over WY 1989-2003and then evaluated across WY 2004-2013.To verify that our streamflow extremes and variance decomposition results are not strongly dependent on the selection of the SAC-SMA model, we also employ the HYMOD conceptual hydrologic model (Moore, 2007) specifically in the Tuolumne Basin.Our primary results will be presented using the SAC-SMA model but more detailed analysis of how hydrologic model selection impacts the estimates of flood and drought metrics as well as their partitioning of variance is provided in Text S2 in Supporting Information S1.More information about the calibration process and parameter values for all hydrologic models can be found in Wi and Steinschneider (2022).

Metrics of Hydrologic Extremes
A series of flood and drought metrics, described below, are calculated for each ensemble member and each climate scenario and across two time horizons: 30 and 100 years.As stated in the latest update to the Central Valley Flood Protection Plan (CVFPP), the state of California is actively prioritizing investments in flood management over a 30-year planning horizon (California Department of Water Resources, 2022).A 100-year planning horizon is not actively used in the CVFPP, but it represents a time scale relevant to longer term major infrastructure investments.Further, it allows the exploration of the longer climate time horizon drivers.We partition the variance of each metric between the drivers of climate change and natural climate variability using the ensemble of scenarios described above.Appendix A contains a glossary with commonly used terms that are referred to through the methods.Appendix B contains a summarized list of all of the flood and drought metrics used in this study, including their decision relevance.

Flood Metrics
Flows associated with a 10-year and 100-year return period are used as flood metrics in this study.The 100-year floodplain currently drives larger riverine infrastructure development and flood risk management in California (California Department of Water Resources, 2022).Though not as common for current planning and management in California, the 10-year return period flow captures risk to smaller floodplains and drives smaller investments (California Department of Water Resources, 2006).The decadal and centennial flood are estimated by fitting a generalized extreme value (GEV) distribution to the 3-day annual maxima at each CDEC gauged location in the five subbasins.The 3-day flood was chosen because it is a common metric used in flood risk assessments in California (  Brekke, & Pruitt, 2010;Maurer, Hidalgo, et al., 2010), and because it better captures the concurrence of flooding across multiple basins (described further in Section 2.5.3).For each ensemble member, we fit the GEV distribution for the whole 600-year paleo-period as well as across smaller 30-year and 100-year moving windows.

Drought Metrics
There is no state statutory definition of drought since it can be classified differently across impacted sectors and stakeholders.Historical hydrologic droughts have been traditionally identified based on a combination of metrics that capture the magnitude and duration of water deficit at key reservoirs (California Department of Water Resources, 2015).Since we develop metrics for gauged locations near these reservoirs, we opt to use a more generalized Standardized Streamflow Index (SSI) to quantify hydrologic drought (Vicente-Serrano et al., 2012).
To calculate the SSI, daily simulated flows are first aggregated to a monthly time step.We then use a flexible non-parametric empirical method to estimate non-exceedance probabilities using the Gringorten plotting position (see Farahmand & AghaKouchak, 2015).To create the SSI, the associated non-exceedance probabilities are passed through the quantile function of the standard normal distribution, resulting in a series with an assumed mean of zero and standard deviation of one.We then use the SSI index to define three drought metrics, following McKee et al. (1993): 1. Drought Occurrence: The number of months characterized by an SSI value less than −1, divided by the total months in the window over which the metric was calculated.An SSI value of less than −1 captures moderate to severe drought hazard.2. Drought Intensity: The minimum SSI value in the moving window.3. Drought Duration: The maximum number of consecutive months with an SSI below −1.5 in the moving window.An SSI value of less than −1.5 captures severe drought hazard.
The SSI index is calculated for each ensemble member and climate change scenario, and the metrics are reported across 30-year and 100-year moving windows.

Copula-Based Flooding Metrics
The San Joaquin basin is a key component in the state's comprehensive water delivery system, and a levee breach due to compounding flooding across subbasins in the region could disrupt the lives of thousands of people, including underrepresented and vulnerable stakeholders like farmworkers as well as deliveries of irrigation water to 3 million acres of farmland in the Central Valley (Taylor, 2017).Thus, we develop a spatially-compounding flood metric to capture this hazard.As discussed in Zscheischler et al. (2020), spatially compounding flood hazard can be characterized using an n-dimensional Gaussian copula that defines a metric of joint flood hazard across n basins simultaneously.Let x t,1 , …, x t,n be the annual maxima of 3-day mean streamflow in each of the n basins in year t.We first fit GEV distributions to the individual 3-day annual maxima for each basin (i = 1, …, n).The 3-day annual maxima in each year t are then transformed to be uniform pseudo-observations,   =  −1  () , where   −1  is the inverse cumulative distribution function (CDF) of the fitted GEV distribution for basin i.These pseudo-observations are used to evaluate the joint CDF of the flood data based on a Gaussian copula: Here, ϕ −1 is the inverse CDF of the standard normal distribution and Φ n (•|Σ) is the multivariate normal CDF with zero mean and correlation matrix Σ, which is set equal to the Spearman rank correlation matrix for 3-day annual maxima across subbasins.Using the fitted copula, we can then calculate the joint probability that multiple subbasins experience flooding above some threshold.For example, consider two subbasins with 100-year flood magnitudes of x 1 and x 2 , respectively, inferred from their fitted (GEV) marginal distributions.Then, the probability that both subbasins simultaneously experience floods that exceed the 100-year flood is equal to (Zhang & Singh, 2019): Similar calculations are available to evaluate the probability that three or more subbasins experience flooding above set thresholds.These probabilities can be used directly as a metric of joint flood hazard, and we can partition the variance of this metric between climate changes and natural variability across our ensemble and for 30-year and 100-year moving windows.

Analysis of Variance in Hydrologic Metrics
We use an ANOVA to partition the variance in the hydrologic flood and drought metrics above into components attributable to different sources of variation.A two-way ANOVA was used to determine the uncertainty in hydrologic metrics attributable to uncertainty in temperature change (T), precipitation scaling rate (P), their interactions, and uncertainty in metrics attributable to natural variability.The temperature change factor has i = 1, …, 5 levels (0, 1, 2, 3, 4°C), and precipitation scaling factor has j = 1, …, 5 levels (0%, 3.5%, 7%, 10.5%, 14% per °C).
For each combination of levels, there are 50 stochastic realizations of the metric of interest.The linear model on which the ANOVA is based is given as: where x(i, j, s) is the hydrologic metric for a given level i and j of factors T and P, respectively, and a given ensemble member s.The grand mean for the metric x across the entire ensemble is μ; α(i) equals the average deviation in x from μ for ensemble members with temperature changes at level i; β( j) equals the average deviation in x from μ for ensemble members with precipitation scaling rate at level j; γ(i,j) TP is the interaction term between temperature change and precipitation scaling; and ɛ(i, j, s) is the error term, which is used here to represent natural variability in the metric not explained by the different climate change factors.The total sum of squares SS total expresses the total variation in the hydrologic metric x, and is comprised of the sum of variation attributable to temperature change (SS T ), precipitation scaling rate (SS P ), their interaction (SS Int ), and natural variability (SS ɛ ): The fraction of variance attributable to each source is calculated by dividing each component by SS total .This fraction of attributable variance is calculated separately in 30-year and 100-year rolling windows for each of the metrics above.

Results
The results of this work are presented as follows.First, Section 3.1 shows a comparison of the variability in the paleo-informed streamflow with events from the available observed historical record.Then, Section 3.2 shows the flood and drought extremes reconstructed for the baseline scenario (i.e., influence of natural variability alone).Section 3.3 demonstrates how the imposed climate changes affect those extremes.Section 3.4 demonstrates the variance partitioning of extremes across climate change and natural variability.A more detailed evaluation of the stochastic weather generator's performance is presented in Supporting Information S1 (see Figures S3-S7 in Supporting Information S1), which demonstrates how well the generator captures characteristics of precipitation and minimum and maximum temperature.

Paleo-Informed Streamflow Characteristics
A broad amount of variability is attained in the streamflow ensembles at Don Pedro when SAC-SMA is forced with paleo-reconstructed weather in the Tuolumne Basin (Figure 4). Figure 4a focuses on 7-day flows and the lower tail of the distribution and Figure 4b zooms in on the upper tail distribution of 3-day flows.Each gray line represents sorted flow volumes across 30-year chunks of the paleo-reconstruction across all 50 ensemble members.These volumes are compared with those that come from forcing the generator over the modern period  with historical Livneh precipitation and temperature data (red line).Key events from the observed record are annotated as colored horizontal lines.Overall, the paleo-informed streamflow envelopes and expands upon the historical SAC-SMA model flows by creating instances of wetter 3-day flows and drier 7-day flows.Furthermore, the paleo-ensemble is characterized by drier events than key drought periods from the observed record as demonstrated in Figure 4a.The generator is unable to create 3-day flows that reach the peak of the 1997 New Year's flood period due to underestimation of precipitation associated with this storm that is a known error in the Livneh data set (Pierce et al., 2021).In turn, models conditioned on the Livneh data set tend to underestimate the flows associated with this event.However, the inclusion of the paleo-reconstruction allows the generator to create flows that far surpass the magnitude of peak flows associated with the 1995 and 2017 floods.Overall, the expanded envelope of daily scale streamflows enabled by the paleo-reconstruction provide a rich context for exploring plausible flood and drought extremes in the Tuolumne Basin.Similar results are demonstrated for the rest of the San Joaquin River basins, particularly in capturing drought dynamics (Figure S9 in Supporting Information S1).The generator conditioned on the Livneh data set suffers from the same difficulty of capturing the 1997 flood peak flows; however, in some locations like Merced and Millerton, the paleo-conditioned generator provides extended variability that can help overcome these limitations (Figures S9b and S9d in Supporting Information S1).New Hogan Lake is the only gauged location in which the Livneh-conditioned model can capture the 1997 flood peak flows, but this is primarily because the associated peak flows were not as extreme in this region relative to other notable flooding events.Of the five basins, capturing dynamics in the Tuolumne is the most challenging; it is also representative of high-elevation basins that exhibit rich snow dynamics.Thus, we proceed through the rest of the results with a focus on the Tuolumne Basin, though corresponding figures for the rest of the basins can be found in Supporting Information S1.Section 3.2 further elaborates on the value of the paleo-forced generator and its representation of key flood and drought metrics through the reconstruction.

Individual Basin Flood Hazards
The individual Tuolumne subbasin flood hazard is quantified based on the 10-year and 100-year flood events associated with 3-day annual maximum flows, calculated using a GEV distribution fit to 3-day maxima in each subbasin and with two moving windows of length 30 and 100 years.Figures 5a and 5c show these return levels at the Don Pedro gauge in the Tuolumne Basin using a 30-year moving window.The return levels are calculated for all ensemble members of the baseline generator, where the solid line represents the mean return level across the ensemble members and the shading represents the 5th/95th percentiles.Figures 5b and 5d are non-exceedance plots of the 3-day annual maxima across the extent of the paleo-reconstruction ensemble.The dashed black line represents the 3-day annual maxima associated with the 10-year and 100-year return period events as derived from the SAC-SMA model forced with Livneh historical precipitation and temperature that overlaps with the observed record .In order to facilitate the most equivalent comparison between the two data sets, each gray line represents the sorted 3-day annual maxima volumes over sets of 30-year segments of the paleo-reconstruction and across all 50 ensemble members.
The return levels in Figures 5a and 5c both show clear peaks centered around 1600 CE, which highlights a prominent pluvial period in the region's past hydroclimate.This pluvial is represented in the original WR reconstruction from Gupta et al. (2023a) and broadly confirmed by other reconstructions (D'Arrigo & Jacoby, 1991;Dettinger & Ingram, 2013;Schimmelmann et al., 1998;Stahle et al., 2007).Dettinger and Ingram (2013) have also reconstructed pluvials around 1750-1770 CE and 1810-1820 CE, and while less pronounced than the 1600s pluvial, both panels (a) and (c) show increases in 3-day annual maxima during these times.When compared to the model-based modern hydrology (dashed black line), both figures suggest that return levels in the most recent 30-year period are lower than those that have been experienced in prior centuries of the paleo-period reconstruction.Panels (b) and (d) show the modern estimates of the 3-day annual maxima for the 10-year and 100-year events respectively, in comparison with the extent of the 3-day annual maxima created by the paleo-informed generator.The ensemble from the generator encompasses the modern estimates of the return levels and also provides many instances of more extreme flooding events, which provides additional challenging flood scenarios that can be used to understand the vulnerability of water systems in each of the San Joaquin Valley subbasins explored in this study.The rest of the gauged locations display similar 3-day annual maxima dynamics, though the magnitude of the flows differs across all basins and return periods (Figure S10 in Supporting Information S1).Lower peak flows tend to be associated with basins that are smaller in area, elevation, and slope (i.e., New Hogan Lake, Table S1 in Supporting Information S1).The ensemble member spread also tends to be larger for the more extreme and uncertain 100-year flood event.Clear non-stationary tendencies are exhibited in the representation of the 10-year and 100-year event across the reconstruction that have large implications for hazard characterization (panels (a) and (c)).For example, the flow volumes associated with the 10-year event during the 1600s wet period are within range of the 100-year event flows during the 1500s megadrought period.Thus, what may be considered a 10-year flood event in one wet period transitions to be a 100-year event in a dry period.This extent of variability uncovered in the flood metric demonstrates that using only the modern record to define design flood events could severely under-represent flood hazard in the San Joaquin Valley region and that defining hazard based off of the 10-year and 100-year flooding events has drastically changed over time.

Individual Basin Drought Hazards
Figures 6a, 6c, and 6e show the three SSI-based hydrologic drought metrics (occurrence, duration, and severity) calculated across a 30-year moving window for the period of 1400-2017 for the Don Pedro gauge in the Tuolumne River Basin.Figures 6b, 6d, and 6f are non-exceedance plots, where each line corresponds to the sorted drought metric values derived across the whole reconstructed 617-year record length for each of the 50 ensemble members.The dashed line represents the respective metric values derived from the SAC-SMA model flows forced with Livneh historical precipitation and temperature across the length of the modern record.Similar to the flooding metrics in Section 3.2.1, the drought metrics exhibit clear decadal-scale variability that is also present in the original WR reconstruction from Gupta et al. (2023a).For example, declines in drought occurrence, severity, and duration are seen during the early 1600s pluvial, while these drought characteristics become more intense during the 1500s megadrought that lasted from the middle of the century to the late 1580s (Figures 6a, 6c, and 6d; Stahle et al., 2007).The rest of the San Joaquin subbasins display this key behavior as well (Figure S11 in Supporting Information S1).The drought metrics reveal a slight long-term trend toward higher drought occurrence, longer duration, and more intense drought severity through the last three centuries of the reconstruction.This trend could, in part, be driven by key persistent drought periods that occurred in the mid to late 1800s (1856-1865, 1870-1877, and 1890-1896;Herweijer et al., 2006), the 1900s (the Dust Bowl in the 1930s and drought periods in the 1950s and late 1980s; Stahle et al., 2007) and the most recent 20-year drought periods in the 2000s.The black line demonstrates drought occurrence and severity that is on par with the late 1500s megadrought, though exhibiting a slightly shorter duration than a large section of the paleo-reconstruction.The shorter drought duration is likely due to the sporadic periods of wet weather that have characterized the most recent 30-year period, including the early 1980s and late 1990s (M.Dettinger & Cayan, 2014) and periods after each drought instance in the 2000s.
Panels (b), (d), and (f) compare the modern drought metrics to those calculated from the paleo-reconstructed ensembles.The ensembles encompass the modern estimates and also provides many traces that are characterized by more frequent, longer, and severe drought.The plausibility of the San Joaquin Valley subbasins confronting drought conditions that extend well beyond those that have been experienced in the modern observed record captured in Livneh forcing data is significant even in the absence of climate change.The traces in panels (b), (d), and (f) emphasize the need to better characterize the subbasin vulnerabilities for the challenging drought conditions that are captured within the reconstruction.

Joint Flood Hazard Across Basins
Gaussian copulas were fit to the 3-day annual maxima flows for multiple combinations of basins to characterize joint flood dynamics.The joint probability of flows at Don Pedro in the Tuolumne Basin and at Millerton Lake in the San Joaquin Basin simultaneously exceeding their respective, GEV-based 100-year flood estimates from the most recent 30-year period from 1987 to 2017 was calculated for the length of the reconstruction.Figure 7a shows the expected return period associated with those probabilities.Figure 7b includes New Melones Lake into the joint probability estimation.The return periods are calculated using a 30-year moving window across the entire reconstruction.Panels (c) and (d) are non-exceedance plots of the respective return periods across the extent of the paleo-reconstruction ensemble.The dashed black line represents the return periods for the 10-year and 100-year flood derived from the SAC-SMA model forced with Livneh historical precipitation and temperature.As with the flood metrics, in order to facilitate the most equivalent comparison between the two data sets, each gray line represents the sorted return periods for 30-year segments of the paleo-reconstruction and across all 50 ensemble members.
There is a strong increase in the likelihood of simultaneously exceeding the recently observed historical estimate of the 100-year event, particularly during the 1600s wet period (∼20% increase in likelihood; Figure 7).That is, the expected frequency of occurrence of simultaneous 100-year flooding events at both Don Pedro and Millerton Lake jumps to once every 320 years, as compared to once every 405 years in the most recent 30-year period.There is also a significant decline in the likelihood of joint flooding during the late 1500s megadrought.
When an additional basin is introduced into the copula-based metric, the overall temporal dynamics are similar (Figure 7b), but the expected return period increases significantly.That is, the likelihood of simultaneously exceeding historical flooding thresholds rapidly declines as more basins are considered.During the 1600s wet period, the expected frequency of occurrence of simultaneous 100-year flooding events at Don Pedro, Millerton Lake, and New Melones Lake jumps to once every 450 years, as compared to once every 507 years in the most recent 30-year period.For both joint flood metrics, the paleo-reconstruction effectively bounds the modern estimation of the return periods, which provides a richer space to characterize joint flood hazards across the subbasins (Figures 7c and 7d).
Similar non-stationary dynamics as observed in the flooding metrics in Section 3.2.1 are apparent in these joint flooding metrics as well.Simultaneous flooding in all three subbasins is rarer and more consequential for water systems planning and management than simultaneous flooding at Don Pedro and Millerton alone.Through the paleo-reconstruction, there are periods (like the 1600s wet period) where the likelihood of flooding in the three locations becomes just as common as flooding at Don Pedro and Millerton alone (around the late 1500s megadrought; Figures 7a and 7b).The additional variability that the reconstruction provides demonstrates how dramatically the return periods associated with these consequential events changes over time, particularly how these flooding events can become more frequent.Once again, using the modern record to quantify joint hazard across these subbasins could severely underrepresent flood hazards and the magnitude of design events.

Changes in Individual Basin Flood Hazard
Figure 8 shows the effect of thermodynamic climate changes on the 100-year, 3-day flood event at Don Pedro in the Tuolumne basin calculated across 30-year moving windows.The flow volumes are represented as deviations from the baseline reconstruction which is shown as a gray dashed line at 0. A modern baseline is placed as a dashed black line and is representative of the difference between the modern and the largest 100-year flood event volume calculated across the reconstruction.Figure 8a shows scenarios where the precipitation scaling rate is kept at 7% °C −1 while temperature is increased by 1, 2, and 3°C, while Figure 8b shows scenarios where the temperature trend is maintained at 1°C and the precipitation scaling rate is increased to 0% °C −1 , 7% °C −1 , and 14% °C −1 .Both increasing precipitation scaling rates and temperature trends shift the 100-year flood peak flows upwards, though temperature trends have a stronger impact.For reference, the volume differential between the extreme scenarios in Figure 8a is equivalent to about 100 Oroville Dams worth of water.Conversely, the maximum volume differential associated with the precipitation scaling in Figure 8b is equivalent to 33 Oroville Dams worth of water.The Tuolumne basin is a snow-dominated basin, and consequently it is not unexpected that the results suggest a greater influence on 100-year flows resulting from increasing temperature rather than increased precipitation scaling.Increased temperature shifts drive increased snowmelt and rain on snow events that promote greater flood volumes.

Changes in Individual Basin Drought Hazards
Figure 9 shows how the same thermodynamic scenarios imposed in Section 3.3.1 influence drought occurrence at Don Pedro, measured in terms of a change in the percent of the 30-year window that is classified to be in drought conditions with respect to the baseline scenario (gray dashed line at 0).A modern baseline is placed as a dashed black line and is representative of the difference between the modern drought occurrence line from Figure 6a and the worst drought occurrence metric calculated across the reconstruction.An increase in each of the thermodynamic mechanisms tends to increase the percentage of the window classified in drought.A comparison across Figures 9a and 9b show the larger impact of temperature trends on increased drought occurrence (reaching up to 5% or an additional 18 months classified in drought) by way of increased evapotranspiration.Precipitation scaling stretches the daily precipitation distribution which can lead to tail influences that impact the total number of drought months, but has a lower relative influence (reaching up to 1.8% or an additional 6 months classified in drought).For example, there are some instances, particularly in the 1T, 1xCC scenario in Figure 9a that result in values that approach the baseline.This is likely due to the precipitation scaling mechanism causing some months to have an increased SSI above the drought threshold that offsets the temperature increase.However, as the temperature shift further increases, this effect is dominated.Similar results are seen for drought severity and dura tion (Figure S12 in Supporting Information S1).Overall, there is a greater influence from increasing temperature trends on increasing drought severity and duration.It's worthwhile to note that the impact  from both temperature trends and precipitation scaling is relatively small (Figures S12c and S12d in Supporting Informa tion S1) with respect to increasing consecutive months classified in severe drought and these results are further reflected in Figure 12.

Joint Flood Hazard Across Basins
Figure 10 shows how similar thermodynamic scenarios influence joint flood hazard at Don Pedro (Tuolumne Basin) and Millerton Lake (San Joaquin Basin), measured in terms of change to return period associated with the 100-year event with respect to the baseline scenario (gray dashed line at 0).As with the prior sections, a modern dashed black baseline is included to represent the difference between the modern return period estimate and the lowest return period calculated across the reconstruction.Much like for flooding events in individual basins, increasing temperature trends have a larger influence on making compound flooding events more likely (Figure 10a).Given that Don Pedro and Millerton Lake are both in snow-dominated basins, temperature trends create similar snowmelt effects that lead to simultaneous flooding events.Precipitation scaling has a relatively reduced, but non-trivial effect (Figure 10b).The greatest influence from precipitation scaling is observed under higher imposed temperature trends (we use a constant 3°C temperature trend in this example).While an increase in precipitation scaling increases the likelihood of flooding in any given basin (Figure 8b), it decreases the likelihood of joint flooding, and makes the events rarer by increasing the return period (Figure 10b).Since the imposed precipitation scaling non-linearly adjusts peak flows, it ultimately leads to a decrease in correlation in flows across the two basins and therefore a decrease in joint flooding tendencies.

Variance Partitioning of Hydrologic Extremes
The results above show how different metrics of hydrologic extremes vary significantly over time due to natural climate variability as well as different mechanisms of climate change.Below we use variance partitioning to assess the relative importance of these competing factors.

Relative Variance Contributions for Individual Basin Flood Hazard
We conduct an ANOVA to partition the variance of the 10-year and 100-year 3-day floods for each gauged location.Figure 11 shows the results for Don Pedro, while results for the other sites are shown in Figures S13-S16 in Supporting Information S1.The columns show the results of the decomposition when flood metrics are derived with a 30-year, 100-year, and 617-year (whole record) time horizon, respectively.Two main insights emerge from Figure 11.First, natural variability is the primary driver of the variance when the flood metrics are calculated using a 30-year time horizon (Figures 11a and 11d).This is especially true for the 100-year flood, where approximately 70% of the variance in this metric is associated with natural variability.Figure 11d has direct relevance to the design standards actively used to inform California's flood planning and management.However, the influence of natural variability on the spread in flood metrics across the ensemble substantially decreases when the metric is calculated across a 100-year time horizon (Figures 11b and 11e), and becomes almost negligible when calculated over the entire 617-year period (Figures 11c and 11f).This suggests that the time horizon over which the flood metrics are calculated highly influences the perception of key drivers.A longer time horizon more clearly captures the effects of longer-term climate change on the variation in the flood metrics, while during shorter windows the variation in flood metrics across the ensemble is more likely to capture noise associated with natural variability.The reasons for this are twofold.First, when the time horizon is large, each ensemble member for a particular climate change scenario contains many annual maxima that are all drawn from the same underlying climate state, helping to converge design event estimates across ensemble members toward similar values.Second, when the time horizon is large, there are more opportunities for climate change signals to influence the distribution of annual maxima flows for all ensemble members under a given climate change scenario, which will help separate the distribution of annual maxima across the different scenarios.Together, these two factors will lead to more variance in the overall ensemble being explained by the climate change scenarios compared to natural variability.
Of the thermodynamic changes, temperature trends are the primary driver of variation in peak flows, followed by precipitation scaling.This result, also seen in Figure 8, suggests that temperature increases that lead to increased snowmelt and rain on snow events influences peak flows in the region more than increases in extreme precipitation due to increased moisture in the atmosphere.The interactions between the two drivers generally accounts for a smaller percentage of the variance, but as the time horizon increases, interactive effects are close to the same magnitude as precipitation scaling (16% vs. 24% for the whole period).This result highlights how the effects of precipitation scaling are dependent on the temperature increase, because precipitation scaling is parameterized as a percentage change in extreme precipitation per °C warming.
Figures S13-S16 in Supporting Information S1 show the same results for the remaining four basins.Overall, all locations exhibit similar behavior, where the influence of natural variability decreases with time horizon.Temperature change has a larger impact than precipitation scaling in all locations except for New Hogan Lake GUPTA ET AL. 10.1029/2023EF003909 19 of 29 (Figure S15 in Supporting Information S1).New Hogan Lake is relatively small, has a low elevation, and less snow dominated compared to the other basins (Table S1 in Supporting Information S1), and thus sees a greater influence from precipitation scaling on flood variability.
Overall, the results in Figure 11 portray conflicting storylines and complexity for flood planning and management depending on the way the flood metrics are defined.Under current CA planning conditions (represented in Figure 11d), the greater influence of natural variability on individual flood hazard would suggest prioritizing short-term adaptive tools like seasonal forecasts.However, under alternative planning scenarios that may utilize longer time horizons, infrastructure investments look to be more useful to manage hazards from thermodynamic climate changes.Most importantly, water system planners will need to engage with both drivers; prioritizing longer horizons of focus could neglect the effects of internal variability in the near term, which can lead to magnitudes of peak flows that far surpass those in the modern record (Figure 5).Ultimately, there needs to be consideration of both the exceptional magnitude of internal variability in more immediate decision relevant 30-year timescales while still being cognizant of the longer-term climate changes.Thus, it's important for water resources agencies that utilize dynamic and adaptive planning methods to effectively balance the value, resilience, and potential regrets of near term investments (e.g., Haasnoot et al., 2013;Schlumberger et al., 2022).

Relative Variance Contributions for Individual Basin Drought Hazards
Figure 12 shows the ANOVA decomposition for drought occurrence, intensity, and duration for 30-year and 100-year moving windows, as well as the entire 617-year period.The variance partitioning for drought occurrence follows a similar pattern to the flood metrics above (Figures 12a-12c).For short time horizons of 30 years, about 60%-80% of drought occurrence variability across the ensemble is associated with natural variability.However, as the time horizon grows, more variance is partitioned to the climate changes, and for extremely long horizons, almost all of the variance in drought occurrence across the ensemble is associated with climate change.Specifically, temperature change becomes the near-sole driver of drought occurrence variability, likely because of the strong increases in evapotranspiration with warming that drive drought occurrence.
For drought intensity, we see a similar pattern in variance partitioning between natural variability and climate change factors, but the magnitude and degree of change in the variance partitioning more heavily favors natural variability (Figures 12d-12f).For 30-year windows, natural variability accounts for upwards of 80% of the total variance in drought intensity, and this falls to the (still substantive) value of 28% when the window reaches 617 years.Of the climate changes, temperature trends once again are the main driver, but precipitation scaling and interactive effects also play an important role in drought intensity variability across the ensemble.Given that the mechanism of precipitation scaling stretches the daily precipitation distribution such that large precipitation values become larger and small precipitation values become smaller, we see a more significant influence from this mechanism on drought intensity than on the other metrics.
Unlike the other two drought metrics, drought duration is primarily driven by natural variability, even when the metric is derived across the longest window.Drought duration generally is linked to the length of time in which there is no precipitation.None of the imposed climate changes directly affects this behavior in the same manner that precipitation scaling directly influences drought intensity or temperature trends affect drought occurrence.Temperature increases can somewhat extend drought duration by increasing evapotranspiration at the beginning and end of a drought period (Figure 12h), but ultimately the duration of a drought is dictated by the occurrence of large storms that end the drought, which is primarily driven by natural variability in our climate scenarios.The decomposition results for the remaining four gauged locations are presented in Figures S17-S20 in Supporting Information S1.These gauged locations show similar behavior as the Don Pedro gauge.Temperature trends play a large role in influencing drought occurrence, and this influence is particularly large at Lake McClure (Merced) and New Melones (Stanislaus; S17a, S20a).Precipitation scaling plays a small role in drought occurrence, and drought duration is primarily driven by natural variability.
The drivers of drought are more complex than the flood hazard metrics due to the heterogeneity of behavior across the drought metrics.A comparison between Figures 12a, 12d, and 12g demonstrate vast differences in drivers (and, therefore, approaches for managing drought) depending on exactly what characteristic of drought is prioritized in planning.The choice of time horizon further complicates the understanding of the appropriate planning process, especially in the case of drought occurrence (Figures 12a and 12b).However, drought intensity and drought duration show more stable influence primarily by natural variability and would consequently need a mix of carefully coordinated shorter-term adaptive actions such as water transfers, conservation, or shifts in allocative priorities to higher value uses in the agriculture sector and diversifying of water sources (i.e., deployment of modular and decentralized desalination systems) in urban areas (Zaniolo et al., 2023).In smaller and more rural communities, focuses can be placed on upgrading infrastructure, like fixing aging wells, and providing temporary storage tanks (Dilling et al., 2019).These short term adaptive actions can provide flexibility to improve the robustness of longer-term infrastructure investments to extreme variability in San Joaquin Valley drought regimes (e.g., improved conveyance or construction of new water delivery pipelines, groundwater banking, managed aquifer recharge, among others; Hamilton et al., 2022;Herman et al., 2020).

Relative Variance Contributions for Joint Flood Hazard
Figure 13 shows the variance partitioning for the copula-based joint flood hazard metric in two cases: (a) bivariate flood risk at Don Pedro and Millerton Lake (Figures 13a and 13c); and (b) trivariate flood hazard at Don Pedro, Millerton Lake, and Lake McClure (Figures 13b and 13d), both for the 100-year, 3-day flood.In both cases, the primary driver of joint flood hazard is natural variability.Unlike flood hazard for individual basins (see Figure 11), the contributions of natural variability to the total variance in joint flood hazard does not decline substantially with time horizon.Additionally, as more locations are considered when quantifying joint flood hazard, natural variability becomes an even more prominent driver of spatially compounding major flood hazards.These results suggest that the dominating factor that dictates whether basins experience simultaneous large flooding is largely randomness in storm tracks and the associated spatial distribution of extreme precipitation and temperature-driven snowmelt.The thermodynamic climate changes that influence snowmelt or scale up storms do play a role, particularly if the basins are in close proximity (such as at Don Pedro and Millerton Lake in Figures 13a and 13c).However, as more basins are included, natural variability in the weather during large storms dominates.Figure 13 reveals the inherent challenges of managing for spatially compounding flood hazards in this region.If persistent climate changes are a more dominant factor in driving joint flooding across all basins, then shared investments in canal expansion or rehabilitation across the regions could be used to offset some of this risk.However, since natural variability is the key driver of large flooding, alternative methods of creating unified planning and management strategies again need to be considered, using a mix of carefully coordinated shorter-term adaptive actions that provide flexibility to improve the robustness of longer-term infrastructure investments to the extreme hydroclimatic variability of the Central Valley (Hamilton et al., 2022;Herman et al., 2020).

Conclusion
This study contributes a novel framework to better understand the relative role of natural climate variability and climate change in determining the uncertainty in future hydrologic extremes of great importance to water systems We first demonstrate the utility of the generator forced with paleodata in capturing and expanding on the dynamics of the modern record, which makes it particularly useful for facilitating exploratory modeling and further quantification of the robustness of water resources systems to challenging scenarios that have been seen in the region's past hydroclimate.We also highlight the large non-stationarity that exists in the flood and drought metrics through the length of the reconstruction, particularly taking note of consequential 100-year flooding periods that can become as likely as 10-year events in parts of the record (i.e., 10 times more likely).These results have large implications for commonly employed stationary analyses, such as deriving design event estimates from the modern record, to quantify flood risk in this region.Our results suggest that these techniques severely underrepresent hydroclimatic hazards and the magnitude of design events that infrastructure should be built for.
The results of the variance decomposition component of the study highlight the following main conclusions: • Uncertainty in future flooding within individual basins is largely driven by thermodynamic climate change, especially if evaluated over long time horizons.Flooding within snow-dominated basins is primarily driven by changes in temperature, while lower-elevation basins see a greater influence from precipitation scaling.• The relative importance of climate change and natural variability on the uncertainty in future drought depends on the drought metric of interest.Changes in temperature drive drought occurrence, while precipitation scaling plays a role in drought intensity.Drought duration is primarily driven by natural variability.• The uncertainty in simultaneous flood hazard across multiple basins is largely driven by natural variability, and this influence increases as additional basins are considered.• The perception of the most important driver is highly influenced by the time horizon over which a metric is calculated.Shorter time horizons are less likely to capture how climate change uncertainty influences the uncertainty in hydrologic extremes.
The variance decomposition reveals a complicated path to robust planning and management for both flood and drought in the region.The results suggest that natural variability and climate change influence both extremes to varying degrees.Furthermore, different characteristics of a single extreme (i.e., drought occurrence and duration) can be influenced by different drivers.
Additionally, if different time horizons are prioritized for planning for extremes, the understanding of the most important drivers of flood and drought hazards also changes.This last facet especially presents a problem for adaptive planning and management.This type of planning triggers management decisions based on the evolution of an observed variable (including hydroclimatic variables like precipitation or streamflow) over a specific horizon.As demonstrated in our study, tracking peak flows over a 30-year or 100-year horizon are both appropriate for longer-term flood management, but prioritizing the latter could neglect the effects of internal variability in the near term while increasing the potential for maladaptive longer-lived capital investments in infrastructure.Thus, it's important for water resources agencies that utilize these dynamic planning methods to effectively balance the value and potential regret of near term investments (Herman et al., 2020;Schlumberger et al., 2022).
One of the most important results of our study is that natural variability plays a very large role in dictating the future uncertainty in key metrics of flood and drought that form the basis of water resources planning; at times much larger than that of prominent climate change signals.This suggests that better quantification of the true range of natural variability in these extremes should be a major priority for the climate and hydrologic research community, and equally important, these efforts should directly inform future planning efforts for water resources systems.However, historically, this has often not been the case, with concerns about climate change often overshadowing the potential impacts of natural variability (see discussions in Koutsoyiannis, 2020Koutsoyiannis, , 2021)).
Our results show, in particular, the importance of natural variability to spatially compounding flood hazard, which arguably poses a more difficult and complex management problem than addressing hazards in any one basin due to the need for infrastructure coordination across space and time.This highlights the potential value that longer, paleo-based data could bring to the estimation of joint flood hazards.The field of paleoflood hydrology has historically focused on the identification and dating of flood evidence in fluvial sedimentary archives, but incorporating speleothems and botanical archives can substantially increase the comprehensiveness and quality of paleoflood data (Wilhelm et al., 2018).Alluvial archives are also being used in more densely-populated and flood-prone regions (Toonen et al., 2020), and recent studies have shown that incorporation of these data can significantly reduce the uncertainty of extreme flood estimates (Engeland et al., 2020;Reinders & Muñoz, 2021).Methodological advances that can use these new and diverse data sources to constrain joint flood hazard estimates across sites would be particularly helpful, as would guidance on how to appropriately and consistently incorporate paleodata into risk management practices that also consider the effects of climate change.The work of England et al. (2019) that helped incorporate paleodata into U.S. flood frequency guidance (Bulletin 17C) provides inspiration for such an approach.
The results also highlight the significant impact of natural variability on drought uncertainty, especially drought duration and intensity, and the implications stated above for joint flood hazards also extend to drought hazards.
There are state-of-the-art techniques currently being applied within the dendrochronology community that can help improve our understanding of the natural range of drought variability.Beyond using tree ring widths, some studies are isolating earlywood and latewood signals for better drought reconstruction (Song et al., 2022;Soulé et al., 2021) or using blue intensity (the intensity of reflectance of the blue channel light from a wood core) to identify more stable climate-growth relationships that inform more robust reconstructions (Akhmetzyanov et al., 2023).Furthermore, better forecasts could provide water managers with more effective ways to navigate drought caused by natural variability.Skillful near-term drought predictions have been achieved by using decadal hindcasts from CMIP6 (Zhu et al., 2020) and machine learning based approaches, particularly those that can model catchment memory, are being used to create skillful seasonal drought predictions (Amanambu et al., 2022;Sutanto & Van Lanen, 2022).
One key limitation of this work is that we only consider a subset of plausible climate change scenarios.We focus on step change-based temperature trends and scaling of the 99.9th percentile of non-zero precipitation, though other relevant options that the weather generator can support include imposing linear and quadratic temperature trends, elevation-dependent warming, and scaling of mean precipitation.We recognize the experimental design is not comprehensive, but rather reflects two mechanisms of change that are likely to occur and to be consequential to the San Joaquin Valley in California.This limitation includes the omission of the possibility that properties of long-term climate variability will itself change in the future under climate change (i.e., increased precipitation variability (Pendergrass et al., 2017;Stevenson et al., 2022) and decreased winter temperature variability in the midlatitudes (Screen, 2014)).Another limitation is in our choice of how to represent natural variability in this study.We do so with one statistical model that utilizes a long series of paleodata but others methods exist (i.e., application of moving block bootstrap techniques on the historical record (Barkhordarian et al., 2012;I. Jung et al., 2010;Wilks, 1997)).As others have shown (Koutsoyiannis, 2021), the quantification of natural variability often greatly depends on the statistical model used and thus could lead to different decomposition outcomes.We also generate baseline weather ensembles using a limited record of meteorological observations from 1950 to 2013 that do not include the most recent extremes of the past several years; if included this more comprehensive data set would likely broaden the range of simulated extremes.A final key limitation of our study is that we focus our decomposition on mechanisms of climate change and do not fully consider the effects of hydrologic model choice or parameterization.Our brief analysis in the Tuolumne basin (Figures S8, S21, and S22 in Supporting Information S1) shows non-trivial differences in resulting flow duration curves, but smaller differences in the ultimate decomposition outcomes.However, we utilize two relatively simple and similar process-based models with spatially lumped parameters.If a statistical or deep learning model with a less analogous structure was used (i.e., a regional long short-term memory network), the decomposition might reveal a stronger influence from hydrologic model choice.
While outside the scope of this study, the framework presented and conclusions drawn here would benefit from a direct comparison against a similar approach using a climate ensemble drawn from a GCM, especially a single model initial-condition large ensemble (SMILE; see Lehner et al., 2020).In a SMILEs-based framework, projections of precipitation and temperature derived from a single GCM under multiple initial conditions and multiple emission scenarios could be downscaled and propagated through hydrologic models to create a future streamflow ensemble, which could be used for partitioning variance in hydrologic extremes across emission scenarios and natural variability.By comparing results between the framework of this study and a SMILEs-based framework, one could better understand whether and how the relative roles of natural variability and climate change are consistent or depend on methodological choice.
Regardless of method used, the results of this work strongly suggest that large ensembles of natural variability are likely needed to adequately assess future risks to water resources systems that are particularly sensitive to extreme events.In future work, we intend to pair the hydrologic ensembles developed here with a regional, multi-sector model of California's Central Valley (Zeff et al., 2021) to more fully assess the risk that future hydroclimate extremes pose to stakeholders across the system, including groundwater banks and irrigation districts.The ultimate goal of such work is to facilitate a greater understanding of how future extremes lead to heterogeneous shortage and flooding impacts across stakeholders, and to help identify robust adaptation strategies to address these future risks.

Appendix A: Glossary of Terms
• Baseline weather scenario: The 600-year daily precipitation and temperature scenario that is created by forcing the weather generator with paleo-reconstructed weather regimes.This scenario is comprised of 50 stochastic ensemble members.• Baseline streamflow scenario: The 600-year daily streamflow scenario acquired by driving the hydrologic model with paleo-reconstructed weather (often referred to as 0T, 0CC).This scenario is comprised of 50 stochastic ensemble members.• Climate scenario: A 600-year daily streamflow scenario created by forcing the hydrologic model with a baseline weather scenario that is layered with a set of thermodynamic climate changes.
is the relative contribution of natural variability and climate change to variability in decision-relevant drought and flood metrics for the San Joaquin Valley? 2. How does the selected scale of the time horizon used for analyses influence the perceived importance of these drivers?

Figure 1 .
Figure 1.The study area is comprised of five subbasins within the greater San Joaquin River basin.Temperature and precipitation are developed for gridded locations across the watersheds (listed in parenthesis) and streamflow is developed at the outlet of the reservoirs.

Figure 4 .
Figure 4. (a) 7-day and (b) 3-day flow volumes at the Don Pedro gauge in the Tuolumne Basin derived from the paleo-informed streamflow ensembles compared to the Livneh-forced generator over the modern period.Key events from the observed record are shown as colored lines.Each gray line represents sorted volumes for each year in 30-year chunks of the paleo-reconstruction across all 50 ensemble members.

Figure 5 .
Figure5.Three-day annual maxima associated with the (a) 10-year return period event and (c) 100-year return period event for the Don Pedro gauge in the Tuolumne subbasin calculated in 30-year moving windows and across the time period from 1400 to 2017.The dark green line represents the mean flooding return levels and the shading represents the 5th and 95th percentile confidence bounds.Panels (b) and (d) are non-exceedance plots of the 3-day annual maxima across the extent of the paleo-reconstruction ensemble.Each gray line represents the sorted 3-day annual maxima volumes for each year in a 30-year segment of the paleo-reconstruction.The dashed black line represents the 3-day annual maxima associated with the 10-year and 100-year return period events as derived from the Sacramento Soil and Moisture Accounting Model-simulated peak flows when forced withLivneh historical data (1987Livneh historical data ( -2013)).

Figure 6 .
Figure 6.Standardized Streamflow Index-based hydrologic drought metrics of (a) occurrence (c) severity, and (e) duration for the Don Pedro gauge in the Tuolumne Basin calculated in 30-year moving windows and across the time period from 1400 to 2017.The dark tan line represents the mean drought metric value and the shading represents the 5th and 95th percentile bounds.Panels (b), (d), and (f) are non-exceedance plots of the 3-day annual maxima across the extent of the paleo-reconstruction ensemble.Each gray line represents the sorted 3-day annual maxima volumes across the length of the paleo reconstruction.The dashed black line represents the metric values as derived from the Sacramento Soil and Moisture Accounting Model-simulated peak flows associated with the modern record (1987-2013).

Figure 7 .
Figure7.The expected return periods associated with the joint probability of simultaneously exceeding historical 100-year flood flows at (a) Don Pedro (Tuolumne Basin) and Millerton Lake (San Joaquin Basin), and (c) including New Melones (Stanislaus Basin) calculated in 30-year moving windows across the time period from 1400 to 2017.The dark turquoise line represents the average return period respectively across the ensemble, and the shading represents the 5th and 95th percentile bounds.Panels (b) and (d) show the non-exceedance plots for the return periods derived across the whole paleo-reconstruction in 30-year segments.The dashed black line represents the return periods as derived from the Sacramento Soil and Moisture Accounting Model-simulated peak flows associated with the modern record.

Figure 9 .
Figure9.The effect of increasing (a) temperature and (b) precipitation scaling rates on drought occurrence at Don Pedro (Tuolumne Basin).The dark brown lines represent the increase in the percentage of the 30-year window classified in drought conditions with respect to the baseline scenario (gray line at 0) and the shading represents the 5th and 95th percentile bounds.A modern baseline is included (black line) as a reference and represents the distance from the modern drought occurrence metric to the worst drought occurrence recorded in the reconstruction.

Figure 8 .
Figure8.The effect of increasing (a) temperature and (b) precipitation scaling rates on 100-year, 3-day flood flows at Don Pedro (Tuolumne Basin).The dark green lines represent the increase in mean flooding return levels with respect to the baseline scenario (gray line at 0) and the shading represents the 5th and 95th percentile bounds.A modern baseline (black line) is included as reference and represents the distance from the modern peak flow to the maximum peak flow recorded in the reconstruction.

Figure 10 .
Figure10.The effect of increasing (a) temperature and (b) precipitation scaling rates the change in return period associated with simultaneously exceeding historical 100-year-day flood flows at Don Pedro (Tuolumne Basin) and Millerton Lake (San Joaquin Basin).The dark blue lines represent the change in return period with respect to the baseline scenario (gray line at 0) and the shading represents the 5th and 95th percentile bounds.A modern baseline (black line at 0) is included as reference and represents the distance from the modern return period to the shortest return period recorded in the reconstruction.

Figure 11 .
Figure 11.A decomposition of the key drivers of variance in the flood metrics for the Don Pedro gauge in the Tuolumne River Basin for a (a, d) 30-year time horizon, (b, e) 100-year time horizon and (c, f) a 617-year time horizon.

Figure 12 .
Figure12.A decomposition of the key drivers of variance in the drought metrics for the Don Pedro gauge in the Tuolumne Basin for (a, d, g) 30-year window (b, e, h) 100-year window and (c, f, i) a 600-year window.

Figure 13 .
Figure 13.A decomposition of the key drivers of variance in joint flood metrics for (a, c) Don Pedro and Millerton and (b, d) Don Pedro, Millerton, and Lake McClure.

•
Ensemble member: Also referred to as a stochastic realization; each climate scenario is comprised of 50 stochastic ensemble members • Record length: The total length of the data set -Paleo-informed weather and streamflow data sets: 617 years (1400-2017 CE) at a daily time scale -Observed Livneh climate (temperature and precipitation) data set: 63 years (1950-2013 CE) at a daily time scale -Observed CDEC streamflow data set: 33 years (1986-2019) at a daily time scale • Time horizon: also referred to as moving window; the length (in years) of the sliding window that passes over the total record length.wide definition.Historical droughts have been identified based on a combination of metrics such as reservoir depth and deficit magnitude flow events in n basins n-dimensional Gaussian copula Flooding across the San Joaquin system could result in infrastructure failure such as levee breaks and disrupt deliveries of fresh water to in planning and management, but can represent longer-term investments N/A Gupta et al., 2023a)in components of the weather generator.Boundary forcing variables (here the PC WR fromGupta et al., 2023a)influence the evolution of the discrete weather regime (WR) time series.Surface weather is then conditioned upon the WR time series.Thermodynamic climate changes (temperature increases and precipitation-temperature scaling) are applied to the surface weather post-generation.to force a non-homogeneous hidden Markov model (NHMM), whereby WR states are modeled as a first-order Markov chain with a non-stationary transition probability matrix conditioned on the reconstructed PC WR