The Role of Snowmelt Temporal Pattern in Flood Estimation for a Small Snow‐Dominated Basin in the Sierra Nevada

Prior research confirmed the substantial bias from using precipitation-based intensity-duration-frequency curves (PREC-IDF) in design flood estimates and proposed next-generation IDF curves (NG-IDF) that represent both rainfall and snow processes in runoff generation. This study improves the NG-IDF technology for a snow-dominated test basin in the Sierra Nevada. A well-validated physics-based hydrologic model, the Distributed Hydrology Soil Vegetation Model (DHSVM), is used to continuously simulate snowmelt and streamflow that are used as benchmark data sets to systematically assess the NG-IDF technology. We find that, for the studied small snow-dominated basin, the use of standard rainfall hyetographs in the NG-IDF technology leads to substantial underestimation of design floods. Thus, we propose probabilistic hyetographs that can represent unique patterns of events with different underlying mechanisms. For the test basin, where flooding events are generated entirely by snowmelt, we develop a hyetograph that characterizes snowmelt temporal patterns, which greatly improves the performance of the NG-IDF technology in design flood estimates. In contrast to the standard rainfall hyetographs characterized by a symmetrically peaked, bell-shaped curve, the snowmelt hyetograph displays a more rapid rise (i.e., greater intensity) and a distinct diurnal pattern influenced by solar energy input. The results also show that the uncertainty of hyetograph selection plays an important role in design flood estimation and can have important implications for future flood projections.

Second, the NG-IDF technology has not undergone validation at a basin scale, which is more relevant for hydrologic design than the point or hillslope scales used in previous studies (Yan, Sun, Wigmosta, et al., 2020; Yan, Sun, Wigmosta, Skaggs, Hou, et al., 2019). Previous NG-IDF studies also assumed uniform bare-soil cover conditions, neglecting the complexity of the land surface and associated processes that drive runoff in real design problems. Lastly, the uncertainties associated with W hyetograph selection in design flood estimates are unknown. The shape of a hyetograph is subject to uncertainty due to the inherent natural variability of the climate. For instance, the standard NRCS rainfall hyetographs may differ from the corresponding temporal distribution curves derived from analyzing a specific local storm. Additionally, the shape of a hyetograph can vary in response to atmospheric temperature fluctuations; Wasko and Sharma (2015) indicate that higher temperatures, irrespective of climatic region or season, tend to result in steeper temporal patterns. Consequently, it is crucial to accurately quantify the impacts of hyetograph shape uncertainties on the estimation of design floods. This paper aims to address these challenges. Specifically, we aim to:
1. Propose a general method to develop a hyetograph of W considering snowmelt and/or ROS processes, to illustrate the role of snowmelt temporal patterns in estimating design hydrologic extremes.
2. Evaluate the performance of the NG-IDF technology in design flood estimates at a design basin scale following the current TR-55 hydrologic design guideline.
3. Quantify the contribution of W hyetograph selection to the uncertainty in NG-IDF design flood estimates.

Methodology
In this section, we introduce the assessment framework of the NG-IDF technology in design flood estimates, followed by detailed descriptions of the study area, data sources, and each framework component.

Assessment Framework
The framework for quantitative assessment of design flood estimates from the NG-IDF technology is presented in Figure 1. First, we generate continuous simulations of streamflow for the basin of interest using the DHSVM model. DHSVM is forced by 15-min meteorological inputs: using the Mountain Microclimate Simulation Model (Hungerford et al., 1989), daily Livneh meteorological data (Livneh et al., 2013) are disaggregated into 15-min time steps. Simulations from process-based models are used because long-term flow measurements are generally scarce at high-elevation locations, especially for headwater streams that fit the small-scale engineering design basin, due to inherent difficulties of access (Bales et al., 2006; Curran et al., 2016; Lundquist et al., 2016). With no observations found within the basin of interest, DHSVM is calibrated and validated against the available snow and streamflow observations at a higher level of the hydrologic unit (which contains the small design basin). Second, the NG-IDF curves are developed based on the DHSVM-simulated annual maximum water available for runoff (W) following the methods detailed in Section 2.5. Meanwhile, flood frequency statistics (e.g., the 10-year flood) are also derived directly from the DHSVM annual maximum streamflow (AM-S) data, which are used as the benchmark to assess design floods estimated using the NG-IDF technology in the last step. Third, based on the DHSVM W time series, hyetographs of W are studied and proposed for the NG-IDF curves. Fourth, with a selected W hyetograph, the derived NG-IDF curves are used to drive the TR-55 event-based model to estimate the associated design floods. Last, the design floods from TR-55 are compared to the benchmark estimates of design floods from the DHSVM continuous streamflow simulation, with uncertainty quantification. In our frequency analysis of DHSVM streamflow and NG-IDF curves, we utilize the Monte Carlo (MC) method to measure the uncertainty present in the sample data. Further information regarding the quantification of uncertainty is provided in Section 2.7.

Study Area
The Upper West Walker Basin (UWWB) is a subbasin of the West Walker Basin (Hydrologic Unit Code: 16050302) located in the eastern Sierra Nevada Mountains, California (Figure 2a). The UWWB has a drainage area of about 633 km² and encompasses an elevation range of about 1,500 m. The primary source of streamflow is spring snowmelt, which originates from winter precipitation in the form of snow. Runoff from snowmelt flows through the DoD Marine Corps Mountain Warfare Training Center (MCMWTC) to Walker Lake, with peak streamflow occurring between April and June (Hatchett et al., 2016). In this study, the assessment of the NG-IDF technology focuses on a selected small test basin that drains into the MCMWTC, rather than the entire UWWB (Figure 2b). We focus on infrastructure safety at the DoD MCMWTC site due to its critical role in force training to defend U.S. national security interests. The test basin has an area of 2.36 km², which aligns with the TR-55 design scale for small watersheds (Cronshey et al., 1986). The test basin has uniform bedrock-derived soil but contains multiple land covers, the dominant being grassland (32%) and forest (31%). The data sources used to derive this information are introduced in the next section.

DHSVM Hydrological Simulation
DHSVM (Wigmosta et al., 1994) is a physics-based, spatially distributed hydrologic model that explicitly solves the water and energy balances for each model grid cell at spatial resolutions ranging from 10 to 150 m. DHSVM includes a two-layer canopy submodel that represents canopy processes such as canopy snow processes and evapotranspiration, a two-layer energy-balance submodel for snow accumulation and melt, a four-layer soil submodel, and three-dimensional surface and saturated subsurface flow routing submodels (Wigmosta et al., 2002). DHSVM was originally developed for simulating the hydrologic response of mountainous terrain, and subsequent developments have enhanced the snow submodel to better represent climate-forest-snow interactions (Sun et al., 2018; Sun, Yan, Wigmosta, Lundquist, et al., 2022), extended the model's capability to simulating urban hydrology and water quality (Cao et al., 2016; Fullerton et al., 2022; Yan et al., 2021), and parallelized the model structure for large-scale application on high-performance computing infrastructure (Perkins et al., 2019).
The required meteorological forcing data for DHSVM include precipitation, air temperature, wind speed, relative humidity, and downward solar and longwave radiation. In this study, we ran DHSVM at a 90 m spatial resolution and 15-min time step for 33 years spanning water years 1981-2011, with a 3-year spin-up period. The 15-min meteorological forcing data were generated by disaggregating the daily Livneh meteorological data (Livneh et al., 2013) using the Mountain Microclimate Simulation Model (MTCLIM) (Hungerford et al., 1989). Subdaily precipitation was estimated by assuming that daily precipitation occurred at a uniform rate throughout the day. Subdaily air temperatures were estimated using a third-order Hermite polynomial spline based on daily minimum and maximum air temperatures. Wind speed was assumed to be constant throughout the day. Relative humidity was calculated from the subdaily temperatures, with the assumption that the dew point equals the daily minimum temperature. Downward shortwave radiation was calculated from the daily temperature range and dewpoint temperature using the Thornton and Running (1999) algorithm, where dewpoint temperature was estimated from the daily minimum temperature and precipitation. Downward longwave radiation was calculated using the method described by Prata (1996) and depends on the subdaily temperatures. For more information on the MTCLIM algorithm, refer to Hungerford et al. (1989). In this study, because the meteorological forcing, SWE, and flow measurements are only available at the 24-hr scale, we developed IDF curves solely at a 24-hr scale and then utilized the hyetograph to disaggregate the 24-hr precipitation/W into subdaily scales for TR-55 modeling. Moreover, all other meteorological factors influencing snowmelt within this basin exhibit variations at 15-min intervals. Given that this basin is primarily influenced by snowmelt rather than rainfall, the effect of subdaily fluctuations in precipitation on peak flow simulations in the DHSVM model is negligible.
Other data used for DHSVM model input and parameterization include the U.S. Geological Survey (USGS) digital elevation model (DEM) terrain data (Danielson & Gesch, 2011), the NRCS Soil Survey Geographic Database (SSURGO) soil data (http://soils.usda.gov/survey/geography/), and the Multi-Resolution Land Characteristics Consortium National Land Cover Database (https://www.mrlc.gov/). Because no snow or flow measurements exist within or near the test basin, we calibrated and validated the DHSVM model for the UWWB using daily streamflow data from two USGS gauges (ID 10296500 and 10296000) and daily SWE data from the Sonora SNOTEL site (Figure 2a). Data from the SNOTEL site were first screened following a rigorous three-stage SNOTEL quality control filter (Yan et al., 2018) and subsequently bias corrected for snowfall undercatch (Sun et al., 2019). The resulting SNOTEL data are referred to as bias-corrected quality-controlled (BCQC) SNOTEL data and are available at https://climate.pnnl.gov/?category=Hydrology. DHSVM calibration and validation use the common period of available streamflow, SWE, and Livneh meteorological data from water years 1984-2011.
The performance of DHSVM in simulating daily streamflow and SWE is evaluated using three statistical metrics: the root-mean-square error (RMSE), the Nash-Sutcliffe efficiency (NSE) (Nash & Sutcliffe, 1970), and the Kling-Gupta efficiency (KGE) (Gupta et al., 2009). The RMSE and NSE metrics focus on the modeling skill for high flows, while the KGE is a multiobjective metric that accounts for water balance, flow variability, and correlation. The value of NSE varies from −∞ to 1, and a value of 1 indicates a perfect fit between observations and simulations. The KGE metric addresses several shortcomings of NSE (e.g., underestimation of variability) and is now increasingly used for hydrologic model calibration and evaluation (Clark et al., 2021; Knoben et al., 2019; Mizukami et al., 2019). Like NSE, the value of KGE ranges between −∞ and 1, and a value of 1 indicates perfect agreement between simulations and observations.
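The three evaluation metrics can be written compactly from paired daily time series; a minimal numpy sketch (function and variable names are ours, not part of any DHSVM tooling):

```python
import numpy as np

def rmse(obs, sim):
    """Root-mean-square error."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: ranges from -inf to 1; 1 is a perfect fit."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

def kge(obs, sim):
    """Kling-Gupta efficiency (Gupta et al., 2009): combines correlation (r),
    variability ratio (alpha), and bias ratio (beta); 1 is perfect agreement."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = float(np.corrcoef(obs, sim)[0, 1])
    alpha = float(sim.std() / obs.std())
    beta = float(sim.mean() / obs.mean())
    return 1.0 - float(np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))
```

Note that KGE penalizes bias and variability errors separately, which is why it complements the squared-error emphasis of RMSE and NSE in a multiobjective evaluation.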

TR-55 Event-Based Rainfall-Runoff Modeling
The NRCS TR-55 guideline (Cronshey et al., 1986) provides a standard procedure for hydrologic design in small watersheds. If the watershed is not subdivided and channel routing is not taken into account, it is advisable to refrain from using TR-55 for basins larger than 250 km² (Ponce & Hawkins, 1996). The model described in TR-55 assumes a rainfall amount uniformly imposed on the watershed over a specified duration. A design storm depth per unit area (selected from IDF curves with a predefined design hyetograph) is converted to a runoff depth using the runoff curve number (CN) approach, which estimates runoff as a function of the antecedent moisture condition (AMC) and watershed physical characteristics (e.g., soil type, vegetation cover). Runoff is then transformed into a hydrograph using the unit hydrograph (UH) routing method (Mockus, 1957), which depends on the runoff travel time through segments of the watershed (i.e., the time of concentration). The dimensionless NRCS UH has two parameters, peak flood and time to peak, which are empirically estimated from the basin area and time of concentration.
The basin-average CN is set to 90, which corresponds to the wet AMC based on the CN table of the TR-55 guideline (Table S1 in Supporting Information S1). The physical explanation behind the use of the wet AMC is that snowmelt events usually last for days to weeks and are more likely to infiltrate soils (except when the soil is frozen), therefore producing a high AMC for runoff generation (Jencso et al., 2009; Yan, Sun, Wigmosta, Skaggs, Hou, et al., 2019). The test basin time of concentration (t_c) is estimated to be 44 min following the TR-55 procedure (Cronshey et al., 1986). The critical design duration approach (Rogger et al., 2012; Yan, Sun, Wigmosta, et al., 2020) is used here to identify the potential peak design flood and make a fair comparison with design flood estimates from DHSVM continuous simulations. More specifically, the critical design duration refers to the duration that produces the largest peak flow. In this study, for each selected average recurrence interval (ARI), such as the 50-year event, we calculate the corresponding flood peaks for 24-hr, 48-hr, 72-hr, and 96-hr durations. The duration that yields the highest flood peak is identified as the critical design duration and used for the assessments.
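The CN conversion from event depth to runoff depth follows the standard SCS runoff equation; a metric-unit sketch (the function name is ours), using the wet-AMC CN of 90 adopted in this study:

```python
def cn_runoff_depth(p_mm: float, cn: float) -> float:
    """SCS curve number runoff (TR-55): Q = (P - Ia)^2 / (P - Ia + S),
    with potential maximum retention S = 25400/CN - 254 (mm) and initial
    abstraction Ia = 0.2*S. Inputs and output are depths in mm."""
    s = 25400.0 / cn - 254.0
    ia = 0.2 * s
    if p_mm <= ia:
        return 0.0  # all input lost to initial abstraction; no runoff
    return (p_mm - ia) ** 2 / (p_mm - ia + s)
```

For example, with CN = 90 a 50 mm event depth yields roughly 27 mm of runoff; depths below the initial abstraction (about 5.6 mm here) produce no runoff at all.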

NG-IDF Curves Versus DHSVM Design Floods
To estimate design floods from NG-IDF curves, we follow these steps. First, from the DHSVM continuous simulations, we construct the basin-mean time series of W at a 15-min interval through the mass balance W = P − ΔSWE + S, where P is precipitation, S is condensation or evaporation/sublimation of the snowpack, and ΔSWE is the change in SWE. We then aggregated the 15-min W time series to construct basin-scale NG-IDF curves at selected durations varying from 1 to 4 days (Perica et al., 2013). We did not include subdaily durations because the input precipitation data have no diurnal variability and DHSVM was calibrated against daily flow and SWE observations (i.e., to reduce uncertainties in the estimated NG-IDF curves). For each duration, the annual maximum (water year) W data set was extracted using a moving window approach. As an illustration, when considering a 24-hr period, a moving window with a size of 96 is used to accumulate 96 consecutive 15-min data points. The window advances 15 min at a time to estimate the maximum 24-hr W for a given year.
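The mass-balance construction of W and the moving-window extraction of the annual maximum can be sketched as follows (a simplified single-year illustration; the function names are ours):

```python
import numpy as np

def w_series(p, d_swe, s):
    """15-min water available for runoff: W = P - dSWE + S (all in mm)."""
    return np.asarray(p, float) - np.asarray(d_swe, float) + np.asarray(s, float)

def max_window_accum(w, steps=96):
    """Maximum running accumulation of W over a moving window that advances
    one 15-min step at a time (96 steps = 24 hr). Applied to one water year
    of 15-min data, this yields the annual maximum W for that duration."""
    c = np.concatenate(([0.0], np.cumsum(w)))  # prefix sums
    return float((c[steps:] - c[:-steps]).max())
```

Repeating `max_window_accum` per water year and per duration (96, 192, 288, and 384 steps for 1-4 days) produces the AM-W data sets used for the frequency analysis.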
Following the NOAA Atlas 14 (Bonnin et al., 2011), the generalized extreme value (GEV) distribution was fit to the annual maximum W (AM-W) data set based on L-moments statistics (Hosking & Wallis, 1997). Before the frequency analysis, we used the nonparametric Mann-Kendall test (Kendall, 1975; Mann, 1945) to examine the stationarity assumption (Milly et al., 2008) of the AM-W data set. To investigate independence and stationarity, we additionally utilized the nonparametric Wald-Wolfowitz test (Wald & Wolfowitz, 1943). The NG-IDF curves were then developed for four selected exceedance probabilities, 0.2, 0.1, 0.04, and 0.02, which correspond to extreme events with ARIs of 5, 10, 25, and 50 years. The ARI was cut off at 50 years to reduce uncertainties in the NG-IDF curves from extrapolating longer return periods from 28 years of simulations. For design floods estimated from continuous DHSVM simulations, the approach is similar to the frequency analysis procedure for NG-IDF curve development. We first extracted the DHSVM continuously simulated AM-S (also based on water years) and examined the stationarity assumption using the Mann-Kendall test. We then used the same GEV distribution to fit the data set using L-moments statistics and estimated the benchmark design floods for ARIs of 5, 10, 25, and 50 years. It is noted that other methods, such as peaks-over-threshold (Coles, 2001) or r-largest order statistics (Smith, 1986), can expand the data set and reduce uncertainties in frequency analysis. To eliminate any discrepancies that could arise from dissimilar sample sizes or subjective threshold values in frequency analysis, and to focus solely on the snow process, we used the same GEV distribution for the NG-IDF, PREC-IDF, and flood frequency analyses. Furthermore, our evaluation of floods adhered to standard hydrologic design methods, such as NOAA Atlas 14, which employs the GEV distribution. Considering that the DHSVM basin-wide continuous flow simulations provide more reliable estimates of design floods, the difference between the two design flood estimates (DHSVM versus NG-IDF) is a good indication of the limitations underlying the NG-IDF technology, which can include the simplified representation of physical hydrologic processes, the assumption of equal ARI between the design storm and design flood, and the selection of hyetograph. As an example, many studies (Viglione & Blöschl, 2009; Viglione et al., 2009) have shown that the flood ARI can be more or less frequent than the corresponding storm ARI depending on the storm duration, watershed time of concentration, and AMC. As previously mentioned in Section 1, although the IDF approach has its limitations, it is still sensible to expect that this method will remain essential in hydrologic design, particularly for smaller infrastructure projects, in the near future.
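The L-moments GEV fit applied to both the AM-W and AM-S series can be sketched with Hosking's rational approximation for the shape parameter; this is a numpy illustration under Hosking's parameterization, not the "lmom" R package actually used in the study:

```python
import math
import numpy as np

def sample_lmoments(x):
    """First three sample L-moments (l1, l2, t3) via probability-weighted
    moments on the ascending-ordered sample (Hosking & Wallis, 1997)."""
    x = np.sort(np.asarray(x, float))
    n = len(x)
    i = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    return l1, l2, l3 / l2

def gev_fit_lmom(x):
    """GEV parameters (location xi, scale alpha, shape kappa) from sample
    L-moments, using Hosking's approximation kappa = 7.8590c + 2.9554c^2.
    Hosking's sign convention: kappa > 0 gives an upper-bounded tail."""
    l1, l2, t3 = sample_lmoments(x)
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    kappa = 7.8590 * c + 2.9554 * c ** 2
    g = math.gamma(1.0 + kappa)
    alpha = l2 * kappa / ((1.0 - 2.0 ** -kappa) * g)
    xi = l1 - alpha * (1.0 - g) / kappa
    return xi, alpha, kappa

def gev_quantile(F, xi, alpha, kappa):
    """Design magnitude at non-exceedance probability F = 1 - 1/ARI
    (e.g., F = 0.98 for the 50-year event)."""
    return xi + alpha * (1.0 - (-math.log(F)) ** kappa) / kappa
```

A parametric bootstrap in the spirit of the study's MC analysis would refit `gev_fit_lmom` on synthetic samples drawn through `gev_quantile` with uniform random F values.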
Besides a deterministic assessment of the relative difference between the two design flood estimates, we further provide a probabilistic assessment to test whether these differences are statistically significant. An MC simulation suggested by Hosking and Wallis (1997) and NOAA Atlas 14 (Bonnin et al., 2011) was used to quantify the sample data uncertainty (i.e., the uncertainty of GEV parameters). A total of 1,000 synthetic ensembles were generated to quantify the GEV parameter uncertainties associated with the NG-IDF curves and the DHSVM flood frequency analysis. In this study, we used the "lmom" package (version 2.6) (Hosking, 2017) in "R" (version 3.4.3) to perform all L-moments and MC analyses. The Z statistic (Ganguli & Coulibaly, 2017; Madsen et al., 2009; Mikkelsen et al., 2005; Yan, Sun, Wigmosta, et al., 2020), Z = (q_ng − q_dhsvm)/√(s_ng² + s_dhsvm²), was used to test the statistical significance of differences between the design flood estimates from the NG-IDF technology (q_ng) and the DHSVM continuous simulation method (q_dhsvm), where q_dhsvm is the design flood estimated from the DHSVM flood frequency analysis, q_ng is the design flood obtained from the NG-IDF technology, s_dhsvm is the DHSVM design flood standard deviation estimated from the 1,000-member DHSVM design flood ensemble, and s_ng is the standard deviation of the NG-IDF-derived design flood, estimated from the 1,000-member design flood ensemble generated by running TR-55 with each of the 1,000 NG-IDF ensemble members.
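Once the ensemble means and standard deviations are in hand, the significance test reduces to a one-line computation; a sketch (names are ours):

```python
import math

def z_statistic(q_ng, s_ng, q_dhsvm, s_dhsvm):
    """Two-sample Z statistic for design flood estimates, with the two
    ensemble standard deviations combined in quadrature. |Z| > 1.96
    indicates a statistically significant difference at the 5% level."""
    return (q_ng - q_dhsvm) / math.sqrt(s_ng ** 2 + s_dhsvm ** 2)
```

For instance, estimates of 12 and 10 m³/s with ensemble standard deviations of 3 and 4 m³/s give Z = 0.4, well inside the ±1.96 acceptance band.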

Hyetograph of Water Available for Runoff
We developed and compared the W hyetographs generated from two approaches. The first approach follows the NOAA Atlas 14, which develops standard hyetographs for rainfall cases only (i.e., assuming all W events are rainfall events), while the second, new approach introduced here develops hyetographs based on the dominant mechanism of W events.
The NOAA Atlas 14 method (Bonnin et al., 2011) was modified from the methodology originally proposed by Huff (1990). We computed W accumulation for specific periods (i.e., 1-4 days) to be consistent with the durations used in the NG-IDF curves. For each selected duration, the following steps were repeated. First, a moving window approach was used to estimate W accumulation over the selected duration based on the 15-min DHSVM basin-mean W time series. The largest three W accumulations were then obtained for each month over the entire simulation period. Following the NOAA Atlas 14, the 2-year ARI W magnitude was used as the minimum threshold to select large W events for developing hyetographs. Different thresholds were evaluated, including the 25-year ARI, and we found the results comparable to those for the 2-year ARI. As a result, the 2-year ARI was selected to generate more samples for the development of probabilistic hyetographs. Each event was then converted into a ratio of the cumulative 15-min W to the total W for that duration (i.e., percent of total W) and a ratio of the cumulative time to the total time (i.e., percent of duration). Thus, the last value of the cumulative ratios is always equal to 100% in the hyetograph. The obtained ensemble of large W events was further subdivided into quartiles based on where in the hyetograph (i.e., temporal distribution) most W occurred, to provide more specific information on the observed variety of hyetographs (Bonnin et al., 2011). For example, the first-quartile data consist of events where the greatest percentage of the total W fell during the first quartile of the duration, that is, the first 6 hr of a 24-hr period. For the W events classified into each quartile, we can then estimate the cumulative probability of occurrence (i.e., quantiles). For example, the 10% hyetograph curve (90% quantile) indicates that 10% of the corresponding W events have temporal distributions above the curve.
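The conversion of events to dimensionless mass curves, the quartile classification, and the pointwise quantile-curve estimation described above can be sketched as follows (a simplified illustration; the Atlas 14 smoothing and event screening are omitted, and the function names are ours):

```python
import numpy as np

def mass_curve(w_event):
    """Convert one event's 15-min W series into a dimensionless mass curve:
    cumulative fraction of total W at each fraction of the duration."""
    w = np.asarray(w_event, float)
    frac_w = np.cumsum(w) / w.sum()          # percent of total W (as fraction)
    frac_t = np.arange(1, len(w) + 1) / len(w)  # percent of duration
    return frac_t, frac_w

def dominant_quartile(w_event):
    """Quartile of the duration (1-4) receiving the largest share of W."""
    quarters = np.array_split(np.asarray(w_event, float), 4)
    return 1 + int(np.argmax([q.sum() for q in quarters]))

def quantile_curves(events, probs=(0.1, 0.5, 0.9)):
    """Pointwise quantiles across an ensemble of same-length mass curves,
    e.g., the 10%/50%/90% hyetograph curves."""
    curves = np.vstack([mass_curve(e)[1] for e in events])
    return {p: np.quantile(curves, p, axis=0) for p in probs}
```

In practice `events` would hold only the large W events of one duration and one dominant quartile, mirroring the quartile-based construction described in the text.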
Contrary to the NOAA Atlas 14 hyetograph method that considers rainfall events only, the new approach developed here takes into account the generating mechanism of W events, including rainfall, snowmelt, and ROS (Yan et al., 2018). As will be shown in Section 3.2, the hyetograph shapes of W driven by different mechanisms show substantial differences. For example, the temporal distribution of snowmelt-driven W events shows an explicit diurnal cycle associated with the diurnal variability of solar radiation, while the temporal distribution of rainfall events presents a symmetrically peaked, bell-shaped curve. Thus, it is questionable to develop a unified W hyetograph that simply combines the temporal patterns of all large W events generated from different mechanisms. A better approach to developing the W hyetograph is first to identify the dominant W-generating mechanism for the study basin, and then generate the W hyetograph for events with the same dominant mechanism. Following Yan, Sun, Wigmosta, Skaggs, Leung, et al. (2019) and Sun, Yan, Wigmosta, Coleman, et al. (2022), we classified the daily W time series into three mechanism classes:
1. Rainfall dominated: daily W of at least 10 mm that contains <20% snowmelt;
2. Snowmelt dominated: daily W of at least 10 mm that contains <20% P; and
3. ROS dominated: daily P of at least 10 mm falling on a snowpack of at least 10 mm SWE, with W containing at least 20% snowmelt.
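The three classification rules translate directly into code; a sketch of one day's classification (the ordering of the checks and the fallback label are our assumptions, not prescribed by the cited criteria):

```python
def classify_w_day(w, p, melt, swe):
    """Classify one day's W by dominant generating mechanism, following the
    thresholds of Yan, Sun, Wigmosta, Skaggs, Leung, et al. (2019).
    w: water available for runoff (mm); p: precipitation (mm);
    melt: snowmelt contribution to w (mm); swe: snow water equivalent (mm)."""
    if p >= 10.0 and swe >= 10.0 and melt >= 0.2 * w:
        return "ROS dominated"        # heavy rain on snowpack, >=20% melt in W
    if w >= 10.0 and melt < 0.2 * w:
        return "rainfall dominated"   # large W with <20% snowmelt
    if w >= 10.0 and p < 0.2 * w:
        return "snowmelt dominated"   # large W with <20% precipitation
    return "unclassified"
```

Applied to the basin-mean daily series, the counts per class identify the dominant mechanism (for this test basin, almost exclusively snowmelt).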

Uncertainty in Design Floods
Besides proposing a hyetograph of W, another goal here is to disentangle and quantify the uncertainty contributions of the W sample data and the W hyetograph selection to the design flood estimates through the standard TR-55 procedure. The individual uncertainty contribution of an NG-IDF sample set or W hyetograph is estimated using a sequential sampling procedure similar to that used by Schewe et al. (2014) and Samaniego et al. (2017). For example, assume we quantify the W hyetograph uncertainty using 50 ensemble members and the NG-IDF sample data uncertainty using 1,000 ensemble members. The NG-IDF sample data uncertainty component is characterized by calculating the range of design floods across all 1,000 NG-IDF MC samples separately for each selected W hyetograph, and then averaging over all 50 W hyetograph ensemble members. The W hyetograph uncertainty component is estimated in a similar fashion: we first calculate the design flood range across all 50 W hyetograph ensemble members for each NG-IDF sample set and then average over the 1,000 NG-IDF samples. The above procedure was applied separately to each selected duration (i.e., 1-4 days) and ARI (i.e., 5-50 years). The range statistic is used here to capture the full dispersion. The statistically significant difference between the two averaged range statistics is tested using the aforementioned Z statistic.
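Given a precomputed matrix of TR-55 design floods over hyetograph members and NG-IDF MC samples, the two components reduce to row-wise and column-wise ranges; a sketch (the array layout and function name are our assumptions):

```python
import numpy as np

def uncertainty_components(q):
    """Disentangle hyetograph vs. sample-data uncertainty from a matrix of
    design floods q[i, j], with hyetograph ensemble member i (e.g., 50 rows)
    and NG-IDF MC sample j (e.g., 1,000 columns), following the sequential
    sampling idea of Schewe et al. (2014). Returns (hyetograph range,
    sample-data range), each averaged over the other factor."""
    q = np.asarray(q, float)
    # sample-data component: range over MC samples, averaged over hyetographs
    sample_range = float((q.max(axis=1) - q.min(axis=1)).mean())
    # hyetograph component: range over hyetographs, averaged over MC samples
    hyeto_range = float((q.max(axis=0) - q.min(axis=0)).mean())
    return hyeto_range, sample_range
```

The procedure is repeated per duration and ARI, and the two averaged ranges are then compared with the Z statistic described in Section 2.5.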

Results and Discussion
In the following, the analyses performed for the NG-IDF technology and the DHSVM continuous simulation are reported. We first report the results of the DHSVM evaluation, the development of the NG-IDF curves, and the DHSVM flood frequency estimates in Section 3.1. Second, we discuss the water available for runoff (W) hyetographs and compare the design flood estimates derived from the NG-IDF technology to the corresponding DHSVM benchmark in Sections 3.2 and 3.3. Last, we disentangle and quantify the uncertainty contributions of the W sample data and the W hyetograph to the TR-55 design flood estimates in Section 3.4.

DHSVM Evaluation and NG-IDF Curves
The historical records of both streamflow and SWE over water years 1984-2011 were split into two periods of 19 and 9 years: 1984-2002 for DHSVM calibration and 2003-2011 for validation. Model calibration was conducted manually by comparing the daily simulations with observations of SWE and streamflow, sequentially. Figure 3 presents the DHSVM model performance in streamflow and SWE simulations for both the calibration and validation periods. Except for 1 year (1997), when the observed streamflow peak was significantly greater than the simulated value at USGS gauge 10296000, the simulations and observations agreed very well. In the calibration period, statistical comparisons of measured versus simulated daily values resulted in KGE values of 0.84, 0.82, and 0.74, NSE values of 0.73, 0.72, and 0.77, and RMSE values of 6.25 m³/s, 6.25 m³/s, and 131 mm for USGS gauges 10296500 and 10296000 and the Sonora SNOTEL site, respectively. The performance of the calibrated model over the validation period had slightly lower skill, with KGE values of 0.63, 0.82, and 0.76, NSE values of 0.53, 0.68, and 0.78, and RMSE values of 9.28 m³/s, 7.94 m³/s, and 141 mm for USGS gauges 10296500 and 10296000 and the Sonora SNOTEL site, respectively. The January 1997 observed peak flow in the Walker River was greater than that of previous and subsequent floods. The January 1997 flood was caused by ROS resulting from unseasonably warm rain in the Sierra Nevada. Accurate simulation of ROS flooding is challenging due to various factors such as rain intensity and amount, the prevailing freezing level, and the spatial distribution of snow cover. Any of these uncertainties could contribute to bias in predicting the peak flow during ROS events (Fehlmann et al., 2019). Although the SNOTEL comparison indicated a good match in the SWE simulation in 1997, one SNOTEL site may not represent all mountain ranges, and the SNOTEL network is biased toward specific types of terrain and vegetation (Mote et al., 2016).
Interpolated precipitation data at higher elevations may be subject to bias due to limited gauge coverage or gauge undercatch (Groisman & Legates, 1994; Lundquist et al., 2019; Serreze et al., 2001), which could also lead to underestimation of the 1997 flood. We also examined the 28-year annual maximum time series of daily streamflow and W from DHSVM simulations and observations. The mean absolute relative differences for the AM-S are 20.2% and 19.0% at USGS gauges 10296500 and 10296000, respectively, and 19.6% for the AM-W at the Sonora SNOTEL site. Given the hydrologically challenging context, it is considered good practice if errors in estimated flood peaks are within 20% of the values derived from validation data (Calver et al., 2009; Yan, Sun, Wigmosta, et al., 2020). In summary, model calibration and validation gave satisfactory and comparable performance for both streamflow and snow simulations. This validated DHSVM model was used as the benchmark for the following assessment of the NG-IDF technology.
After extracting the basin-mean AM-W time series, we used the nonparametric Mann-Kendall test to examine the stationarity assumption for frequency analysis. For each selected duration (24-96 hr), no statistically significant trend was identified (p-value >5%). The supplementary Wald-Wolfowitz test (p-value >5%) likewise verified the assumptions of stationarity and independence for frequency analysis. Figure 4 presents the basin-scale NG-IDF curves for the four selected durations varying from 24 to 96 hr. The associated basin-scale PREC-IDF curves are provided in Figure S1 in Supporting Information S1. The shaded areas characterize the 90% confidence intervals associated with the W sample data uncertainty in the frequency analysis. Extreme events with longer ARIs had larger uncertainties; this is unsurprising given that 28 years of data were used to extrapolate the 50-year events. For example, for a 5-year 24-hr event (i.e., an event with an ARI of 5 years and a duration of 24 hr), the range of the 90% confidence interval was 6.2 mm, while for a 50-year 24-hr event, the range was 27.7 mm. Figure 5 presents the DHSVM-simulated hydrograph and AM-S time series for the test basin. About 93% of the AM-S data occurred between February and May, indicating the snowmelt-dominant flood-generating mechanism of the test basin. Based on the nonparametric Mann-Kendall and Wald-Wolfowitz tests, no statistically significant trend was identified in the AM-S time series. After confirming the stationarity assumption, we estimated the DHSVM design flood benchmarks, in addition to their associated uncertainties, for the NG-IDF assessment.

Water Available for Runoff Hyetograph
Based on the classification criteria described in Section 2.6, we first identified the dominant mechanism for the selected large daily W events. During the 28-year simulation period, we found 125 snowmelt events, 3 rainfall events, and 0 ROS events. For the three large rainfall events, we confirmed that their occurrence dates (e.g., 24 July 1998) were far from their associated AM-S dates (e.g., 22 April 1998), indicating that the large rainfall events did not lead to an AM-S. Thus, the W hyetograph developed in this study was generated from the snowmelt mechanism only.
Following the procedure used in NOAA Atlas 14, for each selected duration we extracted large W accumulations (i.e., >2-year ARI events) estimated from the basin-mean 15-min W time series. Any large W accumulation that contained one of the three rainfall events was removed from the subsequent hyetograph development. As a result, a total of 55, 46, 42, and 34 large W events generated from the snowmelt mechanism only were retained for the 24-hr, 48-hr, 72-hr, and 96-hr durations, respectively. Figure 6 illustrates the ensemble temporal distributions of W events (from snowmelt only), in terms of percent of total W versus percent of duration, for each selected duration. For illustration purposes only, Figure S2 in Supporting Information S1 presents the ensemble temporal distributions of W events including the three large rainfall events for the 24-hr duration case. In contrast to the rainfall temporal distribution, the snowmelt temporal distribution showed a more rapid rise (i.e., higher intensity) and an explicit diurnal pattern controlled by solar energy input; for instance, at nighttime there is no change in the W temporal distribution. Among the three large rainfall events, the largest percentage of total rainfall falling during the first quartile of the duration (i.e., the first 6 hr of the 24-hr duration) was only about 60% (Figure S2 in Supporting Information S1), while for snowmelt events this value was 100% (i.e., all snowpack melted in the first 6 hr). After acquiring all ensemble temporal distributions of W, we further classified each distribution by the quartile in which most of the W occurred. The analysis shows that the 24-hr and 48-hr events were dominated by the first quartile, while the 72-hr and 96-hr events were dominated by the second quartile: about 78% and 50% of the 55 and 46 large W events occurred in the first quartile for the 24-hr and 48-hr durations, respectively, and about 43% and 50% of the 42 and 34 large W events occurred in the second quartile for the 72-hr and 96-hr durations, respectively.
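The quartile classification above, assigning each event's temporal distribution to the quarter of the duration in which most of the W fell, can be sketched as follows. This is a minimal illustration with hypothetical helper names, not the authors' code:

```python
import numpy as np

def dominant_quartile(cum_fraction):
    """Given a cumulative W fraction curve sampled at evenly spaced time
    steps over the event duration (values rising from 0 to 1), return
    the quartile (1-4) in which the largest share of total W occurred."""
    n = len(cum_fraction)
    # index boundaries of the four quarters of the duration
    idx = [0, n // 4, n // 2, 3 * n // 4, n - 1]
    # W delivered within each quarter
    shares = [cum_fraction[idx[q + 1]] - cum_fraction[idx[q]] for q in range(4)]
    return int(np.argmax(shares)) + 1

# A snowmelt-like 24-hr event at 15-min resolution: all melt in the first 6 hr
curve = np.concatenate([np.linspace(0.0, 1.0, 25), np.ones(72)])
print(dominant_quartile(curve))  # first quartile dominates
```

For a first-quartile-dominated event like the one above, the classification returns 1, matching the behavior described for the 24-hr snowmelt events.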
Based on the ensemble temporal distributions obtained for each selected duration, we then developed a probabilistic hyetograph of W by estimating the exceedance probabilities of W occurrence (i.e., quantiles) at each time step. Unlike the probabilistic rainfall hyetograph developed in NOAA Atlas 14, which combines all large rainfall cases, the unique diurnal pattern associated with the snowmelt-dominant W hyetograph requires special attention. Taking a 24-hr duration event as an example: if the obtained W temporal distributions are divided evenly between the first and fourth quartiles (e.g., with curves plotting evenly near the 10% and 90% y-axis values), the estimated median (50% exceedance probability) curve will be close to a uniform distribution over the 24-hr duration (i.e., a 1:1 line), which substantially underestimates the snowmelt intensity. To address this issue, we developed the probabilistic W hyetograph based only on the large W events that occurred in the dominant quartile, similar to the alternative quartile-based rainfall hyetographs in NOAA Atlas 14.
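Restricting the ensemble to the dominant-quartile events and then taking per-time-step quantiles can be sketched as below (hypothetical function and variable names; a simplification of the Atlas 14-style procedure described above):

```python
import numpy as np

def probabilistic_hyetograph(events, probs=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """events: array of shape (n_events, n_steps) holding cumulative W
    fractions, already filtered to events from the dominant quartile.
    Returns one curve per exceedance probability p: the value that a
    fraction p of the events exceeds at each time step
    (p exceedance corresponds to the (1 - p) quantile)."""
    events = np.asarray(events, dtype=float)
    return {p: np.quantile(events, 1.0 - p, axis=0) for p in probs}

# Three synthetic cumulative distributions of differing steepness
t = np.linspace(0.0, 1.0, 97)
events = np.stack([t ** 0.5, t, t ** 2])
curves = probabilistic_hyetograph(events, probs=(0.1, 0.5, 0.9))
```

By construction the 10% exceedance curve lies on or above the 90% curve at every time step, which is the ordering visible among the percentage curves in Figure 7.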
Figure 7 presents the probabilistic hyetographs of W for the test basin over the four selected durations. The graph represents the cumulative probability of occurrence at 20% increments, and a moving-window smoothing technique was applied to each curve (Bonnin et al., 2011). For the 24-hr and 48-hr durations, the probabilistic hyetographs were developed using the ensemble W events in which most W occurred in the first quartile; for the 72-hr and 96-hr durations, they were developed using the W events in which most W occurred in the second quartile. For each duration, the 10% hyetograph curve indicates that 10% of the corresponding W events had temporal distributions falling above the curve (i.e., 10% exceedance probability); the 50% curve represents the median temporal distribution. The broad range between these curves reflects the broad uncertainties associated with the W hyetograph in design flood estimates. In this study, we tested the median (50% curve) hyetograph and an optimized hyetograph method in the NG-IDF modeling. In the optimized hyetograph method, all curves from 10% to 90% (at a 10% increment) were used in the NG-IDF modeling and the best result (i.e., closest to the DHSVM benchmark) was retained. The median hyetograph was used to test whether a hyetograph under the "average" condition can yield acceptable results; the optimized hyetograph method was used to evaluate the best results achievable with the use of W hyetographs in the NG-IDF modeling. Note that our objective is not to overfit the model so that the NG-IDF estimates align with the DHSVM benchmark through optimized hyetograph selection. Rather, our aim is to measure the level of uncertainty in hyetograph selection during flood estimation and to emphasize the importance of taking hyetograph uncertainty into account when conducting flood risk analysis, as detailed in Section 3.4.

NG-IDF Modeling
Before using the developed snowmelt W hyetographs, we first tested the standard rainfall hyetographs in the NG-IDF modeling for comparison. In our previous studies, we extensively investigated the difference between PREC-IDF and NG-IDF curves (Yan et al., 2018; Yan, Sun, Wigmosta, et al., 2020; Yan, Sun, Wigmosta, Skaggs, Hou, et al., 2019). In this study, however, our focus is solely on NG-IDF modeling; specifically, we compare NG-IDF modeling using a rainfall hyetograph against the developed snowmelt hyetograph. Figure 8 compares the developed snowmelt W hyetograph (using the 24-hr median hyetograph as a proof of concept) against five standard rainfall hyetographs: the uniform hyetograph proposed in the rational method and the four types of NRCS rainfall hyetographs widely used across the United States. Results suggest that all rainfall hyetographs substantially underestimate the W intensity in the snow-dominated test basin. In the following comparisons, we used two rainfall hyetographs: the uniform hyetograph and the NRCS Type IA hyetograph, which is recommended for hydrologic design in the Sierra Nevada and Cascade mountains (McCuen, 1998). The uniform hyetograph is included because only daily precipitation data are available, and we uniformly disaggregated the precipitation data over 24 hr.
Figure 9a compares the design flood estimates from the DHSVM continuous simulations (q_dhsvm) and NG-IDF modeling (q_ng) with the use of the uniform hyetograph and the NRCS Type IA hyetograph, respectively. Note that we used the critical design duration approach to identify the potential peak design flood in the NG-IDF technique. In addition to the deterministic estimates, sample uncertainties associated with the DHSVM flood and NG-IDF frequency analyses were also quantified and are shown as the 90% confidence intervals (i.e., error bars) in Figure 9a. As described in Section 2.5, the Z statistic was used to test for statistically significant differences in the design flood estimates between the two methods. The associated p-values of the pairwise comparison between DHSVM and NG-IDF estimates are also shown in Figure 9a. It is observed that, compared to q_dhsvm, q_ng with the use of the uniform and NRCS hyetographs both showed statistically significant underestimation of design floods (p-value <1%) for all four ARIs, even though the wet AMC and critical design duration were used to reduce the potential underestimation of the design flood. In sum, these results suggest that standard rainfall hyetographs can lead to a substantial underestimate of flood risk and that it is necessary to develop new hyetographs to enhance NG-IDF performance. Note that the larger confidence intervals associated with the NG-IDF method are due to the TR-55 nonlinear rainfall-runoff process used to convert W magnitude into flood magnitude, whereas DHSVM directly generates design flood magnitude through frequency analysis of the annual maximum flood; similar results are found in Yan, Sun, Wigmosta, Skaggs, Hou, et al. (2019).
Figure 9b compares the design flood estimates from q_dhsvm and q_ng with the use of the median and optimized W hyetographs, respectively. To select the optimized hyetographs, curves ranging from 10% to 90% (at a 10% increment) were tested in the NG-IDF modeling, and the curve whose estimate was closest to the DHSVM benchmark was selected. Similar to Figure 9a, the error bars represent the 90% confidence intervals of the design flood estimates associated with the sample data uncertainties in the NG-IDF curves; the p-values of the pairwise comparison are derived from the Z statistic. It is observed that, compared to the results with the standard rainfall hyetographs (Figure 9a), the use of either the median or the optimized W hyetograph substantially improved the design flood estimates. When using the median W hyetograph, the absolute relative errors between q_dhsvm and q_ng were about 23%, 8%, 9%, and 20% for the 5-year, 10-year, 25-year, and 50-year events, respectively. With the optimized W hyetograph, the absolute relative errors were further reduced to about 5%, 1%, 2%, and 0.5%, respectively. When considering the sample uncertainties in the NG-IDF curves, we found statistically insignificant differences (i.e., p-value >5%) between q_dhsvm and q_ng for all four selected ARIs with the use of the median and optimized W hyetographs, respectively. These results suggest that W hyetographs offer good performance for NG-IDF technology in hydrologic design and that the median W hyetograph may be appropriate. However, the broad range in the W hyetograph shown in Figure 7 can result in a broad range of design peak flow or volume estimates, leading to large uncertainties associated with W hyetograph selection. We argue that the probabilistic W hyetograph should be used in a way that reflects the goals of the user, and no single W hyetograph (e.g., the median curve) works best under all design conditions. A practical path forward is to quantify the uncertainty contribution of W hyetograph selection in NG-IDF design flood estimates and then perform a design risk analysis that depends on the risk tolerance of a particular asset or project. To illustrate, a wide variety of design floods or volumes can be estimated. To minimize the risk of failure in the design of critical infrastructure, it is advisable for users to focus on temporal distributions that are more likely to produce higher peaks rather than the median case. Additionally, a decision-analytic cost-loss-ratio model (Murphy, 1977) can be used to determine the total cost associated with a specific hyetograph. Furthermore, users should evaluate whether using results from one of the quartiles instead of all samples would yield more suitable outcomes for their particular circumstances.
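The pairwise significance test between the DHSVM and NG-IDF estimates can be sketched as a standard two-sided Z test on the difference between two estimates with their standard errors. This is our simplified reading of the Section 2.5 procedure, with hypothetical argument names:

```python
import math

def z_test(q1, se1, q2, se2):
    """Two-sided Z test for the difference between two design flood
    estimates q1 and q2 with standard errors se1 and se2 (assumed
    independent and approximately normal).  Returns (z, p_value)."""
    z = (q1 - q2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # two-sided p-value from the standard normal CDF
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p
```

A large gap relative to the combined standard error yields a small p-value (significant underestimation, as with the rainfall hyetographs); overlapping estimates yield p > 5% (insignificant difference, as with the W hyetographs).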
It should be noted that, when conducting frequency analysis on AM-W and streamflow, an important assumption is that all samples are stationary, independent, and identically distributed. For the small basin examined in this study, this assumption holds because all samples are generated by the snowmelt process and no statistically significant trends have been observed. However, for larger basins involving multiple flood-generation processes, such as flood mixtures, additional care is required when estimating flood quantiles. Previous studies (Barth et al., 2019; Murphy, 2001) have highlighted the need to account for such flood mixtures. Furthermore, with the expected impact of climate change, flood-generation processes may change, such as a decrease in ROS events at lower elevations and an increase at higher elevations (Li et al., 2019; Musselman et al., 2018). Consequently, it is crucial to employ a physically informed approach for flood frequency analysis, considering both the historical period and future projections. For instance, Barth et al. (2017) used the mechanism of atmospheric rivers to partition annual peak flows in the western United States and estimated flood quantiles by considering a mixed population. Additionally, Yu et al. (2022) demonstrated that neglecting mixture effects can lead to substantial uncertainties in estimating the magnitudes of extreme flood statistics. As explained in Section 2.6, the classification of W into rainfall, snowmelt, and ROS events can assist in partitioning flood peaks and conducting flood frequency analysis using mixed flood populations.

Sample and Hyetograph Uncertainty Contributions to Design Floods
In this NG-IDF technology assessment, both methods used simulations from the validated DHSVM; therefore, the two major uncertainty sources in the assessment were the data sample uncertainty in the NG-IDF frequency analysis and the W hyetograph uncertainty in the TR-55 modeling. Following the procedure described in Section 2.7, we quantified the contribution of each component to the mean range of design flood estimates using the range statistic, as shown in Figure 10.
Using the standard Z test, the differences between the two mean ranges were all statistically significant (i.e., p-value <5%) except for the 72-hr, 5-year event, as shown in Figure 10. The magnitudes of both mean ranges tend to increase with ARI, suggesting larger uncertainty for less frequent extreme events. For events with lower ARIs, such as 5-year or 10-year, both the data sample and W hyetograph uncertainties contribute substantially to the design flood uncertainties (relative difference <50%); for events with higher ARIs, such as 50-year, the data sample uncertainty dominates (relative difference >100%), owing to extrapolation to large ARIs from short to modest record lengths (i.e., 28 years). This is mainly because the shape parameter of the extreme value distribution determines the nature of the tail of the distribution, and its value has significant impacts on the severity of large-ARI events. Unfortunately, regardless of the estimation method selected, the shape parameter is difficult to estimate and sensitive to extreme events, especially given short records (Cooley & Sain, 2010; Cooley et al., 2007). These findings are consistent with the latest federal flood frequency guideline, Bulletin 17C (England et al., 2018), which recommends using regional information to reduce data sample uncertainty, particularly when records are short (<30 years).
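The range statistic used to attribute uncertainty can be sketched as follows: vary one source (e.g., bootstrap samples for the NG-IDF frequency analysis, or the 10%-90% hyetograph curves) while holding the other fixed, and average the resulting ranges of design flood estimates. The array shapes and names here are our illustrative assumptions:

```python
import numpy as np

def mean_range(estimates):
    """estimates: array of shape (n_trials, n_members).  Each row holds
    the design flood estimates produced by varying one uncertainty
    source (the ensemble members) while the other source is held fixed
    for that trial.  Returns the mean of the per-trial ranges."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean(estimates.max(axis=1) - estimates.min(axis=1)))

# e.g., two trials: design floods (m3/s) across the 10%-90% hyetograph curves
hyeto_trials = [[2.0, 3.5, 5.0], [1.5, 4.0, 6.5]]
print(mean_range(hyeto_trials))  # mean of the per-trial ranges
```

Comparing the mean range obtained by varying the hyetograph against the one obtained by varying the data sample gives the relative contribution of each source reported in Figure 10.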
It is worth noting that the contribution of the W hyetograph uncertainty, although smaller than the data sample uncertainty at larger ARIs, cannot be neglected, considering that the associated mean range of design flood estimates exceeded the absolute value of the DHSVM benchmark shown in Figure 9. For example, for the 24-hr, 50-year event, the mean range of design flood contributed by the W hyetograph uncertainty was 4.66 m³/s, exceeding the DHSVM benchmark value of 3.74 m³/s. This matters especially for small infrastructure such as drainage systems, for which many local and federal surface water design manuals (SCDM, 2016; SWMMEW, 2019; UFC, 2013) recommend ARIs of 5-25 years. The sample data uncertainty versus the W hyetograph uncertainty is, in effect, epistemic uncertainty versus aleatory uncertainty, that is, a lack of knowledge about the response data versus random natural variability (Beven & Smith, 2015). Epistemic uncertainty can be reduced by improving the model, while aleatory uncertainty cannot be diminished. For example, we can use the peaks-over-threshold method, the regionalization method, or an extended simulation period to reduce the sample data uncertainty in NG-IDF curves; the W hyetograph uncertainty, however, cannot be reduced because of natural variability. These results support the need to consider W hyetograph uncertainty in NG-IDF design and nonstationary changes of the W hyetograph in future flood projections; the developed W hyetograph can support a probabilistic approach to risk assessment that is adjustable based on design cost and changing operating conditions. As mentioned in Section 1, the IDF design approach requires not only a storm magnitude but also a predetermined hyetograph, with the shape of the hyetograph differing depending on the underlying mechanism, such as the standard rainfall hyetograph versus the snowmelt hyetograph illustrated in Figure 8.
Numerous studies (Cheng & AghaKouchak, 2014; Hou et al., 2019; Ragno et al., 2018; Schlef et al., 2023) have examined the potential changes in IDF curves resulting from global warming. However, climate change will cause different regions to experience varying shifts in their dominant flood-generating mechanisms, with higher mountains potentially experiencing more ROS events.

Conclusions
In this study, we assessed the performance of the recently developed NG-IDF technology in design flood estimates for a snow-dominated small basin located in the eastern Sierra Nevada Mountains, California, where the snowmelt flows through the U.S. DoD's MCMWTC. Based on the results of this study, we draw four major conclusions:
1. The standard rainfall hyetographs, such as the uniform distribution proposed in the rational method or the NRCS temporal distributions, lead to substantial underestimation of design floods and are therefore inappropriate for the small snow-dominated basin in the Sierra Nevada. There is an emerging need to systematically develop W hyetographs over the CONUS.
2. The median W hyetograph generates acceptable flood estimates, but the probabilistic W hyetograph represents a broad range of aleatory uncertainty caused by random natural variability that cannot be diminished, resulting in a broad range of variability in design peak flow or volume estimates. A full risk analysis that includes W hyetograph uncertainty is recommended for risk-based hydrologic design and for future climate change impact studies on flood risk (e.g., changes in hyetograph versus changes in storm magnitude).
3. Instead of simply "pooling" all ensembles of large W events together when developing a probabilistic W hyetograph, it is important to investigate the underlying mechanism (e.g., rainfall, snowmelt, ROS) to gain a physical understanding of its behavior. For example, the test basin in this study was dominated by the snowmelt mechanism, and therefore the W hyetograph presented an explicit diurnal pattern controlled by solar energy input.
4. Sample data uncertainty in the NG-IDF frequency analysis and W hyetograph uncertainty in the TR-55 modeling contribute about equally to the design flood uncertainties at smaller ARIs, such as the 5-year event commonly used in hydrologic design (e.g., culverts); sample data uncertainty dominates the W hyetograph uncertainty at larger ARIs, such as the 50-year event. The sample data uncertainty, or epistemic uncertainty, can be reduced by improving the model, for example through regionalization or by extending the DHSVM simulation period. Nevertheless, it should be acknowledged that these findings are derived from a small snow-dominated basin in the Sierra Nevada, and the outcomes may differ for larger snow-dominated areas with different basin sizes, climates, and vegetation.
Despite the promising results with the developed W hyetograph, we acknowledge that further assessment is still necessary to make NG-IDF technology ready for design practice. The conclusions reached in this study pertain solely to the small study basin in the Sierra Nevada Mountains. More case studies are necessary to develop regional W hyetographs generated from different physical mechanisms and to validate them in different hydroclimate regions. The characteristics of a basin, including its size, topography, climate, and vegetation type, can affect the shape of the hyetograph and the estimation of the design flood. For instance, in a large basin with both rainfall-dominated and snow-dominated regions, the standard rainfall hyetograph may be suitable if summer thunderstorms, rather than spring snowmelt, are the primary cause of flooding (Gochis et al., 2015). At lower elevations where snow has historically dominated, the standard rainfall hyetograph may become applicable in the future due to a shift in precipitation phase from snowfall to rain. Conversely, at higher elevations, the snowmelt hyetograph may currently be effective but could underestimate design floods in the future due to an increase in the frequency of ROS events under climate change (Musselman et al., 2018). Additionally, the vegetation type in a basin can affect the hyetograph shape, with evergreen forests having a higher occurrence of ROS events than open spaces (Mooney & Lee, 2022) and postfire lands potentially accelerating the rate of snowmelt (Gleason et al., 2013). Next, we plan to use the well-validated DHSVM with the developed regionally coherent snow parameters (Sun et al., 2019) to construct regional W hyetographs over the CONUS and test them against different hillslope configurations and selected validation basins using the general method presented in this study.

Figure 1. The framework for developing the water available for runoff hyetograph and evaluating the next-generation intensity-duration-frequency technology in design flood estimates using Distributed Hydrology Soil Vegetation Model continuous simulations.

Figure 2. The location of the Upper West Walker Basin and the selected small snow-dominated basin that flows into the U.S. Department of Defense (DoD) Marine Corps Mountain Warfare Training Center (MCMWTC).

Figure 3. Comparison of measured and simulated daily streamflow (a and b) and snow water equivalent (c) for the Upper West Walker Basin.

Figure 4. Basin-scale next-generation intensity-duration-frequency curves with the associated 90% confidence intervals for the four selected durations varying from 24 to 96 hr.

Figure 5. (a) Distributed Hydrology Soil Vegetation Model simulated 15-min hydrograph for the small test basin. (b) Annual maximum (water year) flood time series (obtained from the 15-min hydrograph) for the small snow-dominated basin.

Figure 6. Ensemble hyetographs of water available for runoff (W) for each selected duration. All hyetographs were derived from large snowmelt events only.

Figure 7. Probabilistic water available for runoff (W) hyetograph for the test basin. The graph represents the cumulative probability of occurrence at 20% increments based only on the W events that belonged to the dominant quartile. Using the 24-hr duration as an example, the first quartile is the dominant quartile, meaning that most W events delivered their greatest percentage of total W during the first quarter of the duration, i.e., the first 6 hr.

Figure 8. The standard Natural Resources Conservation Service rainfall hyetographs and the uniform rainfall hyetograph versus the snowmelt hyetograph.

Figure 9. (a) Comparison of design flood estimates from the Distributed Hydrology Soil Vegetation Model and next-generation intensity-duration-frequency (NG-IDF) technology in which two standard rainfall hyetographs were used. The error bars represent the associated 90% confidence intervals due to sample data uncertainty. The paired bracket presents the p-value from the Z statistic in the significance test. (b) Similar to (a), but with the developed snowmelt hyetographs used in the NG-IDF technology. The median W hyetograph indicates the 50% hyetographs extracted from Figure 7, separately for each duration. The optimized W hyetograph indicates that we ran TR-55 with different W hyetographs (varying from the 10% curve to the 90% curve) and reported the best design flood estimates achieved.

Figure 10. Mean range of design flood estimates depicting the uncertainty contributions of the sample data and the water available for runoff hyetograph in TR-55 modeling for the four selected durations and average recurrence intervals.