Improved National‐Scale Above‐Normal Flow Prediction for Gauged and Ungauged Basins Using a Spatio‐Temporal Hierarchical Model

Floods cause hundreds of fatalities and billions of dollars of economic loss each year in the United States. To mitigate these damages, accurate flood prediction is needed for issuing early warnings to the public. This situation is exacerbated in larger model domains flood prediction, particularly in ungauged basins. To improve flood prediction for both gauged and ungauged basins, we propose a spatio‐temporal hierarchical model (STHM) using above‐normal flow estimation with a 10‐day window of modeled National Water Model (NWM) streamflow and a variety of catchment characteristics as input. The STHM is calibrated (1993–2008) and validated (2009–2018) in controlled, natural, and coastal basins over three broad groups, and shows significant improvement for the first two basin types. A seasonal analysis shows the most influential predictors beyond NWM streamflow reanalysis are the previous 3‐day average streamflow and the aridity index for controlled and natural basins, respectively. To evaluate the STHM in improving above‐normal streamflow in ungauged basins, 20‐fold cross‐validation is performed by leaving 5% of sites. Results show that the STHM increases predictive skill in over 50% of sites' by 0.1 Nash‐Sutcliffe efficiency (NSE) and improves over 65% of sites' streamflow prediction to an NSE > 0.67, which demonstrates that the STHM is one of the first of its kind and could be employed for flood prediction in both gauged and ungauged basins.

The main intent of this research is to develop a unified approach that corrects the errors in flood predictions for both gauged and ungauged locations over the Coterminous United States (CONUS).Predicting flood in ungauged basins (PUB) is an established area of research in hydrology (Hrachowitz et al., 2013).Correcting hydrological errors in ungauged basins is challenging (Mishra & Coulibaly, 2009) as flood variability is unknown.Methods to address this challenge to date include spatial proximity methods (Tamaddun et al., 2019;Y. Zhang & Chiew, 2009), physical similarity approaches (Narbondo et al., 2020), data-driven methods such as artificial neural networks (Heng & Suetsugi, 2013), and nonlinear regression models (Parajka et al., 2013).These studies mainly focused on regional scaled or seasonal flood prediction, but flood forecast is typically required at daily-to-weekly time scale.Further, with regard to flood prediction, most studies relate the flow attributes available at gauged sites (e.g., 25-year return period flood) with the hydroclimate and basin characteristics to develop a statistical model and then use that relationship to estimate the corresponding flood values at based on basin and climate characteristics available at ungauged locations (see Table 3 in Salinas et al., 2013).However, most of these two-step approaches have focused primarily on design flood as opposed to predicting daily flood flows, which are critically important for issuing early warnings.Further, these two-step regression modeling can be effectively integrated into a single step using a hierarchical model (Das Bhowmik et al., 2020;Devineni et al., 2013).To our knowledge, limited/no application of hierarchical model has been performed for estimating daily flows at ungauged locations over the CONUS.
Hierarchical modeling (aka., multilevel models) is commonly used to combine time-varying hydrologic information (e.g., observed flood) with the spatially varying basin and hydroclimatic characteristics (e.g., Ossandón et al., 2022).Hierarchical model frameworks have the advantage of considering both spatio-temporal predictors and categorical (i.e., spatial or temporal) predictors for estimating a predictand (Gelman & Hill, 2006).Flood prediction studies that used hierarchical models used the river basin dendritic structure to predict the flows at a downstream location based on predictors such as basin-level meteorological variables and observed or hydrologic model predicted flood (Ossandón et al., 2022;Ravindranath et al., 2019).However, these studies have focused on predicting flood at gauged locations.Given that the hierarchical model is a spatiotemporal model with multi-level predictors, a hierarchical model could in principle be extended for predicting flood at ungauged locations by considering predicted flood available from any hydrologic model and other basin characteristics (e.g., aridity index) that commonly influence the error structure in hydrologic model prediction.Based on these underpinnings, we propose a novel hierarchical modeling structure that uses spatio-temporally varying observed/predicted flood information and spatially varying basin characteristics (e.g., drainage area) and hydroclimatic information (e.g., aridity index) for estimating flood flows at ungauged basins over the CONUS.
Continental-scale hydrology studies have evaluated parsimonious mechanistic models (Archfield et al., 2015), lumped-hydrological models (Vogel & Sankarasubramanian, 2000), and hybrid (statistical-mechanistic) models for estimating flood at different time scales (Evenson et al., 2021).Utilizing distributed hydrological models provide a viable alternative to estimating daily flood at ungauged locations, but challenges remain in accurately predicting flood over continental scale (J.M. Johnson et al., 2023).Frame et al. (2021) evaluated the National Water Model (NWM) for selected virgin basins and found that the performance of NWM is poorer compared to the post-processing models.Post-processing can often decrease bias in hydrological model outputs and reduce systematic errors from forcing and other process representations (Li et al., 2017;Rezaie-Balf et al., 2019).Post Processing methods can be data-driven (e.g., J. M. Johnson et al., 2023) or physically informed (Wu et al., 2019).Recently, Frame et al. (2021) applied a long short-term memory (LSTM) machine learning approach to improve daily NWM flood prediction across CONUS and compared its performance with flood prediction using LSTM with just atmospheric forcings.More deep learning (DL) models have shown promising efforts in improving stream forecast, including Feng et al. (2020) and Kratzert et al. (2019).The role of DL models in predicting not only streamflow, but also in predicting other variables (e.g., groundwater), including developing insights on how DL models learn specific hydrologic processes (Nearing et al., 2020;Sundararajan et al., 2021).But, DL models perform similar to traditional regression models in time series forecasting, (Elsayed et al., 2021).Similarly, studies have evaluated the physics-informed DL methods, delta models, for predicting streamflow in the context of PUB and prediction in ungauged regions (K.Fang et al., 2022;Feng et al., 2022).However, most of this analysis has been performed only for virgin basins and did not focus on controlled basins (i.e., basins with anthropogenic activities such as reservoir regulation).J. M. Johnson et al. (2023) used random forest models to identify the basin characteristics and hydroclimatic information that influence the NWM performance in natural and controlled basins over the CONUS.They found that in arid basins and basins with moisture and energy being out of phase exhibit significant bias and reduced Nash-Sutcliffe Coefficient (NSE) (J.M. Johnson et al., 2023).Further, variables indicating anthropogenic activities-percent imperviousness and upstream storage in dams-also influence the bias and NSE in predicting the flood flows (J.M. Johnson et al., 2023).This indicates that the NWM performance depends on basin and hydroclimatic information, thereby exhibiting a regional/spatial error structure in predicting flood flows.We intend to consider these basin characteristics and hydroclimatic information as a hierarchy in predicting flood flows for natural and controlled basins within the proposed hierarchical model.
In this study, we propose a spatiotemporal hierarchical modeling (STHM) framework by using hydroclimatic information and basin characteristics as a hierarchy for improving flood prediction in ungauged basins under natural and controlled types over the CONUS.We used above-normal flow to test the proposed STHM framework.The motivation for employing STHM primarily stems from the hierarchical nature of the predictors, which are temporally varying NWM and spatially varying hydroclimatic and basin characteristics, for predicting above-normal flow in ungauged basins.For demonstration, we consider NWM above-normal flow prediction as a basin-specific predictor, but one could replace this with meteorological forcings (see the Section 4 for additional details) or any other hydrologic model predictions.The proposed hierarchical model is evaluated k-fold using a spatio-temporal validation based on its ability to predict in 2,674 gauges, which include both natural and controlled basins, over the CONUS.The manuscript is organized as follows: We first describe the data sets used and the formulation of the hierarchical model.Then the results are presented from overall model performance analyses through k-fold temporal and spatial validation, which is then followed by a seasonal analysis of the explained variance from each predictor and by a discussion.Finally, we summarize the key findings from the study along with implications for future work in improving above-normal flow prediction in ungauged basins.

Hydroclimatic Data and Hierarchical Model Setup
To develop a hierarchical model for predicting above-normal flow (>67th quantile) in ungauged basins, we have obtained several predictors that include daily above-normal flow from NWM, basin characteristics and hydroclimatic information.Details on the procedure for obtaining the predictors and USGS gaging stations are described below in the following sections.

National Water Model
The National Oceanic and Atmospheric Administration (NOAA) National Weather Service (NWS) Office of Water Prediction (OWP) have implemented the operational National Water Model (NWM) to support the operational activities of NWS River Forecast Centers, the Federal Emergency Management Agency and other government agencies (National Research Council, 2006).A primary goal of the NWM development is to provide flood predictions for any given riverine location within the coterminous United States (CONUS).The NWM is a continental-scale distributed high-resolution hydrologic model that produces streamflow predictions for 2.7 million stream reaches across the contiguous United States (CONUS), based on a variety of data ranging from radar-gauge observed precipitation to numerical weather prediction (National Research Council, 2006).The NWM relies on the Weather Research and Forecasting hydrologic model (WRF-Hydro) architecture (Ghotbi et al., 2020) and provides streamflow predictions extending up to 30 days in advance over the CONUS.NWM provides these predictions at gauged locations but still consists of errors, which depend on both hydrologic process representation and forcing errors (Viterbo et al., 2020).Furthermore, NWM predictions lack spatial correlation between predictions available at ungauged locations and nearby gauged locations, particularly in estimating above-normal flows because of the spatially uninformed model parameters (Brunner et al., 2020;Tijerina et al., 2021).J. M. Johnson et al. (2023) highlighted that the NWM exhibits systematic errors across space and depends on basin characteristics and hydroclimatic information.Due to these shortcomings, studies focusing on improving the NWM forecasts have been emerging recently using various post-processing methods (Frame et al., 2021).For this study, we consider daily flows from NWM (Q NWM ) as a predictor in the hierarchical model.

CONUS Basin Selection
Co-located NHDPlusV2 COMIDs and USGS National Water Information System (NWIS) gages are extracted from the Routelink file associated with NWM v2.0.The "dataRetrieval" R package (Hirsch & De Cicco, 2017) is then used to identify which of these basins have at least 10 years of observed daily flows between 1993 and 2018 for evaluating the proposed hierarchical model.Once identified, streamflow data are extracted from NWIS by gage ID using "dataReteival," and NWM v2.0 data are extracted by COMID using "nwmTools" (M.Johnson, 2022).For the NWM estimated flows, hourly data are converted to daily mean flows.Drainage area for each basin is obtained from the GAGESII USGS database.Additionally, the GAGESII data set classifies each USGS station's flow into controlled/natural based on the 2009 hydro-climatic network (HDCN) database and we consider that classification for developing the hierarchical modeling in the region.A coastal classification is applied to those catchments with more than 50% of the drainage areas within 150 km of the coastline, as these areas are susceptible to impacts from tides (Ramaswamy et al., 2004;F. Zhang et al., 2018).In this study, 74% of 330 coastal HUCs' drainage area fall totally within the 150 km definition.14% of 330 coastal HUC8s have more than 50% drainage areas within 150 km definition.7% of 330 coastal HUCs (26 HUC8s) have at least 33% of drainage area within 150 km definition, 4 (<1%) HUCs have drainage area less than 150 km definition.In total, we consider 2,674 controlled USGS gages, 451 natural gages, and 1,150 coastal gages spanning 1,508 basins (at HUC08 levels) and they are grouped into natural, controlled and coastal basins across the 18 HUC02 (Figure 1).As the focus of this study is on improving above-normal flows in ungauged basins, we consider only the above-normal streamflow condition, which is defined as the flow above the 67th percentile of daily flow in a given station.It is common in forecasting that tercile category-below normal (<33rd percentile), above-normal (>67th percentile) and normal (otherwise)-is used to provide prediction on the desired outcome of interest.Hence, we defined above-normal flow as the magnitude above 67% observed streamflow.Thus, we obtain observed daily above-normal flow (Q) (above 67th percentile of daily streamflow) from NWIS, which will be the predictand in setting up the hierarchical model.The corresponding day's daily streamflow from NWM reanalysis runs (v2.0) is considered as a predictor (Q NWM ) for the selected basin.

Upstream Reservoir Storage
Since streamflow is regulated by reservoirs to meet downstream water demand (Kumar et al., 2022), we consider cumulative upstream reservoir storage of a USGS gaging station as a predictor in the hierarchical model.Studies have shown that reservoir storage and their retention time significantly alter the downstream flow characteristics (Chalise et al., 2021).The dams associated with each gage are obtained from the 2019 United States Army Corp of Engineers National Inventory of Dams (NID) database (US Army Corps, 2019) and the cumulative upstream reservoir storage are obtained for each gage.We use the "Maximum Storage" from the NID database for calculating the cumulative upstream storage (S) (USACE, 2022).We also obtain the contributing area above each dam from NID.

Hydroclimate Data
In addition to dam attributes, we consider the following hydroclimatic attributes as predictors for the hierarchical model: (a) aridity index (b) mean monthly potential evapotranspiration and (c) phase difference between moisture (precipitation) and energy (potential evapotranspiration).
Monthly potential evapotranspiration (PET; kg/m 2 ) and precipitation (P; kg/m 2 ) are obtained from phase 2 of NLDAS for January 1993 through December 2018 (Mitchell et al., 2004).For the contributing area to each gage, the mean monthly PET and P are computed.The availability of moisture (i.e., precipitation) and energy (i.e., PET) together within the seasonal cycles influence the streamflow estimation (Petersen et al., 2012(Petersen et al., , 2018)).
The aridity index (AI) is calculated as the ratio of mean annual PET to mean annual P  (   ∕ P ) over each basin.Is used since arid basins are more difficult to calibrate and estimate streamflow compared to humid basins (Sankarasubramanian & Vogel, 2002).
The phase difference between moisture and energy is computed as the Spearman correlation between the monthly precipitation and potential evaporation (ρ(P, PET)).The Spearman correlation coefficient is determined for each NLDAS2 cell using the mean monthly PET and P over the 26 years and the basin-wide mean ρ is computed.If ρ is negative (positive), it indicates moisture and energy are out of phase (in phase), which could result in more (less) potential for runoff generation from the basin.We also consider mean monthly PET at basin level as an additional predictor in the hierarchical model.It is important to note that AI and ρ are not time-varying predictors and quantify the climatological interaction between moisture and energy.

Land Use Data
Since our interest is in developing flood predictions, land use, particularly urbanization, can play an important role in generating runoff and evapotranspiration (Merz et al., 2021).Urban imperviousness represents developed surface (e.g., roads, driveways, sidewalks, parking lots, rooftops) that limit the infiltration into the underlying soil and increase the frequency and intensity of downstream runoff (Caldwell et al., 2012).Thus, to reflect the development in the basin, urban imperviousness is derived from the U.S. Geological Survey (USGS) National Land Cover Database (NLCD) 2019 Impervious data layer that quantifies the percent developed impervious surface in each pixel (MRLC, n.d.).We also use the NLCD 2019 land cover layer to identify the percentage of each basin that is categorized as urban (Anderson level 1 value 2).Thus, urban basins are independent and are classified simply based on TIGER Census data (U.S. Census Bureau, 2020).These are small watersheds in urban areas as they are more impacted by the impervious land surface.So, we have identified 7% of NWIS sites as urban sites based on the Census Bureau definition with densely settled urbanized areas of 50,000 or more people using the 2020 Census data (U.S. Census Bureau, 2020).

Spatiotemporal Hierarchical Model (STHM)-Formulation
Hierarchical model are often used to analyze multilevel data, especially hierarchically structured with lower-level groups nested within higher-level groups.These higher-level groups can be spatial and/or temporal ones.The intercept and slopes at lower-level model become the outcomes of higher-level model.We use spatial and temporal hierarchies to develop the model for flood prediction in ungauged basins.We divide the day-of-year into 37 10-day windows and we denote each 10-day window, τ, for the temporal hierarchy.We have tested using different scales of time windows, including 5, 7, 10, 20, 30 days.Based on the performance and computational time, we picked the 10-day as a reasonable window for parameter estimation without losing the flood climatology.Figure S4 in Supporting Information S1 show the tradeoff between performance and computational time for different time windows for the natural basins.With regard to spatial grouping, we approached this so that the overall skill is maximized by exploiting the hierarchical structure and based on applicability in the PUB context over the CONUS.For instance, if define the region to be small compared to the CONUS (e.g., HUC2), there won't be much variability on the error structure of NWM as basins will be more homogenous which would make the model hierarchy to be redundant.Thus, basin and hydroclimatic characteristics should be varying enough so that errors arising from temporal hierarchy (i.e., NWM outputs and Q3t) could be explained with spatio-temporally varying predictors.Hence, we categorize the 18 HUC02 regions into three groups/levels with each group, j, denoting the spatial hierarchy (Figure 1).These three groups are based on empirical hydroclimate similarity considering aridity, regional landform, climate, and ecosystems (Heidari et al., 2020).Thus, all sites are nested under each spatial group k, and each time step is nested under the spatial group k.
To predict the flood (Q) at a specific site in a basin i at daily time step t within each temporal cluster τ and spatial cluster k, we have basin-specific terms/coefficients and fixed terms (i.e., coefficients are common to all sites under the same spatial and temporal cluster).The fixed terms include predictors aggregated mean values at the basin (HUC08, j) level, and can be separated into two groups: (1) no variation over time, and (2) varying over time.Predictors not varying temporally include the Spearman correlation indicating moisture and energy being in-phase or out-phase (ρ), the total dam storage (S), the aridity index (AI), and the percent impervious surface (Imp).These predictors share the same coefficients with the same 10-day widow and spatial group (i.e., same i and j value).Predictors varying over time include mean potential evapotranspiration in the corresponding basin (PET) within the same 10-day windows, and HUC08 level previous 3-day area weighted observed streamflow . For PET, the coefficients will also vary among different months within the same 10-day window and spatial group.For   3  , the coefficients will also vary among different basins (HUC08).Thus, for each τ-th 10-day window, (1) at each site level (i): The intercept term β 0(τ,i,j,k) in Equation 2is estimated for each τ-th time window at HUC08 level (j) using mean potential evaporation in the corresponding basin (PET) within the same τ-th 10-day windows (PET).Thus, at each HUC08 level (j): The intercept term β 00(τ,i,j,k) in Equation 3 is estimated at the grouped HUC02 level (k, spatial group shown in Figure 1) Q NWM is the NWM daily flow; ρ is the Spearman correlation indicating moisture and energy being in-phase or out-phase (ρ); PET is the mean 10-day potential evaporation as mentioned above; S is the upstream total dam storage; AI is the aridity index, and Imp is the percent impervious and ɛ is the residual.Thus, the proposed spatio-temporal hierarchical model has the final form as follows: Since we are interested in estimating the flow at ungauged basins, we represent the antecedent conditions based on previous 3-day average flow at the HUC08-level.Thus, given "m" gauged basins within the HUC08, for a given basin, then we obtain the depth of runoff for the previous 3 days for all the "m" basins.This 3-day average depth of runoff is then multiplied by the drainage area of the basin to get the basin-relevant average depth of runoff, and then average them 3 days over "m" basins to get previous average 3-day flows.For a gauged basin, of course, one can simply use the 3-day average flows as a predictor instead of the area-weighted flows.  3  is HUC08 level, previous 3-day area weighted observed streamflow.The moving average is calculated as: 3  on the same day is calculated as: where Q t,i is the streamflow at ith basin within the same HUC08 at date t, A is the corresponding drainage area of the ith basin.t is the corresponding date of the prediction.
Thus, Equation 5 is set up for nine groups for three basin types (i.e., natural, controlled and coastal) for three groups (Figure 1).For each of the nine models, coefficients are estimated over each 10-day window.To determine the best-fitted model for each group, we select the variables using the ℓ1-penalized maximum likelihood method proposed by Groll and Tutz (2014) and computed the coefficients using the R package "glmmLasso" developed by Schelldorfer et al. (2014).

Model Assessment and Validation
Given that we are interested in assessing the performance of the hierarchical model for estimating flows in ungauged locations, we consider both temporal and spatial validation procedures for assessing model performance.The temporal validation is performed to evaluate the STHM performance over a period different from the calibration, whereas the spatial validation is performed to evaluate the STHM for application in ungauged basins.
The temporal validation is performed by calibrating the STHM model using the data from 1993 to 2008 with the remaining data from 2009 to 2018 being considered for validation.For spatial validation, we use the k( 20)-folder cross-validation method (Browne, 2000).We treat 5% of locations as ungauged within their hierarchical group, fit the STHM for the remaining 95% of stations for the period 1993 and 2018, and then evaluate the SHTM performance for the period 2009-2018 for the left-out 5% of the basins.This process of leaving out the 5% basins is repeated until all the considered basins are left out and evaluated in a cross-validation mode.
The Nash-Sutcliffe efficiency (NSE) is widely used to assess the predictive skill of hydrological models (McCuen et al., 2006).In a perfect model with an estimation error variance equal to zero, the resulting NSE equals 1; a model with an estimation error variance equal to the variance of the observed time series, the corresponding NSE equals 0. Conversely, an NSE less than zero occurs when the observed mean is a better predictor than the model.The model performance criteria recommended by Moriasi et al. (2007) are used for evaluating the improvement's performance.Model prediction is considered "acceptable" if NSE scores are greater than 0.5, and considered "good" if the NSE is above 0.67.

𝛽𝛽
) of observed above-normal flow to each predictor.Model performance from the temporal validation alone is considered for analyzing the significance of each selected predictor.In general, the whole model variance   2 ( (1,)  2 ,. . . ,  ) is the sum of   2 (  | 1 ,. . . , −1 , +1 ,. . . ,  ) which is the correlation between y and that portion of x j which is uncorrelated with the remaining predictors of jth predictor x j with the remaining predictors.Thus,

Results and Analysis
We first evaluated the STHM performance from the temporal validation (Figure 2) and then provided a detailed analysis on the role of each predictor (Figure 3).Following that, we analyzed the performance of STHM in predicting above-normal flow in ungauged basins based on spatial validation (Figures 4-8).

STHM Performance on CONUS Above-Normal Fow Prediction
The STHM parameters were estimated over the calibration period (1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008) only for above-normal flows, which was defined as the 67th percentile of observed daily streamflow for a given day obtained from the STHM predicted flows from 2009 to 2018.Thus, all the reported results were only for above-normal flows by considering when the observed flows are above the respective day's 67th percentile flow.The validation results, presented as the cumulative distribution of NSE, showed significant improvement in high streamflows/flood flows across the CONUS compared to the flood flows estimated from NWM streamflow reanalysis (Figure 2). Figure 2 also provided the improvements in NSE from STHM for the three groups of basins-natural, controlled and coastalover the CONUS.Overall, STHM improved NSE by 0.1 for more than 65% of sites under temporal validation (Figure 2).This suggested that STHM not only reduced the error in NWM systematically but also improved the prediction using a limited number of parameters estimated by pooling NWM data and other characteristics from the sites in the grouped region (Figure 2).Overall, during the validation period, 62.7% of controlled basins and 68.4% of natural basins improved better in predicting above-normal flows compared to NWM reanalysis prediction of above-normal flows (Figure 2).The improvement in natural basins was mainly dependent on climatic factors, hence exhibiting better performance.In contrast, controlled basins were more complex as their observed streamflow depends on the reservoir operation policies (Turner et al., 2020;Zhao & Cai, 2020).Coastal basins also showed limited improvements from the STHM as the observed above-normal flows are influenced by high tides (Ramaswamy et al., 2004;F. Zhang et al., 2018).It is important if a natural watershed is present in the coastal extent, then they are still in natural watersheds category.The performance of STHM in improving NWM reanalysis runs was summarized over the CONUS and by season (Figure S1 in Supporting Information S1).Overall, our model improved mean NSE by at least 0.1 during the validation period for more than 60% of the sites (Figure S1 in Supporting Information S1).

Importance of Predictors in the STHM
Overall, the STHM model with all selected predictors (Q NWM , ρ, PET, AI, Q 3d , Imp) explained more than 74.8% of the variance of the observed above-normal flows during the validation period (Figure 3).To be specific, the STHM explained the observed flow variance by 74.4%, 76.3%, and 71.1% for controlled, natural, and coastal basins respectively (overall average values).The explained variance by each predictor in the STHM Equation 1 in improving NWM above-normal flow can be decomposed using the relative importance estimator proposed by Grömping (2007).Based on the decomposed explained variance (   2  ), NWM reanalysis streamflow accounts for  more than 55% of the variance of the observed above-normal flow on average across the CONUS.Other predictors at the HUC08 level explained an additional 18%-35% of observed above-normal flow variance (Figure 3).

Previous Three-Day Areal-Weighted Flow (Q 3d )
Previous 3-day streamflow (Q 3d ) was the most important predictor (excluding Q NWM ) in controlled and coastal basins, explaining an average 18% and 16% variability of flood, respectively (Figure 3).In natural basins, Q 3d is the second most important predictor, and explained 12% of the variability in above-normal flow but showed strong seasonality over all the regions (Figure 3).Q 3d showed higher importance in controlled basins than in natural basins as dam operations highly regulate the observed flow (Gierszewski et al., 2020).Based on the variance explained by Q 3d , coastal basins showed pronounced seasonality particularly in the West region.Further, Q 3d was more important in warmer regions than in colder regions as higher evapotranspiration results in a more varying antecedent conditions.The coefficient of Q 3d showed strong seasonal changes in the warmer regions, but did not have clear seasonality in colder regions since they did not experience much change in antecedent conditions due to reduced evapotranspiration.

Aridity Index
Aridity index (AI) proved to be an important variable in improving the model predictions in warmer regions (e.g., Group 1, Figure 3).AI represents the long-term balance between water and energy and showed significant seasonality in explaining the variance of above-normal flows for all three regions (Figure 3).Among the natural basins, Group 3 (Group 2) basins had the least (most) improvement when accounting for aridity index, since most basins were humid and had little (significant) spatial variability in the aridity index.AI also played an important role in the west, particularly during the spring, which reflects the seasonal water availability from snowmelt (Gudmundsson et al., 2016).In the case of controlled basins, a similar seasonality pattern was observed in all three regions, but the explained variance was relatively less, which was to be expected as the controlled basins dampen the natural hydroclimatic variability (Figure 3).In coastal basins, the seasonal variation in AI was minimal.Overall, AI explained NWM streamflow variability as around 12%-15% for the Group 1 basins all through the year and 2%-4% variability for the other two groups.

Phase Correlation Between Moisture (Precipitation) and Energy (PET)
The phase correlation between energy and moisture (ρ) explained a smaller amount of above-normal flow variability but showed strong seasonal dynamics across the CONUS.For controlled and natural basins in Group 1 and 2, the phase correlation was less important in summer months, while in Group 3, the phase correlation was more important in summer months.Since ρ was estimated by the correlation between monthly PPT and PET, it represents the co-occurrence of energy (PET) and moisture (precipitation) and the explained variance by it indicated their role in influencing the runoff.The ρ was generally negative during the summer months, and it was positive over the Southeast during the winter months (Petersen et al., 2012(Petersen et al., , 2018)).Basins having negative correlation (i.e., moisture and energy being out of phase) exhibit strong seasonality in above-normal flow with increased potential for runoff and they were difficult to high streamflows.Further, the spatial variability in phase correlation was largest (least) during the winter and fall months over Group 1 and 2 (Group 3).On the other hand, phase correlation variability was the largest during the summer months over the Group 3 basins.Hence, it exhibits higher explained variance in improving the NWM streamflow during the summer.Explained variance by ρ over the coastal basins was around 2%-4% across all the regions and does not seem to play a significant role in improving the above-normal flow prediction.One potential reason for this was that the most coastal basins are controlled, hence they had limited role due to phase correlation.

Mean 10-Day PET
Mean 10-day PET (PET) represented the amount of energy available at a given time and it displays significant seasonality in the variance explained in improving the NWM above-normal flows (Figure 3).As expected, for controlled basins, (PET) had a minimal role in improving the NWM prediction seasonally over all three regions.In Group 3, using the mean monthly PET in the STHM had minimal impact on improving NWM streamflow, only explaining 2% of the variance for all three different types of basins.This is expected since Group 3 covers the northern regions, which exhibited minimal spatial (PET) variability in natural basins.The other two regions (1 and 2) exhibited significant spatial variability in PET, and as a result, including it in the STHM explained around 12% of the variance of observed above-normal flow for natural basins with significant seasonality, particularly during the summer.In the case of coastal basins, PET explained the observed above-normal flows better during the winter and in the fall over Group 1 and 2. This was primarily due to the latitudinal gradient in (PET) over these two regions during those seasons.

Total Storage and Impervious Surface
As expected, the total upstream artificial/reservoir storage in the basin played an important role in controlled and coastal basins, but not in natural ones.Regression coefficients of total upstream storage showed a positive correlation with above-normal flow throughout the year for the three grouped regions (Figure 3).In controlled basins, total storage in both Group 1 and 2 showed significant seasonality in explaining observed above-normal flow particularly during the summer months as these are the months significant above-normal flows occur due to in-phase seasonality (Midwest), snowmelt and hurricanes.However, in the coastal basins, explained variance from total upstream storage indicated strong seasonality in explaining NWM above-normal flow, particularly in the summer, explaining around 3%-9% variance.Explained variance by the basin imperviousness did not show strong seasonal variability except in the Group 3 controlled basins.Overall, imperviousness accounted for 2%-4% variance in explaining above-normal flows for coastal basins.

Potential for STHM in Predicting Above-Normal Flows in Ungauged Basins
Given NWM daily streamflows are available for any ungauged locations within the CONUS and other predictors of the STHM (i.e., basin characteristics and hydroclimatic information) could be estimated for any location based on openly available data sources, we evaluated the potential for above-normal flow prediction for any ungauged basins using STHM.Since STHM could not be evaluated for ungauged basins with no streamflow data, we performed k( 20)-fold cross-validation under which 5% of the basins were left out and the remaining 95% of the basins were used for parameter estimation of the STHM.This process was repeated until all the basins are evaluated at least once in the "ungauged" prediction mode.Thus, this spatial cross-validation experiment was similar to evaluating the STHM in an "ungauged" prediction mode.The cross-validation experiment showed promising results for the "ungauged" basin, with 61% of the natural basins improved to an "acceptable above-normal flow prediction" (NSE > 0.5), and 63% of natural basins improved to a "good above-normal flow prediction" (NSE > 0.67) (Figure 4).

Seasonal Performance of STHM Under "Ungauged" Prediction Mode
Spatially, STHM improved the majority of "ungauged" basins across all groups (Figure 5).The biggest improvement occurred in colder regions, with improvements in average NSE by at least 0.2 for more than 30% of the sites.The largest improvements occurred in the northern basins and along the Appalachian Mountains, however significant systematic error persisted in the southeastern basins (Figure 5a).The differences in the hierarchical model and the NWM performance also showed seasonal differences (Figure 5b; Figure S2 in Supporting Information S1).The highest overall improvement of the STHM occurred in the winter (December/January/ February, or DJF) and spring (March/April/May, or MAM) seasons across the CONUS, accounting for 71% of the improvement throughout the year.HUC02-regions 14 and 17 (Upper Colorado and Pacific Northwest) had shown the smallest springtime improvement (MAM, Figure S2 in Supporting Information S1), while basins in HUC02-regions 7 and 9 (Upper Mississippi and Souris-Red-Rainy) had the worst winter performance (DJF, Figure S2 in Supporting Information S1).The largest improvement of NSE in Group 3 occurred in summer (June/July/August or JJA, Figure S2 in Supporting Information S1).The largest improvement of NSE in Group 1 occurred in fall (September/October/November, SON, Figure S2 in Supporting Information S1) and the coastal basins had the highest NSE improvement overall.However, Southeast coastal basins showed limited improvement, as most basins did not improve over all the seasons.Furthermore, there was no significant improvement in skill between controlled and natural basins among different seasons (Figure S2 in Supporting Information S1).But, in Group 3, urban and coastal basins showed significant improvement in skill in all seasons but fall (SON, Figure 5b).

Performance of STHM for Different Basins Types Under "Ungauged" Mode
Among different regions, Group 3 had the highest percentages of basins with significant NSE improvement; Group 1 had the least percentage of sites improved for controlled and natural basins (Figure 5).The primary reason for limited improvement over the west (except Region 17-Pacific Northwest) was that most basins are arid, and the runoff has strong seasonality, hence they are difficult to model (Sankarasubramanian & Vogel, 2002).Group 2 had the minimum improvement among coastal and urban basins.Contrastingly, Group 1 outperformed other regions under coastal and urban basins, respectively (Figure 5).Overall, except for Group 2 coastal basins, the STHM improved the NWM above-normal flow prediction for over 56% of the basins in that category (Figure 6).Basically, STHM performance improved NWM prediction from 56% to 75% of the basins under each category.
It was important that STHM did not improve NWM performance in the remaining 25%-45% of the basins.In the discussion section, we provided details and experiments on how to improve STHM in those basins along with challenges in improving the performance of STHM in coastal and urban basins.
Cross-validating the STHM in predicting above-normal flows in "ungauged" mode showed 63% of natural basins, 39% of coastal basins, and 26% of the controlled sites have NSE above 0.67 from the STHM (Figure 6).This was a significant improvement in comparison to the NWM reanalysis runs.The cross-validation results also showed a similar spatial pattern (Figure 5) with NSE improving in 63% of overall basins (Figure 6).The highest NSE improvement was along the Appalachian Mountain range (Figure 5).The hierarchical model improved the overall NSE for more than 68%, 63%, and 49% of sites in controlled, natural, and coastal basins respectively (Figure 6b).This showed potential in utilizing STHM for above-normal flow prediction in ungauged locations as the hierarchical model uses both NWM streamflow, basin characteristics, and hydroclimatic information by leaving out the basin in the parameter estimation process.It was important to note that the previous 3-day streamflow (Q 3d ) was the only predictor that depends on observed streamflow.We had considered a simple drainage-area method to estimate the previous 3-day streamflow for an ungauged location for quantifying antecedent conditions.We discussed alternate approaches for improving that estimate in the next section.

Potential for Improving the Performance of STHM for Gauged and Ungauged Basins
The temporal (Figures 2 and 3) and spatial (Figures 4-6) validation results shown from the application of STHM predicted the above-normal flow using higher-level three spatial groups as shown in Figure 1, which was implemented primarily to support a continental scale above-normal flow prediction and to reduce the required computation time.(Note that computing all the groups' cross validation costs approximate 6 hr) Even though the STHM improved the NWM prediction for more than 55% of the basins, it was important to note that the STHM did not improve the above-normal flow prediction for the remaining 45% of basins in few groups (e.g., Group 1 natural basins, and Group 2 coastal basins) (Figure 6).This was primarily because the temporal and spatial hierarchies defined under those groups did not aid in improving the model performance as the spatially and temporally varying intercept term (β 0(τ,i,j,k) ), in Equation 1was not explained by the predictors defined in next two-level hierarchies (Equations 2 and 3).This implies that the spatio-temporal variability of β 0(τ,i,j,k) was too large or the predictors in the next hierarchies do not correspond to that variability.To demonstrate this point, we considered two experiments for three moderately performing categories-Group-1 controlled and natural and Group 2 coastal-under k(20) spatial cross-validation.The first experiment was performed by fitting the STHM across basin of similar type (i.e., natural/controlled/coastal) in a given HUC02 (eight HUC02s in Group 1 and four HUC02s in Group 2) under spatial validation and the results were aggregated to the group level (Figure 7).The second experiment was performed by fitting the STHM for each basin under spatial validation and the results were aggregated to the group level (Figure 8).Thus, in the second experiment, there won't be any hierarchies as defined by Equations 2 and 3 and the intercept term, β 0(τ,i,j,k) , was simply left as a basin specific-term varying every 10 days.
From Figure 7, the performance of STHM fitted at HUC02 levels significantly improved model prediction performance as a percentage of improved sites for the three selected groups compared to the performance shown in Figure 6.One could argue this comes from the increased number of parameters fitted in explaining the spatio-temporal variability of β 0(τ,i,j,k) .However, this was daily above-normal flow prediction, the number of data points (i.e., daily above-normal flow) available for fitting the STHM across the fitting period (1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008) was quite large, hence we didn't interpret this as a result of overfitting.It implied that the spatio-temporal hierarchies defined in Equations 2 and 3 are not explained by the selected predictors at the broad group level under that category.Under the second experiment, the performance of STHM further improved when the STHM was fitted with no hierarchies, but the performance of the basin was still evaluated under k( 20)-folder spatial cross-validation (Figure 8).This implied that observed above-normal flow at a particular site was not used for fitting the parameters in Equation 1, only the remaining 80% of those basins in that category was used for fitting the STHM for evaluating the model at a given site.From Figure 8, it was clear that the performance of STHM further improved compared to Figure 7. Figure 8 also could also be improved further by fitting the STHM directly at each site without spatial validation and the hierarchies in Equations 2 and 3. We did not perform that experiment as that model will not have applicability for ungauged basins.Thus, by redefining the group or regions in fitting the STHM, the performance of STHM could be improved further.However, this improvement also comes with additional limitations.The computational time for running the STHM also increased by 6 times and 15 times for obtaining the NSE in Figures 7 and 8 respectively.Another limitation of at-site evaluation of STHM (Figure 8) was that it limits the model applicability for ungauged basins as it will require obtaining the parameters by grouping of basins that are similar to the ungauged basin and developing such grouped basin to estimate the STHM for any ungauged basin will be a humungous computational task.Overall, the improved performance of STHM that we observed in Figures 7 and 8 come as a trade-off in fitting the model for regional performance versus at-site performance.Such issues had been addressed in the context of regional versus at-site calibration of hydrological models (Fernandez et al., 2000).We discussed these issues further in the Discussion.

Discussion
This study focused on developing a spatio-temporal hierarchical model (STHM) for above-normal flow prediction in gauged and ungauged basins across the CONUS.For this purpose, the study used a hydroclimatic (e.g., AI, PET) information and basin characteristics (e.g., imperviousness) along with NWM predicted streamflow to estimate the above-normal flow in natural, controlled and coastal basins.The proposed STHM was evaluated under split-sample temporal validation (Figure 2) to understand the role of different multi-level predictors (Figure 3) and using k(20)-fold spatial validation (Figure 4) for understanding the utility in above-normal flow prediction in ungauged basins over three grouped regions.Both temporal and spatial validation indicated the STHM ability to improve NWM streamflow prediction among different basin types (Figures 2 and 4).The spatial cross-validation results indicated the robustness of the model in predicting ungauged basins.The STHM greatly improved the performance of NWM predicted high streamflow for more than 52% of basins, resulting in a 0.1 improvement in NSE.This improvement is important for future flood forecast systems looking to provide accurate and reliable information to the public.The model improved most in control and natural basins, particularly, in Group 2 and 3 during colder seasons (SON, DJF).The underperformance in coastal basins could be influenced by lunar tides forcing a lagged runoff, particularly on the east Coast (Cerveny et al., 2010).However, compared to previous studies focusing on improving long-term mean annual streamflow predictions (Alexander et al., 2019a(Alexander et al., , 2019b) ) and natural basin alone (Frame et al., 2021), our model showed strong performance in improving finer-scale, daily streamflow across the CONUS.Instead of including many lagged variables from NWM as predictors (e.g., Woznicki et al., 2019), our model only selected a few key drivers of hydro-climate that are well founded in the literature on streamflow prediction and considers the concurrent NWM streamflow prediction.Further, the proposed STHM also provides improved predictions for both controlled and coastal basins.This gives our model the flexibility to be easily expanded to predict above-normal flows at the CONUS scale for both gauged and ungauged basins.
It is important to note that STHM predictions performed worse than the original NWM streamflow in 25%-45% of basins (Figure 4).To address concerns regarding this, we performed two experiments that fitted the STHM at each HUC02 (Figure 7) and for individual basins (Figure 8) for Group 1 (controlled and natural) and Group 2 coastal basins.This resulted in a significant improvement with most basins performing better than NWM performance.This indicates that poor performance of STHM in Figure 4 is primarily due to the trade-off in improving the regional performance of the model at the cost of at-site performance (Fernandez et al., 2000).However, fitting the STHM purely for at-site would limit its ability to predict in ungauged predictions.In a traditional hierarchical modeling approach, this would be considered as an "unpooled regression" model (Das Bhowmik et al., 2020;Devineni et al., 2013) as such a model will result with no regional modeling terms.The main advantage of STHM is its ability to interpret the role of each predictor and the hierarchical nature of the model helps in predicting above-normal flows in ungauged basins using both temporally varying NWM flows and antecedent conditions along with spatially varying basin and hydroclimatic characteristics.It can be easily understood that post-processing a model's flow would naturally result in improved model performance as regression is expected to reduce the model bias.Thus, our proposed STHM could be fitted after a reasonable grouping of basins or  HUC02-regions so that the resulting regional model parameters (i.e., Equations 2 and 3) would provide useful information for ungauged basins prediction.
It is important to note that our model mainly relies on basin characteristics (e.g., imperviousness) and hydroclimatic information (e.g., AI, phase correlation), which could be obtained based on widely available database mentioned in the data section for any ungauged basin.However, obtaining antecedent streamflow conditions, Q 3d , of the basin is difficult for ungauged basins.Hence, in our study, we obtained Q3d purely based on gauged basins available at the HUC08 level to get the depth of runoff and convert it to runoff based on the ungauged basins' drainage area (Equations 5 and 6).However, this step could be eliminated for gauged basins as one could use the observed 3-day streamflow itself for estimating the antecedent conditions.We also would like to mention that this 3-day could be improved using the stage information available from remote-sensing satellites, for example, the Global Flood Detection System (GFDS) (Kugler & De Groeve, 2007).This also could also potentially extend the STHM into a forecasting model if one were to use the real-time NWM forecasts.Thus, STHM could be utilized for real-time flood forecasting for both ungauged and gauged basins.
The performance of the STHM can also be further improved for controlled basins by considering reservoir operation characteristics (Ford & Sankarasubramanian, 2023) as well as reservoir purpose.We did not consider purpose of the reservoir as some of our previous publications, Chalise et al. (2021Chalise et al. ( , 2023)), found limited role of reservoir purpose in influencing flow alteration as it is primarily due to the limited conflicting nature of the multi-purpose reservoirs.Synthesizing reservoir operation characteristics using data-based methods is still at its infancy by many investigators (Ford & Sankarasubramanian, 2023;Turner et al., 2020).As this synthesis matures, the proposed STHM could be revised to incorporate reservoir operation characteristics for estimating high-flow in controlled basins.These are key opportunities to extend the accuracy of PUB.  1) for selected groups.Data defined by treating 5% of the locations sites as "ungauged" and using a k( 20)-folder cross-validation during the period .The results are grouped to be comparable with Figure 6.
Figure 8. Performance of the NWM v2.0 reanalysis data compared to the hierarchical model for daily above-normal flows using a site-specific model (as a proof of concept) by treating 5% sites as "ungauged" locations using k(20)-folder cross-validation during the period (1993-2018), the results are grouped to be comparable with Figure 6.
We utilized NOAA's NWM reanalysis runs for evaluating the proposed STHM ability in predicting the ungauged basins as these are immediately available over the entire CONUS.However, in principle, STHM could be fitted with any other hydrologic model outputs such as Variable Infiltration Capacity model (Liang et al., 1996) or SWAT (Arnold et al., 2012).Similarly, the NWM prediction could also be replaced with basin-level precipitation.
Recently, Frame et al. (2021) utilized atmospheric forcings alone, instead of NWM streamflow predictions, as a predictor for predicting streamflow and found that the Long Short-Term memory model (LSTM) performed equally well as that of LSTM trained with NWM streamflow.The proposed STHM modeling structure is also hierarchical and semi-parametric as its parameters vary over 10-day moving window, which makes it to estimate the non-linear dependencies between streamflow and the relevant predictors consisting of basin characteristics and hydroclimatic information.This indicates that there is potential for extending the STHM with other distributed hydrologic model outputs and/or with atmospheric forcings that drive the hydrologic models.

Summary and Conclusions
We describe a hierarchal spatial-temporal postprocessing model for improving above-normal flow prediction in both gauged and ungauged basins across the CONUS.The proposed STHM is hierarchical and semi-parametric, thereby having the ability to predict non-linear dependencies between streamflow and the predictors-NWM streamflow, basin characteristics, upstream reservoir storage and hydroclimatic information-for estimating above-normal flows in natural, controlled and coastal basins over the CONUS.Performance evaluation of the hierarchal model showed that increased predictive skill in over 50% of sites' by 0.1 NSE, and improved over 65% of sites' above-normal flow prediction to "good" (NSE > 0.67).For controlled basins, the primary improvement was due to the inclusion of areal averaged previous 3-day flow, which accounts for 18% of the variability of above-normal flow over all regions.But the explained variability of above-normal flow for coastal basins are only limited to 10% due to other unconsidered factors, for example, tidal influence.For natural basins, the biggest improvement by the models is due to the inclusion of predictors such as aridity index and phase correlation for extending the STHM for ungauged prediction.We also demonstrated that the reduced performance of STHM in several basins also stem the trade-off in parameter estimation between at-site improvement versus the regional performance, which is required particularly for ungauged basins prediction.Performance evaluation of the STHM under temporal and spatial the cross-validation results has shown robustness in predicting above-normal flow combines physical under "ungauged" prediction mode.
In addition to improved above-normal flow prediction, the developed model was evaluated in predicting above-normal flow for ungauged basins through k-fold spatial validation.Even though the STHM used NWM streamflow as a predictor, the model could be recalibrated with any other hydrologic model outputs or with precipitation and relevant atmospheric forcings.Further, the proposed STHM also post-processes the NWM prediction, thereby reducing the systematic biases in the model prediction.Since the STHM predictors are widely available for both any given site (e.g., NWM prediction and previous 3-day streamflow) with spatially and temporally varying predictors, we can apply the estimated model coefficients to any ungauged site using the corresponding HUC08 level parameters over the CONUS.It is important to note that the estimating 3-day streamflow for an ungauged basin is not possible, but it could be estimated based on other gauged basins within the HUC8 as all 1451 HUC8 basins have at least one USGS gauged station and 1178 out of 1451 HUC8 basins have at least two or more USGS gauged stations.Hence, the procedure is applicable within the CONUS for PUB without any further modification.Given that the NWM for real-time streamflow forecasts are available for any locations within the US, the proposed STHM could be employed for real-time flood forecasts for both gauged and ungauged basins.The proposed modeling approach is also hybrid as it combines physical attributes outputs with statistical modeling for developing flood prediction across the CONUS.These hybrid approaches are essential as real-time weather forecasts always have considered both dynamical model predictions with statistical correction scheme, which is popularly known as Model Output Statistics, in the weather forecasting community (Antolik, 2000).Thus, the STHM could be eventually employed for both ungauged flood prediction as well as for issuing real-time flood forecasts.

Figure 1 .
Figure1.Spatial distribution of 3,640 USGS stream gages classified as "controlled" and "natural" based on Hydro-Climatic Data Network (HCDN).These sites are also classified as coastal sites if they are within 150 km distance of a coastline.The HUC02 regions are grouped into three regions based on regional hydroclimatology.

Figure 2 .
Figure 2. Cumulative distribution of above-normal flow NSE from the NWM and NWM-HM approaches, separated by basin classification, during the validation period (2009-2018).

Figure 3 .
Figure 3. Relative importance of selected predictor variables, expressed as % variance of above-normal flow explained by the hierarchical model, for each month for (a) controlled, (b) coastal, and (c) natural basins over the grouped hydroclimatic regions (Group 1, 2, 3).Note that NWM flow was not included here.

Figure 4 .
Figure 4. Cumulative distribution of above-normal flow NSE from the NWM and NWM-HM approaches, separated by basin classification, over the period from 1993 to 2018.These were computed by treating each basin as an ungauged basin under k-fold cross-validation for each basin type.

Figure 5 .
Figure 5. (a) Maps the difference in NSE between HM and NWM daily flow by treating 5% sites as "ungauged" using k(20)-folder cross-validation from 1993 to 2018.Positive (Negative) values indicate the hierarchical model (NWM) better predicated above-normal flows.(b) Plots seasonal performance, indicated as % of sites with improved NSE, shown in for the three regions over the four basin classifications.Note that DJF, MAM, JJA, and SON are the initials for month, represents spring, summer, fall, and winter.

Figure 6 .FANG
Figure 6.Performance of the NWM v2.0 reanalysis data compared to the hierarchical model for daily above-normal flows by treating 5% sites as "ungauged" using k(20)-folder cross-validation from 1993 to 2018.In (a) controlled, (b) natural, (c) coastal, and (d) urban basins in each region.Urban basins are defined as basins containing 50,000 or more people based on the 2020 U.S. Census.Above-normal flows are defined when the observed daily flows are greater than the 67th percentile of daily flow.

Figure 7 .
Figure 7. Performance of the NWM v2.0 reanalysis data compared to the hierarchical model for daily above-normal flows based on HUC02 levels (instead of spatial groups shown in Figure1) for selected groups.Data defined by treating 5% of the locations sites as "ungauged" and using a k(20)-folder cross-validation during the period.The results are grouped to be comparable with Figure6.