Neglect of Potential Seasonal Streamflow Forecasting Skill in the United States National Water Model

Using data from the NASA Soil Moisture Active/Passive mission, Koster et al. (2023, https://doi.org/10.1038/s41467‐023‐39318‐3) conclude that, for medium‐scale basins in the contiguous United States, a quarter of interannual variability in springtime streamflow is explained by interannual anomalies in late‐fall soil moisture. This lagged relationship can be leveraged for seasonal hydrologic forecasting, but only if effectively captured by existing prediction models. Here, we extend the analysis in Koster et al. (2023, https://doi.org/10.1038/s41467‐023‐39318‐3) to diagnose systematic errors present in the United States National Water Model (NWM). Results demonstrate that the NWM tends to underestimate both the trans‐winter temporal memory of 0–1 m soil moisture as well as the correlation between 0 and 1 m soil moisture and streamflow—thereby reducing the NWM's ability to leverage vertically averaged soil moisture as a source of hydrologic predictability.


Introduction
Recent research has provided new insight regarding the link between antecedent soil moisture conditions and subsequent runoff generation.For example, while the role of pre-storm soil moisture for controlling storm-scale runoff response to non-extreme rain events is well documented (e.g., Song & Wang, 2019), antecedent soil moisture remains highly relevant for additional cases where it has previously been seen as playing a diminished role (e.g., seasonal streamflow forecasting on lead times of several months (Wyatt et al., 2020), regional flood forecasting (Tramblay et al., 2021), and spring streamflow prediction in snow-dominated basins (Harpold et al., 2016)).Likewise, even in cases where the importance of antecedent soil moisture has long been recognized, new observational capabilities have revealed an even larger than expected role.For example, surface soil moisture data sets provided by the NASA Soil Moisture Active/Passive (SMAP) mission suggest near-perfect rank correlation between pre-storm soil moisture and storm-scale runoff coefficients (i.e., total storm-scale runoff divided by total storm-scale precipitation) for rain events in the eastern United States (Crow et al., 2022).
Given the need to accurately model the relationship between antecedent soil moisture and future streamflow for hydrologic forecasting, an obvious question is the degree to which the role played by soil moisture is accurately captured by existing land surface models (LSMs).For example, Koster et al. (2023) used SMAP Level 2 soil moisture retrievals to demonstrate that nearly one quarter of the inter-annual variance in springtime runoff totals within medium-scale (i.e., 2,000-to 10,000-km 2 ) basins in the contiguous United States can be explained by profile soil moisture anomalies estimated for November 30 of the previous year.Such long-term predictability has obvious benefits for streamflow forecasting-but will be squandered if forecasts are based on an LSM that fails to adequately represent the lagged relationship between soil moisture and streamflow.Unfortunately, despite decades of LSM development, substantial uncertainty still exists with regards to their treatment of basic relationships between surface water states and fluxes (e.g., Crow et al., 2019;Dirmeyer et al., 2018).
Here, the forecasting system of interest is the National Water Model (NWM) currently being developed for nextgeneration, operational hydrologic forecasts by the United States National Weather Service.Our analysis builds on the earlier work of Koster et al. (2023) by applying a similar approach to evaluate the NWM's ability to reproduce the observed lagged relationship between antecedent soil moisture and future streamflow.

Approach
Results are based on basin-scale, inter-annual Pearson correlation coefficients (R) sampled between root-zone (i.e., surface to 1-m depth) soil moisture (RZSM) values on the first day of each month (RZSM 0 ) and corresponding variations in lagged seasonal streamflow (Q L )-where L [days] is the time elapsed between the acquisition of RZSM 0 and the start of the seasonal (i.e., 90-day) averaging period for streamflow (Q).Note that, due to large amounts of temporal autocorrelation present in 1-m RZSM values, results are insensitive to either the choice of any other day of the month or the application of monthly averaging to define RZSM 0.
Benchmark values of R(RZSM 0 , Q L ) are based on basin-outlet United States Geological Survey (USGS) Q observations and basin-averaged RZSM estimates obtained from the SMAP Level 4 Surface and Root-Zone Soil Moisture (SMAP_L4) product.Analogous basin-scale, LSM correlation results instead use Q and RZSM estimates obtained from a retrospective simulation of the National Water Model (NWM).Hereinafter, NWM-based estimates of Q and RZSM are indicated via the superscript "NWM" (i.e., Q NWM and RZSM NWM ).
Due to the relatively short temporal overlap (i.e., April 2015 to January 2023) between the SMAP data acquisition period and the latest-available NWM retrospective simulation, inter-annual estimates of R(RZSM 0 , Q L ) must (generally) be sampled from only seven annual data points.To mitigate the large sampling errors associated with such a small sample size, results are based on median values of basin-scale R(RZSM 0 , Q L ) sampled across many medium-scale basins within the contiguous United States (CONUS).In this way, we improve the statistical power of sampled correlations by sacrificing our ability to describe their spatial variation.Additional methodological details are provided below.

Study Basins and USGS Streamflow Observations
Study basins are obtained from a list of unregulated CONUS basins described in Lohmann et al. (2004) based on expert judgment by the NOAA NWS Office of Hydrologic Development and the visual inspection of upstream areas (Xia et al., 2012).Within this original list, we consider only basins between 100 and 10,000 km 2 in size.This leads to a revised list of 870 basins-see Figure S1 in Supporting Information S1.Note that our minimum basin size of 100 km 2 is significantly smaller than the 2,000-km 2 threshold applied in Koster et al. (2023).Results presented below are generally insensitive to variations in this minimum size threshold, and our use of a smaller threshold allows us to expand our basin sample size by roughly a factor of three.
Daily USGS streamflow observations are acquired at the outlet of each basin (USGS, 2016).While basins have been screened for major anthropogenic impacts, the absence of minor impoundment or streamflow modifications cannot be guaranteed.Likewise, due to irregular data gaps in USGS Q observations, the actual number of basins used to sample a single CONUS-median value of R fluctuates between 740 and 790.For all results, the basin-level support for observed and NWM-based correlations is made identical.That is, any basin removed from the observed sample due to USGS Q data gaps is also removed from the corresponding sample for NWM-based correlations.

SMAP Soil Moisture
RZSM time series values are obtained from top-1-m RZSM estimates provided in Version 7 of the SMAP_L4 product (Reichle et al., 2022).This product is based on the assimilation of SMAP brightness temperature data into the NASA Catchment LSM (Reichle et al., 2017(Reichle et al., , 2023)).RZSM 0 estimates are obtained by averaging 9-km/3-hourly SMAP_L4 RZSM into daily (0-24 UTC) basin-scale values corresponding to the first day of each calendar month.

10.1029/2023GL105649
Unless otherwise specified, RZSM 0 is obtained from the SMAP_L4 product.However, to examine the impact of the LSM component of the SMAP_L4 system, we also consider a case where RZSM is instead generated from exponential filtering (Albergel et al., 2008;Wagner et al., 1999) of the (retrieval-only) 36-km SMAP Enhanced Level 3 (SMAP_L3E v5;O'Neil et al., 2021) surface soil moisture (SSM) product for both ascending and descending orbits.In this alternative approach, a basin averaged RZSM proxy for day t is estimated as: where SSM t n is a basin-averaged SMAP_L3E SSM estimate obtained on day t n, and τ [days] is a fixed selected time scale.The applied upper limit of 5τ for n is arbitrary but-once made sufficiently large-has only a very minor impact on RZSM estimates.Note that the discrete form of Equation 1 allows for the continuous estimation of RZSM even in the case of intermittent SMAP_L3E SSM availability.SMAP_L3E SSM retrievals are obtained from the baseline dual-channel SSM retrieval algorithm described in O'Neill et al. ( 2021).All retrievals lacking recommended quality flags are masked.Basin-scale averages are based on the averaging of all 9-km SMAP_L4 and 36-km SMAP_L3 grid cells whose geographic centers fall within each basin.

National Water Model (NWM)
The NWM is currently being designed and implemented as the next-generation hydrologic forecasting system within the United States (Cosgrove et al., 2024).At its core, it relies on the Noah-Multi-Parameterization (NoahMP) LSM to estimate baseflow and partition precipitation into surface runoff and infiltration.The NWM then uses a routing procedure to convert 1-km surface runoff and baseflow estimates into localized streamflow predictions for 2.7 million separate CONUS river-reach segments.It is currently envisioned to serve as the basis for hydrologic forecasting at a variety of lead times; therefore, evaluating the NWM's ability to accurately characterize the lagged relationship between antecedent RZSM and subsequent Q is an important objective.
NWM results are based on a retrospective simulation using v3.0 of the NWM (accessed on 1 December 2023 from https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/index.html) and meteorological forcing data acquired from the Office of Water Prediction Analysis of Record for Calibration data set.The v3.0 NWM retrospective simulation was run continuously on a 1-km/hourly resolution over CONUS from 1 January 1979 through 31 January 2023.Here, we focus on the period 1 April 2015-31 January 2023 that overlaps with SMAP data production.

Attenuation Bias Correction
Correlation coefficients sampled between independently acquired variables (e.g., SMAP_L4-based RZSM 0 and USGS Q L ) must be corrected for attenuation bias before they are applied to benchmark equivalent LSM predictions (Crow et al., 2015).Attenuation bias refers to a reduction in sampled correlation between variables attributable to mutually independent random error present in either (Hutcheon et al., 2010).For example, consider a hypothetical basin where true inter-annual RZSM 0 and Q L anomalies are perfectly correlated.However, if SMAP_L4 RZSM 0 estimates are highly imprecise, sampled R(RZSM 0 , Q L ) based on these estimates will badly underestimate its corresponding true value.
If we know the anomaly correlation (vs.truth) of both SMAP_L4 RZSM 0 and USGS Q L , and can assume their errors are mutually independent, the correlation attenuation bias factor (α) is: where R RZSM and R Q are daily anomaly correlation coefficients for the SMAP_L4 RZSM 0 estimates and USGS Q L observations versus their corresponding true values.Multiplying the sampled R(RZSM 0 , Q L ) by α will correct for the spurious degrading impact of random, independent errors present in SMAP_L4 RZSM 0 and USGS Q L .
Daily validation across multiple ground sites with a spatial extent comparable to the basins considered here suggests an average R RZSM near 0.80 (Reichle et al., 2023).Likewise, even following aggregation to 90 days, USGS Q totals are not expected to be perfectly precise.Nevertheless, we choose-relatively optimistic-values of R RZSM = 0.85 and R Q = 1.00 that, when inserted into Equation 3, yield α = 1.18.Note that such optimism is conservative in the sense that it likely causes us to underestimate the true value of α and therefore make a smaller correction for the impact of attenuation bias than is otherwise justified.Below, we also consider cases where RZSM is summed into a 90-day average (RZSM).For this case, we define a modified bias attenuation factor (ᾶ) that is assigned an even lower value (i.e., ᾶ = 1.10 vs. α = 1.18) to reflect expected improvements in the anomaly precision of SMAP_L4 RZSM estimates realized upon averaging within a 90-day period.
Note that we do not apply a comparable correction factor to internal NWMbased correlations because, while RZSM NWM ) correlation estimates when sampled from NWM output.

Bootstrapping Approach
A two-step bootstrapping approach is applied to test the statistical significance of sampled bias in CONUS-median NWM R. First, seven specific years within our entire 7-year analysis period are randomly sampled with replacement.Such sampling is repeated to produce 500 replicates of this seven-member set.To account for reduced sampling power due to spatial autocorrelation, the same seven-member set of randomly selected years is applied across all study basins in each replicate.
Next, for each of these 500 temporal replicates, 870 individual basins within our original set of 870 basins (see Figure S1 in Supporting Information S1) are randomly sampled with replacement.This sampling is repeated to produce 1,000 spatial replicates of the 870-member basin set.
A bootstrapped distribution of NWM median R bias results, representing the combined impact of both temporal and spatial sampling limitations, is then obtained by sampling bias results across all 500,000 replicates (i.e., 500 replicates in time × 1,000 replicates in space).Crosses ("X") denote negative biases that are significant at 95% confidence based on a bootstrapping analysis.

Lagged Correlation Bias in NWM
indicates a 90-day average of Q starting L-days from the acquisition of RZSM 0, and plotted correlation values capture the median R(RZSM 0 , Q L ) sampled across the set of medium-scale (i.e., 100-10,000 km 2 ) CONUS basins shown in Figure S1 of the Supporting Information S1.
As seen in Koster et al. (2023), quiescent vadose zone hydrology conditions over the wintertime enhance αR (RZSM 0 , Q L ) in Figure 1a when RZSM 0 is sampled during the late-fall/early winter (i.e., on or around December 1) and compared to observed Q L during the subsequent spring (i.e., 45 days < L < 120 days).) in Figure 1c, particularly for the case of RZSM NWM 0 estimates obtained during the late fall/early winter, suggests that the NWM is missing, or perhaps mis-parameterizing, processes that could otherwise be exploited to leverage late-fall/early winter RZSM values for seasonal Q forecasts extending into the following spring.
Naturally, the application of the attenuation bias correction factor (i.e., α = 1.18) in Figure 1b increases the magnitude of the negative NWM bias found in Figure 1c.However, significant negative bias is still evident, albeit for a smaller set of RZSM 0 and L combinations, in an alternative version of Figure 1 constructed without attenuation bias correction (i.e., α = 1.00)-see Figure S2 in Supporting Information S1.In addition, since α has been chosen conservatively (see Section 2.4), it is reasonable to assume that Figure 1 provides a more accurate representation of true NWM bias than Figure S2 in Supporting Information S1.
Given the importance of depth on RZSM serial memory, it is also worth considering the impact of the discrete weighting applied to multi-layer NWM soil moisture estimates in Equation 2to obtain RZSM NWM estimates.However, regeneration of Figure 1 using extreme cases of both very shallow (i.e., w 0-10 cm = 1.00, w 10-40 cm = 0.00, and w 40-100 cm = 0.00) and very deep (i.e., w 0-10 cm = 0.00, w 10-40 cm = 0.00, and w 40-100 cm = 1.00) weighting yields qualitatively similar results-see Figures S3 and S4 in Supporting Information S1.

Decomposition of NWM Bias
As discussed in Koster et al. (2023), two separate processes contribute to the observed correlation seen in Figure 1a.The first is the temporal autocorrelation (i.e., memory) of RZSM anomalies versus a mean seasonal cycle.The level of water storage capacity in the top meter of the soil column is potentially sufficient for such autocorrelation to persist over lags of multiple months (Entin et al., 2000).The second process is the link between future RZSM levels and the concurrent infiltration capacity of the soil column.As a result of this link, the ability of the land surface to absorb rainfall or snow melt is progressively decreased as RZSM increases.
When considered in series, these two processes suggest a simple, two-step causal chain model, whereby RZSM on a particular day (RZSM 0 ) is linked to time-averaged RZSM in some future 90-day averaging interval (RZSM L ) that, in turn, is linked to time-averaged Q during the same period (Q L ).If these three variables are linearly related, and mutually linked only through the unidirectional causal chain RZSM 0 → RZSM L → Q L , the total correlation R (RZSM 0 , Q L ) can be decomposed as: (4) The decomposition in Equation 4 is valid only if an appropriate intermediate variable exists such that the assumptions underlying Equation 4 are respected.Our choice of RZSM L as that intermediate value is validated by demonstrating the approximate validity of Equation 4for both the NWM-modeled and observed USGS/ SMAP_L4 cases-see Figure S3 in Supporting Information S1.
Figure 2a plots bias in NWM RZSM/Q cross-coupling versus USGS/SMAP_L4-based observations (i.e., R . Note that we apply a reduced attenuation bias correction factor (i.e., ᾶ = 1.10 instead of α = 1.18) to reflect our expectation that the time averaging of SMAP_L4 RZSM improves its temporal precision (see Section 2.4).Results demonstrate that, across all potential RZSM 0 dates and lags L, there is a significant tendency for the NWM to underestimate the strength of the positive relationship between RZSM L and Q L .This is generally consistent with earlier work demonstrating a tendency for LSMs (including NoahMP) to underestimate the strength of the positive coupling between pre-storm soil moisture and storm-scale runoff coefficients during non-winter storm events in humid climates (Crow et al., 2019).
However, the RZSM/Q cross-coupling bias (Figure 2a) by itself does not explain the seasonal pattern of R(RZSM 0 , Q L ) biases seen earlier in Figure 1c.Given the decomposition in Equation 4, this suggests an important secondary role for the bias in NWM RZSM anomaly memory.Such bias is shown in Figure 2b relative to SMAP_L4 RZSM memory (i.e., R(RZSM NWM 0 , RZSM NWM L ) R(RZSM 0 , RZSM L )).Note how the seasonality of NWM RZSM memory bias (i.e., the observed variation seen in Figure 2b when RZSM 0 is obtained on the first day of different months) correlates relatively well with the NWM underestimation of R(RZSM 0 , Q L ) seen in Figure 1c.
In summary, NWM underestimation of R(RZSM 0 , Q L ) appears to be driven by both the general underestimation of RZSM/Q cross-coupling (Figure 2a) and a seasonally varying NWM bias in RZSM serial memory (Figure 2b).

Alternative Results Based on SMAP Level 3 Surface Soil Moisture
All results above are based on the use of SMAP_L4 RZSM estimates as a benchmark.This contrasts with Koster et al. (2023) who instead obtained a proxy profile soil moisture time series via exponential filtering of the SMAP Level 2 (SMAP_L2) surface soil moisture (SSM) product.
Figure 3a describes the change in observed R(RZSM 0 , Q L ) values (plotted earlier in Figure 1a) when the source of observed RZSM is changed from the SMAP_L4 RZSM product to the exponential filtering (see Section 2.2) of the SMAP Enhanced Level 3 SSM product (SMAP_L3E; O'Neill et al., 2021).(Note that, for our purposes here, the distinction between the SMAP_L2 and SMAP_L3E SSM products is immaterial.)Following Koster et al. (2023), τ is set equal to 38 days.
As shown in Figure 3a, moving from a SMAP_L4 to a SMAP_L3E RZSM reference generally leads to a reduction in sampled R(RZSM 0 , Q L ).Since there is no reason to assume any significant level of error cross-correlation between SMAP_L4 RZSM (or SMAP_L3E SSM) estimates and USGS Q, Figure 3a implies that SMAP_L4 RZSM is generally more precise than competing RZSM proxy values generated via the exponential filtering of SMAP_L3E SSM retrievals.Qualitatively similar results are found for values of τ ranging between 5 and 50 days (not shown).The decrease in correlation associated with the use of SMAP_L3E SSM retrievals supports our prior decision to utilize the SMAP_L4 RZSM product as the RZSM 0 benchmark.Nevertheless, our use of SMAP_L4 RZSM as a benchmark reference in Figure 2b is potentially problematic in that, as a data assimilation analysis, it partially relies on an LSM (i.e., the Catchment Model) to represent serial RZSM memory.To remove the impact of a model, Figure 3b re-calculates NWM bias in both R(RZSM 0 , Q L ) and R(RZSM 0 , RZSM L ) using exponentially smoothed SMAP_L3E SSM retrievals-in place of SMAP_L4 RZSM-and a range of τ values.Recall that NWM R(RZSM 0 , RZSM L ) bias reflects the inability of the NWM to adequately capture RZSM serial memory (see Figure 2b).For clarity, results are shown only for the case of December 1 RZSM 0 and L = 60 days-corresponding (roughly) to the late-fall RZSM 0 /early spring Q case examined in Koster et al. (2023).In Figure 3b, negative NWM biases are noted for both correlations across a range of τ values.This is consistent with earlier SMAP_L4 RZSM results and suggests that, despite the ) R(RZSM 0 , RZSM L )).Note the application of the ᾶ = 1.10 attenuation bias correction factor in panel (a).Crosses ("X") denote negative biases that are significant at 95% confidence based on a bootstrapping analysis.
differences seen in Figure 3a, our qualitative assessment of NWM bias is relatively robust to variations in our choice of a RZSM reference.

Sampling Limitations
Results presented above are all based on median R results sampled across many basins.An obvious next step would be a breakdown of NWM R (RZSM 0 , Q L ) bias as a function of basin characteristics-which, in turn, would aid in the targeting of specific NWM processes.However, the relatively short historical record (since April 2015) of the SMAP_L4 RZSM product is a significant obstacle.
Based on the sampling of inter-annual correlation across only 7 years of SMAP data, Figure S6 in Supporting Information S1 shows NWM R(RZSM 0 , Q L ) bias for individual basins and for RZSM 0 obtained on 1 December and L = 60 days.Positive bias in NWM R(RZSM 0 , Q L ) is evenly distributed across CONUS and-at present-lacks any clear regional structure.Longer RZSM data sets are likely required before sub-CONUS-scale variations in NWM R(RZSM 0 , Q L ) bias can be reliably mapped.Therefore, while we can characterize the seasonal structure of the NWM coupling bias (Figure 1) and clarify its roots in specific components of Q predictability (Figure 2), longer data records are required to identify specific NWM processes at fault.
Other longer RZSM data sets are currently available that could theoretically address this limitation.For example, it is possible to replace SMAP_L4 RZSM with the-potentially much longer term-RZSM NWM product in our analysis and calculate its lagged correlation with USGS Q L .However, within our current April 2015-January 2023 analysis period, this substitution leads to a substantial reduction in observed correlation with Q L and therefore eliminates much of the observed correlation bias seen in Figure 1c (compare Figure S7 in Supporting Information S1 using RZSM NWM to Figure S2 in Supporting Information S1 using SMAP_L4 RZSM).This is consistent with past research identifying the SMAP_L4 product as a uniquely skillful leading predictor of Q (Crow et al., 2017).

Summary and Conclusions
The availability of SMAP-based RZSM data sets provides an opportunity to assess existing LSMs applied to hydrologic forecasting applications.Here, we apply the SMAP_L4 RZSM product to diagnose the under-coupling of RZSM and lagged Q averages within the NWM (Figure 1c).This under-coupling is attributable to NWM underestimation of both static RZSM and Q coupling (Figure 2a) and serial RZSM memory (Figure 2b).These results are robust to reasonable variations in the relative vertical weighting applied to express RZSM NWM values (Figures S3 and S4 in Supporting Information S1) and alternative sources of SMAP-based RZSM time series (specifically, the exponential filtering of SMAP_L3E SSM retrievals-see Figure 3).Therefore, we recommend further scrutiny of processes controlling soil moisture auto-correlation and the impact of antecedent soil moisture on runoff during on-going attempts to refine and improve the NWM.Further study is also needed to determine how these coupling errors relate to known biases in NWM Q estimates described by Abdelkader et al. (2023).
Our analysis utilizes a fixed correction factor (α) for the impact of attenuation bias on sampled correlation-see Equation 3.While the true size of this factor is uncertain, our estimation of it is conservative in nature (Section 2.4), and key manuscript results are not altered when it is neglected (Figure S2 in Supporting Information S1).A more serious shortcoming is the limited historical record of SMAP data products, which prevents us from robustly sampling possible regional variations in coupling bias (Section 4.2).A potential approach for addressing this issue is the merger of existing longer-term RZSM data sets with shorter-term, but more precise, SMAP-based products.
perfectly precise, their random errors are made mutually dependent via the relationship between RZSM NWM and Q NWM stipulated by internal NWM physics.As a result of this error dependence, reduced precision in RZSM

Figure
Figure1aplots inter-annual Pearson correlation coefficients between SMAP_L4 RZSM 0 and USGS Q L (i.e., αR (RZSM 0 , Q L )) values across a range of both days-of-year for RZSM 0 and time lags L. As discussed above, Q L

Figure
Figure 1b replicates Figure 1a but for the case of RZSM NWM 0

Figure 3 .
Figure 3. (a) Change in αR(RZSM 0 , Q L ) (shown earlier in Figure1a) when, instead of being based on the SMAP_L4 RZSM product, RZSM 0 is based on SMAP_L3E SSM retrievals converted into RZSM estimates using an exponential filter and τ = 38 days. 1 April RZSM 0 results are missing due to our inability to initialize the exponential filter during April 2015 (i.e., the first complete month of SMAP data availability).(b) For the case of RZSM 0 obtained on 1 December and L = 60 days, NWM correlation bias in both R (RZSM 0 , Q L ) (black line; corresponding to Figure1c) and R(RZSM 0 , RZSM L ) (red line; corresponding to Figure2a) relative to observed correlations based on the exponential filtering of SMAP_L3E SSM retrievals and a range of τ values.Filled symbols indicate significantly negative values at 95% confidence based on a bootstrapping analysis.