Version 4 of the SMAP Level‐4 Soil Moisture Algorithm and Data Product

The NASA Soil Moisture Active Passive (SMAP) mission Level‐4 Soil Moisture (L4_SM) product provides global, 3‐hourly, 9‐km resolution estimates of surface (0–5 cm) and root zone (0–100 cm) soil moisture with a mean latency of ~2.5 days. The underlying L4_SM algorithm assimilates SMAP radiometer brightness temperature (Tb) observations into the NASA Catchment land surface model using a spatially distributed ensemble Kalman filter. In Version 4 of the L4_SM modeling system the upward recharge of surface soil moisture from below under nonequilibrium conditions was reduced, resulting in less bias and improved dynamic range of L4_SM surface soil moisture compared to earlier versions. This change and additional technical modifications to the system reduce the mean and standard deviation of the observation‐minus‐forecast Tb residuals and overall soil moisture analysis increments while maintaining the skill of the L4_SM soil moisture estimates versus independent in situ measurements; the average, bias‐adjusted root‐mean‐square error in Version 4 is 0.039 m3/m3 for surface and 0.026 m3/m3 for root zone soil moisture. Moreover, the coverage of assimilated SMAP observations in Version 4 is near global owing to the use of additional satellite Tb records for algorithm calibration. L4_SM soil moisture uncertainty estimates are biased low (by 0.01–0.02 m3/m3) against actual errors (computed versus in situ measurements). L4_SM runoff estimates, an additional product of the L4_SM algorithm, are biased low (by 35 mm/year) against streamflow measurements. Compared to Version 3, bias in Version 4 is reduced by 46% for surface soil moisture uncertainty estimates and by 33% for runoff estimates.

Plain Language Summary
Soil moisture is important because of its impact on the land surface water, energy, and nutrient cycles. The low-frequency microwave observations collected by the NASA Soil Moisture Active Passive (SMAP) satellite are suitable for estimating soil moisture globally. Their sensitivity, however, is limited to the top few centimeters of the soil, and observations are only available every other day depending on location. The SMAP Level-4 Soil Moisture (L4_SM) data product addresses these limitations by merging the satellite observations into a numerical model of the land surface water and energy balance while considering the uncertainty of the observations and model estimates. The resulting L4_SM data product is publicly disseminated within ~2.5 days from the time of observation and provides global estimates of surface and deeper-layer (or "root zone") soil moisture at 3-hourly temporal and 9-km spatial resolution. This study presents an overview of recent updates in the L4_SM algorithm and an assessment of the quality of the resulting Version-4 soil moisture estimates. The algorithm updates reduce bias in L4_SM surface soil moisture and runoff estimates compared to previous versions while otherwise maintaining the product skill, which meets the accuracy requirement specified prior to SMAP's launch.

Introduction
The Soil Moisture Ocean Salinity (SMOS; Kerr et al., 2010; Mecklenburg et al., 2016) and Soil Moisture Active Passive (SMAP; Entekhabi et al., 2010) satellite missions measure Earth's L-band (1.4 GHz) passive microwave brightness temperature (Tb), which is highly sensitive to the water content in the top few centimeters of the soil and in the vegetation (Jackson & Schmugge, 1991; Schmugge et al., 1974). With their global coverage and 2- to 3-day revisit time, the L-band observations have revealed new insights into soil moisture drydown characteristics and basin-scale water balance (Koster, Crow, et al., 2018; Shellito et al., 2016, 2018), drought (Mishra et al., 2017; Pablos et al., 2017; Rajasekaran et al., 2018; Sadri et al., 2018), soil moisture memory and its impact on land-atmosphere coupling (McColl et al., 2017), the relation between surface soil moisture and subsoil characteristics (Dirmeyer & Norton, 2018), and vegetation optical depth and plant activity (Feldman, Short Gianotti, et al., 2018; Konings et al., 2017). Moreover, L-band observations have been used to estimate precipitation (Koster et al., 2016; Román-Cascón et al., 2017), to enhance estimates of evapotranspiration and its partitioning into bare soil evaporation and transpiration (Purdy et al., 2018; Rigden et al., 2018), and to improve the characterization of the runoff ratio for flood forecasting and land surface model calibration and evaluation (Koster, Liu, et al., 2018). Recent results also confirm that L-band soil moisture retrievals, and in particular retrievals from SMAP, are generally more accurate than retrievals from C- and X-band instruments (Kumar et al., 2018).
Among the global soil moisture data sets partly based on L-band observations is the SMAP Level-4 Surface and Root Zone Soil Moisture (L4_SM) product (Reichle, De Lannoy, Liu, Ardizzone, et al., 2017). The L4_SM product is derived from the assimilation of SMAP Tb observations into a land surface model driven with observed precipitation forcing to the extent possible. The L4_SM algorithm thus merges the information from the L-band Tb observations with the water and energy balance constraints encapsulated in the land surface model and the information in the surface meteorological forcing data, which is derived from gauge-based precipitation observations and a large number of atmospheric measurements. Based on the merged information, the L4_SM product provides global, 3-hourly, 9-km resolution soil moisture estimates for the surface (0-5 cm), root zone (0-100 cm), and profile (0 cm to bedrock) layers with a mean latency of ~2.5 days from the time of observation. As of January 2019, L4_SM data have been downloaded by more than 3,000 individual users (based on unique Internet Protocol host addresses) according to the National Snow and Ice Data Center.
The value of the L4_SM product has been demonstrated in a variety of ways. A case study in Australia showed that assimilating SMAP observations successfully corrected soil moisture for short-term errors in the L4_SM rainfall forcing. Validation with in situ measurements indicated that Version 2 of the L4_SM product met its soil moisture accuracy requirement, with a bias-adjusted (or unbiased) root-mean-square error (ubRMSE, also known as standard deviation of the error) of 0.038 m3/m3 (0.030 m3/m3) for surface (root zone) soil moisture at the 9-km scale (Reichle, De Lannoy, Liu, Ardizzone, et al., 2017). Root zone soil moisture estimates from the L4_SM product also performed well in a multiproduct comparison versus in situ measurements in Spain (Pablos et al., 2018). Moreover, when compared to in situ measurements across the globe, the L4_SM soil moisture estimates were more skillful than model-only estimates that do not benefit from the assimilation of SMAP Tb observations (Reichle, De Lannoy, Liu, Ardizzone, et al., 2017). L4_SM soil moisture estimates could also better predict the streamflow response to subsequent precipitation than model-only estimates and L-band (or C-band) soil moisture retrievals by themselves. Finally, L4_SM data have been used successfully to evaluate and calibrate the coupling strength between soil moisture and the subsequent, storm-scale runoff ratio in several different land surface models.
The observation-minus-forecast (O-F) Tb residuals of the Version 2 L4_SM product had a small global mean bias of ~0.37 K, modest regional (absolute) biases under ~3 K, and typical instantaneous (absolute) values of 6 K. The modeled Tb ensemble spread, however, overestimated the actual Tb errors in deserts and densely vegetated regions and underestimated them in agricultural regions and transition zones between dry and wet climates. Reichle, De Lannoy, Liu, Ardizzone, et al. (2017) reported a general wet bias in Version 2 L4_SM soil moisture estimates, particularly for surface soil moisture. Finally, the calibration and validation of the Version 2 algorithm was necessarily based on shorter data records than are available today.
The more recent L4_SM Versions 3 and 4 include substantial modifications of the land model and Tb analysis that address some of the shortcomings of Version 2 (section 2). Moreover, the L4_SM product also provides estimates of soil moisture uncertainty and runoff that have never been validated. The primary objectives of the present study are thus to (i) validate the current Version 4 soil moisture, soil moisture uncertainty, and runoff estimates alongside the previous Version 3 data (section 4), (ii) evaluate the L4_SM data assimilation diagnostics, including the statistics of the O-F Tb residuals and the soil moisture analysis increments (section 5), and (iii), throughout the analysis, relate the differences in performance and key diagnostics to the algorithm changes. The validation data and methods used here are described in section 3. We focus here on comparing the current Version 4 to Version 3 because we have three full years of overlapping data and because the differences between the Version 2 and 3 data are minimal at all in situ measurement sites used for validation (section 2.2.1). Thus, in effect, we quantify the performance of the Version 4 product relative to the documented performance of the Version 2 data (Reichle, De Lannoy, Liu, Ardizzone, et al., 2017).

Overview
The L4_SM algorithm consists of the assimilation of SMAP Tb observations into a land surface model (Reichle, De Lannoy, Liu, Ardizzone, et al., 2017, their Figure 1). The essential ingredients of this system are the following:
1. horizontal- and vertical-polarization Tb observations from ascending and descending half-orbits of the 36-km resolution SMAP Level-1C product,
2. the NASA Catchment land surface model (Koster et al., 2000), supplemented with a tau-omega radiative transfer model for L-band Tb (De Lannoy et al., 2013),
3. surface meteorological forcing data from the NASA Goddard Earth Observing System (GEOS) Forward-Processing (FP) system (Lucchesi, 2013),
4. the Climate Prediction Center Unified (CPCU) 0.5°, daily precipitation product (Chen et al., 2008; Xie et al., 2007) rescaled so that its climatology matches that of the Global Precipitation Climatology Project (GPCPv2.2; Huffman et al., 2009) data, and
5. the GEOS land assimilation system consisting of a spatially distributed ensemble Kalman filter (EnKF).
The L4_SM estimates are generated on the global, 9-km resolution Equal Area Scalable Earth, version 2 (EASEv2) grid (Brodzik et al., 2012). In addition to soil moisture, the L4_SM product includes a variety of land model variables and assimilation diagnostics (Reichle, Lucchesi, et al., 2018).
Briefly, the L4_SM algorithm consists of driving an ensemble of Catchment model simulations with surface meteorological forcing data from the GEOS FP system after correcting the GEOS precipitation with the CPCU observational product. The ensemble spread represents model uncertainty and is maintained by adding perturbations to the model forcing and prognostic variables as described in Reichle, De Lannoy, Liu, Ardizzone, et al. (2017). Once every 3 hr at 00:00 UTC, 03:00 UTC, etc., the available SMAP Tb observations within a 3-hr window centered on the analysis time are compared with the corresponding model forecast Tb estimates, separately for horizontal and vertical polarization (after first averaging the corresponding fore- and aft-looking SMAP Tb observations). Based on the O-F Tb differences, and taking into consideration the relative uncertainties of each, the EnKF analysis generates soil moisture and temperature increments. These increments are then added to the model estimates in order to make the modeled soil moisture and temperature fields more consistent with the SMAP Tb observations. While the Tb observations are typically only sensitive to soil moisture in the top few centimeters of the soil, the EnKF analysis also generates increments for root zone soil moisture. This is possible by including unobserved Catchment model variables related to deeper-layer soil moisture in the EnKF state vector and calculating increments based on the simulated (ensemble) error correlations between the surface and root zone layers.
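The way an ensemble analysis spreads Tb information from the surface into the root zone can be illustrated with a minimal sketch. The code below is a generic stochastic (perturbed-observation) EnKF update for a single scalar Tb observation; it is not the GEOS land assimilation system, and the function name `enkf_update`, the two-variable state layout, and all numerical values are assumptions chosen only for illustration.

```python
import numpy as np

def enkf_update(states, tb_forecast, tb_obs, obs_err_std, rng):
    """Generic stochastic EnKF analysis for one scalar Tb observation.

    states      : (n_ens, n_state) forecast ensemble (hypothetical layout:
                  column 0 = surface, column 1 = root zone soil moisture)
    tb_forecast : (n_ens,) model forecast Tb of each member [K]
    tb_obs      : observed Tb [K]
    obs_err_std : assumed observation error standard deviation [K]
    """
    n_ens = states.shape[0]
    state_anom = states - states.mean(axis=0)
    tb_anom = tb_forecast - tb_forecast.mean()
    # Ensemble cross-covariance between each state variable and forecast Tb
    cov_state_tb = state_anom.T @ tb_anom / (n_ens - 1)
    var_tb = tb_anom @ tb_anom / (n_ens - 1)
    # The gain weighs forecast uncertainty against observation uncertainty
    gain = cov_state_tb / (var_tb + obs_err_std**2)
    # Each member sees its own perturbed copy of the observation
    obs_pert = tb_obs + rng.normal(0.0, obs_err_std, size=n_ens)
    increments = np.outer(obs_pert - tb_forecast, gain)
    return states + increments, increments

# Synthetic 24-member ensemble; every number here is an illustrative assumption
rng = np.random.default_rng(0)
n_ens = 24
surface = 0.25 + 0.03 * rng.standard_normal(n_ens)
root_zone = 0.25 + 0.5 * (surface - 0.25) + 0.005 * rng.standard_normal(n_ens)
tb_forecast = 260.0 - 200.0 * (surface - 0.25) + rng.standard_normal(n_ens)
states = np.column_stack([surface, root_zone])

analysis, increments = enkf_update(states, tb_forecast, tb_obs=250.0,
                                   obs_err_std=4.0, rng=rng)
print("mean increments [surface, root zone]:", increments.mean(axis=0))
```

In this toy setup the observed Tb is colder than the forecast, so both layers are wetted; the root zone receives an increment only because its ensemble errors are correlated with those of the surface layer.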
During the assimilation process, the L4_SM algorithm interpolates and extrapolates information from the SMAP Tb observations and the model forecast estimates in time and in space; the resulting L4_SM data product represents the merged information. The degree to which L4_SM soil moisture estimates at a given time, location, and depth are impacted by SMAP Tb observations depends on the distance (in space and in time) from the observations, the sensitivity of the observations to surface soil moisture, and the coupling between the surface and root zone layers. Under dense vegetation, for instance, or when the ground is frozen or snow-covered, SMAP Tb observations have little or no immediate impact on the L4_SM soil moisture. Likewise, if the surface and root zone layers are decoupled, SMAP Tb observations have little immediate impact on L4_SM root zone soil moisture.
An integral part of the calibration of the L4_SM system is the removal of seasonally varying bias in the modeled Tb. To this end, seasonally varying and spatially distributed Tb "scaling parameters" were computed as the multiyear mean difference between L-band Tb observations and corresponding model estimates. Prior to each 3-hourly Tb analysis, the appropriate scaling parameters are subtracted from the O-F Tb residuals such that only Tb anomaly information is retained in the EnKF analysis. That is, the system is designed to correct only for errors in short-term and interannual variations and not for errors in the climatological seasonal cycles of the modeled soil moisture or other land surface fields.
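The rescaling step can be sketched for a single grid cell as follows. This is a simplified illustration, not the operational procedure: the seasonal resolution of the scaling parameters is assumed here to be one value per calendar month, and the function names `tb_scaling_parameters` and `rescaled_o_minus_f` are hypothetical.

```python
import numpy as np

def tb_scaling_parameters(tb_obs, tb_model, month):
    """Multiyear mean (observation minus model) Tb difference per calendar month.

    Assumption: one parameter per calendar month at a single grid cell.
    Returns an array of length 12 (NaN where a month has no data).
    """
    diff = np.asarray(tb_obs, dtype=float) - np.asarray(tb_model, dtype=float)
    month = np.asarray(month)
    params = np.full(12, np.nan)
    for m in range(1, 13):
        sel = (month == m) & np.isfinite(diff)
        if sel.any():
            params[m - 1] = diff[sel].mean()
    return params

def rescaled_o_minus_f(tb_obs, tb_forecast, month, params):
    """O-F residuals after subtracting the seasonal scaling parameters."""
    raw = np.asarray(tb_obs, dtype=float) - np.asarray(tb_forecast, dtype=float)
    return raw - params[np.asarray(month) - 1]

# Synthetic example: 3 years of monthly samples with a seasonal model bias
month = np.tile(np.arange(1, 13), 3)
seasonal_bias = 2.0 + np.sin(2 * np.pi * (month - 1) / 12)
anomaly = np.repeat([1.0, 0.0, -1.0], 12)   # zero mean within each calendar month
tb_model = 250.0 + 10.0 * np.cos(2 * np.pi * (month - 1) / 12)
tb_obs = tb_model + seasonal_bias + anomaly

params = tb_scaling_parameters(tb_obs, tb_model, month)
residuals = rescaled_o_minus_f(tb_obs, tb_model, month, params)
```

After rescaling, the residuals contain only the interannual anomaly; the climatological seasonal bias between observations and model is removed and therefore never reaches the analysis.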

Changes From Version 2 to Version 4
Version 2 of the L4_SM algorithm and product is discussed in great detail by Reichle, De Lannoy, Liu, Ardizzone, et al. (2017) and . Using Version 2 as a baseline,  provide a comprehensive project report of the system changes leading to Version 4. The present section summarizes only the most important of these changes. Unless noted otherwise here or by , the configuration of the Version 4 system matches that of the Version 2 system. For example, the ensemble size of 24 members and the model and observation error settings are preserved. Likewise, the overall approach for the Catchment model and ensemble spin-up is unchanged (except for the changes in the model version and forcing inputs discussed below).

Changes From Version 2 to Version 3
Version 2 of the L4_SM algorithm, owing to a calibration based solely on SMOS Tb data, was limited to assimilating SMAP Tb observations only in regions where radio frequency interference (RFI) does not adversely impact SMOS data. In Eastern Europe, the Middle East, and East Asia, SMOS observations are corrupted by RFI (Oliva et al., 2012) to such an extent that SMOS cannot provide the L-band climatology necessary to compute the Tb scaling parameters for the L4_SM algorithm. Consequently, SMAP observations were not assimilated in the Version 2 algorithm in these regions (see Figure 4 of . Owing to a variety of hardware and software tools to detect and correct for RFI, SMAP generally provides Tb observations nearly everywhere on the globe (Piepmeier et al., 2014). For the Version 3 system, Tb scaling parameters were computed from 2 years of SMAP Tb observations in those locations where SMOS Tb observations are insufficient due to RFI, resulting in near-global assimilation coverage. Consequently, the Version 3 L4_SM product differs considerably from the Version 2 data in the regions of expanded SMAP assimilation coverage, but the differences between the Version 2 and Version 3 data are minimal elsewhere, including at all in situ measurement sites used for validation here and by Reichle, De Lannoy, Liu, Ardizzone, et al. (2017).

Changes From Version 3 to Version 4
Additional and more pervasive changes to the L4_SM system were implemented with Version 4. The most consequential model change is the calibration of the Catchment model parameter (α) that governs the recharge of soil moisture from the model's root zone excess reservoir into the surface excess reservoir. (The "surface excess" and "root zone excess" are Catchment model prognostic variables that capture the deviation of the modeled soil moisture profile from equilibrium in the 0- to 5-cm surface and 0- to 100-cm root zone layers, respectively.) Specifically, the replenishment of soil moisture near the surface from below under nonequilibrium conditions was reduced substantially in the new model version. This change was motivated by Koster, Liu, et al. (2018), who calibrated α locally by maximizing the correlation with SMAP Level-2 soil moisture retrievals across the continental United States (CONUS). Whereas the original model of the Version 2 and 3 systems uses a spatially constant value of α = 1, the calibrated α values of Koster, Liu, et al. (2018) range from 0.001 to 1, with a mean (median) value of 0.18 (0.004) (their Figure 2). In contrast, the Version 4 L4_SM system uses a spatially constant value of α = 0.04. This value is a compromise between high correlation and low ubRMSE of model soil moisture against in situ measurements from the Soil Climate Analysis Network (Schaefer et al., 2007) and the U.S. Climate Reference Network (Diamond et al., 2013) across CONUS (not shown). Despite their more limited spatial coverage, in situ measurements are used here for model calibration because SMAP data are ultimately assimilated in the L4_SM algorithm. Note that the in situ measurements used for model calibration are distinct from those used for L4_SM validation (section 3.1).
Two important changes in the surface meteorological forcing data were made in Version 4 of the L4_SM system. First, the forcing for the pre-SMAP period in Version 4 is based on the Modern-Era Retrospective Analysis for Research and Applications (MERRA)-2 reanalysis (Gelaro et al., 2017), which enabled the use of a longer period (1980-2014) for L4_SM system spin-up (initialization). Second, and more importantly, the model background precipitation in the Version 4 L4_SM system is rescaled prior to applying the CPCU-based corrections so that its climatology also matches that of the GPCPv2.2 data. In earlier L4_SM versions, only the CPCU data were rescaled to the GPCPv2.2 climatology. Consequently, the precipitation climatology in Version 4 has changed substantially in Africa and the high latitudes, where the CPCU data are not used due to poor gauge coverage (Reichle, De Lannoy, Liu, Ardizzone, et al., 2017).
An important change in the Tb analysis of the Version 4 L4_SM system is the removal from the EnKF state vector of the "catchment deficit" variable, the Catchment model prognostic variable related to the equilibrium soil moisture profile. This change was motivated by the results of Girotto et al. (2019), who similarly eliminated this variable from the EnKF state vector when assimilating SMOS Tb observations into the Catchment model and found good performance for surface and root zone soil moisture estimation without the skill degradation versus groundwater measurements that occurred when the catchment deficit was included in the EnKF state. The Version 4 SMAP Tb observations over land are warmer by ~3 K on average compared to earlier SMAP versions (Peng et al., 2017, 2018), and neither matches the version 620 SMOS Tb climatology. To address these calibration differences, Tb scaling parameters derived from SMOS observations are further corrected, separately for each grid cell, using the 3-year (April 2015 to March 2018) mean differences between SMOS and SMAP Tb observations.
Additional changes in Version 4 include revised values for the surface soil heat capacity, a revised formulation to compute the surface aerodynamic roughness as a function of leaf area index, a change in the model parameter that governs the model's snow accumulation and depletion curve, and more recent and improved ancillary data sets (Mahanama et al., 2015), including for topography (Slater et al., 2006; Verdin et al., 2007), vegetation height (Simard et al., 2011), and land cover (Bontemps et al., 2011). The latter change resulted in a slightly revised land mask in the Version 4 product compared to earlier versions, notably a better representation of lake areas in East Africa (not shown). Each of these changes should improve some aspect of the L4_SM algorithm, but examining their individual impact is beyond the scope of the present paper.

L4_SM Data and Processing
SMAP science data collection began on 31 March 2015 at ~16:28 UTC. Throughout this paper, we use the 3-year period from 1 April 2015 to 31 March 2018 for validation and analysis, except for the runoff validation, which is restricted to Northern Hemisphere warm-season months (June-September of 2015-2017) to avoid periods when runoff is dominated by snowmelt. Because production of Version 3 data ended with Data Day 4 June 2018, the validation period is the longest period consisting of full years (or complete warm-season months) that also allows us to compare the skill and characteristics of the Version 4 product directly against those of the Version 3 data.
In the present paper, we use 3-hourly instantaneous surface and root zone soil moisture forecast, analysis, and uncertainty data from the L4_SM "analysis-update" files for Version 3 (Science Version ID Vv3030; Reichle et al., 2017a) and Version 4 (Science Version ID Vv4030; Reichle et al., 2018a). Furthermore, we use 3-hourly time-average total runoff data from the L4_SM "geophysical" files for Vv3030 (Reichle et al., 2017b) and Vv4030 (Reichle et al., 2018b). Besides these L4_SM data, we use soil moisture and runoff output from additional simulations with the L4_SM system in its model-only "Nature Run" configuration, that is, single-member model simulations without ensemble perturbations and without the assimilation of SMAP Tb observations (Reichle, De Lannoy, Liu, Ardizzone, et al., 2017). Since the Nature Run simulations are also used for Catchment model development beyond the SMAP L4_SM system, they follow a different versioning system. Specifically, Nature Run versions NRv4.1 and NRv7.2 correspond to the Catchment model versions and forcing data used in Versions 3 and 4 of the L4_SM algorithm, respectively. To make the discussion below easier to follow, we hereinafter refer to NRv4.1 as NR3030 and to NRv7.2 as NR4030.

Soil Moisture In Situ Measurements and Validation Metrics
The primary validation references for L4_SM soil moisture estimates are in situ measurements from SMAP core validation sites (or watersheds; Colliander, Berg, et al., 2017). These measurements serve as a fully independent validation data set; again, they were not used in the calibration of the Catchment model parameter α (section 2.2.2). Each core site has a locally dense network of sensor profiles, thereby mitigating the typical upscaling errors encountered in the validation of model output (Crow et al., 2012), which here comprises the L4_SM estimates representing the average soil moisture conditions in the 81-km2 area of an EASEv2 grid cell. Specifically, in the present paper we use surface (root zone) measurements from 18 (6) different core sites located in a variety of climate regimes and land cover conditions (Table 1). Across these sites, we use surface (root zone) measurements from a total of 31 (12) so-called "reference pixels" (or grid cells) at the 9-km scale. For additional characteristics see supporting information Table S1. Reichle, De Lannoy, Liu, Ardizzone, et al. (2017; their section 3) provide further discussion of processing details. See Chan et al. (2018) for details on the definition of coarser, 33-km reference pixels for which results are referenced below.
In addition to the ubRMSE, correlation (R), and bias metrics computed in Reichle, De Lannoy, Liu, Ardizzone, et al. (2017), here we also compute the anomaly R metric. For this, monthly climatological values of in situ soil moisture at a given reference pixel are computed. For a given calendar month, we require at least 80 (3-hourly) measurements in each of the three calendar months falling within the 3-year validation period. Anomaly time series are then computed by subtracting the corresponding monthly climatological seasonal cycle from the raw data. The procedure is repeated for the L4_SM data (after cross-masking against the available in situ measurements to ensure compatible temporal support of the estimated climatology). The anomaly R metric is the correlation coefficient of the L4_SM and in situ anomaly time series. Averages of metrics are computed by first averaging the metrics of all reference pixels at a given site and then averaging across all sites.
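The anomaly R procedure described above can be sketched as follows for a single reference pixel. This is a minimal illustration under stated assumptions rather than the processing code used for the paper: the function names `monthly_anomalies` and `anomaly_r` are hypothetical, missing data are assumed to be encoded as NaN, and the synthetic test data are invented.

```python
import numpy as np

def monthly_anomalies(values, year, month, min_count=80):
    """Subtract the multiyear monthly mean from a time series.

    A calendar month is used only if every year in the record has at least
    `min_count` valid samples in that month; otherwise it is set to NaN.
    """
    values = np.asarray(values, dtype=float)
    year, month = np.asarray(year), np.asarray(month)
    anomalies = np.full(values.shape, np.nan)
    for m in range(1, 13):
        in_month = (month == m) & np.isfinite(values)
        counts = [np.sum(in_month & (year == y)) for y in np.unique(year)]
        if min(counts) >= min_count:
            anomalies[in_month] = values[in_month] - values[in_month].mean()
    return anomalies

def anomaly_r(product, in_situ, year, month, min_count=80):
    """Correlation of product and in situ anomalies after cross-masking."""
    product = np.asarray(product, dtype=float)
    in_situ = np.asarray(in_situ, dtype=float)
    # Cross-mask so both climatologies are based on the same time steps
    valid = np.isfinite(product) & np.isfinite(in_situ)
    a_prod = monthly_anomalies(np.where(valid, product, np.nan), year, month, min_count)
    a_obs = monthly_anomalies(np.where(valid, in_situ, np.nan), year, month, min_count)
    ok = np.isfinite(a_prod) & np.isfinite(a_obs)
    if ok.sum() < 2:
        return np.nan
    return np.corrcoef(a_prod[ok], a_obs[ok])[0, 1]

# Synthetic data: different seasonal cycles, identical short-term signal
rng = np.random.default_rng(1)
year = np.repeat([2015, 2016, 2017], 12 * 100)
month = np.tile(np.repeat(np.arange(1, 13), 100), 3)
signal = 0.02 * rng.standard_normal(year.size)
in_situ = 0.20 + 0.05 * np.sin(2 * np.pi * month / 12) + signal
product = 0.25 + 0.08 * np.cos(2 * np.pi * month / 12) + signal
print(anomaly_r(product, in_situ, year, month))
```

Because the seasonal cycle is removed from both series, the metric in this example depends only on the shared short-term signal, regardless of how different the two climatologies are.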
Statistical uncertainty in the ubRMSE, R, and anomaly R metrics is estimated using 95% confidence intervals, which are computed at each site based on the number of samples in the time series (with a correction for temporal autocorrelation). We refer to changes in metrics as statistically significant whenever the confidence intervals do not overlap. It is important to keep in mind that the confidence intervals are themselves uncertain and only provide practical guidance as to whether the skill differences are meaningful. Statistical uncertainty estimates for the bias are not provided because the upscaling error is considerably larger for the bias than for the second-order metrics.
As in Reichle, De Lannoy, Liu, Ardizzone, et al. (2017), we assume that the upscaled core site observations have much smaller errors than corresponding L4_SM estimates. The bias, ubRMSE, and correlation metrics obtained from directly comparing L4_SM estimates and in situ measurements are thus assumed to be close to the corresponding metrics against the (unknown) true soil moisture.
Since it is reasonable to assume that the errors that do exist in the in situ measurements are not correlated with errors in the L4_SM data, measurement uncertainty in the in situ data leads to some overestimation of the true ubRMSE and some underestimation of the true correlation. Consequently, the skill metrics presented here are conservative estimates of the actual skill.
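The effect of uncorrelated reference errors on the skill metrics can be checked with a small synthetic experiment. The sketch below is illustrative only: the function name `skill_metrics` is hypothetical and the error magnitudes are assumed values, not estimates for any actual core site.

```python
import numpy as np

def skill_metrics(estimate, reference):
    """Bias, RMSE, ubRMSE (standard deviation of the error), and R."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    err = estimate - reference
    bias = err.mean()
    rmse = np.sqrt(np.mean(err**2))
    ubrmse = np.sqrt(np.mean((err - bias) ** 2))  # equals sqrt(rmse^2 - bias^2)
    r = np.corrcoef(estimate, reference)[0, 1]
    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse, "R": r}

# Synthetic "truth", a biased and noisy estimate, and noisy in situ data
rng = np.random.default_rng(42)
n = 200_000
truth = 0.25 + 0.08 * rng.standard_normal(n)
estimate = truth + 0.02 + 0.04 * rng.standard_normal(n)  # assumed bias and error
in_situ = truth + 0.02 * rng.standard_normal(n)          # assumed measurement error

vs_truth = skill_metrics(estimate, truth)
vs_in_situ = skill_metrics(estimate, in_situ)
print("ubRMSE vs truth / vs in situ:", vs_truth["ubrmse"], vs_in_situ["ubrmse"])
print("R      vs truth / vs in situ:", vs_truth["R"], vs_in_situ["R"])
```

With independent errors, the error variances add, so the ubRMSE against the noisy reference approaches the square root of the sum of the two squared error standard deviations, while the correlation drops below its value against the truth.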

Streamflow Measurements
For the validation of the L4_SM runoff estimates (section 4.3) we use streamflow measurements from 239 unregulated hydrological basins of intermediate size (2,000-10,000 km2) within CONUS. These measurements are published by the U.S. Geological Survey (USGS) and were previously used to assess the skill of model and data assimilation output (Kumar et al., 2014). The streamflow measurements were normalized by basin area to convert their units into millimeters per day. To account for the lack of runoff routing in the L4_SM system, the USGS streamflow measurements were smoothed using a 10-day running mean and directly compared to similarly smoothed runoff estimates from L4_SM and Nature Run data. The model estimates were spatially aggregated with consideration of the fraction of the irregular basin area contained within each 9-km EASEv2 grid cell. As mentioned above, only the warm season (June-September) is included in the runoff validation.
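The three processing steps described above (area normalization, temporal smoothing, and basin aggregation) can be sketched as follows. The sketch makes several assumptions that the text does not specify: the streamflow input is taken to be daily values in m3/s, the running mean is computed only over complete 10-day windows, and the basin average is an area-weighted mean; all function names are hypothetical.

```python
import numpy as np

def streamflow_to_mm_per_day(q_m3_per_s, basin_area_km2):
    """Convert streamflow [m3/s] to basin-average depth [mm/day].

    86400 s/day * 1000 mm/m / 1e6 m2/km2 = 86.4
    """
    return np.asarray(q_m3_per_s, dtype=float) * 86.4 / basin_area_km2

def running_mean(daily_series, window=10):
    """Running mean over complete windows of `window` consecutive days."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(daily_series, dtype=float), kernel, mode="valid")

def basin_average(cell_runoff, basin_area_in_cell):
    """Area-weighted mean of grid-cell runoff over one basin.

    cell_runoff        : (n_time, n_cells) runoff of the cells touching the basin
    basin_area_in_cell : (n_cells,) basin area contained in each grid cell
    """
    weights = np.asarray(basin_area_in_cell, dtype=float)
    runoff = np.asarray(cell_runoff, dtype=float)
    return np.sum(runoff * weights, axis=-1) / weights.sum()

# Example with invented numbers: a 8,640-km2 basin with 100 m3/s streamflow
depth = streamflow_to_mm_per_day(100.0, 8640.0)            # about 1 mm/day
smoothed = running_mean(np.full(30, depth))                # 21 complete windows
model = basin_average([[1.0, 3.0]], [60.75, 20.25])        # weights 3:1
```

The same `running_mean` would be applied to both the measured and the modeled series before comparison, so that the missing routing delay in the model does not dominate the skill metrics.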

Data Assimilation Diagnostics
By routinely confronting model estimates with observations, a data assimilation system generates valuable internal diagnostics that can be used to assess its proper functioning. In the present paper, we investigate the statistics of the O-F Tb residuals, the observation-minus-analysis (O-A) Tb residuals, and the soil moisture analysis increments. We provide only a brief narrative summary of these diagnostics here and refer the reader to section 2c of  for a more detailed description and formal definitions. Because the L4_SM algorithm assimilates Tb observations, the O-F and O-A diagnostics are in terms of Tb at the 36-km resolution of the assimilated observations, while the analysis increments are at the 9-km resolution of the Catchment model soil moisture. Note that the observations used to compute the O-F and O-A residuals are the climatologically rescaled Tb observations (section 2.1). It is also important to keep in mind that the O-F Tb residuals compare the model forecast Tb with the corresponding Tb observation before the observation is assimilated.
Journal of Advances in Modeling Earth Systems, 10.1029/2019MS001729
In a well-calibrated assimilation system, the long-term mean of the O-F residuals and increments at any given location is 0, indicating that there is no long-term net addition (or subtraction) of water or energy by the analysis. The time series standard deviation of the O-F residuals is a measure of the typical misfit between the model forecast Tb and the (rescaled) SMAP observations. Similarly, the time series standard deviation of the increments is a measure of the typical adjustment of the model forecast soil moisture at any given analysis time. A particularly valuable metric is the standard deviation of the O-F residuals after normalization with their simulated (expected) error standard deviation, which includes the prescribed observation error and the model forecast error resulting from the prescribed perturbation characteristics. This metric should be unity. If the metric exceeds unity, the prescribed errors underestimate the actual errors reflected in the O-F residuals. Conversely, if the standard deviation of the normalized O-F residuals is less than unity, the prescribed errors overestimate the actual errors. Finally, all second-order assimilation diagnostics are spatially averaged in quadrature.
In this subsection we first illustrate the soil moisture skill for Version 4 relative to Version 3 at South Fork, Iowa, before turning to the average performance across all sites in section 4.1.2. The results at South Fork, particularly the changes seen in going from Version 3 to Version 4, are representative of typical results across all sites. Note that at some core validation sites (e.g., Little River, Georgia), the revised model resulted in greater differences (Balsamo et al., 2018, their Figure 8a; Figure 6), whereas at other sites (e.g., Little Washita, Oklahoma), the model changes had less impact on the skill of the soil moisture estimates (Figure 5).
For reference, supporting information Tables S2-S5 provide a complete listing of the performance metrics at all sites. Owing to the high clay content of the soils in this region, agricultural fields are increasingly equipped with tiles to improve drainage, a feature that is not captured in the Catchment model. There are three 9-km reference pixels at South Fork (Table 1), all showing similar relative performance of the Version 3 and 4 Nature Run and L4_SM data (supporting information Tables S2-S5). Here, we discuss pixel #16070911 because it has the most complete time series of measurements. Figure 1a shows surface soil moisture for Vv4030 (thin black lines), Vv3030 (thick light green lines), and the in situ measurements (red dots). Overall, the estimates from both L4_SM product versions track the in situ measurements reasonably well. The bias in surface soil moisture is 0.034 m3/m3 for Vv4030, down from 0.053 m3/m3 for Vv3030. The respective Nature Run time series, shown in Figure 1b, have bias values of 0.042 m3/m3 (NR4030; thin blue lines) and 0.082 m3/m3 (NR3030; thick cyan lines). The reduction in the surface soil moisture bias from the Version 3 to the Version 4 system stems primarily from the model change that reduced the upward recharge from the root zone to the surface excess reservoir (section 2.2.2), which results in generally drier (and thus less biased) surface soil moisture. Another consequence of this model change is the increased dynamic range of surface soil moisture in the Version 4 system compared to that of Version 3, which better reflects the wetting and drydown characteristics of the in situ measurements (Figures 1a and 1b). For root zone soil moisture (Figure 1c), Vv4030 exhibits a bias of 0.020 m3/m3, whereas Vv3030 was essentially unbiased (−0.003 m3/m3). The ubRMSE, however, is better for Vv4030 (0.025 m3/m3) than Vv3030 (0.031 m3/m3), albeit not significantly.
This ubRMSE improvement in the new version is also mirrored in the Nature Run results (Figure 1d), with a root zone soil moisture ubRMSE of 0.032 m³/m³ for NR4030 compared to 0.044 m³/m³ for NR3030. The smaller ubRMSE values in the Version 4 system are most likely a reflection of the reduced dynamic range in root zone soil moisture caused by the reduction in the upward recharge in the Catchment model. While the root zone soil moisture R value is slightly worse for Vv4030 (0.72) than Vv3030 (0.74), the anomaly R value is better (0.82 versus 0.76).
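The bias, ubRMSE, and R values quoted above can be illustrated with a minimal sketch. It assumes the common definitions (bias as the mean difference, ubRMSE as the RMSE after removing the mean difference, R as the Pearson correlation); the function name `skill_metrics` and the synthetic time series are hypothetical and are not taken from the L4_SM validation software. The anomaly R is omitted because it requires a seasonal climatology that is not defined in this section.

```python
import numpy as np

def skill_metrics(estimate, in_situ):
    """Return bias, RMSE, ubRMSE, and Pearson R for two paired time series."""
    est = np.asarray(estimate, dtype=float)
    obs = np.asarray(in_situ, dtype=float)
    # Use only times at which both series have valid (finite) values.
    valid = np.isfinite(est) & np.isfinite(obs)
    est, obs = est[valid], obs[valid]
    diff = est - obs
    bias = diff.mean()
    rmse = np.sqrt(np.mean(diff ** 2))
    # Unbiased RMSE: RMSE of the differences after removing their mean.
    ubrmse = np.sqrt(np.mean((diff - bias) ** 2))
    r = np.corrcoef(est, obs)[0, 1]
    return {"bias": bias, "rmse": rmse, "ubrmse": ubrmse, "r": r}

# Synthetic example (not real data): a wet-biased, noisy model estimate.
rng = np.random.default_rng(0)
in_situ = 0.25 + 0.05 * np.sin(np.linspace(0.0, 6.0, 200))
model = in_situ + 0.03 + 0.02 * rng.standard_normal(in_situ.size)
metrics = skill_metrics(model, in_situ)
print({name: round(float(value), 3) for name, value in metrics.items()})
```

Under these definitions, RMSE² = bias² + ubRMSE², which is why a product can show a changed bias while its ubRMSE stays nearly the same.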

Validation Versus In Situ Measurements
A comparison of Figures 1a and 1b illustrates that the assimilation of SMAP Tb observations improves the surface soil moisture skill considerably in both L4_SM versions. For example, the ubRMSE improves from 0.066 m³/m³ for NR4030 to 0.050 m³/m³ for Vv4030. Moreover, the R value (anomaly R value) improves from 0.45 (0.41) for NR4030 to 0.70 (0.72) for Vv4030. The assimilation of SMAP Tb observations similarly improves the ubRMSE and R skill for root zone soil moisture at the South Fork site (Figures 1c and 1d).
Figure 2 summarizes the average performance metrics across all 9-km reference pixels from the 18 core validation sites (Table 1). Perhaps the most important validation result is that the Vv4030 surface and root zone soil moisture data meet the L4_SM accuracy requirement (ubRMSE ≤ 0.04 m³/m³) established prior to the launch of SMAP (Reichle, De Lannoy, Liu, Ardizzone, et al., 2017; their section 3a). This result is now based on three full years of core site measurements, compared to just 2 years as used by Reichle, De Lannoy, Liu, Ardizzone, et al. (2017). Specifically, as shown by the black circles in Figure 2a, the ubRMSE is 0.039 m³/m³ for Vv4030 surface soil moisture and 0.026 m³/m³ for Vv4030 root zone soil moisture. There is a slight (but not statistically significant) increase in the surface soil moisture ubRMSE from 0.038 m³/m³ for Vv3030 (light green triangles) to 0.039 m³/m³ for Vv4030 (Figure 2a). A similar increase is seen in the underlying modeling systems from 0.042 m³/m³ for NR3030 (cyan plus signs) to 0.043 m³/m³ for NR4030 (blue crosses; Figure 2a). As illustrated in the South Fork time series (Figure 1), the slight degradation in skill is related to the generally larger dynamic range in surface soil moisture in the Version 4 modeling system following the reduction in the upward recharge of the surface excess reservoir (section 2.2.2).
In fact, the surface soil moisture time series standard deviation averaged across all reference pixels, which is 0.053 m³/m³ for the in situ measurements, improves from 0.048 m³/m³ for Vv3030 to 0.055 m³/m³ for Vv4030 (not shown). The same model change results in a reduction of the dynamic range of root zone soil moisture in the Version 4 system, causing the slight decrease in root zone soil moisture ubRMSE from 0.032 m³/m³ for NR3030 to 0.030 m³/m³ for NR4030, and from 0.027 m³/m³ for Vv3030 to 0.026 m³/m³ for Vv4030 (Figure 2a). Very minor differences (likewise not statistically significant) between the two versions are also seen for the correlation metrics (Figures 2c and 2d).

Average Performance Across All Reference Pixels
Among the metrics considered here, the bias exhibits the largest change between the two product versions. Specifically, the average bias in surface soil moisture decreased considerably from 0.048 m³/m³ for Vv3030 to 0.029 m³/m³ for Vv4030, while the average bias in root zone soil moisture increased from 0.017 m³/m³ for Vv3030 to 0.030 m³/m³ for Vv4030 (Figure 2b). A different perspective is gained by looking at the average absolute bias, which decreased from 0.052 m³/m³ in Vv3030 to 0.047 m³/m³ in Vv4030 for surface soil moisture and remained unchanged at 0.048 m³/m³ for root zone soil moisture (not shown). These relative changes in bias are mirrored in the underlying model estimates (Figure 2b), suggesting again that the changes in the bias of the L4_SM product are driven by the changes in the Catchment model, and more specifically, by the change in the upward recharge into the surface excess reservoir (section 2.2.2). Three issues complicate the interpretation of the bias estimates. First, the upscaling uncertainty in the bias estimates is much larger than that in the ubRMSE and correlation metrics, which makes it difficult to determine the significance of the bias changes between Versions 3 and 4. Second, estimates of model (Nature Run or L4_SM) bias are more sensitive than ubRMSE and correlation metrics to the vertical placement of the sensors. Specifically, the 0-5 cm modeled surface soil moisture is validated using sensors typically placed at 5 cm depth, and the 0-100 cm modeled root zone soil moisture is validated using sensors placed within 50 cm from the surface (see supporting information Table S1 for the depth of the deepest sensor used at each reference pixel; for further details see Reichle, De Lannoy, Liu, Ardizzone, et al., 2017).
Considering this mismatch in vertical support, perfect model estimates of surface (root zone) soil moisture should have a dry (wet) diagnosed bias versus the in situ measurements. For surface soil moisture, the reduction in the diagnosed wet bias from Version 3 to Version 4 is thus definitely an improvement. For root zone soil moisture, the increase in the diagnosed wet bias does not necessarily imply a degradation. Finally, the assumption of a constant Tb sensing depth in the L4_SM analysis likely causes bias in the L4_SM soil moisture estimates. Estimating the actual sensing depth and using this information in soil moisture assimilation systems are subjects of ongoing research (Lv et al., 2018, 2019). Figure 2 further demonstrates that the Version 4 L4_SM estimates are better than estimates from the corresponding (model-only) Nature Run simulation, which again highlights the beneficial impact of the SMAP Tb data. For example, the surface soil moisture ubRMSE is 0.039 m³/m³ for Vv4030, down from 0.043 m³/m³ for NR4030, and the root zone soil moisture ubRMSE is 0.026 m³/m³ for Vv4030, down from 0.030 m³/m³ for NR4030 (Figure 2a). These ubRMSE improvements, however, are not statistically significant.
Statistically significant improvements from assimilating SMAP data are seen for the surface soil moisture R (Figure 2c) and anomaly R (Figure 2d) values, which improve from 0.63 and 0.62, respectively, for NR4030 to 0.72 and 0.75, respectively, for Vv4030. The R and anomaly R metrics for Vv4030 root zone soil moisture are increased by similar margins over NR4030. However, the root zone improvements are not statistically significant, primarily because there are fewer reference pixels with root zone measurements and because root zone soil moisture is more strongly autocorrelated, resulting in larger confidence intervals (section 3.1). The average bias for surface soil moisture is only slightly worse for Vv4030 than for NR4030 (Figure 2b). In comparison, the root zone bias increased from 0.021 m³/m³ for NR4030 to 0.030 m³/m³ for Vv4030 (Figure 2b). As noted above, this does not necessarily imply degraded performance. Overall, the impact of SMAP data assimilation on the skill versus in situ measurements is very similar in the Version 3 and 4 systems (Figure 2).
Finally, the skill of the published Vv4030 data (Figure 2) closely matches that of the preliminary Version 4 data assessed by their Figures 8 and 9). This is true across all metrics investigated by , who also assessed the skill of surface soil temperature data, metrics computed at the 33-km reference pixel scale, and the performance versus measurements from more than 400 "sparse" network sites (not shown). We therefore refer the reader to  for additional Version 4 validation results. Taken together, the above results suggest that, apart from the changes in bias, there is no net improvement or degradation in the performance versus in situ measurements between the Version 3 (and thus, effectively, Version 2) and Version 4 soil moisture estimates.

Soil Moisture Uncertainty Estimates
The L4_SM surface and root zone soil moisture estimates are computed as the mean across the 24-member ensemble of Catchment model replicates (section 2). The L4_SM product also provides the ensemble standard deviation (or ensemble spread) of the analyzed surface and root zone soil moisture at the same 3-hourly, 9-km resolution. This ensemble spread is a measure of the uncertainty in the L4_SM soil moisture estimates.
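As a minimal sketch of how an estimate and its uncertainty follow from an ensemble, the snippet below computes the mean and standard deviation across 24 members. The synthetic ensemble values, the array layout (members by analysis times), and the use of the sample standard deviation (`ddof=1`) are assumptions made for illustration; the operational L4_SM software may differ in these details.

```python
import numpy as np

# Hypothetical ensemble of soil moisture values (m3/m3) for one grid cell:
# 24 members (rows) at 8 consecutive 3-hourly analysis times (columns).
rng = np.random.default_rng(0)
n_members, n_times = 24, 8
ensemble = 0.25 + 0.02 * rng.standard_normal((n_members, n_times))

# Soil moisture estimate: mean across the ensemble members.
estimate = ensemble.mean(axis=0)

# Uncertainty estimate: ensemble standard deviation (spread).
# ddof=1 (sample standard deviation) is an assumption of this sketch.
spread = ensemble.std(axis=0, ddof=1)

for t, (mean_t, spread_t) in enumerate(zip(estimate, spread)):
    print(f"time {t}: estimate = {mean_t:.3f}, spread = {spread_t:.3f}")
```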
In this section, we assess the quality of these uncertainty estimates by comparing them to the actual errors as computed versus in situ measurements. A related metric is the standard deviation of the normalized O-F Tb residuals, which is available (nearly) globally and will be examined in section 5.4. Figure 3 shows scatter plots of the (time-average) uncertainty estimates at the 9-km core site reference pixels (Table 1) versus the corresponding (time series) ubRMSE for surface and root zone soil moisture. In a perfectly tuned system, the two metrics should match; that is, the symbols in the scatter plot should be close to the 1:1 line. In the L4_SM system, however, nearly all of the symbols fall well below the 1:1 line, indicating a severe underestimation of the actual errors in the L4_SM product. For the surface soil moisture uncertainty estimates, this bias is −0.013 m³/m³ for Vv4030 (Figure 3a), which is an improvement over the bias of −0.024 m³/m³ for Vv3030 (Figure 3b). For the root zone soil moisture uncertainty estimates, the bias is −0.017 m³/m³ for Vv4030 compared to −0.020 m³/m³ for Vv3030. Since the ubRMSE values are very similar for both product versions (section 4.1), the bias in the uncertainty estimates is reduced from Version 3 to Version 4 mostly because the ensemble spread increased considerably in Version 4. The increase in spread is particularly pronounced for surface soil moisture because the reduction in the modeled upward recharge (section 2.2.2) also suppresses the relaxation of the perturbed surface soil moisture to equilibrium conditions. The overall magnitude of the actual errors is therefore better captured by the Vv4030 uncertainty estimates than by those from Vv3030. The R values for the scatter plots of Figure 3, however, are not significantly different from zero (not shown). This means that, for surface and root zone soil moisture in both product versions, the (time-average) uncertainty estimate for any given grid cell cannot usefully predict the actual error (i.e., the time series ubRMSE) at that grid cell relative to the error at other grid cells.
Finally, we also examined, separately for each 9-km reference pixel, the spread in the distribution of the actual instantaneous errors (computed versus the in situ measurements) as a function of the associated instantaneous uncertainty estimates (not shown). No relationship emerged, however, in either product version for either surface or root zone soil moisture. That is, for a given location, the time series of the uncertainty estimates do not contain useful information about the evolution of the typical magnitude of the actual errors.
Corresponding results for core site reference pixels at the 33-km scale and for sparse networks are qualitatively similar (not shown). It is encouraging that the revised Catchment model in the Version 4 system is better able to represent average actual errors in surface soil moisture. Some of the remaining low bias in the L4_SM uncertainty estimates may also be explained by the likely overestimation of the ubRMSE values due to errors in the in situ measurements themselves (section 3.1). Nevertheless, the above results and further evidence to follow in section 5 indicate that more work is needed to refine the error characterization in the L4_SM system.

Runoff
To date, the validation of the L4_SM product versus in situ measurements has focused on soil moisture and temperature (Reichle, De Lannoy, Liu, Ardizzone, et al., 2017). Here, we investigate for the first time the warm season (June-September) skill of the L4_SM runoff estimates versus USGS streamflow measurements in 239 unregulated hydrological basins (section 3.2).
Key performance metrics for both product versions are shown in Figure 4. For Vv4030 runoff estimates, the bias versus observed streamflow is −0.095 mm/day (or −35 mm/year) on average across the 239 basins (Figure 4a). The underestimation is modest across much of the Midwest and more severe in many northwestern basins and in a few basins in Missouri and Arkansas. There are also, however, many basins in the Great Plains, along the East Coast, and particularly in Florida, where Vv4030 runoff overestimates the observed streamflow. Nevertheless, the mean runoff in Vv4030 is less biased than that of Vv3030, for which the bias is −0.141 mm/day (or −52 mm/year; Figure 4b).
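The relative improvement implied by these two bias values can be checked with a short worked example. The helper name `percent_reduction` is hypothetical, and the calculation simply compares the magnitudes of the quoted annual biases.

```python
def percent_reduction(old_bias, new_bias):
    """Percent reduction in bias magnitude when going from old_bias to new_bias."""
    return 100.0 * (abs(old_bias) - abs(new_bias)) / abs(old_bias)

# Annual runoff biases quoted in the text (mm/year).
bias_vv3030 = -52.0
bias_vv4030 = -35.0

runoff_reduction = percent_reduction(bias_vv3030, bias_vv4030)
print(f"Runoff bias reduction from Vv3030 to Vv4030: {runoff_reduction:.1f}%")
```

The result rounds to the 33% reduction in runoff bias stated for Version 4 relative to Version 3.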
The overall underestimation of streamflow in the L4_SM system is consistent with that in the MERRA-2 reanalysis, which also uses the Catchment model to describe the land surface water and energy balance. During the development of the revised Catchment model for Version 4 of the L4_SM system, we tested the strategy of Koster and Mahanama (2012) to improve the Catchment model's representation of the soil moisture-evapotranspiration and soil moisture-runoff relationships. While this indeed improved the model's mean runoff, these modifications also degraded the soil moisture skill versus in situ measurements (even with soil moisture assimilation) and were therefore deemed unsuitable for use in the L4_SM system.
The runoff correlation skill for Vv4030 is 0.54 (Figure 4c). The highest skill values are found in the far northwestern basins and in most of the central and eastern United States. The lowest skill values occur primarily in the western mountains, the north central United States, and along the coast of Georgia and the Carolinas. There are only minor differences in the runoff correlation skill between Vv4030 and Vv3030, for which the average skill is minimally higher at 0.55 (compare Figures 4c and 4d). While both L4_SM product versions have essentially the same runoff correlation skill, the corresponding Nature Run skills are somewhat different. Specifically, the improvement in Vv4030 over the corresponding Nature Run simulation is minimal on average (ΔR = 0.01), with larger changes seen only in a handful of basins along the Texas-Oklahoma border, along the Nebraska-Iowa border, and in Florida (Figure 4e). The average improvement in Vv3030 over NR3030 is slightly bigger (ΔR = 0.03), with larger and predominantly positive changes seen in about two dozen basins in the central United States (Figure 4f). That is, the revised Catchment model better captures temporal variations in runoff than did the old model version, while the assimilation of SMAP Tb observations contributes less to the runoff skill in Vv4030 than in Vv3030, suggesting that the model revisions and the Tb assimilation correct similar deficiencies. This is consistent with the fact that in the Version 4 system, the root zone soil moisture analysis increments are generally smaller than in Version 3 (section 5), owing to the removal of the catchment deficit variable from the EnKF state vector in Version 4 (section 2.2.2). The overall relatively small runoff improvements from data assimilation seen here are consistent with the findings of Mao et al. (2019) that soil moisture assimilation alone is not sufficient to substantially improve streamflow estimates.

Data Counts
The total number of SMAP L1C_TB observations that were assimilated into Vv4030 during the 3-year assessment period is shown in Figure 5a. This count includes all horizontal- and vertical-polarization observations from ascending and descending orbits. The global average count is 1,149 assimilated SMAP observations (per 36-km grid cell) during the 3-year period, or ~1 per day, across all land areas excluding permanently glaciated and permanent open water surfaces. Outside of the high latitudes, the figure exhibits the typical diamond pattern associated with SMAP's regular 8-day repeat orbit. There are gaps in coverage in the vicinity of lakes (such as in northern Canada) and along major rivers (e.g., the Amazon), where the open water area fraction often exceeds 5% and the observations are excluded from assimilation (Reichle, De Lannoy, Liu, Ardizzone, et al., 2017). The L4_SM system does not simulate the contribution of open water to the Tb, so the O-F Tb residuals in these locations are not suitable for use in the L4_SM analysis. Data counts are lower than average in the high latitudes and in high-elevation and mountainous areas (including the Rocky Mountains, the Andes, the Himalayas, and Tibet). In these regions, the L4_SM analysis is limited to a much shorter warm (unfrozen) season (section 2.1). In the high latitudes, the polar orbit of the SMAP spacecraft results in more frequent revisit times, thereby balancing somewhat the lower counts resulting from the shorter warm season there. Note that despite the gaps in assimilation coverage, the L4_SM product provides soil moisture estimates everywhere, even if in some regions the L4_SM estimates rely solely on the information in the land model and forcing data.
Perhaps most importantly, Figure 5a illustrates that SMAP observations were assimilated into Vv4030 throughout most of Eastern Europe, the Middle East, and East Asia, where L-band RFI is common (Oliva et al., 2012). This was not possible in Version 2 of the L4_SM algorithm their Figure 4). At the time of Version 2, only SMOS Tb observations were available to address the bias in the modeled Tb climatology, and SMOS data cannot be used for this purpose in the RFI-impacted regions (section 2.2.1).
Next, Figure 5b shows that more SMAP observations were assimilated in Version 4 than in Version 3 primarily in the high latitudes and in high-elevation regions (e.g., the Tibetan and Andean plateaus). This increase in coverage in Version 4 is the result of using longer SMOS and SMAP Tb records to compute the (seasonally varying) L-band climatology (section 2), which consequently has fewer gaps in spatial and temporal coverage and thus supports the assimilation of more SMAP Tb observations. Conversely, the count of assimilated SMAP observations in Version 3 exceeds that of Version 4 across several distinct, circular-shaped regions, including southern Colombia and Venezuela, Egypt, southern China, and portions of Zambia, Angola, Namibia, and Botswana (Figure 5b). These elevated data counts reflect an error in the L4_SM algorithm software through Version 3 and comprise ~2% of the global average count. In the affected locations, fore- and aft-looking Tb observations from the same half-orbit were inadvertently assimilated at two different L4_SM analysis times when the fore and aft observation times fell on opposite sides of the boundary between two subsequent 3-hr analysis windows. This error was corrected in the Version 4 algorithm software, which now always averages pairs of fore- and aft-looking Tb observations prior to assimilation. Consequently, the Version 4 data counts (Figure 5a) no longer show the spatial artifacts seen in the earlier versions their Figure 4). Finally, there are relatively large positive and negative differences in data counts in small, isolated patches across much of the globe (Figure 5b) because the improved land mask in Version 4 (section 2.2.2) changes the exact location and size of inland open water surfaces and glaciers and thereby impacts where Tb observations are assimilated.
Furthermore, many isolated patches of higher data counts in Version 4, for example, across the southern United States, are again the result of better coverage of the L-band Tb climatology.
Figure 5c shows the average number of analysis increments that the L4_SM algorithm generated per day during the validation period. The global mean is 0.78 for Version 4, which means that for a given location, there are approximately four increments applied every 5 days on average, either from an ascending or a descending overpass. The overall spatial pattern of the increments count follows that of the count of the assimilated observations (Figure 5a), with the highest counts in the subtropics where overpasses are more frequent than in the tropics and assimilation is not subject to the temporal gaps owing to frozen or snow-covered surface conditions. The spatial coverage of the increments is somewhat greater than that of the assimilated observations, especially near major water bodies, due to the spatial interpolation and extrapolation of the observational information in the distributed analysis update of the L4_SM algorithm. The differences in the number of analysis increments between Vv4030 and Vv3030 (Figure 5d) and the reasons for them match those of the assimilated observations (Figure 5b).
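The per-day rates quoted in this section follow from simple arithmetic, sketched below. The length of the assessment period is taken as three 365-day years, which is an assumption of this sketch; the exact number of days in the assessment period may differ slightly.

```python
# Assumed length of the 3-year assessment period (days).
ASSESSMENT_DAYS = 3 * 365

# Global average count of assimilated SMAP observations per 36-km grid cell.
total_obs = 1149
obs_per_day = total_obs / ASSESSMENT_DAYS

# Global mean number of analysis increments per day in Version 4.
increments_per_day = 0.78
increments_per_5_days = increments_per_day * 5

print(f"Assimilated observations per day: {obs_per_day:.2f}")
print(f"Analysis increments per 5 days:   {increments_per_5_days:.1f}")
```

Both values agree with the text: roughly one assimilated observation per day, and approximately four increments every 5 days.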

Mean of O-F Tb Residuals and Soil Moisture Increments
Nonnegligible time series mean values of the O-F Tb residuals and soil moisture increments can reveal shortcomings in the L4_SM algorithm calibration (section 3.3). Figure 6a shows the global distribution of the time series mean of the O-F Tb residuals in Vv4030. The values are typically small and mostly range from −3 to 3 K. The largest regional bias values are found across the Southern Hemisphere. This is because the L4_SM algorithm calibration (that is, the removal of seasonally varying bias in the modeled Tb; section 2.1) relies heavily on the climatological consistency between the various surface meteorological forcing data sets used during the SMAP period and the retrospective (calibration) period, including (i) the SMAP-period forcing data from the 1/4° GEOS FP system (GEOS-5.13.0 through GEOS-5.17 during the 3-year validation period), (ii) the historic forcing data from the 1/2° MERRA-2 system (GEOS-5.12.4), and (iii) the gauge-based CPCU precipitation data (outside of Africa and the high latitudes). In the Southern Hemisphere, the GEOS data are based on far fewer conventional measurements than in the Northern Hemisphere and are thus more prone to the typical discontinuities in global atmospheric analyses that result from changes in the satellite observing system (Robertson et al., 2014). Similarly, the changes over time in the precipitation gauge network that supports the CPCU product are more severe in the Southern Hemisphere (Reichle, Liu, et al., 2017, their Figure 8), resulting in greater discontinuities in the L4_SM precipitation forcing there.
In the global average, the bias (absolute bias) in the O-F Tb residuals is just 0.02 K (0.56 K) for Vv4030 (Figure 6a) and 0.12 K (0.58 K) for Vv3030 (not shown). Regionally, the difference map of the absolute bias (Figure 6b) indicates that the bias is improved (closer to zero) in Vv4030 compared to Vv3030 in central North America, the Amazon, and in the Horn of Africa but worse in southern Africa.
The time mean values of the surface soil moisture analysis increments in Vv4030 vanish in the global average (Figure 6c). Regionally, however, the 3-year mean (net) increments constitute a nonnegligible fraction of the water balance, with maximum (absolute) values reaching as high as 0.01 m³/m³ in some regions. Central North America, eastern South America, southern Africa, and portions of central Asia experience net drying increments, whereas West Africa and Australia experience net wetting increments. This pattern roughly matches the 3-year mean bias in the O-F residuals (Figure 6a). The mean increments are much smaller in root zone soil moisture, where maximum (absolute) values of the net increments are within 0.001 m³/m³ (Figure 6e).
The global average of the absolute values of the net increments for surface soil moisture in Version 4 is 0.0013 m³/m³ (not shown), which is almost twice as large as that in Version 3 (0.0007 m³/m³; not shown), with fairly uniform increases seen across the globe (Figure 6d). Conversely, the absolute net increments for root zone soil moisture are smaller in Vv4030 than in Vv3030 by a factor of 2 (Figure 6f). Because the root zone layer is much deeper than the surface layer, the overall water imbalance is much smaller in Version 4 than in Version 3.
In summary, Figure 6 suggests that the L4_SM algorithm is reasonably unbiased and that Version 4 is less biased than Version 3 (and thus the documented Version 2). The system, however, could still benefit from further calibration.

Standard Deviation of O-F Tb Residuals and Soil Moisture Increments
The time series standard deviation of the O-F Tb residuals and soil moisture increments measures the typical magnitude of the adjustments resulting from the assimilation of the SMAP Tb observations. The time series standard deviation of the O-F residuals ranges from a few Kelvin to around 15 K in Vv4030 (Figure 7a). The highest values are found in central North America, southern South America, southern Africa, the Sahel, portions of central Asia, India, and (particularly) Australia. These regions have sparse or modest vegetation cover and typically exhibit strong variability in soil moisture conditions. The O-F Tb residuals are generally smallest in more densely vegetated regions, including the eastern United States, the Amazon basin, and tropical Africa. Small values are also found in the high latitudes, including Alaska and Siberia, and in the Sahara Desert.
The global average of the standard deviation of the O-F Tb residuals is about 5.7 K in Vv4030 (Figure 7a), which is slightly reduced from about 5.9 K in Vv3030. The O-F standard deviations are reduced primarily in central North America and across midlatitude Eurasia (Figure 7b), where the Vv4030 (and Vv3030) values are among the largest. The smaller standard deviations in Vv4030 suggest that the Version 4 modeling system is better able to predict the observed Tb just prior to each analysis. This improvement is also reflected in the spatially averaged time series standard deviation of the O-A residuals, which is 3.7 K in Vv4030 and 4.0 K in Vv3030 (not shown). Note that the 3.7 K standard deviation of the Vv4030 O-A residuals relative to the respective 5.7 K value of the O-F residuals reflects the impact of the SMAP observations in the L4_SM system. (Note also that the global average O-F standard deviation of the preliminary Version 4 data used by , is 5.8 K. Owing to a processing error, the 5.1 K number quoted in their Figure 15 is wrong.)
The time series standard deviation of the soil moisture increments is a measure of the typical magnitude of instantaneous increments. This metric is shown in Figures 7c and 7e for Vv4030 surface and root zone soil moisture increments, respectively. The spatial patterns mostly match that of the standard deviation of the O-F Tb residuals (Figure 7a). Typical values for surface soil moisture increments are on the order of 0.02-0.03 m³/m³ in central North America, southern South America, southern Africa, the Sahel, portions of central Asia, India, and most of Australia (Figure 7c). In the same regions, root zone soil moisture increments are typically around 0.004 m³/m³ (Figure 7e). Compared to Vv3030, the Vv4030 surface soil moisture increments in these regions are ~50% larger (Figure 7d), while the root zone soil moisture increments are ~50% smaller (Figure 7f).
The smaller root zone increments are primarily the result of removing the catchment deficit model prognostic variable from the EnKF state vector (section 2.2.2). The larger surface increments are consistent with the increased dynamic range of surface soil moisture in Vv4030 resulting from the reduction in the modeled upward recharge of the surface layer from below (section 2.2.2). In densely vegetated regions, in particular in tropical forests, surface and root zone soil moisture increments are generally negligible (Figures 7c and 7e); in those areas, SMAP Tb observations are mostly sensitive to the dense vegetation and can thus provide only minimal corrections to the model forecast soil moisture.

Magnitude of Simulated Versus Actual Tb Errors
The standard deviation, ν, of the normalized O-F Tb residuals measures the consistency between the simulated (modeled) uncertainty and the actual Tb errors (section 3.3). If the perturbation parameters that determine the simulated error standard deviations are chosen such that the simulated errors are statistically consistent with the actual errors, this metric, shown in Figure 8a for Vv4030, should be unity everywhere. The global average of the metric in Vv4030 is 1.13, suggesting that, on average, the simulated errors only slightly underestimate the actual errors. The metric, however, varies greatly across the globe. Typical values are either too low or too high. In the Amazon basin, the eastern United States, tropical Africa, the eastern Sahara Desert, and portions of the high northern latitudes, values range from 0.25 to 0.5, and thus the actual errors there are considerably overestimated. Conversely, in central North America, southern South America, southern Africa, the Sahel, portions of central Asia, India, and most of Australia, values range from 1.5 to 3, meaning that the actual errors in these regions are considerably underestimated. This is consistent with the finding that the L4_SM uncertainty estimates are biased low versus ubRMSE values at core validation sites (Figure 3 and section 4.2), which are located within these regions (Table 1).
The global pattern of low and high values of the normalized O-F standard deviation for Vv3030 is very similar to that of Vv4030, but Vv3030 has a larger global average of 1.28 (not shown). The larger average in Vv3030 results primarily from larger values (compared to Vv4030) in central North America and across midlatitude Eurasia, where ν is closer to the ideal value of 1 in Vv4030 than in Vv3030 (Figure 8b). Put differently, the underestimation of the actual errors was more pronounced in Vv3030 and is thus improved in Vv4030. This improvement stems primarily from the larger ensemble spread in the Vv4030 model forecast surface soil moisture (which implies a greater ensemble spread in model forecast Tb) and, to a lesser extent, from the slightly smaller errors in the Vv4030 model forecast Tb. However, more work is clearly needed to further improve the calibration of the input parameters that determine the model and observation errors in the L4_SM system. (Note that the global average of the normalized O-F standard deviation for April 2015 to March 2017 Version 2 data is 1.26 and not 1.0, as erroneously stated by , because of a processing error. The same error also applies to the 0.77 average quoted for the preliminary Version 4 data in Figure 15 of , for which the corrected value is 1.14.)
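The normalized O-F metric and the quadrature averaging used for second-order diagnostics can be illustrated with a minimal sketch. The function names, the combination of observation and forecast error standard deviations as a root sum of squares (which assumes independent errors), and all numerical values below are assumptions for illustration; they are not taken from the L4_SM software.

```python
import numpy as np

def expected_error_std(obs_err_std, fcst_err_std):
    """Simulated (expected) O-F error std, assuming independent error sources."""
    return np.sqrt(np.asarray(obs_err_std, dtype=float) ** 2
                   + np.asarray(fcst_err_std, dtype=float) ** 2)

def normalized_omf_std(omf, sim_err_std):
    """Time series std of O-F residuals normalized by the expected error std."""
    normalized = np.asarray(omf, dtype=float) / np.asarray(sim_err_std, dtype=float)
    return float(np.std(normalized))

def quadrature_mean(values):
    """Spatial average in quadrature: square root of the mean of squares."""
    values = np.asarray(values, dtype=float)
    return float(np.sqrt(np.mean(values ** 2)))

# Synthetic grid cell: actual O-F std of 6 K, simulated error std of 4 K.
rng = np.random.default_rng(0)
omf = 6.0 * rng.standard_normal(20000)       # O-F Tb residuals (K)
sim_std = expected_error_std(3.2, 2.4)       # sqrt(3.2**2 + 2.4**2) = 4 K
nu = normalized_omf_std(omf, sim_std)
print(f"nu = {nu:.2f}")  # near 1.5: simulated errors underestimate actual errors

# Averaging the metric over several (hypothetical) grid cells in quadrature.
print(f"quadrature mean = {quadrature_mean([0.4, 1.1, 2.0]):.2f}")
```

A value of ν above unity, as in this synthetic cell, corresponds to the regions where the prescribed errors underestimate the actual errors; values below unity indicate overestimation.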

Summary and Conclusions
The SMAP L4_SM algorithm assimilates L-band Tb observations from SMAP into the NASA Catchment land surface model driven with gauge-based precipitation observations. The resulting L4_SM data product provides global, 3-hourly, 9-km resolution estimates of surface and root zone soil moisture and land surface fluxes. This paper describes the changes imposed in Version 4 of the L4_SM algorithm and quantifies their impact on the quality of the L4_SM data product. The most important changes in the L4_SM system are (i) the reduction of the modeled upward recharge into the surface layer from below under nonequilibrium conditions, (ii) the removal from the analysis state vector of the model prognostic variable (catchment deficit) that governs the equilibrium soil moisture profile, and (iii) the use of SMAP Tb records together with longer SMOS Tb records for algorithm calibration.
The model change (i) results in generally drier and more variable surface soil moisture in the most recent product (Vv4030) compared to earlier versions, whereas root zone soil moisture in Vv4030 is generally wetter and less variable. Consequently, the surface and root zone soil moisture time series are more distinct from each other in Vv4030 (Figure 1). Across all 31 of the 9-km reference pixels from 18 SMAP core validation sites (Table 1), the surface soil moisture bias is smaller by 0.018 m³/m³ and the root zone soil moisture bias is larger by 0.013 m³/m³ in Vv4030 compared to Vv3030 (Figure 2). When discrepancies in vertical support of the model and in situ data are considered, the reduced bias in surface soil moisture is definitely an improvement, while the increased bias in root zone soil moisture may not represent a degradation. In any case, because of these changes in the soil moisture climatology, Version 3 and Version 4 data should not be mixed in applications. The skill in terms of ubRMSE, R, and anomaly R versus in situ measurements is essentially the same in both product versions. Perhaps most importantly, with an ubRMSE of 0.039 m³/m³ for surface soil moisture and 0.026 m³/m³ for root zone soil moisture, the Vv4030 estimates meet the L4_SM accuracy requirement of 0.04 m³/m³ on average across the 9-km core site reference pixels. As in previous versions, the Version 4 L4_SM product has better ubRMSE, R, and anomaly R skill than the corresponding Nature Run simulation, reflecting the added value of assimilating SMAP Tb observations.
The change in the Tb analysis (ii), in combination with the model change (i), resulted in ~50% larger typical surface soil moisture increments and ~50% smaller typical root zone increments in Vv4030 compared to Vv3030 (Figures 7d and 7f). Similarly, the 3-year mean surface (root zone) soil moisture increments are larger (smaller) in magnitude in Vv4030 (Figures 6d and 6f). Because the 5-cm surface layer is much thinner than the 100-cm root zone layer, the net water imbalance is smaller in Vv4030. Moreover, the typical O-F Tb residuals are slightly smaller in Vv4030 (Figure 7b). Finally, the simulated Tb error variance matches the variance of the actual O-F Tb residuals better in Vv4030 (Figure 8b). Taken together, these results indicate that the Vv4030 algorithm achieves the same product skill as the Vv3030 algorithm but with generally smaller O-F Tb residuals and smaller analysis increments. This is desirable because it means that, without loss of skill, the Version 4 L4_SM estimates are more consistent with the water- and energy-balanced process dynamics of the land surface model and less driven by nonphysical analysis increments.
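The layer-thickness argument can be made concrete with a short arithmetic sketch. A volumetric increment (m3/m3) multiplied by the layer thickness gives an equivalent water depth. The increment magnitudes below are hypothetical placeholders, not values from the paper; only the ~50% changes and the layer depths come from the text. Since the root zone layer contains the surface layer, the two depths are compared rather than summed.

```python
SURFACE_THICKNESS_MM = 50.0      # 0-5 cm surface layer
ROOT_ZONE_THICKNESS_MM = 1000.0  # 0-100 cm root zone layer


def increment_water_depth_mm(delta_theta, thickness_mm):
    """Convert a volumetric increment (m3/m3) to an equivalent water depth (mm)."""
    return delta_theta * thickness_mm


# Hypothetical typical increment magnitudes (m3/m3) for the older version.
old_surface, old_root_zone = 0.004, 0.002

# Apply the ~50% changes reported in the text.
new_surface = 1.5 * old_surface
new_root_zone = 0.5 * old_root_zone

# Change in equivalent water depth per analysis update, by layer.
surface_change_mm = increment_water_depth_mm(
    new_surface - old_surface, SURFACE_THICKNESS_MM
)
root_zone_change_mm = increment_water_depth_mm(
    new_root_zone - old_root_zone, ROOT_ZONE_THICKNESS_MM
)
```

With these placeholder values, the larger surface increment adds about 0.1 mm of water per update, while the smaller root zone increment removes about 1 mm, so the root zone term dominates the change in water imbalance.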
The use of more extensive Tb records from SMAP and SMOS for L4_SM algorithm calibration (iii) greatly expanded the locations and times for which SMAP Tb observations are assimilated in Version 4. SMAP data are now assimilated with near-global coverage, including in most of Eastern Europe, the Middle East, and East Asia, where SMAP observations could not be assimilated with the Version 2 algorithm. At the time of Version 2, SMOS data provided the only sufficiently long L-band record to support L4_SM algorithm calibration but could not be used in these regions because of strong and persistent RFI there. The longer SMOS and SMAP records used to calibrate Vv4030 also enhanced the assimilation coverage compared to Vv3030 across the high latitudes and in high-elevation regions (Figure 5).
The present paper also evaluates, for the first time, the soil moisture uncertainty and runoff estimates provided in the L4_SM product. The Vv4030 surface and root zone soil moisture uncertainty estimates are too low compared to actual errors (computed versus in situ measurements), although the underestimation is less severe than in Vv3030 (Figure 3). Moreover, the spatial and temporal patterns in the uncertainty estimates do not reflect those in the actual errors. Across 239 unregulated basins in the contiguous United States, the Vv4030 runoff data underestimate streamflow gauge measurements by 35 mm/year, which is a 33% improvement over the 52 mm/year underestimation in Vv3030 (Figure 4). The long-standing runoff bias in the Catchment model was examined by Koster and Mahanama (2012); their suggested approach to improve the mean runoff would have degraded the model's soil moisture skill and was thus deemed unsuitable for use in the L4_SM algorithm. The runoff correlation skill is slightly improved in the Version 4 modeling system, but the skill of the Vv4030 and Vv3030 runoff estimates after SMAP data assimilation ends up being essentially the same.
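The 33% figure follows directly from the two mean runoff biases quoted above. The sketch below shows the arithmetic; the helper names and the two-basin sample are hypothetical, and the actual evaluation averages over 239 basins.

```python
def mean_bias(estimates, observations):
    """Mean of (estimate - observation); negative means underestimation."""
    if len(estimates) != len(observations) or not estimates:
        raise ValueError("need two non-empty, equal-length series")
    return sum(e - o for e, o in zip(estimates, observations)) / len(estimates)


def relative_reduction(old_bias, new_bias):
    """Fractional reduction in bias magnitude from old to new."""
    return (abs(old_bias) - abs(new_bias)) / abs(old_bias)


# Made-up annual runoff and streamflow values (mm/year) for two basins.
example_bias = mean_bias([300.0, 410.0], [340.0, 440.0])

# Mean biases quoted in the text (mm/year): Vv3030, then Vv4030.
reduction = relative_reduction(-52.0, -35.0)
percent = round(100 * reduction)
```

Here `reduction` is 17/52, which rounds to the 33% improvement stated in the text.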
In summary, the data assimilation system underlying Version 4 of the L4_SM product includes a number of model and analysis changes. These changes resulted in several small improvements in the Vv4030 product, including more realistic surface soil moisture dynamics and smaller total soil moisture increments, all without compromising the accuracy of the product versus in situ measurements. There are, however, still opportunities for further improvement, both in the skill of the L4_SM estimates and in the measurements and methods used for validating the product. For example, recalibrating the parameters for the model forcing and prognostics perturbations might result in a more optimal analysis and thus further improve the skill of the L4_SM product. Moreover, evapotranspiration estimates should be evaluated against remote sensing products or validated against flux tower measurements once sufficiently long tower records for the SMAP era become publicly available. Compared to the ~1.6 million L4_SM model grid cells, however, there are precious few soil moisture core sites, streamflow gauges in unregulated basins, and flux tower sites. Our validation against in situ measurements should thus be supplemented with novel, spatially distributed evaluation approaches. For example, independent satellite soil moisture observations have been used to quantify the impact of assimilating SMAP observations into the L4_SM system and to demonstrate that the greatest improvements are in otherwise data-sparse regions.