An ensemble Kalman filter dual assimilation of thermal infrared and microwave satellite observations of soil moisture into the Noah land surface model


Corresponding author: C. R. Hain, Earth System Science Interdisciplinary Center, University of Maryland, College Park, MD 20742, USA.


[1] Studies that have assimilated remotely sensed soil moisture (SM) into land surface models (LSMs) have generally focused on retrievals from microwave (MW) sensors. However, retrievals from thermal infrared (TIR) sensors have also been shown to add unique information, especially where MW sensors are not able to provide accurate retrievals (due to, e.g., dense vegetation). In this study, we examine the assimilation of a TIR product based on surface evaporative flux estimates from the Atmosphere Land Exchange Inverse (ALEXI) model and the MW-based VU Amsterdam NASA surface SM product generated with the Land Parameter Retrieval Model (LPRM). A set of data assimilation experiments using an ensemble Kalman filter is performed over the contiguous United States to assess the impact of assimilating ALEXI and LPRM SM retrievals in isolation and together in a dual-assimilation case. The relative skill of each assimilation case is assessed through a data denial approach where an LSM is forced with an inferior precipitation data set. The ability of each assimilation case to correct for precipitation errors is quantified by comparing with a simulation forced with a higher-quality precipitation data set. All three assimilation cases (ALEXI, LPRM, and dual assimilation) show relative improvements versus the open loop (i.e., reduced RMSD) for surface and root zone SM. In the surface zone, the dual assimilation case provides the largest improvements, followed by the LPRM case. However, the ALEXI case performs best in the root zone. Results from the data denial experiment are supported by comparisons between assimilation results and ground-based SM observations from the Soil Climate Analysis Network.

1. Introduction

[2] Major advances have recently occurred in the retrieval of soil moisture (SM) using satellite-based remote sensing platforms. Such observations have great potential for enhancing our ability to globally monitor agricultural drought [Bolten et al., 2010; Hain et al., 2009], defined as the lack of sufficient soil moisture for adequate crop and forage production. To date, satellite-based SM retrievals have been widely applied to active and passive microwave (MW) observations [e.g., Jackson, 1993; Njoku and Li, 1999; Owe et al., 2001; Paloscia et al., 2001; Njoku et al., 2003]. The most extensively validated MW SM retrieval data sets have been generated using brightness temperature observations acquired from the Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E; July 2002 to October 2011). One widely distributed AMSR-E SM retrieval product is derived according to the Land Parameter Retrieval Model (LPRM) as developed by the VU University Amsterdam (VUA) and NASA [Owe et al., 2008]. Previous validation studies revealed LPRM correlations with in situ SM measurements of 0.6 to 0.8 [Wagner et al., 2007; Draper et al., 2008; Rudiger et al., 2009], suggesting the VUA product can serve as a suitable benchmark to represent possible skill of passive MW products in general.

[3] An alternative methodology for diagnosing SM exploits satellite data collected in the thermal infrared (TIR) atmospheric window (10 to 12 μm) used to estimate land surface temperature (LST). A number of past studies have attempted to develop strategies to exploit TIR LST for monitoring SM [Carlson et al., 1981; Price, 1983; Carlson, 1986; Taconet et al., 1986; Carlson et al., 1994; McNider et al., 1994; Gilles and Carlson, 1995; van den Hurk et al., 1997; Jones et al., 1998]. Here, normalized evapotranspiration (ET) data, computed with the diagnostic Atmosphere Land Exchange Inverse (ALEXI) surface energy balance model from Geostationary Operational Environmental Satellites (GOES) TIR data, can be related to SM conditions in both the surface layer and the root zone [Anderson et al., 2007, 2011; Hain et al., 2009, 2011]. SM deficits result in decreased ET rates and increased soil and canopy temperatures. The TIR-based methodology is advantageous because it provides surface SM (surface stress) information over sparsely vegetated pixels and, more importantly, root zone SM information over pixels with moderate to dense vegetation through the detection of vegetation stress. Additionally, satellite-based TIR sensors operate at spatial resolutions (100 m to 10 km for polar orbiting and geostationary imagers) that are higher than those of current MW sensors (>25 km).

[4] Hain et al. [2011] conducted a multiyear (2003–2008) intercomparison of LPRM and ALEXI SM retrievals and SM predictions from the Noah land surface model (LSM) [Chen and Dudhia, 2001; Ek et al., 2003] over the contiguous United States (CONUS). In general, it was found that TIR- and MW-based SM data sets provide complementary information about the current SM state. TIR methods can provide SM information over dense vegetation, a deficiency in current MW methods, while providing an additional independent source of information over low to moderate vegetation. Passive MW retrievals, in contrast, are more physically tied to surface SM, and are available under both clear and cloudy sky conditions.

[5] Based on these findings, Hain et al. [2011] argue that the complementary nature of MW and TIR-based SM retrievals can be effectively exploited within a data assimilation system assimilating both types of retrievals. Recent research in land surface data assimilation has demonstrated the value and feasibility of assimilating remotely sensed SM retrievals to improve LSM surface and root zone SM predictions [Walker and Houser, 2001; Margulis et al., 2002; Crow and Wood, 2003; Reichle and Koster, 2005; de Lannoy et al., 2007; Reichle et al., 2007; Kumar et al., 2008; Ryu et al., 2009; Draper et al., 2009]. Arguably, the most widely used assimilation technique in land surface data assimilation is the ensemble Kalman filter (EnKF). Work within the past decade has demonstrated that the EnKF is an efficient algorithm for assimilating remotely sensed retrievals into moderately nonlinear land surface models [see, e.g., Zhou et al., 2006].

[6] Many of the previous studies have focused on the assimilation of synthetic SM retrievals. However, few studies [Reichle et al., 2007; Bolten et al., 2010] have evaluated the assimilation of real remotely sensed SM retrievals on a continental scale, and fewer still have evaluated the merits of assimilating multiple types of remote sensing data [see, e.g., Li et al., 2010]. In this paper, we outline a procedure for single and dual assimilation of both MW and TIR-based SM retrievals into a continental-scale LSM, and attempt to demonstrate the value of dual assimilation over single assimilation of either SM retrieval product in isolation. The SM retrievals are assimilated into the Noah LSM and added value is quantified using a data denial procedure based on evaluating the ability of SM retrievals to correct for model SM errors incurred by using a degraded precipitation data set as input to Noah. Complementary results based on comparisons with soil moisture observations obtained with the USDA Soil Climate Analysis Network (SCAN) are also discussed. Since our general aim is enhanced agricultural drought monitoring, the analysis will focus on (Northern Hemisphere) warm season results obtained between April and October.

2. Soil Moisture Retrieval Techniques

[7] This section provides a brief description of the methodologies used to retrieve SM information using both TIR and MW remote sensing and highlights synergies between these two SM retrieval strategies.

2.1. Thermal Soil Moisture Retrieval (ALEXI)

[8] ALEXI was formulated as an extension to the two-source energy balance (TSEB) model of Norman et al. [1995], which in turn was developed to address many of the documented challenges in monitoring surface energy fluxes using thermal remote sensing data. The two-source approximation treats the radiometric temperature (TRAD) of a vegetated surface as the weighted average of the individual temperatures of soil (Ts) and vegetation (Tc) subcomponents, partitioned by the fractional vegetation cover (fc) apparent from the sensor view angle (ϕ), which can be expressed in a linearized form as

$$T_{RAD}(\phi) \approx f_c(\phi)\,T_c + \left[1 - f_c(\phi)\right] T_s \tag{1}$$

where fc(ϕ) is derived from the observed leaf area index (LAI) using Beer's law:

$$f_c(\phi) = 1 - \exp\!\left(\frac{-0.5\,\mathrm{LAI}}{\cos\phi}\right) \tag{2}$$

The TSEB separately balances energy budgets for the soil and vegetation components of the system, and also solves for total system fluxes of net radiation (RN), latent heat (LE), sensible heat (H), and soil heat conduction (G). In ALEXI, regional application is achieved by coupling the TSEB with an atmospheric boundary layer (ABL) model to internally simulate land-atmosphere feedbacks on near-surface air temperature [Anderson et al., 1997, 2007]. In this coupled mode, the ALEXI model simulates air temperature (Ta) at the blending height internally within the ABL model, ensuring that Ta is consistent with the modeled surface fluxes from the TSEB. The TSEB is applied at two times during the morning hours, at approximately 1.5 h after local sunrise (t1) and 1.5 h before local noon (t2). The ABL component of ALEXI relates the rise in Ta within the mixed layer over the time interval (t1 to t2) to the time-integrated influx of H from the surface, thus providing a means for surface energy closure [McNaughton and Spriggs, 1986; Anderson et al., 1997]. By using a time differential temperature signal as input (the change in TRAD between t1 and t2), sensitivity to errors in absolute temperature retrieval is significantly reduced [Anderson et al., 1997]. A complete description of ALEXI is given by Anderson et al. [1997, 2007].
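The two-source composite temperature and the Beer's law cover fraction described above can be illustrated with a minimal sketch. The function names, the extinction coefficient of 0.5, and all input values are assumptions chosen for illustration; they are not taken from the ALEXI code base.

```python
import math

def fractional_cover(lai, view_angle_deg=0.0, extinction=0.5):
    """Beer's-law fractional vegetation cover apparent at a view angle.

    The extinction coefficient (0.5) is an assumed, commonly used value.
    """
    cos_phi = math.cos(math.radians(view_angle_deg))
    return 1.0 - math.exp(-extinction * lai / cos_phi)

def composite_trad(t_soil, t_canopy, fc):
    """Linearized two-source radiometric temperature (K)."""
    return fc * t_canopy + (1.0 - fc) * t_soil

# Hypothetical pixel: LAI of 2, warm soil under a cooler canopy, nadir view.
fc_nadir = fractional_cover(lai=2.0)
trad = composite_trad(t_soil=310.0, t_canopy=300.0, fc=fc_nadir)
print(round(fc_nadir, 3), round(trad, 2))
```

Viewing the same canopy off nadir increases the apparent cover, so the composite temperature shifts toward the canopy temperature.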

[9] The TSEB component of ALEXI partitions the total system LE flux into soil evaporation (LEs) and canopy transpiration (LEc) subcomponents. These fluxes, in turn, are largely controlled by SM in the surface layer and the root zone layer, respectively. In general, wet SM conditions lead to increased LE (decreased H) and a depressed morning surface temperature amplitude, while dry SM conditions lead to decreased LE (increased H) and an increased diurnal surface temperature amplitude. Unlike an LSM, ALEXI is a purely diagnostic model lacking any prognostic state calculations. Therefore, in a data assimilation context, it can be mathematically treated as a retrieval algorithm which converts higher-level satellite products into a geophysical variable suitable for assimilation. In particular, Anderson et al. [2007] and Hain et al. [2009, 2011] outline a technique for simulating the effects of SM on LE estimates from ALEXI through a SM stress function based on a derived estimate of the fraction (fPET) between actual ET and potential ET (PET):

$$f_{PET} = \frac{ET}{PET} \tag{3}$$

[10] In many LSMs, a semiempirical linear or nonlinear relationship is defined for the relationship between fPET and SM to account for the effects of SM depletion on surface evaporative fluxes. Following Hain et al. [2011], we assume this relationship to be of the form

$$\theta_{ALEXI} = \left(\theta_{fc} - \theta_{wp}\right) f_{PET} + \theta_{wp} \tag{4}$$

where θALEXI is the retrieved SM value (in volumetric m3 m−3 units) based on ALEXI estimates of fPET, and θfc and θwp are the volumetric SM contents at field capacity and permanent wilting point, respectively. This linear relationship is similar to the formula for direct soil evaporation used in the Noah LSM [Chen and Dudhia, 2001] and is supported by previous validation work comparing θALEXI with Noah LSM-predicted SM [Hain et al., 2011], passive MW surface SM retrievals [Hain et al., 2011], and ground-based SM observations acquired from the Oklahoma Mesonet [Hain et al., 2009]. Derived values of θALEXI from diagnosed evaporative fluxes (the inverse of the prognostic approach) should provide a reasonable estimate of true SM when SM is between θwp and θfc. However, ALEXI can lose sensitivity when SM is above θfc or below θwp. This upper region of low SM sensitivity, in particular, is likely to cause reduced retrieval accuracy outside of the growing season. Furthermore, it is assumed that the retrieval of SM from ALEXI provides an instantaneous estimate of current SM conditions. This assumption will be explicitly tested by examining the ability of assimilated ALEXI SM to correct for short-term precipitation errors in the data denial strategy presented below.

[11] While (4) requires specification of soil texture-specific values of θfc and θwp, these local constants (usually obtained from a soil texture map) cancel out in the computation of grid-cell-based standardized anomalies describing deviations from mean conditions at each pixel and for each day of the study period. Standardized anomalies of θALEXI can therefore be computed directly from fPET without soil texture information, which is a benefit for global applications. Sensitivity to θfc and θwp is also circumvented via the common land data assimilation preprocessing step of scaling remotely sensed SM retrievals to match the climatology of modeled SM prior to assimilation [see, e.g., Reichle and Koster, 2004, 2005].
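The cancellation of θfc and θwp in standardized anomalies can be checked numerically. The sketch below assumes the linear stress function of (4) and a simple whole-series anomaly (the study computes anomalies per pixel and per day of the study period); the fPET series and soil constants are hypothetical.

```python
from statistics import mean, pstdev

def theta_from_fpet(fpet, theta_fc, theta_wp):
    """Linear SM retrieval from the ET/PET fraction (m3 m-3)."""
    return theta_wp + fpet * (theta_fc - theta_wp)

def standardized_anomalies(series):
    """Deviation from the series mean, scaled by its standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma for x in series]

# Hypothetical fPET values for one pixel and hypothetical soil constants.
fpet_series = [0.35, 0.50, 0.80, 0.65, 0.20]
theta_series = [theta_from_fpet(f, theta_fc=0.36, theta_wp=0.12)
                for f in fpet_series]

anom_fpet = standardized_anomalies(fpet_series)
anom_theta = standardized_anomalies(theta_series)

# The two anomaly series agree to rounding error.
max_diff = max(abs(a - b) for a, b in zip(anom_fpet, anom_theta))
print(max_diff < 1e-9)
```

Because (4) is an affine transform of fPET with a positive slope, any choice of θfc > θwp yields the same standardized anomalies.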

[12] Values of θALEXI have a distinctly different vertical support than surface SM retrievals acquired from MW sensors. In this study, it is assumed that the contributions to θALEXI from the surface and root zone soil layers are related to the observed fraction of green (i.e., actively transpiring) vegetation (fc; see Figure 1) derived from 8 day MODIS LAI composites [Myneni et al., 2002] using (2) at nadir view angle [Anderson et al., 2007]. For example, over bare soil, ALEXI LE is dominated by direct soil evaporation and reflects SM conditions in only the first several centimeters of the soil profile (θsfc), similar to the effective sensing depth of MW sensors. However, over dense to full vegetation cover (fc > ∼60%) under well-watered conditions, ALEXI LE is predominantly partitioned to canopy transpiration, and soil evaporation becomes negligible in comparison. In this case, ALEXI LE is governed by SM conditions in the root zone (θrz). Between these two extremes, θALEXI should be interpreted as a composite of both surface and root zone SM. For purposes of assimilation into an LSM with multiple soil layers, we assume the relative influence of the surface and root zone moisture on the bulk ALEXI SM retrieval is linearly related to fc following

$$\theta_{ALEXI} = \left(1 - f_c\right)\theta_{sfc} + f_c\,\theta_{rz} \tag{5}$$

Therefore, the two crucial differences between ALEXI and MW-based SM are that the vertical support of ALEXI SM (1) varies seasonally as a function of vegetation phenology and (2) is at least partially derived from root zone moisture conditions.

Figure 1.

Averaged over the period 2003–2008, MODIS-derived fraction of green vegetation cover (fc) for (a) April, (b) July, and (c) September.

[13] Since ALEXI SM retrievals are currently available only during the warm season months (April–October), our focus here is on their contribution to a drought monitoring system operating only during the agricultural growing season. Unresolved issues remain regarding the retrieval of SM from ALEXI surface fluxes during the cold season months: (1) an energy-limited ET regime is more likely to be observed during the winter months, while water-limited ET estimates from summer are more directly related to SM, (2) the time window for the linear land surface temperature rise during the morning hours is shorter during the winter months, leading to increased uncertainty in the estimated surface fluxes used to retrieve SM, and (3) a thermal-based energy balance algorithm for snow-covered surfaces is required. Addressing these issues and expanding ALEXI SM retrieval outside the growing season is an area of ongoing research.

[14] Another important issue is the relative merit of assimilating (relatively high level) ALEXI SM estimates versus either lower-level ALEXI ET predictions or the TIR-based LST observations which underlie the ALEXI model. The assimilation of TIR-based ET observations into LSMs has previously been attempted with some success [see, e.g., Pipunic et al., 2008]. However, Anderson et al. [2011] demonstrate that anomalies in ALEXI ET/PET are better correlated with a suite of standard precipitation-based drought indicators than are anomalies in ET itself. The normalization by PET serves to better isolate impacts of soil moisture deficiencies on ET from those of variable radiation load. It is also important to note that the ALEXI-based SM retrievals are simply ALEXI-based ET predictions which have been normalized by ALEXI potential ET. Therefore, the difference between ALEXI SM and ET assimilations is confined to this normalization step. In this regard, there is a strong argument for assimilating ALEXI SM since it ensures consistency between the source of ET and PET information rather than attempting to pair an ALEXI-based ET retrieval with a potentially inconsistent PET calculation obtained from an LSM.

[15] Likewise, while past studies have focused on the direct assimilation of TIR LST into LSMs [see, e.g., Reichle et al., 2010; Bosilovich et al., 2007], there are potential shortcomings with this strategy and compelling reasons to seek alternative approaches. For instance, Crow et al. [2008] found the relationship between the root zone SM and LST to be both highly nonlinear and sensitive to the correct specification of land surface and micrometeorological variables. They describe a (plot-scale) case where the assimilation of a TIR-based SM proxy outperforms the direct assimilation of tower-based LST observations. Issues with LST assimilation are further compounded by known absolute biases in satellite-based LST measurements. These biases pose a challenge for direct LST assimilation into an LSM [see, e.g., Bosilovich et al., 2007] yet are effectively circumvented by ALEXI, which is sensitive only to the time rate of change of midmorning LST [Anderson et al., 1997]. Finally, the assimilation of ALEXI SM also circumvents the difficult comparison between satellite and model-based LST. Such comparisons are complicated by the lack of consideration of look angle effects in LSMs (which strongly impact TIR-based LST) and ambiguities in LSM specification of thermal soil capacities [Holmes et al., 2012].

2.2. Microwave Soil Moisture Retrieval (LPRM)

[16] Active and passive MW remote sensing are the only remote sensing methodologies that allow for a truly quantitative and physically based retrieval of SM, through the exploitation of the large contrast between the dielectric constant of dry soil and water [Schmugge, 1985; Owe et al., 2001]. The thermal sampling depth for MW retrievals of SM is a function of observing wavelength, with longer wavelengths (such as L band; 1 GHz) representing depths of 2 to 4 cm, while shorter wavelengths (such as C band; 6.7 GHz) represent depths of only around 1 cm [Wang and Schmugge, 1980; Owe et al., 2001].

[17] Here we use AMSR-E based surface SM retrievals from the VUA as derived with the LPRM (θLPRM [Owe et al., 2001, 2008]). This three-parameter MW retrieval model uses one dual-polarized channel for the retrieval of surface SM (in volumetric m3 m−3 units) and vegetation water content. The 6.9 GHz C band AMSR-E channel is used except in cases of C band radio frequency interference (mainly in the United States and Japan). In these areas, LPRM falls back on the 10.6 GHz X band AMSR-E channel. A third parameter, effective surface temperature, is derived separately using the vertically polarized 36.5 GHz AMSR-E channel.

[18] Vegetation optical depth is parameterized based on the microwave polarization difference index (MPDI) according to Meesters et al. [2005]. Optical depth has been shown to be strongly related to the canopy density, and for frequencies less than 10 GHz, can be expressed as a linear function of vegetation water content [Jackson et al., 1982; Owe et al., 2008]. LPRM retrievals are masked for heavily vegetated (i.e., forested) areas of CONUS where the vegetation optical depth is greater than 0.8 [Owe et al., 2008] since surface MW emission in these areas is strongly affected by the overlying vegetation canopy. In this study, regions exceeding the optical depth threshold are defined as pixels that have <100 total retrievals during the entire assimilation period (April–October 2003–2008; Figure 2). All such areas are permanently removed from this analysis. Note, however, that diagnostic SM data can be retrieved in these densely vegetated areas using thermal remote sensing, although this added benefit of TIR SM is not quantified in this study. As with ALEXI, the spatial coverage of LPRM retrievals is also reduced during the cold season due to its inability to retrieve soil moisture in cases of snow cover.

Figure 2.

Average repeat cycle in days for (a) ALEXI SM retrievals and (b) LPRM SM retrievals. White pixels in Figure 2b represent masked areas (subsequently screened from the entire analysis) where less than 100 LPRM retrievals are available during the analysis period (i.e., April to October during 2003–2008).

2.3. Synergy Between TIR and MW SM Methods

[19] While most studies have focused on MW methods in applications of remotely sensed SM, an important (and largely unexplored) synergy exists between TIR and MW SM retrievals [Li et al., 2010]. TIR methods provide relatively high spatial resolution (on the order of ∼100 m to 10 km), lower temporal resolution due to the limitation of TIR-based LST retrieval to clear-sky conditions (typical repeat cycles of 2–7 days for geostationary satellites; Figure 2a), and the potential for SM retrievals over a wider range of vegetation cover. Under conditions of sparse to moderate vegetation cover, MW methods provide SM retrievals at lower spatial resolutions (25–60 km) but higher temporal frequencies (on the order of retrievals every 1–2 days since imaging through cloud cover is possible; Figure 2b).

[20] To illustrate these differences and their subsequent impact on SM retrieval, Hain et al. [2011] conducted a multiyear CONUS-wide intercomparison during warm season months (April–October) between climatologically normalized SM anomalies from TIR (ALEXI) and MW (LPRM) retrievals with SM anomalies obtained from a Noah LSM simulation [Chen and Dudhia, 2001]. In general, ALEXI showed better agreement with Noah over regions with moderate to high fc, while LPRM was better correlated with Noah over parts of the western and central CONUS with low fc. An analysis of the relationship between correlations with Noah and the average warm season (May–September) fc (see Figure 3) demonstrated that, on average, MW SM produces better correlations with Noah when fc is less than about 60%. However, for fc > 60%, a condition found over roughly half of CONUS, ALEXI SM is more strongly correlated with Noah SM. This reveals an essential synergy between the two retrieval methods where each is best suited for a particular range of fc conditions. Furthermore, MW retrievals are sensitive to SM only in the surface layer (0 to 1 cm), while TIR retrievals provide information about SM integrated over the full root zone.

Figure 3.

Average time series anomaly correlation (r; 2003–2008) as a function of average April–October fc for (a) ALEXI/Noah and (b) LPRM/Noah. (c) The difference in ALEXI/Noah and LPRM/Noah average r. Black (gray) bars indicate where ALEXI/Noah r is greater (less) than LPRM/Noah r. Adapted from Hain et al. [2011].

3. Data Assimilation Methodology

[21] The modeling, data assimilation methodology and evaluation strategy applied in this study are described in sections 3.1–3.5.

3.1. Noah Land Information System Implementation

[22] All simulations were conducted with the one-dimensional Noah land surface model (version 2.7) [Chen and Dudhia, 2001; Ek et al., 2003] as implemented within the NASA Land Information System (version 5.0; LIS) [Kumar et al., 2006]. Noah is run with four soil layers spanning depths of 0–5 cm, 5–40 cm, 40–100 cm and 100–200 cm. The study domain for the data denial framework is the contiguous United States (24°N to 50°N; 125°W to 65°W) at a spatial resolution of 25 km, with a validation time period covering the warm season months (April–October) between 1 April 2003 and 31 October 2008. Future applications of the proposed data assimilation methodology will be used in the development of an agricultural drought monitoring system, thus the analysis period is constrained to the warm season months (April–October). Additionally, ALEXI SM retrievals are currently not available between November and March; however, future versions of the ALEXI SM product will include retrievals over the entire year.

[23] Except for rainfall, Noah simulations use meteorological forcing from the North American Land Data Assimilation System (NLDAS) retrospective forcing data set [Cosgrove et al., 2003]. Input rainfall data sources used in the simulations are discussed in section 3.4. Vegetation cover information (i.e., fc) is updated in all simulations via 8 day MODIS leaf area index composites [Myneni et al., 2002] to maintain consistency with the ALEXI fPET signal. The model is integrated forward using a 30 min time step and daily SM fields (in volumetric m3 m−3 units) are output for analysis. SM from Noah layer 1 (0–5 cm) is used for the assessment of surface SM, and Noah layers 2 (5–40 cm) and 3 (40–100 cm) are combined using a layer thickness weighted average to obtain a nominal root zone SM estimate. In order to create realistic initial variability in SM states, the Noah SM profile is uniformly initialized and “spun up” between 1 January 1998 and 1 January 2003, and each simulation is initialized on 1 January 2003.
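The nominal root zone estimate described above can be written as a short sketch. Layer thicknesses follow the Noah configuration given in section 3.1; the function name and the example profile are hypothetical.

```python
# Noah layer thicknesses (cm) for the 0-5, 5-40, 40-100 and 100-200 cm layers.
NOAH_LAYER_THICKNESS_CM = [5.0, 35.0, 60.0, 100.0]

def root_zone_sm(layer_sm):
    """Thickness-weighted mean of Noah layers 2 and 3 (m3 m-3)."""
    lam2, lam3 = NOAH_LAYER_THICKNESS_CM[1], NOAH_LAYER_THICKNESS_CM[2]
    return (lam2 * layer_sm[1] + lam3 * layer_sm[2]) / (lam2 + lam3)

# Hypothetical four-layer SM profile, ordered from the surface downward.
profile = [0.20, 0.25, 0.30, 0.32]
surface_sm = profile[0]
rz_sm = root_zone_sm(profile)
print(surface_sm, round(rz_sm, 4))
```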

3.2. EnKF Implementation

[24] The EnKF is based on the generation of an ensemble of model state vectors that are each perturbed in a Monte Carlo manner to estimate the relative value of the uncertainty of model state forecasts [Evensen, 1994]. The ensemble can be created by applying perturbations to state variables, model parameters and/or forcing variables. At time steps where an nobs × 1 observation vector θt is available, the model state vector Vk (nstate × 1) for each ensemble member is updated following

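$$\mathbf{V}_k^{t+} = \mathbf{V}_k^{t-} + \mathbf{K}_t\left(\boldsymbol{\theta}_t + \boldsymbol{\varepsilon} - \mathbf{H}\,\mathbf{V}_k^{t-}\right) \tag{6}$$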

where ε is an nobs × 1 Gaussian noise vector with covariance R (defined below); the observation operator H is a matrix (nobs × nstate) that maps model states to observations; k is the index of the ensemble member; t is the current time step; and the superscripts t− and t+ denote the prior (forecast) and posterior (updated) states at time t, respectively. The (nstate × nobs) Kalman gain matrix Kt is calculated as

$$\mathbf{K}_t = \mathbf{C}_t\,\mathbf{H}^{T}\left(\mathbf{H}\,\mathbf{C}_t\,\mathbf{H}^{T} + \mathbf{R}\right)^{-1} \tag{7}$$

where Ct is the error covariance matrix for forecasted states and R is the observation error variance matrix. The matrix Ct is estimated at time t from a statistical sampling of forecasted state variables around their ensemble mean. After ensemble updating occurs, individual model vectors are propagated until the next observation becomes available. Note that while our particular application is based on the use of a matrix H, the EnKF can also accommodate a nonlinear observation operator.
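The update and gain equations can be sketched for the special case of a single scalar observation, which is the case that applies when only one retrieval type is assimilated at a given time step. This is a minimal illustration and not the LIS implementation; the function name, the synthetic ensemble, and all numerical values are assumptions.

```python
import random

def enkf_update_scalar(ensemble, h, obs, obs_var, rng):
    """EnKF update of state vectors with one perturbed scalar observation.

    ensemble: list of state vectors (one list per member)
    h:        observation operator row (length n_state)
    Returns the updated ensemble and the Kalman gain vector.
    """
    n = len(ensemble)
    n_state = len(h)
    predicted = [sum(hi * vi for hi, vi in zip(h, member)) for member in ensemble]
    mean_state = [sum(member[i] for member in ensemble) / n for i in range(n_state)]
    mean_pred = sum(predicted) / n
    # Sample estimates of C H^T (vector) and H C H^T (scalar).
    c_ht = [
        sum((member[i] - mean_state[i]) * (p - mean_pred)
            for member, p in zip(ensemble, predicted)) / (n - 1)
        for i in range(n_state)
    ]
    h_c_ht = sum((p - mean_pred) ** 2 for p in predicted) / (n - 1)
    gain = [c / (h_c_ht + obs_var) for c in c_ht]
    updated = []
    for member, p in zip(ensemble, predicted):
        perturbed_obs = obs + rng.gauss(0.0, obs_var ** 0.5)
        updated.append([v + k * (perturbed_obs - p) for v, k in zip(member, gain)])
    return updated, gain

# Synthetic 40-member prior with vertically correlated errors (hypothetical).
rng = random.Random(0)
prior = []
for _ in range(40):
    a = rng.gauss(0.0, 0.03)
    prior.append([0.20 + a, 0.22 + 0.5 * a, 0.25 + 0.25 * a, 0.28 + 0.1 * a])

# Assimilate a wetter surface observation (0.30 m3 m-3, variance 0.0004).
posterior, gain = enkf_update_scalar(prior, [1.0, 0.0, 0.0, 0.0], 0.30, 0.0004, rng)
prior_mean_sfc = sum(m[0] for m in prior) / len(prior)
post_mean_sfc = sum(m[0] for m in posterior) / len(posterior)
```

Because the deeper layers covary with the surface layer in the synthetic prior, they are also moistened by the surface observation, in proportion to their sampled covariance with the observed quantity.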

[25] Our particular application utilizes the NASA Land Information EnKF system [Kumar et al., 2008] and a 40-member ensemble to update all four Noah soil moisture states using both TIR and MW SM retrievals (i.e., nobs = 2). The EnKF is run for the case nstate = 4, with the four states consisting of 0–5 cm, 5–40 cm, 40–100 cm, and 100–200 cm Noah SM predictions. In the discussion below, λn will be used to refer to the thickness of the nth Noah soil layer. Ensemble spread is created through both forcing perturbations and the application of direct noise to soil moisture states. Both perturbations are applied at each individual 30 min Noah model time step. Precipitation and radiation forcing perturbations are assumed to follow an AR(1) model with a 1 day correlation scale and the statistical approach outlined in Table 1. Direct state perturbations are based on additive Gaussian noise applied at every 30 min model step with covariance matrix S and an AR(1) model with a 3 h correlation scale. In particular, the 4 × 4 matrix S is defined using the following three-part process: (1) presetting the first diagonal component of the matrix S, denoted by the scalar S (in units of m6 m−6), for the 0–5 cm Noah SM layer, (2) obtaining the nth diagonal component of the matrix S by multiplying the scalar S by the ratio (λ1/λn)^2 (or 1, (1/7)^2, (1/12)^2, and (1/20)^2, respectively; see Table 1), and (3) applying the cross-correlation matrix listed in Table 1 to obtain the off-diagonal terms of the matrix S.

Table 1. Perturbation Parameters for Meteorological Forcing Inputs and Perturbation Parameters for Soil Moisture State Variables^a

Perturbation Type | Additive or Multiplicative | SD | Variance (m6 m−6) | AR(1) Time Series Correlation Scale | Cross Correlation
Forcing Variable Perturbations
Precipitation | Multiplicative | 0.5 | | 1 day | 1.0, −0.8, 0.5
Downward SW radiation | Multiplicative | 0.3 | | 1 day | −0.8, 1.0, −0.5
Downward LW radiation | Additive | 50 W m−2 | | 1 day | 0.5, −0.5, 1.0
State Variable Perturbations
Layer 1 SM (0–5 cm) | Additive | | S | 3 h | 1.
Layer 2 SM (5–40 cm) | Additive | | S(1/7)^2 | 3 h | 0.
Layer 3 SM (40–100 cm) | Additive | | S(1/12)^2 | 3 h | 0.
Layer 4 SM (100–200 cm) | Additive | | S(1/20)^2 | 3 h | 0.

^a For state variable perturbations, the variance of perturbations applied to layers 2–4 are computed by scaling the variance for layer 1 (S) as a function of relative layer thickness. Forcing perturbation standard deviations, AR(1) time scales, and cross-correlation parameters have been adapted from Reichle et al. [2007] and Kumar et al. [2009].

[26] During each Noah half-hourly time step, a set of mean zero, Gaussian perturbations are sampled (with second-order statistics consistent with S) and used to directly perturb Noah SM estimates. Note that all autoregressive time scales and cross-correlation magnitudes in Table 1 are similar to those specified by Reichle et al. [2007] and Kumar et al. [2009].
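The three-part construction of the state perturbation covariance, together with an AR(1) recursion, can be sketched as follows. The layer 1 variance, the cross-correlation values, and the exponential form of the AR(1) coefficient are assumptions made for illustration only; the values actually used are those of Table 1 and the calibration in section 3.3.

```python
import math

LAYER_THICKNESS_CM = [5.0, 35.0, 60.0, 100.0]  # Noah layers 1-4

# Hypothetical cross-correlation matrix (illustrative values only).
CROSS_CORR = [
    [1.0, 0.6, 0.4, 0.2],
    [0.6, 1.0, 0.6, 0.4],
    [0.4, 0.6, 1.0, 0.6],
    [0.2, 0.4, 0.6, 1.0],
]

def build_state_covariance(s1, cross_corr=CROSS_CORR):
    """4 x 4 covariance: diagonal s1 * (lambda_1 / lambda_n)^2, then
    off-diagonal terms from the cross-correlation matrix."""
    sd = [math.sqrt(s1) * LAYER_THICKNESS_CM[0] / lam for lam in LAYER_THICKNESS_CM]
    n = len(sd)
    return [[cross_corr[i][j] * sd[i] * sd[j] for j in range(n)] for i in range(n)]

def ar1_coefficient(step_hours, scale_hours):
    """Lag-one autocorrelation, assuming exponential decay with the scale."""
    return math.exp(-step_hours / scale_hours)

def ar1_step(previous, white_noise, phi):
    """Advance a unit-variance AR(1) process by one model time step."""
    return phi * previous + math.sqrt(1.0 - phi * phi) * white_noise

S = build_state_covariance(s1=4.0e-6)      # hypothetical layer 1 variance
phi_state = ar1_coefficient(0.5, 3.0)      # 30 min step, 3 h scale
phi_forcing = ar1_coefficient(0.5, 24.0)   # 30 min step, 1 day scale
```

Scaling the unit-variance AR(1) series by the square root of each diagonal term of S gives perturbations with the intended layer variances; imposing the cross correlations additionally requires a factorization of S (for example, a Cholesky decomposition), which is omitted here.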

[27] The 2 × 4 matrix H in (7) is defined as the merger of two separate 1 × 4 vectors, HLPRM and HALEXI. Microwave-based θLPRM retrievals are assumed to be consistent with the 0–5 cm Noah layer:

$$\mathbf{H}_{LPRM} = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix} \tag{8}$$

where each vector element refers to a particular SM layer in descending vertical order. The computation of HALEXI is more complicated due to the assumed sensing depth of ALEXI, which varies as a function of vegetation. In this case, HALEXI weights the contribution from the Noah surface and root zone SM layers as a function of fc (see equation (5)). As a result, HALEXI is assumed to be of the form

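$$\mathbf{H}_{ALEXI} = \begin{bmatrix} 1 - f_c & f_c\,\dfrac{\lambda_2}{\lambda_2 + \lambda_3} & f_c\,\dfrac{\lambda_3}{\lambda_2 + \lambda_3} & 0 \end{bmatrix} \tag{9}$$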

Rescaled retrievals (see section 3.3 below) of ALEXI and LPRM are each assimilated at a time period shortly after their specific observation time: LPRM retrievals from descending AMSR-E overpasses are assimilated at 1:30 AM local solar time, and ALEXI retrievals are assimilated at 12:00 PM local solar time. Therefore the two observation types are never simultaneously assimilated. In practice, this is achieved by populating HLPRM or HALEXI with zeros for time steps at which the other observation type is being assimilated. Note that this approach for collapsing a dual assimilation approach into a single observation assimilation case is only valid when all off-diagonal terms in R are zero (see below). Only LPRM descending retrievals are assimilated since they have been found to be superior to corresponding ascending retrievals over CONUS [Crow et al., 2010].
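The two observation operators and the zero-filling convention can be sketched as below. The thickness weighting of Noah layers 2 and 3 within the root zone term is an assumption consistent with the root zone definition of section 3.1; the function names are hypothetical.

```python
def h_lprm():
    """LPRM observes the 0-5 cm Noah layer only."""
    return [1.0, 0.0, 0.0, 0.0]

def h_alexi(fc, thickness_cm=(5.0, 35.0, 60.0, 100.0)):
    """ALEXI weights surface and root zone (layers 2-3) SM by fc.

    The split of the root zone weight between layers 2 and 3 by layer
    thickness is an assumption of this sketch.
    """
    lam2, lam3 = thickness_cm[1], thickness_cm[2]
    root = lam2 + lam3
    return [1.0 - fc, fc * lam2 / root, fc * lam3 / root, 0.0]

def dual_operator(fc, active):
    """2 x 4 operator with the inactive observation row set to zeros."""
    zeros = [0.0, 0.0, 0.0, 0.0]
    rows = {"LPRM": h_lprm(), "ALEXI": h_alexi(fc)}
    return [rows[name] if name == active else zeros for name in ("LPRM", "ALEXI")]

h_bare = h_alexi(fc=0.0)       # bare soil: identical to the LPRM operator
h_dense = h_alexi(fc=0.8)      # dense cover: mostly root zone
h_noon = dual_operator(fc=0.8, active="ALEXI")
```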

[28] The R matrix in (7) is defined via

$$\mathbf{R} = \begin{bmatrix} R_{ALEXI} & 0 \\ 0 & R_{LPRM} \end{bmatrix} \tag{10}$$

We estimate RALEXI and RLPRM (in units of m6 m−6) as a function of fc based on comparisons between anomaly correlation coefficients obtained from LPRM, ALEXI and Noah (Figure 3). In particular, the assumed relationship between both RALEXI and RLPRM and fc in Figure 4 is consistent with the qualitative findings of Hain et al. [2011] and Figure 3 that (1) LPRM SM retrievals are relatively superior for 0.20 < fc < 0.60, (2) ALEXI SM retrievals are relatively more reliable for fc > 0.60, and (3) the accuracy of both ALEXI and LPRM retrievals generally increases with decreasing fc down to fc ∼ 0.2. Therefore, Figure 4 represents a minimally complex (but still somewhat subjective) fit to available qualitative information concerning relative variations in the accuracy of ALEXI and LPRM SM retrievals as a function of fc. Note that the spatial scale of comparison applied by Hain et al. [2011] is consistent with the resolution of this analysis (25 km), and that both RALEXI and RLPRM are defined in the model space to be consistent with the preprocessing of retrievals discussed below.

Figure 4.

Parameterization of RALEXI and RLPRM as a function of fc, based on the qualitative results in Figure 3.

3.3. EnKF Error Parameterization

[29] The optimal application of an EnKF requires (in part) an accurate representation of S and R. A number of strategies exist for obtaining appropriate parameterizations of S and R. For instance, Crow and van den Berg [2011] presented an approach based on exploiting a triple collocation analysis to parameterize R, and subsequently utilizing normalized innovation statistics to calibrate S. Here (unitless) normalized filter innovations υ are derived as

display math

where angle brackets denote ensemble averaging and the dummy variable X refers to the particular observation type (i.e., “LPRM” or “ALEXI” SM) being assimilated at time t. If the assumptions underlying the application of a Kalman filter are met (e.g., additive Gaussian error, optimal error specification, and a linear model), then the specification of correct S and R should yield a υ time series with a temporal variance of one [Mehra, 1971]. Past results suggest that the moderate level of nonlinearity found in LSMs does not invalidate this constraint [Reichle et al., 2008; Crow and Reichle, 2008]. Therefore, following Crow and van den Berg [2011], (11) is used to constrain the scalar S (which in turn is used to build the entire S matrix). Utilizing fc-based values of RALEXI and RLPRM derived from Figure 4, six separate simulations are run for each assimilation case, based on a range of prespecified values of S. Optimal values of S at each model grid point are selected based on which of these simulations produces normalized innovations with a variance closest to unity. It should be noted that no attempt was made to optimize the relative magnitude or the cross correlation of perturbations applied to individual Noah SM layers.
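The selection rule for S described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `select_s`, the candidate S values, and the toy innovation series are all hypothetical, and each candidate simulation is assumed to have already produced a time series of normalized innovations at one grid point.

```python
from statistics import pvariance

def select_s(innovations_by_s):
    """Return the candidate S whose normalized innovations have variance closest to 1.

    innovations_by_s: dict mapping a candidate S value (hypothetical) to the list
    of normalized innovations produced by the simulation run with that value.
    """
    best_s, best_gap = None, float("inf")
    for s, nu in innovations_by_s.items():
        gap = abs(pvariance(nu) - 1.0)  # distance of the temporal variance from unity
        if gap < best_gap:
            best_s, best_gap = s, gap
    return best_s

# Toy example with three candidate values of S at a single grid point.
candidates = {
    0.01: [2.0, -2.0, 2.0, -2.0],   # variance 4: innovations too large, S likely too small
    0.02: [1.0, -1.0, 1.0, -1.0],   # variance 1: consistent error specification
    0.04: [0.5, -0.5, 0.5, -0.5],   # variance 0.25: S likely too large
}
best = select_s(candidates)
print(best)
```

In the study this choice is made independently at each model grid point and for each assimilation case, using six candidate simulations rather than the three shown here.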

3.4. Preprocessing of TIR and MW SM Retrievals

[30] ALEXI math formula is retrieved once per day in cloud-free pixels over CONUS at a spatial resolution of 10 km, using LST data from the GOES East and West Sounder instruments, while math formula is available daily under all-sky conditions at a spatial resolution of 25 km. Thus, to provide a spatially consistent comparison, the 10 km math formula retrievals have been spatially aggregated to 25 km resolution by computing an area-weighted average of all valid 10 km pixels which overlap a given 25 km pixel.
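The area-weighted aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `aggregate`, the use of NaN to mark invalid (e.g., cloudy) fine-resolution pixels, and the overlap areas in the example are hypothetical and are not taken from the study.

```python
import math

def aggregate(values, overlap_areas):
    """Area-weighted mean of the valid fine pixels overlapping one coarse pixel.

    values: fine-resolution retrievals; float('nan') marks invalid pixels.
    overlap_areas: area of overlap between each fine pixel and the coarse pixel.
    Returns NaN when no valid fine pixel overlaps the coarse pixel.
    """
    num = den = 0.0
    for v, a in zip(values, overlap_areas):
        if not math.isnan(v) and a > 0:
            num += v * a   # weight each valid retrieval by its overlap area
            den += a
    return num / den if den > 0 else float("nan")

# Four 10 km pixels overlapping one 25 km pixel; the third is cloud-masked.
coarse = aggregate([0.20, 0.30, float("nan"), 0.40], [100.0, 100.0, 100.0, 50.0])
print(coarse)
```

Invalid pixels are dropped from both the numerator and the denominator, so the weights are renormalized over the valid pixels only.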

[31] As stated above, SM retrieval data sets and SM predictions from LSMs can exhibit large differences in their climatological statistics. The effect of these biases can be alleviated by employing rescaling approaches to translate retrievals from one SM data set (e.g., ALEXI or LPRM) into the climatology of another data set (in this case, the open-loop simulation using the Noah LSM forced with TRMM precipitation, see section 3.5) [Reichle and Koster, 2004]. Here a technique using the first two statistical moments (mean and variance) is used to linearly rescale math formula and math formula retrievals. Data set–specific values of climatological means (μ) and standard deviations (σ) for each day of the year are calculated for all three data sets based on a 28 day centered moving sampling window and 6 years of both math formula and math formula retrievals (2003–2008). Using these statistics, SM retrievals from remote sensing sources (ALEXI and LPRM) are linearly rescaled to provide a volumetric SM content measurement consistent with the LSM SM states VLSM obtained from an open loop Noah simulation. To maintain consistency with the assumed sensing depth of each retrieval method, the first two statistical moments of VLSM are calculated after the application of (8) and (9). Therefore μLSM[LPRM] and σLSM[LPRM] are the sampled mean and standard deviation of HLPRMVLSM, and μLSM[ALEXI] and σLSM[ALEXI] are the sampled mean and standard deviation of HALEXIVLSM. Additionally, the first two statistical moments of math formula and math formula are computed, where math formula and math formula are the sampled mean and standard deviation of math formula, and math formula and math formula are the sampled mean and standard deviation of math formula. All mean and standard deviation values are sampled for each day in the analysis using a 28 day centered window to remove potential bias from the seasonal cycle of SM. Therefore, raw retrievals of LPRM and ALEXI (i.e., math formula and math formula) are rescaled using

display math
display math

The rescaled observations θLPRM and θALEXI are then used to construct the observation vector math formula which is subsequently assimilated into Noah using the EnKF approach described in section 3.2.
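The moment-matching rescaling described in this section can be sketched as follows. This is a minimal illustration, not the study's implementation. It assumes that (12) and (13) take the standard linear form that matches the mean and standard deviation of the retrievals to those of the model. The function names, the pooled `(day_of_year, value)` input layout, and the half-window of 14 days (a 29 day span standing in for the 28 day centered window) are hypothetical choices.

```python
from statistics import mean, pstdev

def windowed_stats(samples, doy, half_window=14, days_in_year=365):
    """Mean and standard deviation of all samples inside a centered day-of-year window.

    samples: list of (day_of_year, value) pairs pooled over all years.
    The window wraps around the end of the year.
    """
    vals = []
    for d, v in samples:
        dist = abs(d - doy)
        dist = min(dist, days_in_year - dist)  # circular day-of-year distance
        if dist <= half_window:
            vals.append(v)
    return mean(vals), pstdev(vals)

def rescale(raw, mu_obs, sd_obs, mu_lsm, sd_lsm):
    """Linearly map a raw retrieval into the model climatology (first two moments)."""
    return mu_lsm + (raw - mu_obs) * (sd_lsm / sd_obs)

# Toy retrieval and model samples; day 200 falls outside the window around day 105.
obs = [(100, 0.10), (105, 0.20), (110, 0.30), (200, 0.90)]
lsm = [(100, 0.20), (105, 0.25), (110, 0.30), (200, 0.05)]
mu_o, sd_o = windowed_stats(obs, doy=105)
mu_m, sd_m = windowed_stats(lsm, doy=105)
theta = rescale(0.30, mu_o, sd_o, mu_m, sd_m)
print(theta)
```

Here the model statistics would be computed from the observation-space model states (HLPRMVLSM or HALEXIVLSM), so that the rescaled retrieval is consistent with the assumed sensing depth.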

[32] A potential issue in such rescaling is specifying an appropriate sampling window length, particularly for the case of θALEXI retrievals obtained at relatively low temporal frequencies. Larger window lengths reduce sampling error but run the risk of failing to resolve key seasonal dynamics. For example, as indicated by (9), the appropriate interpretation of θALEXI varies seasonally as a function of fc. Therefore the use of a larger window potentially confounds retrievals where, for example, θALEXI reflects a surface signal versus a mixed surface/root zone signal. To directly examine this trade-off, the relative advantages of long versus short sampling windows were quantified by calculating the cross-correlation between ALEXI and Noah SM anomalies calculated after rescaling using 28 day, 90 day, and 365 day sampling windows in (12) and (13). In areas of the United States exhibiting strong vegetation seasonality (e.g., the upper Midwest and Northeast), the correlations associated with a 28 day sampling window were substantially larger than those calculated within a 90 or 365 day window, while correlations over regions lacking strong seasonal variations in vegetation were largely unchanged regardless of size of the sampling window. This suggests that the benefit in resolving seasonal vegetation variability associated with a shorter sampling window length (i.e., 28 days) generally outweighs the disadvantage of including a reduced number of samples. Nevertheless, EnKF results will be presented for both a 28 day and a 60 day sampling window case for comparison purposes.

3.5. Evaluation Strategy

[33] The quantification of model improvement achieved through assimilation of SM retrievals is often hampered by the lack of in situ SM validation data sets that possess adequate spatial and vertical density over the CONUS [Crow and Zhan, 2007]. As a result, the bulk of our analysis is based on a data denial framework [Draper et al., 2011; Bolten et al., 2010; Lakshmi, 2000]. Following Bolten et al. [2010], this framework evaluates the ability of assimilated math formula and math formula retrievals to correct for errors in Noah SM associated with the use of a degraded precipitation forcing data set. A Noah simulation forced with high-quality precipitation data is used as a benchmark for measuring the relative improvement (or degradation) in LSM SM accuracy resulting from data assimilation. The data denial framework consists of five separate LSM simulations.

[34] 1. The control (CONTROL) simulation is forced with a high-quality precipitation data set, in this case, the NLDAS precipitation forcing, a gridded, quality-controlled rain gauge observation data set [Higgins et al., 2000] at 0.25° resolution over CONUS based on daily CPC Cooperative precipitation data and radar estimates for temporal disaggregation.

[35] 2. The open loop (OLP) simulation is forced with a “lower-quality” precipitation data set, in this case, the TRMM 3B42RT [Huffman et al., 2007], a real-time, satellite-only precipitation product that is not corrected with any ground-based observations of precipitation. For example, Crow and Bolten [2007] showed that this lack of gauge correction introduces root-mean-square errors of 5 to 10 mm d−1 over the central CONUS. Additionally, each precipitation data set (NLDAS and TRMM 3B42RT) has a unique climatology and bias. Therefore, to alleviate any inconsistencies between rainfall products, the long-term bias is corrected in the TRMM data prior to its use in the OLP simulation.

[36] 3. The ALEXI analysis assimilates only θALEXI retrievals into the OLP simulation forced with TRMM 3B42RT precipitation.

[37] 4. The LPRM analysis assimilates only θLPRM retrievals into the OLP simulation forced with TRMM 3B42RT precipitation.

[38] 5. The dual analysis assimilates both θALEXI and θLPRM retrievals into the OLP simulation forced with TRMM 3B42RT precipitation.

[39] The ALEXI, LPRM, and dual analyses are then compared to the daily Noah CONTROL simulation to evaluate their ability to filter random rainfall errors degrading the OLP simulation.

[40] While this data denial approach effectively circumvents the lack of adequate large-scale SM data sets for direct validation, it is important to emphasize that it is limited in two respects. First, it implicitly assumes that all modeling error originates from uncertainty in rainfall forcing. Other types of modeling errors may produce different data assimilation results which will not be captured. In addition, the magnitude and even the sign of assimilation-based impacts seen in our experiment are sensitive to the relative quality of the lower-quality precipitation data set, which may not represent an appropriate baseline for all conditions. However, there is no a priori reason to suggest that either of these limitations will preferentially enhance the assimilation of either θALEXI or θLPRM. Therefore, we assume that this data denial system provides reliable information concerning the relative benefits of assimilating θALEXI versus θLPRM (or either versus both). In addition, data denial system results will be supplemented by verification obtained via direct comparisons against ground-based soil moisture observations (acquired from the USDA SCAN network) within isolated 0.25° pixels where such observations are available. In this way, certain qualitative results from the data denial study can be verified using a more direct evaluation approach.

4. Results

4.1. Comparison of Single Versus Dual Assimilation

4.1.1. Data Denial Approach

[41] The main purpose of the data denial validation strategy is to quantify improvement (or degradation) in Noah-based surface and root zone SM predictions associated with the assimilation of ALEXI and LPRM SM retrievals over a large regional domain (e.g., CONUS) in which the spatial sampling of in situ SM observations is inadequate for traditional direct validation against ground-based observations. The assimilation impact is quantified based on the ability of θALEXI and θLPRM retrievals derived from (12) and (13) to correct growing season (April to October) errors in the OLP (lower-quality precipitation) simulation, using the CONTROL (high-quality precipitation) simulation as a source of validation information. All assimilation impacts are assessed only over pixels where an adequate number of ALEXI and LPRM retrievals are available (i.e., pixels which are nonwhite in both Figures 2a and 2b). Figure 5 shows the root-mean-square difference (RMSD), relative to the CONTROL simulation, of OLP SM predictions in the 0–5 cm surface layer (Figure 5a) and the 5–100 cm root zone layer (Figure 5b). As expected, the satellite-based precipitation product (in this case, TRMM 3B42RT) used to force the OLP simulation introduces large SM differences in both soil layers when compared to the CONTROL simulation. The magnitude of these differences varies over the study domain. For example, SM errors over the central and eastern CONUS are similar for surface and root zone SM, while in the western CONUS, surface SM errors are larger than those observed in the root zone. This is likely the result of the increased coupling between surface and root zone SM in areas of moderate to dense vegetation, a condition observed during the warm season in the central and eastern CONUS. The largest RMSD values are found in the central CONUS (0.03 to 0.06 m3 m−3), while RMSD values are generally lower in the drier and less vegetated regions of the western CONUS (0.01 to 0.04 m3 m−3). As described above, the RMSD values in Figure 5 are used as the baseline for estimating subsequent improvement (or degradation) associated with the assimilation of θALEXI and/or θLPRM retrievals.

Figure 5.

Root-mean-square difference (RMSD; in volumetric m3 m−3 units) for OLP SM predictions validated against CONTROL SM predictions for (a) 0–5 cm and (b) 5–100 cm Noah soil layers.

[42] Figure 6 shows time series of domain-averaged (CONUS) RMSD (with respect to the CONTROL case) for the 0–5 cm surface soil layer in the OLP simulation (black) and each of the three assimilation cases (ALEXI (red), LPRM (green), and dual (blue)) between 1 April 2003 and 31 December 2008. Note that assimilation occurs only between 1 April and 31 October of each year and all results are averaged only over pixels in which adequate amounts of both ALEXI and LPRM SM retrievals are available (Figure 2). On average, all three of the assimilation strategies reduce the growing season RMSD in comparison with the OLP case. For the surface layer, spatially averaged RMSD for the warm season months (April–October) for the OLP, ALEXI, LPRM, and dual cases are 0.046 m3 m−3, 0.032 m3 m−3 (30% reduction versus OLP), 0.030 m3 m−3 (35% reduction), and 0.028 m3 m−3 (39% reduction), respectively. The single assimilation of θLPRM produces slightly greater improvement in surface layer SM than does the single assimilation of θALEXI, while assimilation of both θALEXI and θLPRM (dual case) produces an additional small (but consistent) improvement over the single assimilation of either SM product. The differences between the ALEXI and LPRM assimilation cases are likely related to the fact that θLPRM is a direct retrieval of surface SM and provides better temporal sampling than θALEXI (Figure 2).

Figure 6.

Domain-averaged time series of daily 0–5 cm SM RMSD (m3 m−3) for the OLP (black), ALEXI (red), LPRM (green), and dual (blue) cases during the 2003–2008 analysis period.
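The percentage reductions quoted for the surface layer follow directly from the domain-averaged RMSD values. The sketch below is illustrative only: the helper names `rmsd` and `percent_reduction` are hypothetical, and the only data taken from the text are the four domain-averaged RMSD values.

```python
import math

def rmsd(pred, ref):
    """Root-mean-square difference between two equally long SM series."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))

def percent_reduction(rmsd_olp, rmsd_da):
    """Relative RMSD improvement of an assimilation case over the open loop, in percent."""
    return 100.0 * (rmsd_olp - rmsd_da) / rmsd_olp

# Domain-averaged surface layer values quoted in the text (m3 m-3).
olp = 0.046
cases = {"ALEXI": 0.032, "LPRM": 0.030, "dual": 0.028}
reductions = {name: round(percent_reduction(olp, v)) for name, v in cases.items()}
print(reductions)
```

Rounded to whole percentages, these reproduce the 30%, 35%, and 39% reductions reported for the ALEXI, LPRM, and dual cases.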

[43] Spatially averaged RMSD values are also computed for root zone SM, in this case specified as a layer-thickness-weighted average of the 5–40 cm and 40–100 cm Noah soil layers (Figure 7). During the growing season (April to October), the general magnitude of RMSD improvements (in terms of % improvement) within the root zone layer is similar to that in the surface layer. In particular, the RMSD for the OLP, ALEXI, LPRM, and dual cases are 0.041 m3 m−3, 0.028 m3 m−3 (32% reduction versus the OLP case), 0.034 m3 m−3 (17% reduction), and 0.033 m3 m−3 (20% reduction), respectively. However, in this case the single assimilation of θALEXI leads to the largest average improvement in root zone SM. Therefore, unlike the surface SM results, the dual assimilation case does not, on average, produce the most improvement in root zone SM. This issue will be discussed further in section 4.2. While a longer time series would enhance the statistical power of these comparisons, the strong year-to-year consistency of results presented in Figures 6 and 7 implies that these relative results are qualitatively stable and likely representative of a longer analysis period.

Figure 7.

Domain-averaged time series of daily 5–100 cm SM RMSD (m3 m−3) for the OLP (black), ALEXI (red), LPRM (green), and dual (blue) cases during the 2003–2008 analysis period.

[44] As discussed in section 3.4, all EnKF results are based on the use of a 28 day sampling window for rescaling parameters applied in (12) and (13). In order to examine the potential benefits of a longer window (to reduce sampling uncertainty in the rescaling process), all three data assimilations were repeated for a larger (60 day) window (not shown). This change led to a slight increase in CONUS-averaged, April to October RMSD (on the order of 0.001 m3 m−3 for most cases), suggesting that a 28 day window represents a reasonable approach for balancing the conflicting needs to minimize sampling uncertainty while capturing seasonal dynamics (see section 3.4).

[45] Figure 8 shows a spatial analysis of RMSD differences between the OLP case and each of the three assimilation cases for the surface layer (0–5 cm). The areas of color (red) shading for each assimilation experiment show the magnitude of RMSD improvement over the OLP case, while gray shading denotes regions of degradation caused by the assimilation. The largest surface layer improvement for all three assimilation cases is observed over the central and western CONUS. The LPRM and dual assimilation cases yield marginally larger improvement in surface layer SM across this region than is observed in the ALEXI case. Although a majority of the study domain exhibits surface SM RMSD improvement greater than 0.01 m3 m−3 relative to the OLP baseline case, regions of the eastern CONUS exhibit only minor improvement and/or degradation (e.g., RMSD differences between −0.01 m3 m−3 and 0.01 m3 m−3). However, no areas of degradation greater than 0.01 m3 m−3 are observed anywhere in CONUS.

Figure 8.

Differences in 0–5 cm SM RMSD (m3 m−3) between the OLP case and the (a) ALEXI, (b) LPRM, and (c) dual data assimilation cases (DA-OLP). Red (gray) colors indicate pixels where the assimilation case decreased (increased) RMSD compared to the OLP case. Black pixels denote areas masked due to insufficient LPRM SM availability (see Figure 2b).

[46] Somewhat different spatial patterns are observed in the root zone RMSD results plotted in Figure 9. For example, relative to surface layer results in southeast CONUS, all three assimilation cases show more widespread areas of large (i.e., greater than 0.01 m3 m−3) root zone RMSD improvement. The ALEXI assimilation case is particularly successful in this region, highlighting its ability to provide meaningful root zone SM information in densely vegetated areas. However, large-scale root zone degradation is also observed in each assimilation case within desert regions of southwest CONUS. This is in contrast to the surface zone results in Figure 8, where consistent improvement is seen in this same region. Potential reasons for this degradation in root zone SM results (collocated with surface layer SM improvement) are discussed below in section 4.2.

Figure 9.

Differences in 5–100 cm SM RMSD (m3 m−3) between the OLP case and the (a) ALEXI, (b) LPRM, and (c) dual data assimilation cases (DA-OLP). Red (gray) colors indicate pixels where the assimilation case decreased (increased) RMSD compared to the OLP case. Black pixels denote areas masked due to insufficient LPRM SM availability (see Figure 2b).

[47] The statistical significance of differences between the OLP case and each assimilation case can be assessed by applying F tests on a pixelwise basis. This test assumes that all samples are independent in time. However, significant autocorrelation can be observed in daily SM, which decreases the degrees of freedom in SM time series. To account for this, we assumed a decorrelation length scale of 28 days for SM time series. As a result, the total number of assumed degrees of freedom is roughly estimated by dividing the total number of days in the analysis by 28. Based on this assumption, in the surface layer, the dual case showed statistically significant improvement (based on a 1σ threshold) for about 87% of the pixels with valid results in Figure 8, while for the ALEXI and LPRM cases, 76% of valid pixels showed significant improvement. For all three cases, less than 1% of pixels exhibited statistically significant degradation relative to the OLP case. For the root zone, the ALEXI case demonstrated significant (at the 1σ level) improvement for 64% of pixels with valid results in Figure 9, while the LPRM and dual cases had lower percentages of 47% and 51%, respectively. Additionally, each case had a higher percentage of pixels with statistically significant degradation (at the 1σ level) in the root zone than in the surface layer: 8%, 13% and 12% of root zone pixels showed significant degradation in the ALEXI, LPRM, and dual cases, respectively, while for the surface layer all cases had less than 1% degradation.
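The degrees-of-freedom adjustment described above can be sketched as follows. This is a minimal illustration under several assumptions that are not stated in the text: the F statistic is taken to be the ratio of mean squared differences (the square of the RMSD ratio), only April to October days of 2003 to 2008 are counted (214 days in each of 6 years), and the domain-averaged surface RMSD values are used purely as example inputs to what is, in the study, a pixelwise test. The function names are hypothetical.

```python
def effective_dof(n_days, decorrelation_days=28):
    """Rough number of independent samples in an autocorrelated daily series."""
    return n_days // decorrelation_days

def variance_ratio(rmsd_olp, rmsd_da):
    """F statistic formed from two RMSD values (ratio of mean squared differences)."""
    return (rmsd_olp / rmsd_da) ** 2

n_days = 214 * 6                  # assumed: April to October days over six years
dof = effective_dof(n_days)       # degrees of freedom after the 28 day adjustment
f_stat = variance_ratio(0.046, 0.032)  # example inputs: OLP versus ALEXI surface RMSD
print(dof, f_stat)
```

The resulting ratio would then be compared against the critical value of an F distribution with these degrees of freedom at the chosen confidence level, taken from statistical tables or a library routine.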

[48] Spatial patterns in the RMSD results shown in Figures 8 and 9 can be explained in part by variations in vegetation cover. Figure 10 shows the 0–5 cm RMSD (Figure 10a) and 5–100 cm RMSD (Figure 10b) as a function of mean April–October fc for the baseline OLP case and all three assimilation cases. For the surface layer (Figure 10a), all three assimilation cases show improved RMSD relative to the OLP case over the entire range of fc. The dual case exhibits the lowest RMSD when fc is between 0.0 and 0.75, while ALEXI exhibits a slightly lower RMSD when fc is greater than 0.75. As expected, as fc increases above 0.65, LPRM RMSD increases and it becomes the least accurate assimilation case. Results in the root zone (Figure 10b) show that all three assimilation cases perform poorly compared to the OLP case for fc < 0.20 due to the root zone SM degradation observed in the sparsely vegetated regions of southwest CONUS (see Figure 9). In general, ALEXI is the most accurate case for fc greater than 0.20, while LPRM and dual perform only slightly worse than ALEXI over that range. Nevertheless, all three show large improvements relative to OLP for fc > 0.20.

Figure 10.

Averaged RMSD (m3 m−3) in (a) 0–5 cm SM and (b) 5–100 cm SM predictions obtained from various data assimilation cases as a function of average April–October fc. RMSD values are computed based on 5% fc bins.
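The binning used for this kind of figure can be sketched as follows. This is a minimal illustration, not the plotting code used in the study: the function name `bin_mean_by_fc`, the dictionary output keyed by bin index, and the toy pixel values are hypothetical.

```python
def bin_mean_by_fc(fc_values, rmsd_values, bin_width=0.05):
    """Average RMSD within fc bins of the given width; returns {bin_index: mean RMSD}.

    Bin i covers fc in [i * bin_width, (i + 1) * bin_width). Values that sit exactly
    on a bin edge may fall on either side because of floating-point rounding.
    """
    sums, counts = {}, {}
    n_bins = round(1.0 / bin_width)
    for fc, r in zip(fc_values, rmsd_values):
        i = min(int(fc / bin_width), n_bins - 1)  # place fc == 1.0 in the last bin
        sums[i] = sums.get(i, 0.0) + r
        counts[i] = counts.get(i, 0) + 1
    return {i: sums[i] / counts[i] for i in sorted(sums)}

# Toy pixels: mean April-October fc and the corresponding RMSD (m3 m-3).
fc = [0.12, 0.13, 0.62, 0.97]
rm = [0.04, 0.02, 0.05, 0.01]
binned = bin_mean_by_fc(fc, rm)
print(binned)
```

Applying the same function to the OLP case and to each assimilation case gives one curve per case as a function of fc.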

4.1.2. Comparison With In Situ SM Observations

[49] In addition to the data denial analyses discussed above, OLP, ALEXI, LPRM, and dual SM were also compared to ground-based observations from the USDA Soil Climate Analysis Network (SCAN). Time series anomaly correlations between modeled and measured SM were computed for each observation site over the April–October, 2003–2008 analysis period. As with the modeled data, anomalies in the in situ observations were obtained by subtracting climatological (2003–2008) SM values computed within a 28 day moving average window centered on a particular day of the year. For the analysis of surface SM anomalies, 5 cm soil moisture observations from 82 SCAN sites were used. Analysis of root zone soil moisture was conducted using soil moisture observations at 40 cm depth obtained from the 52 SCAN sites identified in Figure 11.

Figure 11.

Map of all SCAN soil moisture observations used in the analysis. Circles represent sites where both surface (SFC) and root zone (RZ) observations are available, and asterisks represent sites where only SFC are available.
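The anomaly correlation described in paragraph [49] can be sketched as follows. This is a minimal illustration, not the study's code: the function names, the `{(year, day_of_year): value}` data layout, and the toy values are hypothetical, and the day-of-year climatologies are assumed to have been computed beforehand with the 28 day moving window.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def anomalies(series, climatology):
    """Subtract a day-of-year climatology from a {(year, doy): value} series."""
    return {k: v - climatology[k[1]] for k, v in series.items()}

def anomaly_correlation(model, obs, clim_model, clim_obs):
    """Correlate model and observed anomalies on the dates present in both series."""
    a_m, a_o = anomalies(model, clim_model), anomalies(obs, clim_obs)
    common = sorted(set(a_m) & set(a_o))
    return pearson([a_m[k] for k in common], [a_o[k] for k in common])

# Toy series at one site: the two data sets differ in mean but share their anomalies.
model = {(2003, 100): 0.30, (2003, 101): 0.20, (2004, 100): 0.20, (2004, 101): 0.30}
obs = {(2003, 100): 0.15, (2003, 101): 0.05, (2004, 100): 0.05, (2004, 101): 0.15}
r = anomaly_correlation(model, obs, {100: 0.25, 101: 0.25}, {100: 0.10, 101: 0.10})
print(r)
```

Because each series is referenced to its own climatology, a constant offset between modeled and measured SM does not affect the correlation.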

[50] All three assimilation cases showed improvement in time series anomaly correlation with observations relative to the baseline OLP case for both the surface and root zone. Averaged over all available observation sites for the period of April–October (2003–2008), the average surface SM time series anomaly correlation (r) for the OLP simulation was r = 0.23, while the ALEXI, LPRM and dual assimilation results yielded r = 0.38, 0.44 and 0.49, respectively. This behavior mirrors the relative performance indicated in the data denial experiments, with dual yielding the largest improvement in RMSD with respect to the CONTROL case for the surface layer, followed by LPRM and ALEXI. For the root zone, the OLP baseline case exhibited the lowest average anomaly correlation (r = 0.21) while the ALEXI case yielded higher correlations (r = 0.50) than either the LPRM or dual cases (r = 0.45 and r = 0.48, respectively). Again, this is consistent with relative results in section 4.1.1 obtained using our data denial approach.

4.2. Potential Role of Faulty Error Specification

[51] Assuming a perfectly functioning data assimilation system, the additional assimilation of any new observation type should never degrade the long-term average error in the quantities of interest. However, such idealized systems are almost impossible to construct in real operational settings. In this analysis, the failure of the dual case to outperform the ALEXI case in root zone results (see Figures 7 and 10) represents a clear departure from an ideal system. One possibility is that the poor performance of the dual case in the root zone is tied to inaccurate specification of S and R in the EnKF. In particular, since the failure of the dual case is observed only in the root zone (and not in the surface zone), it appears that the EnKF is not adequately combining information derived from the vertical extrapolation of surface LPRM SM retrievals with (generally) deeper ALEXI SM retrievals. This explanation is plausible given that off-diagonal terms were not optimized in the construction of S. These off-diagonal terms have a strong impact on the vertical propagation of superficial surface LPRM soil moisture observations to depth [Li et al., 2010]. In a truly real data analysis, error in the vertical physics or the soil parameterization of the Noah model would also hamper the ability of the dual data assimilation case to constrain root zone soil moisture [Kumar et al., 2009]. However, such impacts are effectively removed here by the use of the Noah model in both the CONTROL and data assimilation cases. It is interesting to note that, at a CONUS-averaged scale, these apparent problems emerge only when examining whether the dual case improves upon the ALEXI case and are not severe enough to prevent the LPRM case from generally improving upon the OLP case. This suggests that the parameterization demands of the dual assimilation case are more stringent than for the single assimilation LPRM case.

[52] In addition, regional degradation relative to the OLP case occurs in root zone results for the ALEXI and LPRM data assimilation cases. In particular, an increase in root zone RMSD is observed over arid and semiarid regions of southwest CONUS (Figure 9). Interestingly, this degradation occurs simultaneously with a regional improvement in surface zone results (compare ALEXI and LPRM cases in Figures 8 and 9). As discussed above, the apparent inability of the EnKF to vertically extrapolate good surface SM results into the deeper root zone suggests an inaccurate specification of off-diagonal terms in S describing the statistical relationship between model ensemble perturbations applied to various vertical SM layers. As described in section 3.2, perturbations to the SM states of Noah are applied based on the specification of a cross-correlation matrix between soil layers. For these simulations, the assumed model error cross-correlation matrix (Table 1) dictates that perturbations applied to the surface layer are cross-correlated with those applied to the root zone layer. As this area is sparsely vegetated, both ALEXI and LPRM retrievals provide primarily surface SM information, and rely on the accurate specification of these off-diagonal terms in S (in combination with Noah model vertical physics) in order to accurately extrapolate surface SM information into the deeper root zone. In semiarid areas during the dry warm season, surface and root zone layers in Noah are strongly decoupled in the CONTROL simulation and many rainfall events causing SM changes in the surface layer are not associated with an increase in root zone SM. In the assimilation, however, nonzero off-diagonal terms in S virtually ensure that updates are made to the root zone when a surface layer observation is available, regardless of whether such updating is consistent with the vertical model physics captured in the CONTROL run. Since any deviation from the benchmark established by the CONTROL is interpreted as error, such updating may lead to degraded data assimilation results within our data denial framework.
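The mechanism described above can be illustrated with a two-layer Kalman update. This is a minimal sketch under stated assumptions and is not the EnKF configuration used in the study: the state is reduced to [surface, root zone], the observation operator is the surface-only H = [1, 0], and the forecast error covariance terms `c_ss` and `c_sr` (hypothetical names and values) stand in for the ensemble covariance that results from cross-correlated perturbations in S. For this case the root zone gain is c_sr / (c_ss + R), so any nonzero cross covariance produces a root zone update.

```python
def gain_surface_obs(c_ss, c_sr, r_obs):
    """Kalman gain for state [surface, root zone] with observation operator H = [1, 0].

    c_ss: forecast error variance of the surface layer
    c_sr: forecast error covariance between the surface and root zone layers
    r_obs: observation error variance
    """
    denom = c_ss + r_obs            # H C H^T + R for a scalar surface observation
    return c_ss / denom, c_sr / denom

def update(state, obs, c_ss, c_sr, r_obs):
    """Apply the Kalman update to a (surface, root zone) SM state."""
    k_s, k_r = gain_surface_obs(c_ss, c_sr, r_obs)
    innovation = obs - state[0]     # observation minus forecast surface SM
    return state[0] + k_s * innovation, state[1] + k_r * innovation

forecast = (0.10, 0.20)             # hypothetical surface and root zone SM (m3 m-3)
coupled = update(forecast, 0.20, c_ss=4e-4, c_sr=2e-4, r_obs=4e-4)
decoupled = update(forecast, 0.20, c_ss=4e-4, c_sr=0.0, r_obs=4e-4)
print(coupled, decoupled)
```

With a nonzero cross covariance the root zone is adjusted whenever a surface observation arrives, whereas a zero cross covariance leaves it unchanged. If the CONTROL run shows decoupled layers, the first behavior registers as added error in the data denial framework.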

[53] It is also worth noting that root zone OLP errors in the arid and semiarid Southwest are small relative to other parts of CONUS (see Figure 5b). As a result, the misspecification of S is more likely to produce degraded assimilation results there. Nevertheless, a well-parameterized EnKF should, at a minimum, avoid degrading OLP results in these areas.

5. Discussion and Conclusions

[54] Here, a data denial framework was used to quantify the ability of ALEXI (TIR) and LPRM (MW) SM retrievals (θALEXI and θLPRM) to correct for errors in precipitation forcing (TRMM 3B42RT) in an open-loop (OLP) simulation as compared to a control (CONTROL) simulation forced with a higher-quality, gauge-based precipitation data set. In this way, the evaluation is meant to quantify the ability of remotely-sensed SM retrievals to compensate for the impact of errors in satellite-based precipitation products and the subsequent impact of these errors on efforts to monitor agricultural drought during the growing season.

[55] Data assimilation results are based on three separate EnKF cases: (1) assimilating only TIR-based ALEXI SM, (2) assimilating only MW-based LPRM SM, and (3) assimilating both ALEXI and LPRM SM. Averaged over the study domain during the 2003 to 2008 growing seasons, all three data assimilation cases (i.e., ALEXI, LPRM, and dual) exhibited improved surface (0–5 cm) and root zone (5–100 cm) SM estimates relative to the OLP simulation. On average within the surface zone, the dual data assimilation case provided the greatest improvement (39% reduction in RMSD relative to OLP), followed by the LPRM and ALEXI cases (reduction in RMSD relative to OLP of 35% and 30%, respectively). In the root zone, however, the ALEXI case was superior (32% reduction in RMSD relative to OLP) to both the LPRM and dual cases (reduction in RMSD relative to OLP of 17% and 20%, respectively). Similar trends were identified when examining the percentage of pixels that had statistically significant improvement in RMSD relative to OLP. In the surface layer, the dual case had the highest percentage of pixels with statistically significant improvement (87%), followed by the LPRM and ALEXI cases (76% each). In the root zone layer, the ALEXI case had the highest percentage (64%), followed by dual (51%) and LPRM (47%). Finally, similar relationships were found via the direct comparison of OLP, LPRM, ALEXI and dual case results to ground-based measurements available within the USDA SCAN network, based on temporal correlations between simulated and observed time series anomalies.

[56] Given a well-functioning EnKF, one would expect the dual assimilation case to outperform each single assimilation case (ALEXI or LPRM) in both the surface and root zone layers. However, such well-behaved performance was observed only in surface zone results and not in the root zone (see, e.g., Figure 10). In addition, in some regions of CONUS, all of the data assimilation cases failed to improve upon the OLP root zone results. In particular, a contiguous area of degradation was observed over sparsely vegetated areas in the southwest CONUS. As discussed in section 4.2, these suboptimal results may be associated with poorly parameterized model errors since no attempt was made to find optimal cross-correlation terms relating soil moisture perturbations applied to Noah surface and root zone layers (Table 1), nor was any attempt made to tune the magnitude of applied forcing perturbations (Table 1). However, other sources of error could not be definitively ruled out. A more sophisticated approach for estimating S, RALEXI, and RLPRM would be to implement an adaptive filtering approach where innovation statistics are used to simultaneously constrain S, RALEXI, and RLPRM during the online EnKF analysis. However, such approaches have yet to be widely applied in real data assimilation cases, and preliminary studies using synthetic methodologies have identified a number of unresolved implementation issues [Crow and Reichle, 2008]. Consequently, a full adaptive filtering implementation is left for future research.

[57] Future work that could lead to further improvements of a dual data assimilation system includes (1) refinement of the ALEXI SM climatology using better quality control systems to reduce the effects of cloud contamination, (2) attempting to exploit the higher spatial resolution afforded by TIR methods, (3) quantification of the impact of differences in repeat cycles between ALEXI and LPRM, (4) development of more sophisticated methodologies for the specification of model and retrieval error, including a focus on improved understanding of the off-diagonal terms in S, (5) improvement of the specification of the observation operators for ALEXI (addressing the assumed weighting used for the ALEXI signal based on fc) and LPRM (addressing the assumed mismatch between 1 cm AMSR-E sensing depth and 0–5 cm Noah soil layer), (6) assessing the impact of assimilating both ascending and descending LPRM SM retrievals, and (7) application and preparation for current and future L band MW sensors (the European Space Agency Soil Moisture and Ocean Salinity (SMOS) mission, launched in 2009; NASA's Soil Moisture Active Passive (SMAP) mission, which is to be launched in 2014). In addition, improvements in SM predictions due to ALEXI assimilation in regions of very dense vegetation cover, blanked in the current analyses due to unavailability of MW SM retrievals, could also be quantified.


[58] This work was supported by a grant from the NASA Terrestrial Hydrology Program (NNX06AG07G) and NOAA grants OAR-CPO-2009-2001430 and OAR-CPO-2011-2002561. We wish to thank four anonymous reviewers for helping to significantly improve the quality of the manuscript. USDA is an equal opportunity provider and employer.