An objective methodology for merging satellite- and model-based soil moisture products


Corresponding author: M. T. Yilmaz, Hydrology and Remote Sensing Laboratory, USDA, BARC-WEST, Beltsville, MD 20705, USA.


[1] An objective methodology that does not require any user-defined parameter assumptions is introduced to obtain an improved soil moisture product along with associated uncertainty estimates. This new product is obtained by merging model-, thermal infrared remote sensing-, and microwave remote sensing-based soil moisture estimates in a least squares framework where uncertainty estimates for each product are obtained using triple collocation. The merged anomaly product is validated against in situ based soil moisture data and shows higher correlations with observations than individual input products; however, it is not superior to a naively merged product acquired by averaging the products using equal weighting. The resulting combined soil moisture estimate is an improvement over currently available soil moisture products due to its reduced uncertainty and can be used as a standalone soil moisture product with available uncertainty estimates.

1. Introduction

[2] Agricultural drought is commonly defined as the presence of insufficient soil water available for adequate crop and forage production. As a result, agricultural drought is typically monitored using soil moisture anomaly products generated during the growing season at weekly or monthly time scales [Anderson et al., 2011]. Consistent estimates of soil moisture for drought monitoring can be obtained in various ways, for example through remote sensing or through modeling of the land-surface water budget. However, these estimates are not perfect and each method has characteristic uncertainties [Koster et al., 2009; Jackson et al., 2010]. Therefore, it is frequently desirable to merge independent realizations to obtain a more accurate unified estimate. Theoretically, the more independent data that are merged, the larger the reduction in the noise of the merged product. Past work in a variety of fields has demonstrated the benefits of combining geophysical estimates obtained from a range of observational and modeling resources into a single hybrid estimate (see, e.g., Ebert [2001] or Xie and Arkin [1996]). However, it is important to weight the products based on their relative accuracies in order to minimize errors.

[3] Data assimilation using Kalman filter-based methodologies is one of the most commonly used approaches for merging different products while taking into account their relative uncertainties. Kalman filter theory can be shown to be a recursive solution of the least squares problem [Sorenson, 1970] for an appropriate time frame. The solution of Kalman [1960] enables propagation of the best estimate and its errors in time, whereas in ordinary least squares the solution is assumed constant in time. The goal of both solutions is to obtain an estimate with minimized error variance. However, both solutions also require prior knowledge of product uncertainties to obtain an optimal analysis. In land data assimilation studies, Kalman filter-based methodologies often rely on ad hoc statistical descriptions of errors in assimilated observations, model parameters, or model forcings. As a result, the relative weighting applied to modeled and observed soil moisture information by a land data assimilation system is arguably subjective and does not necessarily reflect an optimized integration of independent data sources [Crow and Van Loon, 2006; Reichle et al., 2008]. Therefore, our goal here is the development of an objective merging methodology that is less dependent on uncertain, user-defined error assumptions.

[4] Triple collocation is a method that objectively obtains error estimates for three or more independent products. This method was originally introduced by Stoffelen [1998] and Caires and Sterl [2003] to estimate near-surface wind speed errors, and was later applied in many hydrological applications. In particular, Scipal et al. [2008] estimated the errors in passive microwave-, active microwave-, and model-based soil moisture products. Miralles et al. [2010] estimated errors in passive microwave-, station-, and model-based soil moisture products and validated the error estimates using watershed scale station-based data. Dorigo et al. [2010] evaluated the uncertainties of global passive microwave-, active microwave-, and model-based soil moisture products. Hain et al. [2011] estimated errors in passive microwave-, thermal infrared-, and model-based soil moisture realizations and found that passive microwave- and thermal infrared-based soil moisture products yield complementary soil moisture information. Parinussa et al. [2011b] estimated errors in passive microwave-, active microwave-, and antecedent precipitation index-based soil moisture products, then compared the triple collocation-based errors with data assimilation-based error estimates [Crow, 2007], and found very high correlation between the error estimates of these two techniques.

[5] It is relatively straightforward to use triple collocation as a means to estimate observation error covariance parameters required by land data assimilation systems [Crow and van den Berg, 2010]. However, applying triple collocation to the parameterization of modeling error within a Kalman filter is more difficult and, to our knowledge, has not yet been attempted. In particular, a Kalman filter requires covariance information regarding background errors that vary in time according to flow conditions and/or the frequency of assimilated observations. In contrast, triple collocation provides only a temporally constant value of error representing a continuous, nonupdated integration of the forecast model. Likewise, triple collocation provides only information about the magnitude of modeling errors and not their source and/or impact on the full model forecast covariance matrix. Such information is often important in land data assimilation applications [Crow and Van Loon, 2006]. While potentially resolvable, these challenges suggest that the initial use of triple collocation-based modeling errors should be based on a relatively simple least squares framework as opposed to a full data assimilation approach.

[6] Here we propose an objective methodology that does not require any user-defined error parameters as input. In this approach, different anomaly soil moisture products are merged in a least squares framework that relies on product error estimates obtained from triple collocation. Specifically, we have merged weekly, growing-season soil moisture anomaly products obtained from thermal remote sensing via the atmosphere land exchange inversion (ALEXI [Anderson et al., 2007a]) model, microwave remote sensing via the land parameter retrieval model (LPRM [Owe et al., 2008]), and the Noah [Ek et al., 2003] land surface model. The least squares framework is also able to provide estimates of uncertainty in the merged product, which could be used to augment existing drought products. It should be stressed that the presented approach is particularly well suited for agricultural drought applications, which are commonly based on the detection of growing-season soil moisture anomalies at weekly to monthly time scales. The proposed methodology can also potentially add value to soil moisture products derived from current and future soil moisture satellite missions, e.g., SMOS (soil moisture and ocean salinity) and SMAP (soil moisture active passive), by optimally merging them with various independent soil moisture estimates.

[7] The general least squares solution is briefly reviewed in section 2. Section 3 reviews the triple collocation equations, section 4 introduces the input data, section 5 presents the results, and section 6 summarizes our conclusions.

2. Least Squares Merging

[8] Least squares is an estimation theory that has been used in numerous studies since its initial applications by Gauss [1963] and Legendre [1806]. Kalman [1960] has shaped the theory into its current form [Sorenson, 1970], which can be used to describe the basis of most modern data assimilation techniques [Talagrand, 1997]. The least squares solution has been derived in many studies; a brief review is given here to provide background for our proposed merging algorithm.

[9] Assume we have three independent realizations ($S_x$, $S_y$, and $S_z$) of a variable along with their respective zero-mean errors ($\varepsilon_x$, $\varepsilon_y$, and $\varepsilon_z$) and error variances ($\sigma_{\varepsilon_x}^2$, $\sigma_{\varepsilon_y}^2$, and $\sigma_{\varepsilon_z}^2$). These realizations can be represented by

$$S_x = \alpha S_t + \varepsilon_x \tag{1}$$
$$S_y = \alpha S_t + \varepsilon_y \tag{2}$$
$$S_z = \alpha S_t + \varepsilon_z \tag{3}$$

where $S_t$ is the true value of the variable and $\alpha$ is a measure of the relation between these realizations and the assumed truth. Although in some cases $\alpha = 1$, this is not a requirement; the least squares solution can be obtained as long as all realizations relate to the truth with the same coefficient. The desired merged estimate $S_m$ is obtained as

$$S_m = w_x S_x + w_y S_y + w_z S_z \tag{4}$$

where $w_x$, $w_y$, and $w_z$ are the relative weights of $S_x$, $S_y$, and $S_z$, respectively. To have an unbiased merged estimate ($E[S_m - \alpha S_t] = 0$), it is required that $w_x + w_y + w_z = 1$. Given these constraints, the ultimate goal is to derive these weights as functions of the error variance of the three realizations and to find the error variance estimate of the merged product. The error estimate of the merged product is obtained as $\varepsilon_m = S_m - \alpha S_t$, and the solution we seek minimizes a selected cost function ($J$) in a mean squares sense. Here we select $J$ to be the error variance ($\sigma_{\varepsilon_m}^2$) of the merged estimate in the form

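$$J = \sigma_{\varepsilon_m}^2 = w_x^2 \sigma_{\varepsilon_x}^2 + w_y^2 \sigma_{\varepsilon_y}^2 + w_z^2 \sigma_{\varepsilon_z}^2 \tag{5}$$
$$J = \sigma_{\varepsilon_m}^2 = w_x^2 \sigma_{\varepsilon_x}^2 + w_y^2 \sigma_{\varepsilon_y}^2 + (1 - w_x - w_y)^2 \sigma_{\varepsilon_z}^2 \tag{6}$$

Setting $\partial J / \partial w_x = 0$ and $\partial J / \partial w_y = 0$ in equation (6) and solving for $w_x$, $w_y$, and $w_z$, we obtain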

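$$w_x = \frac{\sigma_{\varepsilon_y}^2 \sigma_{\varepsilon_z}^2}{\sigma_{\varepsilon_x}^2 \sigma_{\varepsilon_y}^2 + \sigma_{\varepsilon_x}^2 \sigma_{\varepsilon_z}^2 + \sigma_{\varepsilon_y}^2 \sigma_{\varepsilon_z}^2} \tag{7}$$
$$w_y = \frac{\sigma_{\varepsilon_x}^2 \sigma_{\varepsilon_z}^2}{\sigma_{\varepsilon_x}^2 \sigma_{\varepsilon_y}^2 + \sigma_{\varepsilon_x}^2 \sigma_{\varepsilon_z}^2 + \sigma_{\varepsilon_y}^2 \sigma_{\varepsilon_z}^2} \tag{8}$$
$$w_z = \frac{\sigma_{\varepsilon_x}^2 \sigma_{\varepsilon_y}^2}{\sigma_{\varepsilon_x}^2 \sigma_{\varepsilon_y}^2 + \sigma_{\varepsilon_x}^2 \sigma_{\varepsilon_z}^2 + \sigma_{\varepsilon_y}^2 \sigma_{\varepsilon_z}^2} \tag{9}$$

The solution is intuitive since the weight of each estimate is proportional to the product of the error variances of the other two estimates or, equivalently, inversely proportional to its own error variance. If two realizations are available instead of three, then the least squares solution can be applied similarly with a cost function selection of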

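$$J = \sigma_{\varepsilon_m}^2 = w_x^2 \sigma_{\varepsilon_x}^2 + (1 - w_x)^2 \sigma_{\varepsilon_y}^2 \tag{10}$$

with weights

$$w_x = \frac{\sigma_{\varepsilon_y}^2}{\sigma_{\varepsilon_x}^2 + \sigma_{\varepsilon_y}^2} \tag{11}$$
$$w_y = \frac{\sigma_{\varepsilon_x}^2}{\sigma_{\varepsilon_x}^2 + \sigma_{\varepsilon_y}^2} \tag{12}$$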
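As an illustration of the least squares solution above, the following minimal sketch computes the weights and the merged error variance from a set of error variances. It assumes independent, zero-mean errors and writes the weights in the equivalent inverse-variance form; the function name and the numerical values are hypothetical and serve only as an example.

```python
import numpy as np

def least_squares_weights(error_variances):
    """Return minimum-variance weights for independent, unbiased estimates.

    Each weight is proportional to the inverse of that product's error
    variance, which is algebraically the same as the 'product of the
    other variances' form of the three- and two-product solutions.
    """
    var = np.asarray(error_variances, dtype=float)
    inv = 1.0 / var
    weights = inv / inv.sum()          # weights sum to one (unbiased merge)
    merged_variance = 1.0 / inv.sum()  # error variance of the merged estimate
    return weights, merged_variance

# Hypothetical error variances for products x, y, z (illustrative values only)
w3, var3 = least_squares_weights([0.2, 0.4, 0.8])
# Two-product case: the same function applies
w2, var2 = least_squares_weights([0.2, 0.4])
```

With these illustrative inputs the least noisy product receives the largest weight, and the merged error variance is smaller than that of any single input.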

3. Error Estimation Using Triple Collocation

[10] For a given set of realizations, optimal merging based on the least squares technique described here requires an estimate of the relative uncertainties of the input products. In this study, the error variances of these estimates are obtained using triple collocation. In particular, we apply assumptions underlying the triple collocation approach of Stoffelen [1998],

$$S_1 = \alpha_1 S_t + e_1 \tag{13}$$
$$S_2 = \alpha_2 S_t + e_2 \tag{14}$$
$$S_3 = \alpha_3 S_t + e_3 \tag{15}$$

where $S_t$ is the true soil moisture anomaly with variance $\sigma_t^2$; $S_1$, $S_2$, and $S_3$ are three soil moisture anomalies related to truth with $\alpha_1$, $\alpha_2$, and $\alpha_3$ coefficients, with zero-mean errors $e_1$, $e_2$, and $e_3$, and with error variances $\sigma_{e_1}^2$, $\sigma_{e_2}^2$, and $\sigma_{e_3}^2$, respectively. Here $\sigma_t^2$ does not imply the truth has errors, but rather it is the true soil moisture variance in time.

[11] In general, we may expect products to have differences in their relationship with the truth. Therefore, rescaling is required to ensure each data set has a consistent relationship with the assumed truth by reconciling equations (13)–(15) with equations (1)–(3). We rescale these realizations using

$$S_1^* = \alpha S_t + \varepsilon_1^* \tag{16}$$
$$S_2^* = \alpha S_t + \varepsilon_2^* \tag{17}$$
$$S_3^* = \alpha S_t + \varepsilon_3^* \tag{18}$$

where $S_1^*$, $S_2^*$, and $S_3^*$ are the rescaled realizations and $\varepsilon_1^*$, $\varepsilon_2^*$, and $\varepsilon_3^*$ are the relative errors of the realizations with variances $\sigma_{\varepsilon_1^*}^2$, $\sigma_{\varepsilon_2^*}^2$, and $\sigma_{\varepsilon_3^*}^2$. Rescaled values are related to the initial estimates as $S_1^* = c_1 S_1$, $S_2^* = c_2 S_2$, and $S_3^* = c_3 S_3$, where $c_1$, $c_2$, and $c_3$ are the rescaling factors. By arbitrarily selecting any of the data sets as a reference (by setting $c_1 = 1$), and assuming error covariances between products are zero and the representativeness errors described by Stoffelen [1998] are zero, these factors can be found as

$$c_1 = 1 \tag{19}$$
$$c_2 = \frac{\overline{S_1 S_3}}{\overline{S_2 S_3}} \tag{20}$$
$$c_3 = \frac{\overline{S_1 S_2}}{\overline{S_3 S_2}} \tag{21}$$

where the overbar indicates averaging in time. In general, the accuracy of the rescaling in matching the data sets with the reference data set or the truth is tied to the strength of the linear relationship between the products assumed in equations (1)–(3). When compared to more nonlinear systems, highly linear systems are expected to have smaller sampling errors and require fewer observations to obtain similar levels of accuracy. Also, note that this rescaling step can be performed independently for each area or time period of interest; hence it may vary spatially and/or temporally.

[12] Assuming the errors of the products are independent from each other and from the truth, and assuming a mutual linear relationship between these estimates and the true soil moisture, the final error variances of the rescaled realizations are found as

$$\sigma_{\varepsilon_1^*}^2 = \overline{(S_1^* - S_2^*)(S_1^* - S_3^*)} \tag{22}$$
$$\sigma_{\varepsilon_2^*}^2 = \overline{(S_2^* - S_1^*)(S_2^* - S_3^*)} \tag{23}$$
$$\sigma_{\varepsilon_3^*}^2 = \overline{(S_3^* - S_1^*)(S_3^* - S_2^*)} \tag{24}$$

[13] Once these error variance estimates are obtained, they are inserted into equations (7)–(9) (or (11) and (12)) to obtain least squares weights at each time step. While the obtained error variance estimates are constant in time, these weights are not. When all three realizations are available, the least squares solution for three data sets (equations (7)–(9)) is used; when two out of three realizations are available, the least squares solution for two data sets (equations (11) and (12)) is used. Accordingly, equation (5) or (10) is used to calculate the reported error variance of the merged product at each time step.
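The rescaling and error variance estimation steps can be sketched as follows. This is a minimal illustration, not the processing code used in the study: it assumes the cross-product form of the rescaling factors and error variances with the first series as reference, complete (gap-free) zero-mean anomaly series, and synthetic data with made-up scalings and noise levels.

```python
import numpy as np

def triple_collocation(s1, s2, s3):
    """Rescale s2 and s3 to s1 and estimate error variances of the rescaled series.

    Inputs are zero-mean anomaly series of equal length with no missing values.
    Assumes mutually independent errors that are also independent of the truth.
    """
    s1, s2, s3 = (np.asarray(s, dtype=float) for s in (s1, s2, s3))
    # Rescaling factors with s1 as the (arbitrary) reference
    c2 = np.mean(s1 * s3) / np.mean(s2 * s3)
    c3 = np.mean(s1 * s2) / np.mean(s3 * s2)
    r1, r2, r3 = s1, c2 * s2, c3 * s3
    # Error variances from time-averaged cross products of differences
    var1 = np.mean((r1 - r2) * (r1 - r3))
    var2 = np.mean((r2 - r1) * (r2 - r3))
    var3 = np.mean((r3 - r1) * (r3 - r2))
    return (1.0, c2, c3), (var1, var2, var3)

# Synthetic check with known (made-up) scalings and noise levels
rng = np.random.default_rng(0)
n = 200_000
truth = rng.standard_normal(n)
s1 = 1.0 * truth + 0.3 * rng.standard_normal(n)
s2 = 0.8 * truth + 0.5 * rng.standard_normal(n)
s3 = 1.2 * truth + 0.4 * rng.standard_normal(n)
factors, variances = triple_collocation(s1, s2, s3)
```

In this synthetic setup the estimated factors should approach 1/0.8 and 1/1.2, and the estimated error variances should approach the noise variances expressed in the units of the reference series. With short records such as those available here, sampling errors would be considerably larger than in this large-sample example.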

[14] In the triple collocation system of equations presented above (equations (13)–(15)), there are seven unknown parameters ( inline image, inline image, inline image, inline image, inline image, inline image, and inline image) constrained by three equations. By selecting a reference data set (i.e., assuming inline image) and rescaling other data sets to this reference, our goal in equations (16)–(18) becomes seeking a solution for four unknown parameters ( inline image, inline image, inline image, and inline image), rather than seven. This system, with four unknowns and three equations, is still under-determined. We are able to solve for these four unknowns only after assuming all error-related cross covariances are zero. Without this assumption, the inline image term remains in the system of equations as an unknown.

[15] However, in the absence of any other independent information, we cannot decompose the inline image estimate into estimates of inline image and inline image; meaning we can never know the true inline image. Different reference data set selections result in different inline image as well as different inline image, inline image, and inline image. Therefore the triple collocation equations described above provide only the relative accuracy of these realizations (how the noisiness of one product compares against that of another product), whereas the absolute values of the error variances themselves are dependent on the reference data set selection. While triple collocation is not ideal for capturing absolute errors, its representation of relative errors between input products is independent from the arbitrary choice of a single data set as a scaling reference. Fortunately, this type of relative information is all that is required in order to determine optimal least squares averaging.
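The independence of the relative errors from the reference choice can be verified with a short derivation. As a sketch, assume that with product 1 as reference the rescaling factors are $c_1 = 1$, $c_2 = \overline{S_1 S_3}/\overline{S_2 S_3}$, and $c_3 = \overline{S_1 S_2}/\overline{S_3 S_2}$, and that each least squares weight is inversely proportional to the error variance of its product. Primed symbols, introduced here only for illustration, denote the same quantities when product 2 is used as reference instead.

```latex
\begin{align}
c_1' &= \frac{\overline{S_2 S_3}}{\overline{S_1 S_3}} = \frac{c_1}{c_2}, &
c_2' &= 1 = \frac{c_2}{c_2}, &
c_3' &= \frac{\overline{S_2 S_1}}{\overline{S_3 S_1}} = \frac{c_3}{c_2}
\end{align}
% Every rescaled series is multiplied by the same factor 1/c_2, so
\begin{align}
\sigma_{\varepsilon_i^{*}}^{\prime\,2} &= \frac{\sigma_{\varepsilon_i^{*}}^{2}}{c_2^{2}}, &
w_i' &= \frac{1/\sigma_{\varepsilon_i^{*}}^{\prime\,2}}{\sum_j 1/\sigma_{\varepsilon_j^{*}}^{\prime\,2}}
      = \frac{1/\sigma_{\varepsilon_i^{*}}^{2}}{\sum_j 1/\sigma_{\varepsilon_j^{*}}^{2}} = w_i
\end{align}
```

Changing the reference therefore multiplies all error variances by a common factor, which cancels in the weights.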

4. Data

4.1. Input Data Sets

[16] The study area is selected as the contiguous United States (CONUS), between 125° and 67°W and 25° and 50°N. Consistent with our stated focus on the monitoring of agricultural drought during the growing season, daily data sets are obtained from 2002 to 2010 for the growing season months of April through October. Large-scale soil moisture information is currently available from three independent sources: retrievals derived from thermal-infrared remote sensing, retrievals derived from microwave remote sensing, and estimates derived from water balance models forced with micrometeorological observations. Here, all three sources of soil moisture data are used as input into the triple collocation analysis. In particular, this study utilizes an ALEXI energy balance model soil moisture proxy obtained from thermal infrared remotely sensed images, LPRM soil moisture estimates that are obtained from passive microwave remote sensing images, and Noah land surface model soil moisture simulations. The methodology was applied at a grid spacing of 0.25°; data sets at higher native resolution have been aggregated to this common grid. All data sets were averaged to weekly composites from their native temporal resolution.
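The weekly compositing step can be illustrated with a minimal sketch. It assumes a daily series for a single pixel in which missing retrievals are stored as NaN, non-overlapping 7 day blocks, and a dropped trailing partial week; these conventions and the function name are assumptions made for the example, not details given in the text.

```python
import numpy as np

def weekly_composite(daily, days_per_week=7):
    """Average a daily series into weekly composites, ignoring missing days.

    daily: 1-D array with NaN marking days without a retrieval.
    Weeks with no valid day are returned as NaN. A trailing partial
    week is dropped (an assumption made here for simplicity).
    """
    daily = np.asarray(daily, dtype=float)
    n_weeks = daily.size // days_per_week
    blocks = daily[: n_weeks * days_per_week].reshape(n_weeks, days_per_week)
    valid = ~np.isnan(blocks)
    counts = valid.sum(axis=1)                     # retrievals per week
    sums = np.where(valid, blocks, 0.0).sum(axis=1)
    composite = np.full(n_weeks, np.nan)
    has_data = counts > 0
    composite[has_data] = sums[has_data] / counts[has_data]
    return composite, counts

# Two weeks of made-up data: three retrievals in week 1, none in week 2
example = np.array([0.20, np.nan, 0.24, np.nan, np.nan, 0.22, np.nan]
                   + [np.nan] * 7 + [0.30])
composite, counts = weekly_composite(example)
```

Returning the per-week counts alongside the composite keeps track of how many retrievals support each weekly value.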

[17] ALEXI is a two-source (soil and vegetation) model that solves for the latent and sensible heat components of the surface energy balance by taking advantage of measurements of morning land-surface temperature rise obtained by geostationary satellites [Anderson et al., 1997; Mecikalski et al., 1999; Anderson et al., 2007a]. Using the obtained fluxes, a strong relationship was found between the ratio of actual to potential evapotranspiration fluxes (fPET) and the fraction of available water (faw) in the soil column [Anderson et al., 2007a, 2007b, 2011]. Following these studies, Hain et al. [2009] proposed unique relationships between fPET and faw, evaluated this relation using soil moisture observations from the Oklahoma Mesonet Network, and showed that ALEXI has valuable information about faw, which serves as a proxy for the root-zone soil moisture in vegetated areas. Here we utilize ALEXI-based fPET retrievals following the approach described by Hain et al. [2011]. Note that ALEXI fPET represents a surface–root-zone merged soil moisture estimate, yielding a proxy estimate of water availability for evapotranspiration (i.e., water in the surface layer for bare soil evaporation, and water in the root zone for canopy transpiration). ALEXI fPET values have been aggregated from 10 km to 0.25° resolution. Given their reliance on thermal remote sensing-based observations, current ALEXI retrievals are limited to clear-sky conditions, which is a major limitation to data availability, particularly over the northern US. To fill the entire grid, it is necessary to average daily fPET fields over time to create time composites. Detailed information about the derivation of an ALEXI-based soil moisture proxy can be found in the above-mentioned studies.

[18] Noah (version 2.7) LSM data were obtained from the global simulations generated using Global Land Data Assimilation System (GLDAS [Rodell et al., 2004]) forcing data. The Noah model employs a coupled surface water and energy balance, computing multilayer soil moisture as the storage component of the soil water balance. More details about these Noah simulations can be found at These hourly simulations were performed at 0.25° spatial resolution, hence spatial aggregation was not needed. Since the ALEXI soil moisture proxy has mixed vertical support over sparsely and densely vegetated surfaces, a Noah soil moisture estimate was computed that mimics this vertical support. The second-layer (10–40 cm depth) and the third-layer (40–100 cm depth) soil moisture simulations were averaged into a root-zone soil moisture estimate (Noahroot) by weighting each layer volumetric soil moisture proportional to respective soil layer depths. This root-zone product and the surface (0–10 cm) soil moisture simulations (Noahsrfc) were later combined into an adjusted soil moisture estimate (Noahadj) following the method of Hain et al. [2011]

$$\mathrm{Noah}_{adj} = (1 - f_{vc})\,\mathrm{Noah}_{srfc} + f_{vc}\,\mathrm{Noah}_{root} \tag{25}$$

where $f_{vc}$ is the fractional vegetation cover based on remote sensing-based observations of leaf area index acquired by the moderate resolution imaging spectroradiometer (MODIS), as shown in Figure 1. As a result of equation (25), $\mathrm{Noah}_{adj}$ estimates are essentially surface soil moisture estimates over areas with no vegetation cover, and are root-zone soil moisture estimates over areas with dense vegetation cover. Here vegetation-adjusted soil moisture reflects soil moisture conditions in the part of the root zone that is in direct interaction with the atmosphere (by supplying water for evapotranspiration). Therefore, it is representative of the soil layer depth for sampling the moisture associated with vegetation and surface water stress and thus a natural metric for monitoring the severity of agricultural drought.
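The construction of the vegetation-adjusted estimate can be sketched as follows. The sketch assumes a linear blend that reduces to the surface value where the vegetation fraction is zero and to the root-zone value where it is one, as the text describes, and uses the 30 cm and 60 cm thicknesses of the second and third Noah layers; function names and input values are hypothetical.

```python
import numpy as np

def root_zone(layer2, layer3, thickness2_cm=30.0, thickness3_cm=60.0):
    """Thickness-weighted mean of the 10-40 cm and 40-100 cm layers."""
    total = thickness2_cm + thickness3_cm
    return (thickness2_cm * layer2 + thickness3_cm * layer3) / total

def vegetation_adjusted(surface, root, f_vc):
    """Blend surface and root-zone soil moisture by fractional vegetation cover."""
    f_vc = np.clip(f_vc, 0.0, 1.0)   # guard against fractions outside [0, 1]
    return (1.0 - f_vc) * surface + f_vc * root

# Made-up volumetric soil moisture values (m3/m3) for one grid cell
root = root_zone(0.30, 0.24)
adjusted = vegetation_adjusted(0.20, root, 0.5)
```

The same two functions apply to arrays, so a full grid of surface, root-zone, and vegetation fraction fields could be passed in directly.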

Figure 1.

Average fraction of vegetation cover climatology over CONUS from April to October. The locations of SCAN stations used in the study are shown with the symbol X. The state of Oklahoma, within which all MESONET stations are located, is shown with a black polygon.

[19] Advanced microwave scanning radiometer EOS (AMSR-E) microwave remote sensing-based brightness temperature observations have been used in numerous passive microwave-based algorithms [Jackson, 1993; Owe et al., 2001; Njoku and Chan, 2006; Lu et al., 2009], and the resulting soil moisture products have been extensively validated under a wide range of ground conditions and climate regimes [Draper et al., 2009; Mladenova et al., 2011; Parinussa et al., 2011a]. Here we utilize LPRM surface soil moisture retrievals [Owe et al., 2008] obtained from Vrije University Amsterdam (VUA). LPRM soil moisture estimates are obtained using a one-layer model based on radiative transfer at the surface. This retrieval model uses soil-related information as ancillary data, and solves for soil moisture, vegetation optical depth, and effective physical soil temperature. The model uses the relationship between microwave polarization difference index, vegetation optical depth, and soil dielectric constant [Owe et al., 2008], and solves for the skin temperature using a regression-based model based on Ka-band vertical polarization AMSR-E brightness temperature data [Holmes et al., 2009]. Soil moisture retrievals used in this study are based on C- and X-band (mixed) descending AMSR-E brightness temperature observations. However, X-band observations are used only in areas of the world where C-band observations are affected by radio frequency interference. Only the descending AMSR-E observations are used in this study, primarily because they may result in more accurate soil moisture estimates than the observations acquired during ascending orbits [Draper et al., 2009]. The LPRM soil moisture estimates refer to the top 3 cm of the soil profile. AMSR-E-based brightness temperature observations are obtained at native spatial resolutions of 56 and 38 km for the C- and X-bands, respectively. The operational LPRM product has been regridded to 0.25° spatial resolution by taking advantage of the multiple footprint centers that fall within the same 0.25° grid cell.

[20] Note that all three parent soil moisture data sets are obtained from different algorithms driven by different input data. These differences support the error independence assumption underlying application of triple collocation. On the other hand, these products also have different systematic and random relationships with the truth. However, here it is stressed that as long as highly linear relationships exist between the products, the data set retrieval method does not present a problem in a triple collocation framework. This issue will be revisited in section 5.4.

[21] In terms of timing, ALEXI provides a direct estimate of the soil moisture conditions shortly before local noon on days with clear morning conditions. LPRM soil moisture retrievals are obtained using microwave remote sensing-based observations collected at 1:30 am local time. On the other hand, Noah soil moisture (SM) estimates are temporally continuous, and available at an hourly time interval. Accordingly, there could be inconsistencies between the weekly composites for each product due to differences in the temporal representativeness of each product. However, the impact of these inconsistencies should be minimized by the temporal averaging and standardization (see section 4.3 below) performed to obtain weekly composites.

[22] Given orbit patterns and the typical frequency of masked retrievals, ALEXI and LPRM provide about 3.0 and 4.0 retrievals per week, respectively, over CONUS during the growing season (Figure 2). ALEXI has its best temporal coverage over southwestern CONUS, whereas LPRM has better temporal coverage over the midwest and southern CONUS (Figure 2). In addition to the satellite overpass availability, cloud cover and dense vegetation are the primary factors affecting the data availability of ALEXI and LPRM, respectively.

Figure 2.

Average number of ALEXI and LPRM retrievals per week obtained between April and October (2002–2010).

4.2. Validation Data Sets

[23] The merged product has been evaluated in comparison with in situ soil moisture observations from the Oklahoma MESONET Network [Brock et al., 1995; Basara and Crawford, 2000] and the soil climate analysis network (SCAN [Schaefer et al., 2007]) within the CONUS. In Oklahoma, an integrated network of 135 meteorological stations has been installed during the past two decades (Figure 1). Among these stations, around 100 have calibrated soil moisture monitoring devices taking measurements at 5, 25, 60, and 75 cm depths. Collected data undergo automated and manual quality controls conducted by the University of Oklahoma during the conversion of 30 min raw data into daily soil moisture averages [Illston et al., 2008]. There are over 150 SCAN stations (Figure 1) spread throughout the CONUS taking hourly soil moisture measurements at 5, 10, 20, 50, and 100 cm depths [Schaefer et al., 2007], which later undergo quality control procedures.

[24] In a manner analogous to equation (25), a vegetation correction has been applied to the station measurements to ensure consistent soil moisture estimates between the merged products and the validation data sets. More specifically, the first layer (top 5 cm) MESONET data have been taken as surface soil moisture and a weighted average of the second to the fourth layers as root-zone soil moisture. MODIS-based vegetation cover fraction (Figure 1) information on a 0.25° grid was interpolated from its native 8 day temporal resolution to weekly and is assumed to be representative of the station location. Vegetation correction to the MESONET data was carried out using equation (25). Similarly, a vegetation correction was applied to SCAN soil moisture values; the first layer soil moisture values were used as surface values and the average soil moisture values of the second to the fifth layers, weighted by their depths, were used as root-zone values. The merged soil moisture estimates were validated using these vegetation-cover-adjusted soil moisture observations. Because the MESONET and the SCAN station data were adjusted for vegetation cover fraction, the number of available station data points depends on the availability of both the surface and the root-zone observations. Since root-zone observations are not as readily available as surface observations, only approximately 50 MESONET and SCAN stations (Figure 1) are available for verification.

[25] The skill of the triple collocation-based weights was also evaluated by comparing the performance of the merged estimate against the performance of a naively merged product, which simply assumes equal weights for each available product.
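The naive benchmark and the comparison against station data can be sketched as below. The sketch assumes missing values are stored as NaN and that skill is measured with a Pearson correlation over the time steps where both series exist; the choice of Pearson correlation and the function names are assumptions made for illustration.

```python
import numpy as np

def equal_weight_merge(products):
    """Average the products available at each time step with equal weights.

    products: 2-D array (n_products, n_times) with NaN for missing values.
    """
    products = np.asarray(products, dtype=float)
    valid = ~np.isnan(products)
    counts = valid.sum(axis=0)
    sums = np.where(valid, products, 0.0).sum(axis=0)
    merged = np.full(products.shape[1], np.nan)
    merged[counts > 0] = sums[counts > 0] / counts[counts > 0]
    return merged

def anomaly_correlation(estimate, observed):
    """Pearson correlation over time steps where both series are available."""
    estimate = np.asarray(estimate, dtype=float)
    observed = np.asarray(observed, dtype=float)
    both = ~np.isnan(estimate) & ~np.isnan(observed)
    return np.corrcoef(estimate[both], observed[both])[0, 1]

# Made-up anomalies for three products over three weeks
naive = equal_weight_merge([[1.0, np.nan, 3.0],
                            [3.0, 2.0, np.nan],
                            [np.nan, np.nan, np.nan]])
```

Applying `anomaly_correlation` to both the naive and the weighted merges against the same station series gives a like-for-like comparison of the two.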

4.3. Data Standardization

[26] Weekly composites were standardized, so that their time mean (across years) is zero and time variance is unity for a given pixel and week

$$\overline{SM}_{w,lon,lat} = \frac{1}{nar} \sum_{y} SM_{y,w,lon,lat} \tag{26}$$
$$\sigma_{SM_{w,lon,lat}} = \sqrt{\frac{1}{nar - 1} \sum_{y} \left( SM_{y,w,lon,lat} - \overline{SM}_{w,lon,lat} \right)^2} \tag{27}$$
$$SM^{s}_{y,w,lon,lat} = \frac{SM_{y,w,lon,lat} - \overline{SM}_{w,lon,lat}}{\sigma_{SM_{w,lon,lat}}} \tag{28}$$

where $y$, $w$, $lon$, and $lat$ denote year, week, longitude, and latitude, respectively; $SM$ denotes one of the three soil moisture products used in this study (ALEXI, Noah, and LPRM); $SM^{s}$ are the standardized soil moisture anomalies that are merged; and $nar$ is the number of available realizations out of 9 years for the given week, longitude, and latitude. The climatologies are removed with the standardization process so that data sets have zero mean (consistent with the solution of Stoffelen [1998]) and unity standard deviation. Consequently, the triple collocation analysis and the merging process were performed solely using the standardized soil moisture anomalies defined above. Even though the merged product presented in this study is an anomaly product, it can be linearly transformed into an absolute value product by selecting a reference data set and applying the inverse of equation (28) using the standard deviation and mean values of the reference data set.
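A minimal sketch of the standardization is given below for an array of weekly composites. It assumes the data are arranged as (year, week, lat, lon) with NaN for missing composites and that the standard deviation is the sample value (division by the number of available years minus one); both the array layout and the choice of denominator are assumptions made for the example.

```python
import numpy as np

def standardize(sm):
    """Standardize weekly composites across years for each week and pixel.

    sm: array shaped (year, week, lat, lon) with NaN for missing composites.
    Returns anomalies with zero mean and unit sample variance across years.
    """
    sm = np.asarray(sm, dtype=float)
    valid = ~np.isnan(sm)
    nar = valid.sum(axis=0)                   # available years per week/pixel
    filled = np.where(valid, sm, 0.0)
    safe_n = np.where(nar > 0, nar, 1)        # avoid dividing by zero
    mean = filled.sum(axis=0) / safe_n
    dev = np.where(valid, sm - mean, 0.0)
    # Sample standard deviation (nar - 1 in the denominator): an assumption
    with np.errstate(invalid="ignore", divide="ignore"):
        std = np.sqrt((dev ** 2).sum(axis=0) / (nar - 1))
        out = (sm - mean) / std
    return out
```

Weeks and pixels with fewer than two available years, or with no variability across years, yield NaN anomalies in this sketch rather than a spurious value.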

[27] There are only 9 years of data (9 points) available to calculate the mean and the standard deviation statistics for each week during the standardization process. The accuracy of these statistics could be improved by using longer time windows (e.g., 5 weeks). However, this may induce artificial autocorrelation in the standardized products and reduce the degrees of freedom for the triple collocation analysis. Hence there is likely a trade-off between the accuracy of the standardized products and the accuracy of the triple collocation-based errors. In order to examine this issue, results were calculated using both 1 week and 5 week sampling windows and separately evaluated.

4.4. Vertical Support

[28] The output product produced by ALEXI is a vegetation-adjusted (surface–root-zone merged) soil moisture estimate representing a proxy estimate of water available for evapotranspiration. Using equation (25), Noah soil moisture predictions can be converted into a variable with the same vertical support. However, LPRM soil moisture is associated only with the surface (0 to 3 cm) and therefore has a different vertical support than Noah and ALEXI soil moisture products over vegetated areas.

[29] To examine this effect, additional triple collocation analyses were performed using vegetation-adjusted LPRM values obtained in a manner similar to equation (25) using LPRM-based surface (native) and root-zone products. LPRM-based root-zone products required as input into equation (25) were obtained using the exponential smoothing methodology described by Wagner et al. [1999] and Albergel et al. [2008] to estimate root-zone soil moisture retrievals from surface observations

$$\mathrm{LPRM}_{root}(t) = \frac{\sum_{i} \mathrm{LPRM}_{srfc}(t_i)\, e^{-(t - t_i)/\tau}}{\sum_{i} e^{-(t - t_i)/\tau}} \tag{29}$$

where $t_i \le t$, $\mathrm{LPRM}_{srfc}$ is the surface LPRM soil moisture estimate at time $t_i$; $\mathrm{LPRM}_{root}$ is the root-zone soil moisture estimate; and $\tau$ is the characteristic time length. Specifically, three vegetation-adjusted LPRM products were estimated using three separate root-zone LPRM values obtained by assigning $\tau$ values of 4, 7, and 14 days. Accordingly, we have performed four parallel triple collocation analyses that use the same ALEXI and Noah data sets but different LPRM-based soil moisture values (one LPRM-surface product and three vegetation-adjusted LPRM products).
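The exponential smoothing step can be sketched as follows. This illustration uses the weighted-sum form of the exponential filter, in which each root-zone value is an exponentially weighted mean of the current and earlier surface retrievals; it assumes times expressed in days and NaN for missing retrievals, and the sample values are made up.

```python
import numpy as np

def exponential_filter(times, surface, tau):
    """Exponentially weighted mean of past surface values.

    times: observation times in days (increasing); surface: surface soil
    moisture at those times (NaN allowed); tau: characteristic time in days.
    """
    times = np.asarray(times, dtype=float)
    surface = np.asarray(surface, dtype=float)
    valid = ~np.isnan(surface)
    root = np.full(times.size, np.nan)
    for k, t in enumerate(times):
        use = valid & (times <= t)           # only retrievals up to time t
        if not use.any():
            continue
        w = np.exp(-(t - times[use]) / tau)  # older retrievals weigh less
        root[k] = np.sum(w * surface[use]) / np.sum(w)
    return root

# Made-up surface retrievals with one missing day
times = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
surface = np.array([0.20, np.nan, 0.40, 0.35, 0.30])
roots = {tau: exponential_filter(times, surface, tau) for tau in (4.0, 7.0, 14.0)}
```

Larger values of the characteristic time give a smoother root-zone series, because older retrievals retain more weight. The double loop over time is quadratic in the record length, which is acceptable for a sketch but would be replaced by a recursive update for long series.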

[30] In this study we also used CLM (version 2.0) simulations, solely for the investigation of the coupling between surface and vegetation-adjusted soil moisture values and not in the triple collocation merging methodology (section 5 below). Like Noah, CLM is a soil-vegetation-atmosphere transfer model that solves for the water and the energy balance at the surface [Dai et al., 2003], and is driven here by GLDAS forcing data [Rodell et al., 2004]. CLM simulations have 1° spatial resolution and utilize 10 soil layers with thicknesses of 2, 3, 4, 8, 12, 20, 34, 55, 92, and 113 cm, respectively. Consistent with Noah soil layer depths, vegetation-adjusted CLM soil moisture values were obtained (equation (25)) by using surface soil moisture estimates defined as the weighted average of the first to the third layers (0–9 cm) and using root-zone soil moisture estimates defined as the weighted average of the fourth to seventh layers (10–83 cm).

4.5. Additional Considerations

[31] For cross comparisons of the linear relation between parent products, cross correlations were calculated without setting any threshold for the availability of the products. The resulting correlation values were then masked if a significant correlation was not found. For the triple collocation analysis, we required a minimum of 100 mutually available soil moisture values. If at least 100 mutually available soil moisture values were not found, then the triple collocation analysis was not performed and all error variance estimates were assumed equal. Here we note that these estimates are used at all time steps for the weight calculation process regardless of whether or not any of the products are missing.
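The availability check can be sketched as follows, assuming missing values are stored as NaN. The `estimator` argument is a hypothetical hook standing in for any routine that returns three error variances from collocated series, and the unit fallback variances are an arbitrary choice, since only their equality matters for the weights.

```python
import numpy as np

MIN_COLLOCATED = 100  # threshold stated in the text

def collocated_mask(a, b, c):
    """True where all three series have a value."""
    return ~(np.isnan(a) | np.isnan(b) | np.isnan(c))

def error_variances_or_equal(a, b, c, estimator):
    """Run `estimator` on collocated triplets, or fall back to equal variances.

    `estimator` is any callable returning three error variances, for example
    a triple collocation routine (hypothetical hook, not from the source).
    """
    a, b, c = (np.asarray(s, dtype=float) for s in (a, b, c))
    mask = collocated_mask(a, b, c)
    if mask.sum() < MIN_COLLOCATED:
        return (1.0, 1.0, 1.0), False   # equal variances -> equal weights
    return tuple(estimator(a[mask], b[mask], c[mask])), True
```

The returned flag records whether the error variances came from the analysis or from the equal-variance fallback, which is useful when mapping where the merged product reduces to a simple average.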

[32] Even though the triple collocation-based errors are constant in time, weights used for merging products at each time step are calculated separately depending on the availability of the data sets for that particular time step. If all products are available, weights are calculated using equations (7)–(9); if only two products are available then weights are calculated using equations (11) and (12), while the missing product is assigned a weight of zero; or if only a single product is available, then this product is assigned a weight of one and the two other missing products are assigned zero weight. Weights change in time only due to the availability of the products at any given time step. If there are not enough mutually available products, meaning a triple collocation-based estimate is not available and equal error variances are assigned, then products are merged using equal weights.
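The availability rules above can be sketched in a few lines. The sketch assumes that the three-product and two-product weight equations reduce to inverse error-variance weighting, which is the standard least squares solution for unbiased products with mutually independent errors; the function name and the example variances are hypothetical.

```python
import numpy as np

def merge_weights(err_var, available):
    """Least squares weights for the products available at one time step.

    err_var   : error variances of the products (constant in time)
    available : boolean flags, True where a product has a value
    """
    err_var = np.asarray(err_var, dtype=float)
    available = np.asarray(available, dtype=bool)
    weights = np.zeros_like(err_var)       # missing products keep zero weight
    if available.any():
        inv = 1.0 / err_var[available]
        weights[available] = inv / inv.sum()   # weights sum to one
    return weights

err_var = [0.2, 0.1, 0.4]                                # hypothetical values
w_all = merge_weights(err_var, [True, True, True])       # three products
w_two = merge_weights(err_var, [True, False, True])      # one product missing
w_one = merge_weights(err_var, [False, True, False])     # single product
w_equal = merge_weights([1.0, 1.0, 1.0], [True, True, True])  # no TC estimate
```

With equal error variances the same function returns equal weights, which corresponds to the fallback used when no triple collocation estimate is available.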

[33] The merged product at any given time can be based on anywhere between one and three realization(s). Hence, the uncertainty of the merged product (equation (6)) at any given location may not be constant in time; dates with more missing soil moisture values have higher uncertainty than dates with fewer missing values. Accordingly, for each available merged product, its uncertainty is also given as a separate product. There will also be temporal changes in the basis of merged predictions as the availability of ALEXI and LPRM changes over time. The lack of temporal consistency brought on by the intermittent availability of observations is a generic problem in land data assimilation. However, the issue is arguably more acute for our particular system since these variations will manifest themselves as abrupt variations in our analysis (rather than being smoothed over time as they would be in a full data assimilation system).
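A minimal sketch of how the merged value and its uncertainty vary with availability is given below. It assumes independent errors and inverse-variance weights, for which the merged error variance is the reciprocal of the summed inverse variances; the array layout (products by time, NaN for missing) and all numbers are hypothetical.

```python
import numpy as np

def merge_series(products, err_var):
    """Merge products (n_products x n_times, NaN = missing).

    Returns the merged series and its error standard deviation per time step.
    """
    products = np.asarray(products, dtype=float)
    err_var = np.asarray(err_var, dtype=float)[:, None]
    present = ~np.isnan(products)
    inv = np.where(present, 1.0 / err_var, 0.0)    # zero weight if missing
    total = inv.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        merged = (np.where(present, products, 0.0) * inv).sum(axis=0) / total
        merged_std = np.sqrt(1.0 / total)
    merged[total == 0] = np.nan                    # nothing available
    merged_std[total == 0] = np.nan
    return merged, merged_std

nan = np.nan
products = [[0.5, nan, nan],     # hypothetical standardized anomalies
            [1.0, 1.0, nan],
            [-0.5, 0.0, nan]]
merged, merged_std = merge_series(products, err_var=[0.3, 0.3, 0.3])
```

In this example the second time step, with one product missing, has a larger error standard deviation than the first, and the third time step has no merged value at all.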

[34] Triple collocation-based error estimates are also dependent on the availability of the daily products, which influences the uncertainty of the sampled weekly composites. The more frequently a data set is available, the less noisy its weekly composites become. Noah weekly estimates are based on 168 separate hourly Noah soil moisture predictions generated each week (i.e., 24 estimates/day times 7 days), while ALEXI and LPRM have on average 3–4 estimates per week. Hence, Noah has better “weekly” temporal support than do the other products. However, it should be noted that poor support is simply one component of the total random error detected by triple collocation and therefore poses no particular challenge for our proposed merging strategy.

5. Results

5.1. Correlations and Weights

[35] ALEXI-, Noah-, and LPRM-based soil moisture anomaly estimates were used to calculate the error variances of each product in a triple collocation framework. As triple collocation-based error estimates require a mutual linear relationship between products, we have evaluated the linearity between the three products by analyzing their cross correlations (Figure 3) using anomaly products. Significant correlations between LPRM and ALEXI, and between LPRM and Noah over large parts of Eastern CONUS are not found (consistent with Hain et al. [2009]), which is partly due to the nonavailability of LPRM soil moisture estimates caused by the strong attenuation of the microwave signal over densely vegetated areas. On the other hand, there are strong cross correlations over areas of central CONUS, indicating a strong mutual linear relationship between various soil moisture products.
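The correlation masking can be sketched as follows. The specific significance test is not stated in this section, so the sketch assumes a large-sample test based on the Fisher z-transform; it also ignores temporal autocorrelation, which would reduce the effective sample size of weekly soil moisture series.

```python
import numpy as np

def masked_correlation(x, y, z_crit=1.96):
    """Pearson r over mutually available samples; NaN if not significant.

    Significance uses the Fisher z-transform as a large-sample approximation
    to a two-sided test at the 95% level (z_crit = 1.96).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = ~np.isnan(x) & ~np.isnan(y)       # keep pairs present in both series
    n = int(ok.sum())
    if n < 4:
        return np.nan
    r = np.corrcoef(x[ok], y[ok])[0, 1]
    z = np.arctanh(r) * np.sqrt(n - 3)
    return r if abs(z) > z_crit else np.nan

# Hypothetical anomaly series for one pixel.
rng = np.random.default_rng(1)
a = rng.normal(size=300)
b = a + rng.normal(size=300)
r_ab = masked_correlation(a, b)
```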

Figure 3.

Cross correlations (r) between weekly ALEXI, Noah, and LPRM composites during 2002–2010 using months April through October. Regions lacking significant correlation (at 95% confidence level) are plotted as white and the regions where either correlated product is consistently missing are given in dark blue.

[36] The triple collocation-based errors were computed using equations (22)–(24) and were used in the least squares framework to obtain weights using equations (7)–(9). The resulting weights shown in Figure 4 are intuitively consistent with the cross correlations of the products (Figure 3); the product that has the highest cross correlation with its pairs also has the largest estimated weights. For example, the correlations between Noah and ALEXI and between Noah and LPRM are higher than the correlations between ALEXI and LPRM over North Dakota; therefore, Noah weighting is relatively higher than both ALEXI and LPRM over this area (Figure 4, top row). Similarly, the correlations between LPRM and ALEXI and between LPRM and Noah are higher than the correlations between ALEXI and Noah over Montana; therefore, the optimal weighting applied to LPRM retrievals is higher than the weighting of ALEXI and Noah over this area. In general, ALEXI performs better over southern CONUS, which can be attributed to the lower temporal coverage of ALEXI over northern CONUS due to clouds [Hain et al., 2011].
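The link between cross correlations and error estimates can be illustrated with a covariance-based formulation of triple collocation. This is a sketch under stated assumptions: it returns error variances in each product's own scale and may differ in detail from the error equations used in this study, which involve a linear rescaling step. The function name and synthetic data are hypothetical; the 100-triplet threshold follows the text.

```python
import numpy as np

def triple_collocation(x, y, z, min_samples=100):
    """Covariance-based triple collocation error variances.

    Assumes mutually independent errors and linear relations to the truth.
    Returns None when fewer than `min_samples` mutual triplets exist.
    """
    data = np.vstack([x, y, z]).astype(float)
    ok = ~np.isnan(data).any(axis=0)        # keep mutually available triplets
    if ok.sum() < min_samples:
        return None
    c = np.cov(data[:, ok])
    var_x = c[0, 0] - c[0, 1] * c[0, 2] / c[1, 2]
    var_y = c[1, 1] - c[0, 1] * c[1, 2] / c[0, 2]
    var_z = c[2, 2] - c[0, 2] * c[1, 2] / c[0, 1]
    return np.array([var_x, var_y, var_z])

# Synthetic check with known error standard deviations 0.3, 0.5, and 0.7.
rng = np.random.default_rng(0)
truth = rng.normal(size=5000)
x = truth + 0.3 * rng.normal(size=5000)
y = truth + 0.5 * rng.normal(size=5000)
z = truth + 0.7 * rng.normal(size=5000)
err_var = triple_collocation(x, y, z)
```

In the synthetic case the product with the smallest error variance is also the one with the highest correlations with the other two, which mirrors the behavior described above.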

Figure 4.

Weights of soil moisture estimates obtained from triple collocation. All four rows used the same ALEXI and Noah products in the triple collocation analysis. The first row used the native LPRM surface soil moisture product, whereas the second to fourth rows used exponentially filtered LPRM-based root-zone soil moisture products with characteristic time lengths of 4, 7, and 14 days, respectively. White areas correspond to equal 1/3 weighting where triple collocation results are unavailable.

[37] This study focused on the warm season mainly because agricultural drought is associated with the warm season over the study area; however, it is possible to perform the methodology for any time period using various data sets. In general, we may expect remote sensing-based weekly soil moisture averages retrieved during winter to have higher sampling errors than estimates retrieved during summer due to larger data gaps (both temporally and spatially) partly caused by snow and frozen soil conditions. Hence, a single set of weights for the entire year may not reflect the error characteristics as well as monthly derived weights. Unfortunately, the estimation of monthly weights likely requires longer time series than are currently available.

5.2. Merged Estimate and Station Data

[38] All subsequent merging results are based on the case of no LPRM smoothing (i.e., the top row in Figure 4). For the merging methodology, the weights in Figure 4 are used only when all three data sets are available; for missing days, weights were calculated using the error estimates of the available days. Parent products (ALEXI, Noah, and LPRM), the merged estimate (merged realization using least squares), and the uncertainty of the merged estimate for the 19th week (7–13 May) of 2007 are shown in Figure 5. For this particular week, the standard deviation of the error estimate is around 0.40 (unitless as all products are standardized anomalies), and the soil moisture anomalies range between −2.6 and +2.7 standard deviations around the climatology of the given local pixel. Note that the merged product is a standardized anomaly product that should generally range between about −3 and +3. However, we emphasize that it can easily be linearly transformed into an absolute value product by using the statistics (i.e., mean and standard deviation) of any reference data set. In fact, such linear transformations are very common in hydrological applications (e.g., variance matching and cumulative distribution function matching), particularly in land data assimilation studies [e.g., Reichle and Koster, 2004].
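The linear transformation mentioned above amounts to a single line of arithmetic. In the following sketch the reference mean and standard deviation are hypothetical values for one pixel and week, not statistics from any data set used in this study.

```python
import numpy as np

def anomaly_to_absolute(z, ref_mean, ref_std):
    """Map standardized anomalies onto the scale of a reference data set."""
    return ref_mean + np.asarray(z, dtype=float) * ref_std

# Hypothetical reference climatology (volumetric soil moisture, m3/m3).
merged_anomaly = np.array([-2.6, 0.0, 2.7])
absolute = anomaly_to_absolute(merged_anomaly, ref_mean=0.25, ref_std=0.04)
```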

Figure 5.

Example weekly composites of ALEXI, Noah, LPRM, merged soil moisture, and its uncertainty estimates for the 19th week of 2007. Soil moisture estimates and the error of the merged product are presented in terms of standard normal deviates. White areas in the error image correspond to areas where triple collocation results are unavailable.

[39] Time series of the parent products, the merged estimate, and the uncertainty of the merged estimate are shown together with data from two individual MESONET and SCAN stations in Figure 6. The weights of the parent products are similar at these station points; hence, the merged estimates fall between the three parent products without closely following any one in particular. On average, the station data (both MESONET and SCAN) fell between the inline image lines 82% of the time (whereas we expect around 95%), indicating that the triple collocation-based errors slightly underestimate the uncertainty of the products and/or that the difference is due to the representativeness errors of the station data (Figure 6). Average station data correlations with the parent products and the merged estimate are summarized in Table 1; the significance of these correlations, the correlation comparisons of parent products, and the merged estimate are given in Table 2. On average, parent products are better correlated with the MESONET data than with the SCAN data (first three rows of Table 1). The number of stations that have significant correlations with the parent products and the merged estimate is higher for the MESONET data than for the SCAN data (first four rows of Table 2). The merged estimates are better correlated with the station data than the individual parent products (first four rows of Table 1), particularly better than both ALEXI and LPRM (fifth and seventh rows of Table 2), implying that the merged product is more accurate than its parent products individually. Although, on average, the merged estimate has better correlation with the MESONET (but not SCAN) than the best correlation of the parent products, the improvement is not significant for the majority of the stations (rows 11 and 12 of Table 2).
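The coverage statistic quoted above can be computed as in the following sketch. It assumes that the interval in question spans ±1.96 error standard deviations around the merged estimate, which corresponds to the expected 95% coverage under Gaussian errors; the interval definition, function name, and numbers are assumptions.

```python
import numpy as np

def interval_coverage(station, merged, merged_std, k=1.96):
    """Fraction of station values inside merged +/- k * merged_std."""
    station, merged, merged_std = (np.asarray(a, dtype=float)
                                   for a in (station, merged, merged_std))
    # Only weeks where all three quantities exist are counted.
    ok = ~np.isnan(station) & ~np.isnan(merged) & ~np.isnan(merged_std)
    inside = np.abs(station[ok] - merged[ok]) <= k * merged_std[ok]
    return inside.mean()

# Hypothetical standardized anomalies for four weeks (one station value missing).
coverage = interval_coverage(station=[0.0, 0.5, 2.0, np.nan],
                             merged=[0.0, 0.0, 0.0, 0.0],
                             merged_std=[0.4, 0.4, 0.4, 0.4])
```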

Figure 6.

Weekly soil moisture composite time series in terms of standard normal deviates for the Apache (MESONET) station (Figure 6, top) and the crossroads (SCAN) station (Figure 6, bottom). ALEXI, Noah, and LPRM values are obtained from pixels that include the available station. Brown lines indicate the inline image confidence interval of the merged product calculated using the uncertainty estimates obtained from equation (5) or (10).

Table 1. Parent Products (ALEXI, Noah, and LPRM), Merged Estimate, and Station Data (MESONET or SCAN) Cross Correlations With the Station Data(a)

Product | Surface | Veg. Adj. | Root | Surface | Veg. Adj. | Root
MESONET or SCAN (Surface) | 1.00 | 0.91 | 0.37 | 1.00 | 0.91 | 0.67
MESONET or SCAN (Veg. adj.) | 0.91 | 1.00 | 0.60 | 0.91 | 1.00 | 0.78
MESONET or SCAN (Root) | 0.37 | 0.60 | 1.00 | 0.67 | 0.78 | 1.00

(a) Three layers of station soil moisture data are considered: surface, vegetation adjusted, and root zone. NAIVE refers to the merged product obtained by giving equal weight to each parent product.

Table 2. Results of Product Versus Ground-Data Cross-Correlation Analysis for Various Scenarios(a)

Scenario | Product | MESONET Total | Neg | Non | Pos | SCAN Total | Neg | Non | Pos
Correlations significantly different than 0 | ALEXI | 50 | 0 | 2 | 48 | 44 | 0 | 5 | 39
MERGED estimate correlations better than individual products (no significance test) | ALEXI | 50 | 5 | - | 45 | 44 | 4 | - | 40
NAIVE estimate correlations better than individual products (no significance test) | ALEXI | 50 | 3 | - | 47 | 44 | 3 | - | 41
MERGED best significantly | ALL | 50 | 0 | 49 | 1 | 44 | 0 | 44 | 0
MERGED best | ALL | 50 | 0 | 19 | 31 | 44 | 0 | 23 | 21
NAIVE best significantly | ALL | 50 | 0 | 47 | 3 | 44 | 0 | 43 | 1
NAIVE best | ALL | 50 | 0 | 18 | 32 | 44 | 0 | 27 | 17

(a) “Total” refers to the number of ground stations considered. “Neg” and “Pos” refer to statistically significant negative and positive results, respectively, for the scenarios given in the left column, and “Non” refers to neither a positive result nor a negative result. For the significance tests, a 95% confidence level is used.

[40] Here the station-based vegetation-adjusted soil moisture is better correlated with the surface than root-zone soil moisture, while we expect vegetation-adjusted soil moisture to converge to the surface soil moisture over nonvegetated areas (Table 1). In general the vegetation-adjusted product is highly related to both surface and the root-zone soil moisture products, while the relative magnitude of these relations is driven by the vegetation cover fraction of the area of interest. Given the average fraction of vegetation over the study area is 0.37 (Figure 1), it is expected that the vegetation-adjusted soil moisture will on average be more strongly associated with the surface than with the root zone in this experiment.

[41] The above results were obtained by calculating the anomalies using a weekly sampling window (see section 4.3). Sampling errors in the mean and standard deviation estimates that were used to create these anomalies could potentially degrade the quality of results. Therefore, we repeated this analysis using a longer 5 week (versus 1 week) sampling window centered on the week-of-interest to calculate Noah, ALEXI, and LPRM anomalies. While a longer sampling window decreases sampling errors in estimated anomalies, it also obstructs the effective characterization of short-term soil moisture anomalies and decreases the number of effective degrees of freedom in the anomaly time series (by inducing temporal autocorrelation). Since results from this sensitivity analysis (not shown) suggest that the negative impacts of a longer sampling window outweigh the positives (i.e., all cross correlations referred to in Table 2 are degraded), we kept the standardization process as described in section 4.3.
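The two sampling windows can be compared with a short sketch. It assumes that the weekly composites are arranged as a years-by-weeks array, that the climatological mean and standard deviation for a given week are pooled over all years and over the weeks inside the window, and that the window is simply clipped at the edges of the season; these choices are assumptions and may differ from the standardization procedure of section 4.3.

```python
import numpy as np

def standardized_anomaly(weekly, half_window=0):
    """Standardize a (n_years, n_weeks) array week by week.

    half_window = 0 uses only the week of interest (1 week window);
    half_window = 2 pools two neighbouring weeks on each side (5 week window).
    """
    weekly = np.asarray(weekly, dtype=float)
    n_weeks = weekly.shape[1]
    out = np.full_like(weekly, np.nan)
    for w in range(n_weeks):
        lo, hi = max(0, w - half_window), min(n_weeks, w + half_window + 1)
        sample = weekly[:, lo:hi]                  # all years, windowed weeks
        mean, std = np.nanmean(sample), np.nanstd(sample, ddof=1)
        out[:, w] = (weekly[:, w] - mean) / std
    return out

# Hypothetical 9 year record with 30 warm-season weeks per year.
rng = np.random.default_rng(2)
weekly = rng.normal(loc=0.25, scale=0.05, size=(9, 30))
anom_1wk = standardized_anomaly(weekly, half_window=0)
anom_5wk = standardized_anomaly(weekly, half_window=2)
```

With a 1 week window each week is standardized from only nine values (one per year), which is the source of the sampling error discussed above; the 5 week window pools up to 45 values per week.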

5.3. Comparisons to Naive Merging

[42] Similar to the triple collocation-based merging scheme, naive merging also leads to an integrated product that was generally better than any of its three parent products in isolation (Table 2). However, the triple collocation-based merged estimate did not generally lead to an integrated product that was demonstrably superior to naive merging (i.e., merging with equal weighting) (Table 2). Here we stress that the analyses in Table 2 exclude stations where root-zone station-based soil moisture observations (primary reason) and triple collocation error estimates are not available. This indicates that the lack of difference between the triple collocation results and naive (i.e., equal) weighting results is due to the approximate equal weighting by triple collocation (and not a lack of data availability). Potential reasons for the lack of significant improvement in our merged product versus the baseline parent and the naively merged products include (1) station data are point data and may have high representativeness errors [Ryu and Famiglietti, 2005; Miralles et al., 2010; Cosh et al., 2006], and/or (2) triple collocation-based errors may not be optimal due to inadequate mutually available data (limited temporal extent of parent products), and/or (3) the weights are optimal, but the parent products may have similar skills and therefore merging them in a naive way produces estimates that are only marginally different from the optimally merged estimates obtained via triple collocation.
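Reason (3) can be made concrete with a small numerical sketch. It assumes independent errors, so that the error variance of a weighted mean is the sum of squared weights times the error variances, and it uses hypothetical error variances rather than values estimated in this study.

```python
import numpy as np

def merged_error_variance(weights, err_var):
    """Error variance of a weighted mean of products with independent errors."""
    weights = np.asarray(weights, dtype=float)
    err_var = np.asarray(err_var, dtype=float)
    return np.sum(weights**2 * err_var)

def optimal_weights(err_var):
    """Inverse error-variance weights (assumed least squares solution)."""
    inv = 1.0 / np.asarray(err_var, dtype=float)
    return inv / inv.sum()

naive = np.full(3, 1.0 / 3.0)
similar = np.array([0.30, 0.33, 0.36])       # hypothetical, similar skill
contrasting = np.array([0.10, 0.40, 0.90])   # hypothetical, contrasting skill

results = {}
for name, ev in [("similar", similar), ("contrasting", contrasting)]:
    v_opt = merged_error_variance(optimal_weights(ev), ev)
    v_naive = merged_error_variance(naive, ev)
    results[name] = (v_opt, v_naive)
```

For the similar-skill case the two merged error variances differ by less than 1%, whereas for the contrasting case the naive variance is roughly twice the optimal one.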

[43] In particular, the station observations are point data and thus very susceptible to representativeness errors, and the weights obtained through triple collocation are very sensitive to the number of mutually available data. It is our experience that the number of mutually available triplets in this study may not be sufficient for highly accurate triple collocation estimates on weekly or monthly time scales [Zwieback et al., 2012]. However, as longer time series become available through remote sensing techniques and modeling, and as improved station data (with smaller representativeness errors via better selection of station and/or sensor locations) are collected, it is expected that the merged estimates will yield larger improvements over the parent products.

[44] The difference between the optimal solution and the naive method was also evaluated by investigating the sensitivity of the optimal solution to data availability and averaging. Specifically, the triple collocation-based weights and the cross correlations for various averaging window lengths (weekly or monthly) were calculated (Table 3) to evaluate the sensitivity of derived optimal weights to aggregation period and retrieval availability. To do this, the daily data were averaged into either weekly or monthly composites by using all the available daily data for averaging (i.e., the “all available scenario”) or using only the days when all three products are available (i.e., the “mutually available scenario”). Applying the mutually available scenario guarantees that equal numbers of daily products are used in weekly or monthly composites analyzed via triple collocation. Later, the weights and the correlations were spatially averaged into a single value, for each scenario separately, by filtering out the areas that do not have reliable triple collocation estimates (for weight averages) or have insignificant correlations (for correlation averages). In general, the differences in product weights were higher than the differences between product cross correlations for the weekly all available scenario, while the weight differences were much smaller for the weekly mutual scenario and both monthly scenarios (Table 3). This implies that the weighting favors products with higher temporal availability (i.e., the model) for weekly scenarios, but the effect of this retrieval frequency is reduced when data sets are averaged over longer time periods. This reduced difference in weights and correlations can also explain the similarity between the performance of merged products based on triple collocation and naive weighting (i.e., naive merging is equal to having 0.33 weights; hence, the smaller the difference between the weights of the products, the more similar the result is to naive merging). Since the skills of the parent products are very similar, the naive averaging approach simply follows the optimal solution obtained via triple collocation.
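The two compositing scenarios can be sketched as follows, assuming daily values are stored as a products-by-days array with NaN marking missing retrievals. The function name, the fixed 7 day blocks, and the example values are hypothetical.

```python
import numpy as np

def weekly_composites(daily, mutual_only=False):
    """Average daily values (n_products, n_days) into weekly composites.

    mutual_only=False : each product uses all of its available days
    mutual_only=True  : only days on which every product is available
    """
    daily = np.asarray(daily, dtype=float).copy()
    if mutual_only:
        missing_any = np.isnan(daily).any(axis=0)
        daily[:, missing_any] = np.nan           # drop days missing anywhere
    n_products, n_days = daily.shape
    n_weeks = n_days // 7
    blocks = daily[:, :n_weeks * 7].reshape(n_products, n_weeks, 7)
    counts = (~np.isnan(blocks)).sum(axis=2)
    sums = np.nansum(blocks, axis=2)
    # Weeks without any valid day stay missing.
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

nan = np.nan
daily = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],    # model-like, always available
         [nan, 2.0, nan, 4.0, nan, nan, nan]]    # retrieval-like, 2 days/week
all_available = weekly_composites(daily)
mutually_available = weekly_composites(daily, mutual_only=True)
```

Under the all available scenario the first product is averaged over seven days and the second over two, whereas under the mutually available scenario both composites rest on the same two days.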

Table 3. Mean Weights and Cross Correlations Over CONUS for Different Data Compositing Strategies(a)

Weights
Mutually available weekly | 0.27 | 0.33 | 0.41
Mutually available monthly | 0.33 | 0.29 | 0.37
All available weekly | 0.24 | 0.38 | 0.37
All available monthly | 0.31 | 0.34 | 0.35

Cross correlations
Mutually available weekly | 0.38 | 0.40 | 0.43
Mutually available monthly | 0.44 | 0.45 | 0.45
All available weekly | 0.40 | 0.38 | 0.44
All available monthly | 0.46 | 0.44 | 0.46

(a) Mutually and all available scenarios refer to the parent products that are used in data standardization (section 4.3).

5.4. Vertical Support

[45] As discussed above, the final merged soil moisture estimate is a mixed product that reflects the soil moisture layer that is actively interacting with the atmosphere via evapotranspiration. Hence, using the surface-only microwave remote sensing product over sparsely vegetated areas is consistent with the properties of the mixed product. However, over vegetated areas this mixed vertical support is inconsistent with microwave-based soil moisture retrievals, which are strictly limited to the near-surface layer (surface to 3 cm). Consequently, over densely vegetated areas there is a potential inconsistency in the vertical support of LPRM soil moisture retrievals relative to ALEXI and Noah products (see above). A series of analyses has been performed to test the effect of using the surface-only microwave remote sensing product on triple collocation results over vegetated areas.

[46] Since the parameter of interest is the vegetation-adjusted soil moisture value (rather than root-zone soil moisture), we have narrowed our focus to this parameter. High correlations between surface and vegetation-adjusted soil moisture values at weekly time scales over densely vegetated areas imply a strong linear relation between the surface and the vegetation-adjusted soil moisture simulations, similar to the triple collocation equations (equations (16)–(18)) where we assume a linear relation between each data set and the truth. Therefore the applicability of these equations to soil moisture products obtained at different vertical depths is determined by the linearity of the relationship between surface and vegetation-adjusted soil moisture. The depth variations pose a problem to this application only if they manifest themselves in a nonlinear or a hysteretic relationship between products. Conversely, if the relationship is linear, it simply folds into the linear rescaling step which underlies the application of triple collocation. Therefore the impact of vertical consistency (between LPRM and Noah/ALEXI-based soil moisture products) will hinge on the degree to which soil moisture estimates at various depths can be linearly related.
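The folding argument can be written out in a short derivation. The symbols below are illustrative and are not taken from the numbered equations of this paper: θ_s and θ_v denote the true surface and vegetation-adjusted soil moisture, ε is the LPRM retrieval error, and η is the residual of an assumed linear relation between the two depths.

```latex
\begin{align}
  x_{\mathrm{LPRM}} &= \alpha + \beta\,\theta_{s} + \varepsilon, \\
  \theta_{s} &= a + b\,\theta_{v} + \eta, \\
  x_{\mathrm{LPRM}} &= (\alpha + \beta a) + (\beta b)\,\theta_{v}
                       + (\beta\eta + \varepsilon).
\end{align}
```

The last line has the same linear form as the first, now with respect to θ_v. Under these assumptions the depth mismatch only changes the intercept and slope, which are absorbed by linear rescaling, and adds βη to the random error attributed to LPRM, provided η is uncorrelated with the errors of the other two products.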

[47] Correlations were computed between the surface and vegetation-adjusted soil moisture values from both Noah and CLM (Figure 7) and both MESONET and SCAN station data (Table 4). Very high correlations (i.e., linear relationships) were found between the surface and the vegetation-adjusted station-based soil moisture data from station-based analysis (in Table 4, 0.91 for both MESONET and SCAN data) and from model simulations (in Table 4, 0.96 correlations over CONUS for Noah and CLM, respectively). This suggests that at weekly time scales vertical inconsistencies in support can be effectively resolved via linear rescaling.

Figure 7.

Weekly composite correlations between surface and vegetation-adjusted Noah and CLM soil moisture estimates.

Table 4. Noah, CLM, and Station Cross Correlations Between Surface and Vegetation-Adjusted Weekly Soil Moisture Composite Values at Multiple Locations(a)

Surface–Veg. Adj. | MESONET Stations | SCAN Stations | CONUS | CONUS-East | CONUS-West

(a) CONUS-East lies between 88° and 75° W and 32° and 41° N, and CONUS-West between 116° and 103° W and 29° and 36° N.


[48] Another way to test the potential impact of the surface-only LPRM data product is to transform it into an approximated root-zone product using a low-pass filter. In general, the differences between triple collocation analyses that use different LPRM products (corresponding to various amounts of temporal smoothing via equation (29)) are minimal (Figure 4), suggesting that the nonlinearities due to vertical support differences do not have a major impact on estimated weights (supporting the correlation results given above).

6. Discussion and Conclusions

[49] Model error covariance estimates in many hydrological data assimilation applications are obtained through random perturbation of forcings and states without any rigorous justification for the magnitude of these perturbations [Reichle et al., 2008; Crow and van den Berg, 2010]. Since ensemble spread tends to be a stronger function of forcing spread than initial condition spread [Yilmaz et al., 2012], this results in a merging scheme that is dependent on the user to accurately characterize modeling errors which, in turn, determine the relative weight applied to model background and observations at update times.

[50] Here we introduce a methodology that is completely objective and does not make any arbitrary assumptions concerning the error characteristics of its input data sets. Specifically, error variances of three independently estimated soil moisture data sets are obtained using a triple collocation method and different soil moisture products are merged in an ordinary least squares framework. With the objective analysis introduced here, we are also able to estimate the uncertainty of the merged soil moisture as a separate product, which could be particularly useful for applications that require information concerning the product reliability.

[51] Disadvantages of this framework when compared to traditional data assimilation techniques include the limitation that estimated model errors are assumed to be constant in time and corrective information obtained via the merger is not temporally propagated forward in time (as in sequential filtering). On the other hand, the triple collocation/least squares approach is simple, highly transparent, and based on an objective calculation of relative errors in various soil moisture estimates. As discussed above, it is likely possible (although not trivial) for triple collocation-based estimates of modeling error to be incorporated into a full data assimilation system. This analysis can be viewed as a preliminary step in that direction.

[52] In addition, there are three necessary requirements for the triple collocation analysis at the center of our approach: the independence of errors, the availability of long-enough time series, and the mutual linearity of products. The first assumption can be justified for many geophysical variables (e.g., soil moisture, soil temperature, and potential evaporation) as there are numerous independent satellite- and model-based estimates of each. The second assumption (adequate data-series length) is required to minimize sampling errors in triple collocation estimates. While the 9 year data sets used here are adequate for this preliminary analysis, it should be noted that better triple collocation results may be obtained if longer data sets are used. Additionally, the availability of longer time series will also enable estimation of separate sets of weights for seasonal or subseasonal time scales to partly address the issue of nonstationary weighting of products. The third assumption can be easily checked and the linearity can be justified via simple correlation calculations, as was done in this study. Nevertheless, the methodology is very flexible and can be applied in many hydrological applications given the availability of appropriate input data sets.

[53] In this study we have applied a triple collocation-based merging strategy to integrate soil moisture anomaly information acquired from microwave remote sensing, thermal remote sensing, and land surface modeling. The final merged product can be used as a standalone product, particularly for agricultural drought monitoring applications. The approach also provides the ability to estimate uncertainties associated with the merged estimate. When compared to ground-based soil moisture observations, the merged product improves upon the accuracy of its three parent products but is not superior to merged products obtained using naive equal weighting. Given the small differences found between cross correlations of products and between weights of products, the lack of difference between our results and naive weighting appears attributable to the marginal skill differences that exist between ALEXI-, Noah-, and LPRM-based soil moisture estimates over the CONUS. We expect the differences between the skills of triple collocation- and naive method-based merged products to be larger over study areas where the differences between the skills of the parent products are larger.


[54] We thank Jeffrey Basara of the University of Oklahoma and Michael Cosh of the U.S. Department of Agriculture for the MESONET and SCAN soil moisture data sets. We also thank the anonymous reviewers for their constructive comments, which led to numerous clarifications in the final version of the manuscript. Research was partially supported by NASA Terrestrial Hydrology Program grant NNX06AG07G entitled “Monitoring of Root-Zone Soil Moisture Using Multi-Frequency Observations of Surface Soil Moisture and Evapotranspiration.” U.S. Department of Agriculture is an equal opportunity provider and employer.