Bias reduction in short records of satellite soil moisture



[1] Although surface soil moisture data from different sources (satellite retrievals, ground measurements, and land model integrations of observed meteorological forcing data) have been shown to contain consistent and useful information in their seasonal cycle and anomaly signals, they typically exhibit very different mean values and variability. These biases pose a severe obstacle to exploiting the useful information contained in satellite retrievals through data assimilation. A simple method of bias removal is to match the cumulative distribution functions (cdf) of the satellite and model data. However, accurate cdf estimation typically requires a long record of satellite data. We demonstrate here that by using spatial sampling with a 2 degree moving window we can obtain local statistics based on a one-year satellite record that are a good approximation to those that would be derived from a much longer time series. This result should increase the usefulness of relatively short satellite data records.

1. Motivation

[2] Long-term in situ measurements of soil moisture are limited to parts of Eurasia and small sections of North America [Robock et al., 2000]. To derive global soil moisture distributions, as might be needed for the initialization of seasonal forecast systems [Koster et al., 2004], two alternative data sources are often considered. First, useful global distributions of soil moisture can be produced by a land surface model when forced with observed precipitation, radiation, and other meteorological data [Rodell et al., 2003]. Second, satellite sensors can provide passive C-band (6.6 GHz) or L-band (1.4 GHz) radiance measurements that can be interpreted in terms of surface soil moisture content [Owe et al., 2001; Jackson et al., 2002]. However, the model-based product is subject to the many limitations of the model used, to errors in the specification of vegetation and soil parameters, and to errors in the forcing data. The satellite data, for their part, are not available everywhere and not available continuously. Also, satellite retrievals represent only a shallow near-surface layer and do not provide critical information about soil moisture in the root zone.

[3] Many have argued that a land assimilation system that merges satellite retrievals and model soil moisture will provide optimal global estimates of the state of the land surface. In a data assimilation system, a model-generated soil moisture is “corrected” toward an observational estimate, with the degree of correction determined by the levels of error associated with each. Idealized analyses with large-scale assimilation systems, using synthetic (model-generated) observational data, demonstrate the potential of the approach [Walker and Houser, 2001; Reichle and Koster, 2003].

[4] Synthetic data studies, however, avoid a fundamental difficulty associated with satellite data assimilation: the strong biases that exist between satellite-based and model-based soil moisture estimates [Reichle et al., 2004]. The top panel of Figure 1 shows, for example, the difference between the mean near-surface soil moisture field retrieved from the C-band Scanning Multichannel Microwave Radiometer (SMMR) over the period 1979–1987 [De Jeu, 2003] and that simulated by the NASA Catchment land surface model [Koster et al., 2000] for the same period. Despite global coverage of the satellite, soil moisture retrievals are not available in areas that contain frozen soil, a significant fraction of surface water, or dense vegetation. As for the model, it was forced with reanalysis data that have been corrected by observations as much as possible [Berg et al., 2003]. Precipitation – arguably the most critical input for accurate soil moisture modeling – is based on a merged product of satellite and gauge data from the Global Precipitation Climatology Project (GPCP, Version 2) [Huffman et al., 1997]. Model soil moisture data have been generated at the exact times and locations of SMMR retrievals, to ensure maximum compatibility of the two data sets. The model's computational units are irregularly shaped catchments (or watersheds) with an average area of about 2500 km² [Reichle et al., 2004].

Figure 1.

Difference in 1979–1987 (top) mean and (bottom) std of SMMR soil moisture retrievals and model soil moisture [m³ m⁻³].

[5] Figure 1 shows that across the globe, SMMR retrievals are typically wetter than model soil moisture, except in the eastern half of North America, northern Eurasia, and the Sahel. The bottom panel of Figure 1 shows the corresponding differences in the standard deviation (std) of the instantaneous fields, that is, the bias in the std. SMMR retrievals exhibit more variability than model soil moisture across North America, in northern Eurasia, southern Africa, and southern Australia. Elsewhere, particularly in India, SMMR retrievals are less variable in time than model soil moisture. (Note that Reichle et al. [2004] used monthly data as opposed to instantaneous data. Consequently, the time series std in the present paper is about twice as large.)

[6] The satellite and model data clearly differ in their statistical moments. These biases are not uniform but are spatially distributed with complex patterns and with magnitudes on the order of the dynamic range of the signal. Furthermore, the relative accuracy of the two data sets cannot be objectively determined. Reichle et al. [2004] demonstrate that neither is clearly superior when compared to the limited array of in situ point observations. Such bias is unavoidable, both now and in the foreseeable future. Even if the satellite retrievals could be considered unbiased relative to nature, simulated soil moisture contents reflect the many necessary simplifications imposed in the land surface model and should arguably be considered model-specific “indices of wetness” rather than quantities that can be measured in the field [Koster and Milly, 1997]. (See also Entin et al. [1999] for a strong demonstration of the model-specific nature of simulated soil moisture.) To merge the satellite observations successfully with the model data, biases across the statistical moments must be quantified and corrected. In effect, the satellite-based moisture contents must be converted (“scaled”) into moisture contents consistent with the land surface model used.

[7] Herein lies a major problem. In order to correct the biases, the temporal statistical moments of both the simulated soil moisture and the satellite-derived soil moisture must be well-established, and without further assumptions, this would require many years of data for each. While such data exist for the model-generated estimates, the passive C-band Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) became operational only in June 2002. Two passive L-band sensors, the Soil Moisture and Ocean Salinity (SMOS) mission [Kerr et al., 2001] and the Hydrosphere State (HYDROS) mission [Entekhabi et al., 2004], are still in their planning stages. Moreover, the expected lifetime of these sensors is only a few years. Given the tremendous investment placed in the satellites, researchers are pressured to use the satellite products in a data assimilation system as soon as they are produced.

[8] We thus require a strategy for making use of a short record of satellite data under the constraint that we do not have global estimates of the data's temporal statistical moments. (Knowledge of the data's uncertainty does not ameliorate the problem, since we also do not know the true statistical moments.) Here, we present a viable strategy involving the ergodic substitution of variability in space for variability in time. To demonstrate the strategy's effectiveness, we use a single year of the SMMR soil moisture record to determine scaling parameters that convert an instantaneous field of SMMR retrievals into a soil moisture field consistent with the land surface model used. These scaling parameters are then applied to the full 9 years of SMMR data. When the statistical moments of the 9 years of scaled satellite data are compared to those of the simulated soil moisture fields, the biases in the mean and std are seen to be much smaller than those in Figure 1, indicating that the scaling, based on a single year of data, was a success. These scaled data can be merged more reliably with land model simulations in a data assimilation system.

2. Approach

[9] Our strategy for bias removal is to match the cumulative distribution function (cdf) of the satellite retrievals to the cdf of the model soil moisture. Similar cdf matching techniques have been used, for example, to establish reflectivity-rainfall relationships for calibration of radar or satellite observations of precipitation [Atlas et al., 1990; Anagnostou et al., 1999] and for long-range hydrologic forecasting [Wood et al., 2002]. Our approach is illustrated in Figure 2, which shows cdf's of surface soil moisture at a particular location in the Northern Great Plains (46N, 100W). At this location, SMMR retrievals are considerably wetter and exhibit more variability than model soil moisture. The scaled satellite retrieval x′ is given by the solution to

$$\mathrm{cdf}_m(x') = \mathrm{cdf}_s(x) \qquad (1)$$

where cdf_s and cdf_m denote the cdf's of the satellite and model soil moisture, respectively, and x is the unscaled satellite soil moisture. Since assimilation systems ingest instantaneous satellite retrievals at the local scale, equation (1) is solved at each location after estimating the corresponding local cdf's. The bold arrows in Figure 2 illustrate schematically how the unscaled satellite retrieval x is converted into the scaled retrieval x′ (using the “ideal” cdf estimated from 1979–1987 SMMR retrievals). Note that cdf matching corrects all moments of the distribution function regardless of its shape, subject to statistical errors associated with a limited sample size. In practice, we can expect meaningful estimates only for the first few moments, and limit ourselves to analyzing the mean, std, and skewness.
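As an illustration only, the scaling in equation (1) can be sketched with empirical cdf's built from sorted samples. The function name `cdf_match`, the plotting-position formula, and the use of linear interpolation are assumptions made for this sketch; the paper itself estimates the cdf's with finite-size bins and does not prescribe an implementation.

```python
import numpy as np

def cdf_match(x, sat_sample, model_sample):
    """Return x' such that cdf_m(x') = cdf_s(x), using empirical cdf's.

    x            : unscaled satellite retrievals to convert
    sat_sample   : satellite values used to estimate cdf_s
    model_sample : model values used to estimate cdf_m
    """
    sat_sorted = np.sort(np.asarray(sat_sample, dtype=float))
    model_sorted = np.sort(np.asarray(model_sample, dtype=float))

    # Assumed plotting positions (i - 0.5) / n for the sorted samples.
    p_sat = (np.arange(1, sat_sorted.size + 1) - 0.5) / sat_sorted.size
    p_model = (np.arange(1, model_sorted.size + 1) - 0.5) / model_sorted.size

    # Step 1: evaluate cdf_s at x (values outside the sample are clamped).
    p = np.interp(x, sat_sorted, p_sat)
    # Step 2: invert cdf_m at that probability to obtain x'.
    return np.interp(p, p_model, model_sorted)

# Synthetic example: the "satellite" sample is wetter and more variable.
rng = np.random.default_rng(0)
sat = 0.30 + 0.08 * rng.standard_normal(2000)
model = 0.20 + 0.04 * rng.standard_normal(2000)
scaled = cdf_match(sat, sat, model)
print(sat.mean(), scaled.mean(), model.mean())
```

Because the mapping is monotonic, it preserves the rank order of the retrievals and changes only their distribution, which is why all moments are adjusted at once.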

Figure 2.

Cdf estimates at 46N, 100W: (Squares) 1979–1987 SMMR retrievals, (Solid line, no marker) 1979–1987 model soil moisture, (Circles) 1979 only SMMR retrievals using a spatial sampling window of 2 degree radius (approximate cdf), (Stars) 1979–1987 SMMR retrievals scaled with approximate cdf.

[10] Our goal is to obtain an acceptable estimate of the cdf used for scaling from only the first year of SMMR data. In order to control statistical noise in the cdf estimate, we estimate the temporal statistics at a given site by using observations at neighboring locations that are within a chosen distance from the site. In other words, we apply a moving spatial sampling window to the computation of the statistics and implicitly assume some degree of ergodicity in the data. We then use this approximate estimate of the cdf (based on just one year of SMMR data) to solve equation (1) and obtain 9 years of scaled SMMR retrievals from the 9 years of unscaled SMMR data. Finally, we compare the statistics of the scaled data set to those of the model soil moisture. Note that the model cdf used for scaling is based on model soil moisture from 1979 to 1987.
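The moving spatial sampling window can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `pooled_sample`, the array layout (sites by time, with NaN marking missing retrievals), and the choice to measure distance directly in degrees of latitude and longitude are all assumptions.

```python
import numpy as np

def pooled_sample(lat0, lon0, lats, lons, series, radius_deg):
    """Pool retrievals from all sites within radius_deg of (lat0, lon0).

    lats, lons : 1-D arrays of site coordinates in degrees
    series     : 2-D array (n_sites, n_times), NaN where no retrieval exists
    """
    # Wrap longitude differences into [-180, 180) degrees.
    dlon = (lons - lon0 + 180.0) % 360.0 - 180.0
    # Assumed distance measure: Euclidean distance in degree space.
    dist = np.hypot(lats - lat0, dlon)
    values = series[dist <= radius_deg].ravel()
    # Drop missing retrievals before estimating the cdf.
    return values[~np.isnan(values)]

# Tiny synthetic example with four sites and three time steps.
lats = np.array([46.0, 46.0, 47.0, 50.0])
lons = np.array([-100.0, -99.0, -100.0, -100.0])
series = np.array([[0.1, 0.2, np.nan],
                   [0.3, 0.4, 0.5],
                   [0.2, np.nan, 0.1],
                   [0.9, 0.9, 0.9]])
print(pooled_sample(46.0, -100.0, lats, lons, series, 2.0).size)
```

The pooled values from one year would then stand in for the multi-year local sample when the satellite cdf is estimated.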

3. Results

[11] Robust estimation of statistics requires sufficient data. Our cutoff criterion for estimating the local cdf is that at least 100 measurements must be available within the spatial sampling window. Naturally, the degree of global coverage of cdf estimates obtained in this way increases rapidly with the size of the window, but so does the error associated with the ergodicity assumption. We are thus faced with a trade-off between coverage and error. To quantify this trade-off, we tried several spatial sampling windows with radii ranging from 0 to 5 degrees.

[12] Since the ergodicity error increases monotonically with the window size, a reasonable approach is to use the minimum window size for which the coverage of the approximate cdf estimates (obtained from one year of SMMR data) is almost complete relative to the coverage obtained when the cdf is estimated from 9 years of SMMR data without spatial sampling. For SMMR, this approach suggests that the optimal spatial sampling window has a radius of 2 degrees. The approximate SMMR cdf based on 1979 data only and using a 2 degree spatial sampling window is illustrated in Figure 2 for the representative location in the Northern Great Plains. The rough agreement with the full SMMR cdf is an indication of the validity of the ergodicity assumption. When 9 years of SMMR retrievals are scaled using this approximate cdf estimate, the cdf of the resulting scaled SMMR retrievals (also shown in Figure 2) is much closer to the model cdf than before scaling.
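The window-selection rule described above can be expressed as a short search over candidate radii. The sketch below is hypothetical: the function name, the dictionary layout, and in particular the `target` coverage fraction of 0.95 are assumptions, since the text says only that coverage should be "almost complete". The counts in the example are synthetic and are not SMMR statistics.

```python
import numpy as np

def smallest_adequate_radius(counts_by_radius, reference_covered,
                             min_count=100, target=0.95):
    """Return the smallest radius whose coverage reaches the target.

    counts_by_radius  : dict mapping radius (degrees) to an array holding the
                        number of one-year retrievals in the window per site
    reference_covered : boolean array, True where the multi-year cdf
                        (no spatial sampling) could be estimated
    min_count         : cutoff for estimating a local cdf (100 in the text)
    target            : assumed threshold for "almost complete" coverage
    """
    reference_covered = np.asarray(reference_covered, dtype=bool)
    for radius in sorted(counts_by_radius):
        covered = np.asarray(counts_by_radius[radius]) >= min_count
        fraction = np.sum(covered & reference_covered) / np.sum(reference_covered)
        if fraction >= target:
            return radius, fraction
    return None, 0.0  # no candidate radius reaches the target

# Synthetic counts for four sites at three candidate radii.
counts = {0: [30, 120, 10, 50], 1: [90, 300, 80, 150], 2: [250, 800, 200, 400]}
print(smallest_adequate_radius(counts, [True, True, True, True]))
```

Searching from the smallest radius upward reflects the argument that the ergodicity error grows with window size, so the first radius that meets the coverage target is preferred.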

[13] Figure 3 shows global maps of the biases obtained when the statistics of the scaled SMMR retrievals (using approximate cdf estimates) are compared to those of the model soil moisture. As in Figure 1, the biases in Figure 3 are computed for the period from 1979 to 1987. While there is some bias left, scaling with the approximate cdf based on just one year of satellite data clearly removes much of the bias seen in Figure 1. The biases after scaling depend only weakly on the particular year used for estimating the cdf. This is not surprising, given that the bias in the mean is much larger than the interannual variability. Globally averaged, the bias in the mean (or std; or skewness) is reduced by 80% (or 55%; or 25%) when only a single year of SMMR retrievals is used to estimate the cdf used for scaling. Since cdf estimation involves finite size bins, even scaling with the “ideal” cdf that is computed from the entire SMMR history does not completely eliminate the biases, particularly in the higher moments. In the ideal case, the bias in the mean (or std; or skewness) is reduced by 98% (or 90%; or 55%).
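One way to compute bias-reduction percentages of this kind is sketched below. The text does not state exactly how the global average is formed, so the choice here (the mean over sites of the absolute bias in each moment) is an assumption, as are the function names and the sites-by-time array layout. The data in the example are synthetic.

```python
import numpy as np

def site_moments(data):
    """Mean, std, and skewness of each site's time series.

    data : 2-D array (n_sites, n_times); returns an array of shape (3, n_sites)
    """
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=1)
    std = data.std(axis=1)
    skew = ((data - mean[:, None]) ** 3).mean(axis=1) / std ** 3
    return np.stack([mean, std, skew])

def bias_reduction_percent(unscaled, scaled, model):
    """Percent reduction in the site-averaged absolute bias of each moment."""
    ref = site_moments(model)
    before = np.abs(site_moments(unscaled) - ref).mean(axis=1)
    after = np.abs(site_moments(scaled) - ref).mean(axis=1)
    return 100.0 * (1.0 - after / before)

# Synthetic check: a "scaled" data set halfway between unscaled and model.
rng = np.random.default_rng(0)
model = rng.normal(0.20, 0.04, size=(5, 500))
unscaled = 0.30 + 0.02 * rng.gamma(2.0, 1.0, size=(5, 500))
halfway = 0.5 * (unscaled + model)
print(bias_reduction_percent(unscaled, halfway, model))
```

The three returned values correspond to the mean, std, and skewness, in that order.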

Figure 3.

Same as Figure 1 except that SMMR retrievals were scaled with an approximate cdf estimated from 1979 only SMMR data using a spatial sampling window (2 degree radius).

4. Conclusions

[14] We use the 9-year SMMR record to demonstrate that temporal sampling of SMMR soil moisture retrievals can be traded off against spatial sampling. Robust estimation of the statistics for bias removal via cdf matching was accomplished using only a one-year satellite record. When only one year of data is available and the cutoff criterion for computation of statistics is set to 100 data points, a reasonable approach is to estimate the cdf used for scaling by applying a spatial sampling window with a 2 degree radius. In this case, the global average bias in the mean of the scaled SMMR 9-year data set (relative to model soil moisture) is reduced by 80% when compared to the original bias of the unscaled SMMR retrievals. For the bias in the std (skewness), cdf matching permits bias reduction by 55% (25%). With our method, current and future satellite retrievals of soil moisture can be assimilated more confidently in near-real time using only a one-year climatology.

[15] Although differences in the spatial and temporal mean and variability between state-of-the-art land surface modeling systems are substantial, our method does not depend on the particular model used precisely because we scale the satellite retrievals to be consistent with the given model. Finally, AMSR-E and future sensors yield improved measurements of brightness temperatures compared to SMMR. Most importantly, AMSR-E offers higher sampling rates than SMMR (around 2.5 times higher spatial resolution and wider swath width), which may permit reducing the size of the spatial sampling window and hence the ergodicity error. Nevertheless, the retrievals used here are based on a state-of-the-art algorithm, as is the modeling system. Therefore, the underlying errors in the retrieval algorithm, the land surface model, and the surface meteorological forcing data are unlikely to change significantly in the near future. Our approach presents a valuable tool for the imminent operational use of AMSR-E and future soil moisture retrievals.


[16] This research was sponsored by NASA grant NRA-00-OES-07. We thank M. Owe and R. de Jeu for the SMMR retrievals and A. Berg and J. Famiglietti for the forcing data.