Land surface temperature plays an important role in surface processes and is a key input for physically based retrieval algorithms of soil moisture and evaporation. This study presents a framework for using independent estimates of land surface temperature from five microwave satellite sensors to improve the accuracy of land surface temperature output from a numerical weather prediction system in an off-line (postprocessing) analysis. First, structural differences in the timing and amplitude of the diurnal temperature signal were addressed. Then, satellite observations were assimilated into an auto-regressive error model formulated to estimate errors in the numerical weather prediction output. Errors in the daily minimum and in the diurnal amplitude were treated separately. The results provide new insights into the potential added benefit of preprocessing and off-line assimilation of microwave remote sensing-based and model-based temperature retrievals. It is shown that the satellite observations can be used to reduce errors in surface temperature, particularly during daytime hours. Preprocessing is responsible for the bulk of this reduction in temperature error; data assimilation further reduces the random temperature error by a few tenths of a kelvin, a 10% reduction in RMSE.
 Land surface temperature (LST) plays an important role in land surface processes and is a key input for physically based retrieval algorithms of important hydrological states and fluxes such as soil moisture and evaporation. For example, dedicated soil moisture satellite missions such as the Soil Moisture and Ocean Salinity (SMOS) and Soil Moisture Active/Passive (SMAP) missions rely partly on LST provided by Numerical Weather Prediction (NWP) centers.
 Unlike SMOS and SMAP, other remote sensing platforms carry instruments that can be used to estimate LST, either from thermal infrared (TIR) or from microwave Ka-band observations. For example, the Aqua satellite carries the Advanced Microwave Scanning Radiometer on EOS (AMSR-E), which includes a Ka-band channel that can provide estimates of LST for soil moisture retrieval schemes [Jackson, 1993; Owe et al., 2008]. On the same platform, the Moderate Resolution Imaging Spectroradiometer (MODIS) includes multiple TIR bands that are used to retrieve high-resolution land surface temperature products [Wan, 2008]. As a result, multiple independent estimates of land surface temperature can be obtained from existing sensors and satellites, often at several times during the day. TIR approaches typically have better spatial resolution, but their temporal coverage is limited by clouds. In this study, the emphasis is on temporal consistency rather than on spatial resolution; we therefore focus on microwave sensors to estimate surface temperature.
 In general, the errors associated with any method for measuring or modeling temperature are assumed to be composed of random and systematic components. Especially at the very surface, the systematic biases tend to have diurnal patterns. This is because surface temperature is characterized by a strong diurnal cycle that is driven by solar radiation and that depends, in both timing and amplitude, on the exact sensing depth and on the thermal properties of soil and vegetation [Van Wijk and de Vries, 1963]. The diurnally oscillating nature of the differences between LST estimates makes it difficult to combine time series obtained from different sources.
 Over the past 10 years, Kalman filter-based data assimilation techniques have been widely applied to mitigate random errors in land surface state variables such as soil moisture [Reichle, 2008]. Assimilation of surface temperature data has been shown to reduce random errors in modeled soil moisture [Lakshmi, 2000] and in the surface energy balance [van den Hurk et al., 2002]. On the other hand, the marginal improvement in soil moisture skill from assimilating soil temperature in addition to soil moisture is typically minimal [Yilmaz et al., 2011]. However, Kalman filter techniques with the specific aim of reducing the random error in temperature estimates have been applied in only a few studies [Bosilovich et al., 2007; Reichle et al., 2010]. The reported reduction in error from these approaches is relatively small, which is attributed to the difficulty of removing structural differences between model-forecast and remotely observed temperature. Reichle et al., for example, tested various bias mitigation techniques and found that a dynamic bias estimator performs best, provided it accounts for the diurnal cycle of the bias.
 The overall goal of this study is to develop an off-line method to use multiple satellite data sources to enhance the accuracy of LST output from NWP systems. Specifically, we aim to demonstrate the utility of microwave remote sensing data in adding skill to modeled surface temperature output. A secondary goal is to define quality control and preprocessing steps necessary for successful LST assimilation into land surface models.
 In a recent study, the accuracy of NWP data was analyzed in terms of 5 cm soil temperature [Holmes et al., 2012]. Of the sources considered, the Modern Era Retrospective Analysis for Research and Applications (MERRA [Rienecker et al., 2011]) was found to have several advantages that facilitate (off-line) temperature assimilation, specifically:
MERRA has hourly output, which reduces uncertainties when interpolating to the sampling times of satellite observations;
the timing of MERRA's surface temperature indicates a top layer with a very low heat capacity that should agree well with thermal infrared and microwave Ka-band sensing depths; and
the good performance of MERRA during the night indicates a high skill in modeling daily mean temperature.
For these reasons, we selected MERRA as the modeled temperature dataset in this study. The satellite temperature estimates are derived from three sensor systems, each including a microwave Ka-band (~37 GHz) radiometer, carried on five satellites (see Section 2.3).
 This study proposes a framework for merging satellite temperature observations with the hourly MERRA surface temperature output. A critical step in the process is the removal of systematic biases, in particular biases in the mean and in the timing and amplitude of the diurnal cycle. These biases are challenging to remove from temporally sparse and irregular satellite data; their mitigation is therefore addressed in detail (see Section 3.1). After the systematic biases have been minimized, an off-line Kalman filter-based data assimilation technique is used to reduce the random error in the scaled NWP temperature while preserving the continuity of the NWP record (Section 3.2).
 For calibration and validation, ground-based surface temperature observations from Fluxnet stations across the United States, Europe, and Israel are used. A further validation with Oklahoma Mesonet data assesses the improvement in temperature at 5 cm depth, which is important for soil moisture remote sensing.
2.1 Model-Based Land Surface Temperature
 The modeled temperature dataset is the MERRA reanalysis product of NASA's Global Modeling and Assimilation Office (GMAO) (http://gmao.gsfc.nasa.gov/research/merra, [Rienecker et al., 2011]). MERRA products are generated with Version 5.2.0 of the Goddard Earth Observing System Data Assimilation System (GEOS-5 DAS), with both the analysis and the model output at a spatial resolution of 0.5° latitude by 0.67° longitude and a 6-hourly analysis cycle. Two-dimensional diagnostics describing the radiative and physical state of the surface are available as hourly averages.
 Surface processes in MERRA are based on the NASA Catchment land surface model [Ducharne et al., 2000; Koster et al., 2000]. Each MERRA grid cell contains several irregularly shaped tiles, and each tile is further divided into subtiles based on its modeled hydrological state: saturated, unsaturated, and wilting. The surface temperature of a grid cell is obtained by area-weighted averaging of the surface temperatures of all subtiles within the grid cell. The subtile surface temperatures are prognostic variables of the model and represent a bulk surface layer with a small but finite heat capacity. For all vegetation classes except broadleaf evergreen trees, this bulk surface layer represents the vegetation canopy and a surface layer at the top of the soil column (effective layer depth < 1 mm).
 Of the observations assimilated into MERRA, those most relevant to the present study are radiosonde temperature readings (geographically and temporally sparse) and satellite-based radiance observations. No direct assimilation of soil temperature data takes place.
 In this study, we analyzed the gridded surface temperatures over land (TNWP). The data were regridded by means of bilinear interpolation onto a 0.25° regular grid, and the hourly output was temporally interpolated to a 15 min temporal resolution.
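The temporal step can be illustrated with a short NumPy sketch. It assumes linear interpolation in time (the interpolation method is not specified above) and that each hourly MERRA average is time-stamped at the center of its averaging hour; the function name `to_15min` is hypothetical, and the spatial bilinear regridding is not shown.

```python
import numpy as np


def to_15min(times_h, values, step_h=0.25):
    """Linearly interpolate a temperature series onto a 15 min time axis.

    times_h : time stamps in hours (here assumed to be centres of hourly averages).
    values  : temperatures (K) at those time stamps.
    The output spans the first to the last input time stamp; no extrapolation.
    """
    times_h = np.asarray(times_h, dtype=float)
    values = np.asarray(values, dtype=float)
    n_steps = int(round((times_h[-1] - times_h[0]) / step_h))
    t_out = times_h[0] + step_h * np.arange(n_steps + 1)
    return t_out, np.interp(t_out, times_h, values)


# Three hypothetical hourly means centred at 00:30, 01:30 and 02:30
t, temp = to_15min([0.5, 1.5, 2.5], [280.0, 284.0, 282.0])
print(t[2], temp[2])  # 01:00 lies halfway between the first two values
```

Linear interpolation keeps every interpolated value within the range of its two neighbouring hourly means; a smoother scheme would be a separate design choice.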
2.2 Validation Datasets
 Two sets of in situ measurements are used to calibrate and validate the methods. Fluxnet data for the year 2005, which closely match the vertical support of the skin temperature estimates obtained from satellite observations, are used to calibrate and validate the analysis. Soil temperature measurements from the Oklahoma Mesonet for the year 2009 are used for a secondary validation of the methodology at 5 cm soil depth. The first set of validation data is from Fluxnet [Baldocchi et al., 2001], a global network of meteorological towers that, among many other parameters, provides measurements of upwelling longwave radiation (R↑, W m⁻²). Surface temperature can be estimated from these measurements by converting R↑ to TLW (K) according to the Stefan–Boltzmann law:

$$T_{LW} = \left( \frac{R^{\uparrow}}{\varepsilon\,\sigma} \right)^{1/4} \qquad (1)$$
where ε (–) is the longwave emissivity of the surface and σ is the Stefan–Boltzmann constant (σ = 5.6697 × 10⁻⁸ W m⁻² K⁻⁴). Here, ε was determined for each site individually based on sensible heat flux and air temperature measurements [Holmes et al., 2009] and is listed in Table 1. TLW corresponds to the mean temperature of the uppermost surface within the sensor's field of view.
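A minimal sketch of this conversion, assuming the grey-body relation R↑ = εσTLW⁴ with no correction for reflected downwelling radiation; the function name `longwave_temperature` is hypothetical and σ is the value quoted above.

```python
import numpy as np

SIGMA = 5.6697e-8  # Stefan-Boltzmann constant (W m-2 K-4), value quoted in the text


def longwave_temperature(r_up, emissivity):
    """Invert R_up = emissivity * SIGMA * T**4 for the surface temperature T (K).

    Assumes the simple grey-body form without a reflected-sky term.
    """
    r_up = np.asarray(r_up, dtype=float)
    return (r_up / (emissivity * SIGMA)) ** 0.25


# Round trip: radiance emitted by a 300 K surface with emissivity 0.98
r = 0.98 * SIGMA * 300.0 ** 4
print(float(longwave_temperature(r, 0.98)))
```

Because TLW scales with ε^(−1/4), an error of 0.01 in a site emissivity near 0.98 shifts TLW by roughly 0.25%, or about 0.8 K at 300 K.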
Table 1. Geographical Location and Characteristics of the 17 FLUXNET Sites Used in the Ground Validation Study. The Longwave Surface Emissivity (ε) is Based on the Study by Holmes et al. 
 The second set of validation data is obtained from the Oklahoma Mesonet [McPherson et al., 2007; Illston et al., 2008], a statewide network of meteorological stations spanning 33–37°N (~400 km) and 94–103°W (~800 km). In this study, we analyzed soil temperature data recorded at 5 cm depth under native sod at 15 min resolution for the year 2009. Various automated and manual quality control checks are performed on the Oklahoma Mesonet data, including site visits at least three times a year [Shafer et al., 2000]. For this study, all data not labeled “good” were removed from the analysis. Additional quality checks, based on consistency with temperature sensors at deeper depths, were applied as detailed in the study by Holmes et al.
 The purpose of this validation at 5 cm depth is to make the results directly comparable with the previously published assessment of the MERRA soil temperature output in the study by Holmes et al. and to provide performance metrics relevant to applications that require soil temperature rather than surface temperature. For this reason, the same analysis and exactly the same 63 stations (listed in Table 2) are used as in the earlier study; see Holmes et al. for a full description of the validation methodology.
Table 2. List of Stations Used in the Mesonet Validation Analysis
Mesonet stations (63)
Ada, Altus, Alva, Antlers, Apache, Ardmore, Arnett, Beaver, Blackwell, Boise City, Broken Bow, Burneyville, Butler, Camargo, Chandler, Cherokee, Cheyenne, Chickasha, Clayton, Cloudy, Claremore, Cookson, Durant, El Reno, Erick, Eufaula, Foraker, Grandfield, Hectorville, Hinton, Hollis, Hooker, Idabel, Kenton, Ketchum Ranch, Kingfisher, Lahoma, Marena, May Ranch, Medicine Park, Miami, Marshall, Newkirk, Nowata, Norman, Oilton, Okemah, Pauls Valley, Pawnee, Porter, Pryor, Putnam, Ringling, Shawnee, Slapout, Spencer, Stuart, Tipton, Tishomingo, Vinita, Walters, Webbers Falls, Westville
2.3 Satellite Observations
 The use of microwave Ka-band observations to derive land surface temperature is an alternative to the more commonly used TIR approaches. It has been applied mostly as part of soil moisture retrieval algorithms (e.g., [Jackson, 1993; Owe et al., 2008]) and recently in a land evapotranspiration model [Miralles et al., 2011]. All these applications are based on linear regression methods calibrated with in situ data and tailored to the depth requirements of a specific application. The vertically polarized Ka-band brightness temperature (TB,KaV) measured in orbit is, in theory, functionally dependent on the temperature of a surface layer that is only ~1 mm deep for wet soils and up to ~10 mm deep for very dry soils [Ulaby et al., 1986]. However, vegetation is relatively opaque at 37 GHz. Consequently, for most land surfaces, TB,KaV is most sensitive to the canopy temperature [Holmes et al., 2009].
 In the present application, TB,KaV is transformed into a satellite estimate of land surface temperature by scaling it directly to the NWP temperature output, as detailed in Section 3.1.2. As a result, no a priori regression equation (accounting for surface emissivity and atmospheric effects) is needed.
 Microwave Ka-band observations are available from the Advanced Microwave Scanning Radiometer on EOS (AMSR-E), the Special Sensor Microwave/Imager (SSM/I), and the TRMM Microwave Imager (TMI). Detailed sensor specifications are listed in Table 3. The polar-orbiting satellites provide global coverage. Because of its equatorial orbit, TMI coverage extends only from 39°N to 39°S, which means that TMI data are available for Fluxnet stations A, H, P, and Q (Table 1) and for the entire Mesonet domain.
Table 3. Specifications of Satellite Sensors Providing Ka-band Microwave Observations Used for Land Surface Temperature Estimation
 Each sensor has a different spatial resolution, and the location of the footprint center and its azimuthal orientation vary between consecutive overpasses. To combine these observations, they are binned to a regular 0.25° global grid. This resolution was chosen based on the satellite with the coarsest resolution (SSM/I, see Table 3). The value for each grid cell is the mean of all observations whose footprint center lies within the boundaries of the cell. This differs from a previous validation of the Ka-band approach [Holmes et al., 2009], in which the observation nearest to the station was used. The benefit of the modified approach is that the data reflect exactly the values that would be calculated from a globally gridded dataset. However, because the satellite measurements now cover a larger domain, the station data may be less representative of the grid cell.
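The binning rule (mean of all footprints whose center falls in a cell) can be sketched for a single overpass as follows. The cell-indexing convention (rows and columns counted from 90°S and 180°W) and the function name `bin_footprints` are assumptions for illustration; the footprint count per cell is also returned because the quality control filters below rely on it.

```python
import numpy as np
from collections import defaultdict


def bin_footprints(lat, lon, tb, res_deg=0.25):
    """Average all footprints whose centre falls inside each res_deg grid cell.

    Returns {(row, col): (mean_tb, n_footprints)}; row/col count cells from
    90 deg S and 180 deg W. Cells without footprints are simply absent.
    """
    rows = np.floor((np.asarray(lat, dtype=float) + 90.0) / res_deg).astype(int)
    cols = np.floor((np.asarray(lon, dtype=float) + 180.0) / res_deg).astype(int)
    groups = defaultdict(list)
    for r, c, value in zip(rows, cols, np.asarray(tb, dtype=float)):
        groups[(int(r), int(c))].append(value)
    return {cell: (float(np.mean(v)), len(v)) for cell, v in groups.items()}


# Three hypothetical footprints of one overpass over Oklahoma
cells = bin_footprints([35.01, 35.10, 35.30], [-97.40, -97.30, -97.40],
                       [290.0, 292.0, 300.0])
```

Only the footprint center decides cell membership here, so a large footprint near a cell edge still contributes to one cell only, as in the rule described above.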
 With a possible application to soil moisture missions in mind, it is worth noting that observations from the current group of satellites are available concurrently with the SMOS mission (launched in 2009). For SMAP, with a projected launch in late 201, successors to the present group of satellites are expected to compensate for the loss of any current sensor. For example, AMSR-E stopped collecting data in October 2011, but its successor with an identical configuration, AMSR-2 on GCOM-W, was launched in May 2012 [Imaoka et al., 2007]. The SSM/I sensors are being succeeded by SSMIS [Sun and Weng, 2008], and the Global Precipitation Measurement mission (GPM), with a projected launch date in 2014, will expand the geographic range of the current TMI.
2.3.1 Quality Control of Ka-Band Data
 Several quality control filters are applied to the satellite Ka-band observations before they are used in the analysis. First, because open water has a much lower emissivity than land, only observations over land with little or no open water within the footprint are accepted. For AMSR-E, observations for which the land/ocean mask (provided with the L2 data) indicates open water in the 37 GHz footprint are removed before the remaining observations are averaged. SSM/I data are delivered with a binary land/water classification rather than a water percentage, so only pixels classified as water are removed. For TMI, no high-resolution land/sea mask is provided; this should be kept in mind if discrepancies arise in later analyses. Second, because snow and frozen soil dramatically reduce surface emissivity, only observations made when the surface is above the freezing point are used.
 Because the satellite track relative to the Earth changes from day to day, so does the sampling of any particular grid box. On some days, the footprints may not cover the entire grid box, which can lead to a bias in areas of heterogeneous land cover. To limit this potential effect on the integrity of the time series, the number of footprints (Nb,p) within each 0.25° grid box (b) for a given overpass (p) was monitored relative to its median over the entire year of data. Table 3 lists these values for each satellite: on a typical pass, the median Nb is 8 for AMSR-E and 6 for TMI, but only 1 for SSM/I. The oversampling of the 0.25° grid box by the AMSR-E and TMI Ka-band observations allows for an additional filter, which compares the number of observations in a particular overpass to the median value for that box and rejects the sample when Nb,p < median(Nb) − 3.
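The count-based rejection rule can be written as a one-line mask. This is a minimal sketch; the function name `passes_count_filter` and the example counts are hypothetical, while the margin of 3 is the value given above.

```python
import numpy as np


def passes_count_filter(n_footprints, margin=3):
    """Return True for overpasses to keep and False for those to reject.

    n_footprints holds N_{b,p} for every overpass p of one grid box b over a
    year; an overpass is rejected when N_{b,p} < median(N_b) - margin.
    """
    n = np.asarray(n_footprints, dtype=float)
    return n >= np.median(n) - margin


counts = [8, 8, 7, 9, 4, 8, 5, 8]  # hypothetical AMSR-E counts for one box
print(passes_count_filter(counts).tolist())
```

For SSM/I, with a median of 1 footprint, the threshold falls below zero and the mask never rejects anything, which is consistent with restricting this filter to the oversampling AMSR-E and TMI sensors.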
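 A second benefit of the oversampling is that it allows an analysis of the variance of the observations within each sampling of the grid box. The sample standard deviation (s) of the Nb,p observations (x) from a single overpass (p) within grid box (b) is given by

$$s_{b,p} = \sqrt{\frac{1}{N_{b,p}-1}\sum_{k=1}^{N_{b,p}} \left(x_k - \bar{x}_{b,p}\right)^2} \qquad (2)$$

where x̄b,p is the mean of these observations.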
 Assuming a high correlation between the errors of coincident observations with largely overlapping fields of view, s over a homogeneous area should be well below the precision of the radiometer. However, heterogeneous land cover (especially water bodies), frost, snow cover, and active precipitation all increase s. A high s may therefore be used as an instantaneous proxy for measurement uncertainty.
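A short sketch of the per-overpass spread and of the kind of summary discussed for Figure 1 (the share of overpasses with s below the 0.7 K AMSR-E radiometer uncertainty). The function names and footprint values are hypothetical, and how s is actually used in the assimilation is not reproduced here.

```python
import numpy as np


def overpass_spread(tb_footprints):
    """Sample standard deviation s (ddof=1) of the footprints in one box/overpass.

    Returns NaN when fewer than two footprints are available (s undefined).
    """
    x = np.asarray(tb_footprints, dtype=float)
    if x.size < 2:
        return float("nan")
    return float(np.std(x, ddof=1))


def fraction_below(s_values, threshold_k=0.7):
    """Fraction of overpasses with s below threshold_k, ignoring undefined s."""
    s = np.asarray(s_values, dtype=float)
    s = s[~np.isnan(s)]
    return float(np.mean(s < threshold_k)) if s.size else float("nan")


# Hypothetical footprint sets (K) for three overpasses of one grid box
spreads = [overpass_spread(fp) for fp in ([290.1, 290.4, 290.2], [288.0, 291.5], [289.9])]
print(fraction_below(spreads))
```

`ddof=1` gives the N − 1 denominator of the sample standard deviation, and single-footprint overpasses are excluded rather than counted as zero spread.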
 The sample standard deviations are calculated using equation (2) for the vertically polarized AMSR-E Ka-band channel (TB,KaV). Figure 1 shows the range of AMSR-E s values (ascending passes only) for each station in Table 1 over 2005. For most sites, the bulk of the observations have s below the uncertainty of the radiometer (0.7 K, see Table 3). Four stations stand out with higher s: F, M, N, and Q. For F (Cabauw, Netherlands), this behavior can be explained by a lake located 11 km NW of the station that is not identified by the AMSR-E land/ocean mask. Similarly, station N (Le Bray, France) is located just 20 km from Arcachon Bay and 10 km from the urban surroundings of Bordeaux. Station Q (Yatir, Israel) is located in a pine forest surrounded by sparse shrubs, and station M (Loobos, Netherlands) is located in a forest bordered by cropland. The utility of s for assessing the relative accuracy of the observations is explored further in Section 3.2.2.
2.3.2 Intercalibration of Ka-Band Sensors
 Because Ka-band observations come from a variety of satellites, a thorough intersatellite calibration of the different sensors is necessary. This is challenging because land surface temperature has a strong diurnal variation and the satellites observe at different times of day. Fortunately, the TRMM satellite has an equatorial orbit with local overpass times that vary over the year. Because of this precessing orbit, TMI observations coincide with those of each polar-orbiting satellite used in this study during parts of the year. To compare satellites with different spatial resolutions, the swath data are averaged for each 0.5° grid box. All data pairs with observation times within 10 min of each other, for the grid boxes covering the state of Oklahoma, were spatially and temporally aggregated for the years 2004 and 2005. The basic procedure used here was described in the study by Parinussa et al. and has been expanded to encompass all the Ka-band sensors used in this study.
 Figure 2, with scatter plots of TMI TB,KaV measurements against the corresponding measurements from the other four satellites, shows a high level of consistency among all Ka-band sensors. The standard errors of estimate (SEE), between 0.7 and 1.0 K, are only slightly larger than the sensor accuracy. The bias relative to TMI is low: 0.1 K to −0.2 K for the three SSM/I platforms and 0.6 K for AMSR-E. Because the TMI and SSM/I sensors all have a center frequency of 37.0 GHz and an Earth incidence angle close to 53°, their observed bias is attributable to residual calibration differences. The AMSR-E Ka-band channel, on the other hand, has a slightly different configuration, operating at 36.5 GHz with a 55° incidence angle. Based on radiative transfer simulations, this difference would account for a 0.3 K positive bias of AMSR-E relative to TMI [Holmes et al., 2009], half of the observed bias of 0.6 K.
 To create an intercalibrated set of satellite observations, the regression equations displayed in Figure 2 (at the top in each plot) were universally applied to calibrate the AMSR-E and SSM/I sensors to match the TMI observations.
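The intercalibration can be sketched as three steps for one grid box: pair observations taken within 10 min of each other, regress TMI on the other sensor, and apply the fitted line. This is a minimal sketch assuming ordinary least squares (the regression type behind Figure 2 is not stated here); `collocate`, `fit_to_reference`, and `apply_calibration` are hypothetical names.

```python
import numpy as np


def collocate(t_ref, tb_ref, t_oth, tb_oth, max_dt_min=10.0):
    """Pair reference (TMI) and other-sensor observations of one grid box.

    Times are in minutes; each reference observation is matched to the
    nearest other-sensor observation and kept only if |dt| <= max_dt_min.
    """
    t_oth = np.asarray(t_oth, dtype=float)
    tb_oth = np.asarray(tb_oth, dtype=float)
    ref, oth = [], []
    for t, tb in zip(t_ref, tb_ref):
        dt = np.abs(t_oth - t)
        j = int(np.argmin(dt))
        if dt[j] <= max_dt_min:
            ref.append(tb)
            oth.append(tb_oth[j])
    return np.array(ref, dtype=float), np.array(oth, dtype=float)


def fit_to_reference(tb_ref, tb_oth):
    """Ordinary least squares tb_ref ~ a + b * tb_oth.

    Returns intercept a, slope b, the standard error of estimate (SEE) and
    the mean bias of the other sensor relative to the reference.
    """
    tb_ref = np.asarray(tb_ref, dtype=float)
    tb_oth = np.asarray(tb_oth, dtype=float)
    b, a = np.polyfit(tb_oth, tb_ref, 1)  # polyfit returns [slope, intercept]
    resid = tb_ref - (a + b * tb_oth)
    see = float(np.sqrt(np.sum(resid ** 2) / (tb_ref.size - 2)))
    bias = float(np.mean(tb_oth - tb_ref))
    return float(a), float(b), see, bias


def apply_calibration(tb_oth, a, b):
    """Map other-sensor brightness temperatures onto the reference (TMI) scale."""
    return a + b * np.asarray(tb_oth, dtype=float)
```

The SEE uses n − 2 degrees of freedom because two coefficients are estimated, so at least three collocated pairs are needed.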
3.1 Systematic Differences Between Temperature Sets
 For optimal performance of a data assimilation system, systematic differences between the datasets need to be minimized through preprocessing or dynamic bias estimation (e.g., [Dee and da Silva, 1998; Bosilovich et al., 2007; Reichle et al., 2010]). Here, the systematic differences between the temperature datasets were analyzed to help formulate a preprocessing plan. Specifically, differences in the timing, minimum, and amplitude of the diurnal temperature cycle were addressed prior to the actual data assimilation. Together, these aspects of the temperature time series define the temperature state:

$$T_{d,i} = M_d + A_d\,D_{d,i} \qquad (3)$$
where M is the daily minimum temperature, A is the diurnal amplitude (the difference between the maximum and minimum temperature), and D is the periodic diurnal kernel, with values between 0 and 1. The subscripts d and i indicate the day of the year and the hour of the day, respectively.
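The decomposition into minimum, amplitude, and kernel can be sketched for one day of a continuous subdaily series. The sketch assumes that each day's minimum and maximum are taken over that day's samples (the exact day boundaries are not specified here); the function names are hypothetical.

```python
import numpy as np


def decompose_day(t_day):
    """Split one day of temperatures T_{d,i} into M_d, A_d and D_{d,i}.

    M is the daily minimum, A the max-minus-min amplitude, and D = (T - M) / A,
    which lies in [0, 1]. A flat day (A == 0) gets D = 0 everywhere.
    """
    t_day = np.asarray(t_day, dtype=float)
    m = float(t_day.min())
    a = float(t_day.max() - m)
    d = (t_day - m) / a if a > 0 else np.zeros_like(t_day)
    return m, a, d


def compose_day(m, a, d):
    """Rebuild T_{d,i} = M_d + A_d * D_{d,i}."""
    return m + a * np.asarray(d, dtype=float)


m, a, d = decompose_day([285.0, 290.0, 300.0, 295.0])  # hypothetical 6-hourly day
```

Because D is normalized to [0, 1], a bias in M shifts the whole day, a bias in A stretches it, and a bias in D moves its timing, which is what allows the three preprocessing steps to treat them separately.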
 To attribute biases in Td,i to one of the three components (M, A, or D), a three-step preprocessing procedure is formulated (Figure 3). For a sparse temperature time series, the only component that can be determined independently of the others is the timing of D. The first step therefore consists of determining the difference in timing between the D of MERRA and that of the satellite set. This difference in timing is attributed to a difference in vertical support between model and measurement; it is mitigated by synchronizing the MERRA temperature set to the satellite, as described in Section 3.1.1. In the second step, the Ka-band brightness temperature is converted to the MERRA temperature range by scaling the nighttime observations, as described in Section 3.1.2. Once both the timing and the nighttime mean are in agreement, the amplitude of MERRA is scaled to match the amplitude of the satellite data (Section 3.1.3).
 The direction of preprocessing steps 1 and 3 (i.e., scaling the NWP set to the satellite observations) is dictated by the fact that the phase synchronization and the amplitude correction require a continuous time series with subdaily resolution, which only the NWP set provides. In step 2, on the other hand, the satellite data are scaled to the NWP data so that all subsequent discussion can be conducted in the physical temperature range of the NWP model rather than in that of brightness temperature.
3.1.1 Synchronize Timing of Diurnal Temperature Cycle
 The temperature of the upper soil layer is characterized by strong diurnal cycles of heating and cooling that result in sharp vertical gradients. Therefore, if two temperature records of the same land surface do not represent exactly the same depth, there will be systematic differences in the timing and amplitude of their diurnal temperature cycles. To test whether the temperature datasets used in this study relate to the same soil depth, the annual mean timing of the periodic diurnal kernel (D in equation (3)) was determined for each set. This was done by optimizing the phase φ (h) of a sine-based diurnal kernel function (Dsim) so that the unbiased root mean square error (ubRMSE) between Dsim and D for a given temperature set is minimized:
Note that time of day (i) is in hours; setting φ = 12 h in (4) results in a Dsim that peaks at noon. The optimized φ for each temperature set are shown in Figure 4 for the 17 Fluxnet stations in Table 1 (year 2005). At most stations, the sets peak between 1 and 2 hours after solar noon. MERRA TNWP shows the least variation, peaking at ~1:15 p.m. ± 15 min. The satellite data (TB,KaV; all available Ka-band observations combined) peak at about 1:45 p.m. ± 15 min. The in situ data vary more, with peaks between 12:45 and 2:15 p.m. This higher variation compared with the satellite data is attributed to scaling and representation errors of the station data.
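The phase optimization can be sketched as a grid search over φ in 15 min steps. Equation (4) is not reproduced here, so the kernel below is an assumed sine-based form, 0.5 + 0.5·sin(2π(i − φ)/24 + π/2), chosen because it lies in [0, 1] and peaks at i = φ; `d_sim`, `ubrmse`, and `fit_phase` are hypothetical names and the input kernels are synthetic.

```python
import numpy as np


def d_sim(hour, phi):
    """Assumed sine-based diurnal kernel in [0, 1] that peaks at hour == phi."""
    return 0.5 + 0.5 * np.sin(2.0 * np.pi * (hour - phi) / 24.0 + np.pi / 2.0)


def ubrmse(a, b):
    """RMSE after removing the mean of each series (unbiased RMSE)."""
    da = a - a.mean()
    db = b - b.mean()
    return float(np.sqrt(np.mean((da - db) ** 2)))


def fit_phase(hour, d_obs, step_h=0.25):
    """Grid-search phi (h) minimizing ubRMSE between d_sim and the observed kernel.

    Works on any sampling of hour of day, including sparse overpass times.
    """
    hour = np.asarray(hour, dtype=float)
    d_obs = np.asarray(d_obs, dtype=float)
    candidates = np.arange(0.0, 24.0, step_h)
    errors = [ubrmse(d_sim(hour, p), d_obs) for p in candidates]
    return float(candidates[int(np.argmin(errors))])


hours = np.arange(0.0, 24.0, 0.25)     # 15 min time of day
d_nwp = d_sim(hours, 13.25)            # synthetic kernel peaking at 1:15 p.m.
d_sat = d_sim(hours, 13.75)            # synthetic kernel peaking at 1:45 p.m.
d_phi = fit_phase(hours, d_sat) - fit_phase(hours, d_nwp)
```

Using the ubRMSE rather than the RMSE makes the fitted phase insensitive to a constant offset between the kernel and the data, so only the shape and timing drive the optimum.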
 On average, the difference in timing (dφ) between the satellite-observed TB,KaV and TNWP is half an hour. Assuming a minimal error in the modeled timing of maximum net radiation, we hypothesize that this difference is due to a difference between the depth of the model temperature layer and the sensing depth of the Ka-band sensor. Under this hypothesis, a time lag dφ is accompanied by a difference in the amplitude of the diurnal cycle by a factor exp(−ω dφ), where ω = 2π/24 h⁻¹ is the angular frequency of the diurnal cycle. Moreover, harmonics with longer periods will have correspondingly longer time offsets and smaller amplitude adjustments [Van Wijk and de Vries, 1963]. A method to remove such predictable differences between two temperature time series based only on their timing difference is described by Holmes et al. This phase synchronization requires a continuous temperature record; it cannot be applied to the irregular observation times of the satellite overpasses. Therefore, we use this method to synchronize TNWP to the phase of TB,KaV based on the observed dφ, thereby minimizing all temperature differences that can be explained by this difference in timing. The result is a phase-corrected temperature for each day and location, in which the effect of the synchronization on the timing of the diurnal kernel is represented by the periodic function cdφ,i(·).
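The link between a timing difference and an amplitude difference can be checked numerically. This sketch assumes the classical solution for a sinusoidal temperature wave in a homogeneous conducting medium, where the phase lag in radians and the logarithmic amplitude ratio both equal z/d, with a damping depth d that grows with the square root of the period; it illustrates that physical argument, not the synchronization method of Holmes et al. The function names are hypothetical.

```python
import math


def damping_factor(d_phi_h, period_h=24.0):
    """Amplitude ratio exp(-omega * d_phi) implied by a time lag d_phi_h (hours)
    for one harmonic of a temperature wave in a homogeneous conducting medium."""
    omega = 2.0 * math.pi / period_h  # angular frequency (rad/h)
    return math.exp(-omega * d_phi_h)


def harmonic_adjustment(d_phi_diurnal_h, period_h):
    """Time lag (h) and amplitude ratio for a harmonic of period_h, given the
    lag of the 24 h harmonic between the same two depths.

    The phase lag in radians equals z / d_damp, and the damping depth d_damp
    grows with the square root of the period.
    """
    phase_24 = 2.0 * math.pi * d_phi_diurnal_h / 24.0  # z / d_damp for 24 h
    phase = phase_24 * math.sqrt(24.0 / period_h)      # z / d_damp for period_h
    lag_h = phase * period_h / (2.0 * math.pi)
    return lag_h, math.exp(-phase)


print(round(damping_factor(0.5), 3))  # 0.877 for a half-hour lag
```

Under these assumptions, the observed half-hour lag corresponds to a diurnal amplitude about 12% smaller in the deeper layer, while a longer-period cycle such as the annual one (8760 h) gets a longer lag but an amplitude ratio much closer to 1.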
 To improve the representativeness of the in situ data for use in calibration, the Fluxnet observations were also synchronized to the phase of the satellite observations by the same procedure, yielding a phase-synchronized in situ temperature.
3.1.2 Scaling Brightness Temperature to NWP Temperature
 Brightness temperature relates to surface temperature in a complex way that can be affected by the soil surface condition, vegetation, and the atmosphere. It can be described by radiative transfer models of various degrees of complexity for different wavelengths. At Ka-band, these models are often simplified or replaced by an empirical linear regression to retrieve land surface temperature [Holmes et al., 2009]. For a data assimilation application, however, the observations ultimately need to match the model, overruling any previous linear scaling. Therefore, TB,KaV was directly scaled to the phase-synchronized NWP temperature, T̃NWP (Section 3.1.1), to calculate the satellite-derived temperature TSAT:

$$T_{SAT} = \frac{T_{B,KaV}}{\varepsilon_{Ka}}, \qquad \varepsilon_{Ka} = \frac{\overline{T_{B,KaV}}}{\overline{\tilde{T}_{NWP}}}$$

where the overbar represents the annual average and the empirical factor εKa is calculated for each grid cell separately. Assuming T̃NWP represents the emitting layer for Ka-band emission and there is no interaction with the atmosphere, εKa can be interpreted as the effective Ka-band emissivity. In this study, the scaling is done at the annual time scale, which may result in a time-varying bias if atmospheric or surface changes cause deviations from the linearity assumption. Scaling over shorter time windows could mitigate these types of errors if necessary.
 Because the amplitude of the diurnal temperature cycle has not yet been scaled at this point in the preprocessing, and because the highest correlation with in situ data was found for nighttime temperatures, the mean emissivity was calculated only from AMSR-E 1 AM descending observations. Furthermore, only data for which the NWP model indicates an unfrozen surface were considered.
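The emissivity scaling can be sketched for one grid cell, assuming the multiplicative form TSAT = TB,KaV / εKa with εKa taken as the ratio of the annual means of matched nighttime (1 AM descending) brightness and phase-synchronized NWP temperatures, restricted to unfrozen conditions. The function names and values are hypothetical.

```python
import numpy as np


def effective_emissivity(tb_night, t_nwp_night, unfrozen):
    """Ratio of annual means over matched nighttime pairs for one grid cell.

    Pairs flagged as frozen by the model, or with a missing value, are skipped.
    """
    tb = np.asarray(tb_night, dtype=float)
    tn = np.asarray(t_nwp_night, dtype=float)
    ok = np.asarray(unfrozen, dtype=bool) & ~np.isnan(tb) & ~np.isnan(tn)
    return float(tb[ok].mean() / tn[ok].mean())


def tb_to_tsat(tb, eps_ka):
    """Convert brightness temperature to satellite temperature: TSAT = TB / eps."""
    return np.asarray(tb, dtype=float) / eps_ka


# Hypothetical 1 AM pairs; the last one is frozen and excluded
eps = effective_emissivity([266.0, 270.75, 275.5, 230.0],
                           [280.0, 285.0, 290.0, 270.0],
                           [True, True, True, False])
```

With a single annual factor per cell, the annual mean of TSAT over the calibration pairs equals that of the model by construction, leaving day-to-day differences and the diurnal amplitude to the later steps.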
3.1.3 Remove Bias in Amplitude of the Diurnal Temperature Cycle
 After correcting for both the timing and the mean bias of the datasets, the amplitude of the diurnal temperature cycle (A) is investigated. Even though there are no direct observations of the diurnal amplitude, it can be approximated by the temperature difference between measurements near the daily minimum and maximum. AMSR-E, with observations at ~1 AM and ~1 PM, comes closest to a direct observation of amplitude when consecutive pairs of ascending and descending overpasses are available (see Section 2.3). During some periods of the year, the varying overpass time of TMI may also be close to 1 AM or 1 PM, in which case TMI data were used to supplement AMSR-E in calculating the amplitude.
 Figure 5 (top) shows, for each station in 2005, the amplitude bias of the phase-synchronized NWP temperature and of TSAT relative to the phase-synchronized in situ temperature. The timing correction described in Section 3.1.1, applied to both the in situ and model data, has already removed more than half of the systematic differences in amplitude (compare the differences before (black) and after (colored) synchronization in Figure 5). Interestingly, the residual biases of the NWP temperature and of TSAT relative to the in situ data agree in sign for most stations. This strongly suggests that the residual bias is due to representation error in the in situ data rather than in the NWP or satellite-based retrievals. In particular, the small footprint of the in situ data may result in a residual amplitude bias that persists even after the timing correction. The large positive bias of the NWP temperature and of TSAT relative to station Q (Yatir, Israel) can be explained by the temperature difference between the prevailing shrubs and the forest canopy, as described by Rotenberg and Yakir.
 The remaining amplitude bias between the phase-synchronized NWP temperature and TSAT is generally no more than a few kelvin, but it still constitutes a significant structural bias that needs to be addressed prior to data assimilation. As with the timing correction, the amplitude is best corrected using a (semi)continuous time series. In this investigation, regression coefficients (β0, β1) were fitted to minimize the difference between the satellite and model amplitudes via
where i = Asc and i = Desc indicate the hour of day for the ascending and descending pass, respectively. The coefficients β0 and β1 were determined for each grid box based on the full year of data and used to obtain the adjusted model temperature amplitude.
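Because the regression equation is not reproduced here, the following sketch assumes one plausible form: the satellite ascending-minus-descending difference is regressed on the model difference for the same days, ΔTSAT,d ≈ β0 + β1 ΔTNWP,d, and the fitted line is then applied to the model amplitude. Both the form and the function names are assumptions for illustration.

```python
import numpy as np


def fit_amplitude(dt_sat, dt_nwp):
    """OLS fit dt_sat ~ beta0 + beta1 * dt_nwp over one grid box's year of data.

    dt_* are ascending-minus-descending differences (about 1 PM minus 1 AM)
    for days on which both passes are available; NaN pairs are skipped.
    """
    x = np.asarray(dt_nwp, dtype=float)
    y = np.asarray(dt_sat, dtype=float)
    ok = ~np.isnan(x) & ~np.isnan(y)
    beta1, beta0 = np.polyfit(x[ok], y[ok], 1)  # [slope, intercept]
    return float(beta0), float(beta1)


def adjust_amplitude(a_nwp, beta0, beta1):
    """Apply the fitted relation to the model amplitude."""
    return beta0 + beta1 * np.asarray(a_nwp, dtype=float)


# Hypothetical day pairs (K); the last NWP value is missing
b0, b1 = fit_amplitude([9.0, 10.6, 13.0, 17.0, 5.0],
                       [10.0, 12.0, 15.0, 20.0, np.nan])
```

Fitting one pair of coefficients per grid box over a full year keeps the correction smooth in time, at the cost of not capturing seasonal changes in the amplitude bias.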
 The coefficient of determination (r²) of the satellite and model amplitudes with the in situ observations is shown in Figure 5 (bottom). For most stations, the r² between the in situ and satellite amplitudes is much higher than that between the in situ and NWP amplitudes. The stations for which the NWP amplitude outperforms TSAT by this metric are the four stations with a high spatial standard deviation (F, M, N, and Q).
3.1.4 Summary of Preprocessing Steps
 The three preprocessing steps, namely the synchronization of timing (section 3.1.1), the scaling of the Ka-band brightness temperature to the NWP temperature (section 3.1.2), and the amplitude bias removal (section 3.1.3), can be summarized in the following set of equations:
Together, these three steps help reconcile the NWP temperature record with the satellite observations. In addition, the in situ data are adjusted for differences in timing and amplitude in the same way as the NWP temperature. The resulting adjusted in situ record is considered to reduce some of the limitations of the experimental setup, namely differences in recording depth and the representation error of the point site relative to the wider grid cell.
3.2 Reducing Random Errors in Temperature
 The processing steps described in section 3.1 are aimed at eliminating systematic differences between NWP and satellite-based surface temperature. What remains after such processing are random errors that can be reduced through data assimilation. Here we apply two sequential Kalman filters (KF). The first, KF/A, is used to assimilate observations of the amplitude of the diurnal temperature cycle into the off-line model output. The second, KF/M, is used to assimilate observations of the daily minimum temperature. The application of these filters is illustrated in Figure 6, starting with the reconciled data and ending with the merged final product.
3.2.1 Kalman Filter Design
 In general terms, the KF assumes that the true state at time d (Xd) evolves from Xd-1 following:
$$X_d = F(X_{d-1}) + \eta_d \tag{11}$$
where F indicates the forecast model and η is serially uncorrelated, mean-zero Gaussian noise with variance Q. Observations are related to X according to
$$Y_d = H X_d + \nu_d \tag{12}$$
where H is the observation model that maps the state space into the observation space and ν is serially uncorrelated, mean-zero Gaussian noise with variance R.
 In this particular application, X is defined as the time-varying error in a given NWP model parameter P:
$$X_d = P_{NWP,d} - P_{Truth,d} \tag{13}$$
where P refers to either the amplitude or the minimum of the diurnal temperature cycle, as detailed in sections 3.2.2 and 3.2.3. Note that because the KF is applied completely off-line from the NWP modeling, PNWP is wholly deterministic and can therefore legitimately be differenced with PTruth to define a new target state variable. This particular form of (13) eliminates the need to run a full NWP model in the analysis. Instead, the forecast model is based on a simple autoregressive update:
$$\hat{X}^-_d = \gamma\,\hat{X}^+_{d-1} \tag{14}$$
where $\hat{X}^-_d$ is the forecasted state at time d, evolving from the updated analysis state $\hat{X}^+_{d-1}$ according to γ, a fixed parameter between 0 and unity.
 Assimilated observations Y are taken from the difference between NWP and satellite-based P at time d:
$$Y_d = P_{NWP,d} - P_{SAT,d} \tag{15}$$
 At times when Y is available, the KF updates the forecasted state estimate with Yd to obtain the updated analysis state estimate, $\hat{X}^+_d$:
$$\hat{X}^+_d = \hat{X}^-_d + K_d\left(Y_d - H\,\hat{X}^-_d\right) \tag{16}$$
The Kalman gain, Kd, determines the size of the update depending on the relative magnitudes of R and the forecast error variance, $\sigma^{2-}_d$:
$$K_d = \frac{H\,\sigma^{2-}_d}{H^2\,\sigma^{2-}_d + R} \tag{17}$$
This Kd is further used to calculate the error variance of $\hat{X}^+_d$, denoted $\sigma^{2+}_d$, following
$$\sigma^{2+}_d = \left(1 - K_d H\right)\sigma^{2-}_d \tag{18}$$
After application of equations (17) and (18), the analysis state is forecasted forward in time using equation (14), and its error variance via:
$$\sigma^{2-}_d = \gamma^2\,\sigma^{2+}_{d-1} + Q \tag{19}$$
At times when no observation is available to constrain the state estimate, the analysis equals the forecast ($\hat{X}^+_d = \hat{X}^-_d$), and the modeled state simply decays to zero at a rate determined by γ in equation (14). By formulating the forecast model in terms of parameter error in this way, the continuity of the original temperature record is preserved, which is of great benefit for retrospective studies.
 For the KF to work optimally, it is critically important that the covariance between noise in the forecast model (η) and error in the assimilated observations (ν) is zero. To demonstrate that our definitions of X and Y agree with this assumption, equations (13) and (15) are inserted into equation (12):
$$P_{NWP,d} - P_{SAT,d} = H\left(P_{NWP,d} - P_{Truth,d}\right) + \nu_d \tag{20}$$
Further considering H = 1 and reordering yields
$$\nu_d = P_{Truth,d} - P_{SAT,d} \tag{21}$$
Therefore, even though the observations are defined in terms of PNWP, the observation error term νd is solely attributable to error in PSAT and wholly independent of both PNWP and ηd. It should also be stressed again that the forecast model for this KF is the auto-regressive model as defined in equation (14). No analysis results are fed back into the NWP data, which is treated as a completely deterministic time series variable. All analyzed error estimates are instead used to provide an improved temperature estimate for secondary applications, off-line from the actual NWP model. The implementation of the KF is detailed below for both the amplitude and the minimum of the diurnal temperature cycle, including the estimation of the error model parameters.
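The filter of equations (13)–(19) can be sketched as a scalar loop. This is a minimal sketch, not the authors' code. It assumes H = 1 and uses NaN to mark analysis intervals without an observation. As our own assumption, it initializes the error variance at the stationary variance Q/(1 − γ²) of the autoregressive error process. The function name `ar1_error_filter` is hypothetical.

```python
import numpy as np


def ar1_error_filter(y, gamma, q, r, x0=0.0, var0=None):
    """Scalar Kalman filter for the NWP parameter error X, with H = 1.

    y     : observed offsets Y_d = P_NWP,d - P_SAT,d, np.nan where no observation
    gamma : autocorrelation of the error per analysis interval (0 <= gamma < 1)
    q, r  : model and observation error variances
    Returns the analysis states and the analysis error variances.
    """
    y = np.asarray(y, dtype=float)
    x_a = np.empty(y.size)
    var_a = np.empty(y.size)
    x_prev = x0
    # Assumption: start from the stationary variance of the AR(1) error process
    var_prev = q / (1.0 - gamma**2) if var0 is None else var0
    for d in range(y.size):
        x_f = gamma * x_prev                  # forecast state, equation (14)
        var_f = gamma**2 * var_prev + q       # forecast error variance, equation (19)
        if np.isfinite(y[d]):
            k = var_f / (var_f + r)           # Kalman gain with H = 1, equation (17)
            x_prev = x_f + k * (y[d] - x_f)   # analysis update, equation (16)
            var_prev = (1.0 - k) * var_f      # analysis error variance, equation (18)
        else:
            x_prev, var_prev = x_f, var_f     # no observation: analysis = forecast
        x_a[d], var_a[d] = x_prev, var_prev
    return x_a, var_a


# Usage: offsets at three analysis times, the second one missing
p_nwp = np.array([12.0, 13.0, 11.0])
y = np.array([1.0, np.nan, 0.8])
x_a, var_a = ar1_error_filter(y, gamma=0.5, q=1.0, r=1.0)
p_corrected = p_nwp - x_a  # X = P_NWP - P_Truth, so P_Truth is estimated as P_NWP - X
```

Because the NWP series is never modified, the correction is simply subtracted afterward. Without observations, the estimated error relaxes toward zero, and the corrected series relaxes back to the original NWP output.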
3.2.2 Implementation of Kalman Filter for Amplitude (KF/A)
 The first Kalman filter (KF/A) is used to reduce the random error in the amplitude of the diurnal temperature cycle (A) in the off-line NWP temperature output. The state is therefore defined as the true error in A of the deterministic parameter TNWP. A is calculated as the difference between consecutive overpasses at the 1 AM/PM overpass times of AMSR-E, supplemented by TMI whenever possible, as described in section 3.1.3. With this approach, the amplitude can be observed twice a day, capturing both the diurnal warming and the cooling of the land surface. The analysis interval for the amplitude filter is therefore set at 12 hours.
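To illustrate the twice-daily amplitude observations, the sketch below forms one amplitude for each pair of consecutive day and night overpasses. It assumes time-ordered overpass temperatures, each flagged as a day (afternoon) or night pass. The function name and the day-minus-night sign convention are assumptions.

```python
import numpy as np


def twelve_hourly_amplitudes(temps, is_day):
    """One amplitude per pair of consecutive overpasses (day minus night temperature).

    temps  : time-ordered overpass temperatures (K)
    is_day : True for the afternoon pass, False for the night pass
    Pairs that do not alternate between day and night are returned as NaN.
    """
    t = np.asarray(temps, dtype=float)
    day = np.asarray(is_day, dtype=bool)
    amps = np.full(t.size - 1, np.nan)
    for k in range(t.size - 1):
        if day[k] != day[k + 1]:
            t_day = t[k] if day[k] else t[k + 1]
            t_night = t[k + 1] if day[k] else t[k]
            amps[k] = t_day - t_night   # warming (night->day) or cooling (day->night) pair
    return amps


amps = twelve_hourly_amplitudes([280.0, 300.0, 282.0, 298.0], [False, True, False, True])
```

Each alternating pair yields one amplitude, which gives the 12-hour analysis interval used for KF/A.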
 The observed offset for the KF/A (YA) is calculated as the difference between the NWP amplitude and the satellite amplitude, consistent with equation (15):
$$Y_{A,d} = A_{NWP,d} - A_{SAT,d} \tag{22}$$
 By implementing the KF/A according to equations (13)–(19), these observations are used to obtain an updated estimate of the offset in the NWP amplitude. This estimate is then used (off-line) to correct the amplitude of the model and calculate the updated T′ as
 The γ, Q, and R parameters for KF/A were optimized using Fluxnet in situ data and are listed in Table 4. The sensitivity of the filter to the autocorrelation γ is limited in this study, partly because observations are available for most analysis windows. However, using the average autocorrelation calculated for the true offset (ANWP − ATruth) as a guide, we approximate γ = 0.5 (for a 12-hour interval). Similarly, based on the variance of this true offset, we assigned Q equal to 1 K². The absolute value of Q is not important, however, because the KF is sensitive only to the ratio Q/R.
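The following sketch shows how γ and the variance that guides Q could be estimated from the true offset (ANWP − ATruth). It assumes that the in situ amplitude serves as the truth proxy and that the series is sampled at the 12-hour analysis interval, so that lag 1 corresponds to one interval. The function name is hypothetical.

```python
import numpy as np


def offset_statistics(p_nwp, p_truth):
    """Lag-1 autocorrelation and variance of the offset X = P_NWP - P_Truth.

    The series is assumed to be sampled at the analysis interval, so lag 1
    equals one interval (12 h for KF/A). NaNs are skipped.
    """
    x = np.asarray(p_nwp, dtype=float) - np.asarray(p_truth, dtype=float)
    x_lag, x_lead = x[:-1], x[1:]
    ok = np.isfinite(x_lag) & np.isfinite(x_lead)   # keep only complete lag-1 pairs
    gamma = float(np.corrcoef(x_lag[ok], x_lead[ok])[0, 1])
    variance = float(np.nanvar(x))
    return gamma, variance


gamma, var_x = offset_statistics([1.0, 2.0, 3.0, 4.0, 5.0], [0.0, 0.0, 0.0, 0.0, 0.0])
```

Because the filter depends only on Q/R, Q can be fixed (1 K² in the text), and R can then be tuned relative to it.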
Table 4. Parameters of the Kalman Filter for Diurnal Amplitude (KF/A) and Daily Minimum (KF/M) Temperature
| Parameter | KF/A | KF/M |
| --- | --- | --- |
| γ: autocorrelation of model error (-) | 0.5 | 0.5 |
| Q: variance of model error (K²) | 1 | 1 |
| R: variance of observation error (K²) | (0.8 + median s)² | 4 |
| Analysis interval (hours) | 12 | 24 |
 To find the optimal observation error variance R with Q fixed at 1 K², the updated temperature was calculated for 30 scenarios with R between 0 and 6 K². The values of R that yield the largest improvement in root mean square error (RMSE) relative to the in situ data (based on the full year of data) are used to parameterize R as a function of the median s of the AMSR-E observations obtained from equation (2). The optimized R increases with s, a finding further supported by the ratio of the satellite and NWP RMSEs against the in situ data. Figure 7a shows this RMSE ratio for each station, together with the optimized relationship of √R with s. Because little site-specific a priori knowledge of R will be available in a real-world application, we used this relationship to parameterize R as a function of s (see Table 4).
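The R optimization can be sketched as a grid search over 30 values between 0 and 6 K², with Q = 1 K² and γ = 0.5. This is a minimal sketch under simplifying assumptions. It scores the corrected parameter P directly rather than the reconstructed hourly temperature, runs on synthetic data, and repeats a compact filter so that the block is self-contained. `ar1_kf`, `optimise_r`, and `r_from_spread` are hypothetical names. Only the (0.8 + median s)² relationship is taken from Table 4.

```python
import numpy as np


def ar1_kf(y, gamma, q, r):
    """Scalar AR(1) error filter with H = 1; NaN in y marks a missing observation."""
    x_prev, var_prev = 0.0, q / (1.0 - gamma**2)  # start at the stationary AR(1) variance
    x_out = np.empty(len(y))
    for d, y_d in enumerate(y):
        x_f, var_f = gamma * x_prev, gamma**2 * var_prev + q   # forecast step
        if np.isfinite(y_d):
            k = var_f / (var_f + r)                              # Kalman gain
            x_prev, var_prev = x_f + k * (y_d - x_f), (1.0 - k) * var_f
        else:
            x_prev, var_prev = x_f, var_f                        # analysis = forecast
        x_out[d] = x_prev
    return x_out


def optimise_r(p_nwp, p_sat, p_insitu, gamma=0.5, q=1.0, r_grid=None):
    """Return the R on r_grid that minimizes the RMSE of the corrected P against in situ P."""
    if r_grid is None:
        r_grid = np.linspace(0.0, 6.0, 30)   # 30 scenarios between 0 and 6 K^2
    p_nwp, p_sat, p_insitu = (np.asarray(a, dtype=float) for a in (p_nwp, p_sat, p_insitu))
    y = p_nwp - p_sat                        # observed offsets, NaN where no satellite data
    ok = np.isfinite(p_insitu)
    rmses = np.array([
        np.sqrt(np.mean((p_nwp[ok] - ar1_kf(y, gamma, q, r)[ok] - p_insitu[ok]) ** 2))
        for r in r_grid
    ])
    best = int(np.argmin(rmses))
    return float(r_grid[best]), float(rmses[best]), rmses


def r_from_spread(median_s):
    """KF/A observation error variance from the median spread s (Table 4): (0.8 + s)^2."""
    return (0.8 + median_s) ** 2


# Synthetic example: P could be the 12-hourly amplitude at one station
rng = np.random.default_rng(0)
truth = 12.0 + rng.normal(0.0, 2.0, 200)
p_nwp = truth + rng.normal(0.0, 1.5, 200)
p_sat = truth + rng.normal(0.0, 1.0, 200)
p_sat[::3] = np.nan                          # some intervals without an overpass
best_r, best_rmse, rmses = optimise_r(p_nwp, p_sat, truth)
```

In the study, the per-station optima were then summarized by the (0.8 + median s)² relationship, which `r_from_spread` evaluates, so that R can be set without site-specific tuning.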
3.2.3 Implementation of Kalman Filter for Daily Minimum (KF/M)
 After correcting for the error in amplitude, all remaining differences between model and observations are attributed to the offset in the daily minimum temperature. The observed offset in the daily minimum (YM) is therefore the average of the differences between TSAT and T′ (obtained from equation (23)) within the daily analysis window. Unaccounted-for errors in the shape of the diurnal temperature cycle may bias individual observations relative to the peaks and troughs of the true diurnal temperature cycle. To minimize this effect, only the means of paired ascending and descending observations from AMSR-E and SSM/I are considered, so that YM is calculated as
 By implementing the KF/M according to equations (13)–(19), these observations were used to update the estimate of the offset in M, which was then used to calculate the final merged temperature T″ as:
 The autocorrelation for KF/M was found to be higher than that for KF/A and was set at γ = 0.5 for a 24-hour interval (for an autoregressive error model, γ = 0.5 per 12 hours, as used for KF/A, corresponds to 0.25 per 24 hours). As with KF/A, the value of R relative to a fixed Q = 1 K² is optimized to yield the largest improvement in RMSE with the in situ data. A wide range of optimal R values was found, with no apparent explanatory relationship (Figure 7b). Therefore, R was set at a constant 4 K² and compared to the calculated RMSE ratio with in situ data (shown in Figure 7). Unlike the amplitude results for KF/A, the optimized R values for KF/M are generally much higher than the ratio of the RMSE values (Sat/NWP) with in situ data, reflecting poorer performance of the autoregressive error model.
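The equation for YM is not shown above. The following sketch computes a daily-minimum offset from paired overpasses under stated assumptions. It uses one array entry per sensor (for example, AMSR-E and SSM/I), with NaN marking a missing overpass. It takes each difference as T′ − TSAT, the sign that matches the convention Y = PNWP − PSAT used for the observed offsets. It then averages the per-sensor ascending/descending means. The function name is hypothetical.

```python
import numpy as np


def daily_minimum_offset(tp_asc, tsat_asc, tp_desc, tsat_desc):
    """Observed offset Y_M for one day from paired ascending/descending overpasses.

    Each argument holds one value per sensor; NaN marks a missing overpass.
    Only sensors with both passes contribute. Returns NaN if none do.
    """
    d_asc = np.asarray(tp_asc, dtype=float) - np.asarray(tsat_asc, dtype=float)
    d_desc = np.asarray(tp_desc, dtype=float) - np.asarray(tsat_desc, dtype=float)
    pair_mean = 0.5 * (d_asc + d_desc)   # NaN whenever either pass is missing
    ok = np.isfinite(pair_mean)
    return float(pair_mean[ok].mean()) if ok.any() else float("nan")


# Two sensors; the second lacks its ascending observation and is excluded
y_m = daily_minimum_offset([300.0, 301.0], [298.0, np.nan], [285.0, 286.0], [284.0, 285.0])
```

Averaging each ascending/descending pair before combining sensors limits the influence of errors in the shape of the diurnal cycle, which would otherwise bias single overpasses near its peak or trough.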
 The procedure outlined in section 3 was applied to the 17 Fluxnet stations described in section 2.2 and Table 1. As an example of KF results, Figure 8 shows time series of the effect of preprocessing and assimilation of observations for a 7-day period at station B. This small subset of the full data set shows the effect of the amplitude correction (top), the reduction in random error in amplitude through KF/A (day 3, bottom), and a correction of the daily minimum through KF/M (days 5 and 6, bottom). Note that the station is located too far north for TMI coverage and that WindSat data are shown but not used in the analysis.
 The Fluxnet stations cover a range of land use conditions and climate types, and full results at these stations are discussed in section 4.1. However, because the same stations were used to calibrate the filters, the validation is supplemented with an independent set of soil temperature sensors from the Oklahoma Mesonet. The results of this validation exercise are discussed in section 4.2. The spatial area, period, and datasets associated with each calibration and validation component of this study are summarized in Table 5.
Table 5. Summary of Datasets Used for Each Part of the Study
 The Fluxnet long-wave temperature data are used as a first assessment of the added value of the proposed temperature assimilation scheme. These validation results cover the period March through November 2005. As described in section 3.1.1, all temperature records, including the original TNWP and the in situ data, are adjusted to the timing of the Ka-band sensor. This step is considered to account for a limitation of the experimental setup, namely the differences between the sensing depths of the in situ and satellite observations and the model depth. For this reason, it is not evaluated here as part of the improvements from temperature merging.
 The validation results were aggregated separately for the low and high vegetation stations for each 3-hour window to give a consolidated view of the impacts of the mean amplitude correction, the KF/A, and the KF/M. Figure 9 shows the results in terms of RMSE (in green) and SEE (in blue). The top row shows these metrics for the baseline comparison of the NWP temperature against the in situ temperature (dashed line) and of the merged temperature T″ against the in situ temperature (solid line). Rows 2–4 show the incremental effect of each processing step.
 The effect of removing the amplitude bias between in situ, NWP model, and satellite is shown in row 2. By design, night-time temperatures are not affected. In contrast, in the middle of the day, the amplitude correction reduces the RMSE by 25%–30%. Because the average amplitudes of NWP and Ka-band are more similar to each other than to that of the in situ data (as shown in section 3.1.3), this large effect is mainly due to scaling the validation target to the amplitude of the Ka-band observations. These reductions in error may therefore partly represent an improvement in the merged temperature, but mostly a reduction of the structural differences between the validation target and the merged estimate. Still, it is worth noting that over half of the daytime increase in error (compared to night-time lows) is removed by a single time-constant amplitude correction.
 Compared to the effect of the amplitude correction, the reduction in random error via KF/A and KF/M is relatively small (note the change in y-axis scale for rows 3 and 4 of Figure 9). Nevertheless, these improvements represent unambiguous further gains in performance. As shown in row 3, the KF/A has the most effect during the day, as might be expected, with a peak reduction in SEE of 0.2 K. The KF/M, in row 4, has more divergent results for the low and high vegetation sites. For the high vegetation sites, its effect is more constant over the day, reducing the RMSE by about 0.1–0.15 K and the SEE slightly less. For the low vegetation sites, the KF/M reduces the error more during the day and reduces the RMSE much more than the SEE.
 More detailed validation statistics for all stations together, aggregated for the 1-hour windows of 0–1 a.m., 6–7 a.m., and 0–1 p.m., are listed in Table 6. For 0–1 p.m., the RMSE is reduced from 4.3 K to 2.5 K: the amplitude scaling reduces the RMSE by 36%, and the two filters reduce it by a further 10%. For the two night-time windows of 0–1 a.m. and 6–7 a.m., the amplitude scaling has no effect, and the filters reduce the RMSE by 5%, from 2.2 K to 2.1 K. Table 6 also lists the results in terms of r², SEE, and ubRMSE.
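The following worked check covers the 0–1 p.m. percentages. It assumes that the 10% filter reduction is relative to the RMSE after amplitude scaling. Applied this way, it reproduces the reported 2.5 K, whereas subtracting both percentages from the baseline would give about 2.3 K. The night-time change from 2.2 K to 2.1 K is about 4.5%, consistent with the rounded 5%.

```python
# Worked check of the 0-1 p.m. Fluxnet RMSE figures quoted above (values in K)
rmse_baseline = 4.3
rmse_after_scaling = rmse_baseline * (1 - 0.36)        # amplitude scaling: -36%
rmse_after_filters = rmse_after_scaling * (1 - 0.10)   # KF/A + KF/M: further -10% (relative)
rmse_if_additive = rmse_baseline * (1 - 0.36 - 0.10)   # alternative: both applied to baseline

# Night-time windows: 2.2 K -> 2.1 K
night_reduction = (2.2 - 2.1) / 2.2

print(f"{rmse_after_scaling:.3f} {rmse_after_filters:.3f} {rmse_if_additive:.3f} {night_reduction:.3f}")
```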
Table 6. Summary of the Validation Results as Aggregated for Selected Hours
Fluxnet (17 stations, 2005)
Oklahoma Mesonet (63 stations, 2009)
4.2 Validation With Soil Temperature at 5 cm
 During the first preprocessing step, the timing of TNWP is corrected to match that of TB,KaV. As such, the merged temperature T″ relates to the same surface layer as the Ka-band microwave emission. For Ka-band, this is the uppermost layer in the sensor's field of view (bare soil, vegetation, or any combination of land covers), often referred to as the surface or skin temperature. With L-band soil moisture retrievals in mind (which require the temperature of a soil layer 5–30 cm deep), the temperature at 5 cm (T″5) is modeled from T″. Modeling this deeper temperature from the surface estimate is greatly facilitated by the availability of a continuous record. Here we use the same approach as described by Holmes et al. and previously applied in section 3.1.1. For this validation study, the original TNWP and the intermediate products were all synchronized to match the mean phase of the in situ temperature at the nominal depth of 5 cm (T5IS), as indicated by the subscript “5.” No local information is needed other than the (regional) mean phase difference between each temperature record and the in situ temperatures. These Mesonet validation results cover the period April through October 2009.
 Figure 10 shows the validation results of T″5 compared to 5 cm temperature observations from the Oklahoma Mesonet (see Table 6 for detailed statistics). The results are shown in terms of SEE and ubRMSE. The ubRMSE was calculated by removing the mean bias within each hour window, which effectively removes the residual bias in amplitude. Using ubRMSE instead of RMSE is necessary to separate this residual bias from the random component and thus correctly attribute the analysis improvements. This residual amplitude difference (also found for NCEP and ECMWF, not shown) might be related to vegetation density and requires further investigation, which is outside the scope of this paper.
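A minimal sketch of the per-window ubRMSE described above follows. It assumes paired estimate and reference samples labeled by hour of day. SEE is not computed here because its definition is not given in this section. The function name is hypothetical.

```python
import numpy as np


def hourly_rmse_ubrmse(hours, estimate, reference):
    """Bias, RMSE, and ubRMSE per hour-of-day window.

    ubRMSE removes the mean bias within each window before computing the RMSE,
    so RMSE**2 == bias**2 + ubRMSE**2 holds in every window.
    """
    hours = np.asarray(hours)
    err = np.asarray(estimate, dtype=float) - np.asarray(reference, dtype=float)
    stats = {}
    for h in np.unique(hours):
        e = err[(hours == h) & np.isfinite(err)]
        if e.size == 0:
            continue
        bias = e.mean()
        rmse = np.sqrt(np.mean(e ** 2))
        ubrmse = np.sqrt(np.mean((e - bias) ** 2))
        stats[int(h)] = {"bias": float(bias), "rmse": float(rmse), "ubrmse": float(ubrmse)}
    return stats


# Example: two samples each in the 0 h and 13 h windows
stats = hourly_rmse_ubrmse([0, 0, 13, 13], [1.0, 3.0, 10.0, 12.0], [0.0, 2.0, 10.0, 10.0])
```

In the 0 h window, the error is a constant 1 K offset, so the RMSE is 1 K while the ubRMSE is zero. A residual bias of this kind is what ubRMSE discounts.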
 As in the Fluxnet analysis, the improvement due to preprocessing dominates, especially during the day. Because both the baseline NWP temperature and the merged T″5 are synchronized to the phase of T5IS, this improvement results purely from scaling the model amplitude to that of the Ka-band data, which itself has not been altered. The effect of the filters is smaller but still demonstrates skill during the day. The KF/A increases the r² of the amplitude with in situ data from 0.60 to 0.65. In the early morning, the combined effect of preprocessing and filters is a 0.1 K reduction of ubRMSE at 1 AM, whereas during the day it reduces the ubRMSE by 0.4 K. Together, the KF/A and KF/M reduce the RMSE by 7% at 0–1 p.m., slightly less than found in the Fluxnet validation. This may be because lower-frequency signals are more dominant at 5 cm depth than at the surface, reducing the impact of short-term random errors.
 NWP temperature estimates are increasingly used as auxiliary data to interpret satellite observations or directly as input to secondary model studies. Because the surface temperature is highly variable at short time scales, it is typically poorly constrained in NWP models. This paper describes a methodology to merge multi-satellite Ka-band observations with an NWP temperature product and thereby improve its accuracy and utility for secondary applications.
 Preprocessing involves minimizing the differences in phase and amplitude between NWP output and satellite Ka-band observations. Two off-line Kalman filters are then used to reduce random errors in the amplitude and the minimum of the diurnal temperature cycle. An autoregressive model that describes the offset of the NWP temperature from the truth is used as the forecast model and is updated by assimilating observations from four different satellite sensors.
 An analysis of Fluxnet long-wave radiation data was used for calibration and initial validation of the method. This analysis indicates that error reductions of 5%–10% in surface temperature are possible. A validation against in situ measurements at 5 cm demonstrates that the KFs reduce the random error in temperature by around 8% during the daytime but not at night. The effect of the preprocessing is much larger, on the order of 20%–35% during the day, but this may partly reflect the improvement in the validation target from rescaling it to match the satellite observations. In particular, the Ka-band observations improve the amplitude of the diurnal temperature cycle.
 Future studies will need to investigate the dependence of these results on vegetation density, the accuracy of the NWP temperature, and the temporal availability of satellite observations. The KF results may be improved if the spatial variation in the relative skill of the observations and the model can be better predicted, particularly for the constant R used in this study for the KF/M.
 The result of the merger of the datasets is a continuous record of model-based surface temperature that is constrained in its diurnal cycle by satellite estimates. Such a record can benefit the analysis of surface heat exchanges. Furthermore, because the record is continuous in time, it can be used to estimate the temperature at different depths within the canopy/soil surface through established heat flow modeling techniques, as demonstrated by the validation against 5 cm soil temperature.
 This work was funded by NASA through the research grant “The Science of Terra and Aqua” (NNH09ZDA001N-TERRAQUA). The authors would like to thank the Global Modeling and Assimilation Office (GMAO) and the GES DISC (both at NASA Goddard Space Flight Center) for the dissemination of MERRA and the taxpayers of the State of Oklahoma for funding the Oklahoma Mesonet through the Oklahoma State Regents for Higher Education and the Oklahoma Department of Public Safety. This work used data acquired by the FLUXNET community and in particular by the following networks: AmeriFlux (U.S. Department of Energy, Biological and Environmental Research, Terrestrial Carbon Program (DE-FG02-04ER63917 and DE-FG02-04ER63911)), CarboEuropeIP, CarboItaly. USDA is an equal opportunity provider and employer.