Evaluation of the North American Land Data Assimilation System over the southern Great Plains during the warm season



[1] North American Land Data Assimilation System (NLDAS) land surface models have been run for a retrospective period forced by atmospheric observations from the Eta analysis and actual precipitation and downward solar radiation to calculate land hydrology. We evaluated these simulations using in situ observations over the southern Great Plains for the periods of May–September of 1998 and 1999 by comparing the model outputs with surface latent, sensible, and ground heat fluxes at 24 Atmospheric Radiation Measurement/Cloud and Radiation Testbed stations and with soil temperature and soil moisture observations at 72 Oklahoma Mesonet stations. The standard NLDAS models do a fairly good job but with differences in the surface energy partition and in soil moisture between models and observations and among models during the summer, while they agree quite well on the soil temperature simulations. To investigate why, we performed a series of experiments accounting for differences between model-specified soil types and vegetation and those observed at the stations, and differences in model treatment of different soil types, vegetation properties, canopy resistance, soil column depth, rooting depth, root density, snow-free albedo, infiltration, aerodynamic resistance, and soil thermal diffusivity. The diagnosis and model enhancements demonstrate how the models can be improved so that they can be used in actual data assimilation mode.

1. Introduction

[2] Being able to accurately forecast weather is one of the fundamental goals of the science of meteorology. Understanding the governing physical laws of atmospheric motion and the interaction between the atmosphere and the other components of the climate system is the key to improving our forecast skill. Owing to the chaotic nature of the climate system, a precise weather forecast beyond about 2–3 weeks is impossible [Lorenz, 1963]. However, short-term weather forecasts can be reasonably accurate using numerical weather prediction models developed based on the governing physics. Evolution of low-pressure systems and changes of temperature can be predicted fairly well, but precipitation (including precipitation type, precipitation rate, location, and duration), possibly the most important quantity, is difficult to predict accurately.

[3] Mitchell et al. [2003] show an example of the difference in modeled precipitation fields and observations across the continental United States in their Figure 2. Weather forecast models, such as Eta, and its data assimilation system, the Eta Data Assimilation System (EDAS), have difficulties accurately simulating precipitation, including location, amount, and type. This deficiency is especially large during the warm season (from late spring to early fall), and is linked to interactions between the atmosphere and underlying land surface. These interactions can trigger mesoscale circulations [Weaver and Avissar, 2001], can change the planetary boundary layer [Betts et al., 1996, 1997, 1998], and can induce local water recycling [Dirmeyer and Brubaker, 1999]. The land surface is now considered an important part of this coupled system, as the coupling between the atmosphere and the land surface plays a key role in convection and precipitation distribution. During the warm season, the coupling is enhanced by larger sensible and latent heat fluxes. Therefore accurate information about land surface conditions (including the mean and spatial distribution) becomes crucial for weather forecasts. To obtain accurate and near-real-time land surface conditions through soil moisture monitoring networks over large regions is almost impossible, since such networks do not exist. A feasible approach is to simulate the land surface conditions in a data assimilation system where observed and modeled atmospheric information is used to drive a land surface model to calculate land surface conditions. The North American Land Data Assimilation System is a multiinstitutional project focusing on providing accurate land surface information by developing such a data assimilation system [Mitchell et al., 2000; Mitchell et al., 2003].

[4] The objective of this study is to evaluate the performance of the NLDAS land surface models. Given the aforementioned deficiency in precipitation forecasting, we focus our evaluation on the warm season. Section 2 gives a brief description of NLDAS and the NLDAS models. Section 3 describes the observational data used in this study. A comprehensive comparison is shown in section 4, followed by further analysis and discussion in section 5. The conclusions are presented in section 6.

2. NLDAS Models

[5] NLDAS is an off-line data assimilation system; land surface models are driven by atmospheric forcing and there is no feedback from the land surface to the atmosphere. Land surface models are run over the NLDAS domain with a 1/8° latitude-longitude resolution. The atmospheric forcing comes from the EDAS and observations [Cosgrove et al., 2003]. EDAS provides the baseline of the atmospheric forcing variables. Two major components of the forcing, precipitation and solar radiation, are provided by actual observations rather than model output: a unified observational precipitation analysis [Higgins et al., 2000] and satellite retrieval [Pinker et al., 2003], respectively. The 3-year retrospective run starts October 1996 and ends September 1999 with the first year considered as a spinup period. (The NLDAS spinup issue is described by Cosgrove et al. [2003].)

[6] Four state-of-the-art land surface models (Noah, Mosaic, variable infiltration capacity (VIC), and Sacramento model (SAC)) are currently implemented in the NLDAS system. These models represent different approaches to land surface modeling. The Mosaic and Noah models grew from the legacy of the atmospheric surface-vegetation-atmosphere transfer (SVAT) scheme community of coupled modeling, and VIC and SAC grew out of the hydrological community as uncoupled models. Since then, Mosaic, Noah, and VIC have come to be executed extensively in both coupled and uncoupled mode on various spatial scales, so that at present, all three models can be considered as SVATs and semi-distributed hydrological models. Similarly, SAC grew out of the hydrological community as a conceptual hydrology model, executed and highly calibrated on a non-gridded lumped basis for individual catchments, but has since been converted to a semi-distributed gridded version of SAC for testing on larger scales than catchments, in such projects as NLDAS here.

[7] The Mosaic land surface model developed by Koster and Suarez [1992, 1994] is a SVAT scheme that accounts for the sub-grid heterogeneity of vegetation and soil moisture with a “mosaic” approach. Energy balance and water partition are independently calculated at each patch (or tile) in one grid cell. These tiles are based on surface vegetation type. Grid values are computed as a weighted average of variables at all the tiles. The Mosaic model participated in the Project for Intercomparison of Land-surface Parameterization Schemes (PILPS), the GEWEX-sponsored land surface scheme intercomparison project [Henderson-Sellers et al., 1993], and over the course of the experiment proved quite effective in reproducing observed energy and water budgets. When implemented into NLDAS, the model has been configured with each tile having three soil layers with the same soil type and thicknesses of 10, 30, and 160 cm. This configuration was chosen to make it easier to compare with other models and observations.
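
The grid-cell averaging over tiles described above can be illustrated with a minimal sketch. It assumes the weights are the fractional areas of the tiles, which the text does not state explicitly; the function name, tile fractions, and flux values are hypothetical and are not taken from Mosaic.

```python
def grid_average(tile_values, tile_fractions):
    """Weighted mean of one variable over the tiles of a single grid cell.

    Assumption: weights are fractional tile areas (not confirmed by the text).
    """
    total = sum(tile_fractions)
    if total <= 0:
        raise ValueError("tile fractions must sum to a positive value")
    return sum(v * f for v, f in zip(tile_values, tile_fractions)) / total


# Hypothetical grid cell with three vegetation tiles
latent_heat = [120.0, 150.0, 90.0]  # W m-2 per tile (illustrative values)
fractions = [0.6, 0.3, 0.1]         # fractional area of each tile
cell_latent_heat = grid_average(latent_heat, fractions)
print(cell_latent_heat)
```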

[8] The Noah land surface model [Chen et al., 1996; Koren et al., 1999; Ek et al., 2003; Mitchell et al., 2002] is also a SVAT. It includes packages to simulate soil moisture (both liquid and frozen), soil temperature, skin temperature, snow depth, snow water equivalent, canopy water content, and the energy flux and water flux terms of the surface energy and water budgets. The Noah surface infiltration scheme follows that of Schaake et al. [1996] for its treatment of the subgrid variability of precipitation and soil moisture. Since January 1996, the Noah model has been coupled to (and upgraded with) the operational Eta mesoscale forecast model at the National Centers for Environmental Prediction (NCEP), and the companion EDAS. The NCEP operational version of the Eta/EDAS suite as of 19 June 2002 includes the version of the Noah model being evaluated in the NLDAS control runs in this paper. Moreover, this same version is being executed in the 25-year, 32-km, Eta/EDAS-based Regional Reanalysis now underway at NCEP. For over 10 years since the early 1990s, the Noah model has been upgraded continually by NCEP and its collaborators, starting from the late 1980s version of the Oregon State University (OSU) land model. The Noah model has been evaluated extensively in both uncoupled mode [Chen et al., 1996; Wood et al., 1998; Chen and Mitchell, 1999; Schlosser et al., 2000; Luo et al., 2003a; Bowling et al., 2003] and in coupled mode with a land-surface emphasis [Betts et al., 1997; Yucel et al., 1998; Berbery et al., 1999; Hinkelman et al., 1999; Berbery et al., 2003; Ek et al., 2003]. In both modes, the Noah model has proven quite effective in reproducing observed energy and water budgets without the complexity of tiling.

[9] The variable infiltration capacity (VIC) model [Liang et al., 1994, 1996a, 1996b] is a semi-distributed grid-based hydrological model. As compared to other SVATs, VIC's distinguishing hydrologic features are its representation of subgrid variability in soil storage capacity as a spatial probability distribution, to which surface runoff is related [Zhao et al., 1980], and its parameterization of base flow, which is represented in the lower soil moisture zone as a nonlinear recession [Dumenil and Todini, 1992]. Subgrid-scale variability in soil properties is represented in VIC by a spatially varying infiltration capacity. Thus the spatial variability in soil properties and topographic effects at scales smaller than the grid scale are represented statistically, without assigning infiltration parameters to specific subgrid locations. In NLDAS, three soil layers were used, with the top layer the thinnest (10 cm) and varying depths of the other layers over different regions. The model closes the surface energy budget by iterating for surface temperature. The model has been widely applied to large continental river basins, for example, the Columbia [Nijssen et al., 1997], the Arkansas-Red [Abdulla et al., 1996; Wood et al., 1997], and the Upper Mississippi [Cherkauer and Lettenmaier, 1999], as well as at continental scales (the 50-year retrospective NLDAS activities [Maurer et al., 2002; Roads et al., 2003]) and global scales [Nijssen et al., 2001]. It has also participated widely in intercomparison projects like PILPS.

[10] The Sacramento model (SAC) is a storage-type model, which is conceptually different from other SVATs. It is usually run together with the SNOW-17 model, both part of the National Weather Service River Forecast System [Burnash et al., 1973; Anderson, 1973]. SAC is a conceptual rainfall-runoff model. It has a two-layer structure, and each layer consists of tension and free water storages. The free water storage of the lower layer is further divided into two sub-storages that control supplemental (fast) and primary (slow) groundwater flows. The basic inputs needed to drive SAC are rain plus snowmelt from SNOW-17 and potential evapotranspiration. The outputs include estimated evapotranspiration and runoff. For the NLDAS runs, the potential evaporation was obtained from the Noah model output. Both SAC and SNOW-17 are controlled by numerous model parameters. SNOW-17 has 12 parameters while SAC has 16 parameters. For the NLDAS runs, the SNOW-17 model parameters were prescribed with a uniform set of parameters for the entire modeling domain. The SAC parameters were determined by using a priori parameters developed by Koren et al. [2000]. The a priori parameter estimates were derived by relating the SAC model parameters to the 1-km State Soil Geographic Database (STATSGO) and Soil Conservation Service Curve Numbers. Owing to the significant difference in model structures and the lack of energy components in SAC, we focus on the other three models in energy-related comparisons, but include SAC in the soil moisture studies.

3. Observations

[11] To evaluate NLDAS, we need detailed, high-quality observations. The southern Great Plains (SGP) region is probably the most intensively observed region in the world, with both the ARM/CART and Oklahoma Mesonet networks (Figure 1).

Figure 1.

Observational network coverage of the southern Great Plains region. The 115 Oklahoma Mesonet stations are indicated by circles. Among them, 72 stations with soil moisture and soil temperature observations, indicated by open circles, were used in this study. The 24 ARM/CART stations are indicated by triangles. Stations used as examples in the paper are indicated with their names. The background is the most predominant surface soil type, as specified by NLDAS (O, other; B, bedrock; W, water; OM, organic materials; C, clay; SiC, silty clay; SC, sandy clay; CL, clay loam; SiCL, silty clay loam; SCL, sandy clay loam; L, loam; Si, silt; SiL, silty loam; SL, sandy loam; LS, loamy sand; S, sand).

[12] The Oklahoma Mesonet is a mesoscale meteorological monitoring network [Brock et al., 1995]. The 115 stations cover every county of the state of Oklahoma, taking observations of conventional meteorological variables. Among them, 72 stations are also equipped with soil moisture sensors. These heat dissipation sensors are installed at four depths: 5 cm, 25 cm, 60 cm, and 75 cm below the surface at most stations [Basara and Crawford, 2000]. The sensors measure the temperature change (ΔT) over time after a heat pulse is introduced. Data from the sensors are carefully calibrated to provide estimates of soil water potential as a function of that ΔT. The volumetric soil water content (soil moisture) is then estimated based on a soil water retention curve that is measured and calibrated in the laboratory using a combination of published techniques and empirical corrections to these techniques. The retention curve is soil-specific and is highly dependent upon soil properties. Comparison of the soil water content estimated from the sensors with gravimetric sampling of the soil profile and neutron probe measurements has indicated that the estimates of soil water content using ΔT and the combined techniques are quite reasonable for most stations. However, the sensor values tend to overestimate soil water content slightly under very dry soil conditions. Because the sensor measures temperature change before and after a heat pulse is applied, the “pre-heating” temperature can be considered as a measurement of soil temperature, accurate to 0.5°C.

[13] Soil moisture and temperature observations from the Mesonet are the major source of data we use in evaluating land surface conditions. Figure 2 is an example of observed volumetric soil moisture from the Mesonet stations at Bixby, Oklahoma, and Catoosa, Oklahoma. The soil moisture profile is estimated from the four observations at different depths. There is a consistency between the local precipitation and soil moisture. Heavy precipitation events can recharge the deep soil moisture quickly, while light precipitation events can only wet the surface. There is a seasonal variation of soil moisture with the driest condition during July and August. Soil moisture does not change very much during winter. Bixby has an impermeable layer at the bottom just below the lowest sensor, explaining the high value of soil moisture there. Catoosa has bedrock, so there is no sensor at 75 cm. The bottom panel of Figure 2 is a snapshot of the observed soil moisture map for the top 40 cm on July 19, 1998. Cressman objective analysis [Cressman, 1959] is used here to interpolate the observations from individual stations to make up the map. At this time, the soil moisture field across the region is relatively homogeneous. There are a few stations where soil is much wetter than the other stations, one of which is Catoosa. This deviation from the average field at a moment might be related to precipitation distribution as well as soil texture and vegetation cover. However, since the deviation is fairly systematic throughout the 21-month period, it is more likely related to the soil textures at these stations. Catoosa has silty clay loam at 5 cm and 25 cm, and clay at 60 cm. These soil types would tend to retain quite a bit of water. Bixby has quite sandy soil. There is some similarity between the bottom panel of Figure 2 and Figure 1, where the predominant soil type of the top layer is plotted. For example, the sandier soils (darker blue in Figure 1) correspond to the drier regions (brown in Figure 2).
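
As an illustration of the interpolation step, the following is a minimal single-pass sketch using the classic Cressman distance weight, w = (R² − d²)/(R² + d²) inside an influence radius R and zero outside. The full Cressman scheme applies successive corrections to a first-guess field with shrinking radii; this sketch omits that, and the station coordinates, values, and radius are hypothetical rather than those used for Figure 2.

```python
import math


def cressman_weight(distance, radius):
    """Cressman weight: (R^2 - d^2) / (R^2 + d^2) for d < R, otherwise 0."""
    if distance >= radius:
        return 0.0
    r2, d2 = radius * radius, distance * distance
    return (r2 - d2) / (r2 + d2)


def cressman_interpolate(x, y, stations, radius):
    """Weighted mean of station values at point (x, y).

    stations: iterable of (x_station, y_station, value).
    Returns None when no station lies within the influence radius.
    """
    num = den = 0.0
    for xs, ys, value in stations:
        w = cressman_weight(math.hypot(x - xs, y - ys), radius)
        num += w * value
        den += w
    return num / den if den > 0.0 else None


# Hypothetical stations: (x km, y km, volumetric soil moisture in %)
stations = [(0.0, 0.0, 20.0), (10.0, 0.0, 30.0)]
midpoint_value = cressman_interpolate(5.0, 0.0, stations, radius=20.0)
print(midpoint_value)
```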

Figure 2.

Examples of the observational data set from Oklahoma Mesonet stations. Precipitation and soil moisture observations from the Mesonet stations at Bixby, Oklahoma, and Catoosa, Oklahoma, for the period of January 1998 to October 1999. At the bottom is a snapshot of the top 40-cm soil moisture field (% by volume) over the SGP region based on 72 Oklahoma Mesonet observations on July 19, 1998. Cressman objective analysis is used to interpolate station observations to obtain the map.

[14] Soil moisture is also measured at some of the ARM/CART Extended Facilities (EF). The soil water and temperature system (SWATS) uses the same type of sensor as used at the Mesonet stations to obtain the soil moisture estimates, but the sensors are installed at eight depths. Since the number of SWATS stations is much smaller than Mesonet stations and there are many missing or bad data, we do not use them in this study.

[15] Surface radiation and turbulent fluxes are also observed at the ARM/CART site by different instruments. The Solar and Infrared Radiation Station (SIRS) instruments observe shortwave and longwave radiation for both the upwelling and downwelling components every minute at many ARM/CART extended facilities. These measurements allow us to validate the NLDAS forcing [Luo et al., 2003b] as well as to evaluate the models' performance in radiative flux simulations. The Energy Balance Bowen Ratio (EBBR) system is a ground-based system using in situ sensors to estimate the vertical fluxes of sensible and latent heat. It is installed at 14 grassland locations within ARM/CART. Flux estimates are made from observations of net radiation, soil heat flow, and the vertical gradients of temperature and relative humidity using the Bowen ratio energy balance technique [Brutsaert, 1982]. A bulk aerodynamic approach is applied when the Bowen ratio is between −0.75 and −1.5 to reduce the errors occurring when Bowen ratio has a value near −1 [Wesely et al., 1995]. Net radiation is also observed at the EBBR stations, but the accuracy of SIRS-observed net radiation when SIRS is co-located with EBBR is believed to be better. Although this measured flux is only representative of the grassy area within about 50 m of the EBBR stations, it still can provide very valuable information in the validation process.
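
The Bowen ratio energy balance partition can be sketched as follows. This is a minimal illustration, not the EBBR processing code: it uses the conventional signs (net radiation and soil heat flow positive downward, turbulent fluxes positive upward), takes the Bowen ratio as already computed from the measured gradients, and treats the −1.5 to −0.75 window as inclusive, which is an assumption. The function name and numbers are hypothetical.

```python
def bowen_ratio_fluxes(net_radiation, soil_heat_flow, bowen_ratio):
    """Partition available energy into sensible (H) and latent (LE) heat.

    Uses Rn - G = H + LE and B = H / LE, so LE = (Rn - G) / (1 + B).
    Returns (H, LE) in W m-2, or None when B is too close to -1 for the
    partition to be reliable; a bulk aerodynamic estimate would be needed
    in that case (not implemented here).
    """
    if -1.5 <= bowen_ratio <= -0.75:
        return None
    available = net_radiation - soil_heat_flow
    latent = available / (1.0 + bowen_ratio)
    sensible = bowen_ratio * latent
    return sensible, latent


# Illustrative midday values (W m-2), not observations
fluxes = bowen_ratio_fluxes(net_radiation=500.0, soil_heat_flow=50.0, bowen_ratio=0.5)
print(fluxes)
```

Returning None near B = −1 makes the fallback explicit to the caller instead of silently producing very large fluxes from a near-zero denominator.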

[16] We used observations from the 24 ARM/CART extended facilities listed in Table 1. When we use shortwave and longwave radiative flux measurements, we use all observations from all extended facilities with SIRS instruments, but when we study surface turbulent fluxes, we only use extended facilities that have EBBR instruments. Thus we are able to obtain as much information as we can, but a precise energy closure is not necessarily achieved. Figure 3 is an example of the observed fluxes from two ARM/CART extended facilities for one summer day (July 19, 1998). Both of these stations are equipped with SIRS and EBBR instruments. Although they have a similar surface type and are not very far from each other, the observed surface fluxes can be very different, especially for latent and sensible heat flux. The difference in noontime latent heat can be as large as 200 W m−2. There is a tremendous spatial variability in surface fluxes, just as in soil moisture. Overall, the soil moisture and soil temperature observations from Mesonet and ARM/CART SWATS, and surface energy fluxes from ARM/CART SIRS and EBBR, provide a valuable validation data set over the SGP region.

Figure 3.

Example of the observational data set from the ARM/CART site. A typical summer diurnal cycle of radiative and surface turbulent fluxes measured by ARM/CART SIRS and EBBR instruments. Solid black line is the downward solar radiation and dashed black line is the reflected shortwave radiation. Solid green line is the downward longwave radiation and dashed green line is the upward longwave radiation. Blue curve is the surface turbulent flux of latent heat (positive upward) and red curve is the surface turbulent flux of sensible heat (positive upward). Purple curve is the ground heat flux (positive upward). These time series were constructed based on hourly averages of the observations. Panels a and b are observations from two ARM/CART extended facilities.

Table 1. ARM/CART Extended Facilities and Radiation Data Used in This Study (a)

Station ID   Station Name   State   Surface Type           SIRS   EBBR
EF-3         Le Roy         KS      wheat and soy beans    yes    no
EF-4         Plevna         KS      rangeland (ungrazed)   yes    yes
EF-7         Elk Falls      KS      pasture                yes    yes
EF-8         Coldwater      KS      rangeland (grazed)     yes    yes
EF-12        Pawhuska       OK      native prairie         yes    yes
EF-13        Lamont         OK      pasture and wheat      yes    yes
EF-18        Morris         OK      pasture (ungrazed)     yes    yes
EF-19        El Reno        OK      pasture (ungrazed)     yes    yes
EF-22        Cordell        OK      rangeland (grazed)     yes    yes
EF-23        Fort Cobb      OK      pasture                no     no

(a) Solar and Infrared Radiation Station (SIRS) and Energy Balance Bowen Ratio (EBBR). KS, Kansas; OK, Oklahoma.

4. Comparisons of Model Output With Observations

[17] A direct comparison between the modeled soil moisture field and observations is possible at all 72 Mesonet stations and the ARM/SWATS stations, but these comparisons may suffer from scale incompatibility problems. Vinnikov et al. [1996], Entin et al. [2000], and Crow and Wood [1999] all show that soil moisture spatial variation has a small scale related to hydrological processes and soil characteristics. But Vinnikov et al. [1996] and Entin et al. [2000] also point out that a larger scale of soil moisture variation exists due to the control of the atmosphere on the land surface. A direct comparison between modeled soil moisture for an 11 × 14 km grid box and soil moisture observations at a point is a little ambiguous, and the natural variability of soil moisture adds to this uncertainty. Spatial averaging is a simple way to get around this problem and has been used in many similar validation and evaluation studies [Robock et al., 1998; Entin et al., 1999; Srinivasan et al., 2000]. Spatial and temporal averaging reduces the spatial and temporal variability and gives us a more meaningful state of soil moisture, soil temperature, and surface fluxes. In this study, we also take this approach.

[18] When we compare model output with observations, the spatial averaging is done in such a way that the sample sizes from two data sets are the same. In other words, only model values when the respective observations at the same location and same time are available are taken into the averaging. This is true for all the comparisons in this study unless specially noted. Since the number of stations where observations of different variables are taken is different, we average all available observations from all stations to maximize our sample size.
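
The matched-sample averaging described here can be sketched in a few lines. This is an illustration under stated assumptions, not the code used in the study: missing observations are represented as NaN, model values are also checked for NaN as a precaution, and the function name and sample values are hypothetical.

```python
import math


def matched_means(model, observed):
    """Average model and observed values over identical samples.

    A (station, time) pair contributes only if the observation is
    available, so both means are built from the same sample.
    Returns (model_mean, observed_mean, sample_size).
    """
    pairs = [
        (m, o)
        for m, o in zip(model, observed)
        if not math.isnan(o) and not math.isnan(m)
    ]
    if not pairs:
        return math.nan, math.nan, 0
    n = len(pairs)
    model_mean = sum(m for m, _ in pairs) / n
    observed_mean = sum(o for _, o in pairs) / n
    return model_mean, observed_mean, n


# Hypothetical volumetric soil moisture at four stations; two have no observation
model = [0.30, 0.25, 0.28, 0.35]
observed = [0.28, math.nan, 0.26, math.nan]
model_mean, observed_mean, n = matched_means(model, observed)
print(model_mean, observed_mean, n)
```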

[19] During the validation process, we found some differences between the ways that models use the atmospheric forcing. Before we look at the comparison, we first describe this potential problem. NLDAS retrospective forcing consists of instantaneous values of conventional meteorological variables and solar and longwave radiation, as well as the hourly accumulated precipitation for the previous hour. Figure 4a shows the incident solar radiation at one location during an idealized clear day. The area under the curve represents the total amount of energy that is provided to the land surface, which is shown in Figure 4b. While NLDAS retrospective forcing provides instantaneous values at each hour, models do not necessarily use the same time step in their integration. Currently, VIC uses a 1-hour time step, and it does not do any interpolation when using the NLDAS forcing. VIC takes the instantaneous values of solar radiation at the end of each hour and uses them directly. Effectively, it uses the instantaneous value as an “average” for the 1-hour time step. It is clear from Figure 4a that VIC gets more energy in the morning and less energy in the afternoon, which is equivalent to a phase shift of about 30 min. Noah and Mosaic both use 15-min time steps, and they interpolate the hourly forcing into 15-min intervals. During this process, the two models take different approaches. For conventional meteorological variables and longwave radiation, Mosaic uses a simple linear interpolation between the two hours. For solar radiation, a solar zenith angle interpolation scheme is implemented based on a calculation of the zenith angle, which tends to minimize the error introduced by the simple linear interpolation scheme. As shown in Figure 4a, this method successfully reproduces the “truth” and the difference in total energy received is much smaller. Noah interpolates all the variables in a “bi-linear” fashion. Noah considers the solar radiation in the forcing as an hourly average for the previous hour rather than as an instantaneous value at the end of the hour, as it should. To conserve energy, Noah uses a bi-linear interpolation algorithm so that four values within 1 hour will provide the same amount of energy to the model as what it assumes in the forcing. Because it conserves energy in this way, the total amount of energy Noah receives each hour is exactly the same as what VIC receives, but the energy accumulation for Noah and VIC (Figure 4b) has a small phase shift. Although models are forced with the same forcing data, the way they interpret and use the forcing can have subtle differences. The problem illustrated here might not necessarily create huge differences in their simulations, but it is an issue that must be addressed in the future in this project and any other model intercomparison projects.
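
The effect of treating an end-of-hour instantaneous value as an hourly average can be reproduced with a small numerical sketch. It assumes an idealized half-sine irradiance curve with sunrise at 06:00 and sunset at 18:00 and a hypothetical peak of 1000 W m−2, which is simpler than the spherical-geometry curve of Figure 4; none of the names or values come from the NLDAS models.

```python
import math

S0 = 1000.0  # hypothetical peak clear-sky irradiance, W m-2 (not from the paper)


def solar(t_hours):
    """Idealized half-sine irradiance: sunrise 06:00, noon peak, sunset 18:00."""
    return S0 * max(0.0, math.sin(math.pi * (t_hours - 6.0) / 12.0))


def true_hourly_energy(hour_end, substeps=600):
    """Midpoint-rule integral of solar() over the hour ending at hour_end (W h m-2)."""
    dt = 1.0 / substeps
    start = hour_end - 1.0
    return sum(solar(start + (i + 0.5) * dt) for i in range(substeps)) * dt


def end_of_hour_energy(hour_end):
    """Energy credited to the hour if the end-of-hour value is held for 1 hour."""
    return solar(hour_end) * 1.0


morning = range(7, 13)     # hours ending 07:00 ... 12:00
afternoon = range(13, 19)  # hours ending 13:00 ... 18:00

true_am = sum(true_hourly_energy(h) for h in morning)
held_am = sum(end_of_hour_energy(h) for h in morning)
true_pm = sum(true_hourly_energy(h) for h in afternoon)
held_pm = sum(end_of_hour_energy(h) for h in afternoon)

print(f"morning:   true {true_am:.0f}, end-of-hour {held_am:.0f} W h m-2")
print(f"afternoon: true {true_pm:.0f}, end-of-hour {held_pm:.0f} W h m-2")
```

Under these assumptions the held value exceeds the true hourly energy while irradiance is rising and falls short while it is declining, which is the morning surplus and afternoon deficit described for VIC.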

Figure 4.

Schematic demonstration of the difference in using hourly NLDAS forcing and resulting phase shift in models' energy terms. The synthetic solar radiation curve is calculated based on simple spherical geometry without any consideration of atmospheric absorption or cloud reflection. The blue and green lines (Noah and VIC) overlap in Figure 4b.

4.1. Net Radiation

[20] Luo et al. [2003b] show that the radiative forcing provided by NLDAS for the period of interest agrees fairly well with observations in the SGP region. Here we investigate whether the models partition the incoming energy in a similar way to that observed in the real world.

[21] Figure 5 presents the monthly energy budget from three of the four models for the period of interest. Monthly net radiation from observations and each model simulation was averaged over all the ARM/CART SIRS locations. Surface turbulent latent and sensible heat fluxes and ground heat flux were similarly averaged, but over all ARM/CART EBBR locations. Because the number of SIRS and EBBR stations is different and occasionally they are not co-located, the energy budget will not be precisely closed based on this averaging, but the discrepancy is less than 20 W m−2 in most months. We try to use as many observations as possible due to the limited number of observations in this region. Net total radiation from the observations is calculated based on the upward and downward components of the shortwave and longwave radiation measured by SIRS. At a monthly timescale, only the seasonal cycle is clearly shown. For the 21-month period, the maximum net radiation is about −160 W m−2 in June and July and the minimum net radiation is about −10 W m−2 in December. All modeled net radiation closely follows the observations except for this month, but Mosaic tends to overestimate the net downward radiation, especially during the summer months. From May 1998 to August 1998, Mosaic overestimates the net downward radiation by about 10 W m−2. It also slightly overestimates net radiation for the summer of 1999.
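
The net radiation calculation from the four measured components can be written out as a short sketch. It follows the positive-upward sign convention stated in the figure captions, so net radiation is negative when the surface gains energy; the function name and the component values are illustrative assumptions, not SIRS data.

```python
def net_radiation_upward(sw_down, sw_up, lw_down, lw_up):
    """Net radiation (W m-2), positive upward, from the four components."""
    return (sw_up - sw_down) + (lw_up - lw_down)


# Illustrative component values (W m-2), not observations:
# (downward shortwave, upward shortwave, downward longwave, upward longwave)
samples = [
    (300.0, 60.0, 400.0, 480.0),  # a daytime-like case
    (0.0, 0.0, 380.0, 420.0),     # a nighttime-like case
]
net = [net_radiation_upward(*s) for s in samples]
print(net)
```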

Figure 5.

Surface energy budget at monthly timescale over the SGP region based on available observations from ARM/CART site. Black lines are observations, averaged over all available ARM/CART stations. Red lines are for the Mosaic model, blue for Noah, and green for VIC. Values for each model are calculated based on a subset of model results that has exactly the same sample size as the observations, and are clean comparisons to the observations. All fluxes are positive upward.

[22] Figure 6 shows the monthly mean diurnal cycle of each energy term for two representative months from the entire period, July and September 1999. In July 1999, Mosaic systematically overestimates the net downward radiation for the second half of the daytime, while VIC and Noah slightly underestimate it and exhibit the phase shift discussed above. Except near noontime, Noah and VIC have almost identical net radiation during the daytime. At night, VIC has a larger upward longwave radiation, which causes a larger positive net radiation.

Figure 6.

Monthly mean diurnal cycle of surface net radiation (R), latent heat flux (LE), sensible heat flux (H), and ground heat flux (G). The black lines are observations based on all available observations from ARM/CART stations with SIRS and EBBR instruments. Red lines are for the Mosaic model, blue for Noah, and green for VIC. Values for each model are calculated based on a subset of model results that has exactly the same sample size as the observations, and are clean comparisons to the observations. All fluxes are positive upward.

[23] Overestimation of net radiation in off-line simulations is normally linked to either a low surface albedo, which affects upward shortwave radiation, or a radiative skin temperature that is too cold, which affects the upward longwave radiation. Since the summer surface albedo does not involve snow in this region, the bias in the skin temperature simulation is the more likely cause, and this is shown in the next section.

4.2. Radiative Skin Temperature

[24] Figure 7 shows a comparison between modeled skin temperature by the three models and the observations. The upward longwave radiation observed at ARM/CART SIRS stations was converted to equivalent surface skin temperature (effective radiative temperature) with the assumption that surface emissivity is 1. The diagram shows the monthly mean diurnal cycle of skin temperature averaged over all ARM SIRS stations and the differences between the modeled skin temperatures and the observations. It is clear that Mosaic underestimates midday skin temperature systematically throughout the 21 months by up to 4°C. Conversely, Noah overestimates the midday skin temperature by up to 4°C during summer months. Part of the overestimate of the skin temperature can be attributed to a too large value of the aerodynamic resistance. In several follow-on experiments when a smaller value of aerodynamic resistance is used, the bias in skin temperature in Noah's simulations decreases. However this can only explain about 2°C of the bias. The reason for the bias in Mosaic is not clear at this stage. VIC's skin temperature simulation is relatively better than the other two models. There is no strong bias, but there is a slight tendency for it to underestimate skin temperature in the morning and overestimate it in the afternoon.
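
The conversion from upward longwave radiation to an effective radiative temperature follows from the Stefan-Boltzmann law with emissivity set to 1, as assumed above: T = (LW_up/σ)^(1/4). The sketch below is a minimal illustration; the function name and the sample flux are hypothetical.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m-2 K-4


def skin_temperature_kelvin(lw_up):
    """Effective radiative temperature (K) from upward longwave flux (W m-2).

    Assumes a surface emissivity of 1, so LW_up = SIGMA * T**4.
    """
    if lw_up <= 0.0:
        raise ValueError("upward longwave flux must be positive")
    return (lw_up / SIGMA) ** 0.25


# Illustrative value, not an observation
t_skin = skin_temperature_kelvin(480.0)
print(f"{t_skin:.1f} K = {t_skin - 273.15:.1f} degC")
```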

Figure 7.

Radiative skin temperature comparison. (a) Time series of diurnal cycle of the observed radiative skin temperature estimated from all available ARM/CART stations with SIRS instruments. (b, c, d) Differences between the models and observation for Mosaic, Noah, and VIC, respectively.

4.3. Turbulent and Ground Heat Fluxes

[25] Sensible and ground heat fluxes link the surface thermal state with the energy budget. As shown in Figure 5, even when models receive the same amount of radiative energy and absorb a similar amount, they can partition the radiative energy in a very different manner. Mosaic produces too much latent heat flux and too little sensible heat flux, VIC does the opposite, and the Noah simulation is close to the observations. Mosaic overestimates the latent heat flux by 50 W m−2 in 1998, which is more than 50% of the observed latent heat flux that summer. It also deposits more energy into ground heat flux during the summer and withdraws more from ground heat flux during the winter. The summer latent and ground heat fluxes are so large that they have to borrow energy from the sensible heat flux. Mosaic's sensible heat flux is less than 50% of the observed values during spring and summer. This incorrect partition of energy between latent and sensible heat also affects soil moisture simulations, as shown later.

[26] Besides energy partition at the monthly scale, models should also be able to reproduce the diurnal variation of these fluxes. The models partition radiative energy in very different ways (Figure 6). In July 1999, Mosaic overestimates the daytime latent heat flux, while the other models underestimate it, and the reverse is true for the sensible heat flux. The Noah turbulent fluxes are closer to the observations than the other models. There are also significant phase differences among models and between models and observations. VIC's latent heat flux peaks in the morning and then decreases gradually in the afternoon, and its sensible heat flux peaks in the afternoon 4 to 5 hours later. Mosaic and Noah have their sensible heat flux peaks a little earlier than their latent heat flux. It seems that VIC's problem is connected to its vegetation parameters, which control the evapotranspiration.

[27] Although in the monthly mean Mosaic puts too much energy into ground heat flux during summer and Noah and VIC were very close to the observations (Figure 3), the monthly mean diurnal cycles show that only Noah is close to the observations (Figure 6). The amplitude of the diurnal cycle for Mosaic is almost 3 times larger than observed. VIC also deposits a huge amount of energy into ground heat in the morning, but withdraws much more in the night than Mosaic does, so the daily and monthly means are very close to the observations. Mosaic and VIC also have a significant phase difference in ground heat fluxes, of 3 to 6 hours. Noah has a smaller diurnal variation of ground heat flux than the other two models, but still a bit larger than the observations. The phase difference is also smaller for Noah. The phase difference in ground heat flux cannot be attributed to the radiative forcing, but rather to the way energy is partitioned into turbulent fluxes, and to the ground heat flux and soil temperature simulations.

[28] Results from several other months in Figure 6 confirm the findings from July 1999. The pattern of energy partition is quite systematic, but these systematic biases change with the seasons, being largest in June and July. For example, in September 1999, the biases are smaller but of the same sign as in July, except the Mosaic ground heat flux error at night (0000–1200 UT) is larger. There is also interannual variation. In July 1998, the latent heat flux from Mosaic is closer to the observations in amplitude, but still with a small phase shift.

4.4. Soil Temperature

[29] As shown in Figures 5 and 6, ground heat flux needs to be improved in the model simulations, most notably in VIC and Mosaic. Related to the ground heat flux simulation are soil temperature and its profile. Figure 8 shows the averaged soil temperature for the top two layers from the Noah and VIC models. Mosaic has only one soil temperature used in its force-restore calculations, and it is not explicitly tied to a particular depth. Compared with observations, Noah and VIC do quite a good job in simulating the near-surface soil temperature. For the second layer, they both are still close to the observations, with a maximum difference less than 5°C. Interestingly, VIC's soil temperature follows observations more closely in the spring and Noah's follows observations more closely in the fall. Since we only have less than 2 years of data, it is hard to confirm whether this is a systematic pattern. The high bias in VIC's second layer soil temperature for the period October 1998 to January 1999 directly contributes to its ground heat flux simulation errors. Without an explicit soil temperature profile in the Mosaic model, it is impossible to compare with observations directly, and this might be the cause of its ground heat flux problem.

Figure 8.

Time series of the spatially averaged soil temperature from two model layers (0–10 cm and 10–40 cm) compared with observations at two depths (5 cm and 25 cm). Black is observations. Noah is plotted in blue, and VIC is in green.

4.5. Soil Moisture

[30] We compared the spatially averaged soil moisture from all Mesonet stations with the respective average calculated from the model output (Figure 9). Although the models output their states at hourly intervals and observations were taken every 30 min, we average them to daily values because we are more interested in the longer timescale variation. The observed volumetric soil moisture for the top 40 cm is estimated from the observations at 5 cm and 25 cm below ground. Modeled values are calculated from their top two layers. Noah and Mosaic both have two soil layers in the top 40 cm, and the water content from these two layers is directly used to calculate the volumetric values. VIC has a variable thickness for its second layer, which is typically from 10 to 40 cm. When the second layer is thicker than 30 cm, only the top 30 cm of the second layer is taken, with the assumption that soil moisture is homogeneously distributed vertically in the layer. SAC is a storage type model, and there are no explicit soil layers, but the sum of water content from the free water storage and tension water storage of its top layer can be used to estimate the volumetric soil moisture valid for the top 10–30 cm, depending on soil properties. We use this quantity to compare with other models and observations, keeping in mind that they are not strictly the same thing.
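
The layer handling described above for the modeled top 40 cm soil moisture can be illustrated with a short Python sketch. The function name and the layer values are hypothetical and are not taken from any of the models; the only assumption carried over from the text is that soil moisture is vertically uniform within a layer, so that a layer extending below 40 cm contributes only the part above that depth.

```python
# Illustrative sketch only: names and numbers are assumptions.
def top_column_soil_moisture(layers, depth_cm=40.0):
    """Thickness-weighted mean volumetric soil moisture (m3/m3) over the
    top depth_cm. `layers` is a list of (thickness_cm, theta) pairs ordered
    from the surface downward; each layer is assumed vertically uniform."""
    remaining = depth_cm
    water = 0.0
    for thickness_cm, theta in layers:
        used = min(thickness_cm, remaining)  # truncate a layer crossing depth_cm
        water += used * theta
        remaining -= used
        if remaining <= 0.0:
            break
    if remaining > 0.0:
        raise ValueError("layers do not reach the requested depth")
    return water / depth_cm


# Two layers that exactly fill the top 40 cm (as for Noah and Mosaic).
fixed_layers = [(10.0, 0.30), (30.0, 0.26)]
# A thicker second layer (as can occur in VIC): only its top 30 cm is used.
thick_second_layer = [(10.0, 0.30), (50.0, 0.26)]

print(top_column_soil_moisture(fixed_layers))
print(top_column_soil_moisture(thick_second_layer))
```

Both calls give the same value, because the extra 20 cm of the thicker layer lies below the 40 cm column and is ignored.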

Figure 9.

Time series of the spatially averaged top 40 cm volumetric soil moisture for the period of January 1, 1998, to September 30, 1999. Except for SAC, the observations and other three models are valid for the top 40 cm. SAC is a storage type model and the sum of two top storages (free water and tension water storage) does not necessarily correspond to 40 cm. Red is Mosaic, blue is Noah, green is VIC, and gold is SAC.

[31] The observed volumetric soil moisture for the top 40 cm is generally above 30% but decreases to below 25% during the summer. This seasonality is closely related to the seasonal variation of evapotranspiration, which is mainly energy driven. Superimposed on this seasonal variation are shorter timescale variations that are driven by individual precipitation events. Compared with the observations, the Noah model has higher soil moisture values most of the time. The systematic bias is about 7% (m3/m3). This bias is a bit reduced in the summer when the soil is very dry. VIC is a little closer than Noah to the observations, but it does not get dry enough during summers and the seasonal variation is too small in VIC's simulation. Mosaic, conversely, has a pronounced seasonal variation. Its summer soil moisture is much drier than observations and the other two models. This dryness in the summer of 1998 persists, resulting in a dry bias in the rest of the simulations. SAC shows a dry bias in soil moisture simulation throughout the period, and its temporal variability is too large, but the estimates are not precisely for the top 40 cm, so that the higher variability in the SAC model as compared to observations is not unexpected. Nevertheless, all the models are able to capture the observed variations fairly well, particularly when examining the anomalies (Figure 9b). The anomalies were calculated with respect to the mean for the entire 21-month period.
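
The reason anomalies can agree even when absolute values do not is that a constant bias is removed when the period mean is subtracted. A minimal Python sketch follows, with hypothetical values rather than data from this study:

```python
# Illustrative sketch only: the series below are made-up values.
def anomalies(series):
    """Return the departures of each value from the mean of the whole series."""
    mean = sum(series) / len(series)
    return [value - mean for value in series]


observed = [0.32, 0.30, 0.24, 0.28]        # hypothetical volumetric soil moisture
biased = [value + 0.07 for value in observed]  # same variations, constant wet bias

print(anomalies(observed))
print(anomalies(biased))  # equal to the line above, up to rounding error
```

A model whose seasonal amplitude is too small or too large would still differ from the observations in its anomalies; only the constant part of the bias cancels.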

[32] The comparisons between the model simulations in all the state and flux terms and the observations from the Oklahoma Mesonet and the ARM/CART site reveal that the models are able to capture the general features of the observations for monthly timescales, and each model is able to closely simulate some but not all of the quantities in the energy and water budgets. Mosaic has problems with skin temperature, soil moisture, and energy flux terms. Noah has a reasonable simulation of energy fluxes and soil temperature and soil moisture states, but a poor simulation of skin temperature. Conversely, VIC has a good simulation of the state variables such as soil temperature, soil moisture, and skin temperature, but its flux terms are not satisfactory. The only variable we evaluate for SAC, soil moisture, has a dry bias, and its temporal oscillation may be over-amplified. Therefore we need to analyze each model and try to improve its particular aspects.

5. Analysis and Discussion

[33] The results presented above are not very surprising because it has been found that models perform differently even when driven by the same atmospheric forcing in the Global Soil Wetness Project [Entin et al., 1999] and several PILPS experiments [e.g., Schlosser et al., 2000; Slater et al., 2001; Bowling et al., 2003; Luo et al., 2003a]. Since different models have different problems, we now examine several of them and attempt to determine the factors that cause these problems and suggest ways to reduce them.

5.1. Soil Texture and Soil Hydraulic Parameters

[34] Soil texture plays a very important role in water-flux partition and hence energy budgets. Fine texture soil is able to hold more water than coarse soil, but has a smaller hydraulic conductivity. If models use a different soil type than what is observed, differences in their simulations would be expected. Soil texture has a tremendous spatial variability (both horizontally and vertically) in the real world, on scales as small as a few meters. The NLDAS specified soil texture comes from the 1-km USDA-NRCS State Soil Geographic Database (STATSGO) [Miller and White, 1998]. The predominant soil type in each 1/8° NLDAS grid square is provided to each model (Figure 1), but each model uses this information differently in its calculations. If the models assume different soil properties from each other, they will produce different simulations. Similarly, if the model soil types are different from those that actually exist at the Mesonet stations, we would expect differences between the model simulations and the observations.

[35] Figure 10 compares the soil types from the local stations to the NLDAS specification. Although there are many locations where the models use the same soil types as observed at the soil moisture stations, there are stations with significant differences between the two data sets for both the 5 cm and 1 m integrated soil texture. The agreement between the local soil at a particular station and the soil type at the corresponding NLDAS grid is determined by how representative the station is of its surrounding area and by the accuracy of the two data sets. The spread illustrates that soil texture differences can be a potential factor producing differences in model simulations.

Figure 10.

Comparison of the observed soil type in the stations with the NLDAS specified soil type at the closest grid. We obtained the local soil type at each Mesonet station where soil moisture sensors are installed. (a) We vertically integrated the percentage values of sand, silt, and clay at the four depths where soil moisture is observed to get a bulk soil texture for the whole soil column. Different colors indicate the number of occurrences of one event. (b) Comparison of soil types for the topsoil layer in the models and observations. (c–f) Comparison of the soil hydraulic parameters used by three models. Red bars are for the Mosaic model, blue for the Noah, and green for VIC.

[36] Soil type information is expressed as numerical values describing certain properties of the soil, such as porosity, field capacity, and hydraulic conductivity. These values are then used in water flux and water storage calculations. Soil types are determined by the composition of sand, silt, and clay, and the composition varies continuously in nature. This continuous spectrum is typically divided into 12 categories. Therefore there is still a certain amount of variability even inside one category. Soil hydraulic parameters hence can vary fairly significantly for one soil type. Figure 10 also shows the soil parameter values that are used by the three models for each soil type. The models use soil parameters measured by different investigators; these values are all reasonable and all occur in nature. However, for land surface modeling purposes, we want to know which is more representative and is able to produce better simulations of the land states and surface fluxes. Porosity does not have very large variability from one soil type to another, so all the values used by models are relatively close (not shown here). Field capacity and wilting point are two critical states of the soil. Field capacity represents the water that can be held against the force of gravity and is determined by the soil matrix itself. Wilting point is reached when water uptake by vegetation through transpiration ceases, so it is determined by both soil properties and vegetation. The range between wilting point and field capacity is what we normally see in soil moisture conditions, and represents the active or plant-available water holding capacity. Although models use very similar values for porosity, the active water holding capacities differ significantly because models use different values of field capacity and wilting point for the same soil type. Saturated hydraulic conductivity and the b parameter [Clapp and Hornberger, 1978] determine the complicated relationship between soil water content and soil water flow inside the soil. Different values will cause models to perform differently in soil moisture, runoff, and evaporation simulations.
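
To make the role of these parameters concrete, the following Python sketch evaluates the plant-available water holding capacity and the Clapp and Hornberger [1978] power-law relations, K = Ks (θ/θs)^(2b+3) and ψ = ψs (θ/θs)^(−b). It is an illustration only: the function names and all parameter values are hypothetical and are not the values used by any NLDAS model.

```python
# Illustrative sketch only: names and parameter values are assumptions.
def available_water_capacity(field_capacity, wilting_point):
    """Plant-available water holding capacity (m3/m3)."""
    return field_capacity - wilting_point


def hydraulic_conductivity(theta, theta_sat, k_sat, b):
    """Clapp-Hornberger hydraulic conductivity, in the units of k_sat."""
    return k_sat * (theta / theta_sat) ** (2.0 * b + 3.0)


def matric_potential(theta, theta_sat, psi_sat, b):
    """Clapp-Hornberger matric potential, in the units of psi_sat."""
    return psi_sat * (theta / theta_sat) ** (-b)


# Two hypothetical parameter sets for the same nominal soil type.
set_a = {"field_capacity": 0.33, "wilting_point": 0.10, "b": 4.5}
set_b = {"field_capacity": 0.38, "wilting_point": 0.07, "b": 5.5}

for name, p in (("A", set_a), ("B", set_b)):
    awc = available_water_capacity(p["field_capacity"], p["wilting_point"])
    # Conductivity at half saturation relative to the saturated value.
    k_rel = hydraulic_conductivity(0.5, 1.0, 1.0, p["b"])
    print(name, round(awc, 2), k_rel)
```

Because b enters as an exponent, modest differences in b between parameter sets change the unsaturated conductivity by a large factor, which is one way two models given the same soil type can still drain and dry differently.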

[37] To evaluate how important these soil property differences are in producing the different results described above, we performed additional simulations (Table 2) using the local observed soil texture and a common soil hydraulic parameter data set that is a combination of data from Cosby et al. [1984] and Rawls et al. [1991a, 1991b] for all the models. We obtained all the parameters from Cosby et al. [1984] except saturated hydraulic conductivity, which was obtained from Rawls et al. [1991a, 1991b]. For convenience, this simulation was only carried out at grids where Mesonet or ARM/CART stations are located and local soil information is available. We designate the experiment with local soil properties the Local run and the one with local soil properties and common soil parameters the Local/Common run. If the soil type does not differ for a station between the observations and the NLDAS forcing, Mosaic's simulation will not change since it originally uses the Cosby/Rawls parameter data set. The other two model simulations will change, however, because of the parameter changes for the same soil type.

Table 2. Simulations Performed by Different Models
Simulation | Mosaic | Noah | VIC | SAC
Control run | yes | yes | yes | yes
Local forcing | yes | yes | yes | yes
Local soil type and common soil parameters | yes | yes | yes | yes
Soil heat capacity (Mosaic only) | Mosaic_CH70K | | |
Aerodynamic conductance (Noah only) | | Noah_0.1 | |
Aerodynamic conductance (Noah only) | | Noah_0.05 | |
Model structure (VIC only) | | | VIC_DeltaH |

[38] The differences between the Local and Local/Common simulations are complicated and hard to categorize simply, but the basic results are illustrated in Figure 11 with soil moisture simulations for two Mesonet stations at Skiatook and Waurika (see Figure 1 for locations). The soil texture observed at Skiatook is sandy loam, which is the same as what is specified by NLDAS for that grid. Compared with the Local run, Mosaic's soil moisture in the Local/Common run does not change, and this is what we expected. Owing to the changes of hydraulic parameters in the Noah model for sandy loam soil, its soil moisture simulation significantly improved. The original difference in volumetric soil moisture was about 10% (m3/m3) between the Local and Control runs. In contrast, VIC also changes its soil hydraulic parameters for the sandy loam soil, but its soil moisture simulation gets worse. Originally, its soil moisture was very close to the observations, but a high bias was created by the soil parameter change. At Waurika, the soil texture observed at the station is sandy loam, while NLDAS specified loam for that grid. When Mosaic runs the simulation with these two different soils, the changes are generally insignificant. Because, from loam to sandy loam, the saturated hydraulic conductivity increases while field capacity and wilting point decrease, the soil tends to be a little drier in Mosaic's Local/Common run. However, it is still quite different from observations, especially during summer. Changes in Noah's simulation improve its soil moisture at this station. The wet bias in the Local simulation is now much reduced. For both stations, the Noah simulation is excellent for the Local/Common runs. Although soil hydraulic parameters are changed for each soil type in VIC's Local/Common run relative to its Local run, the change in soil moisture is virtually zero at Waurika. The reason is that VIC uses its original parameters for loam in its Local run and uses the Cosby/Rawls parameters for sandy loam in its Local/Common run, and the parameter values only change slightly.

Figure 11.

Top 40 cm soil moisture simulated in Local and Local/Common runs. The black line is observations. Solid color lines are from the Local run and dashed lines are from the Local/Common run.

[39] From the above examples, we realize that it is the soil hydraulic parameters that really matter. Changes in these parameters can change a model's simulation, especially soil moisture, and as a result, runoff and evaporation as well. They can improve a model's performance when more correct values are used and can degrade a model's performance when incompatible values are used. More importantly, different models have their own calibrated parameters, and changing a subset of the parameters without considering others will not guarantee improvements in the models' performance even though the improved parameters values might be more physically sound. In our case, changing Noah's soil parameters does make its soil moisture simulation better compared with observations, but VIC has been calibrated quite well with its original parameters, and changing to a new parameter data set degrades its performance.

5.2. Model Structure

[40] Model structure and the parameterization used for such a structure sometimes affect a model's performance. We have seen that VIC has quite a problem with the ground heat flux in NLDAS simulations (Figures 6 and 12). A careful investigation of the problem leads to significant improvement of ground heat flux. In the control run, VIC's energy balance is calculated at a thin surface layer; that is, the net radiation is balanced by latent, sensible, ground heat flux and the change of the heat storage of this thin layer. A more correct energy balance should be at a virtual plane at the surface with zero thickness and zero heat storage. By only this change in the model, VIC's ground heat flux is significantly improved (Figure 12). This might not be applicable to other models, but similar factors in model structure should be seriously and carefully considered.

Figure 12.

Difference of monthly mean diurnal cycle of ground heat flux between models and observations. (a) Mosaic, (b) Mosaic with corrected soil heat capacity, (c) Noah, (d) VIC, and (e) VIC_DeltaH. The plots are models minus observations. Ground heat flux is defined as upward positive, so a positive difference means a stronger upward flux or a weaker downward flux in the model compared with the observations.

5.3. Soil Heat Capacity

[41] Along with model structure, the choice of model parameter values can profoundly impact land surface simulations. As noted above, the Mosaic LSM features overly large ground heat flux values over the ARM/CART region. An investigation of this problem revealed that the value of the Mosaic soil heat capacity parameter used in NLDAS simulations (175,000 J m−2 K−1) was one optimized for use with a temperature data assimilation system [Radakovich et al., 2001] and not the traditional Mosaic soil heat capacity value (70,000 J m−2 K−1) as specified by Koster and Suarez [1996] and as used in several PILPS experiments. In the data assimilation development, the higher value was chosen for optimum Mosaic simulation of the seasonal soil temperature cycle, and so that the assimilated soil temperature had the proper persistence. This data assimilation technique was not used in NLDAS simulations, and the use of the associated soil heat capacity value greatly degraded the simulation of ground heat flux values. To address this issue, a second limited area run was conducted over the ARM/CART region using the traditional Mosaic soil heat capacity value (70,000 J m−2 K−1). Figure 12 shows that the use of this value greatly improves the simulation of the diurnal ground heat flux cycle. Maximum errors are reduced by a factor of 2. Some improvement is also noted in the simulation of sensible heat flux, while the accuracy of the latent heat flux cycle remains relatively unchanged. Future NLDAS Mosaic simulations will make use of this lower soil heat capacity value, and the quality of Mosaic simulations should improve accordingly. The data assimilation system will also be fixed so as to give the proper weight to assimilated observations while preserving the correct model physics.

5.4. Aerodynamic Conductance

[42] Turbulent surface flux parameterizations in most land surface models involve a parameter called aerodynamic conductance that affects the partition of the radiative energy into surface latent and sensible heat fluxes. Physically, we expect that a larger aerodynamic conductance will produce larger surface turbulent fluxes, which will result in a lower radiative skin temperature. Figure 13a shows the monthly mean diurnal cycle of aerodynamic conductance (Ch) for July 1998 from three models. There are no observations of this quantity available to validate these model values, but intercomparison among models helps to explain the differences in the models' simulations of skin temperature. VIC has the largest Ch throughout the entire period, and consistent with that, its radiative skin temperature is lower than that of the Noah model. Based on the July flux biases in Figures 5 and 6, we expected, and found, that Mosaic has the coolest skin temperature of the three land models because Mosaic has the lowest Bowen ratio (low bias in sensible heat and high bias in latent heat). However, we did not find the expected result in the relative magnitude of the high bias in midday summer skin temperature between VIC and Noah. From the July flux results alone in Figures 5 and 6, we expected VIC to have the largest high bias (larger than that of Noah) in midday summer skin temperature, since its low bias in latent heat flux and its high bias in sensible heat flux were both substantially worse than those of Noah. Yet the VIC high bias in July midday skin temperature was not worse than the Noah high bias, because of the offsetting effect of high VIC Ch and low Noah Ch. Sensible heat flux (H) is the product of Ch and the difference between the air temperature (Tair) and the skin temperature (Tskin). VIC can therefore have higher midday July values of sensible heat flux than Noah, yet a lower midday skin temperature, because the VIC Ch is substantially higher than the Noah Ch.
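
The offsetting effect can be illustrated with a bulk-transfer sketch in Python. This is an assumption-laden illustration, not the formulation of any NLDAS model: it writes H = ρ cp Ch (Tskin − Tair) with fluxes positive upward, treats Ch as a conductance in m s−1, and uses nominal values for air density and specific heat. All names and numbers are hypothetical.

```python
# Illustrative sketch only: constants and inputs are nominal assumptions.
RHO_AIR = 1.2    # kg m^-3, assumed near-surface air density
CP_AIR = 1004.0  # J kg^-1 K^-1, specific heat of air at constant pressure


def sensible_heat_flux(ch, t_skin, t_air):
    """Bulk sensible heat flux (W m^-2, positive upward) for an aerodynamic
    conductance ch (m s^-1) and skin and air temperatures in kelvin."""
    return RHO_AIR * CP_AIR * ch * (t_skin - t_air)


t_air = 302.0
# A low conductance with a large skin-air difference ...
h_low_ch = sensible_heat_flux(ch=0.01, t_skin=312.0, t_air=t_air)
# ... and twice the conductance with half the difference.
h_high_ch = sensible_heat_flux(ch=0.02, t_skin=307.0, t_air=t_air)

print(h_low_ch, h_high_ch)
```

The two cases give the same sensible heat flux although the skin temperatures differ by 5 K, which is the sense in which a flux bias alone does not determine the skin temperature bias.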

Figure 13.

Aerodynamic conductance comparison and improvement of Noah's skin temperature due to changes in aerodynamic conductance. (a) Monthly mean diurnal cycle of aerodynamic conductance from three models in July 1998. Mosaic, Noah, and VIC are plotted in red, blue, and green, respectively. (b) Monthly mean diurnal cycle of aerodynamic conductance in several Noah simulations with different values of the Zilitinkevich parameter (CZIL) for July 1998. Blue is the control case where CZIL = 0.2. The black solid line is the run where CZIL = 0.05 and the brown line with open circles is the run where CZIL = 0.1. (c) Skin temperature difference between the Noah run with CZIL = 0.1 and observations from ARM/CART. (d) Same as in Figure 13c, but for the Noah run with CZIL = 0.05.

[43] Thus the results show that researchers cannot rely only on Bowen ratio errors as the predictor of the relative magnitude of skin temperature bias across land models that receive the same surface forcing, because the sensible heat flux is a product of the temperature difference and the aerodynamic conductance. Furthermore, as shown by Mitchell et al. [2003], sensitivity experiments that increased Ch in Noah showed very little change in H (because a substantial increase in Ch is offset by a decrease in Tskin) and very little change in latent heat flux. The latter occurs because, in Noah (and many land models) over vegetation such as that of the SGP, the canopy resistance is much larger than, and dominates, the aerodynamic resistance (the inverse of Ch) in determining the latent heat flux. Thus the increase of Ch in the Noah experiment had very little effect on the sensible and latent heat fluxes, but a desirable effect on Tskin (cooler). The interaction of the Ch and (Tair − Tskin) factors in H can yield results where Ch and Tskin both change quite a bit (with a favorable change in Tskin) but offset each other, and hence H may change little.

[44] In future follow-on NLDAS studies, Ch could be derived from the ARM measurements of Tair, Tskin, and H. One conclusion might be that land modelers (unlike planetary boundary layer modelers) might be paying too much attention to Bowen ratio issues (e.g., canopy conductance, evaporation, soil moisture) and not enough attention to aerodynamic conductance issues. For land assimilation of satellite skin temperature to be effective, the land model skin temperature bias should ideally be a manifestation of Bowen ratio error emerging from evaporation error that emerges from soil moisture error. This ideal is thwarted if poorly treated aerodynamic conductance in the land model has either a large high bias in Ch (that masks or mitigates a skin temperature error from a large Bowen ratio error, as is likely in VIC in NLDAS) or a large low bias in Ch (that exaggerates a skin temperature error from a modest Bowen ratio error, as is likely in Noah in NLDAS).
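
One way such a derivation of Ch from measured Tair, Tskin, and H might look is sketched below in Python. This is not a procedure from this study: it assumes the bulk form H = ρ cp Ch (Tskin − Tair) with nominal air density and specific heat, and the threshold used to reject near-zero temperature differences is an arbitrary, hypothetical choice.

```python
# Illustrative sketch only: constants, names, and threshold are assumptions.
RHO_AIR = 1.2    # kg m^-3, assumed near-surface air density
CP_AIR = 1004.0  # J kg^-1 K^-1, specific heat of air at constant pressure


def aerodynamic_conductance(h, t_skin, t_air, min_delta=0.5):
    """Invert the bulk formula for Ch (m s^-1) from a sensible heat flux h
    (W m^-2, positive upward) and skin and air temperatures in kelvin.
    Returns None when |t_skin - t_air| is below min_delta, because the
    ratio is then dominated by measurement error."""
    delta = t_skin - t_air
    if abs(delta) < min_delta:
        return None
    return h / (RHO_AIR * CP_AIR * delta)


print(aerodynamic_conductance(h=120.48, t_skin=312.0, t_air=302.0))
print(aerodynamic_conductance(h=5.0, t_skin=302.1, t_air=302.0))
```

The second call returns None, reflecting that the inversion is only well conditioned when the skin and air temperatures differ appreciably, typically around midday.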

[45] As shown in Figure 7, Noah overestimates skin temperature by about 4°C at summer noontime. Several additional experiments (Table 2) were carried out with the Noah model to investigate the reason for its bias. These experiments are identical to the control case except that the tunable Zilitinkevich parameter is changed to different values. This parameter controls the ratio of the roughness length for heat to the roughness length for momentum and effectively allows tuning of the aerodynamic resistance of the atmospheric surface layer. Decreasing the Zilitinkevich parameter increases Ch [Chen et al., 1997]. As shown in Figure 13b, by tuning this parameter, we almost double Ch during July 1998. As a result of this change, the bias in skin temperature decreases (Figures 7c, 13c, and 13d). The 4°C bias at summer midday has been reduced, while the latent heat flux does not change very much and the sensible heat flux increases slightly. The impact of the changes in this parameter on other parts of the simulation is generally quite small.
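
The direction of this sensitivity can be reproduced with a simplified Python sketch. It assumes the roughness-length relation z0h = z0m exp(−k C sqrt(Re*)), with Re* = u* z0m / ν, as we understand it from Chen et al. [1997], and it uses a neutral-stability logarithmic profile in place of the stability-dependent surface layer scheme in Noah. The wind speed, reference height, and roughness length are hypothetical, so the magnitudes are not expected to match Figure 13b.

```python
import math

# Illustrative sketch only: neutral stability and all input values are assumptions.
VON_KARMAN = 0.4
NU_AIR = 1.5e-5  # m^2 s^-1, kinematic viscosity of air


def neutral_conductance(czil, wind_speed, z_ref=10.0, z0m=0.1):
    """Neutral aerodynamic conductance for heat (m s^-1) for a given value
    of the Zilitinkevich parameter czil."""
    log_m = math.log(z_ref / z0m)
    u_star = VON_KARMAN * wind_speed / log_m      # friction velocity
    re_star = u_star * z0m / NU_AIR               # roughness Reynolds number
    z0h = z0m * math.exp(-VON_KARMAN * czil * math.sqrt(re_star))
    log_h = math.log(z_ref / z0h)
    return VON_KARMAN ** 2 * wind_speed / (log_m * log_h)


for czil in (0.2, 0.1, 0.05):
    print(czil, round(neutral_conductance(czil, wind_speed=5.0), 4))
```

A smaller parameter value gives a larger roughness length for heat and therefore a larger conductance, in the same direction as the Noah experiments described above.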

[46] The above factors help us to understand and explain some of the differences between models and observations and among models, and also help us to improve the models' performance to some extent. However, there still are many puzzles. Why are certain model variables well simulated compared with observations and other variables not? For example, VIC has fairly good simulations of all the state variables, such as soil moisture, soil temperature, and skin temperature, but its flux terms are very different from observations, with significantly low latent heat flux and high sensible heat flux. The Noah model can simulate the fluxes quite well, but its skin temperature is biased.

6. Conclusions

[47] We compared the net radiation, surface turbulent heat fluxes, soil moisture, and soil temperature from the NLDAS models with the observations taken at the Oklahoma Mesonet and ARM/CART sites in the southern Great Plains region. Spatially averaged quantities are compared to reduce small-scale noise. NLDAS models can capture broad features of soil moisture variations in the top 40 cm, especially the anomalies from the mean for the entire period, but inter-model variability is still very large. Mosaic tends to have a larger temporal variability and is the driest model during summer. The partition of net radiation at the surface differs significantly among the three models. VIC systematically underestimates latent heat flux and overestimates sensible heat flux. In contrast, Mosaic underestimates the sensible heat flux and overestimates latent heat flux. Noah is somewhere in between and is closest to the observations. These systematic biases change with seasons and also vary from one year to another. These differences are much larger than the spatial variability observed in this region at different ARM/CART stations.

[48] The difference between the models and the observations can be partially explained by the difference between the soil texture used by the models and that observed at the stations. The experiment in which the models use the same soil type as observed at local stations illustrates that models can better reproduce the observations when run with the correct forcing and correct land surface specifications (including soil texture and possibly vegetation). An important message from this experiment is that changing a subset of model parameters to more “correct” or “realistic” values will not guarantee an improvement in model performance. Therefore model parameters have to be calibrated as a whole. The difference in energy partition among models is mostly determined by the model physics, and changes in surface specification or choices of soil parameters are not significant enough to be responsible for the difference in energy budgets. We also found that some factors such as aerodynamic conductance and model structure affect particular aspects of model performance. Aerodynamic conductance affects the efficiency of surface turbulent fluxes to the atmosphere, and hence affects the skin temperature simulation. Experiments with the Noah model demonstrated that a larger aerodynamic conductance can reduce the high bias in its midday skin temperature.

[49] Since we only have observations over a relatively small area compared with the entire NLDAS domain, model performance over other regions with different climates might be different. Some of the comparisons including snow cover and streamflow over other regions are presented by Sheffield et al. [2003] and D. Lohmann et al. (Streamflow and water balance intercomparison of four land-surface models in the North American Land Data Assimilation System (NLDAS), submitted to Journal of Geophysical Research, 2003).

[50] Evaluating these models in offline mode is just the first step of the NLDAS project. Whether the information provided by the models is useful, and how useful, needs to be demonstrated in the future when they are used for data assimilation of precipitation, radiation, and soil moisture observations, and these improved land surface conditions are used for weather forecasting. If we can accomplish improvement in weather forecasts, especially summer precipitation forecasts, then we will have shown that the land surface schemes are indeed working well and that they can also be used for climate simulations.


[51] The work by Rutgers University was supported by NOAA OGP GAPP grant GC99-443b (A. Robock, PI), the Cook College Center for Environmental Prediction, and the New Jersey Agricultural Experiment Station. The work by NCEP/EMC, NWS/OHD, and NESDIS/ORA was supported by the NOAA OGP grant for the NOAA Core Project for GCIP/GAPP (co-PIs K. Mitchell, J. Schaake, J. Tarpley). The work by NASA/GSFC/HSB was supported by NASA's Terrestrial Hydrology Program (P. Houser, PI). The work by Princeton was supported by NOAA OGP GAPP grant NA86GPO258 (E. Wood, PI). The work by NCEP/CPC was supported by NOAA/NASA GAPP Project 8R1DA114 (R. Higgins, PI). The work by University of Maryland was supported by grants NA56GPO233, NA86GPO202, and NA06GPO404 from NOAA OGP and by NOAA grant NA57WC0340 to University of Maryland's Cooperative Institute for Climate Studies (R. Pinker, PI). Figures were drawn with GrADS, created by Brian Doty. We thank DOE for the ARM/CART meteorological and heat flux data that were provided to the project at no cost, and the NOAA Office of Global Programs and NASA Land Surface Hydrology Program for their purchase of the Oklahoma Mesonet meteorological and soil moisture and temperature data for their funded investigators.