The North American Regional Reanalysis (NARR) and Community Land Model (CLM, version 3.5) outputs are analyzed to characterize the surface water and energy budgets in the Mississippi River Basin (MRB). NARR and CLM performance is evaluated with reference to energy flux observations from 16 AmeriFlux sites in MRB. The issue of point-scale observations versus climate model grid cell outputs is addressed by analyzing the spatial variability in long-term monthly precipitation and temperature observations from 71 United States Historical Climatology Network stations in Indiana and Illinois. The model outputs are also evaluated for their ability to capture spatial and temporal variability in total runoff. Compared to average values at 11 AmeriFlux sites in MRB, NARR shows larger biases in incoming solar radiation (24%), sensible heat flux (27%), and latent heat flux (59%), whereas CLM shows smaller biases in incoming solar radiation (0.5%), sensible heat flux (−2%), and latent heat flux (11%). The seasonal cycle of observed sensible heat flux in the crop region shows two peaks (a bimodal pattern), which NARR captures but CLM does not. Based on a 25 year (1980–2004) monthly climatology in MRB, NARR has an 11% energy balance closing error (latent + sensible + ground heat flux = 1.11 net radiation) and a 12% water balance closing error (evapotranspiration + runoff = 1.12 precipitation), whereas CLM has no water or energy balance closing errors, primarily due to model design. In comparison to the observed mean annual runoff of 237 mm/yr based on 1988–1999 data in MRB, NARR and CLM mean annual runoff values are 89 mm/yr and 281 mm/yr, respectively. Overall, CLM provides a relatively better characterization of surface water and energy fluxes in the MRB than NARR.
 Studies involving water and energy balances together provide two levels of constraints (closure of water and energy balance equations), and hence can lead to better conceptualization of the hydrologic system at basin scales. As a part of the Global Water and Energy Cycle Experiment (GEWEX), many studies were conducted with the goal of ‘closing’ the water and energy balances for continental-scale basins [Roads et al., 2002]. Roads et al., hereafter referred to as WEBS (Water and Energy Budget Synthesis), synthesized water and energy budgets for the Mississippi River Basin (MRB) from the best available models and observations for the period 1996–1999. The WEBS study found that while model outputs qualitatively correspond with the available observations, large quantitative uncertainty exists among different model outputs. The WEBS study cited tower flux observations (only two in total) as the rarest type of observation. Since 2000, major developments have occurred with respect to improvements in regional reanalysis data (e.g., NARR) [Mesinger et al., 2006], land surface modeling (e.g., CLM 3.5) [Oleson et al., 2008], greater availability of energy flux observations (e.g., AmeriFlux data) [Law et al., 2009], and availability of new land cover change data sets [Fry et al., 2009]. Hence, it is worthwhile to revisit the WEBS study, or a portion of it, and provide updated information about water and energy budgets in the MRB. As a part of a broader objective of assessing the impacts of climate and land cover changes on water availability, this study presents an assessment of the reanalysis data and climate model outputs for quantifying water and energy budgets in MRB.
 Energy and water fluxes within a hydrologic boundary/basin are linked through evapotranspiration (ET), which is a major component of the hydrologic cycle [Postel et al., 1996]. Changes in ET brought by major land cover change can significantly impact regional fresh water availability, as well as regional ecosystems [Gordon et al., 2003; Zhang and Schilling, 2006]. Despite the importance of ET, relatively few reliable estimates of ET are available compared to runoff. Limited availability of observed ET is a major constraint for studying ET variability, and for using ET for model evaluation or validation in hydroclimatic studies. In the last decade, coordinated efforts have been made to measure carbon and water fluxes to assess changes in the terrestrial ecosystem (FLUXNET) [Running et al., 1999; Baldocchi et al., 2001]. FLUXNET provides a global network of over 500 flux measurement sites spread across diverse biomes and climatic regions (http://daac.ornl.gov/FLUXNET/). FLUXNET coordinates among regional networks to ensure consistency and intercomparability of the flux measurements, provides infrastructure support for data archival and distribution, and supports discussion and synthesis of scientific results, with the overarching goal of providing a validation data set for net primary productivity, evaporation, and energy absorption at global scale. Hence, FLUXNET data provide an opportunity to improve our understanding of land surface and atmospheric interaction. AmeriFlux is the regional network of FLUXNET sites in the Americas, and it provides a relatively dense network of observation sites in the United States (Figure 1). Thus, AmeriFlux data can be used to study spatial and temporal patterns of ET, and to evaluate the performance of land surface hydrology models in MRB.
The AmeriFlux data are available for a relatively short period (average data length: 6 years in this study) and only a few randomly distributed stations are available (16 in total in MRB); hence, alternative sources of information need to be explored for large-scale hydroclimatic studies.
 Reanalysis data provide spatially and temporally continuous outputs for different surface and atmospheric variables by assimilating available observations from various sources (e.g., satellite data, meteorological observations from surface stations, and data from rawinsondes and dropsondes) with the help of atmospheric and land surface models. The North American Regional Reanalysis (NARR) data is a much improved version of reanalysis outputs compared to Global Reanalysis 1 and 2 (REAN1 [Kalnay et al., 1996] and REAN2 [Kanamitsu et al., 2002]) for hydroclimatic studies in the region [Mesinger et al., 2006]. NARR outputs have been used to: (1) evaluate the performance of global and regional climate model outputs [Kumar et al., 2010; Diffenbaugh, 2009], (2) study the pattern of major hydroclimatic variability (e.g., precipitation recycling) [Dominguez and Kumar, 2008; Dominguez et al., 2008], and (3) assess the impacts of land use land cover change [Fall et al., 2010]. However, reanalysis outputs can have biases and uncertainties, and the quality of outputs can vary depending upon the variable of interest [Maurer et al., 2001]. This study evaluates the performance of NARR outputs for surface water and energy fluxes in MRB using independent observations and/or other model outputs.
 Understanding the evapotranspiration and precipitation feedback mechanism between climate change and land use land cover change is a critical component for the assessment of present and future water availability. Currently available coupled land surface and atmospheric modeling systems (e.g., the Community Climate System Model) [Collins et al., 2006] provide an important tool to incorporate the feedback between land cover and climate. The coupled model outputs for surface water and energy fluxes can have biases, a portion of which can be attributed to biases in the atmospheric forcing [Lawrence et al., 2007]. Hence, the performance of a coupled modeling system for surface water and energy fluxes should be evaluated using an offline simulation of the land surface component model. Recently, the model parameterization of the land component of the Community Climate System Model (CCSM) has been significantly improved with respect to ET partitioning, the runoff scheme, the groundwater model, and the frozen soil scheme (CLM3.5) [Oleson et al., 2008; Stöckli et al., 2008]. This study evaluates the performance of CLM3.5 (hereafter referred to as CLM) for surface water and energy fluxes in MRB.
 Updated sources of water and energy flux observations, as well as reanalysis and climate model outputs, are presented in the above discussion. These sources also have limitations such as: (1) point-scale measurements from AmeriFlux sites; (2) limited assimilation of surface observations in NARR (e.g., precipitation is assimilated in NARR, but ET and runoff are not); (3) coarse resolution of CLM; and (4) differences in surface energy flux formulation and parameterization between NARR and CLM. This study lays the foundation for accomplishing the broader objective of assessing the impacts of climate and land cover changes on water availability in MRB by identifying different sources of uncertainty in the reanalysis and climate model outputs for water resources assessment. This study also provides an assessment of our progress in closing the water and energy balance in MRB in the 10 years since the WEBS study.
2. Study Area, Data, and Model Outputs
 The Mississippi River Basin (MRB) is the largest river basin in North America, with more available observed data than any other major basin in the world [Roads et al., 2003]. The MRB covers 41% of the conterminous United States, and has a total basin area of 3.2 million km2. Major climatic gradients (temperature and precipitation) based on PRISM (Parameter-elevation Regressions on Independent Slopes Model) [Daly et al., 1997, 1998] present-day climate normal (1971–2000) are shown in Figures 2a and 2b. Basin average annual temperature and precipitation are 10.4°C (range: −6.2°C to 22°C) and 810 mm (range: 144 mm/yr to 2901 mm/yr), respectively. Annual average temperature shows a north to south gradient, and annual precipitation shows a northwest to southeast gradient (Figures 2a and 2b).
 Major land cover classes in MRB based on the National Land Cover Data 2001 (NLCD, http://www.epa.gov/mrlc/nlcd-2001.html) are shown in Figure 2c. NLCD 2001 provides high-resolution (30 m) land cover data for the United States using satellite imagery (Landsat 5 and 7) and ancillary data (e.g., DEM, slope, aspect, population density) based on a decision tree classification algorithm, a supervised classification method [Homer et al., 2004; Breiman et al., 1984]. Agriculture is the dominant land cover in MRB (39.2%), followed by grassland and shrubs (31.0%), forest (20.7%), wetland (3.6%), urban land cover (3.2%), open water (1.9%) and barren (0.4%). Land cover change aggregated over local watershed scale (eight-digit Hydrologic Unit Code (HUC8); http://nwis.waterdata.usgs.gov/tutorial/huc_def.html) based on the NLCD 1992–2001 Land Cover Change Retrofit Product [Fry et al., 2009] is shown in Figure 2d. There are 851 HUC8 units in MRB, ranging in drainage area from 31 km2 to 17,287 km2 (average area = 3826 km2). Eighty HUC8 units (9.4%) have experienced greater than or equal to 5% land cover change, and many of those units are located in the southern part of the basin (Arkansas-White-Red, and Lower Mississippi Basin; Figures 2c and 2d). Overall, 2.5% of MRB has undergone land cover change between 1992 and 2001. The dominant land cover transitions are a decrease in forest area and an increase in grassland/shrub and urban area (Figure 3).
 Elevation in MRB ranges from sea level (0 m) at the mouth of the Mississippi River to 4282 m in the Rocky Mountains along the western boundary of the basin. Available sources of information on soil characteristics (physical and hydraulic properties) include the SSURGO (Soil Survey Geographic) and STATSGO (State Soil Geographic) soil maps from the United States Department of Agriculture (USDA; http://soildatamart.nrcs.usda.gov/), and the soil map provided by Miller and White. More details on topography and soil characteristics can be found in the WEBS study.
2.1. AmeriFlux Observations
 The AmeriFlux Network was established in 1996, and it provides measurements of carbon, water, and energy fluxes in major vegetation types across different ecologic and climatic conditions in the Americas. Each flux tower represents an average footprint of 1 km radius at the respective tower site [Baldocchi et al., 2001; Running et al., 1999]. A total of 16 AmeriFlux sites are available in MRB (Figure 1 and Table 1) with an average record length of 5.8 years (range: 2–12 years) between 1995 and 2007 [Law et al., 2009]. The number of available flux tower observations has increased in recent years, with most observations (86%) available since 1999. Monthly average Level 4 data sets of energy flux observations are included in this study (http://public.ornl.gov/ameriflux/level4data.html). In the Level 4 data set, missing values for half hourly flux observations are filled with observations under similar meteorological conditions, and nighttime fluxes are corrected for violations of the eddy covariance method assumptions using u* filtering (http://www.bgc-jena.mpg.de/bgc-mdi/html/eddyproc/index.html). All AmeriFlux sites used in this study are referred to using the following abbreviation: XX_YYY_ZZZ, where XX is the country name (U.S. for United States), YYY is the abbreviation of the site name, and ZZZ is the land cover type. For example, U.S._MMS_DBF refers to a “US” site named Morgan Monroe State (MMS) Forest with a deciduous broadleaf forest (DBF) land cover type. A list of AmeriFlux sites included in the analysis is provided in Table 1.
NARR and CLM elevation and land cover represent nearest grid point elevation and land cover in the respective data set. CLM land cover is grouped under three categories: percent forest (FR), percent shrub and grass (ShGr), and percent crop (CRO). AmeriFlux sites are referred to as XX_YYY_ZZZ, where XX is the country name (U.S. for United States), YYY is the abbreviation of the site name, and ZZZ is the land cover type (CRO, crop; CSH, closed shrubland; DBF, deciduous broadleaf forest; ENF, evergreen needleleaf forest; GRA, grassland). ARM, atmospheric radiation measurement; SGP, southern Great Plains; MF, mixed forest; BSH, broadleaf shrub with ground cover.
Site names listed in Table 1: ARM SGP burn site - Lamont; ARM SGP control site - Lamont; ARM SGP site - Lamont; Morgan Monroe State Forest; Missouri Ozark Site; Walker Branch Watershed; Mead - irrigated maize site; Mead - irrigated maize-soybean; Mead - rainfed maize-soybean; Niwot Ridge Forest (LTER NWT1).
 NARR provides a spatially continuous, high-resolution (3-hourly temporal resolution, and 32 km spatial resolution) regional reanalysis data set for the North American domain since 1979. The NARR data set is developed as a major improvement upon the earlier National Center for Environmental Prediction–National Center for Atmospheric Research (NCEP-NCAR) Global Reanalysis (REAN1) data set in terms of resolution and accuracy [Mesinger et al., 2006]. The atmospheric component of NARR uses NCEP regional Eta model [Berbery et al., 2003; Mesinger, 2000] with lateral boundary conditions from Global Reanalysis-2 (REAN2) [Kanamitsu et al., 2002] and the Eta data assimilation system [Rogers et al., 1996]. The land component of NARR uses the Noah land surface model [Ek et al., 2003; Chen and Dudhia, 2001]. Major observations assimilated in NARR include: (1) precipitation data from rain gauging stations; (2) radiance data from satellite observations; (3) near surface wind (10 m) and moisture (2 m) data from Global Reanalysis-2 outputs; (4) sea and lake surface temperature; and (5) sea and lake ice cover data. Successful assimilation of high-quality detailed precipitation observations in NARR provides improved data set for studying land surface hydrology (e.g., soil moisture), and land atmospheric interaction [Mesinger et al., 2006].
2.3. CLM Offline Simulation
 CLM (version 3.5) is the recently released land component of the Community Climate System Model [Collins et al., 2006; Oleson et al., 2008]. Major hydrological processes in CLM include canopy interception, transpiration, throughfall, evaporation, infiltration, surface and subsurface runoff, and water table dynamics. A grid cell is first divided into four major land units (vegetative cover, lake, wetlands and glacier), and the vegetative fraction of the grid cell can have a maximum of four Plant Functional Types (PFTs) out of a total of 16 PFTs [Oleson et al., 2004]. Surface data in CLM (e.g., PFTs, leaf and stem area) are based on multiyear MODIS land surface data at 0.5° resolution [Lawrence and Chase, 2007]. Several improvements in land surface parameterization have been incorporated in CLM to alleviate water and energy biases observed in its predecessor CLM3 [Oleson et al., 2008; Stöckli et al., 2008]. These include an improved canopy evaporation scheme, a simple surface and subsurface runoff scheme based on the distributed hydrologic model TOPMODEL [Niu et al., 2005; Niu and Yang, 2006], a simple groundwater model [Niu et al., 2007], and a new frozen soil scheme [Oleson et al., 2008]. Offline results of CLM provided by Oleson et al. are used in this study. The offline simulation of CLM uses atmospheric forcing data from Qian et al. for 1948–2004, and has a long spin-up period (624 years), by cycling the same atmospheric forcing (1948–2004) 12 times, to stabilize the deep soil water in the model. A detailed description of the CLM model, simulation, and results is provided by Oleson et al. The atmospheric forcing was constructed by adjusting REAN1 outputs using observed monthly precipitation and temperature, satellite radiation data, and cloud cover data [Qian et al., 2006].
Because of model design (water and energy balance closure for each grid cell and each time step), and gridded observational input data set, CLM is expected to show better results for surface water and energy fluxes in MRB.
2.4. Other Data Sets and Models
 The United States Historical Climatology Network (USHCN) data from 71 stations in Indiana and Illinois are used to study the characteristics of spatial scale of climate forcing in the region. Monthly time series of precipitation and temperature for 113 years (1896 to 2008) are included in this study. USHCN data incorporates adjustment for observation biases [Karl et al., 1986; Vose et al., 2003], and artificial changes in the time series arising due to station relocation and equipment change [Menne et al., 2009]. Because gridded or denser network of tower flux observations are not available, long-term observed records of precipitation and temperature are used to supplement the analysis of point-scale measurements versus climate model grid cells.
 Three runoff data sets are also used in this study: (1) the naturalized runoff estimate for MRB from E. P. Maurer and D. P. Lettenmaier (Calculation of undepleted runoff for the GCIP region, 1988–2000, 2001, http://www.ce.washington.edu/∼edm/WEBS_runoff) (hereinafter Maurer and Lettenmaier, online report, 2001), (2) the University of New Hampshire/Global Runoff Data Centre runoff data (UNH-GRDC) [Fekete et al., 2002], and (3) runoff data from the Variable Infiltration Capacity (VIC) model [Maurer et al., 2002]. The naturalized runoff data from Maurer and Lettenmaier (online report, 2001) were created by adding the consumptive water use to the observed runoff, using observed consumptive water use estimates and VIC model output. Consumptive water use accounts for 6% of the naturalized runoff during the 1988 to 1999 period (range: 4 to 7%). The UNH-GRDC runoff data were created by combining the Water Balance Model outputs with the observed mean annual runoff data [Fekete et al., 2002].
 PRISM monthly precipitation and temperature data (1980 to 2004) are also used in this study as climate observations. PRISM is a high-quality spatial data set at 4 km resolution created by using point observations of precipitation and temperature with Parameter-elevation Regressions on Independent Slopes Model (PRISM) [Daly et al., 1997, 1998].
 The methodology involves: (1) regional classification of the study area (MRB) based on the climatic condition; (2) regridding of NARR (NARR_Regrid) to the climate model grid size; (3) long-term (25 years) climatology comparison between NARR, NARR_Regrid, and CLM; (4) evaluation of NARR, NARR_Regrid, and CLM with respect to AmeriFlux observations using equal sample size principle; (5) evaluation of the spatial and temporal variability in total runoff in MRB; and (6) analysis of water and energy balance closure in MRB. Each step in the methodology is briefly described below.
3.1. Regional Classification of MRB
 The four major climatic regions in MRB (Figure 1) are: (1) Cfa: warm temperate climate, fully humid, and hot summer; (2) Dfa: snow climate, fully humid, and hot summer; (3) Dfb: snow climate, fully humid, and warm summer; and (4) BSk: arid, cold steppe climate. The regional classification shown in Figure 1 is based on the digital Köppen-Geiger climate classification map provided by Kottek et al. For climate classification, Kottek et al. used 0.5° resolution monthly temperature and precipitation data from the Climate Research Unit (www.cru.uea.ac.uk) and the Global Precipitation Climatology Center (http://gpcc.dwd.de), respectively, for 1951 to 2000. Major climatic characteristics of the four regions are listed in Table 2. Southeast MRB has a temperate climate, northeast MRB has a snow climate, and western MRB has an arid climate.
From Kottek et al. Tmin, minimum monthly mean temperature; Tmax, maximum monthly mean temperature; Tann, annual mean temperature; Tmon, monthly mean temperature. Dry summer and dry winter are defined as a function of monthly total precipitation during the summer (May–October) and winter (November–April) months. For dry summer, Psmin < Pwmin, Pwmax > 3 Psmin and Psmin < 40 mm; for dry winter, Pwmin < Psmin and Psmax > 10 Pwmin, where Psmin, Psmax, Pwmin and Pwmax are the minimum and maximum monthly total precipitation values during the summer and winter months, respectively. Pth (mm) is the dryness threshold for the arid region, which is a function of annual average temperature: Pth = 2 Tann if two thirds of annual precipitation occurs in winter; Pth = 2 Tann + 28 if two thirds of annual precipitation occurs in summer; Pth = 2 Tann + 14 otherwise. Pann is annual total precipitation (mm/yr).
Cfa: −3°C < Tmin < 18°C; Tmax > 22°C; neither dry summer nor dry winter
Dfa: Tmin < −3°C; Tmax > 22°C; neither dry summer nor dry winter
Dfb: Tmin < −3°C; Tmax < 22°C and 4 Tmon ≥ 10°C; neither dry summer nor dry winter
BSk: Tann < 18°C; 5 Pth < Pann < 10 Pth
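The Table 2 criteria can be read as a small decision procedure. The following is a minimal sketch under stated assumptions, not the classification code of Kottek et al.: the function names, the 0-based month indexing, the "at least two thirds" reading of the Pth rule, the "at least four months" reading of "4 Tmon ≥ 10°C", and the order in which the tests are applied (arid test first) are all illustrative choices. It handles only the four classes present in MRB and returns "other" for everything else.

```python
SUMMER = [4, 5, 6, 7, 8, 9]      # May-October (0-based month indices)
WINTER = [10, 11, 0, 1, 2, 3]    # November-April


def dryness_threshold(t_ann, p_summer, p_winter):
    """Pth (mm) from the Table 2 footnote; 'two thirds' is read as 'at least'."""
    p_ann = p_summer + p_winter
    if p_winter >= 2.0 / 3.0 * p_ann:
        return 2.0 * t_ann
    if p_summer >= 2.0 / 3.0 * p_ann:
        return 2.0 * t_ann + 28.0
    return 2.0 * t_ann + 14.0


def classify(tmon, pmon):
    """Classify 12 monthly mean temperatures (deg C) and precipitation totals (mm)."""
    t_ann = sum(tmon) / 12.0
    t_min, t_max = min(tmon), max(tmon)
    ps = [pmon[i] for i in SUMMER]
    pw = [pmon[i] for i in WINTER]
    p_ann = sum(pmon)
    p_th = dryness_threshold(t_ann, sum(ps), sum(pw))

    # Arid test first (assumed ordering): BSk needs 5 Pth < Pann < 10 Pth.
    if p_ann < 10.0 * p_th:
        return "BSk" if (p_ann > 5.0 * p_th and t_ann < 18.0) else "other"

    # 'f' (fully humid) means neither dry summer nor dry winter.
    dry_summer = min(ps) < min(pw) and max(pw) > 3.0 * min(ps) and min(ps) < 40.0
    dry_winter = min(pw) < min(ps) and max(ps) > 10.0 * min(pw)
    if dry_summer or dry_winter:
        return "other"

    if -3.0 < t_min < 18.0:
        main = "C"
    elif t_min < -3.0:
        main = "D"
    else:
        return "other"

    warm_months = sum(1 for t in tmon if t >= 10.0)
    if t_max > 22.0:
        label = main + "fa"
    elif warm_months >= 4:
        label = main + "fb"
    else:
        return "other"
    return label if label in ("Cfa", "Dfa", "Dfb") else "other"
```

For a grid cell with twelve monthly temperature and precipitation values, `classify(tmon, pmon)` returns one of "Cfa", "Dfa", "Dfb", "BSk", or "other".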
3.2. Regridding of NARR
 One of the objectives of this study is to evaluate the prediction of ET by the land component (CLM) of a global climate model at regional scale. Because NARR's spatial resolution (32 km) is notably higher than the CLM resolution (T42, ∼280 km), NARR outputs are regridded to CLM resolution in a two-step process using the National Center for Atmospheric Research's (NCAR) Command Language (NCL; http://www.ncl.ucar.edu/). In the first step, NARR outputs are regridded to 0.5 degree (∼50 km) resolution using inverse distance weighting, and in the second step, the 0.5 degree output is regridded to T42 resolution using the area average method. This two-step procedure is followed because the native NARR grid uses a Lambert conformal conic projection. NARR outputs regridded to T42 resolution are referred to as NARR_Regrid in this study.
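The two-step idea (inverse distance weighting onto a regular grid, then area averaging to the coarse grid) can be illustrated with a small standalone sketch. This is not the NCL procedure used in the study: the function names, the weighting power of 2, the use of great-circle distance, and the cosine-of-latitude area weights are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def idw(points, lat, lon, power=2.0):
    """Step 1: inverse distance weighted value at (lat, lon).

    points is a list of (lat, lon, value) from the native grid; the power
    of 2 is an assumed default, not a value taken from the study.
    """
    num = den = 0.0
    for plat, plon, val in points:
        d = haversine_km(plat, plon, lat, lon)
        if d < 1e-9:          # target coincides with a source point
            return val
        w = d ** -power
        num += w * val
        den += w
    return num / den


def area_average(cells):
    """Step 2: area-weighted mean of fine cells inside one coarse cell.

    cells is a list of (lat, value); on a regular lat-lon grid the cell
    area is proportional to cos(latitude), which is used as the weight.
    """
    weights = [math.cos(math.radians(lat)) for lat, _ in cells]
    total = sum(w * v for w, (_, v) in zip(weights, cells))
    return total / sum(weights)
```

In practice step 1 would be applied at every 0.5 degree grid point, and step 2 to the group of 0.5 degree cells falling inside each T42 cell.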
3.3. Monthly Climatology Comparison Between NARR, NARR_Regrid, and CLM
 Monthly climatology (long-term monthly mean and interannual variability) of near surface hydroclimatic variables is prepared from NARR, NARR_Regrid, and CLM monthly outputs from 1980 to 2004 (25 years). The monthly climatology is presented with 95% confidence interval uncertainty range calculated from standard deviations of 25 years monthly outputs in each case. The uncertainty range represents the interannual variability during the analysis period. Because AmeriFlux data are not available for the 25 year period, these data are not included in the 25 years monthly climatology comparison.
3.4. CLM, NARR, and AmeriFlux Comparison
 Monthly averages of CLM, NARR, and AmeriFlux observations are compared at 16 AmeriFlux site locations in MRB. Because the spatial and temporal coverage of AmeriFlux observations is not consistent with that of CLM and NARR, equal sample size principle [Robock et al., 2003] is used for making this comparison. In equal sample size principle, point observations at a site (AmeriFlux site in this study) are compared with the model outputs (CLM and NARR) from the nearest grid cell for the available time period of observation.
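The equal sample size comparison described above can be sketched as two operations: pick the model grid cell nearest to the site, then keep only the months for which an observation exists. The code below is an illustrative sketch only; the function names, the dictionary layout keyed by (year, month), and the example coordinates and values in the usage note are hypothetical and do not come from the study.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2.0 * 6371.0 * math.asin(math.sqrt(a))


def nearest_cell(site_lat, site_lon, cell_centers):
    """Return the (lat, lon) grid cell center closest to the site."""
    return min(cell_centers,
               key=lambda c: haversine_km(site_lat, site_lon, c[0], c[1]))


def paired_months(obs, model):
    """Pair observed and modeled monthly values for months with observations.

    obs and model map (year, month) to a value; missing observations are
    stored as None and are skipped, so both samples have equal size.
    """
    months = sorted(m for m, v in obs.items() if v is not None and m in model)
    return [(obs[m], model[m]) for m in months]


def mean_bias(pairs):
    """Mean of (model - observation) over the paired months."""
    return sum(mod - ob for ob, mod in pairs) / len(pairs)
```

As a usage illustration with made-up numbers: for observations `{(2003, 1): 10.0, (2003, 2): None, (2003, 3): 30.0}` and model values covering all three months, only January and March enter the comparison.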
3.5. Closing Water and Energy Balance for MRB
 Water and energy balance components are linked through ET as given in equations (1)–(3) below.
$$\Delta s = P - ET - N \qquad (1)$$
$$R_n = S_{ht} + L_{ht} + G_{ht} \qquad (2)$$
$$L_{ht} = \lambda \, ET \qquad (3)$$
where Δs is the change in storage (in the active soil layers), P is precipitation, ET is evapotranspiration, N is total runoff, Rn is net radiation including short- and long-wave radiation, Sht is sensible heat flux, Lht is latent heat flux, Ght is ground heat flux, and λ is the latent heat of vaporization. Lht and ET are used interchangeably in this study. Summer is considered to be from May to October (6 months) and winter is considered to be from November to April (6 months), unless specified otherwise.
 Any significant bias in ET will reflect bias in the estimation of runoff from the basin, because P is the constrained term in equation (1) (precipitation is observed forcing data in CLM, and precipitation observation is assimilated in NARR) and Δs can be taken as zero for long-term annual water balance. Spatial distribution of runoff is validated with the UNH-GRDC runoff [Fekete et al., 2002] and VIC runoff [Maurer et al., 2002]. To exclude the effect of water withdrawals for irrigation or water supply for cities, total runoff at the watershed outlet is compared with the naturalized runoff estimate from Maurer and Lettenmaier (online report, 2001).
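The closure checks used in this study can be written as simple residuals: the energy balance residual Rn − Lht − Sht − Ght, and the ratio (ET + N)/P for the long-term water balance with Δs taken as zero. The sketch below is illustrative only; the function names are hypothetical, and the latent heat of vaporization of 2.45 × 10^6 J/kg is an assumed typical value rather than a number given in the text.

```python
LATENT_HEAT_J_PER_KG = 2.45e6   # assumed typical value of lambda
SECONDS_PER_DAY = 86400.0


def energy_closure_error(rn, lht, sht, ght):
    """Energy balance residual in W/m2; zero means the budget closes."""
    return rn - lht - sht - ght


def water_closure_ratio(p, et, n):
    """(ET + N) / P for a long-term mean, with storage change taken as zero."""
    return (et + n) / p


def lht_to_et_mm_per_day(lht):
    """Convert latent heat flux (W/m2) to ET (mm/day); 1 kg/m2 of water = 1 mm."""
    return lht * SECONDS_PER_DAY / LATENT_HEAT_J_PER_KG


# Relative size of the NARR energy residual reported in the paper:
# 9.6 W/m2 out of 84.7 W/m2 of net radiation.
narr_energy_error_fraction = 9.6 / 84.7
```

A ratio of 1.12 from `water_closure_ratio` corresponds to the 12% water balance closing error reported for NARR, and a ratio of 1.0 corresponds to exact closure as in CLM.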
3.6. Statistical Methods
 Comparisons between different data sets and models are made using the following statistical measures: mean, standard deviation (interannual variability), bias (model − observation), Pearson product-moment correlation coefficient (correlation coefficient), square of the correlation coefficient (R2), root mean square error (RMSE), semivariogram (plot of 0.5 × [squared difference] against separation distance [Kitanidis, 1997, pp. 32–40]), and statistical significance of the difference in monthly mean values. For statistical significance, a t test is used with a 95% confidence interval (α = 0.05). The 95% uncertainty range for the long-term monthly mean value (μ) is calculated using equation (4) [Miller and Miller, 2004, p. 358].
$$\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \qquad (4)$$
where x̄ and s are the mean and standard deviation of a random sample of size n (= 25) from a normal population, α is the significance level (= 0.05), and the value of t(α/2, n−1) is taken from the t distribution table. The assumption of normality for all variables (basin average time series for each month from 1980 to 2004) was checked using a statistical test (Shapiro-Wilk test) and a graphical method (quantile-quantile plot) in SAS and found to be valid.
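The statistical measures listed in this section are straightforward to compute. The following is a minimal standard-library sketch, not the SAS workflow used in the study; the function names are hypothetical, and the critical value 2.064 is the tabulated two-sided t value for α = 0.05 with 24 degrees of freedom (n = 25), hard-coded here in place of a table lookup.

```python
import math
import statistics

T_CRIT_DF24 = 2.064   # t(alpha/2 = 0.025, n - 1 = 24), from a t distribution table


def bias(model, obs):
    """Mean of (model - observation)."""
    return statistics.fmean([m - o for m, o in zip(model, obs)])


def rmse(model, obs):
    """Root mean square error between model and observation."""
    return math.sqrt(statistics.fmean([(m - o) ** 2 for m, o in zip(model, obs)]))


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_ci95(sample, t_crit=T_CRIT_DF24):
    """95% range for the long-term mean: mean -/+ t * s / sqrt(n).

    The default t_crit is valid only for n = 25; pass another tabulated
    value for other sample sizes.
    """
    n = len(sample)
    half = t_crit * statistics.stdev(sample) / math.sqrt(n)
    center = statistics.fmean(sample)
    return center - half, center + half


def semivariogram_cloud(stations):
    """(separation distance, 0.5 * squared difference) for each station pair.

    stations is a list of (x_km, y_km, value) in a planar coordinate system.
    """
    cloud = []
    for i in range(len(stations)):
        for j in range(i + 1, len(stations)):
            xi, yi, vi = stations[i]
            xj, yj, vj = stations[j]
            cloud.append((math.hypot(xi - xj, yi - yj), 0.5 * (vi - vj) ** 2))
    return cloud
```

For example, `mean_ci95` applied to the 25 values of one calendar month (one value per year, 1980 to 2004) gives the lower and upper bounds of the uncertainty range for that month's climatological mean.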
4.1. Monthly Climatology Comparison Between PRISM, NARR, NARR_Regrid, and CLM
 Basin average climatological mean and variability (1980–2004) for water and energy balance components in MRB are presented in Table 3. Mean annual net radiation obtained from CLM (69.7 W/m2) is lower (18%) compared to NARR (84.7 W/m2). Sensible and latent heat flux are also lower by 19% and 31%, respectively, in CLM compared to NARR. Latent heat flux represents 59% of the net radiation in CLM, and it represents 70% of the net radiation in NARR. Ground heat flux is a minor component of the energy balance equation (equation (2)). Higher interannual variability (2.2 W/m2) in ground heat flux in comparison to the overall mean (0.5 W/m2) is due to the opposing sign of ground heat flux during summer and winter months. CLM does not produce any closing error in the energy balance equation (Rn − Lht − Sht − Ght = 0.0), but NARR shows an error of −9.6 W/m2 (11% of net radiation) in the energy balance equation. Regridding of NARR (NARR_Regrid) has resulted in less than 1% reduction in the basin average monthly mean of net radiation, sensible and latent heat fluxes, in comparison to original NARR outputs. However, interannual variability (standard deviation) is reduced by 9% for net short wave, 6% for net long wave, 21% for sensible heat flux, and 20% for latent heat flux as a result of regridding NARR outputs (Table 3).
Table 3. Climatological Annual Mean (1980–2004) for the MRB From CLM and NARR Outputs^a
^a Numbers in parentheses represent average values of monthly standard deviation. For precipitation and runoff, monthly standard deviation is multiplied by 12 to be consistent with annual total value. PRISM climatological mean is also presented for reference.
Variables listed in Table 3: net short wave (+down); net long wave (+up); sensible heat flux; latent heat flux; ground heat flux; 2 m air temperature; total runoff (N).
 Twenty-five years (1980–2004) monthly climatology of near surface air temperature (Tair) and precipitation (P) from PRISM, NARR, CLM, and NARR_Regrid are shown in Figure 4. In comparison to PRISM data, basin average mean annual temperature is 0.9°C higher in NARR, 0.2°C higher in CLM and 0.7°C higher in NARR_Regrid, and basin average annual total precipitation is 7% lower in NARR, 2% lower in CLM, and 9% lower in NARR_Regrid (Table 3). Regridding of NARR (NARR_Regrid) has resulted in 18% reduction in the interannual variability of precipitation compared to the original NARR outputs. There is no change in the interannual variability of temperature between NARR and NARR_Regrid. Interannual variability in PRISM precipitation data (446 mm/yr) is closer to NARR (420 mm/yr), and interannual variability in CLM precipitation data (328 mm/yr) is closer to NARR_Regrid (355 mm/yr). Basin average monthly precipitation from NARR, NARR_Regrid, and CLM are not statistically different (90% confidence interval) compared to PRISM precipitation data for all 12 months (not shown). Basin average monthly temperature from NARR and NARR_Regrid are statistically different (90% confidence interval) compared to PRISM temperature data during summer months (May to October), but they are not statistically different during winter months (not shown). For CLM, monthly temperature is statistically different compared to PRISM temperature data for three summer months (May to July), and they are not statistically different for the remaining 9 months. Statistical difference in summer temperature can be due to higher magnitude of mean value and lower interannual variability. For example, summer temperature mean and standard deviations are 18.6°C and 1.5°C, respectively; whereas winter temperature mean and standard deviations are 2.4°C and 2.5°C, respectively, for MRB in PRISM data.
 The spatial variability of the absolute difference between CLM and NARR_Regrid annual average latent heat flux and sensible heat flux is presented in Figures 5a and 5b, respectively. For latent heat flux, the difference between CLM and NARR_Regrid shows an east-west divide. In the eastern part, CLM latent heat flux is lower than NARR_Regrid for all months (−38% annual average difference, Figure 5c). In the western part, opposite signs of the difference in the first (negative) and second (positive) halves of the year cancel each other (Figure 5d), making the annual average difference smaller (−18%). In the eastern MRB, CLM sensible heat flux is higher in summer and lower in winter compared to NARR_Regrid, making the annual average difference smaller (+16%; Figure 5e). In the western MRB, CLM sensible heat flux is lower in summer compared to NARR_Regrid, and the annual average difference is −38% (Figure 5f). As shown in Table 3, regridding NARR has resulted in less than 1% reduction in the mean annual values of sensible and latent heat fluxes. Therefore, the difference between NARR and CLM sensible and latent heat fluxes in the eastern and western parts of MRB should be similar to the difference between NARR_Regrid and CLM. Thus, NARR and CLM provide spatially (east versus west) and temporally (summer versus winter) different estimates of sensible and latent heat flux in MRB.
 NARR has a 12% water balance closing error (ET + N = 1.12 P); whereas CLM does not have water balance closing error (ET + N = P). The water balance closing error in NARR is not affected by regridding of NARR outputs (Table 3). Comparison of NARR, NARR_Regrid, and CLM runoff outputs with observed data is presented in section 4.4.
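A minimal sketch of how a closing error of this kind can be computed is given below. The function name and the sample values are illustrative assumptions; they are not taken from Table 3 and are chosen only so that the ratio equals 1.12.

```python
def water_balance_closure(et, runoff, precip):
    """Return (ET + N) / P; a value of 1.0 means the water balance closes."""
    if precip <= 0:
        raise ValueError("precipitation must be positive")
    return (et + runoff) / precip


# Hypothetical long-term means in mm/yr, used only to illustrate the ratio.
ratio = water_balance_closure(et=700.0, runoff=140.0, precip=750.0)
closing_error_percent = (ratio - 1.0) * 100.0
print(f"ET + N = {ratio:.2f} P, closing error = {closing_error_percent:.0f}%")
```

A ratio above 1 means that the model returns more water to the atmosphere and the rivers than it receives as precipitation over the averaging period.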
4.2. AmeriFlux, NARR, and CLM Comparison
 Point-scale observations at 16 AmeriFlux sites in MRB are compared with the nearest grid cell in NARR and CLM using the equal sample size principle (section 3.4). Monthly mean values of observed hydroclimatic variables at AmeriFlux sites are given in Table 4. For 12 AmeriFlux sites, results are presented until 2004 because CLM outputs are available from 1948 to 2004. For sites that have less than 2 years of data prior to 2004 (a total of four sites), comparison extends beyond 2004, and is made only with NARR outputs. Elevations and land cover types at AmeriFlux sites and corresponding nearest grid cells in NARR and CLM data are given in Table 1.
Table 4. Statistical Summary of the AmeriFlux Monthly Observations
Observation: Mean Value for Given Time Period
Table 4 notes: Rg, incoming solar radiation; P, precipitation; Tair, air temperature; Lht, latent heat flux; Sht, sensible heat flux. Precipitation for U.S._Goo_GRA and U.S._WBW_DBF is not shown because of missing observations. Year 2000 is missing from U.S._Bo1_CRO AmeriFlux data.
 The numbers of AmeriFlux sites in the Cfa, Dfa, Dfb, and BSk climate regions are 7, 5, 2, and 2, respectively. In the Cfa region, three grassland sites have a higher average latent heat flux/incoming solar radiation ratio (average: 0.30, range: 0.26 to 0.33) compared to three deciduous broadleaf forest sites (average: 0.27, range: 0.25 to 0.30) and one cropland site (0.23). In the Dfa region, the latent heat flux/incoming solar radiation ratio is higher at one grassland site (0.39) compared to four crop sites (average: 0.27, range: 0.26 to 0.28). In the Dfb region, one deciduous broadleaf forest site and one closed shrubland site have the same latent heat flux/incoming solar radiation ratio (0.19). In the BSk region, one evergreen needleleaf forest site has a higher latent heat flux/incoming solar radiation ratio (0.25) compared to one grassland site (0.14).
 NARR and CLM monthly outputs are compared with AmeriFlux observations in terms of R2, Bias (model – observation), and RMSE (root mean square error). Both Bias and RMSE are expressed as a percentage of the mean observation at the respective site (Table 5). AmeriFlux site U.S._NR1_ENF, located in the Rocky Mountain range (elevation 3050 m) [Monson et al., 2005], shows a large difference (2.6°C higher) in mean temperature compared to the nearest CLM grid cell. Hence, U.S._NR1_ENF results are not included in the 11-site average results presented in the next paragraph. Out of the remaining 15 sites, only 11 sites are included in the comparison with NARR because CLM outputs are not available at four AmeriFlux sites. These 11 sites are indicated in boldface in Table 5.
Table 5. Model Performance Evaluation, Bias, and RMSE Expressed As the Percent of Observed Mean Values Given in Table 4
Columns: NARR (R2/Bias/RMSE) (NA/%/%); CLM (R2/Bias/RMSE) (NA/%/%)
Boldface indicates sites where NARR and CLM performances are compared.
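The three evaluation measures used in Table 5 can be sketched as follows. This is an illustrative implementation only: it assumes that R2 is the squared Pearson correlation coefficient, which the text does not state explicitly, and the function name and sample values are hypothetical.

```python
import math


def evaluate(model, obs):
    """Return (R2, bias %, RMSE %) for paired monthly model and observed values."""
    n = len(obs)
    if n != len(model) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_obs = sum(obs) / n
    mean_mod = sum(model) / n
    # Bias is model minus observation, as a percentage of the observed mean.
    bias_pct = 100.0 * (mean_mod - mean_obs) / mean_obs
    rmse = math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / n)
    rmse_pct = 100.0 * rmse / mean_obs
    # Assumption: R2 is the squared Pearson correlation coefficient.
    cov = sum((m - mean_mod) * (o - mean_obs) for m, o in zip(model, obs))
    var_m = sum((m - mean_mod) ** 2 for m in model)
    var_o = sum((o - mean_obs) ** 2 for o in obs)
    r2 = cov * cov / (var_m * var_o)
    return r2, bias_pct, rmse_pct


# Hypothetical latent heat flux values (W/m2); the model is 1.5 times the observation.
obs = [10.0, 30.0, 60.0, 90.0, 50.0, 20.0]
model = [1.5 * v for v in obs]
r2, bias_pct, rmse_pct = evaluate(model, obs)
```

In this synthetic case R2 is 1 even though the bias is 50%, which illustrates why a model can reproduce the seasonal shape of a flux well while still overestimating its magnitude.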
 Comparison between NARR and CLM outputs shows that incoming solar radiation (Rg) and temperature (Tair) are the two variables most highly correlated with AmeriFlux observations. Average R2 for Rg/Tair at 11 AmeriFlux sites is 0.96/0.97 in NARR, and 0.94/0.97 in CLM. Compared to AmeriFlux data, incoming solar radiation is 24% higher in NARR (range: 19% to 32%), and 0.5% higher in CLM (range: −5% to +5%). Average near surface air temperature is 6% higher in NARR (range: −3% to 32%), and 3% lower in CLM (range: −34% to 11%). NARR outputs show higher correlation (average R2: 0.59, range: 0.27 to 0.88) with observed precipitation compared to CLM (average R2: 0.46, range: 0.23 to 0.72). Average monthly precipitation is 1% lower in NARR (range: −26% to 23%), and 1% higher in CLM (range: −9% to 27%); precipitation results include comparison at 9 sites only, because precipitation observations at the remaining 2 sites were unsatisfactory owing to missing values. CLM outputs for latent heat flux show higher correlation (average R2: 0.80, range: 0.51 to 0.93) with observations compared to NARR outputs (average R2: 0.71, range: 0.37 to 0.92). Average monthly latent heat flux is 11% higher in CLM (range: −9% to 32%), and 59% higher in NARR (range: 7% to 136%). NARR outputs for sensible heat flux also show slightly higher correlation with observations (average R2: 0.42, range: 0.02 to 0.89) compared to CLM outputs (average R2: 0.40, range: 0.05 to 0.67). Average monthly sensible heat flux is 27% higher in NARR (range: −14% to 86%), and 2% lower in CLM outputs (range: −34% to 58%).
 Monthly mean and standard deviation (interannual variability) of latent heat flux, sensible heat flux, and precipitation from AmeriFlux observations, NARR, and CLM outputs at the best available sites (longest comparison period, average 6 years, range: 5–7 years) in each region are shown in Figure 6. Seasonal variations (shape of monthly mean during the year) of latent heat flux and precipitation are captured by both NARR and CLM (see R2 in Table 5). However, NARR latent heat flux shows higher positive bias compared to CLM as discussed previously (see bias in Table 5). Lower correlation of observed precipitation with model outputs, compared to temperature and latent heat flux (Table 5), can be due to the multimodal (multiple peaks) nature of precipitation distribution during the year in Cfa, Dfa, and Dfb regions. The seasonal cycle of sensible heat flux shows a bimodal pattern (two peaks during the year) at many AmeriFlux sites, and the pattern is particularly pronounced at the cropland site (e.g., U.S._Bo1_CRO in Figure 6). CLM does not capture the bimodal pattern of sensible heat flux at any site; it shows only one peak in the seasonal cycle of sensible heat flux. NARR captures the bimodal pattern of seasonal variations in sensible heat flux, particularly at the U.S._Bo1_CRO site. The bimodal pattern issue is discussed in detail in sections 5.4 and 5.5. NARR output and CLM input data reproduce the observed seasonal cycle of near surface air temperature very well (not shown; see R2 in Table 5). As shown in Figure 6, regridding of NARR outputs did not change major characteristics of the results such as the bimodal pattern of sensible heat flux, higher positive bias in NARR latent heat flux compared to CLM, and multimodal nature of precipitation distribution at the four sites shown in Figure 6.
 Many studies have identified errors in energy balance closure at FLUXNET sites [Wilson et al., 2002; Foken, 2008; Stöckli et al., 2008]. These errors are on the order of 20%, with underestimation of latent heat flux and sensible heat flux or overestimation of available energy [Lht + Sht = 0.8 (Rn − Ght)] [Wilson et al., 2002]. Level 4 AmeriFlux data, which use u* (friction velocity) filtering, are expected to show better energy balance closure, because improvements in energy balance closure are found with increasing friction velocity [Wilson et al., 2002]. In the u* filtering method, measured fluxes below the threshold u* are discarded (mainly nighttime fluxes), and the gaps are filled with other observations made under similar meteorological conditions (gap filling). However, net radiation and surface albedo variables are not available in Level 4 AmeriFlux data, and hence a quantitative evaluation of improvements in energy balance closure cannot be made at this time. In section 5.6, NARR and CLM latent heat flux/ET outputs are evaluated using a theoretical approach (Budyko curve) instead of using AmeriFlux observations.
 The results presented in this section are based on comparison of a point-scale observation (25 × 25 m) with NARR grid cell (32 × 32 km) and CLM grid cell (280 × 280 km) outputs. This comparison raises an important question about the validity of comparing point-scale observations with coarse resolution gridded climate model output. This issue is addressed in section 4.3.
4.3. Spatial Variability in Point-Scale Hydroclimatic Observations
 The issue of comparing point-scale observations with climate model grid cell outputs is addressed by examining the spatial variability of monthly precipitation and near surface air temperature records at 71 USHCN stations in Indiana and Illinois (Figure 7). A total of 113 years (1896 to 2008) of monthly records are included in the analysis. Spatial variability is analyzed through pair-wise spatial correlation, RMSE difference, semivariance, and statistical significance of the difference in the monthly observations. The average distance among pairs of sites and the number of station pairs for each distance class are given in Table 6. The average distance between any two stations ranges from 38 km to 524 km, and a total of 2485 station pairs are included in the analysis.
Table 6. Pairs of USHCN Stations and Their Distances
Columns: Distance Class; Number of Pairs of Stations; Mean Distance (km)
Distance classes: dist ≤ 50 km; 50 km < dist ≤ 100 km; 100 km < dist ≤ 200 km; 200 km < dist ≤ 300 km; 300 km < dist ≤ 400 km; 400 km < dist ≤ 500 km; dist > 500 km
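The pair count and the distance classes of Table 6 can be illustrated with the short sketch below. The use of great-circle (haversine) distance, the class labels, and the station coordinates are assumptions made for illustration; the text does not state how station separations were computed.

```python
import math
from itertools import combinations


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def distance_class(d_km):
    """Return the label of the Table 6 distance class that contains d_km."""
    edges = [50, 100, 200, 300, 400, 500]
    lower = 0
    for upper in edges:
        if d_km <= upper:
            return f"{lower}-{upper} km"
        lower = upper
    return "> 500 km"


n_stations = 71
n_pairs = math.comb(n_stations, 2)  # every unordered pair of stations

# Hypothetical station coordinates (name: (lat, lon)), for illustration only.
stations = {"A": (40.0, -88.0), "B": (40.0, -87.0), "C": (38.0, -86.0)}
classes = {
    (s, t): distance_class(great_circle_km(*stations[s], *stations[t]))
    for s, t in combinations(sorted(stations), 2)
}
```

With 71 stations, the number of unordered pairs is 71 × 70 / 2 = 2485, which matches the total number of station pairs reported in the text.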
 Spatial variability in monthly near surface air temperature records is shown in Figure 8. Spatial correlation remains very high for all distances (average correlation coefficient = 0.90); however, its magnitude decreases with increasing distance (0.96 for 38 km distance and 0.82 for 524 km distance). The RMSE difference increases from 0.83°C for 38 km distance to 3.2°C for 524 km distance (average of 12 months). The semivariogram of temperature data (Figure 8c) follows a parabolic model with no nugget effect, suggesting that the scale of variability is larger than the sampling interval [Kitanidis, 1997, pp. 32–40]. A statistical significance test (two-sample t test) of the difference in the means shows that the monthly temperature records are not statistically different (α = 0.05) up to an average distance of 248 km (Figure 8d). Spatial correlation and RMSE difference for 248 km distance are 0.91 and 1.8°C, respectively.
 Spatial variability in monthly precipitation records is shown in Figure 9. Spatial correlation of precipitation records is lower than the spatial correlation of temperature records because of the higher variability and multimodal precipitation pattern in the region, as discussed in section 4.2. Spatial correlation of monthly precipitation decreases from 0.82 for 38 km distance to 0.30 for 524 km distance (average = 0.55). The RMSE difference increases from 29 mm/month for 38 km distance to 64 mm/month for 524 km distance. The semivariogram of monthly precipitation (Figure 9c) has an almost linear shape with a small nugget effect (410 mm2/month2), suggesting that most of the variability is at a scale larger than the sampling interval, but some variability may be present at a scale comparable to the sampling interval [Kitanidis, 1997, pp. 32–40]. The nugget effect is a discontinuity of the semivariogram at the origin (y axis intercept), obtained here by fitting a linear trend line to the mean semivariogram curve shown in Figure 9c. A statistical significance test (two-sample t test) shows that the mean precipitation is not statistically different for 7 months (April to October) at any distance. For 5 months (November to March), mean precipitation becomes statistically different at distances of 441 km or greater (Figure 9d). Spatial correlation and RMSE difference for 248 km distance are 0.53 and 50 mm/month, respectively.
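The nugget estimate described above, the intercept of a straight line fitted to the mean semivariogram, can be sketched as follows. The lag distances other than 38 km and 524 km and all semivariance values are synthetic, and the helper names are hypothetical; the sketch shows the procedure and does not reproduce Figure 9c.

```python
def semivariance(pairs):
    """Empirical semivariance: half the mean squared difference over value pairs."""
    return 0.5 * sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def linear_fit(x, y):
    """Ordinary least squares fit y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Synthetic mean semivariance at an assumed mean distance for each class.
lag_km = [38.0, 75.0, 150.0, 250.0, 350.0, 450.0, 524.0]
gamma = [410.0 + 3.0 * h for h in lag_km]  # exactly linear by construction
nugget, slope = linear_fit(lag_km, gamma)
```

Because the synthetic semivariance is exactly linear, the fitted intercept recovers the value of 410 that was built into the data. With real data the intercept would carry the scatter of the fit.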
 The mean behavior of point-scale monthly precipitation and temperature records suggests that point-scale measurements are not statistically different for at least 248 km distance. Unlike precipitation and temperature observations, widespread observations of surface energy fluxes are not available, and hence the spatial variability analysis presented above cannot be conducted using the sparse and short-term energy flux observations. Instead, a pair-wise study is conducted using five or more years of comparative records from available neighboring stations to study the effects of climatic forcing and land cover types on energy flux observations. The results from the pair-wise study are presented below, but it should be noted that these results are obtained from a small sample (minimum sample size = 5 and maximum sample size = 9).
 Monthly observations of latent and sensible heat fluxes along with temperature and precipitation are analyzed for three pairs of AmeriFlux sites: (1) U.S._MMS_DBF and U.S._Bo1_CRO; (2) U.S._WCr_DBF and U.S._Los_CSH; and (3) U.S._Ha1_DBF and U.S._Ho1_ENF. The third pair of stations (U.S._Ha1_DBF and U.S._Ho1_ENF), located in the northeast United States, is included in the analysis because it has the longest available comparative record (9 years). Summarized results for summer (May to October) and winter (November to April) months are presented in Table 7.
Table 7. Pair-Wise Comparative Analysis of AmeriFlux Observations at Selected Sites
Columns: Correlation Coefficient, Tair/P/Lht/Sht; Mean Value, Tair/P/Lht/Sht
Table 7 notes: Tair/P/Lht/Sht: units are °C/mm per month/W per m2/W per m2. Values are calculated for each of the 12 months separately, and then averaged/counted for summer and winter months. NSSCM, number of months with statistically significant correlation (p value ≤ 0.05); NSSDMM, number of months with statistically significantly different means (p value ≤ 0.05); NA, not available.
 AmeriFlux sites U.S._MMS_DBF and U.S._Bo1_CRO are located 177 km apart. The average correlation coefficient for monthly temperature is 0.75 during summer and 0.82 during winter months, and is statistically significant for 4 months each in summer and winter. Correlation for other variables is low compared to temperature correlation, and these correlations are not significant for most months. The summer months account for 87% of annual total ET at U.S._MMS_DBF, and 74% of annual total ET at the U.S._Bo1_CRO site. The magnitude of summer latent heat flux/ET is similar at both sites (73 W/m2 at the U.S._MMS_DBF site and 70 W/m2 at the U.S._Bo1_CRO site), and the two are not statistically different for any month in summer. Sensible heat flux is significantly different for most months in summer, and the average magnitude of sensible heat flux is 18 W/m2 at U.S._MMS_DBF, and 32 W/m2 at U.S._Bo1_CRO. In winter, however, the behavior is opposite, with statistically different latent heat flux for all months. Winter sensible heat flux is not statistically different for 4 months at the U.S._MMS_DBF and U.S._Bo1_CRO sites.
 AmeriFlux sites U.S._WCr_DBF and U.S._Los_CSH are located 32 km apart in northern Wisconsin. Temperature data at these two sites show low correlation: 0.30 for summer and 0.33 for winter. Low correlation in monthly temperature could be due to differences in elevation (Table 1) and terrain type (oval-shaped ridge for U.S._WCr_DBF, and poorly drained depression/wetland for U.S._Los_CSH), but this issue is not investigated in this study. The summer months account for 90% of annual total ET at these sites, and latent heat flux/ET is of similar magnitude at both sites (48 W/m2 at U.S._WCr_DBF, and 51 W/m2 at the U.S._Los_CSH site). Sensible heat flux is also similar in magnitude (27 W/m2 at U.S._WCr_DBF, and 29 W/m2 at U.S._Los_CSH), and is not statistically different for 5 months in summer. During winter, sensible heat flux has different magnitudes at these sites (26 W/m2 at U.S._WCr_DBF, and 17 W/m2 at U.S._Los_CSH), but the difference is not statistically significant for 4 months.
 AmeriFlux sites U.S._Ha1_DBF (in Massachusetts), and U.S._Ho1_ENF (in Maine) are located 405 km apart. Monthly temperature shows correlation of 0.72 in summer and 0.84 in winter. Other variables show relatively lower correlation (e.g., 0.38 for latent heat flux during the summer months). The summer months account for more than 80% of annual total ET at these sites. Summer latent heat flux/ET at U.S._Ha1_DBF is 56 W/m2, and 49 W/m2 at U.S._Ho1_ENF site (significantly different for 2 months). The sensible heat flux at U.S._Ha1_DBF and U.S._Ho1_ENF sites is statistically different for most months in summer, and for 3 months in winter.
4.4. Spatial and Temporal Distribution of Total Runoff in MRB
 Spatial distribution of annual total runoff based on UNH-GRDC composite runoff data, VIC model, NARR, and CLM is presented in Figure 10. Except for the UNH-GRDC data set, for which only monthly climatological mean values are available [Fekete et al., 2002], the other three model outputs represent average annual total runoff from 1988 to 1999. Data from 1988 to 1999 are presented because VIC outputs are available for that period only. All runoff data sets are shown at their original resolutions, i.e., UNH-GRDC at 0.5° resolution (∼50 km), VIC at 0.25° resolution (∼25 km), NARR at 32 km resolution, and CLM at T42 resolution (∼280 km). The UNH-GRDC data show some discontinuity (lower runoff) in the western part of Kentucky. The VIC output follows the precipitation gradient shown in Figure 2b, and provides better geographic distribution of runoff in MRB [Roads et al., 2003]. The spatial distribution of total runoff from CLM is visually comparable to that from VIC, whereas NARR provides lower total runoff in all areas of MRB.
 The intra-annual variability of total runoff is shown in Figure 11a. Naturalized observed runoff from Maurer and Lettenmaier (online report, 2001) is also included in the analysis. The monthly runoff from VIC is closer to the observation, with an RMSE of 2.7 mm/month. The UNH-GRDC data show lower monthly runoff from June to December, and the annual average RMSE for UNH-GRDC data is 6.2 mm/month. The monthly runoff from CLM is higher during winter months, and is comparable with VIC runoff during summer months. Compared to the observed runoff data from Maurer and Lettenmaier (online report, 2001), annual average RMSE in CLM monthly runoff is 7.7 mm/month. The runoff data from NARR is lower for all months, and the annual average RMSE in NARR is 12.7 mm/month compared to that of Maurer and Lettenmaier (online report, 2001).
 The annual time series of total runoff is shown in Figure 11b, and the annual total runoff statistics are presented in Table 8. VIC output data match closely with observations, with a correlation coefficient of 0.94, bias of 2%, and RMSE of 13 mm/yr. The total runoff from CLM is higher for all years, with a bias of 19% and RMSE of 47 mm/yr. Higher annual runoff from CLM may be due to higher runoff during winter months as mentioned in the previous paragraph. Overall, CLM total runoff captures the interannual variability, with a correlation coefficient of 0.91. CLM total runoff results found in this study are consistent with the findings of Oleson et al., who also found overestimation of total runoff and a high correlation coefficient. NARR gives lower total runoff for all years, with a bias of −62% and RMSE of 151 mm/yr. In addition, NARR data show relatively poorer correlation (correlation coefficient: 0.52) in comparison with CLM. Regridding of NARR data did not affect monthly or annual runoff results in MRB (Figure 11).
Table 8. Annual Total Runoff Statistics (1988–1999) in MRB
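As a worked check, the runoff biases quoted above follow directly from the mean annual runoff values given in the abstract (observed 237 mm/yr, NARR 89 mm/yr, CLM 281 mm/yr). The variable names below are illustrative.

```python
observed = 237.0  # mm/yr, observed mean annual runoff, 1988-1999
modeled = {"NARR": 89.0, "CLM": 281.0}  # mm/yr

# Bias is (model - observation) as a percentage of the observation.
bias_percent = {
    name: 100.0 * (value - observed) / observed for name, value in modeled.items()
}
for name, bias in bias_percent.items():
    print(f"{name}: {bias:+.0f}%")
```

The computed values round to −62% for NARR and +19% for CLM, in agreement with the biases reported in the text.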
4.5. Results Summary
 NARR and CLM outputs are evaluated for surface water and energy fluxes in MRB using energy flux observations, and other relevant data/model outputs (e.g., runoff observations). Monthly climatology of near surface air temperature and precipitation of NARR is comparable to CLM and PRISM data. However, sensible heat flux and latent heat flux differ significantly between NARR and CLM outputs. Compared to average AmeriFlux data from 11 sites, NARR shows relatively higher biases in incoming solar radiation (24%), sensible heat flux (27%), and latent heat flux (59%); whereas CLM shows relatively smaller biases in incoming solar radiation (0.5%), sensible heat flux (−2%), and latent heat flux (11%). Similarly, annual and monthly total runoff is also better simulated by CLM compared to NARR. Based on 25 years (1980–2004) monthly climatology water and energy balance component in MRB, NARR has 11% energy balance closing error (Lht + Sht + Ght = 1.11 Rn) and 12% water balance closing error (ET + N = 1.12 P); whereas CLM does not have water and energy balance closing error by virtue of model design. Net radiation in NARR (84.7 W/m2) is higher compared to CLM (69.7 W/m2). Overall, CLM outputs provide better characterization of surface water and energy fluxes in MRB.
 The issue of comparing point-scale observations with gridded model outputs is addressed by using 113 years of monthly precipitation and temperature records at 71 USHCN stations in Indiana and Illinois. It is found that monthly precipitation and temperature are not statistically different for at least 248 km distance in Indiana and Illinois. Analysis of pair-wise energy flux observations from neighboring stations shows that the effect of land cover type on summer latent heat flux/ET (which is greater than 80% of annual total ET) is minimal. Sensible heat flux shows a larger difference than latent heat flux at neighboring stations with different land cover types.
5.1. Can Reanalysis Be Used As a Surrogate for Observations?
 Reanalysis data/outputs are often used as a surrogate for observations for verification of climate model outputs [Covey et al., 2003; Gates et al., 1999; Kumar et al., 2010; Lambert and Boer, 2001; Reichler and Kim, 2008] (http://www.cgd.ucar.edu/cms/diagnostics/), and in other relevant hydroclimatic studies [Dominguez and Kumar, 2008; Dominguez et al., 2008; Diffenbaugh, 2009; Fall et al., 2010]. This study used an improved regional reanalysis product (NARR) [Mesinger et al., 2006], and found that while precipitation and near surface air temperature are comparable to observed data, ET and runoff outputs have significant biases. The water and energy balance error observed in NARR could be due to: (1) assimilation of a limited number of available hydroclimatic variables in the NARR system, (2) biases in net radiation, and (3) parameterization of surface energy fluxes in NARR. While precipitation is an assimilated variable in NARR, sensible heat flux, latent heat flux, and runoff are not assimilated. This finding suggests that reanalysis fields for which observations are not assimilated (e.g., sensible and latent heat flux) should be used with caution. Some aspects of the differences in surface energy flux parameterization between NARR and CLM are discussed in section 5.5.
5.2. Comparison of Point-Scale Observations With Gridded Model Outputs
 Reanalysis data or other climate model outputs have the advantage of being continuous in the spatial and temporal domains. In contrast, AmeriFlux data are sparse in the spatial domain, and are available for a relatively short period (less than 15 years). This study shows that CLM outputs are comparable to point-scale energy flux observations at many sites, and CLM also produces better results (e.g., runoff for the MRB) at the continental basin scale (3.2 million sq. km) compared to NARR. Hence, in addition to providing valuable information for model development [Stöckli et al., 2008], point-scale observations are a valuable data set for evaluation or verification of global and regional climate model outputs. Point-scale energy flux observations cannot replace reanalysis outputs, but can provide a first-order assessment for both reanalysis data and climate model outputs. Hence, point-scale observations should be taken into consideration alongside available reanalysis outputs in hydroclimatic studies.
 Comparison of point-scale observations with climate model grid cell outputs also brings up the issue of spatial scale, and how the heterogeneity in topography, land cover, and soil within a model grid cell is captured in a point measurement. Results from this study show that the issues of scale and heterogeneity are masked at the monthly time scale over the grid cell size of 280 km used in this study. Vinnikov et al. have proposed statistical models for the spatial and temporal variability of soil moisture observations in a midlatitude region (former USSR); the two models have a similar formulation (first-order Markov process). Temporal averaging within a month in the same year, and averaging over the same months in different years, could be compensating for the spatial variability within a grid cell for the ET or latent heat flux measurements shown in Figure 6. This, however, may change for different variables and in different regions, such as mountainous regions.
 This study also found that the correspondence between point-scale measurement and CLM grid cell output is poor in the Rocky mountain range (U.S._NR1_ENF site). Some past studies [e.g., Han and Roads, 2004] have shown that a high-resolution climate model performs better in a mountainous region compared to a low-resolution climate model such as T42 resolution (280 km) in CLM. The spatial variability analysis conducted in this study using precipitation and temperature data from Indiana and Illinois (relatively flat topography) may not remain valid for the highly variable topographic regions (e.g., Rocky Mountain region). Higher-resolution climatic models may be needed for similar analysis in mountainous regions. For example, NARR (32 km resolution) performance is better for precipitation and temperature compared to CLM forcing data (280 km resolution) at U.S._NR1_ENF site (Table 5).
5.3. Latent Heat Flux Versus Sensible Heat Flux: Effect of Land Cover
 Point-scale measurements of sensible heat flux are found to be less correlated with the nearest climate model grid cell outputs compared to latent heat flux measurements (e.g., the average R2 between CLM and AmeriFlux is 0.80 for latent heat flux, and 0.40 for sensible heat flux) [see also Randerson et al., 2009]. Pair-wise comparison of AmeriFlux observations (Table 7) shows that sensible heat flux differs more than latent heat flux at neighboring stations having different land cover types (e.g., summer average latent heat flux and sensible heat flux for U.S._MMS_DBF/U.S._Bo1_CRO sites are 73/70 and 18/32 W/m2, respectively). These results suggest that land surface heterogeneity (e.g., land cover type) has a larger effect on sensible heat flux than on latent heat flux; that is, the land surface hydrologic response (ET/latent heat flux) is more stable than the land surface thermal response (sensible heat flux). This finding is consistent with some previous studies, including the African Monsoon Multidisciplinary Analysis (AMMA) project, where Ramier et al. found that the stability in ET is achieved at the expense of variability in other energy (e.g., sensible heat flux) and water balance components (e.g., soil water storage). Similarly, by using 6 years of energy flux data in a Mediterranean climate region (central California), Ryu et al. found that the interannual variability in ET is much smaller than the twofold change in annual precipitation during the observation period.
5.4. Crop Region: Bimodal Pattern of Sensible Heat Flux
 A bimodal pattern in the monthly climatology of sensible heat flux is found at four crop cover AmeriFlux sites in the Dfa region. For the one crop cover AmeriFlux site in the Cfa region (U.S._ARM_CRO), a bimodal pattern of sensible heat flux was not found. For two irrigated crop cover sites (U.S._Ne1_CRO and U.S._Ne2_CRO), the mean value of sensible heat flux is nearly zero or slightly negative during July and August, which are also the months of highest latent heat flux in the year (not shown). This phenomenon was explained by Tanner and Lemon and by Monteith as the effect of advection of dry air over a cooler, well-watered crop area. ET from the irrigated crop area exceeds net radiation, and hence the temperature of the irrigated crop area is lower than the air temperature. The deficit in the energy demand is met by the downward transfer of sensible heat from the warmer air into the cooler crop area (negative sensible heat flux). During the senescence period, ET decreases, and gradually sensible heat flux becomes the dominant energy flux [Li et al., 2005].
5.5. Model Parameterization Differences in NARR and CLM
 NARR captures the bimodal pattern of sensible heat flux at crop sites (e.g., U.S._Bo1_CRO), whereas CLM does not (Figure 6b). NARR uses the Noah land surface parameterization scheme [Chen and Dudhia, 2001], where sensible heat flux is calculated as the residual of the energy balance terms (Sht = Rn − Lht − Ght) [Sridhar et al., 2002]; hence Sht can become uncoupled from temperature seasonality (one peak during the year) as Lht + Ght approaches Rn. In the CLM parameterization, by contrast, sensible heat flux is calculated as heat transfer from the ground/vegetated surface to the atmosphere by analogy with Ohm's law, with the temperature difference between the atmosphere and the ground/vegetated surface as the potential difference and the aerodynamic resistance as the resistance [Oleson et al., 2004, pp. 56–64]. Hence, it is less likely that Sht can become uncoupled from temperature seasonality in CLM. Ground heat flux is calculated as the residual of the energy balance terms in CLM (Ght = Rn − Lht − Sht) [Oleson et al., 2004, p. 77]. Although the opposing signs of ground heat flux cancel each other at longer time scales (e.g., annual), ground heat flux can be substantial at the daily time scale. Ramier et al. show the mean absolute value of ground heat flux as less than 7% of (Lht + Ght) based on 2 years of observations in the AMMA project. During the wet spell, soil releases heat to the atmosphere, whereas during the dry spell, soil gains heat from the atmosphere [Ramier et al., 2009]. Based on point-scale model runs at three flux tower sites, Wang et al. found that the ground heat flux is poorly simulated compared to latent heat flux and sensible heat flux in the CLM (version 3) model. On the other hand, Sridhar et al. found good correspondence (statistically significant strong correlation) between measured and modeled ground heat flux at 7 flux measurement sites in Oklahoma using the Noah land surface model.
There could be other parameterization differences that may be contributing to the differences in sensible heat flux monthly climatology that is seen in this study. Further investigation of this issue may require running the CLM and Noah land surface model at point scale. Santanello et al.  have compared CLM and Noah land surface model coupled with different Planetary Boundary Layer schemes, but their analysis results are limited to diurnal time scale, and are not sufficient to analyze the seasonal cycle.
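The contrast between the two formulations described in this section can be illustrated with a simplified sketch. The code below is not taken from Noah or CLM: the bulk resistance form, the constants, and all input values are assumptions chosen only to show how the two approaches can diverge when latent heat flux dominates the energy balance.

```python
RHO_AIR = 1.2    # air density, kg/m3 (assumed typical near-surface value)
CP_AIR = 1005.0  # specific heat of air at constant pressure, J/(kg K)


def sht_residual(rn, lht, ght):
    """Sensible heat flux as the energy balance residual: Sht = Rn - Lht - Ght."""
    return rn - lht - ght


def sht_resistance(t_surface, t_air, r_ah):
    """Sensible heat flux from a temperature difference across a resistance."""
    return RHO_AIR * CP_AIR * (t_surface - t_air) / r_ah


# Hypothetical midsummer values (W/m2, deg C, s/m) for a well-watered crop.
rn, lht, ght = 160.0, 150.0, 8.0
residual = sht_residual(rn, lht, ght)        # shrinks as Lht + Ght nears Rn
gradient = sht_resistance(26.0, 24.0, 50.0)  # follows the temperature difference
```

With these hypothetical inputs the residual form gives a near-zero sensible heat flux, whereas the resistance form stays positive because it depends only on the surface-air temperature difference. This is one way a midsummer dip, and therefore a bimodal seasonal cycle, could appear in one formulation and not in the other.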
5.6. Evaluation of NARR and CLM ET Using Budyko Curve
 Large differences between NARR and CLM latent heat fluxes in the eastern part of MRB (Figure 4), and the energy balance closure issue with AmeriFlux observations (section 4.2), warrant further investigation using an independent approach. The Budyko curve is a top-down approach for basin average ET estimation, as opposed to bottom-up process-based approaches such as NARR and CLM. Budyko postulated that for the long-term average, under very dry conditions actual ET is limited by precipitation, and under very wet conditions actual ET is limited by available energy [Budyko, 1958]. Between these two limits a number of curves have been proposed to account for increasing complexity, e.g., catchment characteristics and finer temporal scale [Milly, 1994; Koster and Suarez, 1999; Zhang et al., 2001, 2004, 2008]. Figure 12 shows the Budyko curve given in equation (5). This curve is applicable for the steady state condition (Δs = 0 in equation (1), long-term annual mean), and its validation over 250 catchments has been shown by Zhang et al.
where Φ is the dryness index, and w is the plant available water coefficient (w = 2.0 for forest, and w = 0.5 for short grass and crops).
 The performance of CLM and NARR ET in the Ohio and Tennessee basins (the eastern part of the MRB, Regions 05 and 06, Figure 2c) is evaluated for the long-term annual mean (1980–2004; Figure 12). The observed mean annual ET for the Ohio and Tennessee basins is calculated as the difference between observed mean annual precipitation (using PRISM data) and the observed mean annual streamflow at United States Geological Survey (USGS) gauging station 03611500 (Ohio River at Metropolis, Illinois, with a drainage area of 0.52 million km2). The observation falls on the Budyko curve (ET/P = 0.58, and Φ = 0.80), whereas NARR shows overestimation of ET (ET/P = 0.92, and Φ = 1.12), and CLM shows underestimation of ET (ET/P = 0.35, and Φ = 0.85). The observed ET estimate can have some positive bias due to flow regulations upstream of the gauging station (∼5%) (Maurer and Lettenmaier, online report, 2001; see also USGS Water Data Report 2009, http://wdr.water.usgs.gov/wy2009/pdfs/03611500.2009.pdf). Lower ET/P in CLM is also consistent with the overestimation of CLM runoff discussed in section 4.4. The Budyko approach also confirms that NARR overestimates ET in MRB.
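The position of each data set relative to the Budyko-type curve can be checked with the sketch below. It assumes that the curve of equation (5) has the form given by Zhang et al. [2001], ET/P = (1 + wΦ)/(1 + wΦ + 1/Φ), which is consistent with the w values quoted above but is not reproduced in this text; the function and variable names are hypothetical.

```python
def zhang_et_ratio(phi, w):
    """ET/P for dryness index phi, assuming the Zhang et al. [2001] form."""
    return (1.0 + w * phi) / (1.0 + w * phi + 1.0 / phi)


# (dryness index, ET/P) pairs quoted in the text for the Ohio and Tennessee basins.
points = {"Observed": (0.80, 0.58), "NARR": (1.12, 0.92), "CLM": (0.85, 0.35)}

position = {}
for name, (phi, et_p) in points.items():
    low = zhang_et_ratio(phi, 0.5)   # short grass and crops
    high = zhang_et_ratio(phi, 2.0)  # forest
    if et_p > high:
        position[name] = "above"
    elif et_p < low:
        position[name] = "below"
    else:
        position[name] = "within"
```

Under this assumed form, the observed point lies between the crop and forest curves, the NARR point lies above the forest curve, and the CLM point lies below the crop curve, in line with the overestimation and underestimation described in the text.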
5.7. Comparison of This Study With the WEBS Study
 A comparison of this study with the WEBS study for the major surface water and energy budget terms (ET/P, N/P, Lht/Rn, Sht/Rn) is presented in Table 9. Partitioning of precipitation into ET and runoff has not improved significantly in NARR compared to REAN1 and REAN2: ET approximately balances total precipitation in all three reanalysis products. The water balance closing error in NARR (ET + N = 1.12 P) is similar to that in REAN2 (ET + N = 1.11 P). Runoff does not appear to be an important variable for climate models, as also seen in the GSM, RSM, and ETA results (the WEBS study). CLM, which is run in offline mode, produces results comparable to those of the VIC model. A coupled CLM run (coupled with atmosphere and ocean components) is not available for evaluation. Latent heat flux has decreased and sensible heat flux has increased in NARR compared to REAN1, REAN2, GSM, and RSM. The issue of negative sensible heat flux during winter, identified in the WEBS study, has improved in NARR (Figures 5 and 6).
The WEBS climatology is for 1996–1999, the NARR and CLM climatology (this study) is for 1980–2004, and the observations (OBS) are for 1988–1999. Observed ET is estimated as the difference between average PRISM precipitation and average naturalized runoff for 1988–1999. Details of the WEBS study are given by Roads et al. [2002].
6. Concluding Remarks
 This study highlights the differences among climate model output (CLM), observations (AmeriFlux), and reanalysis products (NARR) in hydroclimatic assessments at the continental basin scale. Hydroclimatic variables for which observations are not assimilated in the reanalysis products (e.g., ET and runoff in NARR) should be used with caution when evaluating climate model outputs. For example, evaluating CLM against NARR alone may suggest that CLM underestimates ET (latent heat flux) and overestimates sensible heat flux in the MRB (Figure 5). This study finds otherwise: CLM ET matches the AmeriFlux data more closely than NARR ET does (Figure 6).
 AmeriFlux observations, which have become available in recent years, have proved to be an important data source for improving our understanding of land surface and atmosphere interactions, such as the variability in latent heat flux compared to sensible heat flux and the bimodal seasonal cycle of sensible heat flux found in this study for the MRB. Issues related to scaling (point-scale measurements versus climate model grid cell outputs) and energy balance closure in Level 4 AmeriFlux data should be addressed in future studies so that AmeriFlux observations can be used more extensively in hydroclimatic studies.
 Spatial variability analysis of monthly precipitation and temperature using 71 USHCN stations in Indiana and Illinois shows that monthly precipitation and temperature vary at a scale larger than the average distance (39 km) between any two neighboring stations in this study. A gridded network of AmeriFlux sites is needed to conduct similar studies for latent and sensible heat fluxes. Pair-wise analysis of energy flux observations at neighboring AmeriFlux sites in this study is limited by the small sample size. A field experiment similar to the AMMA project (same climatic conditions and different land cover types) [Ramier et al., 2009] in the midlatitude region could provide more information on the effects of land surface heterogeneity on sensible versus latent heat fluxes.
 The differences in surface energy flux parameterization and formulation between the Noah land surface model (the land component of NARR) and CLM discussed in this paper indicate room for further improvement in CLM, specifically in capturing the bimodal seasonal cycle of sensible heat flux and in reducing the underestimation of ET in the eastern domain. The basin-scale energy and water balance analysis, as well as the Budyko approach, shows overestimation of ET in NARR.
 NARR total runoff in the MRB is significantly lower (by 62%) than observed. This finding is consistent with other studies on hydrologic validation of reanalysis data [Hagemann et al., 2005; Lucarini et al., 2007], as well as with the WEBS study. Assimilation of observed runoff data may address this issue in future reanalysis projects. CLM runoff is relatively better and is comparable to the runoff output of a mesoscale hydrologic model (VIC in this study).
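 The 62% figure follows directly from the mean annual runoff values reported in the abstract (observed 237 mm/yr for 1988–1999, NARR 89 mm/yr, CLM 281 mm/yr). A minimal sketch of the arithmetic is given below; the function name is illustrative and not part of any model code.

```python
# Sketch: relative runoff bias of each product against the observed
# mean annual runoff in the MRB (values as reported in the abstract).

def relative_bias(model_value: float, observed_value: float) -> float:
    """Return (model - observed) / observed as a fraction."""
    if observed_value == 0.0:
        raise ValueError("observed value must be nonzero")
    return (model_value - observed_value) / observed_value

OBSERVED_RUNOFF_MM_YR = 237.0
model_runoff_mm_yr = {"NARR": 89.0, "CLM": 281.0}

if __name__ == "__main__":
    for name, runoff in model_runoff_mm_yr.items():
        bias_pct = 100.0 * relative_bias(runoff, OBSERVED_RUNOFF_MM_YR)
        print(f"{name}: {bias_pct:+.0f}% relative to observed runoff")
```

 With these inputs, NARR runoff is about 62% below the observed value, while CLM runoff is about 19% above it.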
 This study is constrained by the limited number of models (CLM and NARR) and by its single study region (MRB). Similar studies incorporating other models and other study areas will help build confidence in global climate model simulation results, which can then be used for regional-scale water resources planning and management.
 The authors gratefully acknowledge the support of K. W. Oleson (NCAR) and R. Stöckli (MeteoSwiss, Switzerland) in providing the CLM3.5 outputs and valuable comments. NARR data were provided by NOAA/OAR/ESRL PSD, Boulder, Colorado, from their Web site at http://www.cdc.noaa.gov/. PRISM data were provided by the PRISM Climate Group at Oregon State University, http://www.prismclimate.org. The authors would also like to thank Laura Bowling, Suresh Rao, and Rao S. Govindaraju, all at Purdue University, for providing valuable suggestions during the course of this study. Comments from three anonymous reviewers led to significant improvement of the earlier version of this manuscript. This material is based upon work supported partially by the National Science Foundation under grant 0619086. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.