This study evaluates three major Numerical-Weather-Prediction reanalyses (ERA-Interim, NCEP/NCAR Reanalysis I, and NCEP/DOE Reanalysis II) in modeling surface relative shortwave cloud forcing, cloud fraction, and cloud albedo. The evaluation uses decade-long, continuous surface-based measurements from the U.S. Atmospheric Radiation Measurement (ARM) program at the Southern Great Plains site from 03/25/1997 to 12/31/2008. The cloud properties from the reanalyses are evaluated at multiple temporal scales. Like the observations, all the reanalyses show a strong annual cycle and relatively weak diurnal and inter-annual variations of the cloud properties. The reanalyses significantly underestimate the cloud properties, and the model biases in the cloud properties are generally linearly related to one another and somewhat related to the magnitude of cloud fraction. Further examination shows that the cloud properties are strongly related to 2-m relative humidity, especially in the observations and ERA-Interim, whereas their relationship with 2-m temperature and specific humidity is much weaker. Also, the cloud fraction biases in the two NCEP reanalyses increase (decrease) with the relative humidity (temperature and specific humidity), but the cloud fraction biases in ERA-Interim show no (an opposite) relationship with the relative humidity (temperature and specific humidity). The relative humidity biases have a positive (negative) linear relationship with the specific humidity (temperature) biases. A combined statistical analysis using Taylor diagrams and a newly developed metric, the “Relative Euclidean Distance,” indicates that ERA-Interim and NCEP/NCAR Reanalysis I have the best and worst overall performance, respectively, in modeling the cloud and meteorological properties examined, except that NCEP/DOE Reanalysis II ranks best in modeling monthly temperature and specific humidity.
 Climate prediction depends on modeling, so there is a pressing need to quantify model uncertainties and reduce model biases. Among numerous model uncertainties, the representation of clouds and associated radiative processes in global climate models (GCMs) has been recognized as one of the major factors limiting the accuracy of climate prediction [Intergovernmental Panel on Climate Change (IPCC), 2007]. As a consequence, model evaluations aimed at identifying deficiencies in the parameterization of clouds and associated processes remain an area of intensive research.
 To address this long-standing issue, the U.S. Department of Energy's (DOE's) Earth System Modeling program funded a new model evaluation project in 2009: the FAst-physics System TEstbed and Research (FASTER) project. The main objective of this multi-institutional project is to utilize various long-term measurements collected by the DOE's Atmospheric Radiation Measurement (ARM) program over the ARM sites [Stokes and Schwartz, 1994; Ackerman and Stokes, 2003] to accelerate the evaluation and improvement of the parameterizations of cloud-related fast processes in large-scale climate models. This paper is an initial evaluation of cloud and radiative properties and their links to surface meteorology in three major Numerical-Weather-Prediction (NWP) reanalyses.
 As representations of the state of the atmosphere, reanalyses are generated by a state-of-the-art analysis and forecast system that assimilates data from a wide variety of observations, including ships, satellites, ground stations, and radar. Their long-term, consistent global coverage makes reanalyses particularly valuable in climate research, and they have been widely used as a baseline in studying global climate change and in climate modeling [Lu et al., 2005; Anderson et al., 2008; Haimberger et al., 2008; Betts et al., 2009; Simmons et al., 2010; Rye et al., 2010]. Nonetheless, reanalysis data may have time-varying biases, which limit their value for characterizing long-term climate trends [e.g., Thorne and Vose, 2010; Dee et al., 2011a]. Furthermore, cloud observations are not directly assimilated into current reanalyses, so the cloud-related properties of reanalyses are expected to suffer, as in GCMs, from deficient model parameterizations. Accordingly, observationally based evaluations of widely used reanalyses in modeling cloud and associated radiative properties are crucial to both current climate research and the future development of reanalyses.
 Observationally based evaluations of reanalyses in modeling cloud and radiative properties have been conducted in previous studies [Jakob, 1999; Allan and Ringer, 2003; Chevallier et al., 2005; Betts and Viterbo, 2005; Betts et al., 2006; Bedacht et al., 2007; Betts, 2007; Weidle and Wernli, 2008; Betts et al., 2009; Xu, 2009]. Most of these studies used satellite-based observations. Evaluations using ARM surface-based cloud and radiation measurements are limited in both the number of studies and the scope of the data evaluated. For example, Walsh et al. evaluated four reanalyses in modeling 1999–2006 monthly Arctic cloud fraction and radiative fluxes, using ARM surface-based measurements at the North Slope of Alaska of cloud base height from ceilometers and of shortwave/longwave fluxes from sky/ground radiometers. The four reanalyses were those from the National Centers for Environmental Prediction/National Center for Atmospheric Research, the 40-yr European Centre for Medium-Range Weather Forecasts Reanalysis, the North American Regional Reanalysis, and the Japan Meteorological Agency and Central Research Institute of Electric Power Industry 25-yr Reanalysis. They found that the performance of the reanalyses in modeling radiative fluxes depends mainly on their performance in modeling cloud fraction, and that the systematic errors of reanalysis cloud fractions are substantial. To investigate their applicability for forcing single-column and cloud-resolving models, Kennedy et al. evaluated reanalysis data over the ARM Southern Great Plains (SGP) site from the Modern-Era Retrospective analysis for Research and Applications (MERRA) and the North American Regional Reanalysis (NARR), using the 1999–2001 ARM continuous forcing product and surface-based sounding data from the Climate Modeling Best Estimates (CMBE) data set [Xie et al., 2010]. 
They found that the ARM continuous forcing and the reanalyses agree well with the CMBE sounding data, with biases of 0.5 K, 0.5 m s−1, and 5% for temperature, wind, and relative humidity, respectively. However, larger disagreements occur in the upper troposphere for temperature, humidity, and zonal wind, and in the boundary layer for meridional wind and humidity. They also found that the phase patterns of the seasonal cloud fraction from the 3-year MERRA and NARR are similar to those from the ARM radar–lidar and the Geostationary Operational Environmental Satellite, and that MERRA agrees better than NARR with ARM observations of surface shortwave and longwave fluxes.
 In this study we evaluate three major NWP reanalyses in modeling surface relative shortwave cloud forcing (see section 2.3.1 for the definition), cloud fraction, and cloud albedo. Multiscale (diurnal, annual, and inter-annual) mean variations of the cloud properties are evaluated. We also examine model-error propagation paths by investigating the model biases and their links to one another and to the 2-m temperature and moisture (relative and specific humidity) fields. Decade-long (1997 to 2008) surface-based continuous ARM value-added products (VAPs) over the SGP Central Facility site are used as the reference for the evaluation. The rest of the paper is organized as follows. Section 2 briefly introduces the data and methods used. Section 3 shows the multiscale mean variations of the cloud properties and discusses potential uncertainties. Section 4 analyzes model biases and their links. Section 5 evaluates the overall performance of the reanalyses in modeling the cloud properties and 2-m temperature/humidity. Section 6 summarizes this study.
2. Data and Methods
 This section briefly introduces the data (i.e., observations and reanalyses) and methods used in this study. Note that, to be consistent with the shortwave radiation observations, this study evaluates only daytime (6 am to 6 pm LST) cloud properties and the corresponding temperature/humidity.
 The observations used are the high-resolution ARM VAP (15-min SIRS data) derived from the well-calibrated surface-based Solar and Infrared Radiation System (SIRS) at the SGP Central Facility (262.51°E, 36.61°N) from 03/25/1997 to 12/31/2008 (http://www.arm.gov/instruments/sirs). Long and his coworkers [Long and Ackerman, 2000; Long et al., 2006] produced a 15-min mean data set of all-sky and clear-sky-fit surface downwelling shortwave (SW) fluxes and a fractional sky cover (“cloud fraction” hereafter) derived from an analysis of surface measurements of downwelling total and diffuse SW irradiance, carefully screened and corrected for optically thicker overcast cases. Here, the cloud fraction is retrieved from the clear-sky surface downwelling SW total/diffuse irradiance and the effect of clouds on the SW irradiance, estimated using the methodology developed by Long and Ackerman.
 The observed cloud fraction represents 15-min averages of 160° field-of-view (FOV) hemispheric fractional sky cover from the surface. Long et al. found good agreement (root-mean-square error <8%) between this estimate and hemispheric sky imager data. By definition, this cloud fraction differs from the plane-parallel cloud fraction from a nadir view or from a climate model. Liu et al. [2011, Figure 2] found that the hourly averaged cloud fraction derived from the same 15-min hemispheric-view cloud fraction agreed reasonably well (correlation coefficient 0.86) with the hourly cloud fraction from the Geostationary Operational Environmental Satellite (GOES). More discussion of cloud fraction and the associated uncertainty is deferred to Section 3.2.
 We then compute surface relative shortwave cloud forcing (SRCF) from the all-sky and clear-sky-fit SW fluxes, and cloud albedo is estimated using both SRCF and cloud fraction. In addition, we use the 30-min averaged data streams of 2-m air temperature, relative humidity, and barometric pressure from the Surface Meteorological Observation System instruments from 04/01/2001 to 12/31/2008 to analyze the links of the model biases in cloud properties to the 2-m meteorological conditions.
 Three NWP reanalyses are evaluated in this study: ERA-Interim, NCEP/NCAR Reanalysis I (R1 hereafter), and NCEP/DOE Reanalysis II (R2 hereafter). The abbreviations “ERA,” “NCEP,” and “NCAR” denote “European Centre for Medium-Range Weather Forecasts (ECMWF) global atmospheric reanalysis,” “National Centers for Environmental Prediction,” and “National Center for Atmospheric Research,” respectively. Each is briefly introduced below.
2.2.1. ERA-Interim
 ERA-Interim is the latest version of ECMWF's global atmospheric reanalysis, available from 1989 to the present [Dee et al., 2011b]. This reanalysis is archived at a horizontal resolution of T255 spherical-harmonic representation for the basic dynamical fields, or on an N128 reduced Gaussian grid with approximately uniform 79 km spacing for surface and other grid-point fields, with 60 vertical levels (P. Berrisford et al., The ERA-Interim archive, August 2009, http://www.ecmwf.int/publications/library/do/references/list/782009). The global archive has 3-hourly (surface parameters) or 6-hourly (upper-air parameters) time resolution, but for selected points, including the ARM SGP site, hourly data were archived and are used in this study. The major improvements in ERA-Interim include the representation of the hydrological cycle, the quality of the stratospheric circulation, and the handling of biases and changes in the observing system. One advantage of this reanalysis is its high spatial and temporal resolution, which makes it better suited to studying regional diurnal variations. A detailed description of this reanalysis can be found on the ECMWF web site at http://www.ecmwf.int//research/era/do/get/ERA-Interim. We use (1) hourly averaged clear-sky surface net SW flux, all-sky surface net and surface downwelling SW fluxes, and total cloud cover, and (2) hourly interval (i.e., instantaneous) 2-m air temperature and specific humidity, and surface pressure, from 03/25/1997 to 12/31/2008. The hourly data are outputs from the first 0–12 h of forecasts from the twice-daily analyses. All the data used are from a Gaussian grid cell centered at (262.50°E, 36.84°N) over the ARM SGP site.
 Note that ERA-Interim hourly clear-sky surface downwelling SW flux (SWdnclear) is not available, so it is calculated from equation (1) using the available clear-sky surface net SW flux (SWnetclear), all-sky surface downwelling SW flux (SWdnall), and all-sky surface net SW flux (SWnetall),
$$SW_{dn}^{clear} = SW_{net}^{clear}\,\frac{SW_{dn}^{all}}{SW_{net}^{all}} \qquad (1)$$
Equation (1) is derived from equation (2) in Betts et al. under the assumption that surface albedo is the same for clear-sky and cloudy-sky conditions. A small number of hourly data points (25 over the 12-year record) were excluded from the analysis for reasons such as data truncation errors or unrealistically large derived surface albedos.
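A minimal Python sketch of the equal-albedo estimate follows. Under the stated assumption, the net flux is SWnet = (1 − αs) SWdn for both sky conditions, so αs = 1 − SWnetall/SWdnall and SWdnclear = SWnetclear/(1 − αs) = SWnetclear × SWdnall/SWnetall. The function name and the choice to also return the implied surface albedo (so that hours with unrealistically large values can be screened, as described above) are illustrative assumptions; no screening threshold is stated in the text, so none is applied here.

```python
def era_clear_sky_downwelling(sw_net_clear, sw_dn_all, sw_net_all):
    """Estimate clear-sky surface downwelling SW flux (W m-2).

    Hypothetical helper: assumes the same surface albedo for clear and
    cloudy skies. Returns (sw_dn_clear, surface_albedo) so the caller can
    screen out hours with unrealistically large derived albedo.
    """
    if sw_dn_all <= 0 or sw_net_all <= 0:
        raise ValueError("all-sky fluxes must be positive (daytime hours only)")
    # SW_net = (1 - albedo) * SW_dn  =>  albedo = 1 - SW_net_all / SW_dn_all
    surface_albedo = 1.0 - sw_net_all / sw_dn_all
    # SW_dn_clear = SW_net_clear / (1 - albedo) = SW_net_clear * SW_dn_all / SW_net_all
    sw_dn_clear = sw_net_clear * sw_dn_all / sw_net_all
    return sw_dn_clear, surface_albedo


flux, albedo = era_clear_sky_downwelling(640.0, 600.0, 480.0)
print(f"SWdn_clear = {flux:.1f} W m-2, surface albedo = {albedo:.2f}")
```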
 Note also that ERA-Interim hourly 2-m relative humidity is not available, so it is calculated from the 2-m specific humidity using equation (2) [Peixoto and Oort, 1992, equation (3.62)],
$$rh = \frac{q}{q_s}, \qquad q_s = \frac{0.622\, e_s}{p - 0.378\, e_s} \qquad (2)$$
where rh, q, qs, p, es, and T represent relative humidity, specific humidity (kg/kg), saturation specific humidity (kg/kg), pressure of moist air (hPa), saturation vapor pressure (hPa), and temperature of moist air (K), respectively. The saturation vapor pressure es is calculated using equation (3), as recommended by the World Meteorological Organization (http://cires.colorado.edu/~voemel/vp.html),
 Modeled pressure in the reanalyses is very tightly constrained by the observations to about 1 hPa: the standard deviation of the difference between the hourly ERA-Interim and the observations equals 0.79 hPa, and that between the 6-hourly R1 (R2) and the observations equals 1.13 (1.28) hPa. Hence the ratio of specific humidity and saturated specific humidity accurately reflects the modeled relative humidity.
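The humidity conversion can be sketched in Python as below. It uses rh = q/qs, as stated above, with the standard form qs = 0.622 es/(p − 0.378 es). One substitution must be flagged: equation (3) is the WMO-recommended formula, which is not reproduced here, so the sketch instead uses the Bolton (1980) approximation over liquid water; this is a different formula and its values will differ slightly from equation (3). The inverse, q = rh × qs, is the conversion needed for the observed 2-m data in section 2.3.3. All function names are hypothetical.

```python
import math

EPSILON = 0.622  # ratio of the gas constants of dry air and water vapour (Rd/Rv)


def saturation_vapor_pressure_hpa(t_k):
    """Saturation vapour pressure over liquid water (hPa).

    Stand-in: Bolton (1980) formula, NOT the WMO formula of equation (3).
    """
    t_c = t_k - 273.15
    return 6.112 * math.exp(17.67 * t_c / (t_c + 243.5))


def saturation_specific_humidity(t_k, p_hpa):
    """Saturation specific humidity (kg/kg) at temperature t_k (K) and pressure p_hpa (hPa)."""
    es = saturation_vapor_pressure_hpa(t_k)
    return EPSILON * es / (p_hpa - (1.0 - EPSILON) * es)


def relative_humidity(q, t_k, p_hpa):
    """Relative humidity (fraction) from specific humidity q (kg/kg): rh = q / qs."""
    return q / saturation_specific_humidity(t_k, p_hpa)


def specific_humidity(rh, t_k, p_hpa):
    """Inverse conversion, q = rh * qs, with rh given as a fraction."""
    return rh * saturation_specific_humidity(t_k, p_hpa)


q = specific_humidity(0.6, 300.0, 970.0)
print(f"q = {q * 1000:.2f} g/kg, rh recovered = {relative_humidity(q, 300.0, 970.0):.2f}")
```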
2.2.2. NCEP/NCAR Reanalysis I
 R1 [Kalnay et al., 1996; Kistler et al., 2001] provides data from 1948 to the present at 6-h temporal resolution. The data are archived on a T62 Gaussian grid (∼210 km) or a 2.5° latitude × 2.5° longitude non-Gaussian grid, with 28 vertical levels. This reanalysis was generated by an analysis and forecast system assimilating a wide variety of weather observations, including ships, satellites, ground stations, and radar. In this paper, we use (1) 6-hourly averaged clear-sky and all-sky surface downwelling SW fluxes and total cloud cover, and (2) 6-hourly interval forecast 2-m air temperature and specific humidity, and surface pressure, from 03/25/1997 to 12/31/2008. A detailed description of the data can be found at http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.other_flux.html. All the data used are from a Gaussian grid cell centered at (262.50°E, 37.14°N) over the ARM SGP site. Note that the reanalysis forecast 2-m air temperature (and specific humidity) and surface pressure are strongly influenced by the reanalysis model, as stated in Kalnay et al. These data were downloaded from NOAA's Physical Sciences Division web site at http://www.esrl.noaa.gov/psd/data/gridded/reanalysis/. Note also that R1 6-hourly 2-m relative humidity is not available, so it is calculated using equations (2) and (3).
2.2.3. NCEP/DOE Reanalysis II
 R2 is the second version of R1. It covers 1979 to the present and is available at the same web site as R1. R2 is considered an improved version of R1, with a number of errors fixed, updated parameterizations of physical processes (including a new SW radiation scheme that significantly reduced surface insolation, by about 8%), and the addition of more observations [Kanamitsu et al., 2002]. Similar to R1, we use (1) R2 6-hourly averaged all-sky surface downwelling SW flux and total cloud cover, and (2) R2 6-hourly interval forecast 2-m air temperature and specific humidity, and surface pressure, from 03/25/1997 to 12/31/2008 over the same Gaussian grid cell. Because R2 6-hourly 2-m relative humidity is not available, we calculate it using equations (2) and (3). Since the archive does not provide R2 clear-sky surface downwelling SW flux, we estimate it from R1 clear-sky surface downwelling SW flux using the following expressions, which give the best curve fit to the upper envelope of the available R2 all-sky surface downwelling SW flux,
where SWdnclear(t, d; R1) and SWdnclear(t, d; R2) denote the R1 and R2 6-hourly averaged clear-sky surface downwelling SW fluxes, t denotes the 6-hourly time of day, and d denotes the calendar day of the year. We found that the 8% reduction in surface insolation [Kanamitsu et al., 2002, Figure 8] was valid only over some calendar-day ranges (expression (4b)), so we adjusted the correction coefficient over calendar-day ranges as given by expressions (4a) to (4d). The estimated R2 6-hourly averaged clear-sky surface downwelling SW flux is shown in Figure 1, together with the all-sky surface downwelling SW flux.
 The methods used in this study are detailed in this section. We first briefly introduce the calculations of SRCF and cloud albedo, and then give a detailed description of the procedures used in this study.
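2.3.1. Surface Relative Shortwave Cloud Forcing
 SRCF (also called “effective cloud albedo”) was first proposed by Betts and Viterbo to quantify the impact of the cloud field on the surface radiative budget over a basin in the southwestern Amazon. It is a non-dimensional measure of the surface SW cloud forcing (SWcld = SWdnall − SWdnclear, an upward flux), defined as
$$\mathrm{SRCF} = -\frac{SW_{cld}}{SW_{dn}^{clear}} = 1 - \frac{SW_{dn}^{all}}{SW_{dn}^{clear}} \qquad (5)$$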
 Here, a positive value of the SW fluxes indicates a downward flux. Based on equation (5), we can calculate SRCF if all-sky and clear-sky surface downwelling SW fluxes are available.
 Furthermore, equation (5) indicates that SRCF represents the fraction of the clear-sky downwelling SW flux that is reflected and absorbed by clouds. This non-dimensional quantity offers an effective measure of surface SW cloud forcing and minimizes the influence of solar zenith angle and other non-cloud factors. More discussion and applications of SRCF can be found in previous studies [Betts et al., 2006; Betts, 2009; Betts et al., 2009; Betts and Chiu, 2010; Liu et al., 2011].
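Reading SRCF as the fraction of the clear-sky downwelling flux removed by clouds, i.e., −SWcld/SWdnclear with SWcld = SWdnall − SWdnclear, a minimal Python sketch is shown below. The function name is illustrative, and positive fluxes are taken as downward, as in the text.

```python
def srcf(sw_dn_all, sw_dn_clear):
    """Surface relative shortwave cloud forcing (dimensionless).

    SW_cld = SW_dn_all - SW_dn_clear is negative (an upward flux) when clouds
    reduce insolation, so SRCF = -SW_cld / SW_dn_clear = 1 - SW_dn_all / SW_dn_clear.
    """
    if sw_dn_clear <= 0:
        raise ValueError("clear-sky downwelling flux must be positive")
    sw_cld = sw_dn_all - sw_dn_clear
    return -sw_cld / sw_dn_clear


# Clouds cut an 800 W m-2 clear-sky flux to 600 W m-2: a quarter is removed.
print(srcf(600.0, 800.0))  # 0.25
```

The function deliberately does not clip negative values, since the procedures in section 2.3.3 retain daytime-mean SRCF values down to −0.05.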
2.3.2. Cloud Albedo
Liu et al. derived an analytical expression that quantifies the relationship between SRCF, cloud fraction, and cloud albedo,
where f and αr denote cloud fraction and cloud albedo, respectively. Liu et al. demonstrated that cloud albedo estimated from the surface-based radiation and cloud fraction agrees reasonably well (correlation coefficient 0.69) with that from satellite measurements. In this study, cloud albedo is estimated using the same procedure and the same surface-based observations as in Liu et al. Note that αr implicitly includes cloud absorptance when cloud absorption cannot be ignored, especially for strongly absorbing clouds.
2.3.3. Procedures of Evaluation
 The detailed evaluation procedures are described below. First, the 15-min all-sky/clear-sky surface downwelling SW flux and cloud fraction observations are averaged into hourly data. Only hours with 4 valid 15-min data points are used, where valid 15-min data points are those with all-sky/clear-sky surface downwelling SW flux greater than zero and cloud fraction between 0 and 1. We use the hourly data (e.g., xi(d, m, y), where i, d, m, and y represent hour, day, month, and year, respectively) to calculate the mean variations of hourly all-sky/clear-sky surface downwelling SW flux and cloud fraction. Because the valid hourly observations are not evenly distributed in time, we first calculate, for each year, the seasonal mean of each hourly variable (s denotes season; an overbar denotes a mean, and an overbar labeled s denotes a seasonal mean), and then average these into the overall mean of each hourly variable. The valid hourly all-sky/clear-sky surface downwelling SW flux and cloud fraction between 6 am and 6 pm local standard time (LST) are further averaged into daytime-mean data. Daytime-mean (6 am–6 pm LST) SRCF is calculated from the daytime-mean all-sky and clear-sky surface downwelling SW fluxes. Daytime-mean SRCF values (those greater than −0.05) and daytime-mean cloud fraction are further averaged into monthly data.
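The first two steps above (15-min to hourly averaging with the four-valid-sample rule, then a seasonal mean for each year followed by an overall mean for each hour of day) can be sketched in Python as below. The record layout, the function names, and the grouping of December with its own calendar year in DJF are illustrative assumptions; the text does not say how seasons that straddle the year boundary are handled.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Meteorological seasons; December is grouped with its own calendar year (assumption).
SEASON = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
          6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def is_valid(sw_all, sw_clear, cf):
    """A 15-min sample is valid if both fluxes are positive and 0 <= cf <= 1."""
    return sw_all > 0 and sw_clear > 0 and 0.0 <= cf <= 1.0


def hourly_means(records):
    """records: iterable of (datetime, sw_all, sw_clear, cf) at 15-min steps.

    Returns {hour_start: (sw_all, sw_clear, cf)} for hours with 4 valid samples.
    """
    bins = defaultdict(list)
    for t, sw_all, sw_clear, cf in records:
        if is_valid(sw_all, sw_clear, cf):
            hour_start = t.replace(minute=0, second=0, microsecond=0)
            bins[hour_start].append((sw_all, sw_clear, cf))
    return {hour: tuple(mean(sample[k] for sample in samples) for k in range(3))
            for hour, samples in bins.items() if len(samples) == 4}


def overall_hourly_mean(hourly, index=2):
    """Mean of one variable for each hour of day: first a seasonal mean for
    every (year, season), then the average of those seasonal means."""
    seasonal = defaultdict(list)
    for t, values in hourly.items():
        seasonal[(t.hour, t.year, SEASON[t.month])].append(values[index])
    per_hour = defaultdict(list)
    for (hour, _year, _season), vals in seasonal.items():
        per_hour[hour].append(mean(vals))
    return {hour: mean(vals) for hour, vals in per_hour.items()}


start = datetime(1998, 7, 1, 10, 0)
records = [(start + timedelta(minutes=15 * k), 700.0, 800.0, 0.5) for k in range(8)]
records[5] = (records[5][0], 700.0, 800.0, 1.5)  # invalid cloud fraction in hour 11
hourly = hourly_means(records)
print(sorted(h.hour for h in hourly), overall_hourly_mean(hourly))
```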
 The overall mean hourly cloud albedo is calculated from the overall mean hourly SRCF and cloud fraction. Daytime-mean cloud albedo is calculated from daytime-mean SRCF and cloud fraction only when the daytime-mean cloud fraction exceeds 0.05; we use this filter because cloud albedo is not well described by equation (6) when cloud fraction is small [Liu et al., 2011]. Monthly cloud albedo is calculated from monthly SRCF and cloud fraction. The mean monthly/yearly variations of the cloud properties are the averages of the monthly cloud properties.
 Next, the cloud properties from hourly ERA-Interim and 6-hourly R1 and R2 are used to calculate the mean values of daytime hourly and 6-hourly cloud properties, following the same procedures as for the observations. Only hourly/6-hourly reanalysis data concurrent with valid hourly observations are used; concurrent 6-hourly reanalysis data are those for 6-h periods within which the hourly observations have valid data. These concurrent hourly/6-hourly reanalysis data are then averaged into daytime means, and the daytime means concurrent with valid daytime-mean observations are averaged into monthly data. Yearly data are the averages of the monthly data. The diurnal/annual/interannual cloud properties from the reanalyses are then evaluated against the observations.
 Then, to diagnose the path of model-error propagation, we analyze the model biases (model minus observation) in the cloud properties and their relationships, and we further examine the relationships between the cloud properties and 2-m temperature/humidity. To do so, we first aggregate the observed 30-min averaged 2-m air temperature, relative humidity, and surface pressure into hourly data, and then calculate hourly specific humidity from them using equations (2) and (3). Using the same method as for the cloud properties, we then use concurrent valid hourly data to generate daytime-mean, monthly, and yearly temperature/humidity. Likewise, the three reanalyses' hourly/6-hourly cloud properties and meteorological variables concurrent with the valid hourly observations are averaged into daytime-mean and then monthly data. The relationships between the daytime-mean/monthly cloud properties and temperature/humidity from all the data sets are examined first. Then, we compare the multiscale mean variations of 2-m temperature/humidity and the corresponding model biases. Finally, we examine the relationships between the relative humidity (or cloud fraction) biases and the temperature/humidity.
 Note that R1 and R2 6-hourly 2-m temperature and relative/specific humidity have three daytime data points, at 6 am, 12 pm, and 6 pm (LST). Thus, calculating the daytime mean is not as straightforward as for the hourly/6-hourly averaged data or the hourly interval data (for which variations within one hour are considered small). Using the hourly ERA-Interim data, we examine three common ways to obtain the daytime mean from the three data points: [(6 am + 12 pm)/2 + (12 pm + 6 pm)/2]/2, (6 am + 12 pm + 6 pm)/3, and [6 am + 4 × (12 pm) + 6 pm]/6. The last formula is from Simpson's rule for parabolas. The daytime means from the three methods are compared with the mean of the 13 daytime hourly points. The results, shown in Figure 2, indicate that the first method has the smallest difference (standard deviation) from the 13-point averaged value. Based on this analysis, we chose the first method to calculate R1 and R2 daytime-mean temperature/humidity.
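The three-point daytime-mean formulas and their comparison against the 13-point hourly mean can be sketched in Python as follows. Note that the first formula simplifies to (6 am + 2 × 12 pm + 6 pm)/4, i.e., the trapezoidal rule over two 6-h intervals. The input layout (13 hourly values per day, 6 am to 6 pm LST) follows the text; using the population standard deviation of the differences is an assumption, since the text does not state which estimator was used, and the synthetic days are purely illustrative.

```python
import math
from statistics import mean, pstdev


def trapezoid_mean(v6, v12, v18):
    """[(6am + 12pm)/2 + (12pm + 6pm)/2] / 2, i.e. the trapezoidal rule."""
    return ((v6 + v12) / 2 + (v12 + v18) / 2) / 2


def three_point_mean(v6, v12, v18):
    """Simple arithmetic mean of the three synoptic times."""
    return (v6 + v12 + v18) / 3


def simpson_mean(v6, v12, v18):
    """Simpson's rule for a parabola through the three points."""
    return (v6 + 4 * v12 + v18) / 6


METHODS = {"trapezoid": trapezoid_mean,
           "three-point": three_point_mean,
           "simpson": simpson_mean}


def compare_daytime_methods(days):
    """days: list of 13-element lists (hourly values, 6 am ... 6 pm LST).

    Returns, for each method, the standard deviation across days of
    (three-point estimate - 13-point hourly mean).
    """
    diffs = {name: [] for name in METHODS}
    for hourly in days:
        if len(hourly) != 13:
            raise ValueError("expected 13 hourly values from 6 am to 6 pm")
        reference = mean(hourly)
        v6, v12, v18 = hourly[0], hourly[6], hourly[12]
        for name, method in METHODS.items():
            diffs[name].append(method(v6, v12, v18) - reference)
    return {name: pstdev(d) for name, d in diffs.items()}


# Two synthetic days with an afternoon temperature peak (illustrative values only).
days = [[290.0 + amp * math.sin(math.pi * (h - 6) / 16) for h in range(6, 19)]
        for amp in (4.0, 8.0)]
print(compare_daytime_methods(days))
```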
 Finally, to evaluate the overall performance of the reanalyses in modeling the cloud properties and the meteorological variables, we employ the widely used Taylor diagram [Taylor, 2001], and we also develop a new metric, the “Relative Euclidean Distance,” as a supplement to the Taylor diagram. The Taylor diagram provides a concise, easy-to-visualize summary of the second-order statistical differences between two (or more) time series. It is especially useful for evaluating a model's performance in the phase and amplitude of variations (as measured by the correlation coefficient r and the standard deviation σ, respectively) and the model's “centered root-mean-square error” E (“RMS error” hereafter). This technique has been widely used in climate research and IPCC assessments [e.g., IPCC, 2001; Anderson et al., 2004; Martin et al., 2006; Miller et al., 2006; Bosilovich et al., 2008; Gleckler et al., 2008; Pincus et al., 2008]. Briefly, the expressions for calculating r, σ, and E are [Taylor, 2001]
$$\bar{M} = \frac{1}{N}\sum_{n=1}^{N} M_n, \qquad \bar{O} = \frac{1}{N}\sum_{n=1}^{N} O_n$$
$$\sigma_M = \left[\frac{1}{N}\sum_{n=1}^{N}\left(M_n-\bar{M}\right)^2\right]^{1/2}, \qquad \sigma_O = \left[\frac{1}{N}\sum_{n=1}^{N}\left(O_n-\bar{O}\right)^2\right]^{1/2}$$
$$r = \frac{\frac{1}{N}\sum_{n=1}^{N}\left(M_n-\bar{M}\right)\left(O_n-\bar{O}\right)}{\sigma_M\,\sigma_O}$$
$$E = \left\{\frac{1}{N}\sum_{n=1}^{N}\left[\left(M_n-\bar{M}\right)-\left(O_n-\bar{O}\right)\right]^2\right\}^{1/2}$$
where Mn and On denote a modeled and an observed variable, respectively, defined at N discrete temporal (or spatial) points, and the subscripts M and O denote “model” and “observation.” Note that the correlation coefficient r, standard deviation σ, and RMS error E are calculated without removing the periodic signals of the time series.
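The Taylor statistics r, σ, and E can be computed in a few lines of Python, as sketched below using only the standard library and the 1/N normalization of Taylor [2001]. As in the text, no periodic signal is removed first. The usage example also checks the identity E² = σM² + σO² − 2σMσO r, which underlies the geometry of the Taylor diagram. The function name is illustrative.

```python
import math


def taylor_statistics(model, obs):
    """Return (r, sigma_m, sigma_o, E) for paired series model[n], obs[n].

    r       : correlation coefficient
    sigma_* : standard deviations (1/N normalization)
    E       : centered RMS error
    """
    n = len(model)
    if n != len(obs) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    m_bar = sum(model) / n
    o_bar = sum(obs) / n
    dm = [m - m_bar for m in model]  # model anomalies
    do = [o - o_bar for o in obs]    # observed anomalies
    sigma_m = math.sqrt(sum(x * x for x in dm) / n)
    sigma_o = math.sqrt(sum(x * x for x in do) / n)
    if sigma_m == 0 or sigma_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    r = sum(a * b for a, b in zip(dm, do)) / (n * sigma_m * sigma_o)
    e = math.sqrt(sum((a - b) ** 2 for a, b in zip(dm, do)) / n)
    return r, sigma_m, sigma_o, e


r, s_m, s_o, e = taylor_statistics([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0])
# E**2 equals sigma_m**2 + sigma_o**2 - 2*sigma_m*sigma_o*r (law of cosines)
print(r, e, math.isclose(e ** 2, s_m ** 2 + s_o ** 2 - 2 * s_m * s_o * r))
```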
 As a supplement to the Taylor diagram, we develop a new metric, the “Relative Euclidean Distance” (D), based on the Euclidean-distance technique [e.g., Elmore and Richman, 2001] and on first- and second-order statistics,
 As expression (13) shows, D measures the overall model performance: a perfect agreement corresponds to D = 0, and the overall model performance degrades as D increases.
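Expression (13) itself is not reproduced in this section. Purely as an illustration, the Python sketch below assumes one natural form consistent with the description (a Euclidean distance built from first- and second-order statistics that is zero for perfect agreement): the square root of the summed squares of the relative mean bias, the relative standard-deviation difference, and 1 − r. This form is an assumption and should be checked against expression (13) before use; under it, the relative terms require a nonzero observed mean and standard deviation. The function name and example values are hypothetical.

```python
import math


def relative_euclidean_distance(mean_m, mean_o, sigma_m, sigma_o, r):
    """Hypothetical form of D (assumed, not taken from expression (13)).

    Combines relative mean bias, relative standard-deviation error, and
    (1 - r); D = 0 for perfect agreement and grows as performance degrades.
    """
    if mean_o == 0 or sigma_o == 0:
        raise ValueError("observed mean and standard deviation must be nonzero")
    mean_term = (mean_m - mean_o) / mean_o
    sigma_term = (sigma_m - sigma_o) / sigma_o
    phase_term = 1.0 - r
    return math.sqrt(mean_term ** 2 + sigma_term ** 2 + phase_term ** 2)


# Rank two hypothetical reanalyses against the same observations: smaller D is better.
obs = dict(mean_o=0.40, sigma_o=0.10)
scores = {"A": relative_euclidean_distance(0.30, sigma_m=0.08, r=0.9, **obs),
          "B": relative_euclidean_distance(0.25, sigma_m=0.12, r=0.6, **obs)}
print(sorted(scores, key=scores.get))
```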
3. Evaluation of Multiscale Cloud Properties and a Discussion on Potential Uncertainty
 This section first evaluates the cloud properties (i.e., SRCF, cloud fraction, and cloud albedo) from the reanalyses at diurnal, annual, and inter-annual temporal scales. Then, cloud fraction and potential uncertainty are further discussed.
3.1. Evaluation of Multiscale Cloud Properties
Figure 3 shows the multiscale mean variations of SRCF from the reanalyses and the observations (Figures 3a, 3b, and 3c) and the corresponding relative model biases [i.e., (model minus observation) divided by observation] (Figures 3d, 3e, and 3f). The daytime standard deviation for each hour is calculated from the seasonal-mean hourly SRCF after removing the seasonal cycle (i.e., each season's mean value). The SRCF standard deviation in the early morning (or late afternoon) is slightly larger than at other daytime hours. The standard deviation for each month is calculated from the monthly SRCF; it is larger in fall/winter months than in other months. The standard deviation of annually averaged SRCF is calculated from monthly SRCF after removing the seasonal cycle (i.e., each month's mean value). The large standard deviations in the early morning (or late afternoon) and in fall/winter months are probably related to the accuracy of the measurement and retrieval methods.
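The two statistics used above, the relative model bias and the standard deviation of monthly values after removing each month's mean (the seasonal cycle), can be sketched in Python as below. The dictionary layout keyed by (year, month), the function names, and the use of the population standard deviation are illustrative assumptions; the text does not state which estimator was used, and the sample values are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev


def relative_bias(model, obs):
    """(model - observation) / observation, e.g. -0.5 for a 50% underestimate."""
    if obs == 0:
        raise ValueError("observation must be nonzero")
    return (model - obs) / obs


def deseasonalized_std(monthly):
    """Standard deviation of monthly values after removing each calendar
    month's multi-year mean (the seasonal cycle).

    monthly: {(year, month): value}
    """
    by_month = defaultdict(list)
    for (_year, month), value in monthly.items():
        by_month[month].append(value)
    climatology = {month: mean(vals) for month, vals in by_month.items()}
    anomalies = [value - climatology[month] for (_year, month), value in monthly.items()]
    return pstdev(anomalies)


srcf_obs = {(1998, 3): 0.45, (1999, 3): 0.41, (1998, 7): 0.25, (1999, 7): 0.29}
print(relative_bias(0.20, 0.40), deseasonalized_std(srcf_obs))
```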
 It is evident that SRCF from the observations and ERA-Interim has a strong annual cycle, with an amplitude of about 0.16 for the observations and about 0.10 for ERA-Interim. SRCF peaks in March and reaches its lowest value in July. R2 exhibits a similar phase pattern with a weaker amplitude of 0.05. Notably, despite a variation amplitude (0.06) comparable to that of R2, R1 shows an opposite phase pattern, suggesting that R1 is the most problematic in modeling the observed phase of the SRCF annual cycle, although it shows the smallest biases from April to September. The diurnal (inter-annual) variations of SRCF are weaker than the annual cycle, with amplitudes ranging from 0.01 to 0.05 (from 0.04 to 0.07). For the diurnal cycle, the observations and R2 show larger (smaller) SRCF in the morning (afternoon), whereas ERA-Interim and R1 show the opposite phase pattern, suggesting that ERA-Interim and R1 do not correctly model the observed phase of the SRCF diurnal cycle. For the inter-annual variations, the observed SRCF increases by 0.04 from 1999 to 2004 and then drops by 0.06 until 2006, and the three reanalyses show a similar phase pattern. Notably, the observed SRCF is systematically much larger than that of the reanalyses at all temporal scales, indicating that the reanalyses significantly underestimate the observed SRCF. This can also be seen in Figures 3d, 3e, and 3f, where the relative biases reach up to −50% (−60%) for the diurnal/interannual (annual) variations.
 Similarly, Figures 4 and 5 compare the multiscale mean variations of cloud fraction and cloud albedo, respectively. The variations of the standard deviations from the observations in general follow those of SRCF. As shown in Figure 4, the multiscale mean variations of cloud fraction look remarkably similar to those of SRCF, with a strong annual cycle (amplitude up to 0.22) and relatively weak diurnal/inter-annual variations (amplitude up to 0.06/0.10). The similar phase patterns suggest that large cloud fraction in general corresponds to large SRCF. As with SRCF, ERA-Interim and R1 do not capture the diurnal-cycle phase pattern of the observed cloud fraction, and R1 shows an evident deficiency in modeling the annual cycle. The modeled cloud fraction biases are largest in summer for ERA-Interim, similar to the 3-year MERRA and NARR results of Kennedy et al., but this is not the case for R1 and R2. The underestimation of the modeled cloud fraction is also significant at all temporal scales, mostly ranging from −20% to −40%. Figure 5 reveals a strong annual cycle (amplitude up to 0.22) and slightly weaker diurnal/inter-annual variations (amplitude up to 0.16/0.12) in the modeled and observed cloud albedo. The phase patterns of the multiscale mean variations of cloud albedo mainly follow those of SRCF and cloud fraction, except for the annual cycle of the modeled cloud albedo, which has its lowest value in winter, opposite to the other annual-cycle patterns. As with SRCF and cloud fraction, ERA-Interim and R1 do not capture the phase pattern of the observed cloud-albedo diurnal cycle. Note that the phase pattern of the R1 cloud-albedo annual cycle agrees with the observations better than those of SRCF and cloud fraction. 
Note also that cloud fraction is only one of many variables (e.g., cloud thickness and liquid water path) affecting SRCF and cloud albedo. The differences in the phase patterns may reflect different cloud types, although more research is needed to ascertain this. For all the reanalyses, the underestimation of cloud albedo is small from May to September but increases to −25% to −33% in the cold season.
3.2. Further Discussion on Cloud Fraction and Potential Uncertainty
 The observed cloud fraction shown in Section 3.1 represents 15-min averages of 160° FOV hemispheric fractional sky cover from the surface [Long et al., 2006]. Although the hemispheric fractional sky cover is consistent with the surface radiation measurements evaluated in this paper and has been shown to agree reasonably well with the GOES satellite measurements [Liu et al., 2011], it is desirable to examine the effect of different FOV angles on cloud fraction. Kassianov et al., using Monte Carlo simulations of surface-based hemispherical measurements based on four-dimensional cloud fields produced by a large-eddy simulation model, analyzed how hemispheric cloud fraction estimates depend on averaging time, FOV, and cloud spatial structure. They found that 15 min is an appropriate averaging time and that hemispheric-view cloud fractions are almost unbiased for FOV ≤ 100°. However, for the same cloud field, hemispheric-view cloud fractions increase as FOV increases, and the difference between the hemispheric-view cloud fractions for different FOVs increases as cloud fraction decreases. Kassianov et al. further provided the following linear fit between the 160° FOV hemispheric fractional sky cover, CF(160°), and the 60° FOV value, CF(60°),
Because CF(60°) appears to be more representative of the plane-parallel cloud fraction of the reanalyses, here we explore correcting the CF(160°) data based on the inverse of equation (14),
Equation (15) is used as a correction for 0.25 ≤ CF(160°) ≤ 0.918. The upper limit is chosen where the correction goes to zero, and for CF(160°) > 0.918 we set CF(60°) = CF(160°). For CF(160°) < 0.25, we reduced the correction linearly to zero, as CF(60°) = (1 − 0.1519/0.25) × CF(160°).
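This piecewise mapping can be sketched in Python. The sketch relies on an inference, not the published fit. Inside 0.25 ≤ CF(160°) ≤ 0.918, the subtracted correction is assumed to fall linearly from 0.1519 to zero at the upper limit. The value 0.1519 is the correction that the low-end formula implies at CF(160°) = 0.25. Inverting any linear fit of the form CF(160°) = a·CF(60°) + b gives a correction of this linear form. The function name and default arguments are hypothetical.

```python
def cf60_from_cf160(cf160, lo=0.25, hi=0.918, corr_at_lo=0.1519):
    """Map 160-degree-FOV sky cover to an approximate 60-degree-FOV value.

    Hypothetical sketch: between lo and hi the subtracted correction is
    assumed to fall linearly from corr_at_lo to zero (the form implied by
    inverting a linear fit); it is not the published equation (15).
    """
    if not 0.0 <= cf160 <= 1.0:
        raise ValueError("cloud fraction must lie in [0, 1]")
    if cf160 > hi:
        return cf160  # correction has reached zero; keep the value
    if cf160 >= lo:
        correction = corr_at_lo * (hi - cf160) / (hi - lo)
        return cf160 - correction
    # Below lo: shrink the correction linearly to zero at CF(160) = 0,
    # i.e. CF(60) = (1 - corr_at_lo / lo) * CF(160), as stated in the text
    return (1.0 - corr_at_lo / lo) * cf160


for cf in (0.1, 0.25, 0.5, 0.918, 1.0):
    print(cf, round(cf60_from_cf160(cf), 4))
```

Because the two branches meet at CF(160°) = 0.25 and the correction vanishes at 0.918, the mapping is continuous and never raises the cloud fraction, consistent with the systematically lower CF(60°) values reported below.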
 The black curves in Figures 4 and 5 represent the multiscale mean variations of cloud fraction and cloud albedo obtained by using the hourly cloud fractions CF(60°) instead of CF(160°). As can be seen, the cloud fractions from CF(60°) are systematically lower than those from CF(160°), by 0.02 to 0.06. The corresponding cloud albedos derived from CF(60°) are systematically higher than those derived from CF(160°), by 0.03 to 0.07.
 From Figure 4, we see that using the derived cloud fraction, CF(60°), as the reference would reduce but not remove the low bias of the reanalyses. From Figure 5, we see that using the cloud albedo derived from CF(60°) as the reference would increase the low bias of the reanalyses. However, other measurement comparison issues here need further study. For example, correction (15) is based on Kassianov et al. and addresses cloud fraction only. Our observed SRCF, however, is also derived from hemispheric radiometers, which see the cloud field over all viewing angles and solar zenith angles. This is not the same geometry as the plane-parallel assumption of the reanalyses, averaged over a horizontal grid cell. The geometry-induced difference in SRCF may make the difference in cloud albedo spuriously larger. An ultimate solution requires considering SRCF, cloud fraction, and cloud albedo simultaneously and consistently, which is beyond the scope of this paper. Nevertheless, the comparison of the results from CF(60°) and CF(160°) gives an estimate of possible uncertainties resulting from different FOV angles.
4. Model Biases and Their Links
 The previous section shows that, except for warm-season cloud albedo, the three reanalyses in general significantly underestimate all three cloud properties. To further explore the model-error propagation path, this section examines the range of the model biases in the cloud properties, their mutual relationships, and the relationship of the cloud properties to 2-m temperature and relative/specific humidity. The analysis has been performed for both daytime-mean and monthly data, but only the monthly results are presented below. The daytime-mean results are similar except for a larger range of biases.
4.1. Model Biases in the Cloud Properties and Their Mutual Relationships
Figure 6 shows contour plots of the range and relationship of the monthly model biases in the cloud properties. As can be seen, the model biases of the cloud properties in the reanalyses mainly fall within a centered region, ranging from −0.20 to 0 for SRCF, from −0.25 to 0 for cloud fraction, and from −0.20 to 0.10 for cloud albedo. Within these centered regions, the biases exhibit a positive linear relationship to one another, except for the biases of ERA-Interim cloud fraction and SRCF. The strong positive correlation is also evident in the corresponding scatterplots (Figure 7), which clearly show the overlapping relationships from the different reanalyses and their strength.
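The bias relationships in Figures 6 and 7 rest on monthly model-minus-observation biases and their Pearson correlation. The sketch below is an illustration, not the processing code used for this study. It keeps only months in which both properties are available from both the model and the observations. The helper names and the sample values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # ZeroDivisionError if either series is constant


def concurrent_biases(model_a, obs_a, model_b, obs_b):
    """Model-minus-observation biases of two cloud properties, keeping only
    months in which all four values are present (None marks a missing month)."""
    bias_a, bias_b = [], []
    for ma, oa, mb, ob in zip(model_a, obs_a, model_b, obs_b):
        if None in (ma, oa, mb, ob):
            continue
        bias_a.append(ma - oa)
        bias_b.append(mb - ob)
    return bias_a, bias_b


# Synthetic monthly values for illustration only
cf_mod = [0.40, 0.35, None, 0.30, 0.42]
cf_obs = [0.55, 0.50, 0.52, 0.41, 0.60]
srcf_mod = [0.30, 0.28, 0.25, 0.22, 0.31]
srcf_obs = [0.42, 0.38, 0.40, 0.30, 0.47]
b_cf, b_srcf = concurrent_biases(cf_mod, cf_obs, srcf_mod, srcf_obs)
print(len(b_cf), round(pearson_r(b_cf, b_srcf), 2))
```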
 The relationship between the model biases in the cloud properties and the observed (a, b, and c) or modeled (d, e, and f) cloud fraction is shown in Figure 8. It is evident that the model biases in the cloud properties are large when the observed cloud fraction is large. The exception is ERA-Interim, which has small correlation coefficients for its cloud fraction biases (−0.25) and cloud albedo biases (−0.18). There is no clear relationship between cloud fraction biases and the modeled cloud fraction. The model biases in SRCF (or cloud albedo) are large when the modeled cloud fraction is large, except for R1, which has small correlation coefficients.
4.2. Relationship of Cloud Fraction Biases to 2-m Temperature and Humidity
 Cloud properties are expected to be related to 2-m meteorological variables, especially relative humidity, at least for clouds rooted in the boundary layer. Furthermore, the modeled 2-m air temperature/humidity are strongly affected by the model used in the reanalysis [Kalnay et al., 1996]. This section examines three aspects. The first is the relationship between the cloud properties and the temperature/humidity. The second is the multiscale mean variations of the temperature/humidity from all the data sets and the corresponding model biases. The third is the relationship between the model biases in the relative humidity (or cloud fraction) and the temperature/humidity. Results from concurrent monthly data are presented.
Figure 9 shows the cloud properties as a function of temperature and humidity. The cloud properties are strongly related to relative humidity for all the data sets, especially for the observations and ERA-Interim, with correlation coefficients between 0.62 and 0.80. The cloud properties are also related to temperature (or specific humidity), but less strongly than to relative humidity. In general, ERA-Interim appears to describe the observed relationships better than the two NCEP reanalyses. One exception is that cloud albedo decreases with temperature (and specific humidity) in the observations but not in the reanalyses. These relationships are likely related to the processes that couple boundary layer clouds with the land surface, and to the parameterizations used in the models. The parameterization details and their effects on the results warrant further investigation.
Figure 10 shows the multiscale mean variations of 2-m temperature from all the data sets and the corresponding model biases. In general, the model biases are small, ranging from −2.10°C to 1.65°C, comparable to the standard deviation of the observations. ERA-Interim (R1) is slightly warmer (colder) than the observations at all the temporal scales, except that R1 has warm biases in April/May. R2 shows the smallest overall biases, as evident in the annual cycle and inter-annual variations. The phase patterns of the multiscale mean variations from the reanalyses agree with the observations at all three temporal scales. The diurnal (annual) cycle peaks at 3 pm (in July), and the coldest temperature occurs at 6 am (in January).
Figure 11 shows the multiscale mean variations of 2-m specific humidity from all the data sets and the corresponding model biases. Like the temperature biases, the model biases in specific humidity are small (from −0.94 to 1.47 g/kg), comparable to the standard deviations of the observations. The specific humidity in ERA-Interim/R2 (R1) is smaller (larger) than the observations. The exceptions are the R2 monthly specific humidity in January/August and the R2 annually averaged specific humidity in 2007/2008, which show the opposite behavior. The multiscale mean specific-humidity variations from all the data sets have similar phase patterns, indicating that the reanalyses are capable of modeling the observed phase patterns.
Figure 12 shows the multiscale mean variations of 2-m relative humidity from all the data sets and the corresponding model biases. The model biases in relative humidity range from −12.52% to 14.93%, slightly larger than the standard deviations of the observations. It is evident that ERA-Interim (R1) relative humidity is systematically smaller (larger) than the observations at all the temporal scales. The underestimation (overestimation) in ERA-Interim (R1) is especially large, up to −12.52% in early morning (late afternoon) or from late fall to early spring. R2 relative humidity matches the observations best among the three reanalyses, although it is smaller than the observations at 6 am (−7.69%) and in April and May (−7.01%). The phase pattern from all the reanalyses agrees well with the observations. Relative humidity peaks in the early morning, decreases until 3 pm, and then increases again (note that the 6-hourly R1 and R2 data cannot resolve the detailed hourly pattern). The relative humidity reaches its smallest value in July. It then increases from July to December, decreases until April, and increases from April to May (from April to June for R1 and R2), before dropping to July. A notable feature of the annually averaged relative humidity is the large drop (−12.51%) from 2004 to the hot year 2006. This is followed by a large increase (20.04%) to 2007 and then a drop of about −13.54% to 2008.
 The relationship between the relative humidity biases and the temperature/humidity is shown in Figure 13. As shown in Figures 13a and 13b, the relative humidity biases in R1 and R2 are strongly positively related to the modeled relative humidity, with correlation coefficients greater than 0.66. The relationship between ERA-Interim relative humidity and its biases is much weaker, with a correlation coefficient of only 0.18. The relative humidity biases also depend only weakly on the observed relative humidity and on the observed/modeled temperature and specific humidity, with correlation coefficients smaller than 0.42. Figure 13 further shows that the relative humidity biases have a positive (negative) linear relationship with the specific humidity (temperature) biases. This suggests that an overestimated (underestimated) relative humidity results from the combined effects of an overestimated (underestimated) specific humidity and an underestimated (overestimated) temperature.
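The sign of the relationship in Figure 13 follows from how relative humidity depends on temperature and specific humidity. At fixed pressure, adding moisture raises relative humidity, while warming lowers it by increasing the saturation vapor pressure. The sketch below illustrates this with the widely used Magnus-type approximation e_s = 6.112 exp(17.67 T/(T + 243.5)) hPa. The formula choice, the 970 hPa surface pressure, and the base state are assumptions for illustration, not the formulations used in the reanalyses.

```python
from math import exp


def saturation_vapor_pressure_hpa(t_c):
    """Magnus-type approximation of saturation vapor pressure over water (hPa), T in deg C."""
    return 6.112 * exp(17.67 * t_c / (t_c + 243.5))


def relative_humidity_pct(t_c, q, p_hpa=970.0):
    """2-m relative humidity (%) from temperature (deg C), specific humidity q (kg/kg)
    and surface pressure (hPa). The default 970 hPa is an assumed value."""
    # Vapor pressure from specific humidity: e = q p / (0.622 + 0.378 q)
    e = q * p_hpa / (0.622 + 0.378 * q)
    return 100.0 * e / saturation_vapor_pressure_hpa(t_c)


# Hypothetical base state and perturbations
base_t, base_q = 25.0, 0.012
rh0 = relative_humidity_pct(base_t, base_q)
drh_moist = relative_humidity_pct(base_t, base_q + 0.001) - rh0  # q bias of +1 g/kg
drh_warm = relative_humidity_pct(base_t + 1.0, base_q) - rh0     # T bias of +1 deg C
print(f"RH = {rh0:.1f}%, +1 g/kg q: {drh_moist:+.1f} points, +1 degC T: {drh_warm:+.1f} points")
```

A moist bias and a cold bias therefore reinforce each other in raising the modeled relative humidity, matching the combined effect described above.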
Figure 14 shows the relationship between cloud fraction biases and the temperature/humidity. Figures 14a, 14b, and 14c show that the cloud fraction biases in R1 and R2 increase with observed and modeled relative humidity, whereas ERA-Interim cloud fraction biases show no clear relationship with relative humidity. Figures 14d–14i also show that the cloud fraction biases in R1 and R2 decrease with observed and modeled temperature/specific humidity, while ERA-Interim shows the opposite behavior. There is no clear relationship between cloud fraction biases and the temperature/humidity biases.
5. Evaluation on the Overall Performance of the Reanalyses
 This section evaluates the overall performance of the reanalyses in modeling the cloud properties and the meteorological variables. The evaluation is conducted using concurrent daytime-mean and monthly data.
Figure 15 shows the Taylor diagrams of the cloud properties from daytime-mean (a) and monthly (b) data. The radial distance represents the amplitude (standard deviation) of the variations, normalized by the amplitude of the observationally based variations. The cosine of the azimuthal angle of each point gives the correlation between the reanalysis and the observations. The distance between each point and the reference point “Obs” represents the centered RMS error (i.e., with the mean bias removed), normalized by the amplitude of the observationally based variations. As this distance approaches zero, the modeled variations approach the observations.
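The three quantities on a Taylor diagram satisfy a law-of-cosines relation, E'² = σ_m² + σ_o² − 2σ_mσ_oR. Here σ_m and σ_o are the model and observed standard deviations, R is their correlation, and E' is the centered RMS error. This relation is why a single point can encode amplitude, correlation, and centered RMS error. The following is a minimal Python sketch, assuming paired, gap-free series. The function name and the synthetic values are hypothetical.

```python
from math import acos, degrees, sqrt


def taylor_stats(model, obs):
    """Return (normalized std, correlation, normalized centered RMS error, azimuth in degrees)."""
    n = len(obs)
    if n != len(model) or n < 2:
        raise ValueError("need paired series of equal length >= 2")
    mean_m, mean_o = sum(model) / n, sum(obs) / n
    anom_m = [m - mean_m for m in model]  # remove the means so the mean bias drops out
    anom_o = [o - mean_o for o in obs]
    std_m = sqrt(sum(a * a for a in anom_m) / n)
    std_o = sqrt(sum(a * a for a in anom_o) / n)
    r = sum(a * b for a, b in zip(anom_m, anom_o)) / (n * std_m * std_o)
    crmse = sqrt(sum((a - b) ** 2 for a, b in zip(anom_m, anom_o)) / n)
    azimuth = degrees(acos(max(-1.0, min(1.0, r))))  # clamp rounding error before acos
    return std_m / std_o, r, crmse / std_o, azimuth


# Synthetic series for illustration only
obs = [0.62, 0.55, 0.48, 0.41, 0.37, 0.35, 0.40, 0.47, 0.56, 0.60]
model = [0.45, 0.40, 0.36, 0.30, 0.28, 0.27, 0.31, 0.35, 0.41, 0.44]
s, r, e, az = taylor_stats(model, obs)
# Law of cosines on the normalized diagram: e^2 = s^2 + 1 - 2 s r
print(f"sigma ratio={s:.2f}, R={r:.3f}, centered RMSE={e:.3f}, angle={az:.1f} deg")
```

A model that differs from the observations only by a constant offset lands on the “Obs” point. This is why the diagram must be read together with a bias measure such as the one used in Figure 16.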
 For daytime-mean cloud properties, ERA-Interim exhibits the best performance in modeling the phase of the cloud properties and the magnitude of the SRCF variations, although it slightly overestimates the magnitude of the cloud-fraction and cloud-albedo variations. R2 exhibits the best performance in modeling the magnitude of the cloud-fraction and cloud-albedo variations, although it significantly underestimates the magnitude of the SRCF variations. R2 shows a slightly better phase pattern of the cloud properties than R1. R1 significantly underestimates the magnitude of the cloud properties and also shows the worst phase similarity to the observations. ERA-Interim has the smallest RMS errors for all the cloud properties, while R1 (or R2) exhibits the largest RMS errors for SRCF (or cloud fraction and cloud albedo).
 For daytime-mean meteorological variables, ERA-Interim exhibits the best performance in modeling the phase of the temperature/humidity and the magnitude of the temperature variations, although it slightly underestimates the magnitude of the relative/specific humidity variations. R1 (R2) exhibits the best performance in modeling the magnitude of the relative (specific) humidity variations, and both overestimate the magnitude of all the daytime-mean meteorological variables. R1 shows a slightly better phase pattern than R2 for specific humidity, whereas the opposite is true for temperature and relative humidity.
 For monthly cloud properties, ERA-Interim exhibits the best performance in modeling the phase and magnitude and has the smallest RMS errors, even though it significantly underestimates the observed monthly SRCF. R2 significantly overestimates (underestimates) the magnitude of the cloud-albedo (SRCF/cloud-fraction) variations and has the largest RMS error for cloud albedo. R1 significantly underestimates all the cloud properties. It performs worst in modeling the phase and magnitude and shows the largest RMS errors for cloud fraction and SRCF.
 For monthly meteorological variables, ERA-Interim exhibits the best performance in modeling the phase and magnitude and has the smallest RMS error of the temperature/humidity variations, except that R2 is best in modeling the magnitude of the specific humidity variations. R1 (R2) shows slightly better performance in simulating the phase pattern of the specific humidity (temperature and relative humidity). R1 significantly (slightly) overestimates the magnitude of the observed monthly relative humidity (temperature and specific humidity) variations. R2 significantly (slightly) overestimates the magnitude of the observed monthly relative humidity (temperature) variations, with a slightly smaller RMS error than R1 for all the meteorological variables.
Figure 16 further compares the “Relative Euclidean Distance” D values of the reanalyses in modeling the daytime-mean (Figure 16a) and monthly (Figure 16b) cloud properties and meteorological variables. As can be seen, ERA-Interim (R1) has the smallest (largest) D values for all the cloud properties and meteorological variables at both the daytime-mean and monthly temporal scales. The only exception is that R2 has the best overall performance in modeling the monthly temperature and specific humidity. This suggests that, among the three reanalyses, ERA-Interim (R1) has the best (worst) overall performance in modeling the cloud properties and the meteorological variables, except that R2 ranks best in modeling the monthly temperature and specific humidity.
 This study evaluates three major reanalyses (ERA-Interim, NCEP/NCAR Reanalysis I, NCEP/DOE Reanalysis II) in modeling surface relative shortwave cloud forcing (SRCF), cloud fraction, and cloud albedo. The cloud properties at diurnal, annual, and inter-annual temporal scales from all the reanalyses are first evaluated. Then, the model biases are quantified and their links to one another or to 2-m temperature/humidity are examined. The overall performance of the reanalyses in modeling the cloud properties and the 2-m temperature/humidity is evaluated using a combined statistical method (i.e., the technique of Taylor diagrams and a newly developed metric “Relative Euclidean Distance”). Decade-long (1997 to 2008) surface-based continuous ARM measurements over the Southern Great Plains (SGP) Central Facility site are used as a standard for this evaluation.
 Results show that the reanalyses significantly underestimate the cloud properties, with relative biases ranging mainly from −60% to −30% for SRCF, from −40% to −20% for cloud fraction, and from −30% to 10% for cloud albedo. The relative biases have a substantial annual cycle. For SRCF, ERA-Interim has a uniform negative bias throughout the year, whereas NCEP/NCAR Reanalysis I and NCEP/DOE Reanalysis II have much smaller biases in summer than in winter/spring. The annual cycles of the cloud-fraction relative biases differ among the three reanalyses. For the derived cloud albedo, however, all the models show a small bias from May to September, increasing in the cold season to a large negative bias of −25% to −33%. NCEP/DOE Reanalysis II is the only reanalysis whose diurnal phase pattern resembles that of the observed cloud properties. NCEP/NCAR Reanalysis I is the only reanalysis whose annual-cycle phase pattern of cloud fraction or SRCF is opposite to the observed one.
 The model biases of the cloud properties predominantly range from −0.20 to 0 for SRCF, from −0.25 to 0 for cloud fraction, and from −0.20 to 0.10 for cloud albedo. The model biases in general exhibit a positive linear relationship to one another. The model biases in the cloud properties increase with the observed cloud fraction, except for the ERA-Interim biases in cloud fraction (or cloud albedo).
 In analyzing the relationship between cloud fraction and 2-m temperature/humidity, we found that the cloud fraction biases in the two NCEP reanalyses increase (decrease) with their relative humidity (temperature and specific humidity), but the cloud fraction biases in ERA-Interim show no (opposite) relationship with relative humidity (temperature and specific humidity). There is no clear relationship between cloud fraction biases and the temperature (or specific humidity) biases. We also found that the reanalyses have small (large) biases in modeling 2-m temperature/specific humidity (relative humidity), and the relative humidity biases have a positive (negative) linear relationship with the specific humidity (temperature) biases. These findings suggest that the model biases in the cloud properties are closely linked to the near-surface temperature/humidity fields, likely through the parameterizations used to couple boundary layer clouds with the land surface.
 A combined analysis of the Taylor diagram and the Relative Euclidean Distance, based on first- and second-order statistics, indicates that ERA-Interim and NCEP/NCAR Reanalysis I have the best and worst overall performance, respectively, in modeling the cloud properties and the meteorological variables. The exception is that NCEP/DOE Reanalysis II ranks best in modeling the monthly temperature and specific humidity. The findings from this study highlight the significant underestimation of the cloud properties over the ARM SGP site in the three major NWP reanalyses. They also show that the model biases of the cloud properties are closely linked to one another and to 2-m temperature/humidity. This suggests that caution must be taken when using the reanalyses as a standard (e.g., a substitute for observations) for evaluating climate models, especially considering the large biases in the forecasted 2-m relative humidity. Furthermore, the underestimation of the cloud properties is a crucial issue in climate modeling, since it could substantially influence estimates of the Earth's surface and atmospheric energy budgets, the hydrological cycle, and the general circulation.
 Several points are noteworthy. First, we have shown that there is a potential uncertainty in comparing the ARM hemispheric observations with the spatially averaged model data from the reanalyses, which use a plane-parallel assumption. A systematic investigation is needed to quantify the uncertainty of cloud fraction observations and to develop new approaches to reduce it [e.g., Min et al., 2008; Hogan et al., 2009]. Such work would further assess the observational effects discussed briefly in Section 3.2. Second, the ARM measurements and the three reanalyses have different spatial and temporal resolutions, which could partially affect the results. Aggregating a large number of data points, as done in this study, tends to minimize the resolution effect. Even so, more work is needed to pin down the causes of the model biases shown here. Third, as pointed out by Liu et al., the current observational data of cloud fraction and cloud albedo are derived separately from surface-based radiation measurements. Such a separate treatment could lead to compensating errors, especially when different FOVs are involved (Section 3.2). A retrieval approach that obtains cloud fraction and cloud albedo simultaneously is desirable. Fourth, in this study we examined the relationships between the model biases in the cloud properties and the 2-m temperature/humidity. Detailed vertical profiles of temperature, humidity, vertical velocity, and wind are crucial to examining the cause of the large model biases found in this study, and further investigation along this line is under way. Finally, we expect that the identified cloud-property biases in the reanalyses, their connections to near-surface meteorological variables, and the disparities among the three reanalyses arise largely from inadequate and differing parameterizations of related processes/quantities in the models used to generate the reanalyses.
Further relating the identified biases to specific parameterization deficiencies and improving the parameterizations call for in-depth investigation of model parameterizations. This remains a challenge for the climate and NWP community in general and for the reanalysis community in particular.
 This work is supported by the Climate System Modeling (ESM) via the FASTER project (www.bnl.gov/esm). The ARM VAP data are obtained from the ARM website (http://www.arm.gov). AKB is partially supported by NSF grant AGS-0529797. We thank Charles N. Long for helpful discussions on the ARM SGP observations, Martin Kohler for his support of the ERA-Interim data set, Karl E. Taylor for useful discussions regarding the technique of Taylor diagrams, and Wesley Ebisuzaki and Don Hooper for support on the NCEP reanalyses. We also thank many colleagues at Brookhaven National Laboratory for useful discussions. The comments from the anonymous reviewers helped improve the manuscript.