Evaluation of multireanalysis products with in situ observations over the Tibetan Plateau



[1] As the highest plateau in the world, the Tibetan Plateau (TP) strongly affects regional weather and climate as well as global atmospheric circulations. Here six reanalysis products (i.e., MERRA, NCEP/NCAR-1, CFSR, ERA-40, ERA-Interim, and GLDAS) are evaluated using in situ measurements at 63 weather stations over the TP from the Chinese Meteorological Administration (CMA) for 1992–2001 and at nine stations from field campaigns (CAMP/Tibet) for 2002–2004. The measurement variables include daily and monthly precipitation and air temperature at all CMA and CAMP/Tibet stations as well as radiation (downward and upward shortwave and longwave), wind speed, humidity, and surface pressure at CAMP stations. Four statistical quantities (correlation coefficient, ratio of standard deviations, standard deviation of differences, and bias) are computed, and a ranking approach is also utilized to quantify the relative performance of reanalyses with respect to each variable and each statistical quantity. Compared with measurements at the 63 CMA stations, ERA-Interim has the best overall performance in both daily and monthly air temperatures, while MERRA has a high correlation with observations. GLDAS has the best overall performance in both daily and monthly precipitation because it is primarily based on the merged precipitation product from surface measurements and satellite remote sensing, while ERA-40 and MERRA have the highest correlation coefficients for daily and monthly precipitation, respectively. Compared with measurements at the nine CAMP stations, CFSR shows the best overall performance, followed by GLDAS, although the best ranking scores are different for different variables. It is also found that NCEP/NCAR-1 reanalysis shows the worst overall performance compared with both CMA and CAMP data. 
Since no reanalysis product is superior to others in all variables at both daily and monthly time scales, various reanalysis products should be combined for the study of weather and climate over the TP.

1. Introduction

[2] The Tibetan Plateau (TP) is the highest plateau in the world, with an average elevation above 4000 m and an area of about 2.5 million square kilometers. Because of its high elevation and complex terrain, the land-atmosphere interaction over the TP directly influences the energy and water budget in the local middle troposphere, which, through large-scale atmospheric circulation, also affects weather and climate over other regions. It is generally accepted that the TP is a heat source in summer [e.g., Zhao and Chen, 2001], and it plays an important role in the establishment and maintenance of the Asian monsoon [e.g., Ye and Gao, 1979; Yanai et al., 1992; Hsu and Liu, 2003]. The snow cover changes over the TP have a strong influence on summer rainfall in the Asian monsoon areas [Wu and Qian, 2003; Zhang et al., 2004; Zhao et al., 2007; Zuo et al., 2011]. The TP has experienced significant warming since the mid-1950s, especially in wintertime, and the warming trend exceeded that of other regions in the same latitude zone [Liu and Chen, 2000]. The TP is also projected to continue warming in the future [Liu et al., 2009]. You et al. [2010] confirmed the warming trends over the TP after analyzing homogenized air temperature data, but they also argued that there is no evidence for the elevation dependency of the warming magnitude in terms of both mean temperature and climate extremes [You et al., 2008].

[3] Because of the complex topography, severe weather, and harsh environmental conditions over the TP, it is very difficult to obtain in situ measurements of meteorological variables, especially over long periods and large areas [e.g., Tanaka et al., 2003; Y. Ma et al., 2008]. Existing surface stations are mostly located at relatively low elevations and are usually sparse and unevenly distributed. For example, the Chinese Meteorological Administration (CMA) has established more than 750 stations over China, among which fewer than 80 stations are located over the TP, with most of them spread over the eastern to the central part of the TP. Therefore, studies on weather and climate over the TP have to combine these in situ measurements with other data sources, such as the atmospheric reanalysis products that merge model outputs, remote sensing observations, and in situ measurements through a data assimilation system to produce retrospective estimates of meteorological variables [e.g., Kalnay et al., 1996].

[4] The most widely used reanalysis products have been developed at the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR, referred to as NRA-1) [Kalnay et al., 1996], and at the European Centre for Medium-Range Weather Forecasts (ECMWF) with both ERA-40 [Uppala et al., 2005] and ERA-Interim (ERA-Int) [Simmons et al., 2006]. Recently, the NASA/GSFC Global Modeling and Assimilation Office (GMAO) developed a modern-era reanalysis for 1979-ongoing (MERRA) [Rienecker et al., 2011], and NCEP released the new product from the Climate Forecast System Reanalysis (CFSR) [Saha et al., 2010]. The Global Land Data Assimilation System (GLDAS) [Rodell et al., 2004] utilized common reanalysis and observation-based atmospheric data to drive multiple land surface models in order to produce high-quality land surface products, and its data are available after 1979. These reanalyses have been applied in many different ways, for example, to construct land surface forcing data [e.g., Qian et al., 2006; Sheffield et al., 2006], to detect climate trends [e.g., Trenberth and Guillemot, 1998], and to investigate the water and energy cycles between land and atmosphere [e.g., Roads and Betts, 2000; Maurer et al., 2001]. However, reanalysis products could contain uncertainties from various sources that are inherent in the assimilation processes [Smith et al., 2001], and the resulting biases, for instance, have significant influence on off-line land model simulations driven by these reanalysis data [Berg et al., 2003; Wang and Zeng, 2011]. Therefore, it is necessary to evaluate various reanalysis products with available in situ measurements before they are applied to climate studies over the TP.

[5] Intercomparisons and evaluations of reanalysis products with in situ observations have been performed over individual sites, specific regions, and global land [e.g., Trenberth and Guillemot, 1998; Betts et al., 2005; Frauenfeld et al., 2005; Bosilovich et al., 2008, 2011; Wang et al., 2011; Decker et al., 2012; Rienecker et al., 2011]. ERA-Int shows a significant improvement compared with ERA-40 in the representation of the global hydrological cycle and river basin hydrometeorology [Betts et al., 2009]. Wang et al. [2011] found that CFSR improved the precipitation distribution over various regions relative to NRA-1 and ERA-40, but overestimated downward solar radiation flux and latent heat flux over some regions. Bosilovich et al. [2008] compared annual precipitation from multireanalysis products (i.e., ERA-40, JRA-25, NRA-1, NRA-DOE) with merged satellite observations and found that the biases of the spatial pattern vary with regions in different products. Rienecker et al. [2011] found little difference between new reanalyses (i.e., ERA-Int, CFSR, and MERRA) in climate variability, although substantial differences appear in precipitation and surface fluxes. Decker et al. [2012] evaluated multireanalysis products with in situ measurements at different time scales over 33 flux tower sites across North America and found that ERA-Int generally performed best in the representation of 6 hourly air temperatures, and ERA-40 has the lowest bias of 6 hourly sensible heat flux and precipitation. Mao et al. [2010] assessed reanalysis daily extreme temperatures with homogenized observations in China and also found that ERA-Int has the best skill scores compared with ERA-40, NRA-1, and the Japanese reanalysis (JRA-25). Over the TP, Frauenfeld et al. [2005] evaluated air temperature in ERA-40 with the station observations and found that ERA-40 underestimated the annual air temperature by about 7°C, although the correlation of both temperatures was high at the interannual scale. They also argued that a long-term climate warming trend found in the station observations was partially due to the sparse observations and land use change, but the latter was not considered in ERA-40.

[6] Previous studies over the TP largely focused on air temperature and precipitation, but just a few of them considered the radiation and other meteorological variables [Frauenfeld et al., 2005; L. Ma et al., 2008, 2009; Mao et al., 2010], primarily because of the unavailable routine measurements of these variables. Besides precipitation and air temperature, other meteorological variables are also very important factors for land-atmosphere interactions. For instance, wind speed strongly affects the energy, water, and momentum transfers between land and atmosphere, and surface radiation flux is the driving force of the surface energy balance [Yang et al., 2011].

[7] The purpose of this study is to assess the near-surface meteorological data from the five atmospheric reanalysis products mentioned above and GLDAS (referred to as six reanalyses hereafter) using available in situ measurements over the TP. The near-surface meteorological variables (including air temperature, precipitation, wind, humidity, and radiation) are evaluated at daily and monthly time scales. These results will also provide a baseline for using these reanalysis data for the weather and climate study over the TP in the future. With a focus on the sparse high-elevation data over the TP region, this study also represents the third part of our series of papers on evaluating reanalysis surface variables, with the first two papers focusing on ocean surface processes [Brunke et al., 2011] and land surface processes over data-rich North America [Decker et al., 2012], respectively.

2. Data Descriptions and Analysis Methods

2.1. Data Descriptions

[8] This study uses daily air temperature and precipitation data from 63 CMA weather stations over the TP and the relatively comprehensive near-surface meteorological data at nine stations during the Coordinated Enhanced Observing Period (CEOP)-Asia-Australia Monsoon Project (CAMP/Tibet, hereafter referred to as CAMP). The general information on observations is provided in Table 1. The long-term raw station data are usually inhomogeneous for various reasons, such as the change of station locations and the replacement of instruments [Li et al., 2009]. The daily air temperature data used in this study have been homogenized [Li and Yan, 2009].

Table 1. Summary of in Situ Observations and Reanalysis Data Used in This Study^a

Data Source | Variable Name | Periods | Temporal Resolution | Spatial Resolution
63 CMA stations | Prec, Ta | Jan 1992–Dec 2001 | Daily | Station
Nine CAMP stations | Prec, Ta, SWdn, SWup, LWdn, LWup, Ps, Qa, Wind | Jan 2002–Dec 2004 | Hourly | Station
MERRA | Same as above | Same as above | Hourly | 0.5° × 0.67°
NRA-1 | Same as above | Same as above | 6 hourly | ∼1.875°
ERA-40 | Same as above | Only 1992–2001 | 6 hourly | ∼1.25°
ERA-Int | Same as above | Same as above | 3 hourly | ∼0.7°
GLDAS | Same as above | Same as above | 3 hourly | 1° × 1°
CFSR | Same as above | Same as above | 6 hourly | ∼1.875°

^a Prec, precipitation; Ta, air temperature; SWdn/LWdn, surface downward shortwave/longwave radiation flux; SWup/LWup, surface upward shortwave/longwave radiation flux; Ps, surface pressure; Qa, specific humidity; and Wind, surface wind speed. Note that while an hourly CFSR output is available, only the 6 hourly data (with a coarse horizontal resolution) are used here, because we focus on daily and monthly variables.

[9] Figure 1 shows the locations of the 63 CMA stations and nine CAMP stations over the TP. The stations are very sparse and are mostly located over the eastern TP where the elevations are generally lower than over the western part. The elevations of all nine CAMP stations are higher than 4000 m, and the data are available from October 2002 to December 2004 with large data gaps over some stations. For example, over the Amdo site (32.24°N, 91.62°E), the data are available only from October 2002 to September 2003. We compared reanalysis products only with available observational data. The details of observational information in CAMP/Tibet were described by Ma et al. [2005]. More recently, some new observational systems have been established or initiated over the TP [e.g., Y. Ma et al., 2008; Xu et al., 2008], which would provide more observational data and advance our understanding of surface climate over the TP, especially over the eastern part.

Figure 1.

Topography (in meters) and locations of the 63 CMA stations (dots) and nine CAMP stations (triangles) over the Tibetan Plateau.

[10] Six reanalysis products (i.e., MERRA, NRA-1, ERA-40, ERA-Int, CFSR, and GLDAS) are also used in this study. They are different in many aspects, such as the numerical schemes and physical parameterizations in their numerical models, qualities and quantities of observational data used in the assimilation processes, and the assimilation schemes [e.g., Decker et al., 2012]. Because surface observations are assimilated in some of the reanalyses, the station observations and the reanalyses might not be totally independent. For example, surface air temperature and humidity are assimilated in ERA-40 and ERA-Int to adjust soil moisture, and CFSR assimilates hydrological quantities from a parallel land surface model forced by the merged precipitation products from surface measurements and satellite remote sensing [Wang et al., 2011]. MERRA does not assimilate precipitation over land [Rienecker et al., 2011], while NRA-1 does not assimilate any surface observations [Kalnay et al., 1996]. Although it is difficult to determine what percentages of observed precipitation (or air temperature) data at the 63 stations have actually been assimilated in CFSR (ERA-40, or ERA-Int), observations at the nine CAMP stations from field campaigns are unlikely to be assimilated in any of the reanalysis systems. The near-surface meteorological data in GLDAS are derived from ERA-40 (from 1979 to 1993), NRA-1 (from 1994 to 1999), and the NCEP global analysis (from 2000) in combination with the merged precipitation product from surface measurements and satellite remote sensing and with observation-based downward shortwave and longwave radiation fluxes. These data are then used to drive four land models [Rodell et al., 2004]. In this study, the upward shortwave and longwave radiation fluxes are computed from the Community Land Model [Oleson et al., 2004]. The details of each reanalysis product can be found in the corresponding references and are not repeated here.

[11] Considering the common period of data availability from reanalyses and in situ observations, comparisons of the CMA and CAMP station data with reanalysis products were conducted for 1992–2001 and 2002–2004, respectively. Besides air temperature and precipitation, other near-surface meteorological variables (i.e., humidity, wind, surface pressure, and radiation) were also evaluated based on the CAMP data. The comparisons were focused on daily and monthly mean variables. Because the station distribution (Figure 1) is irregular, comparisons between gridded products and station observations usually involve interpolation [L. Ma et al., 2008, 2009], which inevitably introduces new errors [Zhao et al., 2008]. To avoid these interpolation errors, data from each station are compared with those from the reanalysis grid cell covering that station.
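The station-to-grid-cell matching described above can be sketched as follows. This is a minimal illustration, not the procedure actually coded for the study: it assumes a regular latitude-longitude grid given by 1-D arrays of cell-center coordinates (so the nearest center identifies the covering cell), and the function name `covering_cell_index` and the 1° × 1° example grid are hypothetical.

```python
import numpy as np

def covering_cell_index(station_lat, station_lon, grid_lats, grid_lons):
    """Return (i, j) of the grid cell whose center is nearest the station.

    Assumption: grid_lats and grid_lons are 1-D arrays of cell-center
    coordinates on a regular grid, so the nearest center marks the cell
    that contains the station.
    """
    grid_lats = np.asarray(grid_lats, dtype=float)
    grid_lons = np.asarray(grid_lons, dtype=float)
    # Wrap longitude differences into [-180, 180) so that grids using
    # either 0..360 or -180..180 conventions are handled alike.
    dlon = (grid_lons - station_lon + 180.0) % 360.0 - 180.0
    i = int(np.argmin(np.abs(grid_lats - station_lat)))
    j = int(np.argmin(np.abs(dlon)))
    return i, j

# Hypothetical 1 x 1 degree grid; the station coordinates are those
# quoted in the text for the Amdo site (32.24N, 91.62E).
lats = np.arange(-89.5, 90.0, 1.0)
lons = np.arange(0.5, 360.0, 1.0)
i, j = covering_cell_index(32.24, 91.62, lats, lons)
print(lats[i], lons[j])
```

No interpolation is performed: the reanalysis time series at index (i, j) is compared directly with the station record, which is the choice the paragraph above motivates.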

[12] The elevation differences between each station and the corresponding grid cell are very large because of the complex topography of the TP (e.g., the stations are mostly located over relatively flat areas or in mountain valleys), which could introduce biases in the surface products of the reanalyses [e.g., Frauenfeld et al., 2005; Zhao et al., 2007]. Figure 2 shows that most of the reanalysis grid cells have higher elevations than the corresponding surface stations, and the differences reach 1700 m at some stations. With a focus on the basic evaluations of reanalyses, this study does not consider elevation corrections of the reanalysis products. If the reanalysis products are used for other purposes such as atmospheric forcing data for land models, however, the elevation corrections are necessary [e.g., Sheffield et al., 2006].
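As a rough indication of why these elevation differences matter (and not as a correction applied in this study), the expected temperature offset can be estimated from a constant lapse rate. The value Γ = 6.5°C km−1 used below is an assumed standard-atmosphere figure, not one derived from the reanalyses or the station data; actual near-surface lapse rates over the TP vary with season and location.

```latex
\Delta T \approx -\Gamma\,\Delta z, \qquad
\Gamma = 6.5\ ^{\circ}\mathrm{C\,km^{-1}}\ \text{(assumed)}, \quad
\Delta z = 1.7\ \mathrm{km}
\;\Longrightarrow\; \Delta T \approx -11\ ^{\circ}\mathrm{C}
```

Under this assumption, a grid cell lying 1700 m above its station would be expected to be roughly 11°C colder than the station.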

Figure 2.

Surface elevation differences between reanalysis grid cells and station locations.

2.2. Data Analysis Methods

[13] To quantify the differences between reanalysis products and observations, four statistical quantities between each reanalysis product and observations are computed at each station, including the correlation coefficient (ρ), ratio of standard deviations (σr/σobs), standard deviation of the differences (σd), and the mean bias (BIAS) (i.e., the average difference between reanalysis products and observations). Because presenting all results for the 63 CMA stations individually is impractical, we use histograms and Taylor diagrams [Taylor, 2001] to summarize the comparisons concisely.
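The four quantities defined above can be computed for one station and one reanalysis as in the following sketch. It rests on assumptions not stated in the text: missing values are encoded as NaN, only time steps where both series are valid are used, and standard deviations are population values (the ratio σr/σobs does not depend on that choice). The function name `evaluation_statistics` is hypothetical.

```python
import numpy as np

def evaluation_statistics(reanalysis, observed):
    """Return rho, sigma_r/sigma_obs, sigma_d, and BIAS for paired series."""
    r = np.asarray(reanalysis, dtype=float)
    o = np.asarray(observed, dtype=float)
    # Assumption: keep only time steps where both values are present.
    valid = ~(np.isnan(r) | np.isnan(o))
    r, o = r[valid], o[valid]
    diff = r - o  # reanalysis minus observation
    return {
        "rho": float(np.corrcoef(r, o)[0, 1]),        # correlation coefficient
        "sigma_ratio": float(np.std(r) / np.std(o)),  # ratio of standard deviations
        "sigma_d": float(np.std(diff)),               # standard deviation of differences
        "bias": float(np.mean(diff)),                 # mean bias
    }

# Synthetic example: the "reanalysis" is exactly 2 * obs + 1.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
rean = [3.0, 5.0, 7.0, 9.0, 11.0]
stats = evaluation_statistics(rean, obs)
```

With this synthetic input the correlation is 1, the ratio of standard deviations is 2, and the mean bias is 4, which illustrates that a perfect correlation alone does not imply a small bias or a realistic variance.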

[14] At a given station, the above statistical quantities are quite different for different reanalyses. In order to quantify and intercompare the performances of reanalysis products in terms of each statistical quantity for all stations, a ranking scheme is utilized. Brunke et al. [2003] developed a ranking scheme to score multiple bulk aerodynamic algorithms in computing ocean surface turbulent fluxes. Decker et al. [2012] applied this approach to rank the bias and standard deviation of errors between reanalysis products and flux tower measurements. In this study, we extend this ranking approach to all four statistical quantities computed from each surface meteorological variable mentioned above. At each station and for each statistical quantity of a variable (e.g., precipitation), reanalysis products are ranked from 1 to 6, with 1 given to the reanalysis with the lowest value of σd or BIAS in magnitude (or the highest correlation coefficient) and 6 given to the one with the largest value of σd or BIAS in magnitude (or the lowest correlation coefficient). Note that the ranking for the CAMP period is only from 1 to 5 because ERA-40 data are unavailable during this period. For σr/σobs, 1 is given to the reanalysis with the ratio closest to 1 and 6 (for CMA stations) or 5 (for CAMP stations) is given to the reanalysis with the ratio farthest away from 1. We then average the ranking scores over all stations. The lowest and highest average scores represent the best and worst agreement between reanalysis products and observations, respectively.
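The ranking scheme described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: "closest to 1" for σr/σobs is taken as the smallest absolute difference from 1, ties (which the text does not address) are broken by product order, and the function names `rank_products` and `average_ranks` are hypothetical.

```python
import numpy as np

def rank_products(values, kind):
    """Rank products at one station: 1 = best, N = worst.

    kind: "rho" (highest is best), "sigma_ratio" (closest to 1 is best),
          "sigma_d" or "bias" (smallest magnitude is best).
    """
    v = np.asarray(values, dtype=float)
    if kind == "rho":
        badness = -v
    elif kind == "sigma_ratio":
        badness = np.abs(v - 1.0)  # assumption: linear distance from 1
    elif kind in ("sigma_d", "bias"):
        badness = np.abs(v)
    else:
        raise ValueError(f"unknown statistic: {kind}")
    # Stable sort so that ties keep product order (an assumption).
    order = np.argsort(badness, kind="stable")
    ranks = np.empty(len(v), dtype=int)
    ranks[order] = np.arange(1, len(v) + 1)
    return ranks

def average_ranks(station_values, kind):
    """Average the per-station ranks; input shape is (stations, products)."""
    return np.mean([rank_products(row, kind) for row in station_values], axis=0)

# Synthetic BIAS values for three hypothetical products at two stations.
bias = [[-0.5, 2.0, 1.0],
        [3.0, -1.0, 0.2]]
scores = average_ranks(bias, "bias")
```

Here the third product receives the lowest (best) average score because its bias is small in magnitude at both stations. An average score of exactly 1.0 can occur only when a product is ranked first at every station.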

3. Comparison of Station Observations With Reanalyses

3.1. Precipitation

[15] Figure 3 shows the distribution of statistical quantities computed from daily reanalyses and observations at the 63 CMA stations. The most frequent correlation is between 0.3 and 0.5 for most products. MERRA has 12 stations with correlations larger than 0.5, and ERA-40 has more than 40 stations with correlations greater than 0.4. The σr/σobs values at most stations are within 0.5–2 (particularly in GLDAS and MERRA). The most frequent σd values are between 2 and 6 mm/day. The most frequent BIAS values are between −1 and 0 mm/day in GLDAS and between 0 and 1 mm/day in others. Based on the number of stations with relatively small σd and BIAS, GLDAS and MERRA perform better than others. For instance, in GLDAS, BIAS is within −1 to 1 mm/day at 62 stations and σd is within 0 to 4 mm/day at 45 stations. Using the ranking approach, it can be seen clearly in Table 2a that daily precipitation in GLDAS is generally closer to observations than in all other products in terms of BIAS (2.03) and σd (2.37), MERRA is best in σr/σobs (2.90), and ERA-40 is best in correlation coefficients (2.62). NRA-1 has the worst score in σd (4.52), while CFSR has the worst score in correlation (5.17). ERA-Int performs worst in both σr/σobs (3.92) and BIAS (4.29). The low BIAS of GLDAS daily precipitation is not surprising, as GLDAS combines reanalysis precipitation with surface and satellite remote sensing precipitation observations [Rodell et al., 2004].

Figure 3.

Histograms of daily precipitation statistics computed from six reanalysis products and observations over the 63 CMA stations for 1992–2001. The correlation coefficient (ρ), ratio of standard deviations (σr/σobs), standard deviation of differences (σd) (mm/day), and mean bias (BIAS) (mm/day) are shown on the x axis, while the number of stations in each bin is shown on the y axis.

[16] Figure 4 shows that monthly precipitation results from all reanalyses are overall in better agreement with in situ observations compared with daily precipitation results. For instance, the correlation coefficient for more than half of the stations in all six reanalyses is greater than 0.7 for monthly precipitation. Similar to the ranking scores using daily precipitation, Table 2a shows that GLDAS has the best scores in terms of BIAS (2.03) and σr/σobs (2.22) for monthly precipitation, while MERRA performs best in correlation (1.89) and σd (2.22). NRA-1 has the worst scores in both σr/σobs (4.29) and σd (4.33), while ERA-Int has the worst score in BIAS (4.27), similar to the daily precipitation. Overall, the performances of each reanalysis over the TP depend on the specific statistical quantity analyzed, and this is consistent with the findings of Bosilovich et al. [2008] that the strength and weakness of each reanalysis varies with regions.

Figure 4.

Same as Figure 3, but for monthly precipitation, with σd and BIAS in mm/mon.

Table 2a. The Average (Across 63 CMA Stations) Ranking for Each of the Four Statistical Quantities (Correlation Coefficient, Ratio of Standard Deviations, Standard Deviation of Differences, and Mean Bias) Based on (Daily and Monthly) Precipitation^a

^a Both the lowest (bold) value (i.e., best performance) and highest (italicized) value (i.e., worst performance) in each column are highlighted.


[17] Figure 5 shows the Taylor diagrams [Taylor, 2001] derived from the correlation coefficients and standard deviations of precipitation and air temperature averaged across the 63 CMA stations, providing another way to compare the reanalysis results with observations. The distance of a point to the point (1.00, 1.00) in the Taylor diagram indicates the relative skill of the reanalysis. For daily precipitation, all reanalyses have correlation coefficients of 0.3–0.4. The standard deviations in most reanalyses (except MERRA) are higher than the observational values. MERRA and ERA-40 have slightly higher correlations (near 0.4) than others. For monthly precipitation, the points shown in the Taylor diagram are very scattered. Correlation coefficients are greater than 0.8 in MERRA, ERA-Int, and GLDAS and are lower in others. The standard deviations of MERRA and NRA-1 are almost identical to observations, but they are much more scattered in others.
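The geometry behind the Taylor diagram ties together three of the four statistics defined in section 2.2. For a single station, the law of cosines gives the relation below [Taylor, 2001]. It holds exactly only station by station, so the station-averaged points in Figure 5 satisfy it only approximately.

```latex
\sigma_d^{2} = \sigma_r^{2} + \sigma_{obs}^{2} - 2\,\sigma_r\,\sigma_{obs}\,\rho
\quad\Longrightarrow\quad
\left(\frac{\sigma_d}{\sigma_{obs}}\right)^{2}
= \left(\frac{\sigma_r}{\sigma_{obs}}\right)^{2} + 1 - 2\,\frac{\sigma_r}{\sigma_{obs}}\,\rho
```

The normalized form is the squared distance between a reanalysis point and the reference point at σr/σobs = 1 and ρ = 1, so a point closer to the reference has a smaller σd relative to the observed variability.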

Figure 5.

Taylor diagrams of (a) daily and (b) monthly precipitation (circles) and air temperatures (stars) for 1992–2001. The correlations and ratios of standard deviations among six reanalysis products and in situ observations are first computed at each station, and then averaged across 63 stations.

3.2. Temperature

[18] Figures 6 and 7 show histograms of the statistical quantities computed from daily and monthly air temperatures, respectively. For daily temperature, the correlation is greater than 0.9 at most stations and is less than 0.7 at two stations in GLDAS only (Figure 6). From the Taylor diagram (Figure 5), the standard deviations of daily air temperature from MERRA, NRA-1, ERA-Int, and GLDAS are close to unity and are below unity in the remaining two products. Figure 5 also shows that the correlation coefficients of daily air temperature are generally much larger than those from daily precipitation, while the ratio of standard deviations is closer to unity than that of daily precipitation. This indicates that daily air temperature in reanalyses is better estimated than daily precipitation in general, which will be further discussed later. All reanalyses tend to underestimate the daily air temperature, with BIAS varying from −14°C to 6°C. For example, BIAS is below −12°C at 5 stations in NRA-1. The cold bias in daily air temperature is mainly attributed to the higher surface elevation in the reanalyses (Figure 2) [also see Frauenfeld et al., 2005]. The daily air temperature of reanalyses has greater variations compared with observations, as indicated by σr/σobs greater than 1 at most stations.

Figure 6.

Same as Figure 3 but for daily air temperature, with σd and BIAS in °C.

Figure 7.

Same as Figure 3 but for monthly air temperatures, with σd and BIAS in °C.

[19] For monthly air temperature, all reanalyses over all stations have the correlation coefficient greater than 0.8. In particular, the correlation is greater than 0.97 at nearly all stations in ERA-40, ERA-Int, and MERRA (Figure 7). The performances of GLDAS are not as good as those of the others, but the correlation is still above 0.95 at more than half of the stations. The distribution of the σr/σobs and σd (Figure 7) is similar to that obtained using the daily time series (Figure 6). NRA-1 has more stations with relatively large BIAS values (in magnitude) than other reanalyses. For instance, NRA-1 underestimates monthly air temperature by more than 9°C at 17 stations, but this occurs at 6 stations or fewer in other reanalyses (including at 2 and 4 stations only in ERA-40 and ERA-Int, respectively). The cold biases of reanalysis products (over the TP) also appear over other regions in China [L. Ma et al., 2008].

[20] The relative performances of reanalyses in daily and monthly air temperatures are provided in Table 2b. ERA-Int has the best performance in two of the four statistical quantities, based on both daily and monthly air temperature, while MERRA has the best performance in correlation. NRA-1 has the worst performance in σr/σobs and BIAS based on daily air temperature and in three of the four quantities using monthly air temperature, partly because no surface data were assimilated. As mentioned in section 2, the GLDAS air temperature data are from a combination of ERA-40 (1992–1993), NRA-1 (1994–1999), and NCEP global analysis (2000–2001). Therefore, the overall performance of air temperature in GLDAS is also poor, with the poorest performances in correlation and σd.

Table 2b. The Average (Across 63 CMA Stations) Ranking for Each of the Four Statistical Quantities (Correlation Coefficient, Ratio of Standard Deviations, Standard Deviation of Differences, and Mean Bias) Based on (Daily and Monthly) Air Temperature^a

^a Both the lowest (bold) value (i.e., best performance) and highest (italicized) value (i.e., worst performance) in each column are highlighted.


3.3. Other Meteorological Variables

[21] As mentioned in the introduction, the severe environmental and geographical conditions restrict the number of stations available and the variables measured over the TP. Besides precipitation and air temperature, CAMP also provides observational data of surface radiation (SWdn, SWup, LWdn, and LWup), wind speed (Wind), surface pressure (Ps), and humidity (Qa) at nine stations (Figure 1 and Table 1) for the evaluation of reanalysis products.

[22] Figure 8, as an example, shows the monthly time series of surface meteorological variables from both station observations and reanalysis products at the MS3478 station (31.9°N, 91.7°E, Elev. 4620 m). While all reanalysis products capture the seasonal variation relatively well, the actual values differ significantly among reanalyses and between reanalyses and surface observations for most of the variables. For example, precipitation in June 2003 (ninth month in Figure 8) in ERA-Int is double the amount in CFSR, and both are larger than the in situ observational value (Figure 8a). Among all reanalyses, NRA-1 shows the least consistency with the observations in almost all variables. For instance, NRA-1 significantly overestimates downward shortwave radiation (Figure 8c), partly because it did not assimilate atmospheric aerosol data. It also overestimates upward shortwave radiation (Figure 8e) because of a higher surface albedo.

Figure 8.

Comparisons of monthly meteorological variables among five reanalyses and observations at the MS3478 site (31.9°N, 91.7°E, Elev. 4620 m) from October 2002 to December 2004. The variables include (a) precipitation (mm/month), (b) air temperature (°C), (c) downward shortwave radiation (W m−2), (d) downward longwave radiation (W m−2), (e) upward shortwave radiation (W m−2), (f) upward longwave radiation (W m−2), (g) surface pressure (mb), (h) specific humidity (g kg−1), and (i) wind speed (m s−1).

[23] All reanalyses capture well the variations of wind (Figure 8i), air temperature (except GLDAS and NRA-1) (Figure 8b), and humidity (except GLDAS) (Figure 8h). They largely overestimate upward longwave radiation (Figure 8f) over this site and also over other CAMP sites (figures not shown). Over this site, the surface elevation differences between the five reanalyses and the station location vary from 172 m (ERA-Int) to 1443 m (NRA-1). These elevation biases largely account for the cold biases (Figure 8b) and the underestimation of surface pressure (Figure 8g) in most reanalyses. The relative performances of reanalyses are compared by computing the ranking scores, as discussed in section 2.2. As mentioned in section 2, the CAMP station records contain large data gaps. Daily statistical quantities are computed for a station with data available for at least 100 days, while monthly values are computed for a station with data available for at least 12 months during the 27 month period. Using these criteria, fewer than nine stations are available for some variables; for instance, only 6 stations meet the criteria for downward longwave radiation.
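The data availability criteria stated above amount to a simple count of valid values per station and variable, as in this sketch. The thresholds come from the text; the NaN encoding of missing values, the function name `eligible_stations`, and the station labels are assumptions made for illustration.

```python
import numpy as np

MIN_DAYS = 100    # minimum number of valid daily values (from the text)
MIN_MONTHS = 12   # minimum number of valid monthly values (from the text)

def eligible_stations(series_by_station, min_valid):
    """Return the names of stations with at least min_valid non-missing values."""
    keep = []
    for name, series in series_by_station.items():
        values = np.asarray(series, dtype=float)
        # Assumption: missing observations are stored as NaN.
        n_valid = int(np.count_nonzero(~np.isnan(values)))
        if n_valid >= min_valid:
            keep.append(name)
    return keep

# Hypothetical daily records for one variable at two stations.
daily = {
    "station_A": np.r_[np.ones(120), np.full(50, np.nan)],   # 120 valid days
    "station_B": np.r_[np.ones(99), np.full(300, np.nan)],   # 99 valid days
}
selected = eligible_stations(daily, MIN_DAYS)
```

Because the count is made separately for each variable, the set of stations entering the average ranking can differ from one variable to another.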

[24] For the comparison of correlation coefficients, CFSR has the best scores for four of the seven variables, while GLDAS is the best for SWdn and LWup (Table 3a). In particular, the average score for GLDAS is 1.0 for SWdn, meaning that GLDAS has the highest correlation at every station among five reanalyses. NRA-1 is the worst for SWdn and Qa, and ERA-Int performs worst in LWup, Ps, and Wind (Table 3a).

Table 3a. The Average (Across Nine CAMP Stations) Ranking for Daily Surface Meteorological Variables: Correlation Coefficient (ρ)^a

^a See Table 1. The ranking is based on correlation coefficient. Both the lowest (bold) value (i.e., best performance) and highest (italicized) value (i.e., worst performance) in each column are highlighted.


[25] For the comparison of σr/σobs, CFSR also performs best with the lowest scores for three of the seven variables and without the highest (i.e., worst) scores for any of the variables (Tables 3b and 3c). Each of the other four reanalyses has the lowest score for one of the variables but also has the highest scores for one (MERRA, ERA-Int, and GLDAS) or four variables (NRA-1).

Table 3b. The Average (Across Nine CAMP Stations) Ranking for Daily Surface Meteorological Variables: Ratio of Standard Deviations (σr/σobs)^a

^a See Table 1. The ranking is based on ratio of standard deviations. Both the lowest (bold) value (i.e., best performance) and highest (italicized) value (i.e., worst performance) in each column are highlighted.

Table 3c. The Average (Across Nine CAMP Stations) Ranking for Daily Surface Meteorological Variables: Standard Deviation of Differences (σd)^a

^a See Table 1. The ranking is based on standard deviation of differences. Both the lowest (bold) value (i.e., best performance) and highest (italicized) value (i.e., worst performance) in each column are highlighted.


[26] For the comparison of σd, ERA-Int has the worst score for three variables, while CFSR has the best score for three variables. The average scores of 1.0 for GLDAS in SWdn and for MERRA in SWup indicate that each is the best among the five reanalyses at all stations for these specific quantities.

[27] For the comparison of BIAS, NRA-1 is the worst for six of the seven variables (all except Wind) (Table 3d). CFSR is again the best for five of the seven variables, while GLDAS has the best scores in SWdn and LWdn (Table 3d). The relative ranking of reanalyses computed using monthly meteorological variables is very similar to the above results obtained using daily variables (table not shown).

Table 3d. The Average (Across Nine CAMP Stations) Ranking for Daily Surface Meteorological Variables: Mean Bias (BIAS)^a
^a See Table 1. The ranking is based on mean bias. Both the lowest (bold) value (i.e., best performance) and highest (italicized) value (i.e., worst performance) in each column are highlighted.


4. Discussion and Conclusions

[28] Six reanalysis products (i.e., MERRA, NRA-1, ERA-40, ERA-Int, CFSR, and GLDAS) were evaluated with in situ measured surface meteorological variables over the Tibetan Plateau (TP). The evaluations were performed over 63 CMA stations for precipitation and temperature and nine CAMP stations for seven additional variables (upward and downward solar and longwave radiation fluxes, surface pressure, specific humidity, and wind speed). Stations are mostly located over the eastern TP, and daily and monthly observations at each station were compared with reanalyses at the grid cell covering the station(s). Four statistical quantities (i.e., correlation coefficient, ratio of standard deviations, standard deviation of differences, and bias) were computed and intercompared among different reanalyses. A ranking approach was also applied to quantify the relative performance of reanalysis products.
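The four statistical quantities can be sketched in Python for one reanalysis series and one observed series. The function names and sample values are hypothetical, and two choices are assumptions of this sketch rather than details confirmed by the text: differences are taken as reanalysis minus observation, and the population form of the standard deviation is used.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def std(xs):
    """Population standard deviation (an assumption of this sketch)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def evaluate(reanalysis, observed):
    """Return the four statistics for two equally long, gap-free series."""
    diffs = [r - o for r, o in zip(reanalysis, observed)]
    mr, mo = mean(reanalysis), mean(observed)
    cov = sum((r - mr) * (o - mo)
              for r, o in zip(reanalysis, observed)) / len(observed)
    return {
        "rho": cov / (std(reanalysis) * std(observed)),   # correlation coefficient
        "sigma_ratio": std(reanalysis) / std(observed),   # ratio of standard deviations
        "sigma_d": std(diffs),                            # standard deviation of differences
        "bias": mean(diffs),                              # mean bias
    }


# Made-up series: the "reanalysis" is a linear function of the "observations".
obs = [1.0, 2.0, 3.0, 4.0]
rea = [0.0, 2.0, 4.0, 6.0]

stats = evaluate(rea, obs)
print(stats)
```

Because the made-up reanalysis series is an exact linear function of the observations, the correlation is 1 even though the variability is doubled and the mean is offset, which is why the four quantities are examined separately.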

[29] It is found that no single reanalysis is superior to others for all variables at both daily and monthly time scales. The best-performing reanalysis product differs from variable to variable. Compared with the CMA data, ERA-Int has the best representation of air temperature, while GLDAS precipitation, which is primarily based on the merged precipitation product from surface measurements and satellite remote sensing, is closest to observations. Compared with the CAMP data, CFSR is overall superior to others. NRA-1 is the worst compared with both CMA and CAMP data, consistent with previous studies [e.g., Zhao et al., 2008; L. Ma et al., 2008, 2009; Mao et al., 2010]. Partly because of the positive elevation biases, all reanalysis products underestimate temperature and surface pressure. The overall agreement between reanalyses and surface observations is better in air temperature than in precipitation, largely because of larger errors in both model precipitation forecasts and in situ precipitation measurements. For example, the wind-induced precipitation undercatch is sometimes as large as 20% [Yang et al., 2005], and the error in precipitation simulations has long been one of the biggest issues in global modeling. Therefore, when reanalyses are used to evaluate climate model outputs, careful attention needs to be paid to the selection of the appropriate reanalysis products.

[30] Because the seasonal cycle of precipitation and air temperature is much stronger than their interannual variability, the results here primarily reflect the fidelity of the reanalysis seasonal cycle. To focus on the interannual variability, observations for a longer period are needed, and statistical quantities should be computed after removing the mean seasonal cycle.
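Removing the mean seasonal cycle from a monthly series can be sketched as follows. This is an illustration only: the function name `monthly_anomalies` and the two-year sample series are made up, and the climatology is simply the mean of each calendar month over the available record.

```python
def monthly_anomalies(values, start_month=1):
    """Subtract each calendar month's mean over the record from a monthly series.

    `values` is a gap-free list of monthly values whose first entry belongs
    to calendar month `start_month` (1 = January).
    """
    months = [(start_month - 1 + i) % 12 for i in range(len(values))]
    sums = [0.0] * 12
    counts = [0] * 12
    for m, v in zip(months, values):
        sums[m] += v
        counts[m] += 1
    climatology = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [v - climatology[m] for m, v in zip(months, values)]


# Made-up temperatures: the same seasonal cycle in two years,
# with the second year uniformly 1 degree warmer.
cycle = [-10.0, -7.0, -2.0, 3.0, 8.0, 12.0, 14.0, 13.0, 9.0, 3.0, -4.0, -9.0]
series = cycle + [c + 1.0 for c in cycle]

anoms = monthly_anomalies(series)
print(anoms)
```

The raw series spans 24 degrees, almost all of it seasonal, whereas the anomalies retain only the 1 degree year-to-year difference. Statistics computed on such anomalies would therefore measure interannual agreement rather than the seasonal cycle.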

[31] There are several possible reasons for the differences between reanalyses and between reanalysis products and station measurements, including the number and type of observations assimilated in each reanalysis, as discussed in section 2. Model physical parameterizations are also responsible for part of the differences. We can take the land surface model (LSM) as an example. ECMWF applied a four-layer LSM [Uppala et al., 2005; Simmons et al., 2006], while NRA-1 used a two-layer LSM. CFSR updated NRA-1 and used a four-layer LSM (Noah) [Saha et al., 2010]. MERRA coupled the Catchment-based model into its forecast model [Rienecker et al., 2011]. These LSMs are different in the partitioning of incoming precipitation into runoff, evaporation, and soil storage, as well as in the partitioning of net radiation into sensible, latent, and ground heat fluxes, which, in turn, feed back to the atmosphere.

[32] The scale mismatch is another reason for the differences between reanalyses and observations. For the reanalysis products with coarse spatial resolutions (e.g., NRA-1), a grid cell may cover several stations. As an example, results in the three MERRA grid cells centered at (39°N, 100.67°E), (39°N, 101.33°E), and (38°N, 102.67°E) would be compared with the observations at three different stations located at (38.93°N, 100.43°E), (38.8°N, 101.08°E), and (38.23°N, 101.97°E), respectively. However, observations at all three stations would be averaged for comparison with the results in the single NRA-1 grid cell centered at (39.047°N, 101.25°E). The monthly precipitation at the three MERRA grid cells differs significantly and is also quite different from that at the NRA-1 grid cell. Monthly air temperatures at the three MERRA grid cells are close to each other but are about 5°C higher than those at the NRA-1 grid cell (figure not shown).
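The effect of grid spacing on how stations are grouped can be sketched in Python. The station coordinates are those quoted above, but everything else is an assumption made for illustration: the two regular grids (0.5° and 2.0° spacing) are hypothetical and do not reproduce the actual MERRA or NRA-1 grids, and the station values are made up.

```python
import math
from collections import defaultdict


def cell_index(lat, lon, spacing):
    """Index of the cell containing a point on a hypothetical regular grid."""
    return (math.floor(lat / spacing), math.floor(lon / spacing))


def average_by_cell(stations, spacing):
    """Average the values of all stations that fall into the same grid cell."""
    groups = defaultdict(list)
    for lat, lon, value in stations:
        groups[cell_index(lat, lon, spacing)].append(value)
    return {cell: sum(vals) / len(vals) for cell, vals in groups.items()}


# Station coordinates from the text; the third number is a made-up observation.
stations = [
    (38.93, 100.43, 10.0),
    (38.80, 101.08, 20.0),
    (38.23, 101.97, 30.0),
]

fine = average_by_cell(stations, 0.5)    # hypothetical fine grid
coarse = average_by_cell(stations, 2.0)  # hypothetical coarse grid
print(len(fine), len(coarse), coarse)
```

On the fine grid each station is compared with its own cell, while on the coarse grid the three stations share one cell and their observations are averaged before the comparison.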

[33] The quantitative results in this study, which are based on the comparison of reanalysis grid cell values with point measurements, may also be affected by measurement uncertainty and representativeness. For example, instrumental error or malfunction can introduce errors in the measurement and data collection. Tanaka et al. [2003] found that the lower-energy closure over the eastern TP is partially due to the degraded performance of the infrared hygrometer for high-frequency humidity measurements. Ma et al. [2005] also demonstrated that the energy imbalance there might be partially due to measurement errors. Data representativeness is also a serious issue over the TP because of the strong horizontal heterogeneity in elevation and land cover. In particular, topography strongly affects surface meteorological variables, such as air temperature and precipitation [e.g., Berg et al., 2003; Zhao et al., 2008]. Elevation corrections might make the comparison of point measurements with reanalysis results at different horizontal resolutions more compatible, but they would also introduce new errors because of the assumptions in the correction method.
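One common form of elevation correction, which is not applied in this study, adjusts the reanalysis temperature to the station elevation with a constant lapse rate. The sketch below assumes the standard-atmosphere value of 6.5 K per km; the function name and the sample numbers are hypothetical, and the constant lapse rate is exactly the kind of assumption that introduces new errors, since the real lapse rate varies with season, time of day, and location.

```python
LAPSE_RATE_K_PER_M = 0.0065  # assumption: standard-atmosphere lapse rate


def correct_temperature(t_model, z_model, z_station,
                        lapse_rate=LAPSE_RATE_K_PER_M):
    """Adjust a grid cell temperature to the station elevation.

    A model surface above the station (z_model > z_station) is too cold,
    so the correction warms it by lapse_rate * (z_model - z_station).
    """
    return t_model + lapse_rate * (z_model - z_station)


# Made-up example: model surface 1000 m above the station.
t_corrected = correct_temperature(t_model=-5.0, z_model=5000.0, z_station=4000.0)
print(t_corrected)
```

Under this assumed lapse rate, an elevation difference of about 1 km corresponds to a temperature offset of several degrees, comparable in size to the cold biases discussed above.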


[34] This work was supported by the Department of Science and Technology of China under grant 2009CB421403 and the National Science Foundation of China under grant 40905061 (to A. W.) and by NASA (NNX09A021G) and NSF (AGS-0944101) (to X. Z.). We thank NCAR for the use of the NCAR computers for obtaining ERA-40, ERA-Interim, and NCEP/NCAR reanalysis data from the mass store system. The GLDAS data were acquired from the NASA Goddard Earth Sciences (GES) Data and Information Services Center (DISC). The MERRA data are hosted by the Global Modeling and Assimilation Office (GMAO), and the GES DISC is also thanked for the dissemination of the products. The CFSR data were obtained from NOAA's National Operational Model Archive and Distribution System (NOMADS), which is maintained at NOAA's National Climatic Data Center (NCDC). The CAMP/Tibet data were obtained from the Coordinated Enhanced Observing Period (CEOP) web site, and the authors thank all the participants in the field observations of the CAMP/Tibet. Four anonymous reviewers are thanked for their valuable and insightful comments, which also helped us to discover and correct errors in some of the figures.