We assess the correspondence of reanalysis air temperatures from ERA-40, NCEP-1, and NCEP-2 with homogenized observational data from China for 1958–2001 and 1979–2001. Results indicate that the annual air temperature climatologies of ERA-40, NCEP-1, and NCEP-2 are lower than observations by 0.93°C, 2.78°C, and 2.27°C, respectively. Large negative differences over most of western China are the primary contributor to this cold bias. Error analysis indicates that the internal coherence of ERA-40 data is better than that of NCEP-1 or NCEP-2. Although NCEP-2 air temperatures represent an improvement over NCEP-1, biases of NCEP-1 and NCEP-2 data relative to observations are still much larger than for ERA-40. Areas with positive/negative air temperature differences (dT) between reanalysis and observational data correspond to negative/positive elevation differences (dH). The high correlation coefficients between dT and dH of −0.94, −0.88, and −0.85 for ERA-40, NCEP-1, and NCEP-2, respectively, illustrate that the air temperature differences between reanalysis data and observations are primarily related to elevation differences. Furthermore, a spatial and temporal comparison of trends also indicates that ERA-40 temperature changes correspond most closely to observed trends in China. In general, our comprehensive analysis of the three global reanalysis products indicates that, on both a seasonal and an annual basis, ERA-40 temperatures correspond most closely to observations, and biases are due mainly to elevation differences.
 Most of these studies on climate change in China, particularly in western China, are based on very limited data and information. The sparse and uneven distribution of meteorological sites in western China and on the Qinghai-Tibetan Plateau contributes to significant uncertainties in evaluating climate and climate change. Meteorological stations in mountainous regions of the Northern Hemisphere midlatitudes are generally located in low-elevation areas and suffer from multiple relocations during the period of record [Li, 1995; Robinson et al., 1993]. Ground-based measurements therefore potentially contain spatial and temporal inhomogeneities and other inconsistencies. Any analyses based on these data may result in biases and errors.
 Atmospheric reanalysis projects, based on frozen forecast and data assimilation systems, yield long-term records of analyzed atmospheric and modeled surface fields [Serreze et al., 2005] and provide continuous globally gridded data that have been widely used in model initializations and climatic research [e.g., Betts, 2004; Phillips et al., 2004]. Both the European Centre for Medium-Range Weather Forecasts (ECMWF) and the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) produce reanalysis products including 2-m air temperatures [Kalnay et al., 1996; Simmons and Gibson, 2000].
 NCEP/NCAR and ECMWF analysis-forecast systems differ in their model structure, physical parameterization, horizontal and vertical resolutions, and their methods of processing the input observations. Consequently, there are differences between the modeled surface fields [Betts and Beljaars, 2003]. The purpose of the ECMWF 40+ year reanalysis (ERA-40) is to produce an objective analysis of the atmosphere, making optimal use of a wide range of observing systems. The ERA-40 system has two main elements: (1) an analysis system that combines a background field in an optimal way with observations, and (2) a forecast model that provides the background field by propagating the atmospheric state from one time level to the next [Betts and Beljaars, 2003]. The original NCEP/NCAR Reanalysis Project (NCEP-1 hereafter) uses a state-of-the-art analysis/forecast system to perform data assimilation using past data from 1948 to the present and has a number of benefits: the assembly of a very comprehensive observational database, the length of the period covered, and continuous updating [Kalnay et al., 1996]. Serreze and Hurst provide a brief overview of the NCEP-1 system. Despite their many benefits, atmospheric reanalysis data products inevitably have biases. It is therefore necessary to evaluate atmospheric reanalysis data against observations from meteorological stations before they can be reliably applied.
 Previous research on the validity of NCEP data in China [Zhao et al., 2004] indicated that although the NCEP reanalysis air temperatures capture the general climate patterns of China well, there are large biases, especially on the Qinghai-Tibetan Plateau. After the completion of the ERA-40 reanalysis, some comparison studies focusing on the Qinghai-Tibetan Plateau were performed. Results indicate that products from the improved NCEP reanalysis, the NCEP/Department of Energy (DOE) Reanalysis 2 (NCEP-2 hereafter), are better than those from NCEP-1, but neither is as good as ERA-40 [Li et al., 2004a]. Evaluation of air temperatures from ERA-40 against those from meteorological stations on the Qinghai-Tibetan Plateau shows that although there is generally a cold bias, ERA-40 captures the interannual pattern of air temperature variation very well, and the cold bias is mainly due to the elevation differences between ERA-40 grid cells and meteorological stations on the Tibetan Plateau [Frauenfeld et al., 2005]. Continental/hemispheric-scale comparisons using data from the University of East Anglia's Climatic Research Unit (CRU), which are derived directly from monthly station data [Simmons et al., 2004], indicate that ERA-40 data capture the major characteristics of short-term variability and the long-term trends in CRU data better than NCEP-1 does. Recent work evaluating air temperatures from ERA-40 on a 2.5° × 2.5° grid scale and NCEP-2 on the T62 Gaussian grid against observational data from 730 meteorological stations in China for 1979–2001 [Zhao and Fu, 2006] further illustrates that ERA-40 provides more realistic air temperatures than NCEP-2.
 Earlier comparison studies on the Qinghai-Tibetan Plateau [Frauenfeld et al., 2005] and in China [Zhao and Fu, 2006] used ERA-40 data at 2.5° × 2.5° resolution because ERA-40 data on their native reduced Gaussian grid (hereafter, N80 grid, approximately 125 × 125 km) were not yet publicly available. Furthermore, the earlier work was based on comparisons with raw observations from Chinese meteorological stations. However, raw air temperature data from these stations still contain errors and biases due to station relocations, changes in instrumentation and observing practices, discontinuities, and potential human errors. In this study we employ ERA-40 data on their native N80 grid, and NCEP-1 and NCEP-2 data on their native T62 Gaussian grid (approximately 1.875° × 1.875°). Furthermore, we use homogenized air temperature data for all of China [Li et al., 2004b, 2006] with comprehensive error and bias corrections and quality control (see detailed homogenization methods in section 3.1). To achieve a uniform spatial scale for all data sets we select the ERA-40 N80 grid scale and apply an improved Inverse Distance Weighting (IDW) method [Pan et al., 2004] and a simple IDW method to interpolate climatology and anomalies, respectively, for both observational data and the NCEP reanalyses.
2. Data Sources
 The basic data sets used in this study include homogenized monthly air temperatures from Chinese meteorological stations for ∼1951–2004, 6-hourly ERA-40 air temperature data, available for 1957–2002 [ECMWF, 2004], and monthly NCEP-1 (1948–present) and NCEP-2 (1979–present) air temperatures from NCEP/NCAR and NCEP/DOE. Additionally, station elevations and surface geopotential height from ERA-40, NCEP-1, and NCEP-2 grid cells are used in this study.
2.1. Air Temperature Data
2.1.1. Ground-Based Air Temperature From Chinese Meteorological Stations
 The China Homogenized Historical Temperature Data sets (1951–2004), version 1.0 mean monthly air temperatures, available for 731 Chinese meteorological stations and covering the period from the beginning of record (approximately January 1951) through December 2004, are obtained from the Climate Data Center (CDC) of the National Meteorological Information Center (NMIC), China Meteorological Administration (CMA). Homogenized air temperature data are more reliable than the original raw records because they are verified by strict quality controls, including the examination of internal data consistency and homogeneity [Li et al., 2004b]. For a detailed description of the data quality control and homogenization, please see section 3.1.
 Meteorological station density in China is higher in the south and east than in the north and west (Figure 1a). Most stations were established in the 1950s and the number of stations has been constant since the 1960s (Figure 1b). Approximately 70% of stations experienced relocations [Wu, 2005], of which about 31% relocated once and about 41% relocated twice or more [Li et al., 2004b]. For example, Wutaishan Station in Shanxi Province was initiated in 1956 and relocated in 1998 [Li et al., 1999, 2004b], resulting in an elevation change from 2895.8 m to 2208.3 m. As a result, the mean annual air temperature shows a sharp increase in 1998 (Figure 2). The homogenization applied to the observational data eliminated the potential errors caused by such station relocations. Where homogenization could not be successfully applied, sites were simply considered to be a new station after relocation. Data flagged by the quality control are checked manually to confirm their validity. For the consistency verification, the Peterson-Easterling method [Peterson and Easterling, 1994] is used to create a relatively homogeneous climatology reference series. The basic idea is to select time series from nearby stations that correlate highly with the target station (i.e., reference sites), and then combine them via spatial interpolation [Li et al., 2004b]. See section 3.1 for details on the homogenization procedures. Rigorous and objective homogeneity adjustments reduce the noise and possible errors, and improve the usability of the raw air temperature data.
 The 30-year climatology of monthly and annual air temperatures from Chinese meteorological stations was calculated for the period 1971–2000. We eliminated stations with less than 20 years of data during this 30-year reference period. This resulted in 669 remaining stations which were used in this study (Figure 1a).
2.1.2. ERA-40 Air Temperature
 ERA-40 surface reanalysis air temperature data at 6-hourly intervals from September 1957 through August 2002 are obtained from NCAR (http://dss.ucar.edu/datasets/ds117.0/). The reanalysis model's N80 grid has a spatial resolution of about 125 km, which corresponds to approximately 1.125° on the equator [Kallberg et al., 2004].
 ERA-40 2-m air temperatures from forecasts are postprocessed products, obtained by interpolation between the lowest model level and the surface and assimilated with ground-based temperatures [Betts and Beljaars, 2003]. Some of our 731 stations were likely available to and used by the ECMWF, although it is unclear exactly how many and which stations were available during which times over the September 1957 to August 2002 period. Our comparisons between ERA-40 and observations are therefore not strictly independent. ERA-40 temperatures are analyzed fields of an atmospheric reanalysis. However, the assimilated ground-based air temperatures are not used as the initial condition for the forecast in the next time step. For a broad overview of the ERA-40 project, see http://www.ecmwf.int/research/ifsdocs/CY23r4/.
 The original 6-hourly ERA-40 2-m air temperature outputs are for 0200 h, 0800 h, 1400 h, and 2000 h Greenwich Mean Time (GMT), which correspond to 1000 h, 1600 h, 2200 h, and 0400 h (the following morning) Beijing local time, the standard time in China. However, air temperatures from Chinese meteorological stations are measured at 0800 h, 1400 h, 2000 h, and 0200 h the following morning (Beijing time) throughout China. This makes it difficult to attribute potential discrepancies between daily station versus ERA-40 air temperatures as real, or an artifact of the different observing times. Therefore in this study we mainly focus on seasonal and annual timescales, so that the discrepancies due to differences in observation time can be neglected. All data used in this study are arithmetically averaged into daily, monthly, seasonal, and annual values.
2.1.3. NCEP-1 and NCEP-2 Air Temperature
 There are two versions of the NCEP reanalysis. The NCEP-2 reanalysis, covering January 1979 through the present, represents an improved version of the NCEP-1 reanalysis, which spans January 1948 through the present. Both NCEP reanalysis data sets were obtained from NOAA/NCEP (http://www.cpc.ncep.noaa.gov/products/wesley/reanalysis.html). Monthly mean air temperature from both NCEP-1 and NCEP-2 are available on the T62 Gaussian grid with a spatial resolution of about 1.875° × 1.875° [Kalnay et al., 1996; Kistler et al., 2001].
 The NCEP-1 2-m air temperature is a standard modeled field. It represents a linear interpolation between the surface skin temperature and the temperature at the lowest model sigma level [NCEP/NCAR, 2005]. The NCEP-2 reanalysis is simply a rerun of the NCEP-1 reanalysis. It has the same input data and vertical and horizontal resolution as the NCEP-1 system but with more up-to-date physics and corrections of known errors in NCEP-1 [Kanamitsu et al., 2002; Roads, 2003]. Over land, for example, wintertime precipitation, surface air temperature, and surface fluxes in high latitudes are improved owing to the correction of a “spectral snow” problem and additional corrections in snow cover and snow depth [Kanamitsu et al., 2002]. Despite these improvements, NCEP-2 should be regarded only as an updated and improved version of NCEP-1 rather than a next-generation reanalysis. In applying these reanalysis air temperatures, it should be noted that while we evaluate 30-year climatologies for station data, NCEP-1, and ERA-40, we can only calculate the 1979–2001 climatology for NCEP-2 data (see section 4.1.1).
 Limited by the time period covered by ERA-40 air temperatures, the basic comparisons will be performed over the period 1958–2001 in this paper. To further assess NCEP-2 reanalysis data together with ERA-40 and NCEP-1 data, evaluations will be also conducted over the time period 1979–2001.
2.2. Elevation Data
 Elevation data are used to examine the effect of elevation on air temperature in all three reanalysis products and ground-based measurements. We use four elevation data sets: (1) elevations of the Chinese meteorological stations, obtained from CDC of NMIC in CMA (http://www.nmic.gov.cn/), (2) ERA-40 reanalysis model topography in N80 resolution, and (3) NCEP-1 and (4) NCEP-2 reanalysis model topography on the T62 Gaussian grid. An IDW method was applied to grid or regrid elevation data to the N80 grid scale (see details in section 3.2).
3.1. Homogenization of Air Temperature From Chinese Meteorological Stations
 Raw observed data may be inhomogeneous for many reasons. Station relocations and the analogous changes of stations’ local characteristics are critical issues that can introduce inhomogeneity into data sets. Hence homogenization is considered to be an important part of quality control.
 Raw air temperatures from Chinese meteorological stations are first examined for internal data consistency and homogeneity. A basic logic verification is performed on daily air temperatures that includes the following checks: the daily maximum air temperature must be greater than or equal to any value recorded at the regular observing times, and the daily maximum minus daily minimum air temperature must be less than or equal to 24°C [Li et al., 2006], by convention of the World Meteorological Organization [WMO, 1993]. Inhomogeneities due to changes in the observing times and in the calculation methods of daily mean air temperature [Liu and Li, 2003] are also addressed. For example, air temperatures were measured 24, 8, or 3 times a day during 1951–1954, but 24, 4, or 3 times daily since 1954. However, when measurements were made four times a day, the actual observing times during 1954–1960 (0100 h, 0700 h, 1300 h, and 1900 h Beijing time) differed from those after 1960 (0200 h, 0800 h, 1400 h, and 2000 h Beijing time). Before 1954, daily mean air temperatures were based on the arithmetic average of all available daily values, but after 1954 they were based on the average of the 4-times daily observations only. For stations with only three measurements per day, air temperatures for 0200 h were first obtained by averaging the daily minimum air temperature with the air temperature at 2000 h. The interpolated air temperature for 0200 h was then averaged with the observations from 0800 h, 1400 h, and 2000 h to compute a daily mean air temperature [Liu and Li, 2003; Liu et al., 2005].
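The three-observation rule and the basic logic verification described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and sample readings are hypothetical, and the checks are reduced to the two conditions named in the text.

```python
def estimate_0200(t_min, t_2000):
    """Estimate the missing 0200 h value as the mean of the daily minimum
    and the 2000 h reading (rule for three-observation stations)."""
    return (t_min + t_2000) / 2.0


def daily_mean_three_obs(t_min, t_0800, t_1400, t_2000):
    """Daily mean from the estimated 0200 h value plus the 0800 h, 1400 h,
    and 2000 h observations."""
    t_0200 = estimate_0200(t_min, t_2000)
    return (t_0200 + t_0800 + t_1400 + t_2000) / 4.0


def passes_logic_check(t_max, t_min, readings, max_range=24.0):
    """Basic logic verification: the daily maximum is not below any regular
    reading, and the daily range does not exceed max_range (deg C)."""
    return all(t_max >= r for r in readings) and (t_max - t_min) <= max_range


# Hypothetical sample day (deg C), not taken from the station archive.
example_mean = daily_mean_three_obs(t_min=2.0, t_0800=4.0, t_1400=12.0, t_2000=8.0)
```

With the sample readings, the estimated 0200 h value is 5.0°C and the daily mean is 7.25°C.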
 After these basic data quality controls, the homogeneity of the raw air temperatures was assessed via comparisons with reference series for each station. The First Difference (FD) method developed by Peterson and Easterling [1994] was applied to minimize any potential inhomogeneity in the reference series. The procedure includes: (1) calculating the correlation of the FD series between the target station and its neighboring stations, (2) testing how well the potential reference FD series predicts the air temperature of the target station using a multivariate randomized block permutation (MRBP) test, (3) combining the five most highly correlated stations that passed the MRBP test into a reference FD series using a weighted mean, and (4) converting the reference FD series into a reference series and adjusting the values so that the final year's value of the reference series equals the one from the target series.
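Steps (1), (3), and (4) of the first-difference procedure can be illustrated with a minimal sketch. Several points are assumptions rather than the documented method: the MRBP test of step (2) is omitted entirely, the weights are taken as squared correlation coefficients, all series are assumed complete and of equal length, and the function names are hypothetical.

```python
import math


def first_difference(series):
    """FD series: value in year t+1 minus value in year t."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def reference_series(target, neighbors, k=5):
    """Build a reference series from the k neighbors whose FD series
    correlate most highly with the target FD series."""
    fd_target = first_difference(target)
    scored = []
    for series in neighbors:
        fd = first_difference(series)
        scored.append((pearson(fd_target, fd), fd))
    scored.sort(key=lambda item: item[0], reverse=True)
    best = scored[:k]
    # Assumed weighting: squared correlation (the MRBP screening is skipped).
    weights = [r * r for r, _ in best]
    total = sum(weights)
    ref_fd = [
        sum(w * fd[t] for w, (_, fd) in zip(weights, best)) / total
        for t in range(len(fd_target))
    ]
    # Convert the FD series back to a level series by cumulative summation.
    ref = [0.0]
    for step in ref_fd:
        ref.append(ref[-1] + step)
    # Shift so the final year's value equals that of the target series.
    offset = target[-1] - ref[-1]
    return [value + offset for value in ref]
```

Working with first differences means that a constant offset between a neighbor and the target (for example, from an elevation difference) does not affect the reference series; only year-to-year changes are combined.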
 It was assumed that the reference series accurately reflects the climate of the region where the target station lies. Therefore any significant departures from this reference series would be considered a potential data discontinuity of this station. Each of the possible discontinuities was tested using a Multi-Response Permutation Procedure (MRPP) [Mielke, 1991]. If any discontinuity reached a 95% significance level, it was considered a true discontinuity. The time series was then subdivided into two parts around the discontinuity. Each of these smaller sections was again tested until no significant discontinuities were found or the time series was too short to be tested (<10 years). The last step is to adjust the value in the target series according to the departure from reference series so as to finalize the entire process of homogenization.
3.2. Spatial Interpolation Method
 Interpolation is performed at the monthly scale. For proper comparison among all four data sets, a uniform grid scale is required. We use the ERA-40 N80 grid scale. When gridding/regridding the data, the climatology for 1971–2000 and the anomalies relative to this climatology are interpolated separately.
Where j denotes the number of grid cells after interpolation; n denotes the number of points within each grid cell j; Ti denotes the mean air temperature for each point before interpolation. Wi denotes the weight of each point involved and ΔTi denotes the air temperature correction for each involved point where:
The standard environmental lapse rate, γ, is taken as 6.0 × 10⁻³ °C m⁻¹ [Willmott and Matsuura, 1995, 2001]; Xi, Yi, Ei are the latitude, longitude, and elevation, respectively, of the original locations; Xj, Yj, Ej are the latitude, longitude, and elevation, respectively, of the grid cell centers after interpolation; Cx, Cy, Ce are the regression coefficients for latitude, longitude, and elevation, respectively, that are obtained by regressing the known air temperature with its latitude, longitude, and elevation; di denotes the distance between input point i and output point j; Ri denotes the radius for interpolation.
 It has been reported that the optimal radius for 4 to 10-point interpolations of irregular station data in China by the improved IDW method is 150–250 km [Pan et al., 2004]. We employ a more conservative 150 km radius. If there are more than nine stations within this radius, the nearest nine stations are used and a nine-point interpolation is applied; if there are fewer than nine stations, all available stations are used for interpolation; if no station is available, a missing air temperature value is assigned. The same procedure is applied to the stations’ elevation values to determine the station elevation within each N80 grid cell. Figure 1c shows the distribution of N80 grid cells (background dashed line), where black points represent the grid cell centers with station data. While it appears that some grid cells along the coast are over the ocean (Figure 1c), this is a visual artifact due to the plotted resolution of the coastline. There are a total of 679 N80 grid cells with station observations across China. There are more grid cells (679) than stations (669) because, with a 150 km radius, some stations are interpolated into multiple grid cells.
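The station-selection rule (150 km radius, at most the nearest nine stations, missing value if none) can be sketched as follows. The weighting here is an assumption: a plain inverse-distance-squared weight stands in for the paper's weighting, great-circle distances are computed with the haversine formula, and the function and parameter names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def idw_at_cell(cell_lat, cell_lon, stations, radius_km=150.0, max_points=9, power=2.0):
    """Interpolate station values to one grid cell center.

    stations: iterable of (lat, lon, value) tuples.
    Returns None (missing) when no station lies within radius_km.
    """
    near = []
    for lat, lon, value in stations:
        d = great_circle_km(cell_lat, cell_lon, lat, lon)
        if d <= radius_km:
            near.append((d, value))
    if not near:
        return None
    near.sort(key=lambda item: item[0])
    near = near[:max_points]  # keep only the nearest stations
    if near[0][0] == 0.0:
        return near[0][1]  # station sits exactly on the cell center
    weights = [1.0 / d ** power for d, _ in near]
    return sum(w * v for w, (_, v) in zip(weights, near)) / sum(weights)
```

Because a station can fall within 150 km of several cell centers, the same station may contribute to more than one grid cell, which is why the number of populated cells can exceed the number of stations.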
 It should be noted that some grid cells might contain a different number of stations for different months due to discontinuous and irregular measurements at some sites. For example, grid cell 569 (40.93°N, 127.50°E) includes five stations from June to September, but only 4 stations for the remainder of the year. This is because Tianchi station only operates during the warm season due to its high elevation of 2623 m a.s.l., far higher than the other four stations. This creates the problem that the elevation of this grid cell after interpolation is 1107.4 m from June to September, but only 765.9 m in the other months, which introduces an air temperature bias up to 2.1°C. We therefore retain the maximum elevation for such grid cells, and adjust the temperature to the higher elevation again using a lapse rate of −0.6°C/100 m.
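The size of this seasonal elevation artifact follows directly from the two grid cell elevations and the lapse rate quoted above. The short check below uses only those numbers; the function name is hypothetical.

```python
LAPSE_RATE_C_PER_100M = -0.6  # lapse rate quoted in the text


def adjust_to_elevation(temp_c, from_elev_m, to_elev_m, lapse=LAPSE_RATE_C_PER_100M):
    """Shift a temperature from one elevation to another with a constant lapse rate."""
    return temp_c + lapse * (to_elev_m - from_elev_m) / 100.0


# Grid cell 569: 765.9 m without Tianchi station, 1107.4 m with it.
bias_c = abs(adjust_to_elevation(0.0, 765.9, 1107.4))
```

The 341.5 m elevation difference corresponds to about 2.05°C, in line with the bias of up to 2.1°C noted above.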
 The weighting equation (3) is used for the interpolation of observational data. For the NCEP reanalysis data, only equations (1) and (2) are applied for the climatology interpolation. Also, elevation differences usually contribute to climatology biases but not to the anomalies. Thus only a simple IDW method is applied to interpolate the air temperature anomalies of NCEP data.
 After the interpolation of the monthly data, the climatology values and their corresponding anomalies are combined to produce the gridded monthly air temperature, and seasonal and annual data are then computed. Here, winter refers to December, January, and February (DJF), spring to March, April, and May (MAM), summer to June, July, and August (JJA), and autumn to September, October, and November (SON). If the value in any month is missing, air temperature for that year or season is set to missing.
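The seasonal and annual aggregation with strict missing-value handling can be sketched as below. This is a simplification: each year is treated as a dictionary of twelve monthly values, and the winter mean is built from whatever values are supplied under months 12, 1, and 2 (whether December belongs to the same or the preceding calendar year is left to the caller). Names are hypothetical.

```python
SEASONS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}


def mean_or_missing(values):
    """Arithmetic mean, or None if any contributing value is missing."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


def seasonal_means(monthly):
    """monthly: dict mapping month number (1-12) to a value or None."""
    return {
        name: mean_or_missing([monthly.get(m) for m in months])
        for name, months in SEASONS.items()
    }


def annual_mean(monthly):
    """Annual mean; missing if any of the twelve months is missing."""
    return mean_or_missing([monthly.get(m) for m in range(1, 13)])
```

A single missing month therefore removes both the affected season and the annual value for that year, while the other three seasons remain available.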
3.3. Basic Statistical and Error Analysis
 Based on the gridded data in N80 resolution, a series of basic statistical methods are used to analyze and evaluate the validity of reanalysis data in China. We calculate the mean and standard deviation, perform standardization to create z-scores, and calculate simple linear correlations. Additionally, Standard Deviation of the Error (SDE) and Mean Absolute Error (MAE) methods are employed to assess the biases of the reanalysis data.
3.3.1. Standard Deviation of the Error (SDE)
 SDE denotes the standard deviation of errors from their mean. As standard deviation is a method to evaluate the degree of departure from the mean value, SDE can be used to examine the consistency of data. Smaller SDEs represent better consistency.
where Φi represents air temperature, Φ̄i denotes the climatology of Φi, Φi′ represents the departure from climatology, and Φ̄′ is the mean departure.
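As a minimal sketch of the SDE described in words above (the standard deviation of the errors about their mean), the following assumes that the errors are supplied as a list of differences and that the sample form with an n − 1 denominator is intended; both the denominator and the function name are assumptions.

```python
import math


def sde(errors):
    """Standard deviation of the errors about their own mean (n - 1 denominator assumed)."""
    n = len(errors)
    mean_error = sum(errors) / n
    return math.sqrt(sum((e - mean_error) ** 2 for e in errors) / (n - 1))
```

A constant bias, however large, gives an SDE of zero, which is why the SDE measures consistency rather than accuracy.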
3.3.2. Mean Absolute Error (MAE)
 MAE is usually used to evaluate the accuracy of continuous variables. It measures the average magnitude of the errors in a set of forecasts without considering the signs, and hence is unambiguous and the most natural measure of average error magnitude [Willmott and Matsuura, 2005]:
 MAE is the average over the “total error” that is obtained by summing the absolute values of the errors. All the individual differences are weighted equally in the average.
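In code, the MAE as described (the equally weighted average of the absolute errors) is a one-line computation. The sketch below takes the error as reanalysis minus observation, which matches the sign convention used elsewhere in the text; the function name and sample values are hypothetical.

```python
def mae(reanalysis, observed):
    """Mean absolute error: average of |reanalysis - observed| with equal weights."""
    errors = [r - o for r, o in zip(reanalysis, observed)]
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical values: errors of -1 and +1 cancel in the mean bias but not in the MAE.
example_mae = mae([1.0, 3.0], [2.0, 2.0])
```

In this example the mean error is zero while the MAE is 1.0, illustrating why the MAE is reported alongside the signed differences.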
4.1. Air Temperature Differences
4.1.1. Temporal Comparisons
 We calculate standard 30-year climatologies for all temperature products. Because NCEP-2 only begins in 1979, its climatology can only be calculated for the 23-year period of 1979–2001. Therefore we first examine monthly, seasonal, and annual air temperature from observations, ERA-40, and NCEP-1 for both 1971–2000 and 1979–2001, and from NCEP-2 for 1979–2001 in order to examine the magnitude of differences between the two climatology periods for each data set (Table 1). For 1971–2000, mean annual air temperatures based on observations, ERA-40, and NCEP-1, are 8.84°C, 7.91°C, and 6.06°C, respectively, and for 1979–2001, temperatures are 8.98°C, 8.01°C, and 6.14°C (Table 1). There are thus differences of −0.14°C, −0.10°C, and −0.08°C, respectively, when using 1971–2000 versus 1979–2001.
Table 1. Climatology of Monthly, Seasonal, and Annual Air Temperatures (°C) for Observations, ERA-40, NCEP-1, and NCEP-2, With Differences (Reanalysis Minus Observations) and Base-Period Differences (1979–2001 Minus 1971–2000)
 The 1971–2000 “reanalysis minus observation” temperature differences for ERA-40 and NCEP-1 are −0.93°C and −2.78°C, respectively, and for 1979–2001 they are −0.97°C and −2.84°C. For ERA-40 and NCEP-1, the impact of the different base period is thus only 0.04°C and 0.06°C. We are therefore confident that we are not biasing our results by using 1971–2000 as the base period for the observations, ERA-40, and NCEP-1 data, and 1979–2001 for NCEP-2 data. Furthermore, this is consistent with the base periods used for the climatology interpolation.
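The base-period comparison can be reproduced from the annual means quoted in the text. The sketch below hard-codes only those six values; the dictionary layout and function names are illustrative.

```python
# Annual mean air temperatures (deg C) quoted in the text.
CLIMATOLOGY = {
    "1971-2000": {"obs": 8.84, "ERA-40": 7.91, "NCEP-1": 6.06},
    "1979-2001": {"obs": 8.98, "ERA-40": 8.01, "NCEP-1": 6.14},
}


def bias(period, product):
    """Reanalysis minus observation for one base period, rounded to 0.01 deg C."""
    values = CLIMATOLOGY[period]
    return round(values[product] - values["obs"], 2)


def base_period_impact(product):
    """Absolute change in bias between the two base periods."""
    return round(abs(bias("1979-2001", product) - bias("1971-2000", product)), 2)
```

Evaluating `bias` for each period and `base_period_impact` for ERA-40 and NCEP-1 recovers the differences and the 0.04°C and 0.06°C impacts stated above.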
 For China as a whole, all three reanalysis products have cold biases when compared with ground-based measurements (Table 1). Climatology differences of mean annual air temperatures between ERA-40, NCEP-1, and NCEP-2 and the ground-based measurements are −0.93°C, −2.78°C, and −2.27°C, respectively (Table 1). This indicates that the agreement between ERA-40 data and ground-based measurements is the best, supporting previous work [Zhao and Fu, 2006]. Seasonally, the ERA-40 data also agree better year-round than NCEP-1 and NCEP-2 (Table 1). The differences between ERA-40 data and the in situ measurements are smallest in winter and largest in spring, with values of −0.64°C and −1.13°C, respectively. For NCEP-1 and NCEP-2, respectively, the differences range from −2.35°C and −2.06°C in summer to −3.34°C and −2.61°C in autumn, which implies that the NCEP-1 and NCEP-2 data agree best with observations in summer and worst in autumn. The NCEP-2 data also indicate slightly better agreement than NCEP-1, which is consistent with the conclusion from Zhao et al. indicating that NCEP-2 improved the 2-m air temperatures over NCEP-1.
 Temporal variations of annual and seasonal air temperature departures (Figure 3) indicate that ERA-40 air temperature agrees more closely with ground-based measurements than NCEP-1 or NCEP-2 data, which is again consistent with previous findings [Simmons et al., 2004]. ERA-40 agreement with observations improved significantly since the early 1970s (Figure 3a) due to improved input data (both in situ and satellite remote sensing data) for the reanalysis over China [Simmons et al., 2004]. Correlation coefficients of annual data for 1979–2001 are 0.98 and 0.94 between in situ data and ERA-40 and NCEP-1, respectively, while they decrease to 0.84 and 0.66 over the period 1958–2001. Surprisingly, the correlation coefficient between NCEP-2 and in situ data is 0.89 over the period 1979–2001, slightly lower than the correlation coefficient between NCEP-1 and in situ data although in general, the NCEP-2 data are closer to observations than the NCEP-1 data (Table 1). All correlations are significant at the 95% significance level according to Student's t-tests.
4.1.2. Spatial Comparisons
 To further assess the biases between the reanalysis data and ground-based measurements, spatial evaluations of the mean annual and seasonal air temperatures for 1958–2001 and 1979–2001 across China are conducted. As there are similar bias patterns for the two periods (not shown), we mainly focus the discussion on the period 1979–2001 for all three reanalysis products.
 Generally, ERA-40 data agree more closely with the ground-based measurements than NCEP-1 and NCEP-2 data, and all three atmospheric reanalysis products perform better in eastern China than in western China (Figure 4). In eastern China, the differences between ERA-40 (Figure 4a), NCEP-1 (Figure 4b), NCEP-2 (Figure 4c) and the in situ data are within ±1.0°C, −2.0 to +1.0°C, and ±1.0°C, respectively, except for several isolated places. It appears that the NCEP-2 products decrease the cold bias substantially compared to NCEP-1 in eastern China. In western China, considerable cold and warm biases occur on the edge of mountainous areas, especially around the Qinghai-Tibetan Plateau and in the basin areas, such as the Tsaidam and Zunghaer Basins (Figure 5). On an annual basis, the extreme cold and warm biases reach −18.45°C and +10.25°C, respectively, for the ERA-40 data, −17.84°C and +8.11°C for NCEP-1 data, and −16.80°C and +8.25°C for NCEP-2 data. All of these extreme biases occur in western China. Using the original nonhomogenized observational data, Zhao and Fu [2006] found a similar underestimation of ERA-40 air temperature, which is generally less than 1°C in eastern China and greater than 12°C in the western part of the Qinghai-Tibetan Plateau.
 Seasonal air temperature differences (Figure 6) between ERA-40, NCEP-1, and NCEP-2 and observations for spring, summer, autumn, and winter indicate similar biases as those in annual data, which are again greater in magnitude in western China than in the east. Furthermore, biases in winter are bigger than for other seasons.
 All three reanalyses appear to perform worse in western China than in the east, and one of the major differences between these two regions is elevation and terrain. Given the complex terrain in western China and the strong impact of elevation on air temperature, we next seek to establish if, and to what extent, the elevation differences (dH) between the reanalysis grids and the station grids (Figure 7) influence the air temperature differences (dT). There are considerable differences in western China between station elevation and reanalysis model elevation. In high elevation areas, model elevation is higher than the interpolated station elevation. In low elevation areas, the model elevation is lower than station elevation. A possible explanation is that meteorological stations are usually located in easily accessible places, i.e., in potentially nonrepresentative areas such as the valleys of mountainous regions and along the margins of basins [Frauenfeld et al., 2005; Robinson et al., 1993]. Therefore the station elevations are lower than the surrounding mountains, and higher than the basin areas. In eastern China, the elevation differences between the reanalysis grids and station grids are generally within ±200 m except for some isolated places with more than 400 m of negative elevation differences (Figure 7). This can be accounted for by the few stations actually located near the summits of mountains within some grid cells. For instance, the elevation of Taishan Station (36.27°N, 117.01°E) in Shandong Province is 1534 m, whereas the nearby elevation is only about 130 m according to the elevation of Tai’an Station. This will raise the mean elevation of any grid cells that include/interpolate Taishan Station.
 A comparison with the air temperature differences between reanalysis data and observations (Figure 4) shows that larger positive elevation discrepancies correspond to greater negative air temperature biases, and vice versa. This is consistent with the findings of previous research on the Qinghai-Tibetan Plateau [Frauenfeld et al., 2005]. Therefore the correlations between dH and dT for the annual data are further examined (Figure 8). A strong linear correlation is evident between dH and dT, each of which has 679 data points. The statistically significant (95%-level) correlation coefficients between ERA-40 (Figure 8a), NCEP-1 (Figure 8b), and NCEP-2 (Figure 8c) and observations are −0.94, −0.88, and −0.85, respectively, and the gradients (slopes) are −0.61°C/100 m, −0.68°C/100 m, and −0.61°C/100 m, respectively. The same correlation coefficient of −0.94 and an approximate lapse rate of −0.60°C/100 m for ERA-40 air temperature are also reported by Frauenfeld et al. [2005] for the Qinghai-Tibetan Plateau. Specifically, in the grid cells with maximum negative dT mentioned above, dH between ERA-40, NCEP-1, NCEP-2 and stations are 3054.83 m, 2544.48 m, and 2453.53 m, respectively, and dH corresponding to maximum positive temperature biases are −1723.65 m, −1865.08 m, and −1968.98 m. These all represent the largest elevation differences between reanalysis and station data.
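 The dH–dT relationship described above can be illustrated with a minimal sketch. It assumes dT is the reanalysis temperature minus the observed temperature and dH is the model elevation minus the interpolated station elevation, both on the same 679 grid cells; the function name `dt_dh_relationship` and the synthetic data are hypothetical and are not taken from the study.

```python
import numpy as np

def dt_dh_relationship(dh_m, dt_c):
    """Return (Pearson r, slope in degC per 100 m, intercept in degC)
    for an ordinary least-squares fit of dT on dH."""
    dh = np.asarray(dh_m, dtype=float)
    dt = np.asarray(dt_c, dtype=float)
    r = np.corrcoef(dh, dt)[0, 1]                   # correlation coefficient
    slope_per_m, intercept = np.polyfit(dh, dt, 1)  # degC per metre, degC
    return r, slope_per_m * 100.0, intercept

# Synthetic illustration only (NOT the study's data): 679 grid cells with an
# assumed gradient of -0.61 degC/100 m plus random scatter of 1 degC.
rng = np.random.default_rng(0)
dh = rng.uniform(-2000.0, 3000.0, size=679)
dt = -0.0061 * dh + rng.normal(0.0, 1.0, size=679)

r, slope_100m, intercept = dt_dh_relationship(dh, dt)
print(f"r = {r:.2f}, slope = {slope_100m:.2f} degC/100 m")
```

 With real data, `dh` and `dt` would be the gridded elevation and temperature differences; the fitted slope is then directly comparable to the gradients quoted above.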
4.2. Error Analysis
 To evaluate the internal coherence and errors of reanalysis data relative to ground-based measurements, error analyses for the discrepancies are conducted for the periods of 1958–2001 (Figure 9a) and 1979–2001 (Figure 9b) using SDE and MAE. Both SDEs and MAEs from ERA-40 are much smaller than those from NCEP-1 or NCEP-2. This indicates that the ERA-40 data are more coherent and closer to the ground-based measurements than the NCEP-1 and NCEP-2 data in China. SDE for the ERA-40 data decreases from 0.23°C over 1958–2001 (Figure 9a) to only 0.09°C over 1979–2001 (Figure 9b), and SDE for the NCEP-1 data decreases from 0.33°C to 0.14°C. This indicates that reanalysis products have become more consistent since 1979. Simmons et al. attribute this improved consistency since 1979 to the amount and quality of input data, specifically increased in situ observations and satellite remote sensing data. Although MAE for annual NCEP-2 data is smaller than that for NCEP-1 (2.42°C versus 2.84°C; Figure 9b), SDE for NCEP-2 data is slightly larger than for NCEP-1 in some months. Therefore SDE for annual NCEP-2 data during 1979–2001 is slightly higher, 0.19°C, than for NCEP-1 over the same period. This implies that NCEP-2 air temperatures have a decreased bias, but less internal coherence than NCEP-1 data. Seasonally, ERA-40 data indicate a low SDE during each season, and the lowest MAE in winter. However, although the seasonal variations of SDE and MAE are lower in NCEP-2 than NCEP-1 data, seasonal variations are still evident, with larger SDEs and MAEs in spring and autumn, and smaller ones in winter and summer. This indicates that the reliability of ERA-40 data is less affected by seasonality and the ERA-40 system is more consistent than the NCEP system.
 Spatial distributions of SDE and MAE for annual and seasonal data are investigated over China for both 1958–2001 and 1979–2001. Again, we only show the results from the annual data over 1979–2001 since the results for 1958–2001 exhibit the same patterns. Both SDE (Figures 10a–10c) and MAE (Figures 10d–10f) for biases between the ERA-40 and ground-based measurements are much smaller than those for NCEP-1 and NCEP-2 data, which again indicates better performance of ERA-40 than NCEP-1 or NCEP-2 air temperatures. For all reanalysis products, the smallest SDEs and MAEs occur in southeastern China, and the largest values occur in western China. This is possibly due to the lower density of stations and more complex topography in western China than in the other regions. Seasonal comparisons (not shown) indicate that, although SDEs and MAEs for all three reanalysis data sets are smaller in winter for China as a whole (Figure 9), SDE and MAE are greater in winter than during the other seasons in western China, which is consistent with results shown in Figure 6.
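 The distinction between the two error measures used in this section can be sketched as follows. The formulas are an assumption: MAE is taken as the mean of the absolute differences between reanalysis and observed temperatures, and SDE as the (population) standard deviation of those differences about their mean; the definitions applied in the study may differ in detail. The function name `error_stats` and the sample values are hypothetical.

```python
import numpy as np

def error_stats(reanalysis_c, observed_c):
    """Return (mean bias, SDE, MAE) in degC for paired temperature series.

    Assumed definitions (not confirmed by the text):
      error = reanalysis - observation
      MAE   = mean(|error|)
      SDE   = standard deviation of error about its mean (ddof=0)
    """
    err = np.asarray(reanalysis_c, dtype=float) - np.asarray(observed_c, dtype=float)
    bias = err.mean()
    sde = err.std(ddof=0)
    mae = np.abs(err).mean()
    return bias, sde, mae

# Hypothetical values: a reanalysis that is consistently about 2.5 degC too cold.
obs = np.array([10.0, 11.0, 12.0, 13.0])
rean = obs + np.array([-2.0, -3.0, -2.0, -3.0])

bias, sde, mae = error_stats(rean, obs)
print(f"bias = {bias:.2f}, SDE = {sde:.2f}, MAE = {mae:.2f} degC")
```

 Under these assumed definitions, a large but nearly constant offset gives a large MAE and a small SDE, which is why a product can have a smaller MAE and yet a larger SDE than another.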
4.3. Trend Analysis
 Although in general, trend analysis based on reanalysis data is not ideal, comparisons of trends can nonetheless provide a further assessment as to the relative quality of reanalysis data sets. Both temporal and spatial trends, again, for the periods of 1958–2001 and 1979–2001, are examined. Temporal trends of annual and seasonal air temperatures during 1979–2001 (Figure 3) show that trends in ERA-40 data match more closely with observations than NCEP-1 or NCEP-2 data. The trends in annual data from observations, ERA-40, NCEP-1, and NCEP-2 are 0.45°C/decade, 0.41°C/decade, 0.29°C/decade, and 0.33°C/decade, respectively. The seasonal trends are shown in Table 2 and further indicate that trends in ERA-40 data are always closer to observations than those from NCEP-1 or NCEP-2 for each season.
Table 2. Trends in Seasonal and Annual Air Temperatures (°C/Decade) From Observations and Reanalyses During 1979–2001; Statistically Significant Trends (95%-Level) in Bold
 Spatially, statistically significant trends (95%-level) of annual air temperatures are shown in Figure 11, where red and blue circles denote positive and negative trends, respectively. Warming trends are evident across all of China (Figure 11a), especially in the north. ERA-40 data capture most of these increasing trends, including the large warming trends in the north, while NCEP-1 data miss much of the warming in southern and southwestern China and NCEP-2 data even miss the strong warming observed in the northeast. Seasonally, ERA-40 data also capture the warming well (Figure 12), while NCEP-1 and NCEP-2 data correspond less closely to the observed trends. Note that NCEP-1 data indicate negative trends over the southern Qinghai-Tibetan Plateau in spring and summer. Although NCEP-2 data do not show these negative trends (with the exception of the Sichuan Basin in summer), they capture much less of the seasonal warming across China, while overestimating winter warming over eastern China and the eastern part of north China.
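 A trend in °C/decade of the kind quoted in this section can be estimated with a simple least-squares fit, as in the minimal sketch below. The use of ordinary least squares is an assumption, since the text does not state the fitting or significance-testing method; the function name `decadal_trend` and the synthetic series are hypothetical, and no significance test is included.

```python
import numpy as np

def decadal_trend(years, temps_c):
    """Least-squares linear trend of an annual series, in degC per decade."""
    years = np.asarray(years, dtype=float)
    temps = np.asarray(temps_c, dtype=float)
    # Centre the time axis to keep the fit numerically well conditioned.
    slope_per_year, _ = np.polyfit(years - years.mean(), temps, 1)
    return slope_per_year * 10.0

# Synthetic illustration only: a series warming by exactly 0.045 degC per
# year over 1979-2001 from an arbitrary 8.0 degC baseline.
years = np.arange(1979, 2002)
temps = 8.0 + 0.045 * (years - 1979)

trend = decadal_trend(years, temps)
print(f"trend = {trend:.2f} degC/decade")
```

 Applied per grid cell, the same function would give the spatial trend fields; a separate test would be needed to identify which trends are significant at the 95% level.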
5. Discussion and Conclusions
 The agreement of reanalysis air temperatures from ERA-40, NCEP-1, and NCEP-2 with homogenized air temperatures from meteorological stations is evaluated across China in several respects, including spatiotemporal variations, bias and error estimation, and trend analysis. A 30-year climatology is determined for 1971–2000 (1979–2001 for NCEP-2 data). An improved IDW method is employed to achieve a uniform N80 grid scale for all evaluated temperature products, resulting in 679 N80 grid cells with station data across China. Results indicate that although some assimilation methods in the NCEP-2 system are improved over the NCEP-1 system, agreement with observations is not as good as in the ERA-40 system. Similar findings were reported by Li et al. [2004a] when evaluating air temperature data on the Qinghai-Tibetan Plateau. ERA-40 air temperatures agree more closely with ground-based measurements than NCEP-1 or NCEP-2 data, both temporally and spatially. However, it should be noted that the NCEP-2 reanalysis only starts in 1979; this results in a 23-year climatology, instead of 30 years as for the other products, and hence a comparison of slightly different time periods. Evaluating the impact of the different climatology periods indicates that it has a very minor effect on the results. Nonetheless, caution must be applied in the interpretation of the results regarding the NCEP-2 analysis, in particular the climatology comparisons. Although ERA-40 air temperatures tend to agree more closely with observations than NCEP-1 or NCEP-2, there are also some potential shortcomings evident in air temperatures from the ERA-40 project.
 All three reanalyses tend to have a cold bias in air temperature data compared with ground-based measurements: there are biases in the 30/23-year climatology of −0.93°C, −2.78°C, and −2.27°C for the annual differences between observational data and ERA-40, NCEP-1, and NCEP-2, respectively. Spatial examination illustrates that large biases occur over western China where the terrain is complex. This is consistent with results from Zhao et al. and Zhao and Fu, which indicate that ERA-40, NCEP-1, and NCEP-2 all underestimate the magnitude of air temperatures, and that differences in western China are much larger than those in eastern China. The rates of air temperature change with height for ERA-40 and NCEP-2 data are both −0.61°C/100 m, while that for NCEP-1 is −0.68°C/100 m. Correlation analysis between air temperature differences (dT) and elevation differences (dH) indicates that there are significant negative correlation coefficients (R) between observations and ERA-40 (R = −0.94), NCEP-1 (R = −0.88), and NCEP-2 (R = −0.85) data, suggesting that perhaps ERA-40 air temperatures rely more on model topography than NCEP data. Frauenfeld et al. [2005] also reported the same correlation coefficient of −0.94 and a similar lapse rate of −0.60°C/100 m for ERA-40 air temperature over the Qinghai-Tibetan Plateau. This suggests the relationship between dH and dT for ERA-40 data is consistent across spatial domains. Although the reanalysis model topography might better represent the average elevation of grid cells than the interpolated elevations of ground-based stations do, it inevitably smooths the observed elevation extremes by interpolating DEM elevation to the reanalysis model grid scale. This presents a potential problem, particularly in areas of complex terrain such as western China. Given the strong effect of elevation on air temperature, an improvement of topography in the reanalysis system may greatly enhance the accuracy of air temperature prediction.
 In general, we conclude that ERA-40 2-m air temperatures represent observed air temperatures in China better than NCEP data do. Compared to the NCEP-1 system, NCEP-2 represents an improvement, and NCEP-2 air temperatures agree more closely with observations than NCEP-1 data. NCEP-2 has smaller MAEs but higher SDEs than NCEP-1 (Figure 9), and we find slightly lower correlation coefficients between NCEP-2 and observations than for NCEP-1 (0.89 versus 0.94 for 1979–2001). Despite improvements in NCEP-2, we argue that the internal coherence of the NCEP-2 system could still be further improved, compared to the performance of ERA-40.
 With increased social and economic development, surface processes such as the ones related to the urban heat island effect, land use and land cover change, and other anthropogenic activities are playing an increasingly important role in climate change. However, quantifying human activities is difficult because many of them are local or regional in nature, as are their impacts on climate, which results in further uncertainties and complexities. For instance, greenhouse gases and aerosols produced by the burning of fossil fuels may have opposing effects on air temperatures, potentially canceling each other; as a result, the urban heat island effect could have a different sign in different regions [Li et al., 2004c]. Land use and land cover also change greatly due to urbanization and changing agricultural practices. On the data-sparse and climatically complex Qinghai-Tibetan Plateau, land degradation and desertification caused by overgrazing and deforestation represent important land surface processes that can also affect climate. Biases caused by neglecting, overestimating, or underestimating the influences of human activities in reanalysis assimilation schemes are inevitable [Frauenfeld et al., 2005]. Therefore, developing an appropriate representation of the land surface could be an important further improvement for future reanalysis efforts.
 Finally, station locations worldwide tend to be dictated by convenience, i.e., they congregate near relatively easily accessible places, and are seldom located in polar and high elevation regions. Although the inclusion of remote sensing data since 1979 enhanced the quality of reanalysis forecasts, properly sited first-hand ground-based measurements are not replaceable. Given the current distribution of surface stations, attempting continental and even regional scale reanalyses with higher spatial resolution, such as the North American Regional Reanalysis, will likely further verify and eventually improve the global reanalysis systems.
 The ERA-40 data were provided by the Data Support Section of the Computational and Information Systems Laboratory at NCAR. NCAR is supported by grants from the National Science Foundation. NCEP-1 and NCEP-2 Reanalysis data were provided by NOAA/NCEP, Camp Springs, Maryland, USA, from their Web site at http://www.cpc.ncep.noaa.gov/products/wesley/reanalysis.html. This study was in part supported by the opening fund from the State Key Laboratory of Cryospheric Sciences, Cold and Arid Regions Environmental and Engineering Research Institute, Chinese Academy of Sciences (CAS); the CAS International Partnership Project “The Basic Research for Water Issues of Inland River Basin in Arid Region” (CXTD-Z2005-2); the Chinese Academy of Meteorological Sciences, China Meteorological Administration; and the U.S. National Science Foundation's Office of Polar Programs. We appreciate the thoughtful comments from the reviewers, whose suggestions helped improve this manuscript.