This paper documents various unresolved issues in using surface temperature trends as a metric for assessing global and regional climate change. A series of examples is provided, ranging from errors caused by temperature measurements at a monitoring station to undocumented biases in the regionally and globally averaged time series. The issues are poorly understood or documented and relate to micrometeorological impacts of a warm bias in nighttime minimum temperatures, poor siting of the instrumentation, effects of winds as well as surface atmospheric water vapor content on temperature trends, the quantification of uncertainties in the homogenization of surface temperature data, and the influence of land use/land cover (LULC) change on surface temperature trends. Because of the issues presented in this paper related to the analysis of multidecadal surface temperature trends, we recommend that greater and more complete documentation and quantification of these issues be required for all observation stations that are intended to be used in such assessments. This is necessary for confidence in the actual observations of surface temperature variability and long-term trends.
 The global average surface temperature trend is the climate metric that has been most used to assess the human impact on climate change [IPCC, 2001]. The data used to assess this trend have been concluded to be robust and able to accurately define this trend in tenths of a degree per decade (Climate Change Science Program (CCSP) report, Temperature Trends in the Lower Atmosphere: Steps for Understanding and Reconciling Differences, U.S. Climate Change Science Program, Washington, D. C., available at http://www.climatescience.gov/Library/sap/sap1-1/public-review-draft/sap1-1prd-all.pdf; hereinafter referred to as CCSP report, 2006). The CCSP report concluded that with respect to global average temperature trends,
“For observations since the late 1950s, the start of the study period for this report, the most recent versions of all available data sets show that both the surface and troposphere have warmed, while the stratosphere has cooled,”
while for tropical temperatures (20°S to 20°N),
“Although the majority of observational data sets show more warming at the surface than in the troposphere, some observational data sets show the opposite behavior. Almost all model simulations show more warming in the troposphere than at the surface. This difference between models and observations may arise from errors that are common to all models, from errors in the observational data sets, or from a combination of these factors. The second explanation is favored, but the issue is still open.”
 A basic assumption in the CCSP report (2006), however, is that all significant uncertainties with respect to the surface temperature trend assessments have been resolved. This paper demonstrates that major under- and unrecognized issues with the quantification of the surface temperature trends remain.
2. Definition of a Global Average Surface Temperature
“According to the radiative-convective equilibrium concept, the equation for determining global average surface temperature of the planet is

dH/dt = f − T′/λ,   (1)

where H … is the heat content of the land-ocean-atmosphere system…. Equation (1) describes the change in the heat content where f is the radiative forcing at the tropopause, T′ is the change in surface temperature in response to a change in heat content, and λ is the climate feedback parameter [Schneider and Dickinson, 1974], also known as the climate sensitivity parameter, which denotes the rate at which the climate system returns the added forcing to space as infrared radiation or as reflected solar radiation (by changes in clouds, ice and snow, etc.).”
 Thus T is the “global average surface temperature,” and T′ is a departure from that temperature in response to a radiative forcing f. It appears in equation (1) above as a thermodynamic proxy for the thermodynamic state of the Earth system. As such, it must be tightly coupled to that thermodynamic state. Specifically, changes in T must be proportional to changes in the radiation emitted at the top of the atmosphere. However, where is this temperature and its change with time, T′, diagnosed, and is it closely coupled?
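The balance expressed in equation (1) can be illustrated numerically. The sketch below integrates dH/dt = f − T′/λ forward in time with a simple heat reservoir H = C·T′; the heat capacity, feedback parameter, and forcing values are illustrative assumptions, not numbers taken from this paper. T′ relaxes toward the equilibrium value fλ with an e-folding time of Cλ.

```python
# Sketch (hypothetical parameter values): Euler integration of
# dH/dt = f - T'/lambda, with H = C * T' for an ocean mixed layer.
SECONDS_PER_YEAR = 3.15e7
C = 4.2e8        # heat capacity of ~100 m ocean mixed layer, J m^-2 K^-1 (assumed)
lam = 0.8        # climate feedback parameter, K per W m^-2 (assumed)
f = 3.7          # radiative forcing, W m^-2 (roughly a CO2 doubling)

def integrate(years, dt_years=0.01):
    """March T' forward in time under constant forcing f."""
    t_prime = 0.0                    # surface temperature departure T', K
    dt = dt_years * SECONDS_PER_YEAR
    for _ in range(int(years / dt_years)):
        dH_dt = f - t_prime / lam    # net heating of the system, W m^-2
        t_prime += dH_dt * dt / C    # convert stored heat to temperature
    return t_prime

print(round(integrate(100), 2))      # approaches the equilibrium f * lam
```

With these assumed values the e-folding time Cλ is about a decade, so after a century T′ is essentially at its equilibrium of fλ ≈ 3 K.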
 At its most tightly coupled, T is the radiative temperature of the Earth, in the sense that a portion of the radiation emitted at the top of the atmosphere originates at the Earth's surface. However, the outgoing longwave radiation is proportional to T⁴. A 1°C increase in the polar latitudes in winter, for example, would have much less effect on the change in longwave emission than a 1°C increase in the tropics. The spatial distribution of temperature change therefore matters, but equation (1) ignores it. A more appropriate measure of radiatively significant surface changes would be the change in the global average of T⁴.
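This point follows directly from the Stefan-Boltzmann law. The short sketch below (the two base temperatures are illustrative, not values from the paper) shows that a 1°C increase at a tropical surface temperature raises emission substantially more than the same increase at a polar winter temperature, and that the mean of T⁴ exceeds the fourth power of the mean temperature.

```python
# Sketch: emission goes as sigma*T^4, so identical 1 degC increments at
# different base temperatures change outgoing longwave radiation differently.
SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def emission_increase(t_kelvin, dt=1.0):
    """Extra blackbody emission from a dt warming at base temperature t."""
    return SIGMA * ((t_kelvin + dt) ** 4 - t_kelvin ** 4)

polar = emission_increase(250.0)    # illustrative winter polar surface
tropics = emission_increase(300.0)  # illustrative tropical surface
print(round(polar, 2), round(tropics, 2))  # tropics emits notably more extra energy

# Averaging T first and then raising to the fourth power loses information:
temps = [250.0, 300.0]
mean_t4 = sum(t ** 4 for t in temps) / len(temps)
t4_of_mean = (sum(temps) / len(temps)) ** 4
assert mean_t4 > t4_of_mean   # Jensen's inequality for a convex function
```

This is why the text argues for tracking the global average of T⁴ rather than the fourth power of the global average temperature.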
 In most applications of (1), T is not a radiative temperature, but rather the temperature at a single level of the atmosphere, usually close to the ground. The CCSP report (2006) presents three separate analyses of the global surface temperature trend that use land- and ocean-based observations to evaluate T′. As they reported,
“Over land, “near-surface” air temperatures are those commonly measured about 1.5 to 2.0 m above the ground level at official weather stations, at sites run for a variety of scientific purposes, and by volunteer (“cooperative”) observers [e.g., Jones and Moberg, 2003]. These stations often experience relocations, changes in instrumentation and/or exposure (including changes in nearby thermally emitting structures), effects of land-use changes (e.g., urbanization), and changing observing practices, all of which can introduce biases into their long-term records. These changes are often undocumented.”
“‘Near-surface’ air temperatures over the ocean (‘Marine Air Temperatures’ or MATs) are measured by ships and buoys at various heights from 2 to more than 25 m, with poorer temporal and spatial coverage than over land…. To avoid the contamination of daytime solar heating of the ships' surfaces that may affect the MAT, it is generally preferred to limit these to night MAT (NMAT) readings only. Observations of the water temperature near the ocean surface or “Sea Surface Temperatures” (SSTs) are widely used and are closely tied to MATs; ships and buoys measure SSTs within a few meters of the surface. The scale of the spatial and temporal coherence of SST and MAT anomalies is greater than that of near-surface air temperatures over land; thus a lower rate of oceanic sampling, in theory, can provide an accuracy similar to the more densely monitored land area.”
 These are the measured temperatures used to calculate T′. Thus there is variability in the actual height of the measurements, particularly over the oceans. Over land, most measurements are 1.5–2.0 m according to the CCSP report (2006). An important research question is whether temperatures at this level are sufficiently coupled to the radiative and thermodynamic characteristics of the Earth system. If not, (1) is not satisfied, and there are variations of T that have little to do with the underlying radiative balance. With the noise in the measure, any estimate of λ is fraught with inaccuracy.
 The observed average surface temperature over land is computed by averaging daily observed maximum and minimum temperatures. Section 3 shows that the 1.5–2-m minimum temperature over land is an example of a quantity that is not tightly coupled, given its strong sensitivity to the local land surface conditions, the overlying boundary layer thermodynamic stability, and the wind speed.
3. Difficulties With the Use of Observed Nocturnal Warming Trends as a Measure of Climate Trends
 One of the most significant features in the observed surface data set is the asymmetric warming between maximum and minimum temperatures. Minimum temperatures have risen about 50% faster than maximum temperatures in the observed surface data set since 1950 [Vose et al., 2005]. Thus this nocturnal warming is the largest component of the “global daily average” increasing temperature trends that are used as measures of global climate change and to which models have been compared. While global climate models have exhibited improved matches with the global daily average, they have not in general replicated the magnitude of the asymmetry in the surface observed maximum and minimum temperature trends [Stone and Weaver, 2003]. We show in this section that there is a positive temperature bias in assessing a global average land surface minimum temperature when this temperature is sampled at 2 m or so above the surface. This warm bias has thus far been ignored in multidecadal assessments of temperature trends.
 As the boundary layer cools at night under light winds, the greatest decrease in temperature occurs near the surface [e.g., Stull, 1988; Arya, 2001]. Unlike the daytime boundary layer where convective turbulence tends to reduce vertical gradients, in the nocturnal boundary layer the cooling suppresses turbulence and enhances vertical gradients. Thus the vertical variation in temperature in light winds can be huge with temperature changes of 6°C or more often occurring within 25 vertical meters of the surface (see Figure 1). This is why great care must be taken to avoid contaminating the climate record with measurements from sites that have changed even a meter or two in their height of observation.
 During the evening transition, as the boundary layer changes from a convective regime with a gain of sensible heat to a thermodynamically stable regime with a loss of sensible heat, the vertical variation in air temperature can show a large sensitivity to wind speed. For example, Figure 2a shows the near-surface difference in air temperature (measured in shielded and ventilated housings) over a wetland site in central Colorado during the evening transition as a function of wind speed. As the net radiation decreased from slightly positive (75–100 W m−2) to negative (−100 to −75 W m−2), the temperature profile changed from a weak lapse to a strong inversion, but only during light winds. As the wind speed increased, differences between the two heights of temperature measurement decreased (a decrease in the vertical variation in temperature). Over a vertical distance of less than 1 m the temperature at the lower sensor (1.07 m) was more than 0.4°C greater than at the upper sensor (1.85 m), with this difference decreasing as the wind speed increased. In this example, nearly a 0.6°C difference in air temperature would be reported during low wind speed conditions depending on the chosen observation height.
 In addition to the temperature bias solely due to observation height as a result of large nocturnal temperature gradients, the absolute air temperature is also affected by wind speed. Using the same example as above, during the evening transition the observed air temperature tended to increase with wind speed until roughly 4 m s−1 and then decreased or remained relatively constant (Figure 2b; 1.07-m results shown, but the same result was found for the 1.85-m observation). During typical nocturnal periods (net radiation between −100 and −75 W m−2) the recorded minimum air temperature varied by nearly 10°C at low wind speeds. This observational evidence of the transition from a cold light wind solution to a warm windy solution provides support for the modeling evidence described by Pielke and Matsui [2005].
 In addition to the thermodynamic stability and wind speed, the nocturnal boundary layer is sensitive to changes in land surface characteristics, such as heat capacity [Carlson, 1986; McNider et al., 1995a]. Additionally, it is also much more sensitive to external forcing such as downward longwave radiation from greenhouse gas forcing, water vapor, clouds, or aerosols than is the daytime boundary layer [Eastman et al., 2001; Pielke and Matsui, 2005]. The main reason for this sensitivity is that the nocturnal boundary layer is shallower than the daytime boundary layer. Thus heating of the surface due to infrared radiation or changes in heat capacity or conductivity of heat from the soil is distributed through a smaller air layer.
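The last point is simple heat budget arithmetic: the same added energy warms a shallow layer in inverse proportion to its depth. The sketch below uses a nominal 1 W m−2 flux and hypothetical layer depths (both assumptions for illustration, not values measured in the paper).

```python
# Sketch: the same extra downward flux warms a shallow nocturnal layer far
# more than a deep daytime layer, because the heat is spread through less air.
RHO = 1.2    # air density, kg m^-3
CP = 1004.0  # specific heat of air at constant pressure, J kg^-1 K^-1

def layer_warming(flux_w_m2, hours, depth_m):
    """Temperature rise if the flux is mixed uniformly through depth_m of air."""
    energy = flux_w_m2 * hours * 3600.0   # J m^-2 added to the air column
    return energy / (RHO * CP * depth_m)  # K

# 1 W m^-2 sustained over a 10-hour night (hypothetical depths):
shallow = layer_warming(1.0, 10, 50.0)    # light-wind nocturnal layer
deep = layer_warming(1.0, 10, 1000.0)     # deeper, well-mixed layer
print(round(shallow, 2), round(deep, 3))  # ~0.6 K versus ~0.03 K
```

The shallow-layer response of roughly half a degree is the same order of magnitude as the 1 W m−2 light-wind sensitivities discussed around Table 1.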
 This sensitivity of minimum temperature is a function of the nocturnal boundary layer depth, which is controlled by many parameters such as wind speed, surface roughness, and heat capacity [Shi et al., 2005]. As shown by Pielke and Matsui [2005], the impact of increased downward longwave radiation depends on the wind speed imposed on the nocturnal boundary layer. The combination of this sensitivity and the strong vertical gradients can have major impacts on observed and modeled temperatures. Figure 3 (originally from Pielke and Matsui [2005, Figure 3]) shows that even a slight reduction in nighttime cooling (by 1 W m−2) on light wind nights results in a much greater increase in the 1.5- and 2.0-m air temperature than even the temperature a few meters higher in the nighttime boundary layer. Neither temperature is a good proxy for the temperature of the land surface itself, nor are they tightly coupled to the overlying atmosphere.
 While variation in cooling in the nocturnal boundary layer can be large, the depth of the nocturnal boundary layer is very small, often less than 200 m. Thus because temperature changes in this shallow layer can be strongly influenced by land use changes or external forcing such as downward longwave radiation from greenhouse gases, aerosols, thin clouds/contrails, et cetera, they are not a robust measure of changes in the deep heat content of the atmosphere.
 Perhaps more troubling in interpreting minimum temperature trends is that the nocturnal boundary layer has dynamic feedbacks that can amplify external forcing or changes in surface characteristics. The Pielke and Matsui [2005] example showed that there is a large temperature response difference to surface forcing due primarily to the fact that the depth of the nocturnal boundary layer is dependent on wind speed. However, this was a static simulation that did not include the fact that the forcing itself can change the depth of the boundary layer by making it less stable. In a series of papers exploring the nonlinear dynamics of the stable boundary layer [McNider et al., 1995a, 1995b; Shi et al., 2005; Walters et al., 2007] it was shown that in certain parameter spaces the nocturnal boundary layer can rapidly transition from a cold light wind solution to a warm windy solution. In these parameter spaces, even slight changes in longwave radiative forcing or changes in surface heat capacity can cause large changes in surface temperatures as the boundary layer mixing changes. However, these temperature changes reflect changes in the vertical distribution of heat, not in the heat content of the deep atmosphere.
 In order for models to capture these transitions and feedbacks they require high resolution with vertical spacing of 5 m or less. Global models in general do not capture these important transitions. In these models the 1.5–2-m temperature is extrapolated from the lowest model layer and thus can only marginally reproduce actual nocturnal boundary layer behavior.
 The essence of a stable nocturnal boundary layer is to disconnect the near surface from the outer atmosphere. Thus the observed temperatures in a nighttime boundary layer are highly influenced by local conditions. In fact, Runnalls and Oke  use this physical sensitivity as a measure for detecting land use change or instrument site changes in the temperature record.
 Since T′ is constructed using the average of maximum and minimum temperatures, the inclusion of minimum temperatures from light wind nights necessarily introduces a bias in T′ as contrasted with a boundary layer average value of temperature. The use of this 1.5–2-m T′ in equation (1) will therefore overestimate the magnitude of the change of heat content (the radiative forcing) of the Earth's climate system, when compared to a model-estimated sensitivity using the lowest model layer temperature. As shown in Table 1 and confirmed by the observational evidence in Figure 2, the magnitude of the difference in the 1.5- and 2.0-m temperatures under light winds due to a 1 W m−2 forcing is on the order of 0.5 to over 1°C. This difference in nighttime cooling can be interpreted as a trend bias if the 1 W m−2 reduction in cooling occurred over several decades.
Table 1. Potential Temperature Increase at Different Levels From the Experiment With −49 W m−2 Cooling to the Experiment With −50 W m−2 Cooling
 This also means that the finding by Parker [2004, 2006] that there is no difference in trends between windy and calm nights is curious since the fundamental understanding of stable boundary layer physics requires that trends near the surface, for the same boundary layer warming, will be greater on light wind nights.
 Given these facts about the nocturnal boundary layer and its impact on the minimum temperatures in the observed climate data set, it is suggested that minimum temperatures measured in the surface data set are poor measures of radiative forcing in the Earth system as represented by equation (1). Further, because climate models have difficulty resolving the processes in the stable boundary layer, using daily average temperatures from the models and comparing them to the daily average from the observed data set can lead to erroneous conclusions about model fidelity.
 In summary, given the lack of observational robustness of minimum temperatures, the fact that the shallow nocturnal boundary layer does not reflect the heat content of the deeper atmosphere, and problems global models have in replicating nocturnal boundary layers, it is suggested that measures of large-scale climate change should only use maximum temperature trends.
4. Photographic Documentation of Poor Sitings
 Major problems with the microclimate exposure of a subset of surface Historical Climate Network (HCN) sites have been photographed [Easterling et al., 1996; Davey and Pielke, 2005]. The temperature instruments that are used in the HCN are often sited close to buildings, under trees, and near other local influences on the microclimate. These microclimate influences also change over time. Figure 4a shows an example of good site exposure (for Crawfordsville, Indiana). Figure 4b, however, shows the relative departure of the Crawfordsville surface temperature with respect to its climate division. As shown by C. Holder et al. (How consistent are surface air temperature data for climate change studies?, Climate Research, in review, 2007), this difference can be used as another measure of the micrometeorological (station scale) versus regional (division average) climate changes. As seen in Figure 4b, there are many documented station changes, each of which can alter the temperature data trends.
 The integrity of climatological observations is often compromised by poor environmental exposure of instruments. Examples of poor exposure are provided by three U.S. HCN (USHCN) sites in Kentucky: Greensburg (Figure 5), Leitchfield_2_N (Figure 6), and Hopkinsville (Figure 7). In each case, a combination of anthropogenic (e.g., asphalt and concrete surfaces, buildings) and natural features (e.g., trees and shrubs, slopes) of the microscale environment creates forcings that are not representative of the larger mesoscale environment.
 Undesired exposure and “urban bias” are not limited to large urban areas; they may also occur at rural sites, predominantly agricultural locations, and in small urban areas. Mahmood et al. [2006a] completed comparisons of temperatures involving geographically proximate stations and found that undesired exposure of instruments to their microenvironments resulted in biased measurement of temperature. While research in urban micrometeorology [Arnfield, 2003] recognizes the impacts of diverse surfaces on energy budgets, energy exchanges, and small-scale advection in urban settings, these biases also occur elsewhere. Many rural and small town stations, for example, have microscale exposures that are similar to those often associated with urban areas. As a result of poor exposure, a difference in average monthly minimum (maximum) temperatures of 3.8°C (1.6°C) was found between a pair of stations separated by a distance of 40 km (37 km) and located at similar elevations.
 The influence of exposure on temperature may also not be homogeneous over seasons. In one set of comparisons the presence of a vegetation canopy reduced warm season maximum temperature by more than 2.0°C at a USHCN site in Leitchfield, Kentucky, relative to a nearby Cooperative Observer Program (COOP) site with an open exposure. The canopy had a similar, but smaller, effect on minimum temperature. A seasonally variable bias may also be anticipated from an asphalt surface, where the surface contributes to higher maximum and minimum temperatures during the warm season and has the opposite effect during the cool season.
 Aside from seasonality, examination of comparative temperature variations at paired stations can reveal complex temporal patterns that are not stationary and are difficult to explain, especially when spatial metadata is insufficient. For example, annual growth of a nearby tree or bush can gradually alter the microclimate of a station and influence long-term temperature records. Eventually, vegetation may be trimmed, causing an abrupt shift in the time series. Likewise, an asphalt street or parking lot may gradually fade from black to gray and then later be resealed or repaved. Alternatively, a new building may appear in the vicinity of the temperature sensor producing a direct radiative forcing on temperature or an indirect forcing associated with a change in microscale circulation. Unfortunately, such continuous and discrete changes in the microenvironment of a station often go undocumented in National Weather Service station history forms. As a result, patterns of annual variability can include step changes and trend reversals that occur with no documented explanation. Therefore one of the critical deficiencies in many bias correction functions is the assumption of stationarity of bias over a period of time.
Peterson  concluded that any biases associated with the poor siting in eastern Colorado, when adjusted, did not affect estimates of regional temperature trends. However, in a response to the Peterson article, Pielke et al.  pointed out several issues which Peterson did not adequately investigate, including often undocumented station changes, ignored uncertainties in the adjustments, and land use/land cover change issues associated with climate station adjustments.
 The errors in HCN sites, however, are not simply restricted to the example locations examined in the studies of Davey and Pielke [2005] and Peterson, i.e., eastern Colorado; rather, they are representative of a more general, broader problem with the HCN sites, as illustrated earlier in this section. The lack of photographic documentation of the HCN sites is a remarkable omission, since the HCN network is used as part of the global analysis of surface temperature trends, such as reported in the CCSP report (2006).
5. Influence of Trends in Surface Air Water Vapor Content on Temperature Trends
 Near-surface air temperature trends are also significantly influenced by concurrent trends in surface air absolute humidity, since even with the same amount of heat in the near-surface air, that heat can be partitioned differently between its sensible and latent components. This issue has not been investigated in the assessment of multidecadal surface air temperature trends.
The heat content (moist enthalpy) of surface air is

H = CpT + Lq,   (2)

where H is the heat content in Joules per kilogram, Cp is the heat capacity of air at constant pressure, T is the air temperature, L is the latent heat of vaporization, and q is the specific humidity. With no change in H, this equation can be rewritten as

ΔT = −(L/Cp)Δq.   (3)
 As shown by Pielke, a 1°C increase in dew point temperature from 23°C to 24°C at 1000 mb caused by evaporation (which changes q from 18 to 19 g kg−1), for example, requires ΔT = −2.5°C for H to remain unchanged. In other words, with the temperatures used here the air temperature would have to increase by 2.5°C to produce the same change in H as a 1°C increase in dew point temperature. This effective temperature, which corresponds to the moist enthalpy content of the surface air, can be referred to as TE.
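The arithmetic behind this example can be checked directly from the moist enthalpy relation H = CpT + Lq, using standard textbook values of Cp and L (assumed below, not quoted from the paper).

```python
# Sketch: moist enthalpy bookkeeping for surface air, H = Cp*T + L*q,
# and the temperature change that offsets a humidity change at fixed H.
CP = 1004.0   # heat capacity of air at constant pressure, J kg^-1 K^-1
LV = 2.5e6    # latent heat of vaporization, J kg^-1

def equivalent_temp_change(dq_g_per_kg):
    """Delta-T (degC) keeping H fixed when q changes by dq, per equation (3)."""
    return -(LV / CP) * (dq_g_per_kg / 1000.0)

def t_e(temp_c, q_g_per_kg):
    """Effective (moist enthalpy) temperature T_E = T + (L/Cp)*q."""
    return temp_c + (LV / CP) * (q_g_per_kg / 1000.0)

# q rising from 18 to 19 g/kg, as in the text's example:
print(round(equivalent_temp_change(1.0), 1))   # about -2.5 degC
# a 0.2 g/kg per decade humidity trend, as discussed later in this section:
print(round(equivalent_temp_change(0.2), 1))   # about -0.5 degC per decade
```

The first result reproduces the −2.5°C figure in the text; the second shows why humidity trends of tenths of a g kg−1 per decade matter for equivalent temperature trends.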
 Thus if a surface measuring site (e.g., an HCN site) undergoes a local reduction in tree cover such that as a result q decreases, then even if the value of H were unchanged, there will be an increase in surface air temperature. The introduction of irrigation around the site, in contrast, will result in higher values of q, but a reduction of the surface air temperature. In both cases an analysis of land surface temperature trends based on such sites would be introducing a nonspatially representative trend into the data.
Davey et al.  examined whether there are long-term trends in the surface air absolute humidity for a variety of landscape types (see Figure 8), and found significant variations as a function of type. Moreover, only if the trends in surface air absolute humidity were zero would the use of surface air temperature be an accurate measure of the trends in surface air heat over this time period. The diagnosed trends over the period of the study were up to 0.2 g kg−1 per decade and more, which corresponds, using equation (3), to tenths to single digits of a degree C in equivalent surface air temperature trend.
 Ignoring concurrent trends in surface air absolute humidity therefore introduces a bias in the analysis of surface air temperature trends. A recent paper [Rogers et al., 2007] further confirmed the need to include both trends in temperature and absolute humidity in order to describe observed temperature trends. They found for a site in Ohio, for example, that the highest moist enthalpy (TE) occurred during the summer of 1995 when both temperature and absolute humidity were very high, while the hot summers of 1930–1936 had relatively low or negative anomalies of absolute humidity, and thus lower moist enthalpy (TE).
 The surface Bowen ratio is also a measure of the impact on the surface temperature with more sensible heat flux (i.e., larger Bowen ratio) leading to higher temperatures with the same heat content. Figure 9 illustrates that different wind speeds have a substantial effect on the Bowen ratio. This results in another confounding effect on the interpretation of long-term trends in near-surface temperatures, if there is also a long-term trend in winds. As seen in Figure 9, near-calm winds have a very different behavior from the other two, yet the light wind differs by nearly the same amount as the medium and heavy wind profiles. The surface winds also are of considerable importance in how quickly temperatures will fall later in the day and the resultant minimum temperature achieved overnight.
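The Bowen ratio effect can be sketched as a simple energy partition. The Bowen ratio values and available energy below are illustrative assumptions chosen to represent wet, intermediate, and dry surfaces; they are not measurements from the paper.

```python
# Sketch: the Bowen ratio B = sensible/latent partitions the available
# surface energy; a larger B sends more energy into heating the air,
# raising temperature for the same total heat input.
def sensible_flux(available_w_m2, bowen_ratio):
    """Sensible heat flux when available energy is split with ratio B."""
    return available_w_m2 * bowen_ratio / (1.0 + bowen_ratio)

# Same 400 W m^-2 of available energy, hypothetical surfaces:
for label, b in [("irrigated field", 0.2), ("grassland", 1.0), ("dry lot", 5.0)]:
    print(label, round(sensible_flux(400.0, b), 1))
```

With identical energy input, the assumed dry surface puts roughly five times more energy into sensible heating than the assumed irrigated one, which is the confounding effect on near-surface temperature described above.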
6. Uncertainties in the Homogenization of Surface Temperature Data
 Most surface climate monitoring stations have undergone changes in location or changes in the immediate microclimate which alter the relationship between that station's observations and the unknown, actual climate of the surrounding area. The time-dependent nature of these changes can cause climate trends estimated from such stations to deviate from the actual climate trends. This issue was identified as a concern by Willmott et al. As shown in this section, major uncertainties still remain with the quantification of the homogenization of multidecadal surface temperature trends.
 In response to this problem, three approaches are possible: (1) discard all stations with known or suspected siting or microclimate changes before estimating climate trends; (2) attempt to correct, on a station-by-station basis, errors caused by changes in location or microclimate; and (3) attempt to quantify the bias and uncertainty caused by the inclusion of contaminated stations in the climate trend estimates. These three approaches are not mutually exclusive, and item (3) should be done in any case because some contaminated stations are likely to sneak through no matter how careful the screening.
 Here we first consider adjustments associated with changes in a station's location or environment. We address the relative merits of (1) versus (2) as homogeneity corrections are currently practiced, specifically in the context of estimating long-term climate variations.
 Changes in location or microclimate can be sudden or gradual. Sudden changes are caused by station moves, instrument changes, or sudden changes in the immediate surroundings. Gradual changes are caused by instrument calibration drifts or slow changes to the immediate microclimate. Sometimes, sudden and gradual changes are closely linked, and corrections for one should not be made without corrections for the other. Hansen et al.  present the hypothetical example of a station subject to urbanization that is then moved back to a rural location. Another example would be a station around which trees grew and were finally cut down. In both cases, correcting for one type of change without the other makes the climate record less reliable.
 Adjustments for sudden changes may be based on documented station moves or undocumented change points in a station record. The relocation correction algorithm applied to the original USHCN data set uses documented station moves and estimates a correction on the basis of differences with neighboring stations before and after the station move [Karl and Williams, 1987]. A similar approach was followed in the development of the global HCN data set [Peterson and Vose, 1997] and in the data set of Jones . DeGaetano  and Pielke et al.  note that this correction algorithm has the side effect of replacing any underlying true climate trend at a particular site with the climate trend from surrounding stations for the data segment over which the correction is computed. An example of this effect is shown in Figure 10. While this may well be the best objective hypothetical climate record available for the corrected station, it is not useful for estimating regional trends, as one wishes to accumulate as many reliable but independent data points as possible.
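A minimal sketch of this kind of neighbor-difference adjustment, in the spirit of Karl and Williams [1987], is given below with hypothetical station and neighbor series (simplified and not actual USHCN data or the operational algorithm): the pre-move segment is offset so that its mean difference from a reference series matches the post-move difference.

```python
# Sketch of a neighbor-difference relocation adjustment (simplified,
# hypothetical data): offset the pre-move segment so its mean difference
# from a reference (neighbor-average) series matches the post-move one.
def adjust_for_move(station, reference, move_index):
    """Return station series with pre-move data offset to match post-move."""
    diff_before = [s - r for s, r in zip(station[:move_index], reference[:move_index])]
    diff_after = [s - r for s, r in zip(station[move_index:], reference[move_index:])]
    offset = (sum(diff_after) / len(diff_after)) - (sum(diff_before) / len(diff_before))
    return [s + offset for s in station[:move_index]] + station[move_index:]

# A 0.5 degC step at year 10, relative to flat neighbors, is removed:
neighbors = [10.0] * 20
station = [10.0] * 10 + [10.5] * 10
adjusted = adjust_for_move(station, neighbors, 10)
print(adjusted[0], adjusted[-1])   # the step is gone
```

Note the side effect discussed above: over the segment used to compute the offset, the corrected station inherits the behavior of the reference series, so any genuinely local trend is replaced by the neighbors' trend.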
 These relocation adjustments were not designed to preserve the underlying climate trends. In contrast, Lu et al. utilize a statistical technique that simultaneously finds the best fit parameters for sudden changes and long-term trends. Their method does not use neighboring data at all, but instead essentially finds the correction at times of known station moves that maximizes the fit of the climate record to a linear trend. Unlike the Karl and Williams [1987] method, this technique will fully retain the climate trend if the data record consists only of sudden changes superimposed on a steady trend. However, slow climate variations, such as the cooling observed in most data sets between 1940 and 1970, can cause the Lu et al. technique to infer sudden changes where none exist. Suppose that a long-term climate record is homogeneous except for a station move in 1955 that in reality had no discernible impact on the station's climate record. The optimal fit to a linear trend over the past century is obtained by imposing a positive jump in the temperature record at 1955 so that the period 1940–1970 shows warming of a similar magnitude as the adjoining periods (Figure 11).
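This failure mode can be reproduced with synthetic data. The sketch below is a simplified stand-in for the technique just described (not the actual Lu et al. algorithm): it grid-searches for the step size at the known move date that makes the de-stepped record most linear. Applied to a smooth record with mid-century cooling and a "move" during the cooling, it infers a sizable drop in the raw record, so the corrected record acquires a spurious positive jump.

```python
# Sketch (synthetic data): fitting "steady trend + step at a known move date"
# can invent a jump when the real climate simply varied smoothly.
def line_rss(y):
    """Residual sum of squares of the least-squares line through y."""
    n = len(y)
    xs = range(n)
    xm, ym = (n - 1) / 2.0, sum(y) / n
    sxx = sum((x - xm) ** 2 for x in xs)
    sxy = sum((x - xm) * (yy - ym) for x, yy in zip(xs, y))
    b = sxy / sxx
    a = ym - b * xm
    return sum((yy - (a + b * x)) ** 2 for x, yy in zip(xs, y))

def best_step(y, move, candidates):
    """Step size at `move` that makes the de-stepped record most linear."""
    return min(candidates, key=lambda c: line_rss(y[:move] + [v - c for v in y[move:]]))

# Smooth record: warming, cooling "1940-1970", warming; move "in 1955" (t = 55):
y = [0.01 * t for t in range(40)]
y += [y[-1] - 0.01 * (t + 1) for t in range(30)]
y += [y[-1] + 0.01 * (t + 1) for t in range(30)]
c = best_step(y, 55, [i / 100.0 for i in range(-50, 51)])
print(c)   # negative: a spurious drop is inferred at the move,
           # so "correcting" it imposes a positive jump on the later record
```

No real discontinuity exists in this record, yet the fit attributes a few tenths of a degree of the smooth cooling to the move, which is the behavior illustrated in Figure 11.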
 Corrections for urbanization utilize a variety of approaches for determining the magnitude of the urban adjustment. Karl et al.  compute the urbanization adjustment by regressing data from the entire network against urban population. Given a random distribution of urbanized stations, this approach probably does not introduce a bias in the climate change estimate for the network as a whole, but regional trends will be biased by regional differences in the effect of urbanization. Hansen et al.  compute linear fits to the differences between urbanized stations and neighboring stations and apply the resulting time-varying corrections to the urbanized station data. This approach in principle can account for any source of gradual microclimate change, although it is only applied to stations where urbanization is presumed to be present. The Hansen et al.  correction is similar in effect to the homogeneity corrections applied to other data sets, in that it replaces the climate trend at a particular station with the climate trend observed at neighboring stations.
 The imposition of climate trends from neighboring stations onto the climate record of stations subject to location or microclimate changes, as can happen with discontinuity corrections or urbanization corrections, should not alter the estimated regional or global trends. However, because the data are not independent, the accuracy of such estimates will be lower than expected. Furthermore, because the “corrected” data incorporate a weighted average of trends from neighboring stations, the variability of the climate trend measurements will be underestimated. Taken together, these two effects imply that existing estimates of the accuracy of global and regional climate change measurements based on surface data, such as those of Folland et al., while essential, are likely to underestimate the true uncertainty. The artificially suppressed station-to-station variability in long-term climate variations in the USHCN data may be one reason why Vose and Menne, using USHCN data, estimated that far fewer stations were needed to monitor climate change than did Janis et al., who used a more complete and less homogenized source of climate data. The Jones data set includes homogeneity corrections based on some stations with previous homogeneity corrections, further reducing the independence of the data.
 Homogenization of the climate record also requires adjustments for changes in instrumentation. Considerable efforts to homogenize land surface temperature data have included, for example, an adjustment for nonclimatic influences on the air sampled by surface thermometers [Peterson et al., 1998a, 1998b]. Although it is likely that “the random component of such errors tends to average out in large area averages and in calculations of temperature change over long periods” [Hansen et al., 1999], one issue is whether the changes of instrumentation through history are randomly distributed in space and time. One major concern is that the instruments used to observe surface temperature were upgraded generation by generation and region by region. In many cases the exposure of instruments and station sites has changed from year to year and from site to site.
 Early attention to this problem came from the former president of the Royal Meteorological Society, Edward Mawley [Mawley, 1897], and the former chief of the Climatological Division of the U.S. Weather Bureau, P. C. Day [Flora, 1920]. After identifying air temperature biases among six different thermometer exposures in a 5-year study, Mawley concluded that uniform instrument type and instrument exposure were required not only within one nation but throughout the world.
Flora's results, based on more than 2 years of continuous side-by-side measurements, indicated that the mean daily range of temperatures (maximum minus minimum) in “Sun” shelters located in an open area was 3.6 to 4.4°F greater than in shelters located in the shade of trees (collocated within 150 feet of the “Sun” shelters and with free circulation of air) [see Flora, 1920, Figures 10 and 11] in the summertime, when the trees were in full leaf, and 0.8 to 1.8°F greater in the wintertime, when the trees were bare. The instruments used in these two experiments were very similar to the instruments used today, that is, the Cotton Region Shelter (CRS) with liquid-in-glass maximum and minimum thermometers.
 About 100 years later, Gallo , Davey and Pielke , and Hubbard and Lin  examined similar issues and expressed similar concerns for the land surface temperatures used in assessing climate change. The Global Historical Climatology Network (GHCN) is a widely used monthly mean surface temperature data set, and it includes about 1200 USHCN stations, which are mostly rural, and about 370 U.S. first-order stations, which are mostly airport stations in the United States and U.S. territories in the Pacific Ocean [Hansen et al., 1999].
 The most significant change in surface temperature instrumentation in the USHCN took place in the middle and late 1980s, when the standard CRS was replaced with the Maximum-Minimum Temperature System (MMTS) (these stations represent over 60% of all USHCN stations). The most common temperature instrument changes at the U.S. first-order stations in the last century, by contrast, were the successive use of the CRS, the HY-06x hygrothermometer in the early 1960s, the HY-08x hygrothermometer in the mid-1980s, and the HO-1088 Automated Surface Observing System (ASOS) hygrothermometer in the mid-1990s (these stations are hereafter referred to as ASOS stations). Among the ASOS stations, three different HY-06x models and two different HY-08x models were deployed in sequence. Homogeneity adjustments do not fully address the effects of these two instrument series on the long-term temperature time series. The GHCN includes the CRS-versus-MMTS instrument change adjustment on the basis of statistical results [Quayle et al., 1991]. No instrument change adjustments other than the MMTS adjustment were applied directly in the GHCN and USHCN.
 In order to investigate the land surface temperature uncertainties associated with the instrument changes, two subsets of the USHCN monthly maximum and minimum temperature data were collected for comparisons of MMTS versus CRS and of ASOS versus CRS. For the MMTS comparison our selection of both CRS and MMTS stations was limited to those that had no station moves, no instrument height changes, and no instrument changes other than the MMTS during the period 1970 to 2003. The data we used were from the USHCN data set adjusted for time of observation (TOB) bias. The MMTS stations were also confined to stations with more than 171 months of data on each side of the MMTS transition.
 We found 163 CRS and 116 MMTS stations from the 1221 USHCN stations that met our criteria. Figure 12 shows the average difference between the MMTS and CRS stations. These values were derived from the 342 continuous monthly observations, with the MMTS installed at the midpoint of the record. The method was the same as that of Quayle et al., using an anomaly correlation coefficient weighted average to interpolate from surrounding CRS stations. The weighting method stems from the standard normal homogeneity test (SNHT) [Peterson et al., 1998a]. The most probable discontinuity in each series was identified in this study as a change point if it coincided with the instrument change to MMTS, as confirmed by the metadata. Using this criterion, the number of stations identified as inhomogeneous was 34 for maximum temperature and 24 for minimum temperature, whereas the number identified as homogeneous was 27 for maximum and 24 for minimum temperature series.
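A schematic of how an SNHT-style statistic flags the most probable discontinuity in a candidate-minus-reference difference series (a simplified illustration with hypothetical data; the operational test and station weighting are more involved):

```python
import numpy as np

def snht_change_point(diff_series):
    """Schematic SNHT on a candidate-minus-reference difference series:
    T(k) = k*mean(z[:k])**2 + (n-k)*mean(z[k:])**2 on standardized data;
    the k maximizing T marks the most probable discontinuity."""
    z = (diff_series - diff_series.mean()) / diff_series.std()
    n = z.size
    t = np.array([k * z[:k].mean() ** 2 + (n - k) * z[k:].mean() ** 2
                  for k in range(1, n)])
    return int(np.argmax(t)) + 1, float(t.max())

# Hypothetical difference series with a -0.4 degC shift at month 60,
# mimicking an instrument changeover confirmed by metadata.
rng = np.random.default_rng(1)
diff = rng.normal(0.0, 0.1, 120)
diff[60:] -= 0.4
k, t_max = snht_change_point(diff)
print(f"change point at month {k}, T_max = {t_max:.1f}")
```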
 The step changes due to the instrument changeover in Figure 12 are quite different in magnitude from the two constants −0.38 and +0.28°C universally applied in the GHCN or USHCN data sets, and they vary considerably. Exact agreement is not expected because only stations with significant change points are included here. The result suggests that some MMTS stations could have more than 1°C offsets in the inhomogeneous maximum temperatures but not necessarily in the minimum temperatures at the same stations. The reverse was found for the inhomogeneous minimum stations. Some MMTS stations did not show any detected discontinuities due to the MMTS changeover.
 For the ASOS stations there are no instrument change adjustments in the GHCN data set, although previous studies found that the original HO-083 hygrothermometers showed a significant warming bias of up to 1 to 2°C in monthly mean temperatures [Gall et al., 1992; Jones and Young, 1995] due to an extra heating source inside the hygrothermometer and an insufficient aspiration rate. Figures 13 and 14 illustrate the paired station differences between the ASOS and CRS stations for raw and fully adjusted data in the monthly USHCN. The differences have their own temporal pattern for each station pair and differ between maximum and minimum temperature.
 The current adjustments made in the USHCN did reduce the discontinuities in some segments to relatively small values; however, discontinuities clearly remain at instrument transitions even in the fully adjusted difference series for both station pairs, as shown not only by visual examination but also by the SNHT statistical method. The warming biases identified by Gall et al. and Jones and Young for the original HO-083 were not found at the Lafayette ASOS station or at the Winnemucca station (Figure 14).
 As with the MMTS changeover [Hubbard and Lin, 2006], this result suggests that although an instrument change may mark the position of a discontinuity, its magnitude probably depends both on the instrument transition itself and on micrometeorologically significant changes associated with coincident site moves, which can either enhance or cancel the bias depending on the specific microclimate in the vicinity of the ASOS station.
 There are therefore significant uncertainties introduced from each step of the homogenization adjustment. These likely vary geographically.
 Another recent paper on the issue of problems with data homogenization is that of Runnalls and Oke . They concluded that
“Gradual changes in the immediate environment over time, such as vegetation growth, or encroachment by built features such as paths, roads, runways, fences, parking lots, and buildings into the vicinity of the instrument site typically lead to trends in the cooling ratio series. Distinct régime transitions can be caused by seemingly minor instrument relocations (such as from one side of the airport to another, or even within the same instrument enclosure) or due to vegetation clearance. This contradicts the view that only substantial station moves, involving significant changes in elevation and/or exposure are detectable in temperature data.”
 In another paper, Changnon and Kunkel examined discontinuities in the weather records for Urbana, Illinois, a site with exceptional metadata and concurrent records when important changes occurred. They identified a cooling of 0.17°C caused by a nonstandard shelter height of 3 m from 1898 to 1948, a gradual warming of 0.9°C as the University of Illinois campus grew around the site from 1900 to 1983, an immediate 0.8°C cooling when the site moved 2.2 km to a more rural setting in 1984, and a 0.3°C cooling with the shift to MMTS in 1988. In this case the magnitude of the discontinuities could be accurately determined from concurrent observations rather than from nearby stations. The experience at the Urbana site reflects the kind of subtle changes described by Runnalls and Oke and underscores the challenge of making adjustments to a gradually changing site.
7. Degree of Independence of Land Surface Global Surface Temperature Analyses
 The raw surface temperature data from which all of the different global surface temperature trend analyses are derived are essentially the same. The best estimate that has been reported is that 90–95% of the raw data in each of the analyses is the same (P. Jones, personal communication, 2003). That the analyses produce similar trends should therefore come as no surprise. Indeed, this overlapping of raw data between different analyses of multidecadal surface temperature trends is an issue which has not received adequate scrutiny with respect to the value added of more than one analysis.
 Surface station density is an important factor in the monitoring of surface temperature trends [Janis et al., 2004], as it relates directly to the independence of the aforementioned surface temperature trend analyses. In particular, the robustness of these separate trend analyses decreases as station density decreases as evaluated on a global 5 degree by 5 degree grid (C. A. Davey and R. A. Pielke Sr., Comparing station density and reported temperature trends for land surface sites, submitted to Climatic Change, 2007; hereinafter referred to as Davey and Pielke, submitted manuscript, 2007). The highest station densities are found over the contiguous United States, Europe, and portions of East Asia [Vose and Menne, 2004; Davey and Pielke, submitted manuscript, 2007]. Tropical regions, however, have sparser surface station coverage, so the robustness of warming estimates in these regions is relatively small until further surface temperature data can be obtained. Inadequate sampling of tropical land areas might be a significant factor in the CCSP report (2006) finding that “the majority of observational data sets show more warming at the surface than in the troposphere…” while “all model simulations show more warming in the troposphere than at the surface.”
 Additionally, the northern and southern polar regions, where some of the largest warming trends are projected to occur, have some of the lowest station densities (Figure 15). This sparse sampling reduces the robustness of the temperature trend analyses for these regions. In large portions of the polar regions, there are no surface stations providing independent verification of modeled temperature trends.
 The degree of robustness among the available surface temperature trend analyses also varies as a function of continent, as indicated by continental variations in station density (Figure 16). For example, although trend analyses are likely robust in portions of North America and Europe (Figures 16a and 16c) because of higher station densities, robustness could be questioned in portions of Africa and Asia, where significant portions of these regions have little or no surface station coverage.
8. Relationship Between In Situ Surface Temperature Observations and the Diagnosis of Surface Temperature Trends From Reanalyses
 An independent methodology can be used to further assess the ability of in situ surface air temperature data to robustly measure spatially representative multidecadal surface temperature trends. Kalnay and Cai (hereafter referred to as KC) introduced an alternative approach, the “observation-minus-reanalysis” (OMR) method, to estimate the impact of urbanization and land use. The rationale for this approach is that a reanalysis (a statistical combination of a 6-h forecast and observations), such as the NCEP-NCAR reanalysis [Kalnay et al., 1996], is not sensitive to surface observations over land: in the computation of land surface energy transfers, the model very rapidly “forgets” surface temperature observations, which are not explicitly used in the reanalysis data assimilation.
 The reanalysis, combined with its model parameterizations of surface processes, creates its own estimate of surface fields from the upper air observations. Furthermore, because of the role of horizontal advection and information propagation by the global 6-h forecast used as a first guess in the assimilation of observations, the surface parameters in a reanalysis have less dependence on local characteristics than the actual surface observations. As a result, the reanalysis is not able to include surface urbanization or land use effects even though it should show climate change effects to the extent that they affect the observations above the surface [Kistler et al., 2001]. Moreover, Cai and Kalnay  showed that a reanalysis made with a frozen model (as in the case of the NCEP/NCAR reanalysis) can still detect an anthropogenic trend present in observations assimilated by the reanalysis system essentially at its full value provided abundant observations are used by the data assimilation system. It follows that it would be possible to attribute the differences between monthly or annually averaged surface temperatures derived from observations and from reanalysis primarily to urbanization/land use changes [Kalnay et al., 2006], and other local land cover effects, although a portion of differences might also be due to errors in interpolating reanalysis data to instrument height, particularly in the stable nocturnal boundary layer.
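The OMR idea can be sketched numerically. In this hypothetical example (synthetic series; the trend magnitudes of 0.2°C/decade large-scale and 0.1°C/decade local land use effect are assumptions for illustration), the observation contains both a large-scale signal and a local land use signal, while the reanalysis contains only the large-scale part, so the OMR trend approximately isolates the local component:

```python
import numpy as np

def trend_per_decade(t, y):
    """Least-squares linear trend in degC/decade."""
    return 10.0 * np.polyfit(t, y, 1)[0]

# Hypothetical annual-mean series for one grid cell (assumed
# magnitudes, illustrative only).
rng = np.random.default_rng(2)
years = np.arange(1979, 2005, dtype=float)
large_scale = 0.02 * (years - 1979)      # shared climate signal
local_land_use = 0.01 * (years - 1979)   # seen only by the station
obs = large_scale + local_land_use + rng.normal(0.0, 0.05, years.size)
reanalysis = large_scale + rng.normal(0.0, 0.05, years.size)

# OMR removes the shared large-scale signal, leaving (approximately)
# the local land use component.
omr_trend = trend_per_decade(years, obs - reanalysis)
print(f"OMR trend: {omr_trend:+.2f} degC/decade")
```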
 1. Surface temperature anomalies averaged over the Northern Hemisphere (NH) derived from three reanalyses (ERA40, NNR, and NNR II (R2)) and from two observational data sets show a gradual warming trend in both the reanalyses and the observations (Figure 17). It is also evident that the observations exhibit a larger warming trend than the reanalyses (Figure 17a). As a result, the OMRs show a positive trend (Figure 17b), larger for NNR or R2 than for ERA40. The difference between ERA40 and the observations is smaller than that between the NNR and the observations because ERA40 includes the radiative effect of increasing CO2 and also indirectly assimilates surface air temperatures, using them to initialize soil temperature and moisture, whereas the NNR does not (ERA40 therefore does not permit an independent assessment of the surface temperature trends).
 2. Scatterplots of the decadal OMR trends at each grid point against the annual mean Normalized Difference Vegetation Index (NDVI) [Bounoua et al., 1999] show that the decadal OMR trends (Figures 18c and 18e) are inversely proportional to the NDVI (r = −0.32 and r = −0.67, respectively), demonstrating that the strong (weak) surface temperature increase in response to surface barrenness (greenness) is adequately represented by the OMR [Lim et al., 2007]. The clearer response for GHCN – NNR reflects the absence of surface information in the NNR data assimilation procedure. In contrast, the decadal trends in the GHCN observations (Figure 18a) show no significant relationship with the NDVI (r = −0.07), presumably because they reflect all climate change signals.
 The trend in the NNR (Figure 18d) is significantly proportional to the vegetation index (r = 0.56), indicating that the NNR misses the strong (weak) surface temperature increase in response to barrenness (greenness), that is, to low (high) vegetation index [Xue and Shukla, 1993; Dai et al., 2004; Hales et al., 2004]. This failure to reproduce the surface climate change signal is also present, to a lesser extent, in the ERA40 (r = 0.17) (Figure 18b).
 3. The decadal OMR trends (Figure 19b) [Lim et al., 2005] as a function of land type, obtained from the Moderate Resolution Imaging Spectroradiometer (MODIS) [Friedl et al., 2002], and the trend values summarized in Table 2 [Lim et al., 2005] show that the warming trends are ordered as barren area (≥0.3°C/decade) > big urban area (0.2–0.25°C/decade) > small urban area ≈ agricultural area ≈ mixed forest (∼0.2°C/decade) > broadleaf forest (<0.1°C/decade). The OMR trends reproduce the conclusion of previous modeling work that barren or urban surfaces with limited soil moisture exhibit a strong surface temperature increase because of weaker evaporative cooling [Hales et al., 2004; National Research Council (NRC), 2005], whereas highly vegetated areas, such as low-latitude broadleaf forest, do not, because of high soil moisture and strong transpiration/evaporative cooling.
Table 2. Decadal OMR Trends (GHCN – NNR, CRU – NNR) Averaged Over Five Land Cover Categoriesa
Land Cover Category
GHCN – NNR, K/decade
CRU – NNR, K/decade
Big urban area
Small urban area
Broadleaf forest area
aFrom Lim et al. [2005]. OMR: observation-minus-reanalysis; GHCN: Global Historical Climatology Network; NNR: NCEP/NCAR reanalysis; CRU: Climatic Research Unit.
 Consequently, the overall results strongly support the conclusion that the OMR reflects the impact of different land cover types on surface climate change. The OMR approach thus facilitates isolating the impact of individual land cover types on long-term surface temperature trends by removing the large-scale temperature change signal, as recorded in the reanalysis, from the surface observations.
 The North American Regional Reanalysis (NARR) (http://www.emc.ncep.noaa.gov/mmb/rreanl/) [Mesinger et al., 2004] can be used to further assess the spatial representativeness of the observed surface temperature data. There is evidence that the surface data have large local influences: as reported in slide 15 at http://wwwt.emc.ncep.noaa.gov/mmb/rreanl/narr.ppt#296,15, the reanalysis deteriorated significantly when 2-m air temperatures were assimilated (F. Mesinger, personal communication, 2004). One explanation for this deterioration is that the 2-m temperatures are not spatially representative.
 This can be seen in the example of 1979–2004 trends in the 2-m NARR surface temperatures over the United States (Figure 20a). The trends possess regional variations but do not include known local-scale variations or the strong gradients that occur over small areas, especially in the western United States (Figure 20a), such as those documented by Pielke et al. [2002, 2007]. The spatial changes obtained by computing the differences between the two decades (Figure 21a) show the same patterns. As expected, the 700-mb and 500-mb trend distributions depict more uniform patterns (Figures 20b, 20c, 21b, and 21c). In addition, while an increasing trend is observed at all three levels, the 2-m anomalies exhibit a lower trend during the last decade, as expressed by the 1979–1990 and 1991–2004 trends (Figure 22).
 The NARR data set appears to be coherent, with increasing temperature trends observed at all levels, but the trends in the 2-m surface temperature exhibit greater spatial variability and larger amplitudes compared to the 700 mb and 500 mb temperatures (Figures 20a, 20b, and 20c and Table 3). In addition, some prominent features of the 2-m trends, such as the lowest values over Colorado and Montana and the strong gradient over central California, are also found at the 700 mb level. The spatially averaged temperature anomalies at all three levels (Figures 22–24) are characterized by generally well-correlated interannual fluctuations. The intermediate 700 mb level is well-correlated with both 2 m (correlation coefficient: 0.82) and 500 mb (0.89).
Table 3. 1979–2003 NARR Temperature Trends Over the United States: Value Range and National Averagea
Trend Value Range
National Average, United States
Temperature at 2 m
Temperature at 700 mb
Temperature at 500 mb
aNARR: North American Regional Reanalysis. Units are °C/10 years.
 To examine the variation of the linear trends over time and their consistency, linear trends in 10-year running windows were computed for all levels (Figure 25). At the 2-m level it is clear that the large negative trends of the early 1990s account for the low value of the 1991–2004 trend mentioned earlier (Figure 22). At the 700 mb level the trends in the 10-year running windows are much less variable on a decadal timescale. Regarding the partial trends, there is still a clear contrast between the relatively low values of the first decade and the peak of the late 1990s (up to 0.86°C), and the 1991–2004 period has a value more than 3 times higher than that of the previous period. The 500 mb level exhibits similar patterns. Overall, Figure 25 not only illustrates the variation of the different linear trends over time but also highlights the correlation of anomalies at the different levels. The 2-m temperatures do have larger temporal variations in the earlier part of the record.
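The running-window trend calculation used for this kind of analysis can be sketched as follows (a synthetic anomaly series, not NARR data; the 10-year window length follows the text):

```python
import numpy as np

def running_window_trends(years, anomalies, window=10):
    """Linear trend (degC/decade) in each `window`-year running window."""
    trends = []
    for start in range(years.size - window + 1):
        seg_y = years[start:start + window]
        seg_a = anomalies[start:start + window]
        trends.append(10.0 * np.polyfit(seg_y, seg_a, 1)[0])
    return np.array(trends)

# Hypothetical anomaly series, 1979-2004: a steady 0.3 degC/decade
# trend plus a slow oscillation, so individual 10-year windows can
# differ substantially from the long-term trend.
years = np.arange(1979, 2005, dtype=float)
anoms = 0.03 * (years - 1979) + 0.2 * np.sin((years - 1979) / 3.0)
trends = running_window_trends(years, anoms)
print(f"{trends.size} windows, trends from {trends.min():+.2f} "
      f"to {trends.max():+.2f} degC/decade")
```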
 To evaluate the influence of the June 1991 to June 1993 Pinatubo volcanic eruption, this time period is removed from the time series. The cooling trend that has been observed during this period has been explained by the release of a huge amount of sulfur dioxide in the atmosphere [Parker et al., 1996]. The removal of this time period results in higher trend values (Table 4), especially at the 2-m level (a 16.18% increase for the overall period and a 57% increase during the 1993–2004 period). This confirms the robustness of the NARR data set, which was able to capture the cooling trend at all three levels, as shown by the related negative anomalies in Figures 22–24.
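The sensitivity test of removing the June 1991 to June 1993 episode can be sketched as follows (a synthetic series with an imposed mid-record cooling, not NARR data); masking the cooling episode raises the fitted trend, as described above:

```python
import numpy as np

def trend_per_decade(t, y):
    """Least-squares linear trend in degC/decade."""
    return 10.0 * np.polyfit(t, y, 1)[0]

# Hypothetical monthly series, 1979-2004: a steady 0.3 degC/decade
# trend with a -0.5 degC cooling imposed from June 1991 to June 1993
# (standing in for the Pinatubo episode).
months = np.arange(1979.0, 2005.0, 1.0 / 12.0)
pinatubo = (months >= 1991.5) & (months < 1993.5)
anoms = 0.03 * (months - 1979.0) - 0.5 * pinatubo

full_trend = trend_per_decade(months, anoms)
masked_trend = trend_per_decade(months[~pinatubo], anoms[~pinatubo])
print(f"full {full_trend:+.3f}, episode removed {masked_trend:+.3f} "
      "degC/decade")
# Removing the mid-record cooling episode raises the fitted trend.
```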
Table 4. Temperature Anomaly Trends: Comparison Between the Full 1979–2004 Period and the 1979–2004 Period Minus the June 1991 to June 1993 Volcanic Episode (Pinatubo Cooling)a
Full Period, °C/10 years
aThe volcanic episode has also been removed from the partial trends. Full period refers to the whole study period, 1979–2004, and its subperiod values refer to 1979–1990 and 1991–2004. Pinatubo refers to the whole period minus the Pinatubo volcanic episode, and its subperiod values refer to January 1979 to May 1991 and July 1993 to December 2004. In all cases the removal of the volcanic episode results in a positive change (higher trend values).
 The comparison between HCN surface temperature observations and the NARR 2-m temperature across the continental United States is a recommended next step in this analysis and will be reported in a subsequent study.
9. Influence of Land Use/Land Cover Change on Surface Temperature Trends
 With the exception of urban effects the influence of land use/land cover (LULC) change on surface temperature trends has been largely overlooked in multidecadal assessments. The influence of local landscape on surface temperature observations can be significant even without landscape change. As shown, for example, by Hanamean et al. , variations in the amount of transpiring vegetation through the growing season can affect the observed minimum and maximum temperatures. Hanamean et al. found that the percent of variance in surface temperature explained by variations in the amount of transpiring vegetation increased by a mean of 6% for the maxima and 8% for the minima over the period March–October when the amount of green vegetation was quantitatively included in the analysis.
Christy et al. , in an extensive and detailed analysis, showed that temperature trends in California varied significantly by region evidently due to land use changes. Comparison of trends between the central valley, which underwent major land use change, and those in the foothills and Sierras, with less land use change, showed marked differences. Central valley temperatures had significant nocturnal warming and daytime cooling over the period of record. The conclusion is that as a result of increases in irrigated land, daytime temperatures are suppressed due to evaporative cooling, and nighttime temperatures are warmed in part due to increased heat capacity from water in soils and vegetation. Mahmood et al. [2006b] also found similar results for irrigated and nonirrigated areas of the northern Great Plains.
 This issue is examined further using the U.S. Climate Normals temperature and precipitation data [National Climatic Data Center (NCDC), 2002], the data set of climatological values of temperature and precipitation for the most recent 30-year interval (presently 1971–2000). In addition to defining “normal” (meaning “average” [Pielke and Waage, 1987]) temperatures for the stations included in the data set, the temperature data are critical to the development of several derivative data sets, including frost/freeze probabilities and heating and cooling degree days.
Hale et al.  examined temperature trends at the normals stations before and after periods of dominant LULC change. The analysis included temperature data for stations near sample blocks in which LULC has been determined for five dates during the period 1973 to 2000 as part of a study of LULC trends (USGS Land Cover Trends Project) within the conterminous United States [Loveland et al., 2002]. The normals temperature data set includes several adjustments “made to the data before the normals were calculated” [NCDC, 2002]. These adjustments include those for time of observation biases [Karl et al., 1986] and quality control [Peterson et al., 1998a]. Additionally, temperature inhomogeneities in the data arising from changes in station location or instrumentation have been addressed based on methods described by Peterson and Easterling  and Easterling and Peterson .
 Eight additional ecoregions have been analyzed beyond the original 23 (out of 84 ecoregions within the conterminous United States) of the Hale et al. analysis. These additional eight ecoregions contributed 76 more normals stations that intersected Trends Project LULC analysis blocks. The results from the additional eight ecoregions and 76 normals stations (Table 5) are very similar to the previous results reported by Hale et al.
Table 5. Comparison of Temperature Trend Results for the Stations Included by Hale et al. With an Additional Eight Ecoregions and 76 Additional Normals Stations Included in the Analysisa
Trend Prior to LULC Change
Trend After LULC Change
aThe number of stations with significant trends (positive or negative), prior to or after land use/land cover (LULC) changes, is indicated.
 Temperature trends were primarily insignificant prior to the period during which the greatest single type of LULC change occurred around normals stations. Additionally, those trends that were significant were generally divided equally between warming and cooling trends (Table 5). However, after periods of dominant LULC change, significant trends in minimum, maximum, or mean temperature were far more common, and 90% or more of these significant trends were warming trends.
 The average temperature trend for the stations with significant trends in mean temperatures prior to dominant land cover change was 0.08°C/decade, while the trend after the dominant change in land cover was 1.58°C/decade (Figure 26). These results are similar to those observed by Hale et al.  and affirm the possibility that nearby changes in LULC may be influencing the temperature trends observed at normals temperature stations.
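The trend significance screening described above can be illustrated with a generic least-squares slope test (hypothetical station series; the original study's exact test statistic is not specified here):

```python
import numpy as np

def slope_with_t(years, temps):
    """Least-squares trend and a t statistic for H0: slope = 0
    (a generic sketch, not necessarily the study's exact test)."""
    x = years - years.mean()
    slope = (x @ (temps - temps.mean())) / (x @ x)
    resid = temps - temps.mean() - slope * x
    se = np.sqrt((resid @ resid) / (years.size - 2) / (x @ x))
    return slope, slope / se

# Hypothetical post-LULC-change station series with an imposed
# 1.58 degC/decade trend (the magnitude reported above) plus noise.
rng = np.random.default_rng(3)
years = np.arange(1985, 2001, dtype=float)
temps = 12.0 + 0.158 * (years - 1985) + rng.normal(0.0, 0.3, years.size)
slope, t_stat = slope_with_t(years, temps)
print(f"trend {10 * slope:+.2f} degC/decade, t = {t_stat:.1f}")
```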
 The impact of land surface changes on daytime maximum temperatures was further diagnosed through sensitivity experiments with a coupled land surface boundary layer model. These experiments included the impact of changes in vegetation cover and density as well as the impact of drought, or soil moisture availability, at the monitoring site. The model was tested for sites with 10 soil textures (Clay Loam: CL; Loam: Lm; Loamy Sand: LS; Sand: Sd; Sandy Clay: SC; Sandy Clay Loam: SCL; Sandy Loam: SL; Silty Clay: SIC; Silty Clay Loam: SICL; and Silty Loam: SIL). For each soil texture, vegetation cover, Leaf Area Index (LAI), and soil moisture were varied in turn: vegetation cover over 10%, 40%, 80%, and 100%; LAI over 0.5, 2, 4, and 6; and soil moisture availability over 10%, 33%, 67%, and 90% of field capacity (which depends on soil texture). The results of these 120 model runs are summarized in Figures 27a–27c. The model results further confirm that the maximum daytime temperature is highly sensitive to the soil texture, the availability of soil moisture, and the vegetation cover or density. Depending on the soil texture, the air temperature can vary by 2–3°C with each of these variables. With increasing vegetation, the emissivity of the landscape, as well as the albedo and soil moisture availability, changes, further modifying daytime temperatures.
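The stated run count (120) is consistent with varying one factor at a time for each soil texture, i.e., 10 textures × (4 + 4 + 4) settings. A sketch of that design, with a placeholder standing in for the coupled model, is:

```python
# Sketch of the experiment design implied by the run count: 120 runs
# = 10 soil textures x (4 vegetation covers + 4 LAI values + 4 soil
# moisture levels), varying one factor at a time. run_model is a
# placeholder for one simulation with the coupled land surface
# boundary layer model (the actual model is not reproduced here).
SOILS = ["CL", "Lm", "LS", "Sd", "SC", "SCL", "SL", "SIC", "SICL", "SIL"]
FACTORS = {
    "veg_cover_pct": [10, 40, 80, 100],
    "lai": [0.5, 2, 4, 6],
    "soil_moisture_pct_fc": [10, 33, 67, 90],
}

def run_model(soil, factor, value):
    """Placeholder: one simulation of daytime maximum temperature."""
    return {"soil": soil, factor: value}

runs = [run_model(soil, factor, value)
        for soil in SOILS
        for factor, values in FACTORS.items()
        for value in values]
print(len(runs))  # 10 x (4 + 4 + 4) = 120
```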
10. Discussion and Conclusions
 This paper has identified a range of issues with the use of the existing land surface temperature data to assess multidecadal trends in surface air temperature. Since the analyses from such data are so important in national and international assessments of climate change (e.g., see CCSP report (2006) and National Research Council (NRC)), the issues that we discuss in this paper need to be evaluated in depth.
 These issues, which are either not recognized at all in the assessments or are understated, include the identification of a warm bias in nighttime minimum temperatures, poor siting of the instrumentation used to measure temperatures, the influence of trends in surface air water vapor content on temperature trends, the quantification of uncertainties in the homogenization of surface temperature data, and the influence of land use/land cover change on surface temperature trends. The degree of independence of the different analyses (e.g., GISS, NCDC, and the UK Met Office) also needs to be quantified. Evaluating the relationship between in situ surface temperature observations and the diagnosis of surface temperature trends from reanalyses will also permit a quantitative evaluation of how accurately the surface temperature trends diagnose the lower tropospheric temperature trends.
 A major conclusion is that, as a climate metric to diagnose climate system heat changes (i.e., “global warming”), the surface temperature trend, especially if it includes the trend in nighttime temperature, is not the most suitable metric. As reported by Pielke, the assessment of climate system heat changes should be performed using the more robust metric of ocean heat content changes rather than surface temperature trends. If temperature trends are to be retained in order to estimate large-scale climate system heat changes (including a global average), the maximum temperature is a more appropriate metric than the mean daily average temperature. This paper presents reasons why the surface temperature is inadequate for determining changes in the heat content of the Earth's climate system.
 This work was supported by USGS contract 04CRAG0032, F/DOE/ University of Alabama in Huntsville's Utilization of Satellite Data for Climate Change Analysis (DE-FG02-04ER 63841 through the University of Alabama at Huntsville), and NASA grants THP NNG04GI84G, IDS NNG04GL61G, LULC NNG04GB87G, and NSF-ATM 0233780. Partial support for this work was provided by NSF grant ATM-0417774 at the University of Alabama in Huntsville. We thank Jim Angel of the Illinois State Water Survey for his very valuable comments on our paper. Craig Loehle is also acknowledged for his input. We thank Dallas Staley for her outstanding contribution in editing and finalizing the paper. Her work continues to be at the highest professional level.