Sensitivity of the attribution of near surface temperature warming to the choice of observational dataset



[1] A number of studies have demonstrated that much of the recent warming in global near surface temperatures can be attributed to increases in anthropogenic greenhouse gases. While this conclusion has been shown to be robust in analyses using a variety of climate models, there have not been equivalent studies using the different available observational datasets. Here we repeat previously reported analyses using an updated observational dataset and other independently processed datasets of near surface temperatures. We conclude that the choice of observational dataset has little impact on the attribution of greenhouse gas warming and other anthropogenic cooling contributions to observed warming on a global scale over the 20th century; however, this robust conclusion may not hold for other periods or for smaller sub-regions. Our results show that the dominant contributor to global warming over the last 50 years of the 20th century is greenhouse gas forcing.

1. Introduction

[2] The Intergovernmental Panel on Climate Change (IPCC) reviews of the scientific literature strengthened its confidence that an increase in greenhouse gas concentrations had caused most of the observed global warming over the previous 50 years from ‘likely’ [Mitchell et al., 2001] in 2001 to ‘very likely’ [Hegerl et al., 2007] in 2007. The latest review also deduced that the anthropogenic warming was detectable in a growing number of observations of climate change. Much of the evidence for the IPCC's assessment was from detection and attribution studies [International Ad Hoc Detection and Attribution Group, 2005; Hegerl et al., 2007] which compared observations of changing climate with climate model simulations to deduce the detectability of different factors in the data. One of the most robust analyses of climate change has been on near surface temperature change (TAS). The length, coverage and relative accuracy of observations of TAS have made them very attractive for analysis, and such studies were amongst the first to detect and attribute a human influence on climate change. The influences of greenhouse gases, sulphate aerosols (and other anthropogenic factors) and natural forcings have been attributed in 20th century changes in TAS. These analyses [Hegerl et al., 2007, and references therein] have almost universally used versions of one dataset of TAS, HadCRUT [Brohan et al., 2006].

[3] Datasets of observed quantities often contain uncertainties and observations of TAS are no exception. Errors in measurements and gridbox sampling, instrumental biases, and changes in the global coverage all contribute to observational uncertainties [Brohan et al., 2006]. One study [Hegerl et al., 2001] investigated the impact of gridbox sampling error - changes in the number of observations in a gridbox - and concluded that the impact on the detection of an anthropogenic influence on temperature trends was small. While other datasets of global TAS have been constructed by different institutions, using a variety of analysis methods, there has not been a methodical analysis of the impact of the choice of observational dataset on detection and attribution studies.

[4] A number of studies and assessments using HadCRUT and the same model simulations detected greenhouse gases, other anthropogenic and natural influences on 20th century TAS changes [Stott et al., 2003a, 2006a; Huntingford et al., 2006; Hegerl et al., 2007, section 9.4.1] (referred to in this study as S03, S06, H06 and IPCC07 respectively or collectively as the HadCM3 detection and attribution (HDA) studies). In this study we repeat the analysis in the HDA studies but using the most up-to-date version of HadCRUT and other available datasets of TAS to explore the impact on the detection results of the choice of dataset.

2. Data

[5] In this study four globally gridded datasets of near surface temperatures are examined. The institutions of the Climate Research Unit (University of East Anglia, UK) and the Hadley Centre (Met Office, UK) have been producing a blended dataset of near surface land and ocean surface temperatures for a number of years [Parker et al., 1994; Jones et al., 1999, 2001; Jones and Moberg, 2003; Brohan et al., 2006]. These datasets have been called ‘HadCRUTn’, with ‘n’ signifying the version of the dataset. The current dataset is called HadCRUT3 [Brohan et al., 2006] and is blended from land near surface temperatures, CRUTEM3, and sea surface temperatures, HadSST2 [Rayner et al., 2006]. In this study we use a variant of the dataset, HadCRUT3v, which is adjusted to correct for changes in variance due to the number of observations within a gridbox not being constant with time. Three other up-to-date gridded datasets of global blended land and ocean surface temperatures are widely used in climate research: GISS [Hansen et al., 2006] produced by NASA Goddard Institute for Space Studies, NCDC [Smith et al., 2008] produced by NOAA's National Climatic Data Center and JMA [Ishii et al., 2005; Japan Meteorological Agency, Global average surface temperature anomalies, 2010, available at] produced by the Japan Meteorological Agency. A variety of sources of station, ship and buoy measurements are used by the datasets, with many of the observations being used in all four datasets. The datasets differ in how the raw data has gone through quality control, homogenization adjustment and bias correction, as well as in the processing of the data to a gridded product. For instance, HadCRUT3 and JMA have areas of missing data where no observations are available, while GISS use data within 1200 km to calculate a value at a grid point and NCDC use large area averages from low-frequency components of the data and spatial covariance patterns for the high-frequency components.
A summary of the main differences between the HadCRUT3, GISS and NCDC datasets is given by Kennedy et al. [2010] (see also the references for the datasets). Using these four datasets will partially sample the range of uncertainties associated with the measurements and how they are processed.

[6] The global annual mean temperature variations for all four datasets are given in Figure 1, showing the familiar long term warming over the last 100 years or so. While there are strong similarities between the datasets there are also some differences; for instance, the largest warming trend between 1900 and 2009 is 0.81K/century, for HadCRUT3v, and the smallest is 0.69K/century, for GISS (Figure 1). For the analysis in this study annual means (1 December to 30 November of the following year) of all the observational datasets were estimated from the monthly means, with each grid point requiring at least 75% of the months to be available for an annual mean to be calculated; otherwise the grid point is set to the missing data value. For the following analysis 10 year means covering the decades 1900–1909, 1910–1919, etc., were calculated, allowing at most 50% of the annual mean data to be missing at a grid point for a decadal mean to be calculated.
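The gridbox completeness rules described above can be sketched as follows (an illustrative sketch with hypothetical function names, assuming NaN marks missing data; this is not the datasets' own processing code):

```python
import numpy as np

def annual_mean(monthly, min_frac=0.75):
    # monthly: (12, nlat, nlon) December-to-November monthly means
    valid = np.isfinite(monthly)
    n_valid = valid.sum(axis=0)
    total = np.where(valid, monthly, 0.0).sum(axis=0)
    # require at least 75% of months present, else set to missing (NaN)
    return np.where(n_valid >= min_frac * 12,
                    total / np.maximum(n_valid, 1), np.nan)

def decadal_mean(annual, max_missing=0.5):
    # annual: (10, nlat, nlon) annual means for one decade
    valid = np.isfinite(annual)
    n_valid = valid.sum(axis=0)
    total = np.where(valid, annual, 0.0).sum(axis=0)
    # allow at most 50% of the annual means to be missing at a grid point
    return np.where(n_valid >= (1 - max_missing) * 10,
                    total / np.maximum(n_valid, 1), np.nan)
```

With these thresholds a gridbox with nine valid months still yields an annual mean, while one with eight does not; a decade needs at least five valid annual means.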

Figure 1.

Global annual mean observed near surface temperature anomalies, shown with respect to 1961–1990 period. See main text for description of datasets. Also shown are temperature trends for the 1900–2009 period for the different datasets.

[7] The climate model simulations used here have been used in a number of detection and attribution studies [Tett et al., 2002; Jones et al., 2003; Stott et al., 2003a, 2006a, 2006b] and many other studies. HadCM3 is an atmosphere-ocean coupled model [Stott et al., 2000, and references therein], which has produced ensembles of simulations for the 1860–1999 period for different combinations of forcing factors applied to the model. TAS from three sets of simulations are used, each with four initial-condition ensemble members: GHG - changes in historic well-mixed greenhouse gas concentrations only; ANT - changes in historic anthropogenic factors (greenhouse gas, sulphate aerosol and ozone); and NAT - changes in natural factors (solar irradiance and stratospheric volcanic aerosol). A simulation, 2800 years in length, with no changes in external forcing, CTL, is also used. The simulations and the forcing factors applied have been described in depth in previous studies [Stott et al., 2000; Tett et al., 2002] and global mean changes are described by Tett et al. [2002, Figure 3].

3. Detection and Attribution Analysis

[8] We carry out similar analyses to those reported in the HDA studies. The methodology of optimal detection has been extensively covered in these and other studies. Scaling factors for different forcing factors are deduced by regressing observed changes (the response variable) against estimates of the forced climate changes (the explanatory variables), allowing for noise in both. The spatiotemporal patterns are filtered, by projecting onto an estimate of the leading orthogonal modes of internal climate variability, and then optimized to produce estimates of the noise-free patterns or ‘optimal fingerprints’ to be used in the regression. A full description of the methodology used here is given by Allen and Stott [2003] and Stott et al. [2003b].

[9] We use the first 1400 years of HadCM3 CTL for estimating the orthogonal modes of variability (empirical orthogonal functions, EOFs), ordered by magnitude of variance, and the remaining years for uncertainty analysis. The period 1/12/1899 to 30/11/1999 (1900–1999) is examined, with all the data smoothed onto ten year means, re-gridded and masked by the HadCRUT3v dataset coverage, and with anomalies taken with respect to the whole period mean. All the data was projected onto spherical harmonics (T4) to reduce the spatial and temporal dimensions, then projected onto the EOFs truncated to the leading 21 modes (a truncation deduced from the number of independent 100 year segments in the 1400 years of control, multiplied by 1.5 [Allen and Tett, 1999]) and optimized by dividing by the variances of the leading EOFs.
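The projection and optimization step can be illustrated schematically (a simplified sketch with hypothetical function names; it omits the spherical harmonic reduction and simply down-weights each mode by its control-run variance, following the description above):

```python
import numpy as np

def variability_eofs(control_segments, truncation=21):
    # control_segments: (n_segments, n_points) flattened space-time
    # fields from the control run, used to estimate internal variability
    anoms = control_segments - control_segments.mean(axis=0)
    # EOFs via SVD; singular values come out ordered by variance
    _, s, vt = np.linalg.svd(anoms, full_matrices=False)
    eofs = vt[:truncation]                               # leading modes
    variances = s[:truncation] ** 2 / (len(anoms) - 1)   # per-mode variance
    return eofs, variances

def project_and_optimize(field, eofs, variances):
    # project a flattened field onto the truncated EOFs, then
    # down-weight each mode by its control-run variance ("optimize")
    return (eofs @ field) / variances
```

Observations and model responses pass through the same operator, so noisy modes of internal variability contribute less to the subsequent regression.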

[10] As in the HDA studies we do a three way regression, allowing a linear combination of the scaling factors of the available forced responses (GHG, ANT and NAT) to produce scaling factors for greenhouse gas alone (G), other anthropogenic (OA) and natural (N) influences [see Tett et al., 2002]. The residual of the regression is tested to see whether it is consistent with an estimate of internal climate variability, thus warning of under- or over-fitting within the regression. For the following results the residual tests are passed.
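The linear combination that converts regression coefficients for the GHG, ANT and NAT responses into scalings for G, OA and N can be sketched as below. This is an illustrative sketch using ordinary least squares with a hypothetical function name; the studies themselves use total least squares to allow for noise in the model responses as well as the observations [Allen and Stott, 2003]. It assumes ANT is the sum of the G and OA responses.

```python
import numpy as np

def three_way_scalings(y, x_ghg, x_ant, x_nat):
    # Regress the observations y on the three simulated responses
    # (all flattened, pre-whitened space-time vectors).
    X = np.column_stack([x_ghg, x_ant, x_nat])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    b_ghg, b_ant, b_nat = b
    # GHG simulates G only, ANT simulates G + OA, NAT simulates N,
    # so the single-forcing scalings are the linear combination:
    return b_ghg + b_ant, b_ant, b_nat  # beta_G, beta_OA, beta_N
```

Because the transformation is linear, uncertainty ranges for G, OA and N follow directly from the joint uncertainty of the fitted coefficients.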

[11] The best estimates for the scaling factors and the 5–95% ranges are shown in Figure 2. While not identical, the G, OA and N scaling factors are largely consistent with what has been presented in the HDA studies (auxiliary material). All three forcings combinations are detected, i.e. the scaling factor uncertainties do not span 0. Both G and OA have best estimates that are somewhat larger than 1 but only OA, with its larger uncertainty range, is consistent with a value of 1. The scaling factor for N has values consistent with 1. The actual values of the scalings have some sensitivity to the choice of truncation of the EOF space but the detections are robust across a wide range of the truncations (auxiliary material). The impact of making different choices within the analysis can be seen by comparing the reported scaling factors in the HDA studies where, for instance, the best estimates for the scaling factors for N vary between 1.32 and 1.86. In those reported results the same HadCM3 model data, similar observational datasets and the same methodology were used in the analysis, but different choices were made in how the EOF patterns were created and what EOF truncation was used. The lack of consistency of the scaling factor of G with a value of 1, found in this study in the regression with HadCRUT3v, is interesting in light of the scaling results in the HDA studies, which are all consistent with 1. Scaling factors greater than, and inconsistent with, a value of 1 can suggest one of a number of issues, such as the model being under-responsive to the forcing or some uncertainties in the dataset not being accounted for [Allen et al., 2006]. When the analysis is repeated with the version of the dataset used in the older studies, HadCRUT2v [Jones and Moberg, 2003], we find detections for all three signals, with G having scaling values consistent with 1 and a best estimate much closer to that presented in the HDA studies. This suggests that the difference in the result is down to the use of HadCRUT3v rather than HadCRUT2v: substantial improvements were made to the marine data in the newer version, which has caused some differences in the temperature patterns (auxiliary material).

Figure 2.

Detection scaling factors for G (red), OA (green) and N (blue) when regressed against the four different observational datasets for the 1900–1999 period as described in the main text. Truncation of EOF space = 21. Best estimate for scaling factor shown with 5–95% limits.

[12] We repeat the regression analysis on the three remaining observational datasets, GISS, NCDC and JMA (Figure 2), re-processing the model data to have the same spatial coverage as the observational dataset being compared with. These analyses agree with the results for HadCRUT3v in finding that all signals are detected, except in the case of N for the NCDC dataset. All the signals for each of the analyses have scaling factors consistent with a value of one. While the best estimates of the scaling factors vary between observational datasets, the uncertainty ranges are similar. The results are also largely insensitive to a range of analysis choices (auxiliary material). A sensitivity analysis of the 1950–1999 period (as also examined by S06) gives consistent results for the datasets apart from GISS, although limiting the datasets to the same coverage as HadCRUT3v suggests this result may be due to differences in the TAS patterns in areas where there are limited or no direct observations (auxiliary material).

4. Temperature Reconstructions

[13] Estimates of the attributed warming can be deduced from the scaling factors produced by the detection analyses [see Tett et al., 2002; Stott et al., 2006a]. Figure 3a shows the temperature trends for each observational dataset, after projection onto the truncated EOFs, and the related attributed temperature trends for each forcing combination examined for the 20th century (bearing in mind that trends are used here as summary statistics, not to suggest that the quantities examined are purely linear in nature over the period). Each of the analyses on the different observational datasets suggests attributed warming from G of around 0.1K/decade, greater than the observed warming trend during the 20th century of about 0.06K/decade. This G warming is partly offset by cooling from OA of around −0.06K/decade. N contributes very little to the overall trend, despite being detected, as it produces a slight warming followed by a cooling over the period. The contributions of all the forcings, while consistent with the observed trend, add up to a slightly lower warming of about 0.05K/decade.
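The way attributed trends follow from the scaling factors can be illustrated with a minimal sketch (hypothetical function and purely illustrative numbers, not values from this study):

```python
def attributed_trend(model_trend, beta_best, beta_range):
    # Scale a simulated trend (K/decade) by the regression scaling
    # factor; the 5-95% range follows from the scaling uncertainty.
    lo, hi = beta_range
    # sorting keeps (lower, upper) ordered even for cooling trends,
    # where multiplying by a negative trend flips the bounds
    bounds = sorted((lo * model_trend, hi * model_trend))
    return beta_best * model_trend, tuple(bounds)
```

For example, an illustrative simulated warming trend of 0.09K/decade with a scaling of 1.1 (5–95% range 0.9–1.3) would give an attributed trend of about 0.099K/decade (0.081–0.117).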

Figure 3.

(a) Attributed global mean temperature trends for 1900–1999 contributing to each observational dataset using the scaling factors shown in Figure 2 and after being projected onto the truncated EOF space. Estimates of the attributed trends (represented with the asterisk symbols) of G (red), OA (green), N (blue) and the sum of the scaled contributions (turquoise). The 5–95% limits of the attributed trends are deduced from the uncertainties in the scaling factors. Trends in the observations are also shown (black asterisk symbol) with a 5–95% uncertainty range representing an estimate of internal climate variability deduced from the model control. Also shown are trends, before the projection onto the truncated EOFs, before (dotted lines - only for the model simulations) and after (diamond symbols) being masked by the observational coverages. (b) Same as Figure 3a but showing the temperature trends for the sub-period of 1950–1999, while using the scaling factors in Figure 2.

[14] The process of projecting onto the truncated EOFs can lose some of the weaker spatiotemporal patterns and reduce some of the variability in the observations and simulations. Figure 3a shows that the trends of the observational data before being projected onto the EOFs are very similar to the trends of the data after being projected onto the EOFs. This suggests little of the variability was lost in the projection process. The masking by the observational coverage does have an impact on the model simulations (Figure 3a). For G the original trend is 0.105K/decade, but this reduces to 0.092K/decade when the data is limited to the coverage of HadCRUT3v. There is a smaller reduction for G, to 0.102K/decade, when limited to the GISS observational coverage. In contrast the changes in the OA trend are negligible (Figure 3a), reflecting that the spatial pattern of change from other anthropogenic forcings, predominantly sulphate aerosols, is concentrated where observational coverage across the datasets is best. Greenhouse gas warming, by contrast, has a high latitude component, which makes the global warming sensitive to masking out areas like the Arctic, where there are few direct observations and where GISS extrapolates changes.
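The effect of coverage masking on a global mean can be sketched as follows (an illustrative example with a hypothetical function; real analyses work on the full gridded space-time fields):

```python
import numpy as np

def masked_global_mean(field, obs, lats_deg):
    # Area-weighted global mean of a model field, counting only grid
    # points where the observational dataset also has data (non-NaN).
    weights = np.cos(np.deg2rad(lats_deg))[:, None] * np.ones_like(field)
    valid = np.isfinite(obs) & np.isfinite(field)
    return (field[valid] * weights[valid]).sum() / weights[valid].sum()
```

If a model field warms most at high latitudes, masking out a poorly observed polar row lowers the resulting global mean, which is the behaviour seen for G under the HadCRUT3v coverage.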

[15] Using the scaling factors from the 100 year analysis, the contributions to temperature changes over 1950–1999 can be deduced [Stott et al., 2006a]. The increased magnitude of the trends in both G and OA (Figure 3b) reflects the larger contribution to twentieth century climate change from those forcing factors over the latter half of the period. There are some differences in the attributed temperature trends for the HadCRUT3v analysis compared with the equivalent trends given in S06, H06 and IPCC07 for the HadCM3 analyses, with the magnitude of the G warming and OA cooling being slightly larger in this study. However, a consistent interpretation of all these results is that G warms more than the observations and both OA and N cool. For the analyses on the GISS, NCDC and JMA datasets a similar picture emerges. Our results show that previous conclusions appear robust to the uncertainties inherent in deriving global temperature datasets: most of the global warming over the last 50 years of the 20th century was very likely due to greenhouse gas forcing, and greenhouse gas forcing alone would likely have resulted in greater than observed warming during that period.

5. Conclusions

[16] The analyses of previous detection and attribution studies looking at 20th century near surface temperature change were replicated, using the latest dataset of global TAS produced by the Met Office Hadley Centre and UEA's Climate Research Unit. While not identical, due to the differing details of the analyses, the results were very similar and consistent with our understanding of the choices made - with the detection of greenhouse gas, other anthropogenic and natural influences on 20th century TAS variations. The analysis was repeated on three other datasets of TAS produced by other institutions, and all three contributions were again detected with very similar scaling factors. What differences there are may be partially caused by the datasets' differing spatial coverages, which emphasize differences in the relative spatial contributions from the different anthropogenic forcings. The observational datasets are, of course, not independent: while there are differences in the processing of measurements, many of the original observations are shared between the datasets.

[17] Using different datasets does not change the conclusion that anthropogenic greenhouse gases are the major driver of TAS change over the 20th century. Differences in how raw observational data of TAS are processed to produce global datasets do not cause a major change in the assessments of a warming world [Jones and Wigley, 2010] over the last 100 or so years. HadCRUT3 was produced with a sophisticated error model that incorporated estimates of station and grid-box sampling error, biases and spatial coverage uncertainties [Brohan et al., 2006]. Further work is planned to include this error model to investigate the impact of those errors on detection and attribution analyses. New updates to the HadCRUT datasets are being constructed to include corrections to sea surface temperatures [Kennedy et al., 2011] and plans are in progress to attempt to better quantify the uncertainties in land near surface temperatures [Stott and Thorne, 2010]. We have not assessed how much impact the choice of dataset would have on variants of the analysis described here, such as using different models, periods or regions of analysis [Stott et al., 2010]. A review of the history of tropospheric temperature reconstructions [Thorne et al., 2010] recommended that structural uncertainty in the datasets, as reflected in the differences between reconstructions, be taken into account alongside other types of uncertainty when comparing with models. It would be wise to follow such a recommendation for near surface temperature studies too.


[18] The authors would like to thank the institutions concerned for the work done to produce the observational datasets, in particular Phil Jones at UEA. We are also grateful to John Kennedy for help accessing the observational data and for very helpful discussions, and to the two anonymous reviewers for their comments. This research was supported by the Joint DECC and Defra Integrated Climate Programme, DECC/Defra (GA01101).

[19] The Editor thanks the two anonymous reviewers.