Keywords:

  • climate model evaluation;
  • model selection;
  • climate change impacts assessment;
  • climate models

Abstract

  • Abstract
  • 1. Introduction
  • 2. Data and Methods
  • 3. Results
  • 4. Discussion and Conclusions
  • Acknowledgments
  • References
  • Supporting Information

[1] Reducing uncertainty in climate projections can involve giving less credence to Atmosphere-Ocean General Circulation Models (AOGCMs) for which the simulated future climate is judged to be unreliable. Reliability is commonly assessed by comparing AOGCM output with observations. A desirable property of any AOGCM skill score is that resulting AOGCM-performance rankings should show some consistency when derived using observations from different time periods. Notably, earlier work has demonstrated inconsistency between rankings obtained for 20-year periods in the 20th century based on global and regional comparisons of simulated and observed near-surface temperature anomalies. Here, we demonstrate that AOGCM-performance rankings derived from actual temperatures, which incorporate AOGCM biases in climatological means, can be used to identify AOGCMs that perform consistently well or poorly across multiple 20-year periods in the 20th century. This result supports the use of comparisons of simulated and observed actual values of climate variables when assessing the reliability of AOGCMs.

1. Introduction

[2] A significant source of uncertainty in the projections of future climate that underpin most assessments of the impacts of climate change arises because different Atmosphere-Ocean General Circulation Models (AOGCMs) simulate different regional climates under the same greenhouse gas forcing [Christensen et al., 2007]. One way of reducing this uncertainty is to omit [e.g., Pitman and Perkins, 2008] or give a low weighting [e.g., Watterson, 2008] to those AOGCMs that are least able to reproduce the observed climate. The assumption is that these AOGCMs will be least able to simulate the future climate reliably. There are many different methods of testing AOGCMs against observations [e.g., Perkins et al., 2007; Tebaldi et al., 2005; Watterson, 2008] and the best way of using observations to assess the reliability of simulated future climates is the subject of ongoing research [e.g., Whetton et al., 2007].

[3] A critical question in this area is: “is the skill of an AOGCM in the past a useful guide to the skill of the AOGCM in the future?”. To explore this question, Reifen and Toumi [2009], hereafter RT09, investigated whether past performance by an AOGCM predicts future skill. They argued that if AOGCMs are ranked according to performance based on a skill score derived using observations, then ranks assigned to individual AOGCMs should show some consistency between different time periods during the observational record. They reasoned that if this were not the case, then one might doubt whether a superior performance of a subset of available AOGCMs at representing the climate of an observed time period had any bearing on the performance of the subset at representing the future climate. For example, an AOGCM that performs well for 1960–1979 should also perform well for 1980–1999. Strong performance in the earlier period, contradicted by poor performance in the later period, would undermine the case that the AOGCM would be reliable in the future.

[4] RT09 examined AOGCM skill scores based on global and regional comparisons of simulated and observed 20th century near-surface temperatures. They focused on scores derived from mean temperature anomalies for five 20-year periods. They found that AOGCM-performance rankings derived from such scores were sensitive to the time period for which they were calculated. The implication is that the anomaly-based skill scores investigated by RT09 are not suitable for assessing the ability of AOGCMs to represent the future climate. However, because RT09 analysed anomalies, and ignored AOGCM biases in climatological means, their findings are not relevant to the large number of AOGCM assessments used in impacts studies that account for these biases. For example, Groves et al.'s [2008] study of water supply and demand used Tebaldi et al.'s [2005] approach to weight AOGCMs according to bias and convergence criteria, and Penman et al.'s [2010] study of snake distribution used projections developed by Suppiah et al. [2007] from a set of AOGCMs selected partly according to root-mean-square errors in spatial fields of climatological means. Furthermore, Gleckler et al. [2008] provide evidence that AOGCM-performance rankings derived from some skill scores that incorporate biases in climatological mean precipitation totals or surface pressures have limited sensitivity to the time period covered by the observations from which they are calculated. Here, we extend the RT09 analysis to actual temperatures to ascertain whether the findings of RT09 hold true for an assessment that incorporates AOGCM biases in climatological mean temperatures. If actual temperatures are used to rank AOGCMs by performance, do rankings for one 20-year period predict rankings for other 20-year periods of the 20th century?

2. Data and Methods

[5] We assess AOGCM performance in terms of 20th century simulated near-surface (2m) temperatures. The output of multiple AOGCMs is compared to observations of both actual temperatures and temperature anomalies relative to average temperatures for the 1961–1990 period. Following RT09, both an analysis based on global averages and a regional analysis based on gridded data were performed.

[6] Following RT09, observed surface temperature anomalies from the HadCRUT3 dataset [Brohan et al., 2006] were used. A time series of global average annual mean temperature anomalies was obtained from the Met Office (see http://hadobs.metoffice.com/hadcrut3/diagnostics/comparison.html) and, for the regional analysis, 5° × 5° gridded monthly mean temperature anomalies were obtained from the University of East Anglia's Climate Research Unit (see http://www.cru.uea.ac.uk/cru/data/temperature/). Gridded annual mean values were calculated from the monthly mean values. To reduce the impact of missing data, annual means were only calculated where more than 10 monthly values were available. Annual mean actual temperatures were calculated by adding the annual mean HadCRUT3 anomalies to average annual mean temperatures for the 1961–1990 period derived from Jones et al.'s [1999] climatology of monthly mean surface temperatures.
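The annual-mean calculation and the anomaly-to-actual conversion described above can be sketched as follows. This is a minimal illustration assuming NumPy arrays with a (year, month, lat, lon) layout; the function names and layout are ours, not part of any HadCRUT3 tooling.

```python
import numpy as np

def annual_means(monthly, min_months=11):
    """Annual means from a (year, month, lat, lon) array of monthly
    values, masking years with 10 or fewer valid (non-NaN) months."""
    valid = np.isfinite(monthly).sum(axis=1)   # valid months per year/gridbox
    means = np.nanmean(monthly, axis=1)        # mean over available months
    return np.where(valid >= min_months, means, np.nan)

def to_actual(annual_anomaly, climatology_1961_1990):
    """Actual temperatures: annual anomaly plus the 1961-1990
    climatological annual mean for the same gridbox."""
    return annual_anomaly + climatology_1961_1990
```

A gridbox-year with, say, only three reported months would thus be masked rather than averaged from sparse data.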

[7] Monthly mean actual temperatures from 17 AOGCM simulations of the climate of the 20th century (20C3M simulations) were obtained from the World Climate Research Programme's (WCRP's) Coupled Model Intercomparison Project phase 3 (CMIP3) multi-model dataset [Meehl et al., 2007]. Although monthly mean atmospheric data from 20C3M simulations were available for 24 AOGCMs, we analysed one simulation from each of the 17 AOGCMs considered by RT09 (see Table S1 of the auxiliary material). Annual mean actual temperatures were calculated from the monthly mean values and then spatially averaged to obtain a time series of global average annual mean temperature anomalies for each AOGCM. For the regional analysis, linear interpolation between AOGCM gridpoints was used to re-grid the annual data to the 5° × 5° grid of the observations. For each AOGCM, annual mean temperature anomalies were calculated by subtracting the average annual mean temperatures for the 1961–1990 period for the AOGCM under consideration from the annual mean actual temperatures.
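The linear re-gridding step can be sketched as separable bilinear interpolation, performed with two passes of 1-D linear interpolation. This is a NumPy-only illustration under the assumption of monotonically increasing latitude and longitude axes; `regrid_linear` is our own helper, not a CMIP3 utility.

```python
import numpy as np

def regrid_linear(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinear regridding of a (lat, lon) field to a new grid by
    separable 1-D linear interpolation (along lon, then lat)."""
    # interpolate each source latitude row onto the destination longitudes
    tmp = np.stack([np.interp(dst_lon, src_lon, row) for row in field])
    # then interpolate each destination-longitude column onto the
    # destination latitudes; result has shape (len(dst_lat), len(dst_lon))
    return np.stack([np.interp(dst_lat, src_lat, col) for col in tmp.T],
                    axis=1)
```

For the midpoint of a 2 × 2 cell this reproduces the average of the four corner values, as expected for bilinear interpolation.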

[8] Simulated and observed 20-year mean actual temperatures and temperature anomalies were calculated for the periods 1900–1919, 1920–1939, 1940–1959, 1960–1979 and 1980–1999. The spatial coverage of the gridded HadCRUT3 anomalies varies with time. We follow RT09 in reducing the influence of this variation on our regional analysis by omitting any gridbox with less than 10 years of data in any of the 20-year periods.

[9] Following the global analysis of RT09, we rank AOGCMs on global performance for each 20-year period according to the absolute differences between simulated and observed global average 20-year temperatures. A regional analysis is also performed for the Europe and Siberia regions defined by RT09 (35–60°N, 0–45°E and 50–70°N, 60–130°E respectively) and a contiguous USA region not considered by RT09 (gridboxes more than 75% occupied by land within the region 25–50°N, 65–125°W). Following the regional analysis of RT09, we rank AOGCMs on regional performance for each 20-year period according to the gridbox-wise root-mean-square difference between the set of simulated 20-year temperatures for the gridboxes comprising the region under consideration and the corresponding set of observed 20-year temperatures.
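In code, the two skill scores and the resulting performance rankings might look like the following sketch. The helper names are ours, and a full analysis would also handle area weighting; gridboxes masked as NaN in the observations are simply skipped here.

```python
import numpy as np

def global_score(sim_mean, obs_mean):
    """Absolute difference between simulated and observed
    global average 20-year mean temperatures."""
    return abs(sim_mean - obs_mean)

def regional_score(sim_grid, obs_grid):
    """Gridbox-wise RMS difference between simulated and observed
    20-year means over a region, ignoring NaN gridboxes."""
    d = sim_grid - obs_grid
    return np.sqrt(np.nanmean(d ** 2))

def rank_models(scores):
    """Rank AOGCMs by skill score (rank 1 = smallest error)."""
    order = np.argsort(scores)                 # model indices, best first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks
```

Applying `rank_models` to the 17 per-model scores for each 20-year period yields one ranking per period, which is the input to the turnover analysis below.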

[10] RT09 defined “turnover” as the percentage of AOGCMs ranked in the top n for one time period that are not among the top n for the following time period. A turnover of 100% occurs when none of the top n AOGCMs for the first period appears in the top n for the second period. If the turnover is zero, then the top n AOGCMs are the same for both periods. For values of n between 1 and 17, RT09 calculated the mean turnover across the four transitions between 20-year periods for global performance and performance for Europe and Siberia. We repeat this analysis and extend it to performance rankings based on actual temperatures and to the USA region. We also apply the turnover concept to the bottom ranked n AOGCMs.
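RT09's turnover statistic follows directly from this definition and can be sketched as below. Rank sequences map each model index to its rank (1 = best) in a given period; the bottom-n case is obtained by reversing the ranks.

```python
def turnover(ranks_a, ranks_b, n):
    """Percentage of the top-n models in period A that are not
    among the top-n in period B (RT09's 'turnover')."""
    top_a = {m for m, r in enumerate(ranks_a) if r <= n}
    top_b = {m for m, r in enumerate(ranks_b) if r <= n}
    return 100.0 * len(top_a - top_b) / n

def mean_turnover(rankings, n):
    """Mean turnover across consecutive-period transitions
    (four transitions for five 20-year periods)."""
    pairs = zip(rankings[:-1], rankings[1:])
    return sum(turnover(a, b, n) for a, b in pairs) / (len(rankings) - 1)
```

For the bottom-n analysis, each rank r can be replaced by (number of models + 1 − r) before calling the same functions.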

3. Results

[11] Figure 1 shows the mean turnover of the top n and bottom n AOGCMs for the globe and regions for rankings derived from temperature anomalies (blue lines) and from actual temperatures (black lines) (see Figure S1 of the auxiliary material for Siberia). Figures 1 and S1 include the 95% confidence intervals of mean turnover for 17 AOGCMs that perform randomly relative to each other in each 20-year period. These were estimated by generating 100,000 sets of random rankings, each comprising a ranking for each 20-year period, and taking the 2.5th and 97.5th percentile mean turnover for each value of n.
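The random-ranking confidence intervals can be estimated with a small self-contained Monte Carlo sketch. The paper uses 100,000 trials per value of n; the default here is reduced for illustration, and the seed is arbitrary.

```python
import numpy as np

def random_turnover_ci(n, n_models=17, n_periods=5,
                       n_trials=10_000, seed=0):
    """2.5th/97.5th percentiles of the mean turnover of the top-n
    models when every period's ranking is a random permutation."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_trials)
    for t in range(n_trials):
        # one random ranking per period, represented by its top-n set
        tops = [set(rng.permutation(n_models)[:n])
                for _ in range(n_periods)]
        # mean turnover over the consecutive-period transitions
        turns = [100.0 * len(a - b) / n
                 for a, b in zip(tops[:-1], tops[1:])]
        stats[t] = np.mean(turns)
    return np.percentile(stats, 2.5), np.percentile(stats, 97.5)
```

Observed mean turnover falling below the lower percentile for a given n then indicates more consistency between periods than random ranking would produce.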

Figure 1. Mean turnover of top n AOGCMs (solid lines) and bottom n AOGCMs (dashed lines) for rankings derived from actual temperatures (black) and anomalies (blue) and, for the globe only, from detrended anomalies (orange) and anomalies with trend magnified by a factor of 10 (red). Grey shading indicates 95% confidence intervals for AOGCMs ranked at random.

[12] For the globe, Europe and Siberia, the mean turnover of the top n AOGCMs derived from temperature anomalies behaves broadly as described by RT09 (see Figure 3 of RT09). Some differences from RT09 exist due to differences in the derivation of annual global average temperatures and in the coverage of the gridded annual mean temperatures underlying the regional analyses. For all regions, including the USA, mean turnover is high for low values of n (e.g., for all regions, mean turnover exceeds 60%, or three AOGCMs, for n = 5). This suggests little consistency in the set of best performing AOGCMs between the 20-year time periods when anomalies are analysed. Indeed, consistent with RT09, the values of mean turnover for performance rankings derived from temperature anomalies are not clearly distinguishable from those for randomly ranked AOGCMs.

[13] Turning to rankings based on actual temperatures, the values of mean turnover of the top n AOGCMs are much less than those derived from anomalies and are well outside the 95% confidence intervals of mean turnover for AOGCMs ranked at random. For all regions, values of mean turnover are less than 20%, or one AOGCM, for n = 5. Mean turnover is never greater than 50% for any value of n. This suggests some consistency in the set of best performing AOGCMs between the 20-year time periods. For both AOGCM rankings derived from temperature anomalies and from actual temperatures, mean turnover of the bottom n AOGCMs generally behaves in a similar way to mean turnover of the top n AOGCMs.

[14] Figure 2 examines the consistency in the performance of individual AOGCMs between the 20-year time periods for the globe, USA and Europe (see Figure S2 of the auxiliary material for Siberia). Figure 2 shows the ranks assigned to individual AOGCMs for the five 20-year periods derived from temperature anomalies and derived from actual temperatures. There is little consistency across the 20-year periods in the anomaly-based ranks assigned to individual AOGCMs (as indicated by the long vertical lines for the anomaly-based ranks). For each region, at least one AOGCM ranked 1st in one period is ranked 17th in another period (e.g., NCAR-CCSM3 for Europe).

Figure 2. AOGCM-performance ranks derived from actual temperatures and anomalies. Low values of rank, at the top of the plots, indicate good performance. Each point represents a rank for a single AOGCM for one of the five 20-year periods. Points may overlay each other, so, in an individual plot, fewer than five points will be visible for AOGCMs that have the same rank in multiple periods. Each vertical line represents the range of ranks for a single AOGCM across the five 20-year periods. AOGCMs are sorted horizontally by mean rank across the five time periods with the best performing AOGCMs on the left.

[15] In contrast, and consistent with the lower values of mean turnover for actual temperatures (Figures 1 and S1), Figures 2 and S2 reveal that some AOGCMs have consistently low or high ranks across the time periods, corresponding to consistently good or poor performance respectively. Ensembles of well-performing and poor-performing AOGCMs in terms of actual temperatures can also be identified from Figures 1, 2, S1, and S2. For example, for the globe, there is zero mean turnover in the top four AOGCMs and the bottom 13 AOGCMs, suggesting that an ensemble of four AOGCMs (GISS-ER, UKMO-HadCM3, ECHAM5/MPI-OM, ECHO-G) outperforms the ensemble of the 13 remaining AOGCMs in all of the 20-year periods. Our analysis only relates to one skill score, and different AOGCMs may well rank differently on other scores. Our key point is that, if temperature anomalies are used, there is no consistency in AOGCM rankings, whereas, if actual temperatures are used, a consistent ranking becomes clear for the regions considered in this paper.

4. Discussion and Conclusions

[16] The reason why AOGCM-performance rankings based on actual temperatures are more consistent than those based on temperature anomalies is explained by Figure 3. This shows simulated and observed time series of 20th century annual global average actual temperatures and temperature anomalies relative to the 1961–1990 period. Although time series for the globe are shown, the following discussion also applies to the regions examined in this paper. The general cold bias in global average temperatures simulated by most AOGCMs [Randall et al., 2007] is evident in the actual temperatures (Figure 3, top). Importantly, the bias for individual AOGCMs is consistent throughout the 20th century and, for each AOGCM, the absolute difference between simulated and observed global average 20-year temperatures is largely determined by its bias in mean temperature averaged over the entire 20th century. Since different AOGCMs have different biases, the relative performance of the AOGCMs is consistent over the five 20-year periods.

Figure 3. Time series of simulated (thin coloured lines) and observed (thick black lines) global average annual mean (top) actual temperatures and (bottom) temperature anomalies relative to the 1961–1990 period.

[17] In the case of temperature anomalies, the contribution of differential biases in 20th century mean temperature is removed. The relative performance of the AOGCMs derived from global average 20-year temperature anomalies is therefore determined by differential biases in the magnitude of multi-decadal temperature variability and in the warming trend over the 20th century. There is no evidence of consistent AOGCM-performance rankings arising from consistent biases in the magnitude of multi-decadal variability. The values of mean turnover for rankings derived from temperature anomalies with the linear warming trend removed (Figure 1, orange line) are indistinguishable from those for randomly ranked AOGCMs. This is consistent with much of the multi-decadal variability in temperature being due to unforced internal variability of the climate system. A well-performing AOGCM might be expected to reproduce the statistical properties of such variability, but there is no reason to expect unforced variability simulated by an AOGCM to coincide temporally with unforced variability in an observed time series. Owing to the low signal-to-noise ratio for greenhouse warming in the 20th century, the differences in simulated trends between the AOGCMs are too small to produce consistent AOGCM rankings (Figure 1, blue line). Even if the linear trend is magnified by a factor of 10 (Figure 1, red line), values of mean turnover are still greater than those derived from actual temperatures (Figure 1, black line), although they are generally outside the 95% confidence intervals of mean turnover for randomly-ranked AOGCMs.

[18] An implicit assumption of RT09 and this study is that differences between observed and simulated temperatures are a function of AOGCM performance only. However, uncertainties in the observational data may also contribute to these differences. Estimates of time-varying uncertainties in the gridded HadCRUT3 dataset have been made that, for land gridboxes, account for errors in weather station records, sampling error within gridboxes and uncertainties in corrections for biases due to urbanisation effects and thermometer exposure changes [Brohan et al., 2006]. Uncertainty estimates for the time series of global average temperature anomalies are also available and, in addition to the aforementioned uncertainties, account for the effect of the limited and time-varying spatial coverage of the gridded data. These observational uncertainties are largely neglected by RT09 and this study, although both attempt to reduce the influence of the time-varying spatial coverage of the data. It is therefore possible that this paper underestimates the temporal consistency in AOGCM performance based on actual temperatures. We expect this underestimation to be small and to apply equally to performance based on temperature anomalies, so the temporal inconsistency of anomaly-based AOGCM performance demonstrated by RT09 should be robust to it.

[19] In conclusion, RT09 demonstrated that AOGCM-performance rankings based on anomalies relative to climatological means can be inconsistent over time. They concluded that such rankings are not useful in assessments of the reliability of future climate changes simulated by AOGCMs. We demonstrate that AOGCM-performance rankings based on actual values, which incorporate biases in climatological means, can be consistent over time. The findings of RT09 are therefore not applicable to all AOGCM assessment methods, including methods commonly used to inform impacts studies.

Acknowledgments

[20] We acknowledge the modelling groups, the Program for Climate Model Diagnosis and Intercomparison (PCMDI) and the WCRP's Working Group on Coupled Modelling (WGCM) for their roles in making available the WCRP CMIP3 multi-model dataset. Support of this dataset is provided by the Office of Science, US Department of Energy. We thank Janice Bathols of CSIRO for assistance with the CSIRO archive of CMIP3 data and John Kennedy of the Met Office for advice on the HadCRUT3 dataset. This work was partly funded by the CSIRO Climate Adaptation National Research Flagship and by the Australian Research Council via LP0883296. Two anonymous reviewers provided valuable comments.

References

Supporting Information

Auxiliary material for this article contains a table and two figures.

Additional file information is provided in the readme.txt.

Filename                      Format               Size  Description
grl27202-sup-0001-readme.txt  plain text document  2K    Readme.txt
grl27202-sup-0002-ts01.pdf    PDF document         711K  Table S1. AOGCMs considered in the analysis.
grl27202-sup-0003-fs01.eps    PS document          97K   Figure S1. Mean turnover of top n AOGCMs and bottom n AOGCMs for Siberia for rankings derived from actual temperatures and anomalies.
grl27202-sup-0004-fs02.pdf    PDF document         746K  Figure S2. AOGCM-performance ranks for Siberia derived from actual temperatures and anomalies.
