Climate models reproduce the observed surface warming better than one would expect given the uncertainties in radiative forcing, climate sensitivity and ocean heat uptake, suggesting that different models show similar warming for different reasons. It is shown that while climate sensitivity and radiative forcing are indeed correlated across the latest ensemble of models, eliminating this correlation would not strongly change the uncertainty range of long-term temperature projections. However, since most models do not incorporate the aerosol indirect effects, model agreement with observations may be partly spurious. The incorporation of more detailed aerosol effects in future models could lead to inconsistencies between simulated and observed past warming, unless the effects are small or compensated by additional forcings. It is argued that parameter correlations across models are neither unexpected nor problematic if the models are interpreted as conditional on observations.
 Detection and attribution studies show that most of the observed surface warming over the last fifty years is ‘very likely’ (>90% probability) caused by anthropogenic forcing, and ‘very unlikely’ due to internal variability or known natural forcings [Hegerl et al., 2007]. These conclusions are based on comparing spatio-temporal patterns between observations and models (allowing the amplitudes of the responses to different forcings to vary), rather than just the time evolution of global temperature. Yet, the agreement between the simulated and observed global temperature is often used as a supporting argument in the model evaluation process, and certainly as a visual demonstration of consistency between the theoretical understanding of the climate system, its implementation in general circulation climate models (GCMs) and the observed trends [Intergovernmental Panel on Climate Change (IPCC), 2007, Figure SPM.4, FAQ 8.1, Figure 1]. It is assumed that a successful hindcast of temperature changes over the 20th century increases our confidence in projections of future warming. Indeed, constraining models on past trends improves their agreement in future projections, and can be used to produce probabilistic projections [Allen et al., 2000; Knutti et al., 2002; Stott and Kettleborough, 2002].
 This study uses the World Climate Research Programme's (WCRP's) Coupled Model Intercomparison Project phase 3 (CMIP3), a set of simulations with different GCMs used in the IPCC Fourth Assessment Report (AR4) [IPCC, 2007]. Recently published consensus estimates of projected warming [Meehl et al., 2007; Knutti et al., 2008] are constrained by observations but do not explicitly use the agreement of CMIP3 global temperature with observations to define model weights or as a measure of confidence. Nevertheless, for the CMIP3 models, a simulation of past global temperature that agrees with observations is seen as a prerequisite for a consistent explanation of human-induced climate change.
 The agreement between the CMIP3 simulated and observed 20th century warming is indeed remarkable [Hegerl et al., 2007, Figure 9.5a]. But do the current models simulate the right magnitude of warming for the right reasons? How much does the agreement really tell us? Kiehl  recently showed a correlation of climate sensitivity and total radiative forcing across an older set of models, suggesting that models with high sensitivity (strong feedbacks) avoid simulating too much warming by using a small net forcing (large negative aerosol forcing), and models with weak feedbacks can still simulate the observed warming with a larger forcing (weak aerosol forcing). Climate sensitivity, aerosol forcing and ocean diffusivity are all uncertain and relatively poorly constrained from the observed surface warming and ocean heat uptake [e.g., Knutti et al., 2002; Forest et al., 2006]. Models differ because of their underlying assumptions and parameterizations, and it is plausible that choices are made based on the model's ability to simulate observed trends.
 To reproduce the observed surface warming over the industrial period, a high (low) climate sensitivity can be combined with a small (large) net radiative forcing and/or a high (low) ocean heat uptake. A small (large) total forcing is usually the result of a strong (weak) negative aerosol forcing. There is no correlation between the climate sensitivities of the CMIP3 models and their respective heat uptake efficiencies (the heat flux into the ocean per unit global surface warming at the time of CO2 doubling in a 1%/yr CO2 increase scenario). This is not surprising, since both are diagnostic quantities that usually emerge only at the end of the model development process. Being determined by a large number of interacting processes and feedbacks, they are not easily tunable parameters in GCMs. Radiative forcing is not available for most CMIP3 models, but it can be diagnosed with an energy balance approach (see Forster and Taylor for details). Figure 1d (cyan circles) shows that climate sensitivity and total forcing are weakly correlated (r = −0.5) in CMIP3, in agreement with the results of Kiehl for an older set of models (Figure 1d, cyan asterisks). The uncertainty in the radiative forcing of the CMIP3 models is about 10% [Forster and Taylor, 2006] and introduces some uncertainty in the correlation, but for the purpose of this study it is sufficient to conclude that there is a correlation of about −0.5 between climate sensitivity and radiative forcing, which appears to be robust across several generations of models. Models therefore simulate similar warming for different reasons, and it is unlikely that this effect would arise by chance.
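The energy balance diagnosis of forcing mentioned above can be sketched as follows. This is a minimal sketch, not Forster and Taylor's exact procedure: it assumes the common form F = N + αT, with N the net top-of-atmosphere flux (close to the ocean heat uptake), T the global warming, and a feedback parameter α = F_2x/S, where F_2x = 3.7 Wm−2 is an assumed CO2-doubling forcing. The per-model numbers are hypothetical and serve only to show how a cross-model correlation coefficient such as the r = −0.5 quoted above is computed.

```python
import numpy as np

# Assumed forcing for a doubling of CO2 (W m^-2); not a value given in the text.
F_2X = 3.7


def diagnose_forcing(net_toa_flux, warming, sensitivity):
    """Energy-balance forcing estimate F = N + alpha * T, with alpha = F_2X / S.

    net_toa_flux: net downward top-of-atmosphere flux N (W m^-2)
    warming:      global-mean surface warming T (K)
    sensitivity:  equilibrium climate sensitivity S (K per CO2 doubling)
    """
    alpha = F_2X / np.asarray(sensitivity, dtype=float)  # feedback parameter
    return np.asarray(net_toa_flux, dtype=float) + alpha * np.asarray(warming, dtype=float)


# Hypothetical per-model diagnostics (illustrative only, not CMIP3 output).
sensitivity = np.array([2.1, 2.7, 3.2, 3.4, 4.3])      # K
warming = np.array([0.70, 0.75, 0.65, 0.80, 0.70])     # K
net_flux = np.array([0.60, 0.70, 0.55, 0.80, 0.65])    # W m^-2

forcing = diagnose_forcing(net_flux, warming, sensitivity)
# Pearson correlation across the "models" between S and diagnosed forcing.
r = np.corrcoef(sensitivity, forcing)[0, 1]
print(np.round(forcing, 2), round(float(r), 2))
```

The uncertainty in α (and hence in F) is one reason the diagnosed correlation itself carries some uncertainty, as noted above.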
While it is impossible to know what decisions are made in the development process of each model, it seems plausible that choices are made based on agreement with observations as to what parameterizations are used, what forcing datasets are selected, or whether an uncertain forcing (e.g., mineral dust, land use change) or feedback (indirect aerosol effect) is incorporated or not.
 To understand the behavior of the CMIP3 ensemble, the Bern2.5D climate model of intermediate complexity [Stocker et al., 1992; Knutti et al., 2002] is used here. A large ensemble of simulations is generated to explore the response of global temperature. Climate sensitivity is varied between 1 and 7°C, and the magnitude of the aerosol radiative forcing (direct plus indirect) time series can be changed by a time-independent scaling factor. For simplicity, ocean parameters are kept fixed (see below for a discussion); standard radiative forcing time series are used for the past [Joos et al., 2001; Knutti et al., 2002], and the SRES A2 scenario [Nakicenovic and Swart, 2000] is prescribed up to the year 2100. From the large ensemble, a first subset is chosen in which climate sensitivity is uncorrelated with total forcing (Figure 1a; each grey point represents one simulation). Both quantities are approximately normally distributed, with means and standard deviations chosen to be similar to those of the CMIP3 models (3.2 ± 0.7°C climate sensitivity for doubling CO2, 1.8 ± 0.5 Wm−2 radiative forcing). The mean simulated warming (black line) and its uncertainty (grey band, one standard deviation) relative to the 1900–1950 average are shown for the past and the future in Figures 1b and 1c, respectively. The observed warming (blue line, Figure 1b) agrees well with the ensemble mean (black). A second subset is chosen with a weak correlation of −0.5 between climate sensitivity and radiative forcing (Figure 1d, dots), similar to the CMIP3 models (Figure 1d, cyan circles) and to the models shown by Kiehl (Figure 1d, cyan asterisks).
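Selecting a subset with prescribed means, standard deviations and correlation from a large ensemble can be sketched with rejection sampling. The uniform parent ensemble, its forcing range (0 to 3.6 Wm−2) and the acceptance rule are assumptions of this sketch; the text does not state how the Bern2.5D subsets were actually drawn, and in the real ensemble each member is a full model simulation rather than a pair of numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Large "parent" ensemble: sensitivity spans 1-7 K as in the text; the
# forcing range 0-3.6 W m^-2 is an assumption of this sketch.
n_parent = 200_000
S = rng.uniform(1.0, 7.0, n_parent)
F = rng.uniform(0.0, 3.6, n_parent)


def select_subset(S, F, mu_s, sd_s, mu_f, sd_f, rho, rng):
    """Keep members with probability proportional to a bivariate normal target.

    The acceptance probability exp(-q/2), with q the squared Mahalanobis
    distance, equals the target density divided by its maximum value.
    """
    zs = (S - mu_s) / sd_s
    zf = (F - mu_f) / sd_f
    q = (zs**2 - 2.0 * rho * zs * zf + zf**2) / (1.0 - rho**2)
    keep = rng.uniform(size=S.size) < np.exp(-0.5 * q)
    return S[keep], F[keep]


# Target moments from the text: 3.2 +/- 0.7 K and 1.8 +/- 0.5 W m^-2.
S0, F0 = select_subset(S, F, 3.2, 0.7, 1.8, 0.5, 0.0, rng)    # uncorrelated subset
S1, F1 = select_subset(S, F, 3.2, 0.7, 1.8, 0.5, -0.5, rng)   # CMIP3-like correlation
print(len(S0), len(S1), np.corrcoef(S0, F0)[0, 1], np.corrcoef(S1, F1)[0, 1])
```

Because each member is either kept or discarded, the subset inherits its members' simulated temperature trajectories unchanged; only the joint distribution of the two parameters is reshaped.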
Note that the black lines in the first column show the theoretical relationship (solid) and its uncertainty (dashed) between total forcing and climate sensitivity, based on the energy balance equation Q = F − T/S, where Q = 0.7 ± 0.2 Wm−2 is an estimate of the observed global ocean heat uptake, T = 0.7°C is the observed global surface warming, F is the total forcing and S is the equilibrium climate sensitivity (see Kiehl for details). This correlation reduces the uncertainty band in temperature (Figures 1e and 1f, red band) by about 20%, with the mean response virtually unchanged (the response from Figures 1b and 1c is repeated in grey/black for comparison). The mean and uncertainty of the CMIP3 models (Figures 1e and 1f, cyan thick and thin lines) are consistent with observations, but the uncertainty is somewhat wider, particularly in the 20th century, because the CMIP3 simulations include internal variability. Note that not all CMIP3 models have simulated the A2 scenario [Meehl et al., 2007, Table 10.4]. For Figures 1g–1i, a subset of the simulations with a correlation of about −0.8 between climate sensitivity and forcing (i.e., much stronger than in CMIP3) is selected. This reduces the uncertainty band by more than half in the 20th and early 21st century, compared to Figures 1b and 1c. By 2100, the effect becomes small again, because the aerosol forcing is then small compared to the greenhouse gas forcing.
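Why a negative correlation narrows the spread of simulated warming can be seen in a toy calculation. The sketch below replaces the Bern2.5D model with a crude transient energy balance, T = F/(F_2x/S + κ), where F_2x = 3.7 Wm−2 and the ocean heat uptake efficiency κ = 0.7 Wm−2 K−1 are assumed values, not taken from the text. Sensitivity and forcing use the CMIP3-like moments quoted above, and the same random draws are reused for each correlation so that only ρ changes between cases.

```python
import numpy as np

F_2X = 3.7    # assumed forcing for doubled CO2, W m^-2
KAPPA = 0.7   # hypothetical ocean heat uptake efficiency, W m^-2 K^-1


def transient_warming(S, F):
    """Toy transient response: T = F / (F_2X / S + KAPPA)."""
    return F / (F_2X / S + KAPPA)


rng = np.random.default_rng(1)
z1, z2 = rng.standard_normal((2, 50_000))  # shared draws for all cases


def warming_spread(rho):
    """Standard deviation of toy warming for a given S-F correlation rho."""
    S = np.clip(3.2 + 0.7 * z1, 1.0, 7.0)  # sensitivity, K (range as in the text)
    # Forcing built from the same z1 so that corr(S, F) is approximately rho.
    F = 1.8 + 0.5 * (rho * z1 + np.sqrt(1.0 - rho**2) * z2)
    return transient_warming(S, F).std()


base = warming_spread(0.0)
for rho in (-0.5, -0.8):
    print(rho, warming_spread(rho) / base)
```

Because warming increases with both S and F, pairing high sensitivities with low forcings cancels part of the variance. The size of the reduction in this toy model need not match the Bern2.5D results.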
 The total radiative forcing in the CMIP3 models is 1.8 Wm−2 on average (2000–2005 mean), in good agreement with the best estimates of 1.6 [+0.6 to +2.4] Wm−2 for anthropogenic forcing plus 0.12 Wm−2 for solar forcing [Forster et al., 2007]. However, only 7 of 23 models include the first indirect aerosol effect, and only 5 include the second indirect effect. So why does the set of CMIP3 models reproduce the observed warming so well without considering all forcings, and what would happen if these effects were included? (Note that for this discussion it is irrelevant whether these effects are considered a forcing or a feedback; in either case, including them would partly suppress the simulated warming.)
 To illustrate the effect of a larger aerosol forcing, the total aerosol forcing time series in the Bern2.5D model is scaled such that the present-day total forcing is decreased by 0.5 Wm−2 (Figures 1j–1l) compared to the standard case (Figures 1d–1f). This is sufficient to introduce a mismatch between simulated and observed warming (Figure 1k). Reducing the total forcing by 1 Wm−2 (Figures 1m–1o) would reduce the simulated 20th century warming to about half of the observed value (Figure 1n) and lead to an obvious inconsistency between model and observations. Future warming trends, however, remain similar.
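The scaling of the aerosol forcing can be written as a single time-independent factor chosen from present-day values. The sketch below uses hypothetical forcing components (not the Bern2.5D or SRES A2 values) to show why the same factor that removes 0.5 Wm−2 today changes the total forcing by a much smaller fraction once greenhouse gas forcing dominates late in the century.

```python
def aerosol_scale_for_offset(aerosol_now, delta_total):
    """Time-independent factor for the aerosol forcing series such that the
    present-day total forcing changes by delta_total (W m^-2)."""
    return (aerosol_now + delta_total) / aerosol_now


# Hypothetical forcing components in W m^-2 (illustrative, not model values).
components = {"present": (2.6, -0.8), "2100": (7.0, -0.5)}  # (greenhouse, aerosol)

scale = aerosol_scale_for_offset(components["present"][1], -0.5)
rel_change = {}
for label, (ghg, aer) in components.items():
    standard = ghg + aer
    scaled = ghg + scale * aer  # same factor applied at every time
    rel_change[label] = (scaled - standard) / standard

print(round(scale, 3), {k: round(v, 3) for k, v in rel_change.items()})
```

With these assumed numbers the scaled run loses more than a quarter of its present-day forcing but only about 5% of its 2100 forcing, which is the mechanism behind similar future warming trends.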
 Simulations with different models that include aerosol indirect effects suggest a top-of-atmosphere forcing of the total aerosol effects centered around −1.5 Wm−2, with an uncertainty range extending beyond −2.5 Wm−2 [Lohmann et al., 2007], much larger than typically considered in the CMIP3 models (about −0.5 Wm−2 for the direct effect). On the other hand, recent comparisons of aerosol models with satellite data indicate that the aerosol indirect effect may be much smaller [Quaas et al., 2006]. For CMIP3 to remain consistent with observed warming trends when including an additional forcing of only −0.5 Wm−2, the climate sensitivity distribution would need to be shifted upward by at least 2°C (mean 5.2°C, shown in Figures 1p–1r). The reason is that short-term transient warming is not very sensitive to climate sensitivity [e.g., Knutti et al., 2005]. The consequence, however, would be a much larger long-term warming (Figure 1r). A larger aerosol forcing could also be partly compensated by a smaller ocean heat uptake, larger internal climate variability, larger natural forcings, a different magnitude of other known forcings, or new forcings. Internal unforced variability can be estimated from the ensemble spread in GCMs and is small for global temperature [Stott et al., 2000]. The good agreement between observed and simulated warming also favours an external forcing. Long-term trends in solar forcing have recently been revised downward rather than upward [Forster et al., 2007]. Changes in known forcings, or the discovery of new forcings, on the order of 0.5 Wm−2 or larger also seem rather unlikely. Some studies suggest that many GCMs mix heat too effectively into the deep ocean [e.g., Forest et al., 2006] compared to the Levitus et al. dataset. On the other hand, the average of all GCMs agrees well with a newer dataset [Domingues et al., 2008] showing somewhat larger warming and less decadal variability than Levitus et al.
The average heat uptake in the Bern2.5D model from 1955 to 1995 is 14·10²² J (±7·10²² J, one standard deviation), similar to Levitus et al. Ocean parameters are not varied in the Bern2.5D model for this study. Even if the ocean heat uptake is uncertain, revisions are unlikely to modify today's energy budget by more than a few tenths of a Wm−2, and are therefore likely smaller than (and unable to compensate for) the potential changes in radiative forcing when all aerosol effects are introduced.
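The energy budget arithmetic behind this paragraph and the previous one can be checked in a few lines. This sketch assumes the energy balance used for Figure 1, written here with an explicit F_2x factor as Q = F − F_2x·T/S so that S can be expressed in °C per CO2 doubling (F_2x = 3.7 Wm−2 is an assumption). It also assumes an Earth surface area of 5.1·10¹⁴ m² for converting joules into a global-mean flux. The function names are hypothetical.

```python
import math

F_2X = 3.7                         # assumed CO2-doubling forcing, W m^-2
EARTH_AREA = 5.1e14                # Earth's surface area, m^2
SECONDS_PER_YEAR = 365.25 * 86400


def implied_sensitivity(F, Q=0.7, T=0.7):
    """Solve Q = F - F_2X * T / S for S (K per CO2 doubling).

    Returns math.inf when F <= Q, i.e. when no finite sensitivity fits.
    """
    return math.inf if F <= Q else F_2X * T / (F - Q)


def required_uptake(F, S, T=0.7):
    """Heat uptake Q (W m^-2) that balances forcing F for a given S."""
    return F - F_2X * T / S


def mean_flux(joules, years):
    """Average accumulated heat (J) over Earth's surface and the period."""
    return joules / (years * SECONDS_PER_YEAR * EARTH_AREA)


S_std = implied_sensitivity(1.8)   # standard total forcing
S_low = implied_sensitivity(1.3)   # forcing reduced by 0.5 W m^-2
print(round(S_std, 2), round(S_low, 2))
# Heat uptake that would keep S_std consistent with the reduced forcing.
print(round(required_uptake(1.3, S_std), 2))
# Bern2.5D 1955-1995 heat uptake quoted above, as a global-mean flux.
print(round(mean_flux(14e22, 40), 2), round(mean_flux(7e22, 40), 2))
```

With the central values Q = 0.7 Wm−2 and T = 0.7°C, lowering F from 1.8 to 1.3 Wm−2 raises the implied S from about 2.4 to about 4.3°C, while holding S fixed would instead require Q to drop to about 0.2 Wm−2. The Bern2.5D heat uptake quoted above corresponds to roughly 0.2 ± 0.1 Wm−2 averaged over 1955–1995; as a period mean it is not directly comparable with the present-day Q.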
 Finally, an interesting hypothesis is that the aerosol indirect effect and climate sensitivity could be correlated in models, since both depend partly on parameterizations of the hydrological cycle. Changes in model parameters (e.g., cloud microphysics) may result in compensating changes in climate sensitivity and total aerosol effect, such that the 20th century warming is relatively robust but future warming differs substantially between model versions (as in Figure 1r).
3. Discussion and Conclusions
 First, the most likely and obvious (although not the only) interpretation of the results above is that the total aerosol effect is smaller than suggested by most aerosol models. Earlier studies based on simpler models [Knutti et al., 2002; Anderson et al., 2003; Forest et al., 2006] had already suggested this, but this is the first study to place the current CMIP3 models (their simulated warming, forcing and climate sensitivity), the observed warming and a large ensemble of simulations with a simpler model into direct comparison. If, by contrast, the additional aerosol forcings (or feedbacks) not considered in CMIP3 turn out to be large (i.e., exceeding an additional −0.5 Wm−2), taking them into account will decrease the simulated warming and may result in a mismatch between simulated and observed 20th century warming. In that case, the current agreement between simulated and observed warming trends would be partly spurious and would indicate that we are missing something in our picture of the causes and effects of large-scale 20th century surface warming. An alternative possibility is that other forcings are larger, as recently suggested for black carbon [Ramanathan and Carmichael, 2008]. Constraining the aerosol effects from data, models and the observed warming trends [Knutti et al., 2002; Anderson et al., 2003; Forest et al., 2006; Quaas et al., 2006; Lohmann et al., 2007] is therefore a critical step in deciding whether our understanding of human influence on climate and our climate models are consistent with observed trends.
 Second, the question is whether we should be worried about the correlation between total forcing and climate sensitivity. Schwartz et al. recently suggested that “the narrow range of modelled temperatures [in the CMIP3 models over the 20th century] gives a false sense of the certainty that has been achieved”. Because of the good agreement between models and observations and the compensating effects between climate sensitivity and radiative forcing (as shown here and by Kiehl), Schwartz et al. concluded that the CMIP3 models used in the most recent IPCC report [IPCC, 2007] “may give a false sense of their predictive capabilities”.
 Here I offer a different interpretation of the CMIP3 climate models. They constitute an ‘ensemble of opportunity’, they share biases, and they probably do not sample the full range of uncertainty [Tebaldi and Knutti, 2007; Knutti et al., 2008]. The model development process is always open to influence, conscious or unconscious, from the participants' knowledge of the observed changes. It is therefore neither surprising nor problematic that the simulated and observed trends in global temperature are in good agreement. The point is that the simulation of 20th century global temperature should no longer be seen only as a prediction performed at the end of the model development process, but as a model result like, e.g., mean annual sea ice cover or the spectrum of ENSO, which are compared with observations during model development. Rather than as an independent verification, the observed warming may be seen as a constraint on the model parameter space, as is routinely done in simpler models [e.g., Knutti et al., 2002; Forest et al., 2006]. Agreement between simulated and observed global temperature therefore merely indicates that the assumed model and forcing provide a consistent explanation of the observed trends. Indeed, formal attribution of temperature trends has always been based on spatio-temporal patterns rather than on the simulated amplitude of the global temperature response to a set of forcings [Hegerl et al., 2007], and can even be accomplished after the global mean trend has been removed.
 The above takes a Bayesian viewpoint, in which the CMIP3 models are seen as a posterior distribution given observations (of means, variability and trends). However, this naïve direct interpretation as a Bayesian posterior should be avoided, because the ensemble does not sample the full uncertainty of the models and observations, because some models perform worse than others, and because the prior distribution is unclear [Stott et al., 2006; Tebaldi and Knutti, 2007]. But if the uncertainty of future projections in CMIP3 is constrained by observations, this inevitably introduces correlations across parameters. For other probabilistic methods [e.g., Knutti et al., 2002; Stott et al., 2006] this is well accepted, so why should it be problematic here? Since the mean climatology provides only a weak constraint on the future, why should we not use trends to improve the models? We are not giving a “false sense of predictive capability” when showing simulated and observed warming next to each other, but simply stating what has been known before, namely that different sets of parameters in one or several models can reasonably fit the available observations. Propagating each parameter set (i.e., each model version) forward into projections (ideally in large ensembles) yields an uncertainty or probability of future changes conditional on past observations (note that this assumes that the structural error in simulating global temperature is small, which is supported by energy balance models (e.g., Figure 1) but is formally difficult to quantify). There are, however, other sources of uncertainty, e.g., the carbon cycle and structural model uncertainties in statistical frameworks, which are not considered in CMIP3. The consensus estimates of future warming uncertainty are therefore known to be larger than the CMIP3 spread and are based on many more lines of evidence [Meehl et al., 2007; Knutti et al., 2008].
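The claim that constraining projections by observations inevitably introduces correlations across parameters can be illustrated with a toy importance-sampling example. Sensitivity and forcing are drawn from independent uniform priors, each pair is weighted by the likelihood of an assumed observed warming of 0.7 ± 0.1°C (the uncertainty is hypothetical), and the weighted correlation is compared with the prior one. The transient response T = F/(F_2x/S + κ), with F_2x = 3.7 Wm−2 and κ = 0.7 Wm−2 K−1, is an assumed stand-in for a climate model, not the method used for CMIP3.

```python
import numpy as np

F_2X, KAPPA = 3.7, 0.7   # assumed constants (W m^-2 and W m^-2 K^-1)
T_OBS, T_ERR = 0.7, 0.1  # observed warming and hypothetical uncertainty (K)

rng = np.random.default_rng(2)
n = 200_000
# Independent (uncorrelated) priors on sensitivity (K) and forcing (W m^-2).
S = rng.uniform(1.0, 7.0, n)
F = rng.uniform(0.5, 3.0, n)

# Toy transient response standing in for a climate model.
T = F / (F_2X / S + KAPPA)

# Gaussian likelihood of the observed warming gives importance weights.
w = np.exp(-0.5 * ((T - T_OBS) / T_ERR) ** 2)


def weighted_corr(x, y, w):
    """Pearson correlation of x and y under importance weights w."""
    mx, my = np.average(x, weights=w), np.average(y, weights=w)
    cov = np.average((x - mx) * (y - my), weights=w)
    vx = np.average((x - mx) ** 2, weights=w)
    vy = np.average((y - my) ** 2, weights=w)
    return cov / np.sqrt(vx * vy)


prior_r = np.corrcoef(S, F)[0, 1]
post_r = weighted_corr(S, F, w)
post_sd_S = np.sqrt(np.average((S - np.average(S, weights=w)) ** 2, weights=w))
print(round(prior_r, 3), round(post_r, 2), round(post_sd_S, 2), round(S.std(), 2))
```

The posterior correlation comes out strongly negative even though the prior has none, while the posterior spread of S stays wide: the observation constrains a combination of the two parameters rather than each one individually.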
 I argue that the current agreement between model-simulated and observed warming (given the other forcings) points towards a relatively small total aerosol effect. There is a correlation between climate sensitivity and total radiative forcing in the CMIP3 models, and removing that correlation would not increase uncertainties in future projections beyond the consensus estimates [Knutti et al., 2008]. But from a Bayesian point of view it is natural that observations lead to correlations of parameters across models, or that those observations constrain sets of parameters while possibly not constraining them individually. The projection uncertainty of CMIP3 should, therefore, be interpreted as at least partly conditional on past observed warming trends. The iconic figure showing agreement between simulated and observed global temperature over the 20th century should not itself be interpreted as the attribution of anthropogenic influence on climate. Just because we can build a model that replicates 20th century global temperature (and nothing else) does not imply that the model is correct. The figure shows that the combined natural and anthropogenic radiative forcings are a consistent explanation for the observed changes in these models, whereas natural forcings alone cannot explain the observations. Natural forcings fail to explain the observed spatio-temporal patterns even if their response is inflated. Projections over the next few decades and their uncertainties are not sensitive to the magnitude of the aerosol forcing (see Figures 1l and 1o) as long as the ratio of sulphate to greenhouse gas forcing remains similar [Allen et al., 2000].
 I acknowledge fruitful discussions with Myles Allen, Chris Bretherton, Philip Brohan, Jonathan Gregory, Gareth Jones, William Ingram, Peter Stott and other colleagues. I also acknowledge the modeling groups, the Program for Climate Model Diagnosis and Intercomparison (PCMDI) and the WCRP's Working Group on Coupled Modelling (WGCM) for their roles in making available the WCRP CMIP3 multi-model dataset. Support of this dataset is provided by the Office of Science, U.S. Department of Energy.