1 Main Comment
Racherla et al. [2012, hereinafter RSF12] analyzed two decades, 1968–1978 and 1995–2005, comparing time slices of global and regional climate model simulations with analyses of observations and, in particular, verifying the skill of the models at capturing the changes in seasonal mean surface air temperature and precipitation between the two decades over 11 regions of the U.S. The global simulation is from the Goddard Institute for Space Studies (GISS)-ModelE2 coupled atmosphere-ocean global climate model (AOGCM), integrated on a 2° by 2.5° grid and incorporating anthropogenic and natural forcings with a detailed representation of gas phase, sulfate, black carbon, nitrate, and secondary organic aerosol chemistry. Dynamical downscaling was achieved with a version of the Weather Research and Forecasting (WRF) atmospheric regional climate model (RCM), integrated over a North American domain on a 45 km mesh with 216 by 126 cells in the west-east and south-north directions, respectively. WRF was driven by atmospheric fields and by sea surface temperature (SST) and sea ice concentration (SIC) fields from the GISS-ModelE2 simulation. The WRF model included neither anthropogenic aerosol forcings nor land use changes. RSF12 found very modest skill at reproducing the observed trend of temperature and precipitation over the past 37 years for their single AOGCM simulation, and very little improvement, if any, from dynamical downscaling with the higher-resolution RCM.
Does this constitute a failure for the global climate model or, as purported by Kerr, a failure for dynamical downscaling? We do not think so; rather, we attribute the result to the use of an inadequate experimental protocol. The results of the experiment as designed were strongly influenced by the presence of internal variability and sampling errors, which masked the rather small climate changes that may have occurred as a consequence of changes in forcing during the period considered.
Statistical theory informs us that any average calculated from a limited sample is affected by sampling error. If we denote by σ² the variance of the seasonal mean associated with interannual variability (IAV), then an N year mean seasonal climatology has an associated uncertainty (average square error) equal to s² = σ²/N, assuming independence of seasonal means across years. The difference between two N year means has an associated uncertainty equal to S² = 2σ²/N, assuming constant IAV across years. In RSF12, the change between two 10 year means is calculated over a 37 year period (1968–2005), with an expected square error equal to S² = 2σ²/10. We note that the corresponding 37 year mean has an expected square error equal to s² = σ²/37; clearly, the change statistic is much more prone to sampling error than the average statistic, since S² = 7.4 s².
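As a consistency check on these formulas, consider the following minimal Monte Carlo sketch, assuming Gaussian IAV with an illustrative σ = 0.5°C (a value chosen for illustration, not taken from RSF12):

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 0.5    # illustrative interannual std dev of a seasonal mean (deg C)
N = 10         # years in each averaging period (as in RSF12)
n_trials = 100_000

# Draw two independent N-year samples of seasonal means and difference them.
mean1 = rng.normal(0.0, sigma, size=(n_trials, N)).mean(axis=1)
mean2 = rng.normal(0.0, sigma, size=(n_trials, N)).mean(axis=1)
diff = mean2 - mean1

print(f"empirical   S^2 = {diff.var():.4f}")
print(f"theoretical S^2 = 2*sigma^2/N = {2 * sigma**2 / N:.4f}")
# With sigma = 0.5 and N = 10, S is about 0.22 deg C: a spurious
# decade-to-decade "change" of that magnitude is expected from sampling alone.
```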
The larger sampling error, combined with the fact that the changes are small over the historical period considered, makes the use of simulated trends to assess climate models difficult despite its formal appeal. We note that the situation may be quite different at some future time, say 2050, when climate changes could be calculated between two 30 year periods such as 2020–2050 and 1970–2000. Assuming that anthropogenic emissions of greenhouse gases continue over the next decades, and based on the current understanding of the greenhouse effect, we expect that the combined effects of the stronger climate trend and the longer averaging periods would increase the signal-to-noise ratio, assuming of course that models can adequately reproduce the observed features of the real climate, in terms of their time mean and variability, and changes thereof.
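For concreteness, the noise part of this argument follows directly from the formulas above; the factor of 3 depends only on the period lengths:

$$S^2_{30} = \frac{2\sigma^2}{30} = \frac{1}{3}\,\frac{2\sigma^2}{10} = \frac{1}{3}\,S^2_{10},$$

so the noise standard deviation would shrink by a factor of $\sqrt{3} \approx 1.7$ relative to the 10 year means of RSF12, while the forced signal between the two 30 year periods would also be larger than that between the two decades analyzed by RSF12.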
de Elía et al. estimated the time over which one can expect the climate change signal to exceed natural variability, introducing the concept of “Expected number of Years before Emergence” (EYE). Their results indicate, for example, EYE values ranging from 20 to more than 50 years for wintertime temperature over various regions of North America. Using the related concept of “Time of Emergence” (TOE) for projections over Europe, Maraun found that across wide areas, local trends for heavy summer precipitation emerge only late in the 21st century or later. Given that these values are based on model-simulated internal variability, which, as noted by Lovejoy, tends to be underestimated by models, especially at low frequencies, the emergence of a signal in the real world may take even longer. Using a large ensemble of CCSM3 projections for this century, Deser et al. [2012, Figure 2] showed that many members were required to detect a significant response over the next 23 years over the U.S.: generally, more than three members were needed for temperature and more than 12 for precipitation.
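The emergence idea can be illustrated with a back-of-envelope sketch: assuming a linear forced trend b and Gaussian IAV of standard deviation σ, the signal exceeds k noise standard deviations after roughly kσ/b years. All numbers below are illustrative, not values from the cited studies:

```python
# Back-of-envelope time of emergence: the year when a linear forced trend
# first exceeds k standard deviations of interannual variability.
def years_to_emergence(trend_per_decade: float, sigma: float, k: float = 2.0) -> float:
    """Years until |trend * t| > k * sigma for annual-mean anomalies."""
    return k * sigma / (trend_per_decade / 10.0)

# Illustrative values only: 0.25 deg C/decade trend, 0.5 deg C IAV.
print(years_to_emergence(0.25, 0.5))   # -> 40.0 years
# Weaker trends or larger variability (e.g., regional precipitation)
# push emergence out by many decades, consistent with the EYE/TOE results above.
```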
Should climate models be expected to capture the changes in surface air temperature and precipitation between two historical decades? The answer is that the processes responsible for the trend need to be represented in the model and, in the case of a nested model, in the driving boundary conditions as well. Climate models could be expected to capture past and future climate changes in two ways: (1) from the memory of the specified initial conditions (IC) of the atmosphere, land, and oceans and (2) from the prescribed evolution of boundary condition (BC) forcings. Current understanding of Earth system predictability indicates IC memory time scales of a few weeks for the atmosphere and a few months to a few years for land surface conditions; the ocean, on the other hand, exhibits variability across a wide range of time scales because of its multiple heat reservoirs and modes of variability. BC forcings include contributions that are nearly constant in time (e.g., orography, land-sea mask, and astronomical parameters) and others that vary in time. The latter may be further subdivided into natural (e.g., volcanoes and solar cycles) and anthropogenic (e.g., greenhouse gases and aerosols (GHGA) resulting from fossil fuel burning, and land use changes) forcings. In the context of atmosphere-only regional models, BC forcings also include the prescribed distribution and evolution of SST and SIC, as well as the atmospheric lateral BC. The fluctuations of the climate system can be decomposed into free and forced components, referred to as natural variability and climate changes, respectively.
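Under the common assumption that an ensemble mean estimates the forced component while the deviations from it estimate the free component, this decomposition can be sketched with synthetic data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

n_members, n_years = 20, 50
forced = 0.02 * np.arange(n_years)                      # synthetic forced trend (deg C)
internal = rng.normal(0.0, 0.5, (n_members, n_years))   # synthetic free variability
ensemble = forced + internal                            # each row is one member

forced_estimate = ensemble.mean(axis=0)                 # ensemble mean -> forced component
free_estimate = ensemble - forced_estimate              # deviations -> free component
print(np.abs(forced_estimate - forced).max())           # error shrinks as members are added
```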
Numerical weather prediction (NWP) operates on the basis of the time evolution from specified IC, while climate change projections have traditionally been approached from the perspective of changes in BC forcings affecting the forced component of the climate system. A change of paradigm has recently been taking place, however, emphasizing the role of IC for near-term climate predictions; e.g., Giorgi stated that “....because of the long time scales involved in ocean, cryosphere and biosphere processes a first kind predictability component also arises. The slower components of the climate system (e.g. the ocean and biosphere) affect the statistics of climate variables (e.g. precipitation) and since they may feel the influence of their initial state at multi decadal time scales, it is possible that climate changes also depend on the initial state of the climate system.” While decadal prediction represents a colossal challenge [e.g., Solomon et al., 2011], there are clear indications from the climate forecast systems participating in the fifth phase of the Coupled Model Intercomparison Project (CMIP5) of skill in predicting regional-scale temperature anomalies over the past 50 years; most of the skill results from changes in atmospheric composition, but part also comes from the initialization of the predictions [Doblas-Reyes et al., 2013].
Given that an AOGCM is used by RSF12, the only contact the AOGCM has with real-world chronology is through the specified anthropogenic and natural BC forcings from GHGA, a rather weak forcing compared to that of SST and SIC anomalies, as would be the case for an atmosphere-only GCM in Atmospheric Model Intercomparison Project-type experiments [e.g., Gates, 1992]. Hence, it is important to recall that natural variability in the AOGCM simulation will not synchronize with reality, except by luck.
The relatively small amplitude of recent past climate changes makes the detection of observed changes and their causal attribution to changes in forcing agents most challenging, particularly at the regional scale, because natural variability, in both observed and modeled changes, blurs the climate trends resulting from changes in climate forcings [e.g., Hegerl and Zwiers, 2011]. The Intergovernmental Panel on Climate Change Fourth Assessment Report (IPCC AR4) [2007, Figure SPM.4] presented the observed surface temperature anomaly for the period 1906 to 2005 and “plume” diagrams of climate model simulations using natural-only and combined natural and anthropogenic historical forcings. The width of the plumes represents the 5 to 95% range of simulation results and accounts for the combined effects of the different forcings being used, the different responses of participating models to the specified forcings, and natural variability in the model simulations. Over all continents, the observation line falls within the model simulation plumes, which is interpreted favorably in terms of the skill of the ensemble of global model simulations at reproducing observed changes. As stated in IPCC AR4 [2007, chapter 9], “When human factors are included, the models also simulate a geographic pattern of temperature change around the globe similar to that which has occurred in recent decades.” Note that while ensemble averaging of simulated results can be used to filter out the models' natural variability, there is no equivalent way of overcoming natural variability in observations, save by time averaging, which is only applicable under constant climate forcing conditions. Hence, acknowledging the presence of natural variability is clearly required for an adequate verification of model simulations against observations.
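A minimal sketch of such a plume construction with synthetic data follows; the 5 to 95% range mirrors the IPCC convention cited above, and the “observations” here are simply one more realization of the same toy process:

```python
import numpy as np

rng = np.random.default_rng(2)

n_members, n_years = 30, 100
forced = 0.008 * np.arange(n_years) ** 1.2          # synthetic forced warming (deg C)
members = forced + rng.normal(0.0, 0.3, (n_members, n_years))

# 5-95% plume across members; the ensemble mean filters internal variability.
lo, hi = np.percentile(members, [5, 95], axis=0)
ensemble_mean = members.mean(axis=0)

obs = forced + rng.normal(0.0, 0.3, n_years)        # one realization plays "observations"
inside = np.mean((obs >= lo) & (obs <= hi))
print(f"fraction of years with 'observations' inside the plume: {inside:.2f}")
```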
In recent years, the assessment of climate model performance has gradually shifted from comparing time mean variables to comparing their time variations. IPCC AR4 [2007, chapter 8] stated that “… developments in AOGCM formulation have improved the representation of large-scale variability over a wide range of time scales. The models capture the dominant extratropical patterns of variability… The atmosphere-ocean coupled climate system shows various modes of variability that range widely from intra-seasonal to inter-decadal time scales.” Lovejoy, however, noted that “Analysis of several simulations of the past millennium shows that their low-frequency variability using “reconstructed forcings” is somewhat too small compared to the observed variability.”
Although not widespread, the use of recent past climate changes to assess climate model performance, as promoted by RSF12, has nevertheless been tried in a few studies. For example, Pierce et al. analyzed CMIP3 ensemble simulations from 21 AOGCMs for the 1960–1999 period, during which the observed trend in the western U.S. is +0.10°C/decade. They found that “because of the importance of natural variability in a limited domain, it is not uncommon for models with a strongly positive ensemble-averaged trend to have individual realizations with a negative trend. A single model realization does not provide a reliable estimate of the warming signal.” They emphasized the importance of ensembles of simulations: “…enough realizations must be chosen to account for the (strong) effects of the models’ natural internal climate variability. In our test case, 14 realizations were found to be sufficient…” Räisänen analyzed the skill of 21 AOGCMs participating in the IPCC AR4 at reproducing the observed trend from 1955 to 2005. A 50 year period was used to reduce the effect of internal variability in the observations, and a 21 model ensemble was used to minimize the effect of internal variability in the model results. A spatial correlation of 0.48 for temperature and 0.23 for precipitation was obtained between the multimodel mean trend and the observations. He noted that climate changes in individual model simulations were more strongly affected by internal variability and were less similar to the observed changes than the multimodel means. Using an ensemble of simulations of the Hadley Centre's most recent AOGCM, with an improved treatment of volcanoes and of mineral and anthropogenic aerosol processes, including their direct and indirect effects, Booth et al. showed that the model was remarkably successful in reproducing the observed historical trends in North Atlantic SSTs. This work instills some optimism for decadal prediction when suitable forcings are incorporated in models.
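The order of magnitude of the 14 realizations quoted above can be rationalized with the earlier sampling formulas: for an n-member ensemble-mean trend to pin down a forced signal b against per-realization trend noise of standard deviation σ_b at k standard errors, one needs n ≥ (kσ_b/b)² members. A hypothetical numerical sketch, in which the noise value is an assumption and not taken from Pierce et al.:

```python
import math

def members_needed(signal: float, noise_std: float, k: float = 2.0) -> int:
    """Smallest ensemble size n such that noise_std / sqrt(n) <= signal / k."""
    return math.ceil((k * noise_std / signal) ** 2)

# Illustrative: forced trend 0.10 deg C/decade, internal trend noise 0.18 deg C/decade.
print(members_needed(0.10, 0.18))   # -> 13 members, of the order Pierce et al. found
```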
To what extent the imperfect reproduction of the observed trends obtained by RSF12 is due to model structural errors is moot. An experiment could be designed to separate structural errors from natural variability effects, using a “perfect prognosis” approach similar to the “identical twins” experiments reported by Lorenz for NWP. The experiment would consist of making a reference simulation with an AOGCM that would henceforth be considered as the truth and used for the verification of an ensemble of simulations performed with the same model but initialized from slightly different IC, discarding a sufficiently long initial period to decrease the influence of the IC on the ensuing simulations. Because the member simulations would be compared to a simulation of the same model, they would not be influenced by the model's structural errors, and any failure of the members at reproducing the trend of the reference run could be unambiguously attributed to internal variability and sampling effects.
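A toy version of this identical-twin protocol can be sketched with a simple stochastic process standing in for the AOGCM (all parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def run(n_years: int = 40, trend: float = 0.01, sigma: float = 0.4) -> np.ndarray:
    """Toy 'climate model': the same forced trend plus independent internal noise,
    standing in for AOGCM runs launched from perturbed initial conditions."""
    return trend * np.arange(n_years) + rng.normal(0.0, sigma, n_years)

def decade_change(x: np.ndarray) -> float:
    """Difference between the last and first 10 year means, as in RSF12."""
    return x[-10:].mean() - x[:10].mean()

reference = run()                                 # treated as the "truth"
members = [run() for _ in range(20)]              # perturbed-IC twins of the same model

ref_change = decade_change(reference)
member_changes = np.array([decade_change(m) for m in members])
# The spread of member changes around the reference reflects pure internal
# variability and sampling, with no structural error involved.
print(f"reference change: {ref_change:.2f}, member spread: {member_changes.std():.2f}")
```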