Causes and implications of persistent atmospheric carbon dioxide biases in Earth System Models



The strength of feedbacks between a changing climate and future CO2 concentrations is uncertain and difficult to predict using Earth System Models (ESMs). We analyzed emission-driven simulations (in which atmospheric CO2 levels were computed prognostically) for historical (1850–2005) and future periods (Representative Concentration Pathway (RCP) 8.5 for 2006–2100) produced by 15 ESMs for the Fifth Phase of the Coupled Model Intercomparison Project (CMIP5). Comparison of ESM prognostic atmospheric CO2 over the historical period with observations indicated that ESMs, on average, had a small positive bias in predictions of contemporary atmospheric CO2. Weak ocean carbon uptake in many ESMs contributed to this bias, based on comparisons with observations of ocean and atmospheric anthropogenic carbon inventories. We found a significant linear relationship between contemporary atmospheric CO2 biases and future CO2 levels for the multimodel ensemble. We used this relationship to create a contemporary CO2 tuned model (CCTM) estimate of the atmospheric CO2 trajectory for the 21st century. The CCTM yielded CO2 estimates of 600±14 ppm at 2060 and 947±35 ppm at 2100, which were 21 ppm and 32 ppm below the multimodel mean during these two time periods. Using this emergent constraint approach, the likely ranges of future atmospheric CO2, CO2-induced radiative forcing, and CO2-induced temperature increases for the RCP 8.5 scenario were considerably narrowed compared to estimates from the full ESM ensemble. Our analysis provided evidence that much of the model-to-model variation in projected CO2 during the 21st century was tied to biases that existed during the observational era and that model differences in the representation of concentration-carbon feedbacks and other slowly changing carbon cycle processes appear to be the primary driver of this variability.
Our analysis suggests that uncertainties in future climate projections can be reduced by improving models to more closely match the long-term time series of CO2 from Mauna Loa.

1 Introduction

Anthropogenic emissions of radiatively active greenhouse gases into the atmosphere, especially carbon dioxide (CO2), are rapidly increasing the burden of these gases and altering the Earth's climate [IPCC, 2007; Raupach and Canadell, 2010]. This perturbation of the global carbon cycle is expected to induce feedbacks from the terrestrial biosphere and oceans on future CO2 concentrations and the climate system. These climate–carbon cycle feedbacks are highly uncertain, difficult to predict, and potentially large [Denman et al., 2007]. Understanding and predicting the strength and direction of feedbacks is critically important for estimating future atmospheric CO2 concentrations and, therefore, accurately predicting the effects and extent of climate change.

Models of Earth's climate system are used to predict responses to human and natural forcings into the future, while hindcasts are used to judge the ability of individual models to reproduce observed patterns. Current generation Earth System Models (ESMs) attempt to capture the complex interactions and feedbacks between climate, terrestrial and ocean ecosystems, and human activities. Scenarios describing alternative prospective future socioeconomic, technological, and environmental conditions are used to generate a consistent set of chemical, biological, and land use data to drive ESMs [Moss et al., 2010]. The results from such ESM simulations are valuable for diagnosing the magnitude of mitigation efforts required to stabilize CO2 levels in the atmosphere under various scenarios, taking into account carbon cycle responses and feedbacks. Traditionally, such models were provided with a trajectory of CO2 and other greenhouse gases consistent with scenario assumptions about population, energy resources and consumption, and agricultural policies and practices. Recently, as improvements to the representation of biogeochemical processes on land and in the ocean and better atmospheric chemistry have been added to ESMs, scenario-derived emissions of radiatively active gases, consistent with plausible natural and anthropogenic influences, are used to force ESMs. Concentration-driven simulation results are frequently analyzed to evaluate the mean carbon stocks and fluxes and to constrain biosphere processes and feedbacks in land and ocean models [Friedlingstein et al., 2006; Arora et al., 2013; Anav et al., 2013]. They also provide the opportunity to estimate emissions scenarios consistent with a specific trajectory of atmospheric CO2 [Jones et al., 2013]. 
Emission-driven simulations, in contrast, provide the opportunity to assess the implications of biases resulting from uncertainties associated with ecosystem processes and feedbacks as the effects of those uncertainties propagate through the coupled ESM.

Friedlingstein et al. [2003, 2006] developed a framework for analysis of climate–carbon cycle feedbacks and applied it to 11 coupled climate–carbon cycle atmosphere–ocean general circulation models for the Coupled Climate–Carbon Cycle Model Intercomparison Project (C4MIP). Friedlingstein et al. [2003, 2006] introduced model sensitivities of land and ocean carbon sinks to climate (γL and γO, respectively) and to atmospheric CO2 concentration (βL and βO, respectively) as metrics of climate-carbon and concentration-carbon feedbacks, respectively, to complement the overall climate sensitivity to atmospheric CO2 parameter, α, in common use. In their study, Friedlingstein et al. [2006] found that the model sensitivities of the land carbon sinks to climate (γL) varied by almost a factor of 9 and to concentration (βL) by almost a factor of 14. Moreover, the models varied by a factor of almost 8 in their gain (g) of the climate–carbon cycle feedback. Arora et al. [2013] performed a similar analysis for nine ESMs participating in the Fifth Phase of the Coupled Model Intercomparison Project (CMIP5) [Taylor et al., 2012] and found that γL varied by almost a factor of 6, βL varied by almost a factor of 7, and the emissions-derived gain (gE) varied by more than a factor of 6. The emissions-derived gain (gE) is analogous to the Friedlingstein et al. [2006] gain (g) for concentration-forced simulations. The multimodel mean feedback parameters, and their standard deviations, were lower in the nine CMIP5 models than in the C4MIP models, with γL being 26% weaker and βL being 32% weaker. These differences may be partially explained by differences in the future emissions scenarios used in the two studies. Nevertheless, these results point to very large uncertainties in the response of terrestrial biosphere models to climate change and rising CO2 concentrations and in the overall strength of the feedbacks they predict. While the framework developed by Friedlingstein et al. [2006] is useful for evaluating the overall strength of feedback responses within a given model and for comparing concentration and climate sensitivities between models, it provides no indication about the likelihood of any model being correct. In addition, multiple factors contribute to the apparent strength of the βL, βO, γL, and γO sensitivities, and the concentration and climate sensitivities interact with each other nonlinearly through biological and chemical processes [Gregory et al., 2009].

In the studies described above characterizing carbon cycle feedback processes, no comparisons were made to observations. This is the next crucial step for reducing uncertainties associated with future scenarios of global climate change. Recent research has made initial steps in this direction. Cox et al. [2013] used the observed relationship between the growth rate of atmospheric CO2 and tropical temperature as a constraint to reduce predicted uncertainty in the land carbon storage sensitivity to climate change (γL) in the tropics in C4MIP models. Similarly, Gillett et al. [2013] used the ratio of warming to cumulative emissions of CO2 to estimate a transient response to cumulative emissions from observations for comparison with 12 CMIP5 models. Such innovative use of contemporary measurements to constrain carbon cycle responses to climate change is important for reducing the range of uncertainty in future climate change projections [Randerson, 2013]. Moreover, comparisons with data sets derived from the synthesis of measurements collected over a wide geospatial range can provide constraints on individual processes and on carbon cycle responses that are sensitive to initial conditions. Todd-Brown et al. [2013] compared soil carbon stocks from 11 CMIP5 models with the Harmonized World Soil Database and the Northern Circumpolar Soil Carbon Database. Despite reasonable global-scale agreement with these observations, most ESMs failed to reproduce grid-scale soil carbon variations, suggesting that key processes may be missing in the majority of ESMs.

The goal of this paper is to identify long-term CO2 biases in emission-driven simulation results produced by ESMs participating in CMIP5 and describe the causes and implications of those biases for future climate projections during the middle and latter half of the 21st century. In our analysis, we developed a new approach using contemporary inventory observations and structural information about feedbacks within the CMIP5 models to constrain future CO2 predictions and to reduce uncertainties associated with the range of possible CO2 mole fractions consistent with the Representative Concentration Pathway (RCP) 8.5 emissions scenario.

2 Methods

2.1 Model Descriptions

We analyzed historical and future emission-driven simulation results produced using ESMs for CMIP5. The historical simulations, referred to as experiment 5.2 or esmHistorical [Taylor et al., 2012], were forced with spatially distributed CO2 emissions reconstructed from fossil fuel consumption estimates [Andres et al., 2011] for the period 1850–2005. The future simulations, referred to as experiment 5.3 or esmrcp85 [Taylor et al., 2012], were forced with projected CO2 emissions for the period 2006–2100, following the scenario described by the Representative Concentration Pathway (RCP) 8.5 [Moss et al., 2010]. Model output was obtained primarily from the Earth System Grid Federation, an international network of distributed climate data servers [Williams et al., 2011].

Simulation results were produced by fully coupled ESMs with interactive terrestrial and marine biogeochemistry models, which feature climate–carbon cycle feedback mechanisms. Since the simulations were forced with CO2 emissions, these models prognostically computed global atmospheric CO2 mole fractions, which represent an integration of physical, chemical, and biological processes on Earth and their interactions and feedbacks with the climate system. The ESMs employed different aerosol emissions, land use change processes, and process parameterizations, leading to a range of different aerosol and greenhouse gas concentrations, radiative forcings, and climate interactions. The ability of models to accurately reproduce the observed atmospheric CO2 mole fraction trajectory over the historical period provides a broad indication of model fidelity, a necessary but not sufficient condition for credible ESM performance. Each of the models that generated output used in this study is listed in Table 1.

Table 1. Models That Generated Output Used in This Study^a

The Atmosphere, Land, Ocean, and Sea Ice columns list component models and resolutions.

| Model | Modeling Center (or Group) | Atmosphere | Land | Ocean | Sea Ice |
|---|---|---|---|---|---|
| BCC-CSM1.1 [Wu et al., 2013] | Beijing Climate Center, China Meteorological Administration | AGCM2.1 (2.875°×2.875°, L26) | BCC_AVIM1.0 (2.875°×2.875°) | MOM4_L40 [resolution not recovered] | SIS [resolution not recovered] |
| BCC-CSM1.1(m) [Wu et al., 2013] | Beijing Climate Center, China Meteorological Administration | AGCM2.2 (1.125°×1.125°, L26) | BCC_AVIM1.0 (1.125°×1.125°) | MOM4_L40 [resolution not recovered] | SIS [resolution not recovered] |
| BNU-ESM^b [Dai et al., 2003, 2004; College of Global Change and Earth System Science, 2012] | Beijing Normal University, China | CAM3.5 (2.875°×2.875°, L26) | CoLM3 and BNUDGVM (C/N) (2.875°×2.875°, L10) | MOM4p1 and IBGC [resolution not recovered] | CICE4.1 [resolution not recovered] |
| CanESM2^c [Arora et al., 2011] | Canadian Center for Climate Modeling and Analysis, Canada | CanAM4 (2.81°×2.81°, L35) | CLASS2.7 and CTEM (2.81°×2.81°) | CanOM4 and CMOC (1.41°×0.94°, L40) | CanOM4 (2.81°×2.81°) |
| CESM1-BGC^d [Hurrell et al., 2013; Keppel-Aleks et al., 2013; Long et al., 2013] | Community Earth System Model Contributors, NSF-DOE-NCAR, USA | CAM4 (0.9°×1.25°, L30) | CLM4 (0.9°×1.25°) | POP2 and NPZD [resolution not recovered] | CICE4 [resolution not recovered] |
| FGOALS-s2.0^e [Bao et al., 2013; Liu et al., 2012; Lin et al., 2013] | LASG, Institute of Atmospheric Physics, CAS, China | SAMIL2.4.7 (1.67°×2.81°, L26) | CLM3 and VEGAS2.0 (1.67°×2.81°) | LICOM2.0 [resolution not recovered] | CSIM5 [resolution not recovered] |
| GFDL-ESM2g, GFDL-ESM2m^f [Dunne et al., 2012, 2013] | NOAA Geophysical Fluid Dynamics Laboratory, USA | AM2 (2°×2.5°, L24) | LM3 (2°×2.5°) | MOM4 [resolution not recovered] | SIS [resolution not recovered] |
| HadGEM2-ES^g [Collins et al., 2011; Jones et al., 2011] | Met Office Hadley Center, UK | HadGAM2 and UKCA (1.25°×1.875°, L38) | MOSES2 and TRIFFID (1.25°×1.875°) | HadGOM2 and diat-HadOCC [resolution not recovered] | HadGOM2 [resolution not recovered] |
| INM-CM4^h [Volodin et al., 2010] | Institute for Numerical Mathematics, Russia | (2°×1.5°, L21) | (2°×1.5°) | (1°×0.5°, L40) | (1°×0.5°) |
| IPSL-CM5A-LR^a [Dufresne et al., 2013] | Institut Pierre Simon Laplace, France | LMDZ4 (3.75°×1.9°, L39) | ORCHIDEE (3.75°×1.9°) | ORCA2 and PISCES [resolution not recovered] | LIM2 [resolution not recovered] |
| MIROC-ESM^a [Watanabe et al., 2011; Oschlies, 2001] | Japan Agency for Marine-Earth Science and Technology, Atmosphere and Ocean Research Institute (University of Tokyo), and National Institute for Environmental Studies, Japan | MIROC-AGCM and SPRINTARS (2.875°×2.875°, L80) | MATSIRO and SEIB-DGVM (2.875°×2.875°, L6) | COCO3.4 and NPZD (1.5°×1°, L44) | COCO3.4 (1.5°×1°) |
| MPI-ESM-LR^g,h [Maier-Reimer et al., 2005; Raddatz et al., 2007; Brovkin et al., 2009] | Max Planck Institute for Meteorology, Germany | ECHAM6 (2.81°×2.81°, L47) | JSBACH (2.81°×2.81°) | MPIOM and HAMOCC (1.5°×1.5°, L40) | MPIOM (1.5°×1.5°) |
| MRI-ESM1 [Yukimoto et al., 2011; Nakano et al., 2011; Yukimoto et al., 2012; Obata and Shibata, 2012] | Meteorological Research Institute, Japan | GSMUV (0.75°×0.75°, L48) | HAL and MRI-LCCM2 (0.75°×0.75°) | MRI.COM3 (1°×0.5°, L51) | MRI.COM3 (1°×0.5°) |
| NorESM1-ME [Bentsen et al., 2013; Iversen et al., 2013; Tjiputra et al., 2013] | Norwegian Climate Center, Norway | CAM4-Oslo (1.9°×2.5°, L26) | CLM4 (1.9°×2.5°) | MICOM and HAMOCC [resolution not recovered] | CICE4 [resolution not recovered] |

^a Atmospheric CO2 required unit correction.
^b Ocean carbon flux required unit correction.
^c FGOALS-s2 model provided no ocean carbon fluxes.
^d GFDL-ESM2g and GFDL-ESM2m output available beginning January 1861.
^e HadGEM2-ES output available for December 1859 through November 2099; annual atmospheric CO2 obtained directly from Hadley Center.
^f IPSL-CM5A-LR monthly atmospheric CO2 obtained directly from IPSL.
^g MPI-ESM-LR provided three esmHistorical realizations and one esmrcp85 realization.
^h Atmospheric CO2 mole fraction was computed from three-dimensional output.

2.2 Model Output

Monthly output of prognostic atmospheric CO2 and surface ocean CO2 flux from emission-driven ESM simulations was analyzed to evaluate the evolution of the carbon cycle over the twentieth and 21st centuries. Atmospheric CO2 was obtained either as the total atmospheric mass of CO2 and converted to mole fraction or as the atmospheric CO2 mole fraction at every atmosphere model layer. In the latter case, the global mole fraction was calculated as the area-weighted mean of CO2 in the lowest atmosphere level. Surface ocean CO2 flux was integrated spatially to determine global carbon uptake and further integrated over time to estimate the global change in ocean carbon inventory. While the net terrestrial CO2 flux was available for some models in the form of net biospheric productivity, here annual land carbon uptake was calculated as the difference between the prescribed annual anthropogenic emissions and the sum of the annual change in atmosphere and ocean carbon inventories. Therefore, the change in land carbon storage for a given year was estimated as

$$\Delta C_L = \sum_i F_i - \left(\Delta C_A + \Delta C_O\right) \tag{1}$$

where $\sum_i F_i$ was the total anthropogenic fossil carbon emissions from all sources i (fossil fuel burning and cement production) for that year, ΔCA was the change in atmospheric carbon storage for that year, and ΔCO was the change in ocean carbon storage for that year. A single trajectory of annual anthropogenic carbon emissions, derived from the experimental forcing, was used in calculations for all model results. Carbon fluxes due to land use change were included implicitly in ΔCL and were not included explicitly in total fossil carbon emissions ($\sum_i F_i$). We assumed Fi in each model followed the historical and RCP 8.5 time series on the Potsdam Institute for Climate Impact Research site (∼mmalte/rcps/index.htm). We also assumed the individual ESMs were at steady state at the beginning of the historical simulation (i.e., that drift in the control simulation was minimal).
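The residual calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the three-year input values, and the conversion factor of about 2.124 PgC per ppm of CO2 are assumptions introduced here.

```python
# Assumed conversion between atmospheric CO2 mole fraction and carbon mass.
PGC_PER_PPM = 2.124  # PgC per ppm; an approximate, commonly quoted value


def land_carbon_change(emissions_pgc, delta_co2_ppm, ocean_uptake_pgc):
    """Annual change in land carbon storage (PgC), computed as a residual."""
    delta_atmosphere_pgc = delta_co2_ppm * PGC_PER_PPM
    return emissions_pgc - (delta_atmosphere_pgc + ocean_uptake_pgc)


def cumulative(values):
    """Running sum, used to turn annual changes into an inventory series."""
    total, out = 0.0, []
    for value in values:
        total += value
        out.append(total)
    return out


# Hypothetical three-year example (not model output).
emissions = [8.0, 8.2, 8.4]     # PgC per year
delta_co2 = [1.8, 2.0, 1.9]     # ppm per year
ocean_uptake = [2.1, 2.2, 2.2]  # PgC per year

land = [
    land_carbon_change(f, a, o)
    for f, a, o in zip(emissions, delta_co2, ocean_uptake)
]
land_inventory = cumulative(land)
```

A positive residual indicates net land uptake and a negative residual a net land source. Because land use emissions are not part of the prescribed fossil emissions, they end up inside the residual.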

2.3 Comparing CMIP5 ESMs With Long-Term Carbon Cycle Observations

Atmospheric CO2 mole fraction observations used for comparison with model projections of atmospheric CO2 over the historical period were the same as those used to force the corresponding concentration-driven simulations, which were not analyzed here. Compiled by Tom Wigley and Malte Meinshausen, these “end-of-year CO2 concentrations” consist of a combination of 75 year smoothed Law Dome ice core data [Etheridge et al., 1996] up to 1832, 20 year smoothed Law Dome ice core data for 1823–1958, the Keeling Mauna Loa record, with 0.59 ppm subtracted (which is the Mauna Loa mean minus the NOAA global mean over 1982–1986) for 1959–1981, and the NOAA global mean value for 1982–2008 [Conway et al., 1994]. Development of these and related forcing data for preindustrial control, twentieth century, and RCP simulations is described by Meinshausen et al. [2011].

Sabine et al. [2004] used inorganic carbon measurements from the World Ocean Circulation Experiment and the Joint Global Ocean Flux Study, both conducted in the 1990s, and the tracer-based ΔC* separation technique to estimate the global oceanic anthropogenic CO2 sink for the period 1800–1994. Their ocean inventory estimate of 118±19 PgC accounts for approximately 48% of the total emissions from fossil fuel burning and cement production. They subtracted this ocean inventory estimate and the change in atmospheric inventory over the same period of 165±4 PgC from the estimate of cumulative emissions of 244±20 PgC to obtain a cumulative terrestrial biosphere source of 39±28 PgC.
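The budget arithmetic in this paragraph can be reproduced directly from the quoted numbers. The sketch below assumes the three uncertainties are independent and combine in quadrature, which matches the quoted ±28 PgC; the helper name is hypothetical.

```python
import math


def residual_with_uncertainty(total, total_err, parts):
    """Subtract (value, error) parts from a total; errors add in quadrature."""
    value = total - sum(v for v, _ in parts)
    error = math.sqrt(total_err ** 2 + sum(e ** 2 for _, e in parts))
    return value, error


# Values quoted in the text for 1800-1994 (PgC).
emissions = (244.0, 20.0)
ocean = (118.0, 19.0)
atmosphere = (165.0, 4.0)

land, land_err = residual_with_uncertainty(
    emissions[0], emissions[1], [ocean, atmosphere]
)
# land is negative, i.e. the terrestrial biosphere was a net source.
ocean_fraction = ocean[0] / emissions[0]  # share of emissions taken up by the ocean
```

The residual is 244 − 118 − 165 = −39 PgC, with an uncertainty of the square root of 20² + 19² + 4², or about 28 PgC.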

More recently, Khatiwala et al. [2009] applied a Green's function model for ocean tracer transport, estimated from tracer and salinity data using a maximum entropy deconvolution technique, to simulate the time evolution of the ocean inventory and uptake rate of anthropogenic CO2 for the period 1765–2008. They estimated the ocean inventory and uptake rate in 2008 to be 140±25 PgC and 2.3±0.6 PgC yr−1, respectively. When they adjusted the estimate to include the Arctic Ocean and marginal seas not represented in the Global Ocean Data Analysis Project (GLODAP) database, the global inventory increased by approximately 11 PgC. Using annual estimates of anthropogenic emissions and the atmospheric inventory, including uncertainties, they produced a trajectory for the terrestrial carbon budget, indicating that the terrestrial biosphere was a source of anthropogenic CO2 until the 1940s, after which it became a sink. Tans [2009] performed a similar mass balance calculation using an empirical pulse response function constrained by the integrated ocean uptake in 1994 [Sabine et al., 2004], the 1993–2002 uptake rate centered on late 1997 from atmospheric oxygen measurements [Manning and Keeling, 2006], and the 1995–2000 uptake rate estimate from an ocean inverse model [Gruber et al., 2009]. Deriving net terrestrial emissions as a residual, instead of including land use emissions explicitly due to their large uncertainty, Tans [2009] also found that net terrestrial emissions were positive before 1940 and were negative thereafter, making their cumulative contribution in 2008 small.

Khatiwala et al. [2013] produced a newly updated global ocean anthropogenic carbon sink trajectory through 2010 using the Green's function model. A cumulative sum of this ocean uptake provided an ocean anthropogenic carbon inventory estimate for 2010 of 150±26 PgC. Adding a partial estimate for accumulation in marginal seas and coastal areas from Lee et al. [2011] of 8.6±0.6 PgC yielded a more spatially comprehensive estimate of 160±26 PgC. Since the Lee et al. [2011] estimate was a lower bound, the upper bound was constrained using multiple Community Climate System Model-based simulations, resulting in a range for the inventory outside the GLODAP region of 9–14 PgC. However, Khatiwala et al. [2013] ultimately computed a “best estimate” inventory for the GLODAP region in 2010 of 143 PgC by averaging results from three different inversion methods, including the Green's function model. Using the above range for marginal seas and coastal areas, they provide a 2010 global best estimate inventory of 152–157 PgC. Selecting the midpoint value yields a final estimate of 155 PgC with an uncertainty of ±20%. Here we scaled the Green's function time series to obtain the 155 PgC best estimate for 2010 and to account for marginal seas and coastal areas.

The Sabine et al. [2004] and Khatiwala et al. [2013] data-based estimates with uncertainties provide valuable global constraints on model carbon cycle processes and feedbacks. However, these inventory estimates must be further adjusted to the 1850 equilibrium starting date of model simulations. The Sabine et al. [2004] ocean inventory estimate for 1994 was adjusted by subtracting the difference between the 1850 and 1800 ocean inventory estimates from the scaled Khatiwala et al. [2013] 1765–2010 time series, yielding 109±19 PgC. Similarly, the scaled Khatiwala et al. [2013] ocean inventory best estimate for 2010 was adjusted by subtracting the 1850 value, yielding 141±38 PgC. Using the adjusted trajectory of ocean uptake and applying equation (1) on the time series through 2010, we calculated total carbon accumulation in the ocean and on land from 1850 to 2010 with their uncertainties based on Khatiwala et al. [2013] uncertainty estimates (Figure S1). Land and ocean carbon sinks computed using this approach were consistent with combined estimates reported by Ballantyne et al. [2012]; however, here the net land flux included land use emissions.
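The scaling and rebaselining steps can be illustrated as follows. This is a sketch under stated assumptions: the coarse five-point series is hypothetical and stands in for the published 1765–2010 time series, and the function names are introduced here for illustration only.

```python
def scale_to_target(years, inventory, target_year, target_value):
    """Scale a cumulative inventory so it equals target_value in target_year."""
    factor = target_value / inventory[years.index(target_year)]
    return [value * factor for value in inventory]


def rebaseline(years, inventory, base_year):
    """Express the inventory relative to its value in base_year."""
    base = inventory[years.index(base_year)]
    return [value - base for value in inventory]


# Hypothetical coarse series (PgC accumulated since 1765); not the published data.
years = [1765, 1800, 1850, 1994, 2010]
raw = [0.0, 4.0, 13.0, 110.0, 150.0]

scaled = scale_to_target(years, raw, 2010, 155.0)  # match the 155 PgC best estimate
since_1850 = rebaseline(years, scaled, 1850)       # accumulation after 1850

# Offset applied to an estimate that starts in 1800 rather than 1850.
offset_1800_to_1850 = scaled[years.index(1850)] - scaled[years.index(1800)]
```

With the real series, subtracting the equivalent of `offset_1800_to_1850` from the 118 PgC estimate for 1800–1994 gives the adjusted value quoted in the text.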

2.4 A Framework for Constraining Future Trends

One approach for reducing uncertainties using contemporary observations is to identify relationships between contemporary variability and future trends within the models and constrain the contemporary variability using observations. This strategy was employed by Hall and Qu [2006], who evaluated the strength of the springtime snow albedo feedback (ΔαS/ΔTS) from 17 models used for the IPCC Fourth Assessment Report and compared them with the observed springtime snow albedo feedback from the International Satellite Cloud Climatology Project and ERA-40 reanalysis data. They found a linear relationship between model predictions of seasonal and 21st century snow albedo feedbacks. Hall and Qu [2006] assumed that this relationship, which represents consistency in the structure of these models, accurately reflects functional behavior in nature and used observational estimates of the contemporary seasonal cycle snow albedo feedback to constrain the longer-term snow albedo feedback that occurs in the models during the 21st century. More recently, this approach was applied by Cox et al. [2013] to the carbon cycle. In this latter study, the authors were able to show that the long-term climate sensitivity of tropical carbon fluxes was related to this same sensitivity on interannual timescales. By using contemporary observations, they were able to narrow the likely range of future model scenarios, showing that the likelihood of forest dieback was probably overestimated in earlier work.

As described below, we found a similar linear relationship over decadal timescales between contemporary and future atmospheric CO2 mole fractions in CMIP5 emission-driven simulations. Specifically, models that had higher positive biases in atmospheric CO2 mole fraction by the end of the observational era in 2010 tended to predict higher atmospheric CO2 levels during the 21st century for the RCP 8.5 scenario than models that more closely matched the observations. We used this relationship, and implicitly the collection of CMIP5 models, to construct a hypothetical model that was tuned to contemporary observations, hereafter referred to as the contemporary CO2 tuned model, or CCTM.

The CCTM estimate was obtained using the following approach. First, we computed a linear regression between atmospheric CO2 at each future year (y axis) and atmospheric CO2 during 2010 (x axis), defined as the 5 year mean for 2006–2010. In particular, we repeatedly applied the regression formula

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \tag{2}$$

where xi was the 2010 CO2 mole fraction, yi was the future CO2 mole fraction, and εi was an error term for every model i=1,…,n. β0 and β1 were two parameters, representing the y-intercept and the slope of the resulting line, respectively. Here, n=17, representing the 17 separate simulations from 15 models from the CMIP5 collection. For every future interval, the error term εi was minimized using ordinary least squares to yield a linear regression model,

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \tag{3}$$

where $\hat{y}_i$ was the predicted future CO2 mole fraction from the linear regression model that minimizes the residual, $\hat{\varepsilon}_i = y_i - \hat{y}_i$. The least squares estimates for the parameters were calculated using a standard algebraic approach. As shown in the results section below, these regressions were statistically significant through 2100, although uncertainties increased through time. Second, we estimated the intersection of this regression with a vertical line representing NOAA Global Monitoring Division (GMD) observations in 2010 (384.6±0.5 ppm; a 5 year mean centered on 2008) [Conway et al., 1994]. This intersection at each time interval and the 95% confidence limits on it comprised our CCTM estimate.
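A minimal sketch of this two-step procedure for a single future year is given below. It is an illustration, not the authors' implementation: the five (x, y) pairs are hypothetical, the function names are invented here, and the confidence limits are assumed to be the standard interval for the mean response of a simple linear regression. The Student's t critical value is passed in by the caller (about 3.182 for 3 degrees of freedom, about 2.131 for the 15 degrees of freedom that correspond to n = 17).

```python
import math


def ols_fit(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def constrained_estimate(x, y, x_obs, t_crit):
    """Regression value at x_obs and the half-width of its confidence interval."""
    n = len(x)
    b0, b1 = ols_fit(x, y)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    half_width = t_crit * s * math.sqrt(1.0 / n + (x_obs - x_mean) ** 2 / sxx)
    return b0 + b1 * x_obs, half_width


# Hypothetical models: 2010 CO2 (x) and CO2 in one future year (y), in ppm.
x_2010 = [380.0, 388.0, 392.0, 397.0, 405.0]
y_future = [905.0, 950.0, 990.0, 1000.0, 1060.0]

estimate, half_width = constrained_estimate(x_2010, y_future, 384.6, t_crit=3.182)
```

Repeating the call for each future year would trace out a constrained trajectory. The interval is narrowest when the observed value lies near the mean of the model values and widens as it moves away from it.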

The CCTM estimate allowed us to inquire what might be the impact of tuning ESMs to capture the observed recent trajectory of global atmospheric CO2. This approach takes advantage of the collection of CMIP5 models—including the wide range of sensitivities of gross land and ocean carbon fluxes to elevated CO2 and climate changes, residence time distributions of carbon in ocean and land reservoirs, and feedbacks—to create an estimate with a zero bias at the end of the observed record. As such, it can be thought of as a “black box” approach to representing the carbon cycle. It was useful in developing approaches for analyzing ESM uncertainties because of the long-term bias persistence observed for this set of models.

We also developed a similar multimodel constraint on the evolution of ocean and land cumulative flux (inventory) time series to better understand why atmospheric CO2 biases were so persistent. As described in the results, observational uncertainties were considerably higher for the ocean and land inventories, and, as a consequence, it was not possible to reduce uncertainties in future estimates by the same amount as for atmospheric carbon dioxide.

2.5 Calculating Climate Implications of CO2 Biases

Individual models directly calculate radiative forcing and surface temperature responses to anthropogenic CO2 and therefore have different climate sensitivities (α). For our analysis, we chose to use a standard method for approximating radiative forcing and subsequent temperature changes to equitably assess the climate implications of CO2 biases across all models. We adopted the method described by Boucher and Reddy [2008], who employed an impulse response function (IRF) to describe the evolution of atmospheric CO2 and global surface temperature. First, we followed Ramaswamy et al. [2001] to approximate radiative forcing due to anthropogenic CO2 at time t as

$$RF(t) = m \ln\!\left(\frac{[\mathrm{CO_2}](t)}{[\mathrm{CO_2}](t_0)}\right) \tag{4}$$

where m was set equal to 5.35 W m−2 and [CO2](t0) was defined as 284 ppm in the year 1850. Second, we calculated CO2-corrected predictions of future surface temperature following the method of Boucher and Reddy [2008, Appendix A]. This method approximates the delayed response of surface temperature to radiative forcing as a sum of two exponentials with adjustment times of 8.4 and 410 years. Coefficients c1 and d1 in the exponentials [Boucher and Reddy, 2008, Appendix A] were multiplied by 0.895 to obtain a transient climate response of 1.9 K per doubling of CO2 as reported by Gillett et al. [2013] for the mean of the CMIP5 ESMs. For details of the surface temperature correction, please see supporting information. While using the IRF from another ESM might alter the mean temperature change per unit of radiative forcing presented here, it would not change the order among models. We note that this calculated temperature change accounts only for CO2-driven climate change and does not include observed cooling due to aerosols or contributions from other greenhouse gases like methane (CH4), nitrous oxide (N2O), and chlorofluorocarbons (CFCs).
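The forcing step can be written as a short function. The sketch below uses the logarithmic form, forcing equal to m times the natural logarithm of the ratio of the CO2 mole fraction to its 1850 value, with the coefficient and reference value quoted in the text. The function name is an assumption, and the temperature response is deliberately left out.

```python
import math

M_COEFF = 5.35        # W m^-2, coefficient quoted in the text
CO2_1850_PPM = 284.0  # ppm, reference mole fraction quoted in the text


def co2_radiative_forcing(co2_ppm, co2_ref_ppm=CO2_1850_PPM, m=M_COEFF):
    """Approximate CO2 radiative forcing (W m^-2) relative to the reference."""
    if co2_ppm <= 0 or co2_ref_ppm <= 0:
        raise ValueError("CO2 mole fractions must be positive")
    return m * math.log(co2_ppm / co2_ref_ppm)


# A doubling of CO2 gives m * ln(2), about 3.71 W m^-2.
forcing_doubling = co2_radiative_forcing(2 * CO2_1850_PPM)

# Forcing for the CCTM estimates quoted in the abstract (600 ppm and 947 ppm).
forcing_2060 = co2_radiative_forcing(600.0)
forcing_2100 = co2_radiative_forcing(947.0)
```

Because the forcing is logarithmic in concentration, a fixed bias in ppm translates into a smaller forcing difference at high CO2 levels than at low ones.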

2.6 Quantifying Uncertainty

Sabine et al. [2004] provided uncertainty estimates for their estimate of the ocean anthropogenic carbon inventory. Khatiwala et al. [2013] used the GLODAP/WOA05 databases to generate global estimates of historical anthropogenic CO2 ocean uptake, and they propagated uncertainties from these databases through their Green's function model to provide uncertainties for these uptake estimates. We used these uncertainties in quadrature to provide an uncertainty range for the Khatiwala et al. [2013] inventory and propagated them through equation (1) to provide estimates of uncertainty for land carbon accumulation. For the linear regression models used here to construct the CCTM estimate, 95% confidence intervals were calculated and propagated into estimates of atmospheric CO2, radiative forcing, and temperature change. For purposes of uncertainty comparison, the 95% confidence interval ranges for the CCTM were compared with the 95th percentile of the range for the multimodel distribution, assuming a normal distribution.
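One way to read the final comparison is sketched below: the multimodel range is taken as the mean plus or minus 1.96 sample standard deviations, and its width is compared with the width of the CCTM confidence interval. That reading, the function name, and the five model values are assumptions made for illustration; only the ±35 ppm half-width at 2100 comes from the abstract.

```python
import statistics

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def normal_95_range(values):
    """Return (low, high) bounds of mean +/- 1.96 sample standard deviations."""
    mean = statistics.mean(values)
    spread = Z_95 * statistics.stdev(values)
    return mean - spread, mean + spread


# Hypothetical 2100 CO2 mole fractions (ppm) from five models; not CMIP5 output.
model_co2_2100 = [900.0, 950.0, 980.0, 1010.0, 1060.0]
low, high = normal_95_range(model_co2_2100)
ensemble_width = high - low

cctm_width = 2 * 35.0  # CCTM estimate at 2100 is quoted as 947 +/- 35 ppm
narrowing = 1.0 - cctm_width / ensemble_width  # fractional reduction in range
```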

3 Results

3.1 Contemporary Biases

Comparison of ESM prognostic atmospheric CO2 mole fraction over the historical period with observations indicated that ESMs, on average, had a high bias in their predictions of contemporary atmospheric CO2 (Figure 1a). For the multimodel mean, this high bias was persistent from 1946 throughout the twentieth century (Figure 1b). By the end of the historical model simulation period (2005), the multimodel mean was 5.6 ppm above observations and the models ranged from 21.7 ppm below to 26.2 ppm above the observed CO2 mole fraction of 378.8 ppm. Of the 19 historical simulations from 15 ESMs included in this analysis, only two predicted a CO2 mole fraction well below observations in 2005. By 2010, near the end of the observational record, the multimodel mean was 7.9 ppm higher than the global mean CO2 mole fraction reported by NOAA GMD [Conway et al., 1994]. This bias was probably a conservative estimate of the true multimodel mean bias because fossil fuel emissions from the RCP 8.5 scenario during 2006–2010 (8.6 PgC yr−1) were slightly lower than the observed emissions (8.7 PgC yr−1) [Peters et al., 2013; Le Quéré et al., 2013].

Figure 1.

(a) Most ESMs exhibit a high bias in atmospheric carbon dioxide (CO2) mole fraction. The predicted atmospheric CO2 mole fraction for the 19 historical simulations shown here ranges from 357 to 405 ppm at the end of the CMIP5 historical period (1850–2005). (b) The multimodel mean is biased high from 1946 throughout the remainder of the twentieth century, ending 5.6 ppm above observations in 2005.

3.2 Causes of the Contemporary Bias

Most ESMs exhibited a small or moderate low bias in ocean carbon accumulation from 1870 to 1930 when compared with adjusted estimates from Khatiwala et al. [2013], but most ESMs were contained within the envelope of observational uncertainty after 1930 (Figure 2a). Ocean carbon accumulation ranged from 88 to 261 PgC, with a multimodel mean of 145 PgC, as compared with observational estimates of 142±38 PgC through year 2010. Excluding the two outlier models that had unlikely contemporary land sink estimates (FGOALS-s2.0 and MRI-ESM1), the range of ocean carbon accumulation was reduced to 101–210 PgC with a mean of 141 PgC at 2010, a better match with observations. However, most ocean models achieved this correspondence with observational estimates primarily as a consequence of high biases in atmospheric CO2 mole fraction. Normalizing ocean carbon accumulation by atmospheric accumulation (the ocean/atmosphere ratio) provided a measure of the strength of ocean carbon storage in emissions-forced simulations that partially accounted for atmospheric CO2 biases. Performing this normalization and comparing with adjusted ocean inventories from Sabine et al. [2004] for 1994 (Figure S2) and from Khatiwala et al. [2013] for 2010 (Figure 3) indicated that the majority of models were near or below the observed ratio. Across the different models, the ocean/atmosphere ratio ranged from 0.42 to 0.99, with a multimodel mean of 0.61, which compared well with the observational estimate of 0.64±0.15 in 2010. Excluding the same two outlier models (FGOALS-s2.0 and MRI-ESM1), the range of the ocean/atmosphere ratio was reduced to 0.42–0.91, with a mean of 0.58.

Figure 2.

(a) Ocean and (b) land anthropogenic carbon inventories from CMIP5 models compared to estimates from Khatiwala et al. [2013]. Most ESMs exhibit a low bias in ocean anthropogenic carbon accumulation from 1870 to 1930 as compared with adjusted estimates from Khatiwala et al. [2013]. While some models enter the envelope of observational uncertainty later in the twentieth century, this was often a consequence of the increasing high bias in atmospheric CO2 mole fractions. ESMs had a wide range of land carbon accumulation responses to increasing atmospheric CO2 and land use change, ranging from a cumulative source of 170 PgC to a cumulative sink of 107 PgC in 2010. In these figures, solid colored lines represent historical simulation results and the extending dashed line segments represent the first 5 years of the RCP 8.5 simulations. The shaded polygon represents the uncertainties surrounding the adjusted observational estimates of ocean and land carbon accumulation, and the error bars correspond to the ±20% uncertainty in the Khatiwala et al. [2013] best estimate of ocean carbon accumulation for 2010.

Figure 3.

Reconstructed atmospheric CO2 levels and observationally based estimates of ocean carbon uptake from Khatiwala et al. [2013] provide constraints on carbon inventories in the ocean, and on land when combined with fossil fuel and atmospheric CO2 observations. While ocean carbon accumulation appears adequate in some model results, ocean carbon accumulation in most ESMs shows a low bias once normalized by atmospheric accumulation (lower right panel).

Terrestrial biosphere models within ESMs also had a wide range of responses, with both positive and negative net carbon accumulation throughout the twentieth century (Figures 3 and S2). Terrestrial and ocean carbon accumulation compensated for one another (R=−0.91, Figure S3), reducing the bias in predicted atmospheric CO2. This compensation effect was exemplified by the INM-CM4 model, which had the correct atmospheric CO2 in 2005, but had strong ocean uptake that was balanced by weak land carbon uptake. During the second half of the twentieth century, the land carbon sink was persistent with high rates during the 1990s and 2000s (Table 2). Thought to be due to changes in human land use (i.e., reduced deforestation, new afforestation, and secondary regrowth of previously cleared land), wildfire suppression [Girod et al., 2007; Hurtt et al., 2002], and enhanced forest growth due to rising atmospheric CO2 levels and higher rates of nitrogen deposition [Pan et al., 2011; Phillips et al., 2009], this growing land sink reinforced rising ocean uptake rates and resulted in a doubling of global carbon uptake between 1960 and 2010 [Ballantyne et al., 2012]. Although the multimodel mean distribution of land sinks closely matched the observations, individual model estimates varied widely. BCC-CSM1.1-M, CESM1-BGC, FGOALS-s2.0, GFDL-ESM2M, HadGEM2-ES, INM-CM4, and NorESM1-ME tended to underestimate land sinks, whereas CanESM2 and MRI-ESM1 tended to overestimate them (Figure 2b).

Table 2. Comparison of Observationally Based Estimates of Decadal Atmosphere, Ocean, and Land Uptake Rates of Anthropogenic CO2 With CMIP5 Model Predictions^a
 1960–1969 1970–1979 1980–1989 1990–1999 2000–2010
Model  FFE Atm Ocn Lnd  FFE Atm Ocn Lnd  FFE Atm Ocn Lnd  FFE Atm Ocn Lnd  FFE Atm Ocn Lnd

Multimodel Mean
BCC-CSM1.1-M 1.8 1.3 −0.2  3.0 1.7 −0.1
BNU-ESM 1.8 1.1 −0.0  4.3 2.0 −0.1
CanESM2 r1  4.0 1.6 −0.1  5.7 2.3 −0.3
CanESM2 r2 2.1 0.9 −0.1  3.9 1.7 −0.2
CanESM2 r3  4.9 2.0 −0.7
CESM1-BGC 2.4 1.2 −0.6  3.2 1.5 −0.1  4.0 1.9 −0.4  4.2 2.2 −0.2
FGOALS-s2.0 2.1 2.1 −1.3  3.5 2.8 −1.6  3.6 3.4 −1.5  3.9 4.0 −1.6  4.9 4.6 −1.8
GFDL-ESM2G 1.7 1.4 −0.1
GFDL-ESM2M 1.4 1.5 −0.0
HadGEM2-ES 2.7 1.6 −1.3  2.9 1.8 −0.1  3.8 2.0 −0.4
INM-CM4 2.1 1.7 −0.8  3.1 2.4 −0.9  3.5 2.8 −0.9  3.6 3.3 −0.7  5.0 4.0 −1.2
MIROC-ESM  4.0 2.2 −0.1
NorESM1-ME 2.0 1.4 −0.5  3.0 1.7 −0.2  3.6 2.1 −0.3  4.0 2.5 −0.3

^a Only models that provided both atmospheric CO2 and ocean carbon fluxes are included in the multimodel means. Positive values represent additions to the atmosphere, ocean, or land reservoirs, and negative values represent losses. All units are Pg C yr−1.

3.3 Implications of Contemporary Atmospheric CO2 Biases in CMIP5 Models

High atmospheric CO2 biases produced radiative forcing during the latter half of the twentieth century that was too large in the affected ESMs (Table 3). For the year 2010, the multimodel mean atmospheric CO2 mole fraction was 7.9 ppm above observations, corresponding to a radiative forcing that was 0.10 W m−2 higher than that obtained from the observed atmospheric CO2 mole fraction. The integrated effect of the radiative forcing bias from the multimodel mean during the nineteenth and twentieth centuries led to CO2-induced temperature change that was 0.06°C higher by 2010 than an estimate derived from the observed CO2 trajectory. Across all ESMs, the temperature change bias for 2010 ranged from −0.20°C to 0.24°C. Because land and ocean carbon uptake rates are likely to be reduced with climate warming (negative γL and γO), these temperature biases have the potential to further reinforce atmospheric CO2 biases in the 21st century, leading to persistent and divergent biases into the future for many aspects of the climate system, unless compensated for by biases in concentration-carbon feedbacks (βL and βO) or climate sensitivities (α). Atmospheric CO2 mole fraction projections out to 2100 under the RCP 8.5 scenario for all ESMs are shown in Figure S4. Corresponding anthropogenic carbon inventories for the ocean and land out to 2100 are shown in Figure S5.
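The size of the forcing bias quoted above can be checked with a minimal sketch. Equation (4) is not reproduced in this section, so the code assumes the widely used simplified logarithmic expression for CO2 forcing with a coefficient of 5.35 W m−2; the function name and the choice of inputs (the 5 year mean multimodel and CCTM mole fractions for 2010 from Table 3) are illustrative assumptions.

```python
import math

ALPHA = 5.35  # W m-2; coefficient of the simplified logarithmic expression (assumed)

def co2_forcing(c_ppm, c_ref_ppm):
    """Radiative forcing (W m-2) of c_ppm relative to c_ref_ppm."""
    return ALPHA * math.log(c_ppm / c_ref_ppm)

# 5 year mean (2006-2010) CO2 mole fractions from Table 3, in ppm.
multimodel_mean_2010 = 392.0
cctm_2010 = 385.0

# The forcing difference between two trajectories does not depend on the
# preindustrial reference, because the logarithms subtract.
forcing_bias = co2_forcing(multimodel_mean_2010, cctm_2010)
print(f"forcing bias in 2010: {forcing_bias:.2f} W m-2")
```

Under these assumptions the bias comes out close to the 0.10 W m−2 difference between the multimodel mean and CCTM forcing values listed in Table 3.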

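Table 3. Atmospheric CO2 Mole Fraction, Radiative Forcing, and Resulting Temperature Changes for Each of the CMIP5 ESMs, the Multimodel Mean, the CCTM Estimate, and the Combination of Observed and RCP 8.5 Projection for the Years 2010, 2060, and 2100^a

| Model | CO2 Mole Fraction (ppm) 2010 | 2060 | 2100 | Radiative Forcing (W m−2) 2010 | 2060 | 2100 | Cumulative ΔT (°C) 2010 | 2060 | 2100 | ΔT Bias (°C) 2010 | 2060 | 2100 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BCC-CSM1.1 | 390 | 603 | 945 | 1.70 | 4.03 | 6.43 | 0.97 | 2.39 | 4.02 | 0.03 | 0.02 | −0.01 |
| BCC-CSM1.1-M | 396 | 619 | 985 | 1.78 | 4.16 | 6.65 | 1.04 | 2.49 | 4.16 | | | |
| BNU-ESM | 382 | 602 | 963 | 1.59 | 4.02 | 6.53 | 0.90 | 2.33 | 4.07 | −0.04 | −0.04 | 0.04 |
| CanESM2 r1 | 394 | 641 | 1024 | 1.75 | 4.36 | 6.86 | 0.98 | 2.58 | 4.30 | | | |
| CanESM2 r2 | 392 | 641 | 1023 | 1.72 | 4.35 | 6.85 | 0.98 | 2.57 | 4.30 | | | |
| CanESM2 r3 | 396 | 641 | 1025 | 1.78 | 4.35 | 6.87 | 1.01 | 2.58 | 4.30 | | | |
| CESM1-BGC | 407 | 697 | 1121 | 1.92 | 4.80 | 7.34 | 1.12 | 2.85 | 4.64 | 0.18 | 0.48 | 0.61 |
| FGOALS-s2.0 | 404 | 636 | 993 | 1.89 | 4.31 | 6.70 | 1.09 | 2.57 | 4.23 | | | |
| GFDL-ESM2G | 395 | 616 | 967 | 1.77 | 4.14 | 6.56 | 1.04 | 2.49 | 4.12 | | | |
| GFDL-ESM2M | 400 | 621 | 964 | 1.83 | 4.18 | 6.54 | 1.09 | 2.52 | 4.13 | | | |
| HadGEM2-ES | 411 | 636 | 983 | 1.98 | 4.31 | 6.64 | 1.18 | 2.60 | 4.20 | | | |
| INM-CM4 | 386 | 591 | 897 | 1.64 | 3.92 | 6.15 | 0.92 | 2.36 | 3.86 | −0.02 | −0.01 | −0.17 |
| IPSL-CM5A-LR | 375 | 573 | 908 | 1.48 | 3.75 | 6.22 | 0.86 | 2.21 | 3.87 | −0.08 | −0.16 | −0.16 |
| MIROC-ESM | 398 | 658 | 1121 | 1.81 | 4.50 | 7.35 | 1.06 | 2.67 | 4.58 | 0.12 | 0.30 | 0.55 |
| MPI-ESM-LR r1 | 383 | 590 | 948 | 1.60 | 3.91 | 6.45 | 0.95 | 2.31 | 4.03 | 0.01 | −0.06 | 0.00 |
| MRI-ESM1 | 361 | 516 | 778 | | | | 0.74 | 1.89 | 3.33 | −0.20 | −0.48 | −0.70 |
| NorESM1-ME | 391 | 667 | 1070 | 1.72 | 4.57 | 7.09 | 0.98 | 2.68 | 4.46 | 0.04 | 0.31 | 0.43 |
| Multimodel Mean | 392 | 621 | 980 | 1.72 | 4.18 | 6.63 | 1.00 | 2.48 | 4.17 | | | |
| CCTM Estimate | 385 | 600 | 948 | 1.62 | 4.01 | 6.45 | 0.94 | 2.37 | 4.03 | | | |
| Historical + RCP 8.5 | 385 | 590 | 917 | 1.63 | 3.91 | 6.27 | 0.94 | 2.32 | 3.93 | 0.00 | −0.05 | −0.10 |

^a Values are 5 year means for the time periods 2006–2010, 2056–2060, and 2096–2100.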

3.4 Persistence of Biases Into the Future

To explore the persistence of atmospheric CO2 biases beyond the present, we examined the relationship between 5 year mean contemporary and future atmospheric CO2 mole fractions from ESMs. Figure 4a reveals a strong linear relationship between the predicted sizes of contemporary and future atmospheric CO2 biases in 2060 with a coefficient of determination R2=0.70. This correlation declined to R2=0.54 in 2100 (Figure 4b) probably as a consequence of varying climate–carbon cycle feedbacks taking effect in different models. Because model biases in atmospheric CO2 mole fraction are persistent, biases at year 1850 affect biases at year 2010. To investigate the impact of different model baselines, we also examined the relationship between the 5 year mean contemporary and future anthropogenic atmospheric carbon inventory in 2060 (Figure S6a) and 2100 (Figure S6b), taking into account uncertainties from measurements of nineteenth century CO2 and fossil emissions. This alternative metric slightly changed the ordering of models and strengthened the coefficient of determination, further confirming the robustness of the bias persistence relationship. To explore the value of a tuned model with no CO2 bias at the end of the historical period, we compared the CCTM estimate described in section 2 with the set of CMIP5 model predictions and the RCP 8.5 CO2 mole fraction trajectory. Figure 5 shows the coefficients of determination (R2) of the CCTM atmospheric CO2 mole fraction trajectory, as well as for the trajectories for ocean and land carbon accumulation when the same method is applied for those reservoirs. All of the coefficients of determination peak at one for the contemporary tuning year (2008, the center of the 2006–2010 averaging period), as expected, and decrease on either side, into the past and future. Statistical significance (p<0.05) was maintained with N=17 model results for R2 values above 0.23 (i.e., after about 1910 and through 2100 for atmospheric CO2). The resulting atmospheric CO2 trajectory for 1850–2100 is shown as the red line in Figure 6.
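The regression step behind the CCTM estimate can be sketched for a single target year. The example below fits an ordinary least squares line to the 2010 and 2060 mole fractions of the 17 simulations listed in Table 3, evaluates it at the observed contemporary value of 384.6 ppm (Figure 4), and attaches a 95% confidence interval for the fitted mean. It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the rounded Table 3 values stand in for the full model output, and the Student t quantile for 15 degrees of freedom (2.131) is hard-coded from standard tables.

```python
import math

# 5 year mean CO2 mole fractions (ppm) for the 17 RCP 8.5 simulations in Table 3.
co2_2010 = [390, 396, 382, 394, 392, 396, 407, 404, 395,
            400, 411, 386, 375, 398, 383, 361, 391]
co2_2060 = [603, 619, 602, 641, 641, 641, 697, 636, 616,
            621, 636, 591, 573, 658, 590, 516, 667]

OBSERVED_2010 = 384.6  # ppm, observed contemporary mole fraction (Figure 4)
T_CRIT_15 = 2.131      # two-sided 95% Student t quantile, 15 degrees of freedom

def emergent_constraint(x, y, x_obs, t_crit):
    """OLS fit of y on x; returns prediction at x_obs, CI half-width, and R2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r2 = sxy ** 2 / (sxx * syy)
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt((syy - slope * sxy) / (n - 2))
    # Standard error of the fitted mean response at x_obs.
    se_fit = s * math.sqrt(1.0 / n + (x_obs - x_bar) ** 2 / sxx)
    prediction = intercept + slope * x_obs
    return prediction, t_crit * se_fit, r2

def critical_r2(n, t_crit):
    """Smallest R2 of a simple linear fit that is significant for t_crit."""
    dof = n - 2
    return t_crit ** 2 / (t_crit ** 2 + dof)

prediction, half_width, r2 = emergent_constraint(
    co2_2010, co2_2060, OBSERVED_2010, T_CRIT_15)
print(f"2060 estimate: {prediction:.0f} +/- {half_width:.0f} ppm, R2 = {r2:.2f}")
print(f"critical R2 for N = 17: {critical_r2(17, T_CRIT_15):.2f}")
```

With the rounded Table 3 inputs, the fit should land near the values reported in the text (R2 of about 0.70, a 2060 estimate of roughly 600±14 ppm, and a significance threshold near R2=0.23), which suggests that the quoted interval is a confidence interval on the regression line rather than a prediction interval for an individual model.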

Figure 4.

(a) Future (2060) versus contemporary (2010) atmospheric CO2 mole fraction fit for CMIP5 emissions-forced simulations of RCP 8.5 and (b) future (2100) versus contemporary (2010) atmospheric CO2 mole fraction for the same set of model simulations. The observed atmospheric CO2 mole fraction is represented by the vertical line at 384.6 ppm with an uncertainty range (±0.5 ppm) shown in gray. The linear regression model is represented by the blue line surrounded by red dashed lines indicating a 95% confidence interval. While a point is plotted for the historical observed atmospheric CO2 and the RCP 8.5 concentration trajectory derived from a reduced form model without explicit feedbacks, that point is not included in the linear regression.

Figure 5.

The coefficients of determination (R2) for the multimodel bias structure, from which the contemporary CO2 tuned model (CCTM) was derived, relative to the set of CMIP5 model atmospheric CO2 mole fractions (black) and oceanic (blue) and land (green) anthropogenic carbon inventories in 2010, defined as the 5 year mean for the period 2006–2010.

Figure 6.

The contemporary CO2 tuned model (CCTM) atmospheric CO2 mole fraction estimate compared to the CMIP5 multimodel mean trajectory. The pink range surrounding the CCTM represents the 95% confidence interval from the linear model around the contemporary observation projected onto the y axis of historical or future CO2 mole fractions for every year. The blue line represents the multimodel mean CO2 trajectory, and the blue range indicates the 95th percentile of the range for the multimodel standard deviation, assuming a normal distribution (1.96σ).

The CCTM estimate suggests that for a tuned model, future atmospheric CO2 in 2060 under the RCP 8.5 scenario would be 600±14 ppm (including the 95% confidence interval of the estimate). In contrast, the multimodel mean atmospheric CO2 mole fraction in 2060 was 621±80 ppm, which was above and outside the confidence interval for the CCTM estimate (Figure 7a). Individual model predictions spanned a range from 516 to 697 ppm in 2060. The spread of the CCTM was considerably smaller than that of the multimodel 95th percentile distribution spread. In 2100, the CCTM estimate yielded an atmospheric CO2 mole fraction of 948±35 ppm, while the multimodel mean prediction was 980±161 ppm (Figure 7b). The CanESM2, CESM1-BGC, MIROC-ESM, and NorESM1-ME models predicted atmospheric CO2 mole fractions greater than 1000 ppm by 2100. In terms of anthropogenic atmospheric carbon accumulation, the CCTM estimate in 2060 under the RCP 8.5 scenario was 672±28 PgC (including the 95% confidence interval of the estimate). The multimodel mean anthropogenic atmospheric carbon accumulation in 2060 was 715±173 PgC, which was above and outside the confidence interval for the CCTM estimate (Figure S7a). In 2100, the CCTM estimate yielded an anthropogenic atmospheric carbon accumulation of 1412±72 PgC, while the multimodel mean prediction was 1488±347 PgC (Figure S7b).
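The comparison of spreads for 2060 can be reproduced approximately from the rounded values in Table 3. The sketch below assumes that the multimodel range is 1.96 sample standard deviations about the mean, as described for Figure 6; the variable names are illustrative, and the CCTM half-widths and the 2100 multimodel spread are taken from the text rather than recomputed.

```python
import statistics

# 2060 CO2 mole fractions (ppm) for the 17 simulations listed in Table 3.
co2_2060 = [603, 619, 602, 641, 641, 641, 697, 636, 616,
            621, 636, 591, 573, 658, 590, 516, 667]

mean_2060 = statistics.mean(co2_2060)
# 95% range of a normal distribution: 1.96 sample standard deviations.
spread_2060 = 1.96 * statistics.stdev(co2_2060)

CCTM_HALF_WIDTH_2060 = 14.0  # ppm, 95% confidence interval quoted in the text
CCTM_HALF_WIDTH_2100 = 35.0  # ppm, quoted in the text
MULTIMODEL_SPREAD_2100 = 161.0  # ppm, quoted in the text

reduction_2060 = spread_2060 / CCTM_HALF_WIDTH_2060
reduction_2100 = MULTIMODEL_SPREAD_2100 / CCTM_HALF_WIDTH_2100

print(f"multimodel 2060: {mean_2060:.0f} +/- {spread_2060:.0f} ppm")
print(f"uncertainty reduction: {reduction_2060:.1f}x (2060), "
      f"{reduction_2100:.1f}x (2100)")
```

The resulting ratios are somewhat below 6 for 2060 and somewhat below 5 for 2100.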

Figure 7.

The probability density of CO2 mole fraction predictions from the CCTM peaks lower than the probability density for the multimodel mean for (a) 2060 and (b) 2100. In addition, the width of the probability density is much smaller for the CCTM, by almost a factor of 6 at 2060 and almost a factor of 5 at 2100, indicating a significant reduction in the range of uncertainty for the CCTM prediction.

To assess the mechanisms causing the strong relationship between contemporary and future atmospheric CO2 levels among the models, we also developed CCTM-like estimates for the individual ocean and land inventories (Figure S8). This analysis revealed that the ordering of ocean inventories among the models was more persistent into the future than for land inventories, but for both components, statistically significant multimodel relationships existed between contemporary (2010) and future values through the end of the 21st century (Figure 5). However, because uncertainties in ocean and land inventories were larger, constraints offered by contemporary observations were considerably weaker than for atmospheric CO2, in terms of the future evolution of these inventory components (Figure S9).

3.5 Implications of a Persistent Atmospheric CO2 Bias

To explore the climate implications of the persistent atmospheric CO2 biases described above, we compared the radiative forcing (equation (4)) and the resulting temperature change (equations S1 and S2) for the CCTM estimate and the set of CMIP5 model predictions. Figure 8a shows the radiative forcing due only to CO2 calculated for each of the CMIP5 models. The model range was 5.4–7.4 W m−2 at year 2100 for RCP 8.5. Figure 8b shows the multimodel mean radiative forcing compared to the radiative forcing for the CCTM estimate. As with the CO2 comparison described above, the spread of the CCTM was considerably smaller than that of the multimodel 95th percentile distribution spread. In 2100, the CCTM estimate yielded a radiative forcing of 6.4±0.2 W m−2, while the multimodel mean prediction was 6.6±0.9 W m−2. Figure 8c shows the corresponding cumulative temperature change due to this CO2 radiative forcing for each of the CMIP5 models. The temperature increase for the models ranged from 3.3°C to 4.6°C. Figure 8d shows the corresponding multimodel mean cumulative temperature change compared to the CCTM estimate. In 2100, the CCTM estimate yielded a cumulative temperature increase from the CO2-induced radiative forcing of 4.0±0.1°C, while the multimodel mean prediction was 4.2±0.6°C.

Figure 8.

(a) and (c) CO2-induced radiative forcing and temperature change computed from the prognostic atmospheric CO2 mole fraction for each of the CMIP5 models. (b) and (d) Corresponding radiative forcing and temperature change for the multimodel mean and contemporary CO2 tuned model (CCTM). The pink range surrounding the CCTM represents the uncertainty propagated from the 95% confidence interval from the linear model for the CCTM atmospheric CO2 trajectory. The blue range surrounding the multimodel mean represents the uncertainty propagated from the 95th percentile of the range for the standard deviation of the multimodel mean atmospheric CO2 trajectory.

The CO2 mole fraction, CO2-induced radiative forcing, and CO2-induced cumulative temperature change for each of the CMIP5 models are shown in Table 3 for the years 2010, 2060, and 2100. In addition, the last three columns of the table show the temperature change bias between the models and the CCTM estimate. In 2010, the temperature bias of the multimodel mean was 0.06°C (ranging from −0.20°C to 0.24°C), and this bias increased to 0.11°C in 2060. Individual model results showed that some biases increased, some decreased, and others remained the same between 2010 and 2060. The MRI-ESM1 and CESM1-BGC models had the largest temperature biases in 2060, at −0.48°C and 0.48°C, respectively, while the INM-CM4 and MPI-ESM-LR models had the smallest temperature biases in 2060, at −0.02°C and 0.01°C, respectively. By 2100, the multimodel mean temperature bias had increased to 0.14°C. The MRI-ESM1 and CESM1-BGC models had the largest temperature biases in 2100, at −0.70°C and 0.61°C, respectively. The temperature biases for individual models were significant and increased with time during the 21st century. The original RCP 8.5 atmospheric CO2 mole fraction trajectory resulted in a −31 ppm mole fraction bias and a −0.10°C temperature bias from the CCTM estimate at 2100. This result suggests a small inconsistency between the RCP 8.5 specification of the CO2 mole fraction trajectory and the corresponding fossil fuel emissions trajectory. The RCP 8.5 trajectory was derived from the MESSAGE-MACRO integrated assessment model [Riahi et al., 2011], which incorporates the MAGICC/SCENGEN (version 4.1) coupled gas-cycle/climate model [Wigley, 2003] that includes a net positive carbon cycle feedback but lacks explicit representation of many ecosystem processes that influence climate-carbon and concentration-carbon feedbacks. 
Prior to its use in deriving the RCP 8.5 trajectory, parameters in the carbon cycle model of MAGICC/SCENGEN (version 5.3) [Wigley, 2008] were changed to give concentration projections consistent with the results from the C4MIP activity [Friedlingstein et al., 2006].

It is important to note in the context of the results described above that model-to-model variations in atmospheric CO2 trajectories documented here contributed to only a small amount of the model-to-model variation in surface air temperature changes. This is because many of the models in the ensemble had different representations of aerosol processes, including forcings and feedbacks, and because the models had widely varying climate sensitivities (e.g., Gillett et al. [2013]). Specifically, the multimodel mean estimate of temperature change from the beginning of the simulations was 3.1±1.3°C at 2060 and 5.1±2.2°C at 2100. When we adjusted each model temperature estimate for the impact of CO2 biases using the CO2-induced temperature biases shown in Table 3, the multimodel mean changed slightly to 3.0±1.2°C at 2060 and 5.0±1.9°C at 2100.

4 Discussion

4.1 Why Do Carbon Cycle Biases Persist on Decadal Timescales?

In our analysis, we found that the ordering among model predictions of atmospheric CO2 persisted for several decades. Models that had the highest positive biases near the end of the observational record in 2010 were more likely to have higher positive biases in earlier decades, during the latter half of the twentieth century (Figures 1 and 8). Similarly, this same set of models also had the highest set of future atmospheric CO2 projections during the middle and latter half of the 21st century in response to RCP 8.5 emissions (Figure 4). Many structural model elements probably contributed to this bias and ordering persistence, including processes that influence the strength of concentration-carbon feedbacks. One important example includes the representation of ocean mixing processes that regulate the formation of intermediate and deep waters in the ocean. Past work from analysis of 13 simulations from the second phase of the Ocean Carbon Cycle Model Intercomparison Project indicated that climate models often underestimate this overturning in the Southern Ocean [Doney et al., 2004; Matsumoto et al., 2004; Dutay et al., 2002]. In addition, Russell et al. [2006] performed an intercomparison of the Southern Ocean circulation in CMIP3 control simulations and found that the maximum wind stress in the Southern Hemisphere, nominally associated with the Antarctic Circumpolar Current, was located too far equatorward in most models. In ESMs, such deficiencies in model structure and large-scale circulation have the potential to limit CO2 uptake by the oceans and are likely to contribute to a persistent atmospheric CO2 bias over time because many of the physical processes regulating mixing are unlikely to change rapidly. Biases in atmospheric CO2 caused by this type of mechanism likely grow through time as the atmospheric CO2 growth rate accelerates and transport of carbon out of the mixed layer becomes an increasing bottleneck to net ocean carbon uptake. 
Our finding that many models underestimated the ocean anthropogenic carbon inventory (Figures 3 and S2) is consistent with other studies indicating some ocean models exhibit weak meridional overturning circulation [Downes et al., 2011; Sallée et al., 2013]. However, additional research is needed to understand the causes of model-to-model variations in ocean carbon uptake for the CMIP5 models.

On land, similar deficiencies in model structure have the potential to contribute to persistent multidecadal biases in carbon fluxes. Key regulators of carbon uptake on land in response to elevated levels of atmospheric CO2 include, for example, the response of gross primary production (GPP) to CO2 concentration, the allocation of GPP to longer lived woody pools, and subsequent increases in soil organic matter pools [Thompson et al., 1996; Luo et al., 2006]. Carboxylation parameterizations of Rubisco often follow the form of a modified Michaelis-Menten equation [Farquhar et al., 1980] and vary considerably among models. Models that have lower estimates of the maximum carboxylation rate in different biomes, in response to nitrogen limitation (e.g., Thornton et al. [2007]) or other factors, are likely to have smaller CO2-driven increases in GPP by the end of the twentieth or 21st centuries. Similarly, models that have reduced allocation of GPP to wood pools will also have lower rates of carbon uptake, given the same trajectory of GPP increases. Since in many models, the maximum carboxylation rate is either held constant or unlikely to rapidly change in response to changing environmental conditions, this parameterization can induce a long-term bias in carbon fluxes. The same argument applies to allocation submodels: although many plant allocation models are dynamic [Friedlingstein et al., 1999; Arora and Boer, 2005; Litton et al., 2007] and respond to regional variations in light availability, soil moisture, and other environmental controls, many aspects of these models are unlikely to change rapidly during the twentieth and 21st centuries, allowing flux biases to persist in response to monotonic increases in atmospheric CO2.

Other land model structural components not associated with concentration-carbon feedbacks also can contribute to long-term flux biases. For example, land use carbon emissions are an important component of the terrestrial carbon budget and are highly uncertain [Hansen et al., 2010; Houghton et al., 2012; Baccini et al., 2012; Andres et al., 2012; Harris et al., 2012]. Model estimates of this flux can be biased if, for example, the representations of aboveground and belowground carbon pools within the model do not capture observed patterns. As a consequence, carbon losses for a given rate of land clearing may be too high or too low, with a bias that is persistent if rates of land clearing change gradually from one decade to the next. Similarly, climate-carbon feedbacks, including, for example, the response of heterotrophic respiration to temperature [Davidson and Janssens, 2006] could also contribute to long-term biases. Nevertheless, for the CMIP5 models, their contribution during the latter half of the twentieth century and first half of the 21st century might be expected to be smaller than other drivers, given that temperature and other changes in climate increase through time [Arora et al., 2013].

The overall success of contemporary atmospheric CO2 observations in constraining future CO2 levels (e.g., Figure 7) is probably related to several factors. First, the atmospheric anthropogenic carbon inventory is known relatively well, in contrast to the much larger uncertainties associated with the ocean and land inventories. Second, concentration-carbon feedbacks appear to contribute more to the intermodel variations of future (2100) atmospheric CO2 level projections than climate-carbon feedbacks [Arora et al., 2013]. In this context, the rapid rise of atmospheric CO2 observed over the last few decades provides an important direct test of the combined set of ocean and land concentration-carbon mechanisms operating within the models, and, as described above, any biases today are likely amplified as the growth rate of CO2 accelerates. Although temperature and other climate changes also occurred during this period, the magnitude of these changes was much smaller as compared to what is expected during the middle and latter part of the 21st century. As a consequence, the variations in atmospheric CO2 estimates among models resulting from climate-carbon feedbacks were likely relatively small for the contemporary period (e.g., Arora et al. [2013]). The growing importance of climate-carbon feedbacks probably contributes to the increasing uncertainty in our CCTM estimate during the latter part of the 21st century (Figure 6).

4.2 What Is the Value of Improving Carbon Cycle Processes to Match Contemporary CO2?

One of the goals of the integrated assessment modeling community in developing the different representative concentration pathways (RCPs) was to enable ESMs to compute possible emissions scenarios consistent with a particular atmospheric CO2 trajectory [van Vuuren et al., 2011]. This is valuable, for example, in identifying the magnitude of required mitigation efforts to stabilize CO2 levels in the atmosphere at a particular mole fraction, taking into account carbon cycle responses and feedbacks (e.g., Jones et al. [2013]).

Our analysis has several implications for the interpretation of future compatible emissions time series derived from the set of ESMs participating in CMIP5. First, the compatible emissions time series derived from the multimodel mean of concentration-forced simulations during the 21st century is likely to be too low. This assertion is based on (1) the observation that fossil fuel emissions would have to be reduced below observations to eliminate the high bias found in the multimodel mean during the last few decades (Figure 1) and (2) our finding that biases observed today were significantly correlated with future atmospheric CO2 projections because of parameterizations of slowly changing carbon cycle processes.

Second, the range of variation in compatible emission estimates among individual models during the remainder of the 21st century has a large component that can be avoided for any given concentration-forced scenario by reducing or eliminating biases in contemporary atmospheric CO2. Specifically, if each model were individually optimized to eliminate biases in atmospheric CO2 during the last few decades, the range of compatible emissions projections during the 21st century would be considerably compressed. In our analysis, we investigated the potential magnitude of this uncertainty reduction by using the entire set of CMIP5 ESMs to construct a tuned model (CCTM). Projections from the CCTM provided almost a sixfold reduction in uncertainty of atmospheric CO2 levels at 2060 and nearly a fivefold reduction at 2100. As previously noted, the range of model projections diverges through time during the 21st century, as climate–carbon cycle feedbacks strengthen. However, even by the end of the century, a significant component of the variation among models can be attributed to biases that exist today. This result is consistent with results from Arora et al. [2013] that show much of the model-to-model variation in carbon cycle estimates is driven by concentration-carbon feedbacks and only to a lesser degree by variation in climate-carbon feedbacks.

Considering the carbon cycle a “black box” from the perspective of climate change impacts on other aspects of the Earth system, there is significant value in model development efforts to eliminate biases in atmospheric CO2 that occur by the end of the observational record. By doing so for the set of simulations evaluated here, high biases in radiative forcing and global temperature increases could be reduced in many of the models (Figure 8). Improved estimates of CO2-induced climate change, in turn, would reduce uncertainties related to rates of snow and ice melt [Flanner et al., 2009] and other processes contributing to climate feedbacks [Hall and Qu, 2006; Davidson and Janssens, 2006; Zaehle et al., 2010; Koven et al., 2011]. Benefits would also exist for developing more precise estimates of changes in ocean surface chemistry [Caldeira and Wickett, 2003; Doney et al., 2009a] and ocean circulation [Downes et al., 2011; Sallée et al., 2013], and better estimates of climate change impacts on agriculture and other aspects of human society [Lobell et al., 2011].

An interesting question then emerges regarding how best to reduce these biases within individual models and for the set of ESMs as a whole contributing to future climate assessments. Many structural elements of the models may be improved through extensive comparison of ESMs with observations and the development of community-wide benchmarking and evaluation systems such as the International Land Model Benchmarking (ILAMB) project [Luo et al., 2012] and equivalent ocean projects [Doney et al., 2009b]. These efforts are underway, and significant advances are expected over the next several years. Biases also may be reduced by having closer coordination among different ESM development teams and allocating more time to evaluating coupled transient ESM simulations during the nineteenth and twentieth centuries. More specifically, given that constraints on some long-term flux components are uncertain, modeling teams may need to optimize several sets of parameters to achieve a more realistic integrated carbon simulation. For example, adjustments to parameterizations of subgrid scale mesoscale eddy mixing can improve many aspects of the physical ocean system [Gent and McWilliams, 1990; Danabasoglu and Marshall, 2007; Danabasoglu et al., 2008; Gent, 2011] but may have unintended consequences for ocean carbon uptake. At a minimum, more quantification and analysis of these trade-offs is needed, and ocean carbon benchmarks need to be fully considered when modifications are made to ocean model physics.

On land, uncertainties in land use histories and in the responses of carbon storage to elevated CO2 and other changing resources provide additional opportunities for model adjustments that improve the fidelity of a model's overall atmospheric CO2 trajectory without conflicting with available data constraints. Ecosystem manipulation experiments and observations also are needed to improve our understanding of ecosystem processes and their representation in models. In addition, a robust set of Earth system observations is needed to quantify climate change impacts on terrestrial carbon sinks and carbon dynamics associated with land use change.

5 Conclusions

The trajectories of atmospheric CO2 mole fraction for 19 historical and 17 future emission-driven simulation results produced for CMIP5 by 15 fully coupled ESMs were analyzed. Comparison of ESM prognostic atmospheric CO2 over the historical period with observations indicated that ESMs, on average, had a high bias in their predictions of contemporary CO2 levels. Comparison with observationally based estimates of anthropogenic carbon inventories in the ocean indicated that this bias was driven by weak to nominal ocean carbon uptake in many ESMs and that terrestrial and ocean carbon accumulation often compensated for one another within individual models, reducing the bias in predicted atmospheric CO2. We found a linear relationship over decadal timescales between contemporary and future atmospheric CO2 mole fractions and used this relationship to construct a model of the atmospheric CO2 trajectory tuned to contemporary observations, which we called the CCTM. CCTM estimates of atmospheric CO2 were 21 ppm lower than the multimodel mean at 2060 and 32 ppm lower at 2100. Using an impulse response function, we approximated radiative forcing and temperature changes resulting from ESM, CCTM, and observed CO2 trajectories. Comparison of temperature changes from ESMs with the CCTM estimate indicated a small positive multimodel mean bias during the 21st century. Individual model results exhibited a much larger range of CO2-induced temperature change, from 1.9°C to 2.9°C in 2060 and from 3.3°C to 4.6°C in 2100, demonstrating the net effect and significant climate implications associated with the large model spread in carbon accumulation in ocean and land reservoirs.
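
The linear relationship between contemporary and future atmospheric CO2 described above can be illustrated with a short regression sketch. The following Python example is a minimal illustration and not the procedure used to build the CCTM: it assumes an ordinary least-squares fit for a single target year, omits the uncertainty estimate, and uses hypothetical function and variable names. The input values shown are placeholders and are not results from any ESM.

```python
import numpy as np


def emergent_constraint(contemporary_co2, future_co2, observed_co2):
    """Ordinary least-squares fit of future on contemporary CO2 across models.

    All names are hypothetical. Inputs are CO2 mole fractions in ppm, one
    value per model, plus a single observed contemporary value.
    """
    x = np.asarray(contemporary_co2, dtype=float)
    y = np.asarray(future_co2, dtype=float)
    if x.shape != y.shape or x.ndim != 1 or x.size < 2:
        raise ValueError("need one contemporary and one future value per model")

    # Centered least-squares formulas for slope and intercept.
    dx = x - x.mean()
    slope = float(np.sum(dx * (y - y.mean())) / np.sum(dx ** 2))
    intercept = float(y.mean() - slope * x.mean())

    # Evaluate the fitted line at the observed contemporary mole fraction.
    constrained = slope * observed_co2 + intercept
    # A negative offset means the constrained estimate lies below the
    # multimodel mean of the future values.
    offset = constrained - float(y.mean())
    return slope, intercept, constrained, offset


# Placeholder values for illustration only, not results from any ESM.
models_now = [380.0, 390.0, 400.0]
models_2100 = [900.0, 950.0, 1000.0]
slope, intercept, constrained, offset = emergent_constraint(
    models_now, models_2100, 385.0
)
print(f"constrained estimate: {constrained:.1f} ppm, offset: {offset:+.1f} ppm")
```

Repeating such a fit for each future year would give a constrained trajectory rather than a single value. A fuller treatment would also propagate the regression uncertainty into the estimate.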

Atmospheric CO2 biases persist in models for decades because parameterizations of biological and physical processes related to carbon accumulation on land and in the ocean do not allow the system to change rapidly. Many of the biases associated with concentration-carbon feedbacks (e.g., Arora et al. [2013]) likely increase through time in the RCP 8.5 scenario as the atmospheric CO2 growth rate accelerates. Because of the high atmospheric CO2 bias exhibited by ESMs for the contemporary period, future fossil fuel emissions trajectories designed to stabilize atmospheric CO2 levels, sometimes called “allowable” emissions, would be too low if estimated from the multimodel mean. We have shown that a significant component of the variation of atmospheric CO2 levels among models during the 21st century was linked to biases in their predictions of contemporary atmospheric CO2. This suggests that improving the agreement of individual models with the contemporary atmospheric CO2 record could reduce the magnitude of future CO2 biases in many models and narrow the range of predicted radiative forcing and CO2-induced global temperature increases. To reduce biases in individual models, a rigorous campaign of extensive and multifaceted evaluation—directed at improving model structure and optimizing model parameters through comparison with contemporary observations—must be performed. Community-based benchmarking and model evaluation systems, such as ILAMB, tighter coordination among ESM development teams, and optimization of model parameters using all available observational constraints have the potential to both reduce model biases and significantly decrease the multimodel spread of carbon cycle predictions for future development scenarios and mitigation.


The authors wish to thank R. J. Stouffer for his thoughtful comments on an early draft of the paper, which led to improvements in the discussion of our results. In addition, the authors gratefully acknowledge the careful reviews of T. A. Boden, R. J. Norby, and S. Denning. This research was sponsored by the Climate and Environmental Sciences Division (CESD) of the Biological and Environmental Research (BER) Program in the U.S. Department of Energy Office of Science and the National Science Foundation (AGS-1048890). This research used resources of the National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL), which is managed by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725. CDJ was supported by the Joint DECC/Defra Met Office Hadley Center Climate Program (GA01101). This is a contribution to the BIOFEEDBACK project of the Center for Climate Dynamics (SKD) at the Bjerknes Center for Climate Research. The National Center for Atmospheric Research is sponsored by the National Science Foundation. We acknowledge the World Climate Research Program's Working Group on Coupled Modeling, which is responsible for CMIP, and we thank the climate modeling groups (listed in Table 1 of this paper) for producing and making available their model output. For CMIP, the U.S. Department of Energy's Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals.