Corresponding author: R. Olson, Department of Geosciences, Penn State University, University Park, PA, USA. (email@example.com)
 Many studies have attempted to estimate the equilibrium climate sensitivity (CS) to a doubling of CO2 concentrations. One common methodology is to compare versions of Earth models of intermediate complexity (EMICs) to spatially and/or temporally averaged historical observations. Despite persistent efforts, CS remains uncertain, and it is thus far unclear what is driving this uncertainty. Moreover, the effects of internal climate variability on CS estimates obtained with this method have not received thorough attention in the literature. Using a statistical approximator (“emulator”) of an EMIC, we show in an observation system simulation study that unresolved internal climate variability appears to be a key driver of CS uncertainty (as measured by the 68% credible interval). We first simulate many realizations of pseudo‒observations from an emulator at a “true” prescribed CS, and then reestimate the CS using the pseudo‒observations and an inverse parameter estimation method. We demonstrate that a single realization of the internal variability can result in a sizable discrepancy between the best CS estimate and the truth. Specifically, the average discrepancy is 0.84°C, with the feasible range up to several °C. The results open the possibility that recent climate sensitivity estimates from global observations and EMICs are systematically and considerably lower or higher than the truth, since they are typically based on the same realization of climate variability. This possibility should be investigated in future work. We also find that estimation uncertainties increase at higher climate sensitivities, suggesting that a high CS might be difficult to detect.
 Future climate projections strongly depend on climate sensitivity (CS) [Matthews and Caldeira, 2007; Knutti and Hegerl, 2008]. CS is the equilibrium global mean near‒surface temperature change for a doubling of atmospheric CO2 concentrations [Andronova et al., 2007; Knutti and Hegerl, 2008]. Many recent studies have attempted to estimate climate sensitivity. A common methodology for this purpose is to use Earth models of intermediate complexity (EMICs) and simple models in conjunction with spatially and/or temporally averaged historical observations [Forest et al., 2002; Gregory et al., 2002; Knutti et al., 2003; Forest et al., 2006; Tomassini et al., 2007; Drignei et al., 2008; Hegerl et al., 2007, and others]. Despite these efforts, CS has remained persistently uncertain [Edwards et al., 2007; Hegerl et al., 2007; Knutti and Hegerl, 2008].
 Key sources of this uncertainty include (i) climate model error, (ii) unresolved internal climate variability, and (iii) observational error. We refer to the sum of these processes as “unresolved climate noise”. Quantifying the relative contributions of these sources of uncertainty is of considerable policy relevance. Here we focus on the role of the unresolved internal climate variability, that is, the part of the observed internal climate variability record that a climate model cannot reproduce.
 We analyze the role of the unresolved climate variability using observation system simulation experiments (OSSEs). OSSEs are a common tool in the physical and environmental sciences for evaluating observation system designs [e.g., Piani et al., 2005; Knutti et al., 2006; Urban and Keller, 2009; Huang et al., 2010a, 2010b; Serra et al., 2011; Zakamska et al., 2011]. In OSSEs, synthetic (or “pseudo‒”) observations are usually first generated from a model with a known “true” parameter setting by adding noise representing observational error. The parameters are then reestimated using the pseudo‒observations. This simple set‒up allows for careful testing of the methods and for quantifying different drivers of uncertainty.
 Our starting point is an ensemble of Earth System model runs spanning the last two centuries in which climate sensitivity is systematically varied. The ensemble also accounts for the uncertainty in ocean mixing and in the radiative effects of anthropogenic sulfates [Olson et al., 2012]. We use a previously developed statistical approximator (“emulator”) of this model to estimate model output at parameter values where the model was not evaluated. In a suite of OSSEs, we construct pseudo‒observations of surface temperature (T) and upper ocean heat content (0–700 m, OHC) by contaminating the emulator output at a prescribed “true” CS with unresolved climate noise. We then reestimate CS using the pseudo‒observations and an inverse parameter estimation method. We use this approach to address three main questions: (i) How well can we constrain CS using observations of temperature and upper ocean heat content? (ii) Do the estimation uncertainties depend on the input CS? (iii) What is the contribution of the unresolved internal climate variability to the CS uncertainty? We give further details on the Earth System model, its emulator, the parameter estimation methodology, and the experimental design in the following sections.
2.1 Earth System Model Simulations
 We use output from the University of Victoria Earth System model (UVic ESCM) version 2.8 [Weaver et al., 2001], an Earth system model of intermediate complexity (EMIC). The model's atmosphere is a one‒layer energy moisture balance model, with prescribed winds from the NCEP/NCAR climatology [Kalnay et al., 1996]. The ocean component is the general circulation model MOM2 with 19 vertical levels [Pacanowski, 1995]. The horizontal resolution of both components is 1.8° (lat) × 3.6° (lon). The model also includes thermodynamic sea ice, dynamic terrestrial vegetation, and oceanic biogeochemistry. Apart from the seasonal cycle, UVic ESCM has little or no internal climate variability in near‒surface atmospheric and upper ocean temperatures (the variables used in this study). EMICs often represent many physical processes in a simplified way, but they are less computationally expensive than general circulation models (GCMs) and have been frequently employed to estimate climate parameters [e.g., Forest et al., 2002; Knutti et al., 2003; Forest et al., 2006; Tomassini et al., 2007; Sanso and Forest, 2009; Olson et al., 2012]. Our modified version of the model includes an updated solar radiative forcing, and implements additional greenhouse gas, volcanic, and anthropogenic sulfate aerosol forcings [Olson et al., 2012].
 Specifically, we use an ensemble of 250 historical UVic ESCM runs spanning the years 1800–2010 [Olson et al., 2012]. The ensemble samples the model parameters CS, background vertical ocean diffusivity (Kbg), and a scaling factor for albedos due to anthropogenic sulfate aerosols (Asc). CS is varied through an additional parameter f∗ that changes longwave feedbacks. Specifically, f∗ perturbs the modeled outgoing longwave radiation as a function of the local temperature change relative to year 1800. We diagnose the mapping between f∗ and climate sensitivity using a small ensemble of long CO2 doubling simulations with varying f∗ [Olson et al., 2012]. Table 1 provides the ranges of the model parameters in the ensemble.
Table 1. Ranges for Model and Statistical Parametersᵃ
ᵃ Subscripts T and OHC refer to surface air temperature and upper ocean heat content, respectively.
ᵇ °C per CO2 doubling.
2.2 Gaussian Process Emulator
 Our methodology requires orders of magnitude more UVic ESCM runs than is feasible to carry out in a typical computational environment (see section 2.3). We overcome this hurdle by using the UVic ESCM emulator detailed in Olson et al. . Emulators are fast statistical approximators of climate models that are increasingly used in climate science [Drignei et al., 2008; Holden et al., 2010; Edwards et al., 2011; Bhat et al., 2012; Olson et al., 2012]. Their speed enables better sampling of the model parameter space. Our emulator relies on model output at the 250 parameter settings of the ensemble and interpolates the model response to any desired parameter setting. The emulator prediction at each parameter setting is a random variable with an expected value (known as the “posterior mean”) and an associated predictive uncertainty. We refer to the posterior mean as “emulator output” throughout this paper. Specifically, the emulator estimates global average annual surface temperature anomalies T (years 1850–2006) and upper ocean heat content anomalies OHC (0–700 m, years 1950–2003). These periods reflect the coverage of the pseudo‒observations (section 2.3) and are consistent with the span of observations from Brohan et al.  and Domingues et al. . The temperature anomaly is calculated with respect to years 1850–1899, and the OHC anomaly with respect to years 1950–2003.
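The two baseline conventions amount to subtracting a base-period mean from each annual series. For OHC, the base period is the whole 1950–2003 window. The following is a minimal sketch that assumes annual values indexed by calendar year. The function name `anomalies` and the example series are hypothetical illustrations, not the original processing code.

```python
import numpy as np

def anomalies(years, values, base_start, base_end):
    """Return values minus their mean over the inclusive base period [base_start, base_end]."""
    years = np.asarray(years)
    values = np.asarray(values, dtype=float)
    in_base = (years >= base_start) & (years <= base_end)
    if not in_base.any():
        raise ValueError("base period does not overlap the series")
    return values - values[in_base].mean()

# Hypothetical annual series (illustrative numbers, not observations)
t_years = np.arange(1850, 2007)                      # 1850-2006, 157 years
t_anom = anomalies(t_years, 0.005 * (t_years - 1850), 1850, 1899)
ohc_years = np.arange(1950, 2004)                    # 1950-2003
ohc_anom = anomalies(ohc_years, np.linspace(0.0, 10.0, ohc_years.size), 1950, 2003)
```

Because the OHC base period covers the entire series, the OHC anomalies average to zero over 1950–2003. The T anomalies average to zero only over 1850–1899.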
 The emulator works in rescaled model parameter coordinates such that each parameter ranges from zero to unity. The emulator approximates the climate model output as the sum of a quadratic polynomial in the rescaled parameters and a zero‒mean Gaussian process with an isotropic covariance function (i.e., the smoothness of the Gaussian process is the same in all rescaled parameter directions). We only use the emulator to interpolate the model outputs between the parameter settings; there is no extrapolation beyond the range of the ensemble. The emulator provides a reasonable approximation to UVic ESCM over the parameter ranges used [Olson et al., 2012]. An example of emulator output for the final year of each diagnostic is given in Appendix A.
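The structure described above (a quadratic trend plus a zero-mean isotropic Gaussian process) can be illustrated with a minimal posterior-mean sketch for a single scalar output, such as T in one year. Several choices here are assumptions rather than details from Olson et al.:

- the squared-exponential form of the isotropic covariance;
- fixed hyperparameters (variance, length scale, small nugget) instead of estimated ones;
- a generalized-least-squares fit of the quadratic trend (universal kriging).

The class and function names are hypothetical. Inputs are assumed to be already rescaled to [0, 1], for example (p − lo)/(hi − lo) with the Table 1 ranges.

```python
import numpy as np

def quadratic_basis(x):
    """Columns 1, x_i, and x_i * x_j (i <= j) of the rescaled parameters."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    n, d = x.shape
    cols = [np.ones(n)] + [x[:, i] for i in range(d)]
    cols += [x[:, i] * x[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)

def isotropic_cov(a, b, variance, length):
    """Squared-exponential covariance with one length scale shared by all directions."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length**2)

class QuadraticGPEmulator:
    """Posterior mean of a quadratic trend plus a zero-mean isotropic GP (fixed hyperparameters)."""

    def __init__(self, x_train, y_train, variance=1.0, length=0.3, nugget=1e-6):
        self.x = np.atleast_2d(np.asarray(x_train, dtype=float))
        y = np.asarray(y_train, dtype=float)
        self.variance, self.length = variance, length
        K = isotropic_cov(self.x, self.x, variance, length) + nugget * np.eye(len(self.x))
        H = quadratic_basis(self.x)
        Kinv_H = np.linalg.solve(K, H)
        Kinv_y = np.linalg.solve(K, y)
        # Generalized least-squares estimate of the quadratic trend coefficients
        self.beta = np.linalg.solve(H.T @ Kinv_H, H.T @ Kinv_y)
        # Kriging weights for the GP correction of the trend residuals
        self.alpha = np.linalg.solve(K, y - H @ self.beta)

    def posterior_mean(self, x_new):
        """Interpolated model output at new rescaled parameter settings."""
        x_new = np.atleast_2d(np.asarray(x_new, dtype=float))
        k_star = isotropic_cov(x_new, self.x, self.variance, self.length)
        return quadratic_basis(x_new) @ self.beta + k_star @ self.alpha
```

If the model response is exactly quadratic, the GP correction vanishes and the trend alone reproduces the response. For any other response, the GP term interpolates the departures from the trend between the design points.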
2.3 Observation System Simulation Experiments
 We conduct several OSSEs to address the three questions outlined in the Introduction. The OSSEs involve two stages:
 Generation of pseudo‒observations from the emulator given assumed “true” CS.
 Reestimating CS given these pseudo‒observations, the emulator, and an inverse parameter estimation method.
 In the first stage, we answer the question: Given a “true” CS, what time series of temperature and ocean heat content can we theoretically observe? To this end, we construct pseudo‒observations by superimposing unresolved climate noise on the emulator output at a pre‒defined “true” climate parameter setting. The unresolved noise models the sum of the processes that result in the discrepancy between the observations and the emulator. These processes include emulator predictive error, model error, observational error, and unresolved internal climate variability. Mathematically, the noise n is defined as follows:
$$n_{t,k} = y_{t,k} - \hat{f}_{t,k}(\theta) \tag{1}$$
where y refers to the observations, $\hat{f}$ is the emulator output, θ is the vector of model parameters (Kbg, CS, Asc), t is the time index, and k is the diagnostic index (i.e., k=1 for T, and k=2 for OHC).
 We approximate the unresolved climate noise by an AR(1) process. Exploratory data analysis shows that this is a reasonable assumption for all our OSSEs (results not shown). Specifically,
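$$n_{t,k} = \rho_k\, n_{t-1,k} + w_{t,k}, \qquad w_{t,k} \sim \mathcal{N}(0, \sigma_k^2) \ \text{i.i.d.} \tag{2}$$
where ρk is the first‒order autocorrelation and w is independent and identically distributed Gaussian noise with innovation standard deviation σk. This AR(1) process is completely specified by the parameters σk and ρk.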
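Two properties of this AR(1) model are worth making explicit: its stationary variance and its lag-h autocovariance. The stationary variance also appears in the likelihood of Appendix B. A short derivation, assuming |ρk| < 1 so that the process is stationary and using the independence of w_{t,k} and n_{t−1,k}:

$$
\begin{aligned}
\operatorname{Var}(n_{t,k}) &= \rho_k^2 \operatorname{Var}(n_{t-1,k}) + \sigma_k^2
  \;\Rightarrow\; \sigma_{k,\mathrm{st}}^2 = \frac{\sigma_k^2}{1-\rho_k^2},\\
\operatorname{Cov}(n_{t,k},\, n_{t-h,k}) &= \rho_k^{h}\, \sigma_{k,\mathrm{st}}^2, \qquad h \ge 0.
\end{aligned}
$$

For example, with the “Standard” temperature noise (σT = 0.10°C, ρT = 0.58), the stationary standard deviation is 0.10/√(1 − 0.58²) ≈ 0.12°C. The autocorrelation thus inflates the year-to-year spread beyond the innovation scale σT.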
 The second stage of the OSSE addresses the following question: What CS pdfs can we obtain given the “true” CS value and the different realizations of the unresolved climate noise? Following Olson et al. , we use the pseudo‒observations y to reestimate CS using the following statistical model:
$$y_{t,k} = \hat{f}_{t,k}(\theta) + b_k + n_{t,k} \tag{3}$$
where bk is an additional time‒independent bias and n_{t,k} follows the AR(1) process of equation (2). We set the bias term for OHC to 0, for consistency with Olson et al. . Associated with each parameter value Θ=(Kbg, CS, Asc, σT, σOHC, ρT, ρOHC, bT) is a likelihood function, which describes the probability of the pseudo‒observations given this parameter value (detailed in Appendix B). Using Bayes' theorem, we multiply the likelihood function by the prior probability of the parameters to obtain the posterior probability for each parameter setting. We estimate the joint posterior pdf for Θ using Markov chain Monte Carlo (MCMC). The MCMC algorithm [Metropolis et al., 1953; Hastings, 1970] is a standard computational approach for estimating multivariate posterior pdfs. We implement the method following Olson et al. . Specifically, our MCMC parameter chains are 300,000 members long for each unresolved noise realization.
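A random-walk Metropolis sampler is the simplest member of the MCMC family cited above. The minimal sketch below pairs it with a uniform (box) prior, so that the log-posterior equals the log-likelihood inside the bounds and −∞ outside. The function names, proposal width, and chain length are illustrative assumptions. The sampler actually used follows Olson et al. and may differ, for example in how proposals are tuned.

```python
import numpy as np

def log_posterior(theta, log_lik, lower, upper):
    """Uniform (box) prior: log-likelihood inside the bounds, -inf outside."""
    theta = np.asarray(theta, dtype=float)
    if np.any(theta < lower) or np.any(theta > upper):
        return -np.inf
    return log_lik(theta)

def metropolis(log_post, theta0, step, n_iter, seed=0):
    """Random-walk Metropolis sampler with Gaussian proposals of width `step`."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = np.empty((n_iter, theta.size))
    n_accept = 0
    for i in range(n_iter):
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_post(proposal)
        # Symmetric proposal, so accept with probability min(1, exp(lp_prop - lp))
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
            n_accept += 1
        chain[i] = theta
    return chain, n_accept / n_iter
```

In practice, `log_lik` would be the AR(1) likelihood of Appendix B evaluated on the emulator output, and an initial burn-in portion of each chain would be discarded before computing summaries.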
 Our methodology differs from the work of Sanso and Forest , which uses many realizations of the stochastic emulator predictions, in that we use the posterior mean function, or expected emulator prediction. The posterior mean function approach has been previously adopted by Drignei et al. ; Higdon et al. ; Bhat et al. , and Olson et al. .
 For each experiment, we repeat the procedure of generating pseudo‒observations and estimating CS 60 times, each time using a different random realization of the unresolved climate noise process. We test two out of 60 realizations for convergence by running the estimation twice with different initial values for the final MCMC chain. We have not detected any convergence problems with our algorithm.
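The two-stage loop can be illustrated end to end with a deliberately simplified toy OSSE. This is a hedged sketch, not the paper's set-up:

- A hypothetical emulator stands in for the real one: warming is proportional to CS times a forcing ramp.
- Only CS is estimated.
- The AR(1) parameters are fixed at their “true” values, as in “Nat. Var.”
- A grid search for the posterior mode under a flat prior replaces MCMC.

```python
import numpy as np

def ar1_noise(n, rho, sigma, rng):
    """AR(1) series started from its stationary distribution."""
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - rho**2))
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.normal(0.0, sigma)
    return x

def ar1_loglik(resid, rho, sigma):
    """Exact Gaussian AR(1) log-likelihood of a residual series."""
    var_st = sigma**2 / (1.0 - rho**2)
    w = resid[1:] - rho * resid[:-1]
    return (-0.5 * np.log(2 * np.pi * var_st) - resid[0] ** 2 / (2 * var_st)
            - 0.5 * w.size * np.log(2 * np.pi * sigma**2) - np.sum(w**2) / (2 * sigma**2))

def toy_emulator(cs, ramp):
    """HYPOTHETICAL stand-in for the emulator: warming proportional to CS."""
    return 0.25 * cs * ramp

def run_osse(true_cs, rho, sigma, n_real=60, seed=0):
    """Return the CS mode for each noise realization and their mean absolute discrepancy."""
    rng = np.random.default_rng(seed)
    ramp = np.linspace(0.0, 1.0, 157)      # one value per year, 1850-2006
    grid = np.arange(0.5, 10.0, 0.01)      # candidate CS values (flat prior on this range)
    truth = toy_emulator(true_cs, ramp)
    modes = np.empty(n_real)
    for r in range(n_real):
        y = truth + ar1_noise(ramp.size, rho, sigma, rng)   # stage 1: pseudo-observations
        loglik = [ar1_loglik(y - toy_emulator(cs, ramp), rho, sigma) for cs in grid]
        modes[r] = grid[int(np.argmax(loglik))]             # stage 2: posterior mode
    return modes, float(np.mean(np.abs(modes - true_cs)))
```

With negligible noise the recovered mode sits on the “true” CS. With realistic AR(1) noise, each realization shifts the mode by a different amount, which is the effect the experiments quantify.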
 The OSSEs share the same general set‒up, with relatively minor differences. Specifically, the experiments differ in assumed “true” parameter values, in the priors, and in the assumptions about the unresolved noise process (Table 2).
Table 2. Summary of the Design and the Results of the Observation System Simulation Experimentsᵃ
Properties of CS Estimates (°C): Mean 68% CI
ᵃ “Unif.” refers to uniform priors for climate parameters, and “Inf.” refers to informative priors for Kbg and CS following the default case of Olson et al. . The mean 68% CI refers to the mean 68% posterior credible interval of CS estimates. The interval is calculated as the range between the 16th and the 84th percentiles of the CS chains.
ᵇ While the “true” input CS is 3.1°C, the mean of the non‒uniform prior is 3.25°C, and the mode is 2.96°C.
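The credible-interval width defined in the Table 2 footnote is the 84th minus the 16th percentile of a CS chain, averaged over the noise realizations. A minimal sketch follows. The function names are hypothetical, and the chains are assumed to be one-dimensional arrays of post-burn-in CS samples.

```python
import numpy as np

def ci68_width(chain):
    """Width of the 68% credible interval: 84th minus 16th percentile of the chain."""
    lo, hi = np.percentile(chain, [16, 84])
    return hi - lo

def mean_ci68_width(chains):
    """Average 68% CI width over chains from different noise realizations."""
    return float(np.mean([ci68_width(c) for c in chains]))
```

For a Gaussian posterior, this width is close to two standard deviations, since the 16th and 84th percentiles lie about one standard deviation from the median.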
 In the first experiment, called “Standard”, we address how well the observations can constrain CS assuming realistic knowledge of climate uncertainties. Here we use mean estimates from the base case of Olson et al.  as “true” climate parameters: Kbg=0.19 cm² s⁻¹, CS=3.1°C, and Asc=1.1. For the unresolved climate noise we adopt the modes from the base case of Olson et al. : σT=0.10°C, σOHC=2.6×10²² J, ρT=0.58, and ρOHC=0.079 (UVic ESCM Residuals in Figures 1 and 2). For simplicity, we do not use bias terms when generating pseudo‒observations, since the 95% posterior credible intervals for these terms include zero [Olson et al., 2012]. We use uniform priors for all parameters (Table 1).
 In the experiment “Nat. Var.”, we address the following question: How well could we theoretically estimate CS if the model, emulator, and observational errors decreased to zero? In this case, internal climate variability remains the only component of unresolved climate noise. By internal climate variability, we mean variations in the mean state of the climate on all spatial and temporal scales beyond that of individual weather events that arise from natural internal processes within the climate system (as opposed to variations in natural or anthropogenic external forcing) [Baede, 2007]. We also assume, as an approximation, that we know the statistical properties of this variability perfectly (e.g., from errorless GCMs that correctly simulate the “true” variability). We discuss the effect of this assumption later.
 Unfortunately, it is difficult to estimate the internal climate variability from observations because of the confounding effects of observational errors, particularly in the case of OHC. Thus, following Tomassini et al.  and Sanso and Forest , we approximate the internal variability using GCM output. We fit an AR(1) process to detrended near‒surface global mean annual atmospheric temperature and 0–700 m ocean heat content anomalies from preindustrial control runs of three climate models: BCCR‒BCM2.0 [Ottera et al., 2009], GFDL‒CM2.1 [Delworth et al., 2006; Gnanadesikan et al., 2006], and UKMO‒HadCM3 [Gordon et al., 2000; Pope et al., 2000; Johns et al., 2003]. The output of these runs was obtained from the World Climate Research Programme's (WCRP's) Coupled Model Intercomparison Project phase 3 (CMIP3) multi‒model data set [Meehl et al., 2007]. Specifically, we use run 1 for all three models. We discard the first 100 years for BCCR‒BCM2.0 because of drifts in the modeled climate. We first remove the mean and then detrend the anomalies using robust locally weighted regression [Cleveland, 1979] with a span f of 2/3. When calculating OHC, we first obtain temperatures from potential temperatures and salinities using the UNESCO equation of state [UNESCO, 1981] following Bryden  and Fofonoff . For this conversion, we find the ocean pressure field from latitude and depth using simplified equations [Lovett, 1978]. For GFDL‒CM2.1 we only use years 1–300, since salinity fields are available only for these years. The resulting AR(1) properties, averaged across the models, are as follows: σT=0.12°C, σOHC=0.51×10²² J, ρT=0.45, and ρOHC=0.9 (Table 2, Figures 1 and 2, red triangles). As in previous work [e.g., Tomassini et al., 2007], we neglect the cross‒correlation between T and OHC. The average correlation estimated from the three GCMs is 0.41, indicating that about 17% (0.41² ≈ 0.17) of the variability in temperature is explained by the variability in ocean heat content, and vice versa. In the estimation stage, we fix the statistical parameters of the observation‒emulator residuals at their “true” values. This represents a case where one has perfect knowledge of internal climate variability, in contrast to the “Standard” experiment, where the properties of the unresolved climate noise are estimated jointly with the climate parameters.
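The variability-fitting recipe above has three steps: remove the mean, detrend with robust LOWESS at span 2/3, and fit AR(1). It can be sketched as follows, with several hedges:

- The detrending is a minimal reimplementation of robust LOWESS (Cleveland, 1979) on the time index. It uses three robustness iterations, which is an assumed setting. The `lowess` function in statsmodels (`frac`, `it` arguments) is a fuller implementation.
- The AR(1) estimator, a least-squares lag-1 regression with the innovation standard deviation taken from its residuals, is also an assumption, because the text does not name one.
- Function names are hypothetical. Averaging across models means averaging the (ρ, σ) pairs returned for each GCM.

```python
import numpy as np

def robust_lowess(y, frac=2/3, iters=3):
    """Minimal robust locally weighted linear regression on the time index."""
    y = np.asarray(y, dtype=float)
    n = y.size
    x = np.arange(n, dtype=float)
    k = int(np.ceil(frac * n))                 # points in each local window
    robust_w = np.ones(n)
    fitted = np.empty(n)
    for _ in range(iters + 1):                 # initial fit plus `iters` robustness passes
        for i in range(n):
            dist = np.abs(x - x[i])
            h = np.sort(dist)[k - 1]           # half-width of the local window
            w = np.clip(1 - (dist / h) ** 3, 0, None) ** 3 * robust_w   # tricube x robustness
            X = np.column_stack([np.ones(n), x - x[i]])
            sw = np.sqrt(w)
            beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
            fitted[i] = beta[0]                # local line evaluated at x[i]
        resid = y - fitted
        s = np.median(np.abs(resid))
        u = resid / (6 * s) if s > 0 else np.zeros(n)
        robust_w = np.clip(1 - u**2, 0, None) ** 2   # bisquare robustness weights
    return fitted

def fit_ar1(series, frac=2/3):
    """Remove mean and LOWESS trend, then estimate AR(1) rho and innovation sigma."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    x = x - robust_lowess(x, frac=frac)
    rho = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)   # least-squares lag-1 coefficient
    innov = x[1:] - rho * x[:-1]
    return float(rho), float(innov.std(ddof=1))
```

A local-linear smoother reproduces a linear drift exactly, so a slow model drift is removed while most of the short-memory variability is left intact for the AR(1) fit.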
 The “Higher CS” experiment explores the effects of different “true” parameter values on the estimation. It differs from “Standard” by using a higher “true” input CS. Specifically, we adopt Kbg=0.19 cm² s⁻¹, CS=4.8°C, and Asc=1.3. These values are selected to be consistent with the bivariate joint pdfs presented in Olson et al. .
 The “Inf. Priors” experiment examines the role of priors. It uses informative priors for CS (Figure 3) and Kbg following the default case of Olson et al. . “Inf. Priors” otherwise has the same set‒up as “Standard” (cf. Table 2).
3 Results and Discussion
 Our results suggest that the process driving unresolved internal climate variability is a key factor behind the current uncertainty in climate sensitivity estimates. Specifically, the average width of the estimated CS pdfs (as measured by the 68% posterior credible intervals) in the “Nat. Var.” case is only modestly lower than in the “Standard” case (Table 2, Figure 3). This suggests that CS is likely to remain uncertain even in a world of error‒free models and perfect observations, due to the confounding effect of the unresolved internal climate variability. The variability also appears to be a key factor in the second‒order uncertainty in climate sensitivity (Figures 3 and 4). This uncertainty represents the sensitivity of estimated CS pdfs to different realizations of the unresolved climate noise, and is measured by the mean deviation of the estimated CS modes from the “true” value. Specifically, while the mean deviation is 1.1°C in the “Standard” experiment, it decreases to 0.84°C in the “Nat. Var.” case (Table 2). Broadly consistent results for internal climate variability (but with a higher scatter of the modes) are obtained if the AR(1) properties of the variability are estimated (rather than held fixed at their “true” values), and if the bias term is removed during the estimation stage (see Appendix C). Overall, our results suggest that internal climate variability presents a substantial obstacle to estimating climate sensitivity. It is thus far an open question whether this hurdle can be overcome with alternative approaches that perform joint state and parameter estimation [e.g., Annan et al., 2005; Evensen, 2009; Hill et al., 2012]. Of course, the pivotal role of internal climate variability should not prevent us from investing in better observational systems. Webster et al.  show, using a simplified representation of unresolved climate noise, that future observations are expected to further reduce the CS uncertainty.
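The second-order uncertainty measures can be computed from the per-realization chains. The text does not say how modes are extracted, so the histogram-based mode below is an assumption. So is the sample standard deviation (ddof=1) used for the “Std. of Modes” figure in Table C1. The helper names are hypothetical.

```python
import numpy as np

def chain_mode(chain, bins=100):
    """Approximate posterior mode as the center of the fullest histogram bin."""
    counts, edges = np.histogram(chain, bins=bins)
    i = int(np.argmax(counts))
    return 0.5 * (edges[i] + edges[i + 1])

def second_order_spread(modes, true_cs):
    """Mean absolute deviation of modes from the true CS, and standard deviation of modes."""
    modes = np.asarray(modes, dtype=float)
    return float(np.mean(np.abs(modes - true_cs))), float(np.std(modes, ddof=1))
```

The mean absolute deviation is anchored at the known “true” CS, which is only available in an OSSE. The standard deviation measures scatter around the modes' own average.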
 The CS estimation uncertainties increase at higher CS. Specifically, both the pdf width and the scatter of the modes increase considerably compared to the “Standard” case (Table 2, Figure 4). This suggests that higher climate sensitivities can be difficult to detect if a particular realization of climate noise biases the result low. This is consistent with the analytical model results of Hansen et al. , which show that the dependence of transient ocean warming on climate sensitivity weakens at high CS. Thus, at high CS, a small uncertainty in a single ocean surface warming observation implies a larger uncertainty in climate sensitivity. Our numerical model shows a similar response of atmospheric surface warming to changing CS. Note that other factors also complicate the CS uncertainty, such as the aerosol effects specified by Asc.
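The weakening sensitivity of transient warming to CS can be illustrated with a swapped-in zero-dimensional energy balance model. This is not the analytical model of Hansen et al. and not UVic ESCM:

- Transient warming is T = F/(λ + κ), with feedback parameter λ = F2x/CS and an ocean heat uptake efficiency κ.
- Differentiating gives dT/dCS = F·F2x/(F2x + κ·CS)², which shrinks as CS grows.
- Consequently, a fixed temperature error maps into a larger CS error at high CS.

The numbers F2x = 3.7 W m⁻², F = 2.0 W m⁻², and κ = 0.7 W m⁻² K⁻¹ are assumed illustrative values.

```python
F2X = 3.7  # W m^-2, assumed forcing for doubled CO2

def transient_warming(cs, forcing, kappa):
    """Warming T = F / (lambda + kappa), with feedback lambda = F2X / cs."""
    return forcing / (F2X / cs + kappa)

def dT_dcs(cs, forcing, kappa):
    """Analytical derivative of transient_warming with respect to cs."""
    return forcing * F2X / (F2X + kappa * cs) ** 2

def cs_uncertainty(delta_t, cs, forcing, kappa):
    """Linearized CS error implied by a temperature error delta_t."""
    return delta_t / dT_dcs(cs, forcing, kappa)
```

With these assumed values, a 0.1 K temperature error corresponds to roughly 0.47°C of CS error at CS = 3.1°C, but about 0.67°C at CS = 4.8°C.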
 Switching from uniform to informative priors (the “Inf. Priors” experiment) substantially reduces the CS uncertainty (Table 2, Figures 3 and 4). Under the informative priors, the mean estimated CS mode (2.9°C) is somewhat lower than the “true” value of 3.1°C. This difference is statistically significant (α=0.05). This might be in part due to the biasing effect of the mode of the CS prior, which is lower than the “true” value. Both of these effects (lower uncertainty but potential biases under narrower priors within the context of OSSEs) have been previously found and discussed by Webster et al. . Thus, while using informative priors can be a promising approach, care should be taken in choosing an appropriate prior.
 In all experiments, higher estimated CS modes are associated with higher Asc modes. The Spearman's rank correlation coefficients between the two sets of modes are 0.84 for “Standard,” 0.88 for “Nat. Var.,” 0.77 for “Higher CS,” and 0.75 for “Inf. Priors.”
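Spearman's rank correlation is the Pearson correlation of the ranks, with tied values sharing the mean of their ranks. Applied to the 60 pairs of (CS mode, Asc mode), one per realization, it measures whether higher CS modes come with higher Asc modes without assuming a linear relation. A minimal sketch with hypothetical helper names; `scipy.stats.spearmanr` computes the same statistic.

```python
import numpy as np

def average_ranks(a):
    """Ranks 1..n, with tied values sharing the mean of their ranks."""
    a = np.asarray(a, dtype=float)
    ranks = np.empty(a.size)
    ranks[np.argsort(a, kind="mergesort")] = np.arange(1, a.size + 1)
    for v in np.unique(a):
        tied = a == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```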
 Finally, each realization of internal climate variability can result in a considerable discrepancy between the best CS estimate and the true value (“Nat. Var.” panels, Figures 3 and 4). While the average discrepancy due to the unresolved internal variability is 0.84°C (Table 2), one of the “Nat. Var.” realizations leads to an estimate of 6.3°C, which is 3.2°C higher than the “true” value. Even larger outliers are found in additional sensitivity experiments that differ from “Nat. Var.” in their statistical assumptions. For example, in the experiment that does not use a bias term for temperature (bT), one out of 60 variability realizations leads to a CS mode of 10.8°C, which exceeds the “true” value by 7.7°C (Appendix C). In general, the distribution of the discrepancy is positively skewed, with a longer upper tail (Figure 4).
 Historical observational constraints on climate sensitivity (e.g., global average upper ocean heat content and surface temperature) are based on a single realization of the internal climate variability process. Even setting aside observational and model errors, this realization alone can introduce a considerable discrepancy between the best CS estimate and the true value. Given that scientific models often share similar assumptions and might not be independent (see Pennell and Reichler  for a discussion of similarities among GCMs), the bias due to internal variability may be in the same direction in studies using different models. As a result, current EMIC‒derived CS estimates from these data sets may be systematically higher or lower than the true value. A way forward might be to use independent constraints from other time periods (e.g., the Last Glacial Maximum, Holden et al. ; Schmittner et al. ) or information from a wider variety of spatially resolved data sets and reanalyses [Forest et al., 2002, 2006; Piani et al., 2005; Knutti et al., 2006].
 Our analysis relies on many assumptions that point to several caveats and open research questions. The Earth System model approximated by our emulator relies on a number of simplifications (e.g., it does not explicitly include clouds) and neglects some historical forcings (e.g., the indirect effects of anthropogenic sulfates, and tropospheric ozone [Forster et al., 2007]). Also, we do not fully account for past forcing uncertainties. In addition, we change climate sensitivity using a very simplistic approach by varying longwave radiative feedbacks, while shortwave feedbacks are also uncertain [Bony et al., 2006].
 The way we estimate internal climate variability for use with the “Nat. Var.” experiments has limitations. For example, our estimates of the variability rely on three climate models. Using more models might provide a better sample. In addition, there is a distinct possibility that climate models considerably underestimate the observed decadal OHC variability (e.g., Levitus et al. , Hansen et al. ; but see AchutaRao et al.  for an alternative view). If true, we hypothesize that the CS uncertainty in the “Nat. Var.” experiment would increase, which would strengthen our conclusion that natural variability is an important driver of CS uncertainty.
 In addition, the limitations of our statistical model and OSSE set‒up deserve mention. Specifically, our statistical model does not include any cross‒correlation between the residuals for T and OHC, and relies on a simple AR(1) structure. However, our exploratory data analysis and the spectra of internal climate variability from the three GCMs suggest that this structure is a reasonable approximation to the underlying statistical processes. Also, we use a relatively small number of realizations in the OSSEs to keep the computational burden manageable. Furthermore, we rely on uniform priors in most experiments. We have chosen to work with this relatively simple prior specification because it remains an open question how to construct more informative priors with good bias and coverage properties. We use a uniform prior for Asc to reflect the current large uncertainty about past aerosol forcings [Forster et al., 2007]. Considering the impact of learning about Asc on the CS uncertainty (by using a tighter prior for Asc) is the subject of future research. Finally, we explore only a small subset of the uncertainty in unresolved climate noise and in climate model parameters.
 We use observation system simulation experiments (OSSEs) to analyze the effects of unresolved internal climate variability on the uncertainty in climate sensitivity. We repeatedly simulate pseudo‒observations from a statistical emulator of an Earth System Model at a given climate sensitivity, and then reestimate the sensitivity using a Bayesian inversion method.
 Our results suggest that unresolved internal climate variability (as approximated by the three general circulation models we use) is an important driver of the first‒order (as measured by the 68% posterior credible interval) and the second‒order (as measured by the standard deviation of the estimated modes) uncertainty in climate sensitivity estimates. A single realization of climate variability can introduce a substantial discrepancy between a CS estimate and the true value. These results open the possibility that recent CS estimates from intermediate complexity models using global mean warming observations are systematically higher or lower than the true CS, since they typically rely on the same realization of the climate variability. For this methodology, the unresolved internal variability represents a critical roadblock. Our research suggests that even if we had structurally errorless models and perfect observations at present, current estimation approaches would still result in considerable CS uncertainty. Our results should be further confirmed with other climate models and with an improved statistical model of internal climate variability.
 Overall, the influence of internal climate variability on CS estimates from these methods warrants thorough investigation. Future work should examine the power of learning about aerosol effects, and of combined state and parameter estimation methods [e.g., Annan et al., 2005; Evensen, 2009; Hill et al., 2012], to confront this challenge.
Appendix A: Emulator Output
 The emulator output for the final year of pseudo‒observations (2006 for T, and 2003 for OHC) is shown in Figure A1. Specifically, two model parameters are varied at a time, while the third parameter is kept constant at its “true” setting for the “Standard” experiment. The responses of temperature and ocean heat content are similar in many ways. Higher climate sensitivity leads to more warming, and so does a lower Asc, as it represents weaker cooling effects from the sulfates. The response of both variables to Kbg is more subtle than to CS and Asc.
Appendix B: Likelihood Function
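 This appendix provides the likelihood function for the observations when the statistical model is given by equations (2) and (3). We define the bias‒corrected residuals
$$n_{t,k} = y_{t,k} - \hat{f}_{t,k}(\theta) - b_k, \qquad t = 1, \ldots, N_k,$$
where Nk is the number of observations for diagnostic k (k=1 for temperature, and k=2 for ocean heat content). The likelihood function for observations yk given the model and the statistical parameters is given by [Bence, 1995; Olson et al., 2012]:
$$L(\mathbf{y}_k \mid \theta, \sigma_k, \rho_k, b_k) = \left(2\pi\sigma_{k,\mathrm{st}}^2\right)^{-1/2} \exp\!\left(-\frac{n_{1,k}^2}{2\sigma_{k,\mathrm{st}}^2}\right) \left(2\pi\sigma_k^2\right)^{-(N_k-1)/2} \exp\!\left(-\frac{1}{2\sigma_k^2}\sum_{t=2}^{N_k} w_{t,k}^2\right)$$
 Here $\sigma_{k,\mathrm{st}}^2 = \sigma_k^2/(1-\rho_k^2)$ is the stationary process variance, and wt,k are the whitened bias‒corrected residuals, calculated as wt,k=nt,k−ρknt−1,k for t>1. Assuming the independence of the residuals (between the model emulator and the pseudo‒observations) across different diagnostics, the final likelihood for all pseudo‒observations Y≡(yT,yOHC) is the product of the individual likelihoods:
$$L(\mathbf{Y} \mid \Theta) = L(\mathbf{y}_T \mid \theta, \sigma_T, \rho_T, b_T)\; L(\mathbf{y}_{OHC} \mid \theta, \sigma_{OHC}, \rho_{OHC}, b_{OHC}=0)$$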
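The exact AR(1) likelihood combines two Gaussian terms. The first residual is scored against the stationary variance. Each later residual is scored through its whitened innovation against σk². The total log-likelihood sums the T and OHC terms, with the OHC bias fixed at zero. A minimal log-space sketch follows, with hypothetical function names. It is equivalent to a multivariate normal with covariance σ²k,st ρk^|i−j|, which is a convenient way to check it.

```python
import numpy as np

def ar1_loglik(y, f_hat, bias, sigma, rho):
    """Exact log-likelihood of y under y_t = f_hat_t + bias + n_t, with n_t an AR(1) process."""
    n = np.asarray(y, dtype=float) - np.asarray(f_hat, dtype=float) - bias   # bias-corrected residuals
    var_st = sigma**2 / (1.0 - rho**2)         # stationary variance
    w = n[1:] - rho * n[:-1]                   # whitened residuals, t > 1
    return (-0.5 * np.log(2.0 * np.pi * var_st) - n[0] ** 2 / (2.0 * var_st)
            - 0.5 * w.size * np.log(2.0 * np.pi * sigma**2) - np.sum(w**2) / (2.0 * sigma**2))

def total_loglik(y_t, f_t, y_ohc, f_ohc, sigma_t, rho_t, b_t, sigma_ohc, rho_ohc):
    """Log of the product of the T and OHC likelihoods; the OHC bias is fixed at zero."""
    return (ar1_loglik(y_t, f_t, b_t, sigma_t, rho_t)
            + ar1_loglik(y_ohc, f_ohc, 0.0, sigma_ohc, rho_ohc))
```

Working in log space avoids underflow when multiplying many small densities, which matters for series of a century or more.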
Appendix C: Sensitivity of “Nat. Var.” Experiment to Statistical Assumptions
 We perform two additional experiments to explore the sensitivity of our “Nat. Var.” results to statistical assumptions. In the experiment “Nat. Var. Est.,” we estimate the statistical properties of the AR(1) process representing internal climate variability (σT, σOHC, ρT, and ρOHC), as opposed to fixing them at “true” values in “Nat. Var.” This represents a case where we have no model, observational, or emulator error, but we are still uncertain about the statistical properties of the internal climate variability. We use uniform priors for the AR(1) parameters over their prior ranges (Table 1). This experiment has otherwise the same design as the “Nat. Var.”
 In the experiment “Nat. Var., No Bias” we remove the bias bT from the estimation by fixing it at the true value of zero. The design of this experiment is otherwise identical to the “Nat. Var.” experiment. For both “Nat. Var. Est.” and “Nat. Var., No Bias,” two out of 60 realizations are tested for convergence by running the estimation twice with different initial values for the final MCMC chain. We have not detected any convergence problems.
 The results from these additional experiments are broadly consistent with the original “Nat. Var.”: the CS pdfs exhibit a characteristic spread around the “true” value of 3.1°C (Figures C1 and C2). The average width of the pdfs (as measured by the 68% posterior credible intervals) is very close to that of the original “Nat. Var.” case, while the scatter of CS modes is somewhat higher (Table C1). One reason for the higher mean deviation and standard deviation of the modes compared to “Nat. Var.” is the presence of outliers: one of the “Nat. Var. Est.” realizations leads to a mode estimate of 7.5°C, and one of the “Nat. Var., No Bias” realizations leads to 10.8°C. These values are 4.4°C and 7.7°C higher than the “truth”, respectively. Both of the outlying cases were tested for convergence, and no convergence problems were detected.
Table C1. Properties of CS Estimates From the Additional “Nat. Var.” Sensitivity Experiments, and the Original “Nat. Var.,” Compared to “Standard” Case (in Bold)ᵃ
Columns: Mean Deviation of Modes; Std. of Modes; Mean 68% CI of pdfs
Rows: “Nat. Var. Est.”; “Nat. Var., No Bias”
ᵃ The mean 68% CI refers to the mean 68% posterior credible interval of CS estimates. The interval is calculated as the range between the 16th and the 84th percentiles of the CS chains. In all experiments, the assumed “true” CS is 3.1°C. All values are in °C.
 We are very grateful to Chris Forest and Jim Kasting for generating insightful and useful ideas, and for their sagacious feedback on the scope and implementation of the project. This work was partially supported by NSF through the Network for Sustainable Climate Risk Management (SCRiM) under NSF cooperative agreement GEO‒1240507, and through the Center for Climate and Energy Decision Making under the cooperative agreement SES‒0949710 between the NSF and Carnegie Mellon University. We are grateful to Michael Eby and to the developers of UVic ESCM for providing the model and for discussions and advice. This study would not have been possible without the efforts of scientists who collected the observations used in this study. We acknowledge the modeling groups, the Program for Climate Model Diagnosis and Intercomparison (PCMDI) and the WCRP's Working Group on Coupled Modelling (WGCM) for their roles in making available the WCRP CMIP3 multi‒model data set. Support of this data set is provided by the Office of Science, U.S. Department of Energy. This research uses data provided by the Bergen Climate Model (BCM) project (www.bcm.uib.no) at the Bjerknes Centre for Climate Research, largely funded by the Research Council of Norway. Furthermore, we thank the scientists at the Met Office Hadley Center, and Geophysical Fluid Dynamics Laboratory for producing the GCM output used in this study. All views, errors, and opinions are solely those of the authors.