[1] In the paper “Sensitivity of distributions of climate system properties to the surface temperature data set” by A. G. Libardoni and C. E. Forest (*Geophys. Res. Lett.*, *38*, L22705, doi:10.1029/2011GL049431), two errors were made. First, there was an offset of 1 month when comparing model data to observational data. Second, the likelihood function that was used to relate model goodness-of-fit statistics to a probability distribution, while derived from basic understanding of probability distributions, is not acceptable as a likelihood function to statisticians.

[2] In this correction, we fix the mismatch that existed between model and observational annual averages. When calculating annual averages from model output, seasonal means were averaged, resulting in a given year being the average from December through November. When calculating annual averages from observational data, monthly means were averaged, resulting in a given year being the average from January through December. To correct for this 1 month mismatch, all annual mean temperatures derived from observations are calculated as December to November means, subject to the threshold criterion described in *Libardoni and Forest* [2011]. Because decadal mean temperatures are used for the surface temperature diagnostic, the 1 month shift in the averaging window has minimal impact on the resulting observational time series. Across all five decades and four zonal bands, the temperature differences due to the 1 month shift are at most 0.05°C. The revised time series are not shown.
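The two averaging windows described above can be illustrated with a minimal sketch. The function names and the flat list-of-months input are hypothetical, the December of a December to November year is assumed to come from the preceding calendar year, and the threshold criterion of *Libardoni and Forest* [2011] is not implemented.

```python
from statistics import fmean


def jan_dec_means(monthly, first_year):
    """Calendar-year means (January through December).

    `monthly` is a flat list of monthly means starting in January of
    `first_year` (hypothetical input layout).
    """
    n_years = len(monthly) // 12
    return {
        first_year + k: fmean(monthly[12 * k : 12 * k + 12])
        for k in range(n_years)
    }


def dec_nov_means(monthly, first_year):
    """December-through-November means.

    Assumption: the December belongs to the preceding calendar year, so the
    first year in the record has no complete window and is skipped.
    """
    n_years = len(monthly) // 12
    return {
        first_year + k: fmean(monthly[12 * k - 1 : 12 * k + 11])
        for k in range(1, n_years)
    }


# Synthetic three-year record, for illustration only (not observational data).
monthly = [float(i) for i in range(36)]
calendar = jan_dec_means(monthly, 2000)
shifted = dec_nov_means(monthly, 2000)
```

With real records the two dictionaries would then be averaged into decadal means, which is where the text above reports that the 1 month shift has minimal impact.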

[3] We also present revised results that implement the likelihood function proposed in *Lewis* [2013], which is more statistically sound as applied to the Bayesian methodology used in *Libardoni and Forest* [2011]. As shown below, the updated likelihood function alters the posterior distributions. The changes to the likelihood function involve changing the shape of the distributions used in statistical tests, changing the test statistic, and taking into account the necessary volumetric correction when making a change of variable. In total, the following changes were made to implement the likelihood estimate from *Lewis* [2013]:

[4] We estimate the likelihood from goodness-of-fit statistics using the probability density function (PDF) of an *F* distribution, as opposed to 1 minus the cumulative distribution function (CDF) of an *F* distribution, for the surface and upper-air diagnostics.

[5] We use a *t* distribution for Δ*r*, rather than an *F* distribution for Δ*r*^{2}, for the ocean diagnostic.

[6] We change the degrees of freedom in the statistical distributions to *κ*, the number of EOFs retained in estimates of the noise-covariance matrices, and *ν*, the number of control run segments available to make these estimates, respectively. This results in a change from 3 and 24 to 16 and 49 degrees of freedom, respectively, for the surface diagnostic, from 3 and 14 to 14 and 39 degrees of freedom, respectively, for the upper-air diagnostic, and from 3 and 24 degrees of freedom in an *F* distribution to an effective degrees of freedom of 4.1 in a *t* distribution for the ocean diagnostic.

[7] We change the test statistic in the *F* distribution for the surface and upper-air diagnostics from [expression missing] to [expression missing], where *κ* is 16 and 14 for the surface and upper-air diagnostics, respectively.

[8] We multiply the likelihood from the *F* distribution by [expression missing] to account for the transformation from the data space (Δ*r*^{2} values) to the model parameter space.
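The distinction between a density-based likelihood and one based on 1 minus the CDF can be sketched numerically. The following is only an illustration using the Python standard library, not the code used for the revised results: the function names are hypothetical, the test statistic is treated as a generic input because its exact scaling is not reproduced here, and the multiplicative volumetric correction is omitted.

```python
import math


def f_pdf(x, d1, d2):
    """PDF of an F distribution with (d1, d2) degrees of freedom.

    Returns 0 for x <= 0, which is the correct limit when d1 > 2.
    """
    if x <= 0.0:
        return 0.0
    log_beta = (
        math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    )
    log_pdf = (
        0.5 * d1 * math.log(d1 / d2)
        + (0.5 * d1 - 1.0) * math.log(x)
        - 0.5 * (d1 + d2) * math.log1p(d1 * x / d2)
        - log_beta
    )
    return math.exp(log_pdf)


def f_sf(x, d1, d2, n=2000):
    """1 minus the CDF of an F distribution.

    Computed by Simpson's rule on the PDF over [0, x]; n must be even.
    """
    if x <= 0.0:
        return 1.0
    h = x / n
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3.0


def t_pdf(x, nu):
    """PDF of a Student t distribution; nu may be non-integer (e.g. 4.1)."""
    log_norm = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
    )
    return math.exp(log_norm - 0.5 * (nu + 1) * math.log1p(x * x / nu))


# Hypothetical statistic values, chosen only to exercise the functions.
stat = 2.0
tail_based = f_sf(stat, 3, 24)       # 1 - CDF with the original 3 and 24
density_based = f_pdf(stat, 16, 49)  # PDF with kappa = 16, nu = 49 (surface)
ocean_density = t_pdf(1.0, 4.1)      # t PDF with effective 4.1 degrees of freedom
```

The log-gamma form is used so that the normalizing constants stay stable for larger degrees of freedom; a statistics library with F and t distributions would serve equally well.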

[9] A likelihood function should be computed from a probability density function evaluated at the value of the test statistic. In previous work, the corresponding cumulative distribution function was used instead, and the likelihood was estimated as the probability of obtaining values greater than the test statistic, as is typically done in hypothesis testing. Incorporating the five changes presented above corrects this error and, as described in *Lewis* [2013], yields a likelihood function that is based on a probability density function. The changes, however, do not implement a switch from Δ*r*^{2} to *r*^{2} in the test statistic as proposed in *Lewis* [2013]. Making this change would represent a change in the noise model that is not necessarily justified. Similar to assumptions used in linear regression, shifting from *r*^{2} to Δ*r*^{2} implies that the noise model for the residual variability is appropriate in the vicinity of Δ*r*^{2} = 0, rather than assuming it is appropriate for all values of *r*^{2}.

[10] The differences between the likelihood functions discussed above, as implemented by each study for the surface diagnostic, indicate a stronger rejection of higher Δ*r*^{2} values using the *Lewis* [2013] likelihood function (Figure 1). Similar results hold for the likelihood functions of the upper-air and deep-ocean temperature diagnostics.

[11] We present a revised Figure 3 from *Libardoni and Forest* [2011] to show the impact of using the likelihood from *Lewis* [2013] (Figure 2). Marginal distributions for each parameter are presented using both the original method and the corrected method. Updated parameter distributions have not been included for the results from *Forest et al.* [2008] because the previous results were provided only for comparison purposes in *Libardoni and Forest* [2011] and were not rederived using the new methods in *Libardoni and Forest* [2011]. The corrected method leads to shifts in climate sensitivity posteriors due to the likelihood function change. However, the general shifts in the resulting cumulative distribution functions are small compared with the ranges due to other factors such as the observational data source [*Libardoni and Forest*, 2011], confirming the original results (Figure 3). In general, the distribution modes are more pronounced under the new likelihood function, a compensating narrowing of the distribution is present, and the lower bounds of the distributions show small increases.

[12] When testing the sensitivity of the distributions to individual changes in the likelihood function, it was found that changing from 1 minus the CDF of the *F* distribution to the PDFs of *t* and *F* distributions for the diagnostics led to a narrowing of the distributions. Increasing the degrees of freedom in the *t* and *F* distributions led to a broadening of the distributions and a shift towards lower climate sensitivity and aerosol forcing values. After all contributions from the individual changes are incorporated, the net impact is the narrowing of the marginal distributions observed when using the *Lewis* [2013] method (Figure 2).