Quantifying the irreducible uncertainty in near‐term climate projections

If the Paris agreement at the Conference of Parties 21 is implemented very effectively, greenhouse‐gas emissions might decrease after year 2020. Whether this would lead to identifiable near‐term responses in “iconic” climate quantities of wide scientific and public interest is unclear, because the climate response would be obscured by quasi‐random internal variability. I define the climate response as an increase or decrease in a linear climate trend over the period 2021–2035, compared to 2006–2020, and establish the probability of such a trend change being caused by an assumed policy shift toward emissions reductions after 2020. I quantify the irreducible uncertainty in projecting such a trend change through very large (100‐member) ensembles of the state‐of‐the‐art climate model MPI‐ESM‐LR. Trends in global‐mean surface temperature (GMST) are higher over the period 2021–2035 than over 2006–2020 in one‐third of all realizations in the mitigation scenario RCP2.6, interpreted as implementing the Paris agreement, compared to around one‐half in the no‐mitigation scenario RCP4.5. Mitigation is sufficient to cause a GMST trend reduction with a probability of 0.40 and necessary with a probability of 0.33. Trend increases in Arctic September sea‐ice area and the Atlantic meridional overturning circulation are caused by the emissions reductions with a probability of only around 0.1. By contrast, emissions reductions are necessary for a trend decrease in upper‐ocean heat content with a probability of over one‐half. Some iconic climate quantities might thus by year 2035 exhibit an identifiable response to a successful Paris agreement but sometimes with low probability, creating a substantial communication challenge.

| INTRODUCTION
I ask you to join me in a thought experiment: We jump into year 2035 and look back at the climate evolution over the 30 years prior, since 2006. The Paris agreement (Conference of Parties 21, 2015) at the Conference of Parties 21 (COP21) has been implemented very effectively, and greenhouse-gas emissions have fallen since 2020, along the scenario RCP2.6 (van Vuuren, Edmonds, et al., 2011; van Vuuren, Stehfest, et al., 2011). Can we see the effects of the policy change brought about by COP21 not only in the emissions but also in the climate response? The question is eminently policy-relevant because 15 years is a long enough period that policymakers and society would expect to see a result of their effort to curb emissions. But the question is also eminently difficult to answer because 15 years is a relatively short period in climate evolution, and trends even in an integrated quantity such as the global-mean surface temperature (GMST) are dominated by quasi-random internal variability over these timescales (Easterling & Wehner, 2009; Hartmann et al., 2013; Huber & Knutti, 2014; Liebmann, Dole, Jones, Blade, & Allured, 2010; Marotzke & Forster, 2015; Risbey et al., 2014). The climate response to a policy change toward emissions reductions, and hence to a slower increase in radiative forcing, is thus obscured and possibly overwhelmed by internal variability. Here I provide a new perspective on the resulting irreducible uncertainty in near-term climate projections (e.g., Hawkins, Smith, Gregory, & Stainforth, 2016), by analyzing very large (100-member) ensembles of state-of-the-art climate-model simulations with the help of a formal theory of event causation that was developed in computer science (Pearl, 2000) and recently introduced into climate science (Hannart, Pearl, Otto, Naveau, & Ghil, 2016).
I ask whether decadal-timescale climate events can be attributed to the policy change implied by the changeover from scenario RCP4.5 (interpreted in the near-term as a no-mitigation scenario) to scenario RCP2.6 (mitigation scenario, interpreted as implementing the Paris agreement). I restrict myself to the simple event definition asking whether a linear climate trend, for example, in GMST, over the period 2021-2035 is smaller than the linear trend over the period 2006-2020. Thus, I compare the 15-year periods after and before the maximum in emissions. For quantities for which we expect a long-term declining trend in a warming climate, I ask whether the trend over the period 2021-2035 is larger than the trend over the period 2006-2020. Before characterizing the methodology and results of my approach, I outline to what extent the question addressed here differs from previous approaches.
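The event definition can be made concrete with a short sketch. The helper names `ols_slope`, `window_trend`, and `trend_reduction`, as well as the synthetic series, are illustrative assumptions and not the analysis code archived for this study; the sketch simply fits an ordinary least-squares line to each 15-year window and compares the two slopes.

```python
def ols_slope(years, values):
    """Ordinary least-squares slope of values against years (units per year)."""
    n = len(years)
    mean_t = sum(years) / n
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(years, values))
    var = sum((t - mean_t) ** 2 for t in years)
    return cov / var


def window_trend(series, first_year, last_year):
    """Linear trend over [first_year, last_year]; series maps year -> value."""
    years = list(range(first_year, last_year + 1))
    return ols_slope(years, [series[y] for y in years])


def trend_reduction(series):
    """True if the 2021-2035 trend is smaller than the 2006-2020 trend."""
    return window_trend(series, 2021, 2035) < window_trend(series, 2006, 2020)


# Hypothetical GMST anomaly (K): 0.02 K/yr up to 2020, 0.01 K/yr afterwards.
gmst = {y: 0.02 * (y - 2006) for y in range(2006, 2021)}
gmst.update({y: gmst[2020] + 0.01 * (y - 2020) for y in range(2021, 2036)})

early = window_trend(gmst, 2006, 2020)
late = window_trend(gmst, 2021, 2035)
is_reduction = trend_reduction(gmst)
```

Slopes here are per year; multiplying by 10 gives trends per decade. For quantities with an expected long-term decline, the comparison in `trend_reduction` would be reversed to test for a trend increase.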

| CONTRAST TO PREVIOUS DETECTION AND ATTRIBUTION APPROACHES
The event attribution question I pose here differs markedly from event attribution performed previously. A rich literature now exists on the attribution of extreme climate and weather events to anthropogenic influences (e.g., Hannart et al., 2016; Herring, Hoerling, Kossin, Peterson, & Stott, 2015; Kirchmeier-Young, Zwiers, & Gillett, 2017; Schaller et al., 2016; Stott, Stone, & Allen, 2004). This attribution focuses on the long-term (multidecadal) anthropogenic change in the probability distribution of weather and seasonal events. By contrast, I consider events and possible causes that both occur on the decadal timescale, posing a more difficult signal-to-noise problem; hence the need for large ensembles.
Combining the theory of causation (Hannart et al., 2016; Pearl, 2000) with a large ensemble allows me to address attribution questions that, while also challenged by internal variability, are distinct from the Hasselmann-style detection and attribution (e.g., Bindoff et al., 2013; Hasselmann, 1979) as well as the time-of-emergence concept (e.g., Christensen et al., 2007; Baehr, Keller, & Marotzke, 2008, who used the concept although not the term; Hawkins & Sutton, 2012). The distinction between the methods is perhaps best illustrated through applications all aiming at detecting and attributing responses to future forcing changes, represented through different scenarios (e.g., Li, Thompson, Barnes, & Solomon, 2017; Sanderson, Oleson, Strand, Lehner, & O'Neill, 2018; Tebaldi & Friedlingstein, 2013) or through geoengineering (Bürger & Cubasch, 2015; Jackson et al., 2015; Lo, Charlton-Perez, Lott, & Highwood, 2016). The Hasselmann approach defines an expected fingerprint of change such that the change in a model realization of future climate is projected onto the fingerprint with an optimal signal-to-noise ratio (e.g., Bürger & Cubasch, 2015; Lo et al., 2016); the fingerprint can be quite abstract. Time of emergence asks by what time two differently forced realizations of a quantity of interest differ by more than the internal variability (e.g., Jackson et al., 2015; Li et al., 2017; Sanderson et al., 2018; Tebaldi & Friedlingstein, 2013); the answer might well be, "at some distant point in the future". By contrast, I am asking about the consequences of future policy and hence forcing changes, for a given quantity and at a given time. As the example of the recent surface-warming hiatus has shown (e.g., Flato et al., 2013), such a question does arise for quantities of large scientific and public interest, and it requires an answer beyond stating a time in the future. I show here that such an answer can indeed be given.
In principle, any large ensemble of model simulations could be used, including the CMIP5 multimodel ensemble (Taylor, Stouffer, & Meehl, 2012). In that case, however, any difference between realizations obtained with different models stems from a combination of internal variability and model differences; the latter reflect uncertainty that is in principle reducible (epistemic). Here, I prefer using a large ensemble with a single model, because any difference between realizations can unambiguously be attributed to internal variability. The uncertainty thus reflected is aleatoric; it cannot be reduced through increased knowledge, and it is crucial to quantify this irreducible uncertainty.
The Max Planck Institute Grand Ensemble (MPI-GE) was generated with the Max Planck Institute Earth System Model version 1.1 (MPI-ESM1.1), which in turn builds on the Earth system model submitted to CMIP5 in the LR configuration (Giorgetta et al., 2013). The 100 historical ensemble members are started from different times of the preindustrial control run and are driven by the CMIP5 historical forcing from 1850 to 2005. The final state of each historical ensemble member serves as the initial condition for one simulation each of scenarios RCP2.6 and RCP4.5 (van Vuuren, Edmonds, et al., 2011; van Vuuren, Stehfest, et al., 2011), out to year 2100. The MPI-GE differs from the other existing large ensembles with state-of-the-art models through the use of the mitigation scenario RCP2.6 but also through its unprecedented size, 100 realizations compared to 40 in Kay et al. (2015) and 50 in Sigmond and Fyfe (2016) and Kirchmeier-Young et al. (2017). The MPI-GE furthermore differs from the large ensemble of Kay et al. (2015) through the initialization procedure, by taking different states from the control run instead of perturbing the atmospheric state slightly. Initialization effects on model spread are analyzed in Box 1. Anomalies in GMST and ocean heat content in the upper 2,000 m (OHC$_{2000}$) are calculated with respect to the time average over the period 1861-1880, a period with minimal anthropogenic and volcanic forcing.
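The anomaly calculation described above amounts to subtracting a base-period mean from each realization. The following is a minimal sketch under stated assumptions: the function name `anomalies`, the year-to-value dictionary layout, and the synthetic temperatures are hypothetical and do not reflect the MPI-GE data format.

```python
def anomalies(series, base_start=1861, base_end=1880):
    """Subtract the base-period time mean from every value; series maps year -> value."""
    base = [series[y] for y in range(base_start, base_end + 1)]
    base_mean = sum(base) / len(base)
    return {year: value - base_mean for year, value in series.items()}


# Hypothetical absolute GMST (K) for one realization: flat, then warming from 1881.
absolute = {y: 286.5 + max(0, y - 1880) * 0.005 for y in range(1850, 2036)}
anom = anomalies(absolute)
```

Applying the same base period to every ensemble member keeps the realizations comparable with one another and with the observations.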

| Observations used for model evaluation
All observational products used here are freely available for download. Observations for GMST are from HadCRUT4 (updates from Morice, Kennedy, Rayner, & Jones, 2012), for the Atlantic meridional overturning circulation (AMOC) at 26°N from the RAPID-WATCH project (updates from Smeed et al., 2014), and for heat content they are updates from Levitus et al. (2012). For the sea-ice area (SIA), I follow Notz and Stroeve (2016) and use satellite retrievals given by the National Snow and Ice Data Center (NSIDC) sea-ice index (Fetterer, Knowles, Meier, & Savoie, 2002, updated 2015), adjusted by adding the mean difference between this record and the earlier SIA estimates based on the HadISST sea-ice record (UK Meteorological Office, 2006).

| The calculus of event causation
There are three different probabilities of event causation one can and should identify (Hannart et al., 2016; Pearl, 2000). The most straightforward and also the strongest diagnosis of causation states that the policy change from no-mitigation to mitigation, reflected here in the assumed change from RCP4.5 to RCP2.6, is both necessary and sufficient to cause a trend reduction. The probability $P_{\mathrm{NS}}$ of causation both necessary and sufficient is given by the difference between the frequencies of the event "trend reduction" in the two ensembles, $P_{\mathrm{RCP2.6}}$ and $P_{\mathrm{RCP4.5}}$ (Hannart et al., 2016; Pearl, 2000):

$$P_{\mathrm{NS}} = P_{\mathrm{RCP2.6}} - P_{\mathrm{RCP4.5}}, \tag{1}$$

assuming $P_{\mathrm{RCP2.6}} > P_{\mathrm{RCP4.5}}$. A rigorous and formal derivation of (1), based on Boolean algebra, can be found in Pearl (2000) (p. 293). A heuristic justification appears useful and goes as follows. Without emissions reduction (scenario RCP4.5), the probability of the event "trend reduction" is $P_{\mathrm{RCP4.5}}$. With emissions reduction (scenario RCP2.6), the probability increases by the joint probability of two occurrences: the event not occurring in RCP4.5 and the event occurring in RCP2.6. This joint probability is $P_{\mathrm{NS}}$, so we get $P_{\mathrm{RCP2.6}} = P_{\mathrm{RCP4.5}} + P_{\mathrm{NS}}$, and (1) follows by rearrangement.

BOX 1 INITIALIZATION EFFECTS ON MODEL SPREAD
Since each historical realization serves as the starting point for both RCP2.6 and RCP4.5 realizations, I can investigate the time it takes to reach ensemble-spread saturation for the various quantities considered here, by taking the difference between each pair of RCP4.5 and RCP2.6 realizations (see the figures). The surface quantities GMST and Arctic September sea-ice area (SIA) reach 90% of the year-2035 ensemble spread after 5 years, whereas the oceanic quantities Atlantic meridional overturning circulation (AMOC) and ocean heat content in the upper 2,000 m (OHC$_{2000}$) reach 90% after 10-15 years (bottom figure). The ensemble-mean difference between RCP4.5 and RCP2.6 stays below the value of 1 SD until year 2035, for all quantities considered here (top figure). Because of the large ensemble size, the SE of the ensemble mean is only one-tenth of the SD, and by 2035 the ensemble-mean difference exceeds the SE for all quantities except AMOC (not shown).
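The one-tenth relation between SE and SD quoted in Box 1 follows from SE = SD / sqrt(N) with N = 100 members. A minimal sketch, assuming hypothetical paired differences drawn from a Gaussian (the function name `ensemble_spread_and_se` and the numbers are illustrative, not MPI-GE output):

```python
import math
import random
import statistics


def ensemble_spread_and_se(values):
    """Return (SD across members, SE of the ensemble mean = SD / sqrt(N))."""
    sd = statistics.stdev(values)
    return sd, sd / math.sqrt(len(values))


# Hypothetical paired RCP4.5 minus RCP2.6 differences for 100 members (K).
rng = random.Random(0)
diffs = [rng.gauss(0.05, 0.15) for _ in range(100)]

sd, se = ensemble_spread_and_se(diffs)
ratio = se / sd  # 0.1 for 100 members
```

An ensemble-mean difference can therefore exceed the SE while remaining well inside the spread of individual realizations, which is the situation Box 1 describes.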
The second type of causation considers RCP4.5 as the "real" world ("factual" in the jargon of causation theory), in which we assume no trend reduction. We ask in advance (in year 2005) whether a policy change toward RCP2.6, an imagined or "counterfactual" world, would be sufficient to cause a reduction in GMST trend over 2021-2035 compared to 2006-2020. This probability of the policy change being sufficient to cause a trend reduction is given by (Hannart et al., 2016; Pearl, 2000):

$$P_{\mathrm{S}} = \frac{P_{\mathrm{NS}}}{1 - P_{\mathrm{RCP4.5}}} = \frac{P_{\mathrm{RCP2.6}} - P_{\mathrm{RCP4.5}}}{1 - P_{\mathrm{RCP4.5}}}. \tag{2}$$

The justification of (2) goes as follows. The joint probability $P_{\mathrm{NS}}$ can be thought of as the product of two probabilities: the first is the probability $P_{\mathrm{S}}$ of the event occurring under the counterfactual assumption of emissions reductions, given that the event does not occur in the factual world of no emissions reduction; the second is the probability $1 - P_{\mathrm{RCP4.5}}$ of nonoccurrence in the world of no emissions reduction. This is effectively a statement about conditional probability:

$$P_{\mathrm{NS}} = P\left(Y_X \mid \bar{X}, \bar{Y}\right) P\left(\bar{Y} \mid \bar{X}\right) = P_{\mathrm{S}} \left(1 - P_{\mathrm{RCP4.5}}\right), \tag{3}$$

where $X$ stands for event "emissions reduction," $Y$ for event "trend reduction," the overbar for nonoccurrence, $Y_X$ for trend reduction under emissions reduction, and the vertical bar for "conditioned on". Rearrangement of (3) and using (1) leads to (2).

The third type of causation returns to the thought experiment at the opening of this study. Now RCP2.6 is the factual world and RCP4.5 the counterfactual one, and in retrospect (in 2035) we diagnose that the GMST trend over 2021-2035 was indeed smaller than over 2006-2020. The probability that the policy change to RCP2.6 was necessary to cause the observed trend reduction is then given by (Hannart et al., 2016; Pearl, 2000):

$$P_{\mathrm{N}} = \frac{P_{\mathrm{NS}}}{P_{\mathrm{RCP2.6}}} = 1 - \frac{P_{\mathrm{RCP4.5}}}{P_{\mathrm{RCP2.6}}}. \tag{4}$$

The expression for $P_{\mathrm{N}}$ in (4) is the same as the fraction of attributable risk that is widely used in the attribution of extreme events to anthropogenic influence, where the factual world is the one with anthropogenic forcing and the counterfactual world the one without (Hannart et al., 2016; Kirchmeier-Young et al., 2017; Stott et al., 2004).
The justification of (4) proceeds as a variant to that of (2). $P_{\mathrm{NS}}$ can also be thought of as the product of two other probabilities: the first is the probability $P_{\mathrm{N}}$ of the event not occurring under the counterfactual assumption of no emissions reductions, given that the event did occur in the factual world of emissions reductions; the second is the probability $P_{\mathrm{RCP2.6}}$ of occurrence in the world of emissions reductions. Hence we have:

$$P_{\mathrm{NS}} = P_{\mathrm{N}} \, P_{\mathrm{RCP2.6}}. \tag{5}$$

Rearrangement of (5) and using (1) leads to (4). Comparison of (1), (2), and (4) shows that $P_{\mathrm{NS}}$ is always smaller than both $P_{\mathrm{S}}$ and $P_{\mathrm{N}}$; $P_{\mathrm{S}}$ is smaller than $P_{\mathrm{N}}$ if $1 - P_{\mathrm{RCP4.5}} > P_{\mathrm{RCP2.6}}$.
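The three probabilities can be computed directly from the two event frequencies. The sketch below is an illustration under stated assumptions: the function name `causation_probabilities` is hypothetical, and the forms used for (2) and (4) are the ratios implied by the product arguments above, namely $P_{\mathrm{NS}}$ divided by $1 - P_{\mathrm{RCP4.5}}$ and by $P_{\mathrm{RCP2.6}}$, respectively. The input values are the GMST trend-reduction frequencies reported in the results section.

```python
def causation_probabilities(p_mitigation, p_no_mitigation):
    """Return (P_NS, P_S, P_N) from the event frequencies in the two ensembles.

    Assumes p_mitigation > p_no_mitigation, so that P_NS is positive.
    """
    if not p_mitigation > p_no_mitigation:
        raise ValueError("expected the event to be more frequent under mitigation")
    p_ns = p_mitigation - p_no_mitigation   # Equation (1)
    p_s = p_ns / (1.0 - p_no_mitigation)    # Equation (2)
    p_n = p_ns / p_mitigation               # Equation (4)
    return p_ns, p_s, p_n


# GMST trend-reduction frequencies: 67% in RCP2.6 and 45% in RCP4.5.
p_ns, p_s, p_n = causation_probabilities(0.67, 0.45)
print(round(p_ns, 2), round(p_s, 2), round(p_n, 2))
```

Because $1 - 0.45 = 0.55$ is smaller than $0.67$, the sufficient-causation probability exceeds the necessary-causation probability in this case, in line with the ordering condition stated above.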

| PROBABILITY OF MITIGATION CAUSING TREND CHANGES
I consider four climate quantities that, through their expected changes in a warming world, all have taken on iconic status: GMST, the Arctic September SIA, the AMOC, and the ocean heat content of the upper 2,000 m, a major contributor to sea-level rise. Observations exist for all four quantities to allow some assessment of recent simulated trends. And the MPI-GE ensemble size means that the probability of an event occurring within the ensemble at any given time can be characterized with a resolution of 1%; "ensemble resolution" matters because I compare two probabilities that arise under two different forcing scenarios but that might differ by only a little in a noisy system.
The well-known challenge of discriminating between the climate response to two different forcing scenarios in the near term (Kirtman et al., 2013; Tebaldi & Friedlingstein, 2013) is illustrated by the near-indistinguishable ensemble representations of GMST over the period 2005-2035, for RCP2.6 and RCP4.5 (Figure 1a,b). It is only after around year 2030 that the ensemble means and the upper ends of the ensemble spread begin to deviate from each other. But for any individual realization, containing the full complement of internal variability, one would be hard pressed to classify it as belonging to one ensemble or the other, and a statistical analysis is required.
The distributions of the 15-year trends over the period 2006-2020 are nearly indistinguishable between the two scenarios (Figure 1c,f; 5-95% ranges are 0.01-0.34 and 0.00-0.30 K/decade for RCP2.6 and RCP4.5, respectively). The distributions deviate somewhat from each other for the period 2021-2035 (Figure 1d,g; 5-95% ranges are −0.02 to 0.27 and 0.03-0.35 K/decade for RCP2.6 and RCP4.5, respectively), and the difference becomes clearer still if we look at the frequency of the decadal event as defined above: the GMST trend over 2021-2035 is smaller than the trend over 2006-2020 in $P_{\mathrm{RCP2.6}}$ = 67% of the realizations in RCP2.6, whereas it is smaller in only $P_{\mathrm{RCP4.5}}$ = 45% of the realizations in RCP4.5. With a probability of about one-third, the warming rate will increase in RCP2.6 despite the emissions reductions.
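The two ensemble statistics used in this paragraph, the 5-95% range of trends and the frequency of the event "later trend smaller than earlier trend," can be sketched as follows. The function names, the linear-interpolation percentile, and the Gaussian trend samples are assumptions made for illustration; they are not the MPI-GE data or the archived analysis code.

```python
import random


def event_frequency(early_trends, late_trends):
    """Fraction of realizations whose later trend is smaller than the earlier one."""
    hits = sum(late < early for early, late in zip(early_trends, late_trends))
    return hits / len(early_trends)


def percentile(values, q):
    """Percentile q (0-100) with linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Hypothetical 100-member trend samples (K/decade); not MPI-GE output.
rng = random.Random(1)
early = [rng.gauss(0.17, 0.10) for _ in range(100)]
late = [rng.gauss(0.12, 0.10) for _ in range(100)]

freq = event_frequency(early, late)
range_5_95 = (percentile(late, 5), percentile(late, 95))
```

With 100 members, `freq` can only take values that are multiples of 0.01, which is the 1% ensemble resolution referred to above.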
FIGURE 1 Near-term global-mean surface temperature (GMST) in Max Planck Institute Grand Ensemble (MPI-GE). I ask whether the post-2020 greenhouse-gas emissions reductions in scenario RCP2.6 cause a GMST trend reduction, as would be expected theoretically for the forced response (Gregory & Forster, 2008; Marotzke & Forster, 2015). (a) GMST time series for each realization, scenario RCP2.6. (b) As (a) but for scenario RCP4.5. The thick blue, red, and green lines show, respectively, the RCP2.6 and RCP4.5 ensemble means and the observations. (c-e) Frequency distribution of linear-trend sizes of GMST across the RCP2.6 ensemble over the periods (c) 2006-2020, (d) 2021-2035, and (e) for the trend difference between these two periods. (f-h) As (c-e) but for scenario RCP4.5. Bin size is 0.025 K/decade.

The difference in the event frequency between the two scenarios can be turned into precise statements about the probability of the event being caused by the policy change implied by the difference between RCP4.5 and RCP2.6, which is the answer to the attribution question in a noisy climate (Hannart et al., 2016; Pearl, 2000). When we consider RCP4.5 as the factual world in which we assume no trend reduction and ask in advance (in year 2005) whether a policy change toward RCP2.6, the counterfactual world, would be sufficient to cause a reduction in GMST trend over 2021-2035 compared to 2006-2020, we obtain for this probability of the policy change being sufficient to cause a trend reduction, according to (2):

$$P_{\mathrm{S}} = \frac{0.67 - 0.45}{1 - 0.45} = 0.40.$$

Now we ask the converse question, close to the thought experiment at the opening of this study. Now RCP2.6 is the factual world and RCP4.5 the counterfactual one, and in retrospect (in 2035) we diagnose that the GMST trend over 2021-2035 was indeed smaller than over 2006-2020. The probability that the policy change to RCP2.6 was necessary to cause the observed trend reduction is then according to (4):

$$P_{\mathrm{N}} = \frac{0.67 - 0.45}{0.67} = 0.33.$$

The strongest diagnosis of causation, in which the policy change is both necessary and sufficient, is according to (1) characterized by:

$$P_{\mathrm{NS}} = 0.67 - 0.45 = 0.22.$$

The probabilities $P_{\mathrm{S}}$ = 0.40, $P_{\mathrm{N}}$ = 0.33, and $P_{\mathrm{NS}}$ = 0.22 of near-term GMST trend reduction being caused by emissions reduction are nonnegligible. But this causation is far from certain, and in particular the perhaps desired outcome, that a policy measure is needed and will suffice to bring the desired effect, occurs with only a small probability $P_{\mathrm{NS}}$. We are thus faced with substantial irreducible uncertainty about the GMST response to possible greenhouse-gas emissions reductions after year 2020.
Causation of a trend change is even less probable for two other iconic climate quantities, the Arctic September SIA (Notz, 2015; Notz & Marotzke, 2012; Notz, Haumann, Haak, Jungclaus, & Marotzke, 2013; Stroeve et al., 2012; Swart, Fyfe, Hawkins, Kay, & Jahn, 2015) and the AMOC (Smeed et al., 2014) at 26°N. The ensemble means in RCP2.6 show declines over the period 2006-2035 of about $10^6$ km$^2$ and 2 Sv (Sverdrup, 1 Sv ≡ $10^6$ m$^3$/s), respectively (Figure 2), and the RCP4.5 ensembles differ very little from their respective RCP2.6 counterparts (not shown). Comparing the periods 2021-2035 and 2006-2020, a trend increase occurs in just under one-half of the realizations in RCP4.5 and in slightly above one-half of the realizations in RCP2.6 for both quantities (Table 1). This translates into low probabilities (around 0.15) of sufficient causation and necessary causation for both quantities and very low probabilities (below 0.10) of causation both necessary and sufficient (Table 1). Owing to the substantial internal variability in SIA and AMOC, we do not expect a visible response of either quantity to emissions reductions in the near term.
The final quantity I consider here is the change in OHC$_{2000}$, a major contributor to global sea-level change (e.g., Church et al., 2013) and a quantity that is expected to be less affected by internal variability; for example, OHC$_{2000}$ kept increasing unabated during the surface-warming hiatus of the early 21st century (Flato et al., 2013), which in turn contained a major contribution from internal variability (Flato et al., 2013; Fyfe et al., 2016; Hedemann et al., 2017; Huber & Knutti, 2014; Marotzke & Forster, 2015; Medhaug, Stolpe, Fischer, & Knutti, 2017; Risbey et al., 2014). The expectation of a weaker influence

The probabilities that the policy change from RCP4.5 to RCP2.6 causes a reduction in the OHC$_{2000}$ trend come out as follows. Viewed backward from year 2035, in a factual world of RCP2.6 and a trend reduction, we would conclude that with a probability $P_{\mathrm{N}}$ = 0.53 the policy change was necessary to cause the reduction. But in advance, viewed from year 2005 in a factual world of RCP4.5 and assuming no trend reduction, we assess that the policy change would be sufficient with a probability of only $P_{\mathrm{S}}$ = 0.20. And the probability of policy change being both necessary and sufficient for causing trend reduction lies at an even lower 0.17 (Table 1).

TABLE 1 Probability that the shift from RCP4.5 to RCP2.6 causes the respective trend reduction or increase, in a sufficient causation sense (Column 4), in a necessary causation sense (Column 5), and in the sense of being both necessary and sufficient (Column 6). The last row gives the probabilities of GMST trend reduction for the period 2036-2050, compared to the period 2006-2020.

The methodology I have presented can be applied to a wide range of quantities and time windows. To offer a glimpse of the possibilities, Table 1 additionally lists the results for GMST trends over a 15-year period near mid-century (2036-2050), compared to the period 2006-2020. We note a substantially increased probability of trend reduction for this later period in scenario RCP2.6 and, as one consequence, a higher than two-in-three probability that the policy change from RCP4.5 to RCP2.6 is sufficient to cause 15-year GMST trend reduction by mid-century.
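As a consistency check on the upper-ocean heat content probabilities quoted in this section (necessary 0.53, sufficient 0.20, both 0.17), the relations (1), (2), and (4) can be inverted to recover the event frequencies they imply. This is a sketch only: the function name `implied_frequencies` is hypothetical, the inputs are rounded to two decimals, and the recovered frequencies are therefore approximate and are not values taken from Table 1.

```python
def implied_frequencies(p_ns, p_s, p_n):
    """Invert Equations (2) and (4): return the implied (P_RCP2.6, P_RCP4.5)."""
    p_mitigation = p_ns / p_n             # from P_N = P_NS / P_RCP2.6
    p_no_mitigation = 1.0 - p_ns / p_s    # from P_S = P_NS / (1 - P_RCP4.5)
    return p_mitigation, p_no_mitigation


# Rounded heat-content values quoted in the text: P_NS = 0.17, P_S = 0.20, P_N = 0.53.
p26, p45 = implied_frequencies(0.17, 0.20, 0.53)
residual = (p26 - p45) - 0.17  # Equation (1) should hold up to rounding
```

The implied frequencies of a trend reduction are roughly 0.32 under RCP2.6 and 0.15 under RCP4.5, and their difference reproduces the quoted 0.17 to within rounding.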

| CONCLUSION
I have here provided a proof of concept for attributing decadal-timescale changes in crucial climate quantities to changes in climate forcing. An essential component of this framework is the quantification of the irreducible uncertainty in near-term climate projections that arises from internal variability; this quantification can here be performed rigorously, albeit within the confines of a single model.
My attribution results are contingent upon the climate model simulating the correct ratio of forced response to internal variability. The model appears to do a reasonable job concerning GMST (Flato et al., 2013) but underestimates the forced response in Arctic sea-ice retreat (Notz & Stroeve, 2016). Model realism is harder to assess for AMOC and OHC$_{2000}$ because of poorer observational coverage, but I note that the simulation ensemble encompasses the observed trends considered here (Figure 3 for OHC, not shown for AMOC). Nonetheless, a more in-depth evaluation of simulated internal variability is required, preferably through a multimodel meta-ensemble of large ensembles. Furthermore, my results will change, perhaps substantially, if a major volcanic eruption occurs in the near-term future or if solar variability differs from that in the scenarios.
My thought experiment demonstrates that it is crucial to have realistic expectations of the efficacy of climate policy in the near term: Even if greenhouse-gas emissions begin to decline after year 2020, the probability is substantial that the response of iconic climate quantities to this decline will not have emerged by year 2035. According to the model used here, GMST will rise at a faster rate with a probability of as much as one-third. Should this occur, we might face the "hiatus" debate in reverse: the question will be asked why temperature rises faster despite falling emissions.
The major advance brought about by my analysis lies in the ability to quantify the degree of irreducible uncertainty about whether the assumed emissions reduction will cause the desired climate response over a given timescale. The probability of this response occurring depends on the quantity in question but also on the type of causation; for the time horizon out to 2035 the probability lies here in the range between a bit under 0.1 for causation both sufficient and necessary for SIA and AMOC and a bit above one-half for necessary causation for ocean heat content.
Communicating these probabilities will be nontrivial but will be aided by the precise definitions and meanings underlying them (Hannart et al., 2016; Pearl, 2000). The communication challenge (Deser et al., 2012) furthermore supports the notion that the recent hiatus was not a distraction to the scientific community (Lewandowsky, Risbey, & Oreskes, 2016) but instead provided an opportunity to communicate the role of internal variability to an audience that might otherwise be disinclined to engage in this discourse.

Code availability: The model version MPI-ESM1.1 used to generate the MPI-GE is available at http://www.mpimet.mpg.de/en/science/models/mpi-esm.html. Computer code used in data analysis has been archived by the Max Planck Society for the Advancement of Science under http://hdl.handle.net/21.11116/0000-0002-5A51-E.

RELATED WIREs ARTICLES
Prospects for decadal climate prediction
Use of models in detection and attribution of climate change
Ensemble modeling, uncertainty and robust predictions
Modelling interdecadal climate variability and the role of the ocean