Toward Calibrated Language for Effectively Communicating the Results of Extreme Event Attribution Studies

Extreme event attribution studies attempt to quantify the role of human influences in observed weather and climate extremes. These studies are of broad scientific and public interest, although quantitative results (e.g., that a specific event was made a specific number of times more likely because of anthropogenic forcings) can be difficult to communicate accurately to a variety of audiences and difficult for audiences to interpret. Here, we focus on how results of these studies can be effectively communicated using standardized language and propose, for the first time, a set of calibrated terms to describe event attribution results. Using these terms and an accompanying visual guide, results are presented in terms of likelihood of event changes and the associated uncertainties. This standardized language will allow clearer communication and interpretation of probabilities by the public and stakeholders.


Introduction
In the years since the first extreme event attribution (EEA) studies provided a quantification of the role of anthropogenic influence in the occurrence of a specific extreme (Stott et al., 2004), many such attribution studies have been conducted (see Herring et al., 2014, 2015, 2016). Attribution studies collectively seek to provide scientific information of interest to the public and media about the extremes that people experience (Hassol et al., 2016), contribute to climate adaptation efforts from a quantitative, rather than descriptive, basis (Hulme, 2015), provide early warnings of extremes (Stott et al., 2013), and inform possible future litigation efforts for climate damages (Allen, 2003; Otto et al., 2017). The findings of EEA analyses are often communicated in the media and are one of the areas of climate science that members of the public will read or hear about the most. As such, in this commentary, we focus on how EEA results can be effectively communicated to nonexperts using a standardized language, to maximize the benefit for public interpretation and for informing adaptation policy.
In the time since Stott et al.'s (2004) study, there has been considerable evolution in EEA studies (National Academies of Sciences, Engineering, and Medicine, 2016). This evolution spans the number of studies, their scope and applied methodologies, and the certainty of results. For EEA studies that have quantified the human influence in temperature extremes using an attribution framework, there is a general trend toward a greater degree of confidence in the results of these studies. In specific cases, this trend includes increasingly large anthropogenic influences determined for events, as measured through risk ratio (RR) or fraction of attributable risk (FAR) values. For example, the first quantitative examination of the 2003 record hot European summer found that the risk of such an event was doubled (RR of 2) due to climate change (Stott et al., 2004), while a study revisiting this event found a dramatically increased risk of extreme summers of the 2003 magnitude occurring in the decade since (Christidis et al., 2015). For some temperature extremes, studies (Imada et al., 2018; Knutson et al., 2018; Lewis & Karoly, 2014b; Perkins-Kirkpatrick et al., 2019; Walsh et al., 2018) have identified events that, in the model frameworks used, do not occur in simulations run without anthropogenic influences (e.g., greenhouse gases), giving infinite RRs or FAR values of one. This saturation of RR values for certain temperature extremes has been explored in depth elsewhere (Harrington et al., 2018).
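The two measures named above carry the same information. As a minimal sketch, assuming the standard definitions RR = P1/P0 and FAR = 1 - P0/P1, where P1 and P0 are the event probabilities with and without anthropogenic forcings (the symbols and function names are illustrative and are not taken from the studies cited), the conversion could be written as:

```python
import math


def risk_ratio(p_factual: float, p_counterfactual: float) -> float:
    """Return RR = P1 / P0 for event probabilities with (P1) and without (P0) human influence."""
    if p_counterfactual == 0.0:
        if p_factual == 0.0:
            raise ValueError("RR is undefined when the event occurs in neither set of simulations")
        return math.inf  # the event never occurs without anthropogenic forcings
    return p_factual / p_counterfactual


def far_from_rr(rr: float) -> float:
    """Return FAR = 1 - 1/RR, the fraction of attributable risk."""
    if rr < 0.0 or math.isnan(rr):
        raise ValueError("RR must be a non-negative number")
    if rr == 0.0:
        return -math.inf  # the event has become impossible with anthropogenic forcings
    return 1.0 - 1.0 / rr
```

Under these definitions, a doubling of risk (RR of 2) corresponds to a FAR of 0.5, and an infinite RR corresponds to a FAR of one, which is the saturation case described above.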
The maturation of the EEA research field, together with rapid increases in the attributable risks of some extreme events, presents new opportunities for effectively communicating scientific results. During an extreme weather or climate event, or in the aftermath of an event's impacts, scientists are often asked, "What caused this? Was this climate change?" (Hassol et al., 2016). As noted by Hassol et al. (2016), a scientific answer free from uncertainties, caveats, and equivocation is typically most effective for clearly communicating the state of scientific understanding of the human influence on the extreme being investigated.
In social or mainstream media, EEA study results have often been reduced to a single numerical value for RR or FAR, such that a hypothetical heatwave was, for example, 50 times more likely because of climate change. However, in our scientific experience, the focus on a single RR or FAR number confounds rather than clarifies EEA results and implications. We highlight one such example in which a rapid coupled climate model-based analysis was conducted of the anomalously hot sea surface temperatures (SSTs) in the Coral Sea in 2016 that coincided with significant bleaching of the Great Barrier Reef (GBR). This analysis estimated that there was at least a 175-fold increase in the likelihood of such hot conditions occurring during March due to anthropogenic greenhouse gases. This "175 times" result was predominantly the headline associated with this analysis, and key uncertainties, caveats around the possible model dependence of such results, and the complex links between such SST conditions and their impacts on the reef ecosystem were largely overlooked in the communication of this result. We argue that the publication, communication, and dissemination of a single, specific FAR or RR value for extreme events, even where explicit uncertainty estimation has been undertaken, does not enhance the ability of EEA studies to provide scientific information to the public or policymakers about climate change, risk, and adaptation. A finding that the SST anomalies of the Coral Sea in 2016 were 100, rather than 175, times more likely due to anthropogenic climate change is unlikely to prompt differing responses in terms of informing adaptation or litigation efforts or public understandings of climate change.
Our experience of the limitations of single-value results for public communication is reflected in broader research demonstrating the added benefits of text together with numerical values to convey likelihood and uncertainty to general audiences (e.g., Budescu et al., 2012, 2014). In addition, a comprehensive treatment of EEA language has been identified as an important issue for event attribution (National Academies of Sciences, Engineering, and Medicine, 2016).

Calibrated Language for EEA
Here we propose a set of calibrated terms to describe EEA results in a standardized manner. Just as a common framework with associated calibrated language is used to discuss uncertainty in Intergovernmental Panel on Climate Change (IPCC) reports and is used to characterize findings of the assessment process in a standardized manner (Mastrandrea et al., 2011), we describe a language framework to accompany the numerical results of EEA studies. This framework draws primarily on the language of IPCC assessments. The approach outlined below is layered, with differing levels of information available for provision to different audiences.

Likelihood Scale
First, FAR and the equivalent RR values determined for specific extreme weather and climate events are categorized into bands of likelihood and given accompanying descriptive terms based on the degree of determined anthropogenic influence on the event (Table 1). We present seven Likelihood Categories of anthropogenic influence and use the following descriptions: "virtually certain the event would not have happened without climate change," "the event was very much more likely due to climate change," "the event was more likely due to climate change," "climate change did not alter the likelihood of the event," "the event was less likely due to climate change," "the event was very much less likely due to climate change," and "the event was exceptionally less likely due to climate change." These descriptions are associated with FAR or RR values, and we recommend that "very likely" (10th percentile) values be used, as discussed more fully in section 2.2.
These text descriptions of associated EEA likelihood values can also be accompanied by a visual communication tool. Graphics can be used with numbers and text to summarize probability data concisely in many contexts, including climate change and projections (Spiegelhalter et al., 2011). The graphical approach presented here is based on El Niño-Southern Oscillation outlook dials, such as the one employed by the Australian Bureau of Meteorology, that use several categories designated as "Watch," "Alert," and "Event" for El Niño and La Niña episodes (Gamble et al., 2017). In Figure 1, we apply a graphical approach to EEA text descriptions and present such a visual dial for the example of anthropogenic influence on the anomalously high Coral Sea SST anomalies observed in March 2016 during catastrophic GBR bleaching episodes.
Earth's Future, 10.1029/2019EF001273
Using this dial, it can be readily communicated that it was virtually certain that the extreme event would not have occurred without anthropogenic greenhouse gas influences in the model frameworks used. While a quantitative analysis is undertaken and described in detail in published results, a precise numerical value is not necessary to communicate likelihoods to stakeholders and the public. There is a strong precedent for this approach in the Australian Fire Danger Ratings, which use a numerical Fire Danger Index as the basis for a category-based rating system (low to catastrophic danger). Each rating is accompanied by a description of individual preparation actions required, such as activating a Bushfire Survival Plan.
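The banding step described in this section amounts to a lookup from a lower-bound RR to one of the seven descriptions. The sketch below is illustrative only: the cut points in `PLACEHOLDER_CUTS` are hypothetical values chosen for demonstration and are not the Table 1 values, and the function and constant names are assumptions of this example.

```python
import bisect
import math
from typing import Sequence

# The seven descriptions proposed in the text, ordered from the lowest to the highest RR band.
CATEGORIES = (
    "the event was exceptionally less likely due to climate change",
    "the event was very much less likely due to climate change",
    "the event was less likely due to climate change",
    "climate change did not alter the likelihood of the event",
    "the event was more likely due to climate change",
    "the event was very much more likely due to climate change",
    "virtually certain the event would not have happened without climate change",
)

# HYPOTHETICAL cut points, for illustration only. They are NOT the Table 1 values.
PLACEHOLDER_CUTS = (0.01, 0.1, 0.9, 1.1, 10.0, 100.0)


def likelihood_category(rr_lower: float, cuts: Sequence[float] = PLACEHOLDER_CUTS) -> str:
    """Map a lower-bound RR (e.g., the 10th percentile) to one of the seven descriptions."""
    if len(cuts) != len(CATEGORIES) - 1 or list(cuts) != sorted(set(cuts)):
        raise ValueError("cuts must be six strictly increasing RR values")
    if math.isnan(rr_lower) or rr_lower < 0.0:
        raise ValueError("rr_lower must be a non-negative number")
    # bisect_right places an RR that equals a cut point in the band above it.
    return CATEGORIES[bisect.bisect_right(cuts, rr_lower)]
```

Passing the cut points as an argument keeps the category wording fixed while allowing the numerical bands to be replaced by the published ones.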

Communicating Uncertainty
While we earlier discussed FAR and RR as single values determined through quantitative model and/or observational analysis, including in the GBR bleaching example, single values are in fact rarely presented in scientific studies. Rather, studies generally provide confidence intervals around determined FAR or RR values. Various statistical approaches to estimating uncertainties and outcomes have been published. A commonly used approach is to apply bootstrap resampling as a statistical tool for estimating the uncertainty in RR or FAR estimates and ultimately provide a confidence interval (see Lewis & Karoly, 2013). Although the communication of multiple FAR or RR estimates conveys important information about uncertainty in EEA results, we argue that the communication of the Likelihood Category (again see Table 1) for an event based on a lower confidence bound (e.g., 10th percentile value associated with 90% confidence interval) is most useful. We also note that this would benefit overall clarity in EEA communication by allowing the terms "likely" and "very likely" to be reserved for the associated change in probability of an event occurring, rather than for the assessment of statistical error.
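To make the lower-bound idea concrete, the following is a minimal sketch of a percentile bootstrap for RR. It assumes two ensembles of simulated values (with and without anthropogenic forcings) and an event threshold; the function names, defaults, and the nearest-rank percentile rule are choices made for this illustration and are not the procedure of any particular study cited here.

```python
import math
import random
from typing import Sequence


def exceedance_probability(sample: Sequence[float], threshold: float) -> float:
    """Fraction of simulated values that exceed the event threshold."""
    return sum(value > threshold for value in sample) / len(sample)


def bootstrap_rr_percentile(
    factual: Sequence[float],
    counterfactual: Sequence[float],
    threshold: float,
    percentile: float = 10.0,
    n_boot: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate a percentile of the bootstrap distribution of RR = P1 / P0."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ratios = []
    for _ in range(n_boot):
        # Resample each ensemble with replacement, keeping its original size.
        p1 = exceedance_probability(rng.choices(factual, k=len(factual)), threshold)
        p0 = exceedance_probability(rng.choices(counterfactual, k=len(counterfactual)), threshold)
        if p0 > 0.0:
            ratios.append(p1 / p0)
        elif p1 > 0.0:
            ratios.append(math.inf)  # event absent from the resampled counterfactual ensemble
        # Replicates with no exceedance in either sample leave RR undefined and are skipped.
    if not ratios:
        raise ValueError("no bootstrap replicate contained an exceedance")
    ratios.sort()
    # Nearest-rank percentile: smallest value with at least `percentile` percent of replicates at or below it.
    rank = max(1, math.ceil(percentile / 100.0 * len(ratios)))
    return ratios[rank - 1]
```

With the default `percentile=10.0`, the returned value plays the role of the lower confidence bound from which a Likelihood Category would be assigned.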
Effective communication of overall EEA results requires a layered approach that is audience centered. For an extreme event, simple categorized attribution statements or visual guides (e.g., Figure 1) are provided for general audiences using the Likelihood Scale determined from lower confidence bound RR or FAR values. For more technically adept audiences, such as in accompanying technical reports or peer-reviewed articles, comprehensive analysis using more nuanced language around attributable signals and the assessment of associated uncertainties can be communicated.

Confidence Assessment
Several comprehensive studies of the robustness of EEA results have demonstrated that for some event types, RR/FAR values are sensitive to experiment design, model frameworks, and event definitions (e.g., Angélil et al., 2017; Harrington, 2017; Lewis & Karoly, 2014a). This means that the robustness of individual EEA findings can be variable. To account for this spectrum of evidence from EEA studies, we propose that assessments of confidence also employ calibrated language, which we provide in Table 2.
In IPCC documents, confidence assessments are qualitatively made and depend on the type, amount, quality, and the consistency of available evidence (Mastrandrea et al., 2011). We present a layered approach to confidence assessments, which includes an overall confidence scale ("high confidence," "medium confidence," and "low confidence"), determined by the type, amount, and quality of evidence ("robust," "medium," or "limited") and the degree of evidence agreement ("high," "medium," or "low").
We argue that the greatest confidence in EEA results (high confidence) occurs when multiple independent studies obtain the same Likelihood Category, or where multiple independent and credible models obtain the same Likelihood Category. For example, two studies determined that the record hot year of 2014 in Australia was virtually impossible without anthropogenic forcings (Knutson et al., 2014; Lewis & Karoly, 2014b), giving high confidence in a "virtually certain anthropogenic" impact on this event. These studies, however, both used Coupled Model Intercomparison Project Phase 5 (Taylor et al., 2012) models; hence, the model dependence of this result was not explicitly explored, and further confidence would result from a large number of analyses primarily based on different model frameworks. Overall, we have the highest confidence in the Likelihood Category of events that are explored using credible multimodel frameworks.
Multimethod approaches have been widely applied to understanding the factors contributing to extreme events in near real-time through the World Weather Attribution project, and attempts to synthesize results from multiple approaches have been made (World Weather Attribution, 2018). However, these multimethod collaborative approaches have not been accompanied by clearly defined, simple language around confidence. The assessment of confidence in the attribution of extreme events is an ongoing exercise, and high confidence in EEA results likely emerges over a period of time. As with the likelihood scale and uncertainty communication, various levels of complexity of confidence assessments can be communicated to different audiences, including, for example, an overall rating of low confidence, or specific information about multiple independent but conflicting studies ("robust" evidence but "low" agreement).

Table 2
Extreme Event Attribution Confidence Assessment

Text description | Evidence indicators | Agreement indicators
High confidence | Multiple independent studies and multiple independent, credible models used ("robust") | Agreement in likelihood scale ("high")
Medium confidence | More than one study and/or independent, credible model used ("medium") | Agreement in likelihood scale ("high or medium")
Low confidence | Single study ("limited") | Disagreement in likelihood scale ("low")
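The layered confidence assessment can be read as a lookup from an evidence rating and an agreement rating to an overall confidence term. The sketch below encodes only the combinations stated in Table 2 and in the two worked examples (Coral Sea and Hurricane Harvey) in the Applications section; the names `assess_confidence` and `_CONFIDENCE` are hypothetical, and returning `None` for any other combination is a choice of this example rather than part of the proposed framework.

```python
from typing import Optional

EVIDENCE = ("robust", "medium", "limited")
AGREEMENT = ("high", "medium", "low")

# Only combinations that the text itself specifies are listed here.
_CONFIDENCE = {
    ("robust", "high"): "high confidence",      # Table 2
    ("medium", "high"): "medium confidence",    # Table 2
    ("medium", "medium"): "medium confidence",  # Table 2
    ("limited", "low"): "low confidence",       # Table 2
    ("limited", "high"): "medium confidence",   # Coral Sea worked example
    ("robust", "medium"): "medium confidence",  # Hurricane Harvey worked example
}


def assess_confidence(evidence: str, agreement: str) -> Optional[str]:
    """Return the overall confidence term, or None if the text does not specify one."""
    if evidence not in EVIDENCE:
        raise ValueError(f"evidence must be one of {EVIDENCE}")
    if agreement not in AGREEMENT:
        raise ValueError(f"agreement must be one of {AGREEMENT}")
    return _CONFIDENCE.get((evidence, agreement))
```

Leaving unspecified combinations unresolved reflects that such cases call for expert judgment rather than a mechanical rule.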

Applications of Calibrated Language
We note that there are some important considerations and difficulties in providing information from EEA studies in terms of calibrated language. First, it is well documented that the definition of an extreme event is a critical facet of the EEA result. Numerous studies have demonstrated that FAR/RR results depend on the spatiotemporal scale of the event, in addition to the specific metric being explored (as discussed in Angélil et al., 2016; Cattiaux & Ribes, 2018; Harrington, 2017; Uhe et al., 2016). Furthermore, the results of EEA studies can depend on whether attributable changes in the likelihood or in the magnitude of events are examined, as demonstrated by varying analyses of the 2010 Russian heatwave (see details in Otto et al., 2012). The diversity of approaches and definitions in EEA studies, given the potential sensitivity of results to event definitions, may make coherent assessments of individual events difficult. Nonetheless, in some cases, further analysis has reconciled seemingly discordant or differing EEA results (such as around the 2010 Russian heatwave).
In addition, different EEA studies have used different techniques for presenting likelihood statements (e.g., presenting best estimate, likely, or very likely bounds) for events. For example, separate studies of Hurricane Harvey in 2017 focused on differing spatiotemporal and meteorological event definitions and methods. Risser and Wehner (2017) focused their observation-based study on the most affected areas of Houston, where the chances of the observed precipitation accumulations increased by a factor of at least 3.5 due to anthropogenic greenhouse gases (likely lower bound). A second study focused on the wider industry and infrastructure-relevant Gulf Coast region, concluding that global warming made the precipitation about 8% more intense (very likely lower bound, 2.5 times more likely; van Oldenborgh et al., 2017). A third study determined that recent climate warming contributed to the extreme precipitation that fell on southeast Texas during the period of Harvey by approximately 20% (Wang et al., 2018). While these studies apply differing event definitions, methodologies, and framings, they indicate a robust increase in attributable rainfall during the event and can provide a high level of confidence in EEA results.
Second, we note that uncertainty estimation requires further explicit consideration in EEA studies. We earlier highlighted bootstrap resampling approaches because they have already been applied broadly in EEA studies. However, we note that this technique may perform poorly in quantifying statistical uncertainty, and comprehensive studies performed elsewhere argue for the implementation of more sophisticated statistical methods (Paciorek et al., 2018).
The examples we have presented throughout can be described using the language proposed. The anomalously hot SSTs in the Coral Sea in 2016 would be designated as "virtually certain" on the Likelihood Scale; that is, it is virtually certain the event could not have happened without climate change in the model frameworks used. In addition, there is a "limited amount" of highly consistent evidence (in "high" agreement on the likelihood scale), leading to "medium confidence" (Lewis & Mallela, 2018). For the Hurricane Harvey example, the extreme rainfall associated with the event was "more likely" due to climate change, and again we designate "medium" confidence as there is "robust" evidence in medium agreement supporting this likelihood conclusion (Risser & Wehner, 2017; van Oldenborgh et al., 2017; Wang et al., 2018). We note that these are preliminary assessments of these example events, and further evidence may change the categorizations given above.

Further Considerations
Previously, EEA studies have been written primarily for an audience with a high level of scientific literacy who are accustomed to interpreting results through a trained lens of uncertainty estimates and probability distributions. However, many of the users of event attribution results are members of the public, and often little explicit thought is put into the accessibility of these results in forums for a wider audience. Previously, complex attribution statements have been reduced to single numbers that have been misinterpreted, or critical nuances have been overlooked. Here, we have put forward a framework for how attribution results could be communicated to a wide audience. The implementation of common and consistent language to summarize the findings of EEA studies would provide a clear way to communicate the intricacies of complex analysis, allowing results, uncertainties, and confidence levels to be conveyed simply to a broad audience of differing backgrounds.
The approach we have outlined here is founded on literature on effective communication of probabilistic statements. For example, Budescu et al. (2014) propose using both verbal terms and corresponding numerical values as an effective way to communicate probabilities and uncertainties. Probabilistic statements in IPCC reports have been found to be open to multiple interpretations, but the dual (verbal and numeric) approach has numerous benefits, including the ability to allow for categories to be better differentiated and increasing the consistency of interpretation of these terms (Budescu et al., 2012). However, the underlying probabilistic nature of attribution statements must not be lost in this communication.
We also emphasize that differing interpretations or misunderstandings about climate change, including the factors contributing to extreme events, are not purely a result of poor comprehension or poor communication of scientific probabilities. Previous studies have demonstrated that in survey cohorts, members of the public with the highest science literacy level and technical reasoning capacity were not the most concerned about climate change impacts (Kahan et al., 2012). Interpretations of statements about climate change are also dependent on people's ideologies and their prior views and beliefs about climate change issues (see Budescu et al., 2012, 2014; Kahan et al., 2012). Although these communication limitations remain, we present calibrated language here and recommend its usage in public discussion as a step forward in providing clarity around EEA results.