Numerically Bounded Linguistic Probability Schemes Are Unlikely to Communicate Uncertainty Effectively

In a recent issue of Earth's Future (vol. 7, pp. 1020–1026), S. C. Lewis et al. (2019, https://doi.org/10.1029/2019EF001273) recommended a numerically bounded linguistic probability (NBLP) scheme for communicating probabilistic information in extreme event attribution studies. We provide a critique of NBLP schemes in general and of Lewis et al.'s in particular, noting two key points. First, evidence from voluminous behavioral science research on the interpretation of linguistic probabilities indicates that NBLP schemes are an ineffective means of communicating uncertainty to others. Second, where the motivation to implement such schemes nevertheless persists, the schemes should be developed through an evidence‐based approach that seeks to optimize interpretational agreement between the scheme and users.


Introduction
In recent years, there has been considerable interest in, and much importance attached to, accurately communicating information about climate change and its causes to the public. In that spirit, Lewis et al. (2019) suggest that publications aimed at lay readers employ numerically bounded linguistic probability (NBLP) schemes to convey scientists' uncertain and probabilistic forecasts and causal statements. (Lewis et al., 2019, use the term calibrated language to refer to such schemes. However, because the term calibrated has disparate technical meanings in related fields, we prefer to avoid the ambiguity, and we denote them with the term NBLP.) Specifically, Lewis et al. (2019) recommend use of a seven-point, ordered scale with probabilistic statements (e.g., "Virtually certain the event would not have happened without climate change") that have associated risk ratios (RRs) and fractions of attributable risk (FARs), both expressed as intervals. Their recommendation is consistent with similar NBLP schemes used for several decades in a variety of organizational contexts. These include the Intergovernmental Panel on Climate Change (IPCC; Mastrandrea et al., 2011), the U.S. Environmental Protection Agency (Morgan, 1998), and most intelligence organizations, including in all five-eye countries and in NATO (Friedman, 2019; for an early example, see Kent, 1964). The key feature of NBLP schemes is an ordered list of linguistic probability terms or phrases accompanied by numeric ranges intended to define their imprecise meanings.
We believe that in the absence of appropriate empirical work, such schemes are more likely to obfuscate than to illuminate the uncertainties associated with the events in question. The primary aim of communicating scientific thinking about climate and such events to the public should be to maximize the degree to which the end users of communications interpret the message as intended by the writers. Below, we review key research findings that undermine the use of NBLP schemes in general. Then we focus on the Lewis et al. (2019) scheme in particular. Our critiques lead to two recommendations. The first is that communicators of probabilistic information should consider means other than NBLP schemes to improve the clarity and usability of their communications to the relevant audiences. The second point is that when NBLP schemes are used, they should be developed with empirical testing aimed at optimizing the correspondence between end users' understanding of the focal terms and the stipulated ranges adopted in the schemes. They should not, as is often the case, be constructed by fiat.

Problems With NBLP Schemes
One problem with using NBLP schemes to describe scientific uncertainties to the public is the well-documented finding that people generally prefer to receive such communications in numerical form, even while they generally prefer to communicate them verbally (Erev & Cohen, 1990; Murphy et al., 1980; Olson & Budescu, 1997; Wallsten et al., 1993). Erev and Cohen (1990) introduced the term preference paradox to denote these simultaneous preferences, often in the same individual. One reason that communicators prefer expressing uncertainty verbally rather than numerically is that it can serve to avoid an "illusion of rigor" (Friedman et al., 2018). Another reason that people prefer using natural language rather than numbers to convey probability is that words are easier and more natural (Wallsten et al., 1993). In contrast, the primary reason that people generally prefer to receive such communications numerically is that numbers are more precise and transparent (Wallsten et al., 1993). However, the preferences in either direction are not fixed, and there are conditions under which they reverse. Wallsten et al. (1993) concluded from their results: "Perhaps of greatest interest, however, is our respondents' flexibility in communicating about uncertainty as a function of the nature of the issue and the strength of the data. Generally, people indicated a preference for numerical communication when the situation was important or when numerical estimates were supported by the information base. In contrast, they preferred verbal communication when the situation was unimportant or the information base weak" (p. 138). This conclusion raises the ironic possibility that using verbal terms to express probability suggests to the audience that the results are less important or based on sparser data than the communicators think is the case. Thus, this mode of communication may have exactly the opposite effect from that intended.
Another problem with NBLP approaches is the evidence that they often are ineffective. Several studies have shown that after being presented with an NBLP scheme, end users fail to interpret the identified terms within their stipulated ranges. Budescu et al. (2009, 2012, 2014) studied this issue extensively in the context of the IPCC standard. The proportion of end users having an interpretation consistent with the IPCC standard after viewing it ranged from 21% to 35% across 25 countries (based on approximately 200 respondents per country). Even when the stipulated numeric ranges were printed beside the linguistic probability terms within the text, the proportion of end users who gave interpretations consistent with the terms ranged from 28% to 54% (again, based on separate samples of approximately 200 respondents per country; see Figure 3 in Budescu et al., 2014). These findings were well replicated using an NBLP scheme adopted by the U.S. intelligence community (Wintle et al., 2019), and similar interpretational discrepancies have been documented in frequency-based language schemes (Berry et al., 2003). Note, however, that most NBLP schemes do not require numeric ranges to be printed in assessments. Therefore, the proportion of end users who interpret probability terms as stipulated is likely to be small. Even when the ranges are printed alongside the terms, there is a risk that they will be interpreted as assessment-specific credible intervals rather than as assessment-general stipulated ranges. After all, end users are likely to be focused on the substance of the text rather than on how to interpret the probability terms.
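To make the notion of interpretational consistency concrete, the following is a minimal sketch of how such a rate could be computed. It assumes one simple definition of consistency (a respondent's best numeric estimate falls inside the stipulated range); the cited studies may have operationalized consistency differently. The ranges are the IPCC likelihood ranges for four of its terms; the responses and the function name consistency_rate are hypothetical and are not data or methods from any study.

```python
# Stipulated IPCC likelihood ranges for four terms, as (lower, upper) probabilities.
IPCC_RANGES = {
    "very unlikely": (0.00, 0.10),
    "unlikely": (0.00, 0.33),
    "likely": (0.66, 1.00),
    "very likely": (0.90, 1.00),
}


def consistency_rate(term, estimates, ranges=IPCC_RANGES):
    """Return the share of best estimates that fall inside the range stipulated for `term`."""
    lower, upper = ranges[term]
    hits = sum(1 for p in estimates if lower <= p <= upper)
    return hits / len(estimates)


# Hypothetical best estimates from ten readers asked what "very likely" means.
# These numbers are invented for illustration only.
responses = [0.70, 0.95, 0.80, 0.75, 0.92, 0.65, 0.85, 0.90, 0.60, 0.78]

rate = consistency_rate("very likely", responses)
print(f"{rate:.0%} of responses fall within the stipulated range")
```

In this invented sample, most readers place "very likely" below the stipulated lower bound of 0.90, which is the kind of discrepancy the studies above describe.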
Yet another problem with all NBLP schemes, and a likely explanation for the results summarized just above, is that linguistic probabilities are context dependent in a variety of respects, which is to be expected given their function as relative adjectives (Clark, 1990). For instance, their meaning is influenced by perceived base rates of the forecasted events (Wallsten et al., 1986; Weber & Hilton, 1990), event severity (Harris & Corner, 2011), outcome valence (Mandel, 2015; Mullet & Rivet, 1991), and content domain (Brun & Teigen, 1988; Mellers et al., 2017). Such effects imply that the meaning of linguistic probabilities will vary not only across individuals but also within individuals as they encounter identical expressions in different contexts. The Wallsten et al. (1986) study is especially compelling as its respondents were professionals working for the National Weather Service who were using probability terms with assigned probability intervals in their daily work, but when the same expressions were embedded in different contexts, they ignored this translation scheme. To put it bluntly, it is practically impossible to mandate the use of natural language.
Finally, NBLP schemes can implicitly communicate action recommendations, either inadvertently or intentionally but, in any case, inappropriately. This is because linguistic probabilities not only convey probability levels but frequently also imply a directionality that in turn suggests expected actions. For example, terms that convey comparable probability levels can signal optimism (e.g., some chance) or pessimism (e.g., doubtful) regarding desirable future events (Teigen & Brun, 1995, 1999). Collins and Mandel (2019) found that individuals perceive linguistic probabilities as communicating probability levels less clearly than do numerical probabilities. They also found that, for low-probability terms, individuals perceived implicit recommendations more clearly from the verbal probability term than from the numeric probability, even though the communicators provided no explicit recommendations. This tendency can add an influence function, such as nudging policy-makers or the public toward a particular viewpoint or policy, when that is outside the communicator's mandate (Piercey, 2009).

Specific Comments on Lewis et al. (2019)
We turn now to Lewis et al.'s (2019) article, which suggests language for communicating RR or FAR estimated from extreme event attribution (EEA) studies. Two features of this paper merit discussion as illustrations of the problems we outlined above. What makes their paper particularly notable is that the EEA approach provides a model-based means for estimating probabilities of well-defined events having specific effects (National Academies of Sciences, Engineering, and Medicine, 2016). Lewis et al. suggest that lay readers cannot fathom the RR or FAR results per se but will understand the linguistic interpretations they suggest.
The first problem that we address is Lewis et al.'s (2019) specific language choices (see their Table 1 or Figure 1). Five of the seven alternatives seem difficult to interpret. For brevity, we illustrate the point with only two of the five: "The event was very much more likely due to climate change" and "The event was exceptionally less likely due to climate change." (We assume that the word unlikely in line 6 of Lewis et al.'s Table 1 was a typographical error and that they intended the word likely.) These are comparative descriptors, and they raise the question: The event is much more likely, or exceptionally less likely, due to climate change than due to what? They are not interpretable in an absolute sense.
The bigger point is that the choice of terms is not backed by research. Do people understand them as Lewis et al. (2019) intend? Is their interpretation insensitive to context differences? One cannot just compile a set of probability terms and assume they convey the intended values even if they are explicitly defined (Ho et al., 2015).
Our second concern with Lewis et al.'s (2019) approach is that it seems counterproductive within the EEA framework. Linguistic probabilities are vague, they have context-dependent meanings, and they often convey unintended recommendations to end users, as we pointed out in discussing NBLP schemes in general. But EEA results are typically drawn from scientific research in which the probabilities and uncertainties have already been quantified (Lewis et al., 2019; National Academies of Sciences, Engineering, and Medicine, 2016). Thus, NBLP schemes in this context seem to make the information both vaguer and coarser. This effect contradicts sound advice from the European Food Safety Authority (EFSA), which recommends against converting quantitative scientific estimates into linguistic probabilities for public consumption of risk information (European Food Safety Authority et al., 2018).
It is possible that our criticism is misplaced and the lay public does understand a suitably selected set of terms more clearly than they do quantitative RR and FAR values, but this can only be determined with appropriate empirical work, which does not appear to have yet been done.

Recommendations
Given the diversity of aims and capabilities among targeted audiences, such as the media, policy-makers, and the public, we caution against the use of communication strategies that presume how the information communicated will be understood. In some important cases, quantitative information can better support decision-making than can linguistic information. For instance, when end users receive multiple estimates from different sources (e.g., different advisors), they may wish to know the aggregate estimate, such as the mean or median value. That can easily be done with numbers, but not with probability words. Recent research shows that accuracy is greater when nonexpert end users estimate averages and products from numeric than from linguistic probabilities. Visual methods, such as icon arrays, have been shown to improve comprehension of risk information by less numerate end users (Galesic et al., 2009) as well as by those with relatively low graph literacy (Okan et al., 2015).
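The point about aggregating numeric estimates can be illustrated with a short sketch. All values below are hypothetical, and the product is a joint probability only under the added assumption that the two events are independent. No comparable direct operation exists for probability words, which would first have to be translated into numbers.

```python
from math import prod
from statistics import mean, median

# Hypothetical numeric probability estimates of the same event from three advisors.
advisor_estimates = [0.60, 0.75, 0.90]

# Aggregating numeric estimates is a direct computation.
average_estimate = mean(advisor_estimates)
median_estimate = median(advisor_estimates)

# Hypothetical probabilities of two events, assumed independent for illustration,
# so that their joint probability is the product of the two values.
event_probabilities = [0.8, 0.5]
joint_probability = prod(event_probabilities)

print(f"mean estimate: {average_estimate:.2f}")
print(f"median estimate: {median_estimate:.2f}")
print(f"joint probability: {joint_probability:.2f}")
```

Had the same advice arrived as "likely," "very likely," and "almost certain," a reader would have no defined way to average or multiply it without imposing a numeric translation of their own.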
Although we caution against the use of NBLP schemes, we realize that some organizations will nevertheless adopt them. Accordingly, we recommend that such schemes draw on relevant empirical research on linguistic probability interpretation and be further empirically tested before being adopted. For instance, Ho et al. (2015) demonstrated that by eliciting numeric translations of probability terms in the IPCC and in various intelligence community NBLP schemes, they could improve the degree of agreement between end users' interpretations of the probability terms and the stipulated ranges in these schemes (for a conceptual replication, see Wintle et al., 2019). Usually, however, as in national security intelligence, NBLP schemes are devised by fiat by a small group of insiders, without a clear methodological justification and without empirical testing.
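As a rough illustration of what deriving ranges from elicited translations, rather than by fiat, might look like, the sketch below sets each term's range from the midpoints between adjacent terms' median best estimates. This is a deliberately simplified, hypothetical procedure and is not the method used by Ho et al. (2015) or Wintle et al. (2019); the elicited values and the function name derive_ranges are likewise invented for illustration.

```python
from statistics import median

# Hypothetical best estimates (as probabilities) elicited from five respondents per term.
elicited = {
    "unlikely": [0.10, 0.20, 0.25, 0.15, 0.30],
    "about as likely as not": [0.50, 0.45, 0.55, 0.50, 0.40],
    "likely": [0.70, 0.65, 0.75, 0.80, 0.60],
    "very likely": [0.85, 0.90, 0.80, 0.95, 0.88],
}


def derive_ranges(elicited_estimates):
    """Assign each term a range bounded by midpoints between adjacent median estimates."""
    # Order the terms by the median of their elicited estimates.
    ordered = sorted((median(values), term) for term, values in elicited_estimates.items())
    ranges = {}
    lower = 0.0
    for i, (center, term) in enumerate(ordered):
        if i + 1 < len(ordered):
            # Cutoff halfway between this term's median and the next term's median.
            upper = (center + ordered[i + 1][0]) / 2
        else:
            # The highest term extends to certainty.
            upper = 1.0
        ranges[term] = (lower, upper)
        lower = upper
    return ranges


derived = derive_ranges(elicited)
for term, (low, high) in derived.items():
    print(f"{term}: {low:.2f} to {high:.2f}")
```

Ranges obtained this way are non-overlapping and anchored in how respondents actually use the terms, and they could then be tested on a fresh sample before a scheme is adopted.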

Data Availability Statement
Data were neither used nor created for this research.