The effect of likelihood and impact information on public response to severe weather warnings

Meteorological services are increasingly moving away from issuing weather warnings based on the exceedance of meteorological thresholds (e.g., windspeed), toward risk‐based (or “impact‐based”) approaches. The UK Met Office's National Severe Weather Warning Service has been a pioneer of this approach, issuing yellow, amber, and red warnings based on an integrated evaluation of information about the likelihood of occurrence and potential impact severity. However, although this approach is inherently probabilistic, probabilistic information does not currently accompany public weather warning communications. In this study, we explored whether providing information about the likelihood and impact severity of forecast weather affected subjective judgments of likelihood, severity, concern, trust in forecast, and intention to take protective action. In a mixed‐factorial online experiment, 550 UK residents from 2 regions with different weather profiles were randomly assigned to 1 of 3 Warning Format conditions (Color‐only, Text, Risk Matrix) and presented with 3 warnings: high‐probability/moderate‐impact (amber HPMI); low‐probability/high‐impact (amber); high‐probability/high‐impact (red). Amongst those presented with information about probability and impact severity, red high‐likelihood/high‐impact warnings elicited the strongest ratings on all dependent variables, followed by amber HPMI warnings. Amber low‐likelihood/high‐impact warnings elicited the lowest perceived likelihood, severity, concern, trust, and intention to take protective responses. Taken together, this indicates that UK residents are sensitive to probabilistic information for amber warnings, and that communicating that severe events are unlikely to occur reduces perceived risk, trust in the warning, and behavioral intention, even though potential impacts could be severe. We discuss the practical implications of this for weather warning communication.

events (World Meteorological Organization [WMO], 2015), with the need for the development of hazard early warning systems emphasized in the Sendai Framework for Disaster Risk Reduction (United Nations Office for Disaster Risk Reduction [UNDRR], 2015). In order to better align weather warnings with the risk of harm posed by severe weather, meteorological services are increasingly moving away from warnings based on meteorological thresholds alone to impact-based warnings, which weight the potential severity of weather impacts by the likelihood of them occurring (WMO, 2015). The UK Met Office has issued impact-based warnings since 2011 (Goldstraw, 2012). However, these are typically provided to the public without explicit information about impact likelihood and severity. This means that the same warning level can be issued for high-probability/low-impact events as for low-probability/high-impact (LPHI) events. Using UK wind warnings, we explore the effect of providing information about likelihood and severity of weather impacts in Text and Risk Matrix formats on perceived risk, forecast trust, and behavioral intention.

Severe weather warnings in the UK
In the UK, weather warnings are issued by the Met Office's Severe Weather Warning Service for wind, rain, snow, ice, fog, thunderstorms, lightning, and heat (Met Office, 2021a). Yellow, amber, and red denote increasing levels of risk. Warning level is based on a Risk Matrix (Figure 1). Red warnings always indicate high-probability/high-impact (HPHI) events. Yellow and amber warnings, however, can denote varying levels of likelihood and severity. Amber warnings, for instance, can denote both high-probability/moderate-impact (HPMI) events and LPHI events. Members of the public can access the Risk Matrix on the Met Office website. However, public weather warning dissemination (e.g., news media, Met Office app) typically presents Color-only warnings.
Although the use of the same warning level to capture HPMI and LPHI events follows an expected value approach to risk that weights consequence by likelihood, the characteristics of low, moderate (medium), and high-impact events are different. The behavioral responses required to reduce risk may also differ across impact levels (e.g., being prepared for travel delays for low-impact events, cancelling nonurgent journeys for medium-impact events, and sheltering in place for an extended period for a high-impact event) (Met Office, 2021a). If people associate warning levels with impacts alone, they may underestimate or overestimate the potential severity implied by yellow or amber warnings. Moreover, being unaware that a severe event is unlikely but possible may diminish trust in warning providers if it does not materialize (Ripberger et al., 2015). Indeed, LPHI warnings may be issued at longer lead times where there is greater uncertainty about future conditions and subsequently revoked or upgraded to high/medium probability as lead time and uncertainty decrease. Consequently, there are ongoing discussions amongst the operational forecasting community as to how LPHI warnings should be communicated to the public (Zhang et al., 2019).
To date, few studies published in peer-reviewed literature have explored public responses to severe weather warning communications in the UK. Those available suggest that people are broadly aware of the ordinal nature of current color-coded warning levels (Mu et al., 2018; Taylor et al., 2019). However, qualitative work has demonstrated that misinterpretations can occur, particularly with respect to yellow and amber (Tang & Runblad, 2015). In a UK survey exploring public responses to warnings issued for a 2017 severe wind event, Taylor et al. (2019) found that while perceived risk was higher amongst those in areas with amber versus yellow warnings, there was no difference in reported behavioral response. However, noise may have been introduced into this data by some participants incorrectly identifying the warning color for their local area on a warning map. Here, we explore perceived risk and behavioral intention where local warning color is unambiguous, and where warnings are presented with or without probability information.

Probability and judgments about weather risk
The utilization of probabilities in decision making has been widely studied in the behavioral sciences. Studies using willingness-to-pay and choice evaluation paradigms have found that sensitivity to probability may be absent or diminished for "affect rich" (i.e., emotionally salient) outcomes such as medical side effects or electric shocks compared to "affect poor" outcomes such as small monetary losses (Rottenstreich & Hsee, 2001; Suter et al., 2016). Finding that willingness-to-pay was less sensitive to probabilities for affect rich (vs. affect poor) descriptions of the same arsenic poisoning event, Sunstein (2003) posited that emotionally salient LPHI risks elicit a focus on impacts at the expense of probability. In the context of weather, where impact severity can vary from minor inconveniences (e.g., traffic delays) to severe outcomes (e.g., fatalities), findings regarding integration of probabilities into decision making have been mixed. In one US study, probability neglect was suggested as a possible explanation for disparities between perceived risk from flooding and heat and event observations (Allan et al., 2020), although this may have been attributable to recent events biasing responses in line with the availability heuristic (Tversky & Kahneman, 1973). Studies of willingness-to-pay for flood insurance have suggested that while some people are averse to the risk of losses to the extent that likelihoods are neglected, others dismiss low-probability events entirely (Botzen & van den Bergh, 2012; Robinson & Botzen, 2019). This may align with prospect theory's weighting function (Kahneman & Tversky, 1979), which holds that small-probability events may be disregarded (i.e., treated as impossibilities) or overweighted.
In Taylor et al.'s (2019) post-event survey, anticipated impact severity was strongly correlated with anticipated likelihood and trust in warnings. This is consistent with the affect heuristic, whereby feelings toward one characteristic of risk (e.g., potential for harm) may be used as a shortcut to form a generalized assessment of the risk (Finucane et al., 2000). However, as these participants were only provided with the warning color for their region, this may be due to the absence of information on likelihood and severity. Nonetheless, other studies examining perceptions of likelihood and severity have found that when the events warned of are more severe, they tend to be perceived as more likely to occur (see Ripberger et al., 2022; Weber & Hilton, 1990).
Choice experiments comparing responses to probabilistic versus deterministic weather information have generally been favorable toward the provision of uncertainty information, suggesting that it can lead to better overall decision outcomes over a series of choices (Ramos et al., 2013; Roulston & Kaplan, 2009; Stephens et al., 2019) and reduce loss of trust from false alarms (Joslyn & LeClerc, 2012; LeClerc & Joslyn, 2015). However, providing more information has not universally been found to improve outcomes. Mu et al. (2018) found that while providing risk matrices increased understanding and trust (compared to Color-only warnings), it did not improve scores on a task where participants had to choose whether to take costly protective action or not. However, many of these studies asked participants to take the perspective of an organizational decision maker, rather than consider how they would respond in day-to-day life. Although this provides quantifiable measures of performance, it may not reflect perceptions of personal risk or individual-level behavioral response to severe weather warnings (e.g., changing travel plans).
Here, we explore whether presenting information about likelihood and potential impact severity affects perceived likelihood, potential severity, concern, and behavioral intention. If probability is neglected for LPHI weather events, then providing probability and impact information should lead to greater concern and protective behavioral intention by emphasizing that potential impacts are severe. If communicating that an event is rare but severe leads some to underweight and others to overweight low-probability events, then we might expect the standard deviation of response measures to be wider for LPHI than HPMI and HPHI warnings.

Trust
In the broader risk perception literature, trust in risk information and information providers has been found to have a complex relationship with risk perception and response, with some studies finding a link between trust and behavioral response (Siegrist, 2021; Wachinger et al., 2013). In the context of weather, trust measures have generally been found to predict greater intention to undertake protective responses (e.g., Kox & Thieken, 2017; Morss et al., 2016; Ripberger et al., 2015; Sherman-Morris, 2005; Taylor et al., 2019). Moreover, there is evidence that providing information about uncertainty may increase trust in warnings and forecasts and reduce a loss of trust from false alarms (Joslyn & LeClerc, 2012; LeClerc & Joslyn, 2015). In the UK and USA it has also been found that warnings for more severe events elicit greater trust than warnings for less severe events (Losee & Joslyn, 2018; Taylor et al., 2019), a finding that may be due to more threatening events creating a psychological need to trust in social systems (Jost et al., 2004; Losee & Joslyn, 2018) but may also be due to trust in forecast reflecting confidence that an event will occur. Indeed, Taylor et al. (2019) found that "trust in forecast" and "trust in the Met Office" as an institution made separate contributions to the prediction of behavioral intention, suggesting that the two are related but distinct.
Here we expand on Taylor et al.'s (2019) work, where only amber and yellow warnings could be compared, to examine trust in red warnings, which represent the highest level of risk (i.e., HPHI). Moreover, we explore whether trust differs among HPHI, HPMI, and LPHI warnings. If severity alone increases trust in forecasts, then trust should be higher for HPHI and LPHI warnings than HPMI warnings. However, if probability affects trust in the forecast, then it should be higher for HPHI than both LPHI and HPMI.

Format
This study compares the provision of Color-only warnings with those including information about likelihood and impact severity in the form of (a) text description; and (b) the Met Office's Risk Matrix. Multiple studies have investigated the provision of numeric probabilities when communicating about specific meteorological values such as probability of precipitation (e.g., Joslyn & Nichols, 2009; Handmer & Proudley, 2007; Morss et al., 2008). However, in the context of impact-based warnings, attaching numeric values to probabilities and impacts in public-facing communications may not be feasible because warnings are not based on exceedance of consistent numeric thresholds (e.g., windspeed exceedance). Presenting information using ordinal categories (e.g., low, medium, high), as captured in the Risk Matrix itself, represents one way to characterize this information non-numerically. Studies examining the use of verbal categories to convey probability and outcome information in hypothetical decision contexts related to health, consumer choice, and environmental management have found that they may help people to understand and utilize risk information compared to receiving numeric information alone, through providing an evaluative structure (Dieckmann et al., 2012; Peters et al., 2009). However, a potential disadvantage of using verbal categories is that they may be interpreted differently by different people (Dhami & Mandel, 2021).
For instance, interpretation of the terms "very likely" and "likely" has been found to vary considerably between individuals (Budescu et al., 2009). Likewise, terminology related to severity may affect perceptions of probability, with events described as being more severe perceived as more likely (Harris & Corner, 2011). Risk matrices emphasizing the ordinal nature of "low," "moderate," and "high" categories may remove some ambiguity from interpretation compared to verbal categories and Color-only warnings. Mu et al. (2018) found that risk matrices assisted interpretation relative to Color-only warnings. However, another study examining risk matrices in a broader context did not find that they consistently improved performance on a risk comparison task relative to text controls (Sutherland et al., 2022). In this case, however, both Text and Risk Matrix formats contained numeric likelihood information, which, as noted, is unlikely to be feasible in public impact-based warning communications. In this study, we will thus explore whether presenting probability information using verbal descriptions or risk matrices affects perceived risk, trust in forecast, and behavioral intention, in comparison to a Color-only control. If expressing probability and impact verbally elicits varied interpretations, as per Budescu et al.'s (2009) findings, then we would expect to see greater standard deviations in measures of perceived likelihood and severity for the Text than the Risk Matrix format.

Location
In the UK, impact-based weather forecasts are regionally calibrated, meaning that the meteorological conditions (e.g., windspeeds) needed to trigger a warning in one area may be different from those needed in another region, depending on vulnerability and infrastructure (Hemmingway & Robbins, 2020; WMO, 2015). However, many UK residents believe that the conditions needed to trigger a warning are the same everywhere (Taylor et al., 2019). Moreover, there is evidence that perceptions of weather and climate risk differ between southern and northern areas of the UK (Palutikof et al., 2004), with anecdotal evidence from research commissioned by the UK Met Office suggesting that those in northern areas of the UK may perceive some weather warnings to imply a lower threat to their local area than those in southern regions (DJS Research, 2014). This raises the concern that the threat implied by a warning may be underestimated when people perceive their region to be resilient to weather events. To empirically test whether this is the case for wind warnings, this study purposefully compares responses across two regions: Yorkshire and Humber (Northeast of England), which is heavily exposed to winds from the north and east (Wheeler, 2013); and Greater London (Southeast of England), which has a milder climate, but is vulnerable to the impacts of strong winds due to high population density and infrastructure (Mayes, 2013).

Research questions
Based on the areas for investigation identified above, we use an experimental design to address the following research questions:
1. Do perceived likelihood, severity, concern, trust in warning, and behavioral intention differ across warning levels when Color-only wind warnings are presented?
2. Does providing information about likelihood and impact affect the perceived likelihood, impact severity, and concern elicited by wind warnings?
3. Does providing information about likelihood and impact severity affect trust in wind warnings?
4. Does providing information about likelihood and potential impact severity affect behavioral intention for wind warnings?
5. Are there regional differences in wind warning response?

Participants
Between August 7 and August 9, 2019, an online experiment was conducted with 550 participants (female = 275) from Greater London (n = 275) and Yorkshire and Humber (n = 275). Ages ranged from 18 to 86 (mean = 40.2, SD = 14.6). Participants from the two focal regions were recruited from market research panels by Qualtrics Panels (cost per participant = £3.39). Participants received points exchangeable for rewards for their participation. Gender composition was broadly representative of the two regions, though the sample did skew younger than national and regional averages (Office for National Statistics, 2022) (see Supporting Information section for full demographic breakdown).

Design
The study used a mixed-factorial design with Location and Warning Format as between-groups factors. Warning level was a repeated measures variable. Participants were randomly assigned to one of three Format conditions (Figure 2):
• Color-only: Statement that a wind warning of a particular color had been issued for participant's local area.
• Text: Statement that a wind warning of a particular color had been issued for participant's local area, with text description of potential impact severity and likelihood.
• Risk Matrix: Statement that a wind warning of a particular color had been issued for a participant's local area, with text description of potential impact severity and likelihood and a visual representation of the Met Office Risk Matrix.
In the Text and Risk Matrix conditions, participants were shown wind warnings for three warning levels (order randomized): amber HPMI, amber LPHI, and red HPHI. As participants in the Color-only condition did not receive any additional information about impact severity or likelihood, they were shown yellow, amber, and red warnings (order randomized).

Perceived risk
For each warning, slider scales of 0-100 were used to rate expected likelihood of strong wind (0 = impossible, 100 = certain), expected impact severity (0 = not severe at all, 100 = very severe), and expected concern (0 = not concerned at all, 100 = very concerned).

Trust in forecast
Participants indicated their trust in each warning using a 0-100 slider scale (0 = would not trust at all, 100 = would trust completely).

Behavioral intention
For each warning, participants indicated how likely they thought that they would be to engage in seven protective behaviors (check for warning updates, notify others, check on vulnerable others, be more cautious when traveling, avoid travel, leave work or study early, take physical action to protect property) on a five-point scale (1 = would definitely not do this, 5 = would definitely do this). Principal components analysis indicated that behaviors loaded onto a single component for all levels (Cronbach's alpha > 0.88).
A "behavioral intention" score was created by taking the mean of all items (see Supporting Information section for information on individual behaviors).
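As an illustration of the scale construction described above, the following is a minimal sketch of how Cronbach's alpha and a mean composite score can be computed. The ratings are hypothetical and the function names (`cronbach_alpha`, `composite_score`) are our own; this is not the study's analysis code, and only three items and six respondents are shown for brevity.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items, each a list of scores from the same n respondents."""
    k = len(items)
    n = len(items[0])
    # Total scale score per respondent (sum across the k items).
    totals = [sum(item[i] for item in items) for i in range(n)]
    # Sum of the individual item variances (sample variance, n - 1 denominator).
    item_var_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


def composite_score(responses):
    """Composite score for one respondent: the mean of their item ratings."""
    return mean(responses)


# Hypothetical five-point ratings: three behaviors (rows) by six respondents (columns).
ratings = [
    [5, 4, 2, 3, 1, 4],
    [4, 4, 1, 3, 2, 5],
    [5, 3, 2, 2, 1, 4],
]

alpha = cronbach_alpha(ratings)
scores = [composite_score([item[i] for item in ratings]) for i in range(6)]
```

Taking the mean rather than the sum keeps the composite on the original 1-5 response scale, which makes scores comparable with the individual items.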

Trust in the Met Office
Trust in the Met Office was measured using the mean of an 11-item scale adapted from Earle and Cvetkovich (1995), presenting pairs of opposing descriptors (e.g., "not trustworthy-trustworthy," "unreliable-reliable") on five-point scales (Cronbach's alpha = 0.91).
Although it would be conventional to use a mixed-factorial analysis to jointly assess the effect of within-groups and between-groups manipulations, it is not appropriate to do this here because those in the Color-only condition were presented with yellow, amber, and red warnings, whereas those in the Risk Matrix and Text conditions were presented with amber HPMI, amber LPHI, and red HPHI warnings. Hence, although we present a mixed-factorial ANOVA assessing the interaction between Format and Level for those in the Text and Risk Matrix conditions, separate tests are run for comparisons involving the Color-only condition.

Warning level
The effect of warning level on perceived likelihood, impact severity, concern, trust in forecast, and behavioral intention is assessed using a repeated-measures ANOVA for the Color-only condition (comparing yellow, amber, and red) and mixed-factorial ANOVAs for the Text and Risk Matrix conditions (comparing HPMI, LPHI, HPHI across formats).
Pitman-Morgan tests are used to compare variability of responses across each level. Using concern as a proxy for perceived personal threat, we assess the extent to which responses of participants in the Color-only condition are aligned with the intended ordinal nature of warnings (i.e., yellow < amber < red), and whether this differs for yellow versus amber compared with amber versus red, using Wilcoxon signed-rank tests.
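To make the variance comparison concrete, the sketch below shows the standard form of the Pitman-Morgan test for paired samples: the variances of two repeated measures are equal exactly when the sum and the difference of the paired scores are uncorrelated, so the test reduces to a t-test on that correlation with n − 2 degrees of freedom. The function names and the ratings are hypothetical, and the p-value lookup is left out to keep the example dependency-free; it is not the study's analysis code.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pitman_morgan(x, y):
    """Return (t, df) for H0: var(x) == var(y), where x and y are paired.

    cov(x + y, x - y) = var(x) - var(y), so a positive t indicates that
    x is more variable than y.
    """
    n = len(x)
    sums = [a + b for a, b in zip(x, y)]
    diffs = [a - b for a, b in zip(x, y)]
    r = pearson_r(sums, diffs)
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return t, n - 2


# Hypothetical 0-100 ratings from the same eight participants for two warning levels.
yellow = [10, 80, 35, 60, 5, 90, 50, 25]
amber = [55, 70, 60, 65, 50, 75, 62, 58]

t_stat, df = pitman_morgan(yellow, amber)
```

The resulting statistic would be compared against a t distribution with `df` degrees of freedom; a paired test is needed here because every participant rated all warning levels.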

Format and Location
The effects of Format and Location on perceptions of likelihood, impact severity, concern, trust in forecast, and behavioral intention are assessed using MANOVA tests for each warning level. We use Levene's test of equality of error variance to assess whether variability in responses differs across the Text and Risk Matrix conditions (Supporting Information section).
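For readers unfamiliar with Levene's test, the following minimal sketch shows its mean-centered form: each observation is replaced by its absolute deviation from its group mean, and a one-way ANOVA F statistic is computed on those deviations. We assume the mean-centered variant here; the source does not state which centering was used. The function name and ratings are hypothetical, and this is not the study's analysis code.

```python
from statistics import mean


def levene_f(*groups):
    """Return (F, df_between, df_within) for a mean-centered Levene's test."""
    # Absolute deviation of each observation from its own group mean.
    z = [[abs(v - mean(g)) for v in g] for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    grand = mean([v for g in z for v in g])
    # One-way ANOVA sums of squares computed on the absolute deviations.
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    within = sum((v - mean(g)) ** 2 for g in z for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (between / df_between) / (within / df_within)
    return f_stat, df_between, df_within


# Hypothetical 0-100 trust ratings from two independent Format groups.
text_group = [20, 85, 40, 95, 10, 70]
matrix_group = [55, 60, 50, 65, 58, 52]

f_stat, df_b, df_w = levene_f(text_group, matrix_group)
```

Unlike the Pitman-Morgan test, this comparison is between independent groups, which is why it suits the between-groups Format factor.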

Predicting behavioral intention
Hierarchical ordinary least squares (OLS) regression is performed with behavioral intention as a dependent variable. Model 1 enters dummy variables representing Location (Yorkshire and Humber as baseline) and Format (Color-only as baseline). Model 2 adds perceived likelihood, severity, concern, trust in forecast, and trust in the Met Office. Mediation tests without covariates are performed to assess whether effects of Location and Format are mediated by concern.
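The logic of a simple mediation test without covariates can be sketched as follows. With a dummy-coded predictor X (e.g., Location), mediator M (concern), and outcome Y (behavioral intention), the indirect effect is the product of path a (X to M) and path b (M to Y, controlling for X), and for OLS the total effect equals the direct effect plus the indirect effect. This sketch uses hypothetical data and hypothetical function names and returns point estimates only; it performs no inference (such as the bootstrapped confidence intervals typically used for indirect effects) and is not the study's analysis code.

```python
from statistics import mean


def cov(u, v):
    """Sample covariance (n - 1 denominator); cov(u, u) is the sample variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def simple_mediation(x, m, y):
    """OLS point estimates for a single-mediator model with no covariates."""
    var_x, var_m = cov(x, x), cov(m, m)
    cov_xm, cov_xy, cov_my = cov(x, m), cov(x, y), cov(m, y)
    a = cov_xm / var_x                       # path a: M regressed on X
    den = var_x * var_m - cov_xm ** 2        # determinant for Y ~ X + M
    b = (cov_my * var_x - cov_xy * cov_xm) / den       # path b: M -> Y given X
    direct = (cov_xy * var_m - cov_my * cov_xm) / den  # c': X -> Y given M
    total = cov_xy / var_x                   # c: Y regressed on X alone
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": total}


# Hypothetical data: 0 = baseline region, 1 = comparison region.
location = [0, 0, 0, 0, 1, 1, 1, 1]
concern = [40, 55, 35, 50, 60, 75, 65, 70]
intention = [2.4, 3.0, 2.2, 2.9, 3.3, 4.1, 3.6, 3.7]

paths = simple_mediation(location, concern, intention)
```

Because the predictor is a 0/1 dummy, path a is simply the difference in mean concern between the two groups.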

Perceived likelihood, severity, concern, trust in warning, and behavioral intention across warning levels for Color-only warnings
Mean ratings of anticipated likelihood, severity, concern, trust in warning, and behavioral intention were highest for red warnings and lowest for yellow warnings (Table 1). A repeated measures ANOVA with pairwise comparisons demonstrated that across all five dependent variables ratings were significantly higher for red than amber and amber than yellow (Table 1). Standard deviation tended to be higher for yellow warnings than amber and red, suggesting greater variability in responses at the lowest warning level. Pitman-Morgan tests indicated that variance was significantly greater for yellow than amber warnings when it came to perceived likelihood and trust, and greater for yellow than red for trust and behavioral intention.
For the majority of participants, the order of reported concern aligned with warning level (Table 2). Few (12%) reported greater concern for yellow warnings than red. However, 26% reported greater concern for yellow than for amber. Comparing how often concern was or was not aligned with warning order, we found that misalignment was significantly greater for yellow versus amber than yellow versus red (Z = −4.15, p < 0.001), and marginally significantly greater for yellow versus amber than amber versus red (Z = −1.93, p = 0.053).

Perceived likelihood, severity, concern, trust in warning, and behavioral intention across warning levels for Text and Risk Matrix warnings
Amongst those in the Text and Risk Matrix conditions, mean ratings on all dependent variables were higher for HPHI than HPMI warnings, and for HPMI than LPHI warnings (Figure 3, Table 3). These differences were statistically significant (Table 4). As shown in Figure 3, ratings of perceived likelihood, severity, concern, and behavioral intention were higher in the Risk Matrix than Text condition for amber warnings (HPMI and LPHI) but not red (HPHI). Consistent with this, we found a significant interaction between Format and warning level for each of these (Table 4).
Standard deviations tended to be wider for LPHI warnings than others (Table 3). Pairwise comparisons of equality of variance using Pitman-Morgan tests indicated that differences in variability reached statistical significance (p < 0.05) for perceived severity and concern in both conditions, and trust in forecast and behavioral intention in the Text condition. However, as these are repeated pairwise comparisons, the likelihood of a Type 1 error is inflated, meaning that caution is needed in interpreting these findings. For amber warnings, standard deviations for the Text condition tended to be wider than those for the Risk Matrix condition (Table 3). Levene's tests indicated that this difference was significant (p < 0.05) for trust in forecast (HPMI and LPHI), likelihood (LPHI-only), and concern (HPMI-only) (see Supporting Information section). For red HPHI warnings, standard deviations tended to be slightly narrower for the Text than Risk Matrix format, though this only reached significance at p < 0.05 for likelihood.

Effect of Warning Format and Location on perceived risk, trust in warning, and behavioral intention
Table 5 reports MANOVA tests examining the effect of Location and Format on perceived likelihood, severity, concern, trust in forecast, and behavioral intention for HPMI, LPHI, and HPHI warnings. Responses to the same Color-only amber warning are contrasted with the HPMI and LPHI amber warnings presented for the other conditions. Overall multivariate effects (Wilks' lambda) for Format and Location are reported, along with ANOVAs for each dependent measure, post hoc comparisons for Format, and indication of the direction of Location effects (see Supporting Information section for descriptive statistics for Location). No effect of Warning Format was found for responses to red HPHI warnings (Table 5). For the amber HPMI warning, the Risk Matrix elicited significantly higher ratings of perceived likelihood, severity, and concern than both the Color-only and Text formats, and higher behavioral intention than the Text format. For the amber LPHI warning, the Risk Matrix elicited greater perceived severity than other formats and greater concern than the Text format, whereas ratings of perceived likelihood and concern were higher for the Color-only than the Text format.
For amber HPMI and LPHI warnings, ratings of concern and behavioral intention were significantly higher amongst those in Greater London than Yorkshire and Humber (Table 5). No difference in concern and behavioral intention was found for red HPHI warnings, though those in Yorkshire and Humber gave higher ratings of perceived likelihood and severity.
TABLE 3 Ratings of perceived likelihood, severity, concern, trust, and behavioral intention for warning levels when probability and impact information was provided.

Predictors of behavioral intention
Hierarchical OLS regression analyses examining the predictors of behavioral intention at each warning level are reported in [...] predictor of behavioral intention in Model 1, although the association diminished in Model 2. Likewise, the Risk Matrix format was associated with greater behavioral intention for HPMI warnings in Model 1 but not Model 2. In Model 2, concern and trust in the forecast were strongly associated with behavioral intention, with trust in the Met Office making an additional contribution to the model for red HPHI warnings.
The outputs of Models 1 and 2 suggest that the association of Location and Format with behavioral intention was at least partially mediated by variables in Model 2, with earlier MANOVA analyses (Table 5) indicating Location and Format affect concern but not trust in forecast. Formal tests of indirect effects without covariates (PROCESS Model 4, Figure 4a-c) are consistent with the effects of Location and Format on behavioral intention being mediated by concern. Additional tests controlling for other Model 1 variables yielded the same pattern of findings (see Supporting Information section for full summary of models).

Do perceived likelihood, severity, concern, trust in warning, and behavioral intention differ across warning levels when Color-only warnings are presented?
Consistent with earlier work suggesting that the ordinal nature of risk implied by color-coded weather warnings is broadly well understood in the UK (Mu et al., 2018; Taylor et al., 2019), our analyses of responses to Color-only warnings indicated that mean ratings of perceived likelihood, severity, concern, and behavioral intention were highest for red warnings and lowest for yellow. However, a notable minority of participants reported higher concern for yellow than amber warnings. In some cases, this may simply be attributable to indifference or error. However, the fact that there were fewer cases where concern was higher for yellow than red does suggest that there may be some ambiguity when it comes to the distinction between the level of risk implied by yellow and amber. There was also some indication that responses to yellow warnings tended to be more variable than responses to amber and red warnings, again implying that there may be a lower consensus as to how yellow warnings should be interpreted. A possible explanation for this is that judgments are informed by prior experience of different warning levels. As yellow warnings can indicate a larger number of probability/impact combinations than amber warnings, the experience of the events following them may itself be more variable. In preceding years, participants are likely to have experienced a sizable number of yellow and amber warnings, and infrequent but highly publicized red warnings. This could align with exemplar-based categorization, where a stimulus is compared to stored exemplars of a category (e.g., Erickson & Kruschke, 1998), or the decision by sampling paradigm (Stewart et al., 2006), where experience of events constitutes natural "sampling" of frequency and consequences. However, further research would be needed to identify the specific cognitive process(es) through which experience of differently colored warnings influences subsequent warning risk perception.
The finding that trust in forecast was highest for red warnings and lowest for yellow is consistent with earlier studies indicating that higher warning levels elicit greater trust (Losee & Joslyn, 2018; Taylor et al., 2019). As noted, it has been suggested that this may be due to more threatening situations eliciting a greater need to trust authorities (Jost et al., 2004; Losee & Joslyn, 2018). However, it is also possible that ratings of trust in a forecast reflect participants' confidence that an event will occur given the forecast, rather than beliefs about the quality and credibility of the information per se. Consistent with prior findings that trust in forecasts and trust in forecast providers appear to be related but distinct constructs (Taylor et al., 2019), we find that trust in the Met Office made a unique contribution to the prediction of behavioral intention for red HPHI warnings.
In contrast to earlier work, where no difference in behavioral intention was found between those exposed to yellow and amber wind warnings (Taylor et al., 2019), we found that amber warnings elicited greater intention to engage in protective behaviors than yellow warnings. Differences between these studies may be due to differences in levels of measurement used to record behavioral intention. In Taylor et al. (2019) behavioral intention was coded on a binary scale (Action Taken vs. No Action Taken). Here, participants reported the likelihood of undertaking a range of different actions, potentially capturing greater nuance in intention, and the fact that a protective response may take the form of remaining informed or informing others as well as taking physical actions. Nonetheless, differences in reported behavioral intention for yellow and amber warnings were notably smaller than those between red and amber warnings. This may be attributable to there being a higher number of people who rated yellow warnings as more concerning than amber versus people who rated amber as being more concerning than red. However, it may also reflect greater reported trust associated with red warnings, as well as the comparative rarity and salience of these events (i.e., with red warnings typically being issued no more than twice a year and receiving high media attention). Red warnings may also be interpreted as a call to take action in line with phrasing associated with color-coded warnings in earlier UK weather warning messaging (e.g., Neal et al., 2014), which is still used in online Met Office materials describing what a red warning implies (Met Office, 2023).

FIGURE 4 (a-c) Tests of direct and indirect effects of Location and Format on behavioral intention. Note: *p < 0.05, **p < 0.01, ***p < 0.001, β = unstandardized regression coefficient. Total effect of independent variable on behavioral intention is crossed out and reported next to direct effect of independent variable, with indirect effects below these (see Supporting Information section for further statistical details).
Taken together, these findings demonstrate that without additional information about likelihood and impact severity, red warnings tend to elicit the greatest perceived risk, trust, and willingness to undertake protective behaviors, and yellow the lowest. Red warnings, therefore, appear to provide a strong signal that a behavioral response is needed. By contrast, there is some indication that yellow warnings may be perceived as more ambiguous, with higher variability in responses and a notable minority of participants perceiving them to be more concerning than amber warnings.

4.2
Does providing information about likelihood and impact affect the perceived likelihood, impact severity, and concern elicited by warnings?
In comparing responses across HPMI, LPHI, and HPHI warnings in the Text and Risk Matrix conditions, we find that HPHI warnings consistently elicited higher ratings of perceived likelihood, severity, and concern than both HPMI and LPHI warnings. As expected, LPHI warnings elicited lower ratings of likelihood than HPMI warnings. However, they also elicited lower ratings of concern, indicating that the possibility of severe impacts did not lead participants to neglect probabilities. It may be that, for UK residents, the prospect of strong winds associated with amber warnings, which will have been experienced by participants, does not elicit dread in the same way as the events addressed by Sunstein (2003). However, the fact that HPHI warnings elicited greater perceived likelihood than HPMI warnings and greater potential severity than LPHI warnings in the Risk Matrix condition, despite having the same position on the likelihood and impact axes respectively (Figure 2), does suggest that red warnings may have particularly strong affective salience that leads other information to be disregarded or rendered redundant (Leonard, 1999; Silic & Cyr, 2016). This would also align with the salience theory of decision making under risk, which holds that where attention is directed toward one dimension of an outcome (in this case the color red), it may be overweighted relative to less salient outcomes (Bordalo et al., 2012). Indeed, we see that for red (HPHI) warnings responses on all dependent measures were very similar across the Color-only, Text, and Risk Matrix conditions, suggesting that they are always perceived to denote HPHI events irrespective of accompanying information. As the Met Office typically issue no more than two red weather warnings in a year (Met Office, 2021b), and associated events tend to be particularly serious and receive coverage in the news media, this may raise both the cognitive availability and affective salience associated with red warnings.
In considering whether LPHI weather events may be treated in accordance with prospect theory's weighting and editing functions, as suggested by Botzen and van den Bergh (2012) and Robinson and Botzen (2019) (i.e., with some underweighting small probabilities and others overweighting them), our analysis does provide some indication that LPHI warnings elicit more variable ratings of concern and severity than other warnings, though no difference in variability was found for perceived likelihood itself. This suggests that there is indeed less consistency in how LPHI warnings are interpreted and responded to. However, it must be noted that these differences were relatively small, and although the pattern was consistent, the use of repeated pairwise tests did inflate the chance of a Type I error.
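The inflation of Type I error under repeated testing can be illustrated with a short calculation. The sketch below is not part of the reported analysis: it assumes independent tests at a nominal alpha of 0.05, and the numbers of comparisons shown are hypothetical values chosen for illustration only.

```python
# Illustrative sketch only: familywise error rate under repeated testing.
# Assumes the tests are independent; the test counts below are hypothetical
# and are not taken from the study.

def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


for m in (1, 3, 6, 10):  # hypothetical numbers of pairwise comparisons
    fwer = familywise_error_rate(0.05, m)
    print(f"{m:2d} tests: familywise rate = {fwer:.3f}, "
          f"Bonferroni per-test alpha = {bonferroni_alpha(0.05, m):.4f}")
```

Under these assumptions, six uncorrected comparisons carry roughly a one-in-four chance of at least one spurious significant result, which is why small differences in variability should be read cautiously.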
Although format was not found to affect responses to red warnings, a more complex picture emerges for amber warnings. For perceived likelihood, effects tended to be in the expected directions, with ratings being significantly higher for the Risk Matrix condition than the Color-only condition for HPMI warnings, and significantly lower for the Text versus Color-only condition for LPHI warnings. However, when it came to perceived severity and concern, the Risk Matrix elicited the highest ratings across both HPMI and LPHI warnings, whereas Text elicited the lowest. This would seem to indicate that the Risk Matrix increased the affective salience of potential impact severity and associated concern, whereas Text with color attenuated this. As noted, earlier work has highlighted that there may be high variability in how verbal probabilities are interpreted (Budescu et al., 2009; Dhami & Mandel, 2021; Harris & Corner, 2011). Consistent with this, we found some indication that for amber warnings there was greater variability in reported concern amongst those presented with Text versus the Risk Matrix. However, this did not reach statistical significance for severity. It may be the case that the phrasing used in the Text condition was interpreted as ambiguous when not accompanied by the Risk Matrix. That is to say that stating that "there is a low likelihood of severe impacts" may have inadvertently downplayed the possibility of severe impacts rather than emphasizing that they may occur. Together, this suggests that further work exploring the use of different verbal expressions to capture different magnitudes of probability and impact could be beneficial in examining whether there are better ways to align verbal statements with intended meaning.

4.3
Does providing information about likelihood and potential severity affect trust in warnings?
As was the case for Color-only warnings, we find that trust in forecast was higher for red warnings than either of the amber warnings in the Text and Risk Matrix conditions. Again, this is consistent with earlier studies indicating that more severe warnings elicit greater trust (e.g., Losee & Joslyn, 2018; Taylor et al., 2019). However, in contrast to earlier work suggesting that providing uncertainty information can increase trust (Joslyn & LeClerc, 2012, 2016), we did not find that providing information about likelihood increased trust in forecast relative to the Color-only condition. Nonetheless, it should be kept in mind that trust in forecast is distinct from trust in forecast providers (Taylor et al., 2019). Here we find that, for HPHI warnings, trust in the Met Office and trust in the forecast provide unique contributions to the prediction of behavioral intention. We therefore postulate that in this case ratings of trust in forecast reflect confidence that an event will occur rather than beliefs about the quality of the information or forecast providers. Indeed, in the broader risk communication literature a conceptual distinction has been drawn between trust and confidence, with the former reflecting perceptions of others' values, and the latter performance (e.g., Siegrist et al., 2003, 2005). Consistent with this, we found that for those provided with likelihood information, trust in forecast was higher for HPMI warnings than LPHI warnings.
Together, our findings show that indicating that weather impacts are likely evokes greater trust in forecasts. Although low "trust in warning" does not necessarily imply low "trust in warning providers," it may contribute to a discounting of warning messages and limit intention to act. As taking protective action in vain can have financial and nonfinancial costs, it is not necessarily incorrect that recipients report low intention to act on low likelihood warnings. However, there may be instances where some form of behavioral response to an unlikely but potentially very severe event would be desirable. As LPHI warnings tend to be issued at longer lead times and then upgraded to HPHI or downgraded to "no warning" as lead times decrease, advisable behaviors might involve monitoring weather forecasts for updates or preparing to change plans if needed. In communicating with the public about LPHI events, operational forecasting services should be aware of this, potentially stressing the steps that are advisable when severe events are possible but unlikely.

4.4
Does providing information about likelihood and potential severity affect behavioral intention for wind warnings?
In comparing behavioral intention across warning levels, we find that in both the Text and Risk Matrix conditions, behavioral intention was highest for HPHI and lowest for LPHI. The fact that behavioral intention for HPMI warnings was greater than that for LPHI further demonstrates that where probabilistic information is provided for warnings it is not neglected, and that this probabilistic information informed behavioral intention. However, when behavioral intention for LPHI and HPMI warnings in Text and Risk Matrix format were compared with Color-only amber warnings, differences did not reach statistical significance. Hence, although behavioral intention may differ when probabilities are explicitly stated to be high versus low, this does not necessarily lead to differences in intention from a "no information" scenario. For HPMI warnings, behavioral intention was slightly higher for the Risk Matrix condition than the Text condition, with evidence that this is mediated by concern (i.e., with the Risk Matrix eliciting greater concern, which in turn led to higher behavioral intention). Indeed, in keeping with earlier work (Taylor et al., 2019), we found that concern was the strongest predictor of behavioral intention across all warning levels, with trust in forecast also making a consistent contribution.
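The logic of mediation by concern can be sketched with a simple product-of-coefficients decomposition. The example below is illustrative only and is not the procedure used in this study: the data are simulated, the effect sizes are arbitrary, and the variable names (`risk_matrix`, `concern`, `intention`) and the `ols` helper are hypothetical.

```python
# Illustrative sketch only: product-of-coefficients mediation on SIMULATED data.
# Effect sizes and variable names are hypothetical, not estimates from the study.
import numpy as np


def ols(y, *predictors):
    """Return OLS coefficients [intercept, b1, b2, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 500
risk_matrix = rng.integers(0, 2, size=n).astype(float)  # 1 = Risk Matrix, 0 = Text
concern = 3.0 + 0.5 * risk_matrix + rng.normal(0.0, 1.0, n)
intention = 2.0 + 0.6 * concern + 0.05 * risk_matrix + rng.normal(0.0, 1.0, n)

a = ols(concern, risk_matrix)[1]                    # Format -> concern
_, direct, b = ols(intention, risk_matrix, concern)  # direct effect; concern -> intention
total = ols(intention, risk_matrix)[1]              # Format -> intention, mediator omitted
indirect = a * b                                    # effect transmitted through concern

print(f"total={total:.3f}  direct={direct:.3f}  indirect={indirect:.3f}")
```

In linear models fitted to the same sample, the total effect equals the direct effect plus the indirect effect, so a small direct effect alongside a sizable indirect effect is the pattern that would be described as mediation by concern. In practice, a confidence interval for the indirect effect (for example, from bootstrapping) would be needed before drawing that conclusion.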
Based on our findings, we cannot conclusively say that one Warning Format is more effective than others in prompting protective behavioral intention. However, where probabilistic information is provided, higher likelihood events correspond with greater behavioral intention. It is therefore important that operational forecasting services be aware that stating that an event is low probability may limit willingness to take precautionary action. As noted, this may not in itself be maladaptive given the costs of taking certain actions in vain. However, it could be useful for forecast communicators to highlight the precautionary behaviors that would be recommended under these conditions.

4.5
Are there regional differences in warning response?
For red HPHI warnings, no effect of location on concern or behavioral intention was found. For amber HPMI and LPHI warnings, however, concern and behavioral intention were higher for those in Greater London than Yorkshire and Humber, with concern mediating the effect on behavioral intention. Although this is not in itself conclusive proof that those in Yorkshire and Humber perceive strong winds to be less threatening, it is consistent with anecdotal data suggesting that some in northern regions may perceive their area to be more resilient to winter weather impacts than those in the south due to experiencing these events with higher frequency and intensity (DJS Research, 2014). Indeed, this would align with findings in the broader field of risk research indicating that hazards perceived as more familiar can evoke lower perceived risk (e.g., Shavit et al., 2016). As noted, prior work shows that many people in the UK are unaware that warnings take regional vulnerabilities into account (Taylor et al., 2019). Hence, it is possible that warning recipients apply intuitive recalibration to warnings based on perceived vulnerability/resilience of their own location, unaware that this has already been incorporated into the warning. At a practical level, this suggests that more needs to be done to inform public audiences that warnings are regionally calibrated.

Limitations and future directions
Although experimental studies have the benefit of permitting randomization and standardization of messaging, a limitation of the approach is that it captures anticipated rather than actual behavior. Tracking behavioral responses to warnings in "real time" could therefore be a logical next step in this area. Indeed, recent Swiss work by Weyrich et al. (2020) has demonstrated the potential of using mobile apps for collecting data on forecast responses. Another possibility would be to longitudinally analyze social media responses to weather warnings across the duration of severe weather events for indicators of engagement with weather information, as well as cognitive, emotional, and behavioral responses during those events (Domingos et al., 2020; Gaspar et al., 2019, 2021). As seasonality may influence the affective salience of severe weather (Bruine de Bruin et al., 2016; Lefevre et al., 2015), the fact that this study was conducted in the UK summer may have reduced the salience of strong winds, typically associated with winter storms. Longitudinal studies may therefore be useful for capturing the effects of seasonality, as well as enabling comparison of responses to different types of weather event. As noted in Section 4.2, our finding that the Text condition seemed to attenuate concern when it came to LPHI events may have been due to the phrasing chosen. This highlights a critical need for further research on the effect of different types of phrasing to describe probability and impact to ensure that interpretations are aligned with communicators' intentions.

CONCLUSIONS AND IMPLICATIONS
This study has several implications for operational weather forecasting in the UK. Most positively, we find that red warnings are highly trusted, evoking high concern and behavioral intention, with responses being unaffected by region or additional probabilistic information. Hence, in cases where severe impacts are imminent, there is likely to be high public responsiveness to warnings. As noted, this may be attributable to the fact that red warnings can only take on one meaning (high probability of severe impacts), with experience of these events reflecting this. By contrast, amber and yellow warnings, although issued more frequently than red warnings, can take on multiple meanings, which are reflected in experience. Indeed, we find that responses to yellow warnings, which can take on the greatest number of meanings, tend to have greater variance.
Our findings show that stating that severe events are possible but unlikely diminishes perceived concern, trust in forecast, and intention to act. Care therefore needs to be taken around messaging about such events to avoid risks being discounted, while also recognizing that it may be maladaptive to undertake protective actions carrying high personal cost in response to them. One strategy may be to provide narratives about a "most likely" scenario, while highlighting that more severe conditions are a possibility and explicitly recommending that people keep abreast of weather updates, as uncertainty regarding whether an event will occur or not tends to diminish as forecast lead times become shorter. Nonetheless, this highlights the need for more research on how LPHI events can be effectively communicated to avoid both disproportionate worry and dismissal of the risk, especially around the vocabulary that should be used in doing this. Regional differences in concern and behavioral intention between Greater London and Yorkshire and Humber also highlight the need to convey that weather warnings are already regionally calibrated.
In terms of Warning Format, our findings do not allow for a clear and unambiguous recommendation about how information about severity and likelihood should be presented. However, for amber warnings we can state that risk matrices appear to increase perceived likelihood, potential severity, and concern, irrespective of where the tick in the matrix is. Whether this heightened perception of risk is desirable or not may be situationally dependent. Ultimately, this work provides a step toward better understanding of how people perceive and respond to impact-based weather warnings, and how this may be affected by additional information about likelihood and impact severity. Further exploration of how different approaches to phrasing and visualizations affect response will be important for informing the future development of weather warning communication in the UK.

ACKNOWLEDGMENTS
This work was supported by internal funding from Leeds University Business School's Small Grant Scheme. We thank the World Meteorological Organization's High Impact Weather (HIWeather) project for facilitating this collaboration and providing intellectual support for this study.

FIGURE 1 Example of the Met Office Risk Matrix for weather warnings.

FIGURE Description of each Warning Format condition. Note that the Text condition contains all of the information from the Color-only condition plus descriptive text about likelihood and impact severity, whereas the Risk Matrix condition contains all of the information from the Text condition plus a Risk Matrix visualization.

FIGURE Mean ratings of perceived likelihood, severity, concern, trust in forecast, and behavioral intention across each warning level and Format condition. Note: Error bars represent 95% confidence interval. *For the Color-only condition responses are identical for amber HPMI and amber LPHI warnings as these participants saw only one amber warning.
TABLE 1 Ratings of perceived likelihood, severity, concern, trust, and behavioral intention for yellow, amber, and red warnings in the Color-only condition (n = 184).
Abbreviations: A, amber; R, red; Y, yellow. *Significant at p < 0.05. **Significant at p < 0.01. ***Significant at p < 0.001.
TABLE 2 Frequency with which ratings of concern aligned or did not align with warning level order of yellow < amber < red.

Table 6
Effect of Location and Warning Format on perceived likelihood, severity, concern, trust, and behavioral intention across warning level (MANOVA with Bonferroni post hoc tests).
(see Supporting Information section for simple bivariate correlations). Model 1, where only Format and Location were entered, was significant for HPMI and LPHI warnings but overall variance accounted for was small (∼3.3%). Model 2, which added perceived likelihood, severity, concern, trust in forecast, and trust in the Met Office, accounted for substantially greater variance (>34% for all warning levels). For HPMI and LPHI warnings Location was a significant if small
TABLE 5 Abbreviations: CO, Color-only; GL, Greater London; HPHI, high-probability/high-impact; HPMI, high-probability/moderate-impact; LPHI, low-probability/high-impact; RM, Risk Matrix; T, Text; Y&H, Yorkshire and Humber. *Significant at p < 0.05. **Significant at p < 0.01. ***Significant at p < 0.001.
Ordinary least squares (OLS) regression examining the predictors of behavioral intention at each warning level.