How do operational meteorologists perceive model performance for elevated convection?

Operational Meteorologists (OMs) in the Met Office have a perception that elevated convection is not well represented in kilometre‐scale models, which are generally associated with an improved representation of convection. Here, we consider why there may be a problem with representing elevated convection and how OMs judge the model to be poor so often. Three OMs have subjectively scored and classified observed elevated convection cases over the UK from 2017 to 2020. Continental plumes (warm, moist air coming from the near continent or Africa) account for 73% of the cases. The most frequent errors are associated with (i) location, (ii) organisation, (iii) timing and (iv) intensity of the convection. Thus, OMs perceive that the biggest problem with predicting elevated convection is constraining the location of the convective events. The location errors are particularly prevalent for events reaching the UK from the near continent, and are most frequently identified in weakly forced synoptic conditions. The identification of this problem enables the specific targeting of research into continental plumes (for UK elevated convection) but also raises questions around the role of lateral boundary conditions in the forecasts of elevated convection.

presented is somewhat misleading and open to interpretation (e.g., Corfidi et al., 2008). The debate around the scientific definition of elevated convection continues within the literature. For example, Corfidi et al. (2008) state that convection should be treated as a continuum between 'purely' surface-based and 'purely' elevated, while Marsham et al. (2011) and Nowotarski et al. (2011) argue that a storm should not be considered elevated if it ingests any air originating from the surface. At present, the Berry et al. (1945) definition remains within meteorological glossaries (e.g., AMS, 2022, last accessed 27 September 2023), so this definition is used throughout forecasting procedures.
Part of this lack of definition may be due to the wide variety of formation mechanisms that elevated convection can be linked to. These include pristine initiation (Wilson et al., 2018), bores and density currents (Haghi et al., 2017), frontal overrunning (Moore et al., 2003), pre-existing convection (Lima de Figueiredo et al., 2019), low-level jet termination (Zhang et al., 2019), transition from surface-based convection (Corfidi et al., 2008) and different forcing compared with surface-based convection (Wilson & Roberts, 2006). These different mechanisms have led to a lack of a unified diagnostic that can identify elevated convection more objectively (e.g., Flack et al., 2023). Therefore, research on elevated convection relies on either (i) subjective identification of elevated convection, (ii) specific mechanisms or (iii) case studies.
Several studies have created type-specific climatologies of elevated convection. These studies indicate increased hail, increased positive lightning and increased precipitation accumulations for both nocturnal and frontal overrunning convection (Colman, 1990; Horgan et al., 2007; Reif & Bluestein, 2017), and also in comparison with surface-based convection (Kastman et al., 2017), which shows why OMs tend to split convection into surface-based and elevated. On the other hand, some studies have taken a process-based route, looking into sensitivities to instability and moisture through idealized studies (Schumacher, 2015) or detailed examination of case studies indicating a wide range of behaviours and mechanisms (Browning et al., 2012; He et al., 2018; Marsham et al., 2011; Wilson et al., 2018). These process-based studies argue more for the spectrum definition, to improve understanding of the differences between surface-based and elevated convection rather than to treat them separately.
Model behaviour relative to reality varies widely: some examined cases perform poorly (e.g., White et al., 2016), while others show a reasonable approximation with apparently high predictability (e.g., Zhang et al., 2019). However, weather and climate models show poor representation of nocturnal convection (e.g., Becker et al., 2021; Tang et al., 2021).
While the climate runs show systematic trends, it is hard to determine if case studies show systematic behaviours. The Plains Elevated Convection at Night (PECAN) field campaign (Geerts et al., 2017) offered an opportunity for verification of multiple elevated convection events to indicate if there was any systematic behaviour present. Both Stelten and Gallus (2017) and Weckwerth et al. (2019) showed that elevated convection events during PECAN had a low probability of detection within forecasts and that model performance depended somewhat on the formation mechanism: convection initiated by pre-existing convection was forecast better than elevated convection initiated in a pristine environment. Anecdotally, OMs indicate that elevated convection is the most common situation in which 'poor model performance' occurs during the summer, and mention similar problems to those identified in PECAN and the previous studies mentioned (Met Office Guidance Unit; personal communication 2020), supporting these assertions.
The comments from OMs, including from those within the Met Office Guidance Unit (who are responsible for the 'weather story' and issuing of weather warnings), tend to focus on regional kilometre-scale models, which in theory should have more realistic convection (e.g., Clark et al., 2016) but are known to still have problems (e.g., Stein et al., 2015). Thus, given the potential hazards and indication of a problem, it is useful for the community to understand what is meant by OMs when they indicate that there is a problem with the representation of elevated convection.
A priori, the comments from OMs are not surprising. The representation of elevated convection is complex, often with many mechanisms happening simultaneously leading to its formation (e.g., Parsons et al., 2019) or its transition between surface-based and elevated convection, or vice versa (e.g., Parker, 2021). To help the research community understand the challenges faced by OMs, a close partnership between OMs and model developers has been established at the Met Office. The partnership aims to identify the 'elevated convection prediction problem' in regional models to understand what is meant by 'poor model performance'. The idea is to lead to model improvements that will be detectable during operations. Therefore, the work presented here is preliminary work based on the Berry et al. (1945) definition of elevated convection.
Specifically, we focus on the following questions:
• What do OMs mean by poor model performance for elevated convection?
• When do these poor forecasts occur?
This article is set out as follows: the cases, classification, model and forecast scores are described in Section 2; results are discussed in Section 3; and conclusions are drawn in Section 4.

2 | METHODS
The research questions are considered directly from an operational perspective for a selection of elevated convection cases. The cases are identified from observations (Section 2.1). The cases are then classified (Section 2.2), and the model (Section 2.3) is subjectively scored against observations (Section 2.4).

2.1 | Case identification
Convective events over the British Isles between April and October during 2017-2020 are subjectively identified from radar and satellite imagery and classified according to whether they are elevated at some point within their life cycle. The cases are confirmed to be elevated within their life cycle by the analysis of tephigrams, aircraft meteorological data relays, radar and satellite imagery. The first two observation sets are used to determine if there are any elevated mixed layers that are not in contact with the surface using standard parcel theory. The radar and satellite imagery are used (i) to help identify if precipitation occurred and (ii) to gauge the influence of surface fluxes depending on the time of day and cloud cover.
Eighty-five elevated convection cases were identified, 51 of which had weather warnings. The warned cases are considered further in the analysis.

2.2 | Case classification
The 51 cases are subjectively classified into regimes based on (i) large-scale forcing, (ii) source region of the elevated air mass and (iii) the life cycle. The classifications are based on observations and model analyses. However, where the observations or analyses are not available, the Climate Forecast System (CFS) Reanalysis (Saha et al., 2010; available from: https://www.wetterzentrale.de/en/reanalysis.php?map=1&model=cfsr&var=2, last accessed 9 March 2023) has been used.

2.2.1 | Large-scale forcing
Large-scale forcing is categorized into five levels from weakly to strongly forced. This categorization is based on large-scale patterns in the 500 hPa geopotential height and the presence of forcing from 300 hPa potential (and relative) vorticity signatures. Strong cyclonic curvature coupled with cyclonic vorticity advection into the region will lead to the classification as a strongly forced regime. On the other hand, anticyclonic curvature, or an anticyclonic centre, with zero to negative vorticity advection will lead to the classification as a weakly forced regime. This classification uses model analyses and satellite imagery (including water vapour channels).

2.2.2 | Source of elevated air mass
The elevated air mass associated with the convection has been identified as a region of warm, moist unstable air (hereafter, a plume) using the 850-hPa temperature of the CFS reanalysis. The source regions were identified using reanalysis charts for the preceding days to subjectively determine where the air mass originated. The source regions were defined as Atlantic, continental, homegrown, or a mixture. The identification was primarily based on the direction that the plume came from. Atlantic plumes predominantly came from westerly sectors and had no (to limited) interaction with land; continental plumes were from southerly and easterly sectors, thus encompassing the three environments in which Mesoscale Convective Systems tend to occur in the UK described in detail by Lewis and Gray (2010). Extra attention was paid to plumes from a southwesterly direction. If there had been a wider Atlantic influence, these were classified as Atlantic cases, whereas if there was stronger continental influence from North Africa, the Iberian Peninsula, or north-west France, they were described as continental. Homegrown events were from near-stationary plumes that persisted over multiple days. Multiple days were required as the UK environment needed to modify the characteristics of the air mass.

2.2.3 | Convective life cycle
The final classification was based on the life cycle of the convective events: 'purely' elevated (cases remaining elevated throughout their life cycle), elevated-to-surface transition, surface-to-elevated transition and 'grey-zone' cases (cases for which it was unclear which category they should sit in). The identification of the life cycle follows 'current practices' outlined in Flack et al. (2023). These practices use available tephigrams along the convective tracks to determine the presence of instability (surface-based or elevated). Furthermore, analysis of pressure, geopotential height, wind, temperature and divergence fields, alongside local operational and research knowledge of expected behaviour of the convection, is used to supplement the ascents.

2.3 | Model description
The operational kilometre-scale configuration of the Met Office Unified Model (UM) over the UK was used: the United Kingdom Variable (UKV) resolution configuration (Tang et al., 2013). The UKV is a deterministic model with an interior grid length of 1.5 km. During 2017-2020, seven operational suites (OS) defined the settings of the model: OS38-OS44. The science configuration is defined by the regional atmosphere and land (RAL) settings. Until July 2017, it was defined by RAL0 (Bush et al., 2020). Afterward, RAL1 for the midlatitudes (Bush et al., 2020) became operational until December 2019. Finally, the RAL2 midlatitude (M) configuration (Bush et al., 2023) defined the settings until the end of the period. There are minimal differences between RAL1M and RAL2M, with most being related to incremental changes (Bush et al., 2023).

2.4 | Case scoring
Three OMs, from a cross-section of different forecasting applications across the Met Office who have indicated the 'elevated convection prediction problem,' subjectively verified the UKV precipitation field against an equivalent radar-derived rainfall product.This subjective verification is based solely on the model field and not preconceptions or assessments of warning quality.
Seven scores were used to understand different aspects of the forecast. Table 1 describes the scores, and guidance, for the different components. Subjective scoring has been the basis of this work because the observations in national observing networks that allow identification of elevated convection are sparse. It allows the verification to focus on the elevated convection events identified by the OMs. This focus directly addresses the first scientific question we aim to answer and has distinct advantages over more objective approaches (e.g., Jahedi & Méndez, 2014; Seshadrinathan et al., 2010). Subjective methods are particularly appropriate here given the lack of a consensus on appropriate diagnostics to identify elevated convection.
Three sets of forecasts were considered in which convective initiation occurs between 3 and 8 h, 15 and 20 h, and 27 and 32 h, respectively. These time ranges are chosen based on their importance for the issuing and escalation of severe weather warnings for convective events; the time ranges also allow the variation of the forecasts to be considered. The variation between the forecasts allows the OMs to make a judgment on the consistency of the forecasts, and as such represents a measure of confidence over time. This consistency was measured through a score of unity for a consistent forecast (one that led to a repeated message for the convection) or zero for an inconsistent forecast (one that gave mixed messages in terms of presence, amount or location of the convection).
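As an illustration of how the binary consistency score can be summarised, the following minimal Python sketch computes the fraction of cases judged inconsistent. The function name and the example scores are hypothetical and are not data from this study; the sketch only assumes the 1/0 scoring convention described above.

```python
def inconsistent_fraction(consistency_scores):
    """Return the fraction of cases scored 0 (inconsistent).

    Each entry must be 1 (consistent forecast sequence) or
    0 (inconsistent forecast sequence), as described in the text.
    """
    if not consistency_scores:
        raise ValueError("no scores supplied")
    if any(score not in (0, 1) for score in consistency_scores):
        raise ValueError("consistency scores must be 0 or 1")
    n_inconsistent = sum(1 for score in consistency_scores if score == 0)
    return n_inconsistent / len(consistency_scores)


# Hypothetical scores for five cases (illustrative only).
example_scores = [1, 0, 1, 1, 0]
print(f"{inconsistent_fraction(example_scores):.0%} of forecasts judged inconsistent")
```

Averaging such fractions over the scorers would give one possible route to an overall percentage, although the exact aggregation used in this work is not specified here.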

3 | RESULTS
The observational context (Section 3.1) and model scores (Section 3.2) are initially considered separately. These factors are then combined to help determine when the model performance tends to be poor (Section 3.3).

TABLE 1 A summary of the subjective scores, and their guidance, used to assess the performance of elevated convection in the model compared with radar observations. Note: A score of 0 is the worst and a score of 3 is the best.

3.1 | Observational context
The observational context is required to understand factors that could bias the interpretation of the subjective scores. For example, if one of the classifications occurred more frequently, there is a chance that more forecast busts could occur because of the increased frequency.
There is a dominance of cases being associated with continental plumes (73%) compared with Atlantic plumes (18%; Figure 1a). Furthermore, weaker forcing tends to occur more frequently in continental plumes; stronger forcing is more frequent in Atlantic plumes. This result agrees with climatologies linking convection and forcing over the UK, with weakly forced events having a stronger association with winds from a southerly or easterly sector (e.g., Flack et al., 2016).
The dominant large-scale forcing has strengths of strong/moderate to moderate (Figure 1b). The life cycles are relatively evenly split across these two dominant forcing situations. There are fewer cases for the other forcing categories, and as such, those results may not be representative but are included for completeness.

3.2 | Scores
The relative frequency histograms of the scores by the three OMs indicate certain factors are viewed as poorer compared with others (Figure 2). Specifically, the poorest scores occur for the position, envelope, fragmentation and dissipation. The best scores are for the peak intensity. The distributions for dissipation, convective mode (e.g., the structure of the storms) and the fragmentation are all skewed towards lower scores. On the other hand, the position and envelope are more normal in their distribution (with skewness less than 0.1), suggesting that the positions of the convective events are generally displaced within a county or warning-region scale (Table 1). The scores tend to degrade with forecast lead time (not shown).
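The two summary statistics used above, relative frequency and skewness, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the example scores are synthetic, and the moment-based skewness coefficient is an assumption, since the estimator used for the quoted skewness values is not stated.

```python
from collections import Counter


def relative_frequencies(scores, levels=(0, 1, 2, 3)):
    """Relative frequency of each score level (0 = worst, 3 = best)."""
    if not scores:
        raise ValueError("no scores supplied")
    counts = Counter(scores)
    return {level: counts.get(level, 0) / len(scores) for level in levels}


def moment_skewness(scores):
    """Moment-based skewness g1 = m3 / m2**1.5 (assumed estimator)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n  # second central moment
    m3 = sum((s - mean) ** 3 for s in scores) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant scores")
    return m3 / m2 ** 1.5


# Synthetic examples (not data from the study).
symmetric_scores = [0, 1, 1, 2, 2, 3]    # symmetric about 1.5
low_heavy_scores = [0, 0, 0, 1, 1, 3]    # most scores low, tail towards high

print(relative_frequencies(symmetric_scores))
print(moment_skewness(symmetric_scores))   # zero for a symmetric sample
print(moment_skewness(low_heavy_scores))   # positive: bulk at low scores
```

With this estimator, a distribution whose bulk sits at low scores with a tail towards high scores has positive skewness, whereas a near-symmetric distribution gives a value close to zero.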
The theme with the lowest overall scores is that of location, followed by organisation (Figure 2). Therefore, when an OM refers to a poor forecast of elevated convection, the main error is the location of the event.
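One way to rank themes from the individual scores is sketched below in Python. It is a hedged illustration rather than the procedure used here: the grouping of position and envelope into location, and of convective mode and fragmentation into organisation, follows the text, whereas grouping initiation and dissipation into timing, the use of a simple mean, and all names and example values are assumptions.

```python
from statistics import mean

# Grouping of the seven scores into themes. "location" and "organisation"
# follow the text; the "timing" grouping is an assumption for illustration.
THEMES = {
    "location": ("position", "envelope"),
    "organisation": ("mode", "fragmentation"),
    "timing": ("initiation", "dissipation"),
    "intensity": ("intensity",),
}


def theme_means(case_scores):
    """Mean score per theme over all cases (each case maps score name -> 0..3)."""
    return {
        theme: mean(case[name] for case in case_scores for name in names)
        for theme, names in THEMES.items()
    }


def rank_themes(case_scores):
    """Themes ordered from lowest (poorest) to highest mean score."""
    means = theme_means(case_scores)
    return sorted(means, key=means.get)


# Two hypothetical cases (illustrative values, not data from the study).
cases = [
    {"position": 1, "envelope": 1, "intensity": 3, "initiation": 2,
     "dissipation": 2, "mode": 1, "fragmentation": 2},
    {"position": 0, "envelope": 1, "intensity": 2, "initiation": 2,
     "dissipation": 1, "mode": 2, "fragmentation": 1},
]
print(rank_themes(cases))
```

The example values were chosen so that location has the lowest mean; they illustrate the mechanics only and carry no information about the real score distributions.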
The four errors identified here are known problems for convection in kilometre-scale models, having been shown previously for location and intensity in Stein et al. (2015), organisation in Clark et al. (2016), and organisation and timing in Roberts et al. (2023).
A further aspect to consider is the forecast consistency with lead time. The forecast consistency score (not shown) indicated that, on average, 35% of the forecasts were viewed as inconsistent (i.e., did not converge towards a common solution). The lack of forecast consistency reduces confidence in the model as it is not converging towards a common solution and, over time, can erode trust in the model's ability to predict elevated convection. This erosion of trust is a known psychological trait, as seen before in Burgeno and Joslyn (2020). The consistency score tends to be focused on the position and timing of the convection (Figure 3). Therefore, these two factors are viewed as important for generating a consistent message.

FIGURE 2 Relative frequency histograms for the seven different scores. The black solid line represents the score for the position of the event, the black dashed line represents the score for the convective envelope, the blue line represents the intensity, the red solid line represents the score for convective initiation, the red dashed line represents the score for convective dissipation, the purple solid line represents the score for convective mode (e.g., storm structure) and the purple dashed line represents the score for convective fragmentation. The scores are described in Table 1.

FIGURE 3 … showing the Spearman's correlation between all factors examined. The different sources of plumes have been given a value from 1 to 5, with 1 representing homegrown events, 3 representing Atlantic plumes, 5 representing continental plumes and values of 2 and 4 representing the appropriate mixed plumes. The large-scale forcing has also been quantified to values between 1 and 5 (weak to strong forcing). The life cycle has also been quantified with a value from 0 to 3: 0 represents 'grey-zone' convection (convection that does not distinctly fit in any of the regimes), 1 is for surface-to-elevated transitions, 2 is for elevated-to-surface transitions and 3 is for 'purely' elevated events. 'L-S' refers to large-scale and 'Fragment' refers to the fragmentation score. Correlations that are significant at the 95% level are shown.

3.3 | Relationships between scores and observational context
Understanding the context behind the scores and the scores themselves allows further investigation into the situations where the poor forecasts occur. To make inferences from the data, all scores and categorizations have been tested for associations (Figure 3).
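The association test can be illustrated with a minimal, standard-library Python sketch of Spearman's rank correlation (Pearson correlation of tie-averaged ranks) applied to ordinally encoded categories. The plume-source values follow the encoding given in the caption of Figure 3; the function names, dictionary and example cases are hypothetical, and the sketch is not the code used to produce Figure 3.

```python
from math import sqrt

# Ordinal encoding of plume source from the Figure 3 caption
# (values 2 and 4, used for mixed plumes, are omitted in this sketch).
PLUME_SOURCE = {"homegrown": 1, "atlantic": 3, "continental": 5}


def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of the 1-based tied ranks
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical cases (illustrative only): plume source and position score.
sources = ["continental", "continental", "atlantic", "homegrown", "atlantic"]
position_scores = [0, 1, 2, 3, 3]
rho = spearman([PLUME_SOURCE[s] for s in sources], position_scores)
print(f"Spearman correlation: {rho:.2f}")
```

In this synthetic example the correlation is negative, meaning lower position scores for the higher-valued (continental) plume category; a significance test would still be needed before drawing any inference.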
All scores show a positive correlation with each other (except for intensity and initiation), indicating that if one aspect of a forecast does well, all the other aspects tend to show positive scores (Figure 3). While a useful insight into the scores, it does not answer the second question posed in this article. Therefore, the correlations between the scores and the regime classifications (life cycle, large-scale forcing and plume source) are considered.
The convective life cycle is negatively correlated with intensity (Figure 3). This correlation implies that the greater the elevated influence on the storm, the worse the intensity forecast is. The poor scores for more elevated systems could be linked to the forecasts being for the surface and the impact of evaporation of precipitation, combined with the fact that the radar does not measure precipitation at the surface. The life cycle is also associated with the fragmentation, suggesting that the model tends to organise elevated systems better than surface-to-elevated transitions.
Large-scale forcing indicates positive correlations with the scores for dissipation, envelope and consistency (Figure 3). This correlation suggests that forecasts are better for strongly forced situations compared with weakly forced situations for elevated convection. For the convective envelope, this relationship is well known for convection and convective predictability, having been shown by Done et al. (2006), Keil and Craig (2011), Keil et al. (2014) and Flack et al. (2018). All these studies indicate that the area-averaged rainfall is more predictable, across ensemble members, for strongly forced regimes compared with weakly forced regimes.
Finally, the source of the elevated air mass (plume source) indicates improved forecast scores for homegrown and Atlantic plumes compared with continental plumes for position, intensity, convective envelope and convective mode (negatively correlated in Figure 3). Part of this relationship will be linked to the large-scale forcing (weaker forcing is generally associated with continental plumes, Figure 3). However, it is plausible that the multiple changes in surface type across the English Channel region could have an influence on the convection or how it propagates or advects towards the British Isles during continental plumes. For example, Norris et al. (2013) showed that the English Channel had an impact on the representation of convective snowbands.

4 | CONCLUSIONS
Elevated convection is perceived to be a difficult forecasting challenge, and numerical weather prediction models tend to do poorly in its representation (e.g., Stelten & Gallus, 2017; Weckwerth et al., 2019). However, to date, there has not been an understanding of what the expert users of the model (i.e., the OMs) perceive to be the main problems associated with the prediction of elevated convection. Therefore, we have investigated forecast performance for cases identified as elevated convection by OMs over the British Isles during the warm seasons of 2017-2020 (85 cases, of which the 51 with weather warnings were considered further). The specific questions we focus on answering are as follows:
• What do OMs mean by poor model performance for elevated convection?
• When do these poor forecasts occur?
From subjective scoring by three OMs, the answers to these questions were determined to be as follows.
• Poor location (position and convective envelope) and organization (convective mode and fragmentation) of the convective events readily identify poor model performance (Figure 2).
• Poor forecasts tend to occur most frequently over the British Isles during continental regimes and under weak large-scale forcing (Figure 3).
The sample used here is small. However, given the cross section of applications covered by the three OMs scoring the forecasts, there is enough confidence in this preliminary work to inform the direction of focus for model developers. Further confidence can be gained from comparison of these results with the current literature where the problems identified from more objective verification measures have been identified as the location (Stelten & Gallus, 2017) and lack of organisation (White et al., 2016). However, unlike the literature (that focused on elevated initiation), the timing of the convection (while still a problem) was not considered the biggest problem by OMs.
The answer to the second question, while potentially being biased by the frequency of continental plumes, provides statistically significant results at the 95% confidence level and has agreement in the literature around the forcing aspects in which reduced predictability is shown for weakly forced events (e.g., Done et al., 2006;Keil & Craig, 2011).
From an operational perspective, for UK audiences, we have identified a situation for which OMs need to take extra care when it comes to forecasting convective behaviour: continental plumes in weakly forced situations. However, there are also implications from a regional modelling perspective. The model development implications focus on the role of the lateral boundary conditions, the influence and representation of elevated continental plumes in terms of stability and moisture, and how convection (that could potentially form on the near-continent) reaches/advects into the British Isles. Public perception could also be an interesting avenue to investigate; however, because of the technical nature of the specifics of elevated convection, it could be a greater challenge to understand the public perception for this topic.
The representation of convection, as a whole, is challenging (e.g., Clark et al., 2016). The research to put the 'elevated convection prediction problem' identified here in the context of the convective spectrum is ongoing and of paramount importance to help improve all convective forecasts.

FIGURE 1 A summary of the categorizations and some of their relations for (a) the frequency of the source regions coloured by the large-scale forcing and (b) the frequency of the different strengths of large-scale forcing coloured by the elevated life cycle. The x-axis in each of the plots provides shortened names for the different categories. In (a), Cont. refers to continental plumes, Atl. refers to Atlantic plumes, Mix. refers to mixed plumes and HG refers to homegrown plumes; whereas in (b), S is strong forcing, S/M is strong/moderate forcing, M is moderate forcing and M/W is moderate/weak forcing.