Thunderstorm tracking in Northwest Europe for enhanced hazard preparedness

The tracking of thunderstorms provides critical information on their frequency and behaviour for early warning, prediction and preparedness. Thunderstorm tracking has previously been constrained by the boundaries of a particular country or has focused on a particular category such as severe thunderstorms and mesoscale convective systems. However, less severe thunderstorms also pose a risk to life and property and occur more frequently and thus warrant inclusion in investigations of thunderstorm behaviour. In this paper, we present a new thunderstorm event catalogue, including all detected thunderstorms with at least three lightning flashes, derived from a tracking methodology applied to Northwest Europe within the bounds of 48° to 65°N and 15°W to 10°E between 2008 and 2018. The catalogue is based on ATDnet lightning flash data which was clustered into thunderstorms using a spatio‐temporal proximity assessment lightning clustering code written in the R coding language. The thunderstorm lightning clusters enabled the production of thunderstorm behaviour statistics such as speed, direction of movement, lightning flashes per minute and duration. This revealed that in winter, thunderstorms are shorter lived and move faster than in summer as well as more typically tracking from the west rather than from the south‐west. Thunderstorm behaviour characteristics were attributed to weather pattern types for the first time, providing probabilistic data that can be attributed to synoptic conditions. This can improve preparedness and early warning. Such results demonstrate how this new thunderstorm event catalogue, which includes all types and severities of thunderstorms unconstrained by political boundaries, can provide additional important information to enhance understanding of thunderstorm behaviour in the region. Our lightning clustering method may be useful for similar studies in other locales.


| INTRODUCTION
Thunderstorms produce severe weather such as lightning, heavy rainfall, hail, thunder-snow, strong winds and tornadoes (Meyer et al., 2013; Wapler, 2013). This severe weather can result in injury, death (Elsom & Webb, 2017), damage to property and disruption to services (Sibley, 2012). North-western Europe experiences thunderstorms with variable frequency both seasonally and spatially across the region (Hayward et al., 2022).
There have been several studies reviewing lightning and thunderstorm occurrence both on the regional scale (e.g., north-western Europe) and on the continental scale (e.g., Europe). The European continental scale research shows that thunderstorm frequency and lightning flash density decrease from southeast to northwest (Anderson & Klugmann, 2014; Enno et al., 2020; Taszarek et al., 2021; van Delden, 2001). The spatial distributions of thunderstorms and lightning flash density also vary seasonally, which is noted both on the continental scale (Anderson & Klugmann, 2014) and the regional scale (Hayward et al., 2022; Holley et al., 2014; Holt et al., 2001; Stone et al., 2022). This shows a general pattern of peaks in activity occurring in spring and summer over land areas, moving to coastal areas in the autumn and focused along northern/western facing coastlines in more northern latitudes in the winter.
Details of thunderstorm behaviour have also been reported through analysis and reconstruction of thunderstorm tracks. Thunderstorm tracking in the UK has previously been undertaken for a specific thunderstorm category known as mesoscale convective systems (MCS) (Gray & Marshall, 1998; Lewis & Gray, 2010). MCSs are thunderstorm systems which comprise multiple thunderstorm cells covering an area of at least 100 km in one direction (Webb, 2016). The tracks were produced using multiple datasets including manual thunderstorm observations, newspaper archives, satellite-derived cloud cover data and lightning detection data. The MCS tracks were manually recorded and distinguished from single cell, multicell or small thunderstorm complexes by the area covered and their duration. This enabled construction of a detailed climatology of annual, seasonal and monthly MCS distributions, time of initiation and attribution of track characteristics to synoptic weather conditions. It was found that MCSs occur on average twice a year and exclusively between May and September, peaking in August. Most MCSs have onset times between 3 and 9 p.m. Track characteristics were linked to synoptic conditions, assisting forecasters in predicting MCS occurrence and movement. This will be of particular importance should predictions of increased precipitation, intensity and volume as a result of mesoscale convective systems in the future prove accurate (Chan et al., 2023). However, MCSs are not the only damaging type of thunderstorm in the UK and Northwest Europe, and tracking different thunderstorm types will provide useful information to help predict severe weather. As MCSs occur relatively infrequently, tracking them does not necessarily require automation. An example of automated tracking was conducted in Belgium using EUCLID lightning location system data (Poelman, 2014).
That study used an established tracking algorithm known as A-TNT, which identified thunderstorm cells defined by spatial and temporal proximity of lightning occurring within a 4 km search radius over a 3.5-min time period, undertaken at 3-min intervals. The tracks were created by joining the correct clusters using the 30-s time overlap to produce a spatial overlap. Data were obtained on cell area, flashes per cell and cell movement. It was found that in Belgium, thunderstorm cells had a mean lifetime of 21 min, a mean area of 140 km², a mean velocity of 31 km/h and favoured a southwest to northeast direction of movement.
In the USA meanwhile, state-wide thunderstorm tracking has been successfully used to help mitigate thunderstorm damage to power supply infrastructure (Mohee & Miller, 2010). The project took place in North Dakota where the authors were able to gather statistics for thunderstorm characteristics such as track length, speed, direction of movement, wind speed and duration for different regions within the state and identify how these varied based on time of day and year. This thunderstorm dataset was used to develop a risk model for electricity transmission lines and is a good example of how thunderstorm tracking could also help mitigate losses in Europe.
The spatiotemporal distribution of thunderstorm initiations was also analysed using thunderstorm tracking of the US Great Plains. This enabled both a greater understanding of the different thunderstorm formation mechanisms as well as identification of 'hot spots' for thunderstorm production. Increased understanding of thunderstorm formation 'hot spots' and their likely subsequent direction of movement can be similarly useful to thunderstorm forecasters and in early warning systems in other vulnerable regions.
The overall aim of this study is to use a lightning clustering technique to compile a new thunderstorm catalogue that not only contains dates, times and locations for thunderstorms but also variables such as duration, direction of movement and flash intensity. This study focuses on the UK and Ireland but includes neighbouring continental regions within the bounds of 48° to 65°N and 15°W to 10°E. By extending boundaries beyond the UK and Ireland, this research is able to include more complete track information for thunderstorms entering the UK having first originated over the continent and also places our regional understanding of Northwest European thunderstorm movement and characteristics in a wider context, unconstrained by political borders.
A preliminary investigation of the thunderstorm variables contained within the new thunderstorm catalogue is undertaken to assess the accuracy of the dataset and to quantify the confidence with which it can be used by industry, decision makers, forecasters and members of the public. Preliminary results are also examined to identify trends and patterns of thunderstorm behaviour as well as identify areas which would benefit from further research.

| Data
ATDnet lightning flash data have been provided by the UK Met Office. It is a very low-frequency long-range lightning location system, exploiting the time of arrival technique for locating the origin point of lightning emissions by matching their waveforms and using known waveform propagation rates (Enno et al., 2020). The system is designed to detect cloud-to-ground (CG) lightning events by identifying their distinctive 'return stroke' but also detects some cloud-based (IC) lightning. In France, ATDnet has been demonstrated to detect up to 25% of IC flashes and 90% of CG flashes (Enno et al., 2016).
The ATDnet system was chosen because it has not had a significant upgrade since 2008 and therefore offered temporally homogeneous data collection throughout the 11-year period (2008-2018) for which data have been made available. The ATDnet waveform propagation style of lightning detection is known to be affected by variation in the height of the ionosphere (Hudson et al., 2016), which may cause some diurnal variation in lightning flash detection rates. Ideally, such inhomogeneities could be mitigated by using multiple lightning detection datasets (Hayward et al., 2022); however, diurnal lightning activity variation is beyond the scope of this study.

| Lightning clustering
This study employs a clustering method to identify lightning flashes which belong to an individual thunderstorm or complex. Our method, created in the R programming language (R Core Team, 2020), clusters by quantifying the spatio-temporal proximity of lightning flashes to one another. Thunderstorm identification through spatiotemporal clustering of lightning data was successfully demonstrated by Galanaki et al. (2018). The Galanaki method starts with one lightning stroke and adds any other strokes within a prescribed time and distance to create a cluster. This is then repeated for each stroke in the cluster and for each additional stroke that is added until there are no more strokes that fulfil the prescribed criteria. Our method works on a similar premise, but once the initial clusters are created a centroid for each cluster is calculated, from which the clusters can then be grouped into thunderstorms. Our method follows three key steps:
1. It identifies the first flash in the dataset and groups this and any other flashes within a defined temporal (T1) and distance (D1) threshold into a cluster which is then given a unique ID number. When flashes are grouped into clusters they are removed from the original dataset.
2. The mean latitude, longitude and datetime (centroid) are calculated for the cluster and all additional flashes from the original dataset that are within the T1 and D1 thresholds of this central point are extracted and added to the cluster.
3. Once all lightning clusters have been formed from the original dataset, they are grouped into thunderstorms by identifying all the cluster centroids that are within a prescribed temporal (T2) and spatial (D2) proximity to one another. Each thunderstorm is given a unique ID number.
Figure 1 illustrates these three steps. This method clusters the lightning flashes to produce a thunderstorm 'footprint' which will include any splitting or merging cells.
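The three steps can be illustrated with a minimal Python sketch. This is not the authors' R code: the names `Flash`, `cluster_flashes` and `group_storms` are hypothetical, distances use a great-circle (haversine) approximation, and the default thresholds are the values of T1, D1, T2 and D2 selected later in this section. Two points that the description leaves open are treated as assumptions here: the centroid step is applied once per cluster, and clusters are joined into a storm transitively (clusters linked through a chain of nearby centroids share one ID).

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Flash:
    t_min: float  # time in minutes since an arbitrary reference
    lat: float    # degrees north
    lon: float    # degrees east


def haversine_km(a, b):
    """Great-circle distance in km between two points with lat/lon attributes."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def near(a, b, t_max, d_max):
    """True if b lies within both the time and distance thresholds of a."""
    return abs(a.t_min - b.t_min) <= t_max and haversine_km(a, b) <= d_max


def centroid(flashes):
    """Mean time, latitude and longitude of a group of flashes."""
    n = len(flashes)
    return Flash(sum(f.t_min for f in flashes) / n,
                 sum(f.lat for f in flashes) / n,
                 sum(f.lon for f in flashes) / n)


def split(points, ref, t_max, d_max):
    """Partition points into those near ref and those not near ref."""
    inside = [p for p in points if near(ref, p, t_max, d_max)]
    outside = [p for p in points if not near(ref, p, t_max, d_max)]
    return inside, outside


def cluster_flashes(flashes, t1=38.0, d1=14.5):
    """Steps 1 and 2: seed a cluster on the earliest remaining flash, then
    add remaining flashes that are near the cluster centroid."""
    remaining = sorted(flashes, key=lambda f: f.t_min)
    clusters = []
    while remaining:
        members, remaining = split(remaining, remaining[0], t1, d1)      # step 1
        extra, remaining = split(remaining, centroid(members), t1, d1)   # step 2
        clusters.append(members + extra)
    return clusters


def group_storms(clusters, t2=42.0, d2=37.0):
    """Step 3: give clusters with nearby centroids the same storm ID
    (transitive linking via union-find, which is an assumption)."""
    cents = [centroid(c) for c in clusters]
    parent = list(range(len(cents)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(cents)):
        for j in range(i + 1, len(cents)):
            if near(cents[i], cents[j], t2, d2):
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(cents))]
```

Calling `cluster_flashes` on a list of `Flash` records and passing the result to `group_storms` returns one storm ID per cluster. The pairwise comparisons are quadratic in the number of flashes, so a catalogue-scale run would probably need spatial indexing or time windowing.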
The following characteristics (variables) of each storm are calculated using the lightning flash data:
1. Total lightning flash count.
2. Thunderstorm initiation time (the first flash in the thunderstorm).
3. Thunderstorm cessation time (the last flash in the thunderstorm).
4. Thunderstorm duration (cessation time − initiation time).
5. Flashes per minute (total flashes/duration).
6. Distance travelled (the distance in km between the mean location of the first 10% of flashes and the last 10% of flashes).
7. Thunderstorm bearing (the direction of movement between the mean location of the first 10% of flashes and the last 10% of flashes). Bearing will be described as the direction from which the thunderstorm has travelled to align with meteorological convention regarding atmospheric flow direction. That is to say, it follows how meteorologists describe wind direction, e.g. westerlies are from the west but moving towards the east.
8. Thunderstorm area in square kilometres. Shapefile polygons of each thunderstorm are created for all thunderstorms with at least 3 flashes. The R coding language (R Core Team, 2020) kernel density estimate function (kde) is used to identify the spatial extent of the flashes and contours drawn to create the polygons where density is greater than 0.99, meaning that 99% of the data is within the polygon. Area is then calculated using the area function.
9. Flashes per square kilometre (total flashes/area).
10. Thunderstorm speed in km/h. Speed is calculated by thunderstorm distance travelled divided by thunderstorm duration (in hours). Duration is reduced by 10% to bring duration in line with distance (duration is calculated from the first to the last strike and distance is calculated from the centre point of the first and last 10% of strikes).
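As an illustration of how the time- and distance-based variables relate to one another, the following Python sketch computes items 1 to 7 and 10 for a single storm. It is a simplified stand-in for the authors' R implementation: `storm_variables` is a hypothetical name, flashes are assumed to be (time in minutes, latitude, longitude) tuples, the "first 10%" is taken as at least one flash (an assumption for very small storms), and distance and bearing use standard great-circle formulas. The area variables (items 8 and 9) are omitted.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    h = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def bearing_towards_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing (degrees) of travel from point 1 to point 2."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0


def mean_location(flashes):
    """Mean latitude and longitude of a list of (time, lat, lon) tuples."""
    n = len(flashes)
    return sum(f[1] for f in flashes) / n, sum(f[2] for f in flashes) / n


def storm_variables(flashes):
    """flashes: list of (time_min, lat, lon) tuples belonging to one storm."""
    flashes = sorted(flashes)
    n = len(flashes)
    k = max(1, n // 10)  # size of the first/last 10% (assumed at least 1 flash)
    start = mean_location(flashes[:k])
    end = mean_location(flashes[-k:])
    duration_min = flashes[-1][0] - flashes[0][0]
    distance_km = haversine_km(*start, *end)
    towards = bearing_towards_deg(*start, *end)
    has_duration = duration_min > 0
    return {
        "flash_count": n,
        "initiation_min": flashes[0][0],
        "cessation_min": flashes[-1][0],
        "duration_min": duration_min,
        "flashes_per_min": n / duration_min if has_duration else None,
        "distance_km": distance_km,
        # Meteorological convention: report the direction the storm comes FROM.
        "bearing_from_deg": (towards + 180.0) % 360.0,
        # Duration is reduced by 10% to match the first/last-10% distance.
        "speed_kmh": distance_km / (0.9 * duration_min / 60.0) if has_duration else None,
    }
```

For a storm whose flashes drift steadily eastward, `bearing_from_deg` comes out close to 270, i.e. a storm moving from the west.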
To ensure the best possible results of the thunderstorm clustering, appropriate values for T1, D1, T2, and D2 must be chosen. Sensitivity testing (Table S1 of the supplemental materials) identified that D1 followed by D2 had the greatest impact on thunderstorm clustering in accordance with the findings of Galanaki et al. (2018) and Hutchins et al. (2014) which also identified distance as the most important factor in lightning clustering.
Previous studies have used a range of possible values for D1 (4 to 25 km), based either on identifying the distance which a lightning strike is most likely to travel from the main thunderstorm or from values used in studies following a similar thunderstorm tracking methodology (Bertram & Mayr, 2004; Courtier et al., 2019; Dixon & Wiener, 1993; Galanaki et al., 2018; Harel & Price, 2020; Houston et al., 2015; MacGorman et al., 2011; Poelman, 2014; Wilkinson & Neal, 2021). These same studies use a range of temporal clustering from 16 min (Galanaki et al., 2018), to 20 min (Bertram & Mayr, 2004) to 1 h (Harel & Price, 2020). Comparing the results of our automated thunderstorm tracking method with manual observations of lightning clusters showed that values of 14.5 km and 38 min were the optimum values for D1 and T1, respectively. The value identified for T1 seems to be fairly consistent with the Met Office guide for thunderstorm observers (The Met Office, 2000) which states that a thunderstorm can be considered to have ceased 15 min after the last audible thunder. The thunderstorm observers are working from a static location and less likely to capture the thunderstorm activity once it moves away and therefore require a slightly more restrictive time constraint than this method.
FIGURE 1 Diagram demonstrating the three steps in the lightning flash clustering algorithm. Step 1 shows the first lightning flash being clustered with other flashes which are within the prescribed spatio-temporal proximity. Step 2 shows the centroid of the cluster at step 1 and the clustering of any additional flashes which are within the same spatio-temporal proximity to the centroid. Step 3 shows the clustering of the flashes from any additional clusters where the centroid is within a prescribed spatio-temporal proximity.

In the literature, studies that rely on connecting cluster centroids for thunderstorm tracking have used spatial overlap between time intervals, as well as previous storm motion or spatial proximity to identify the most likely clusters to group together (Figueras Ventura et al., 2019; Kohn et al., 2011; Meyer et al., 2013; Poelman, 2014; Strauss et al., 2013). For our algorithm D2 is relative to the value of D1 (Figure S1 of the supplemental materials), the starting point for which will therefore be the value of D1 × 2 (29 km). Testing was carried out by increasing and decreasing this value incrementally to see if it improved or reduced performance (see sensitivity analysis in supplementary information). A value of 37 km for D2 was selected. Lastly, T2 was identified by calculating the time elapsed to travel 37 km, given an average thunderstorm speed. Thunderstorm speeds have been calculated in European settings of between 7 km/h and 100 km/h (Galanaki et al., 2018; Hayden et al., 2021; Poelman, 2014; Wapler, 2021; Wapler & James, 2015). We identified the optimum speed as 53 km/h which provided a value of 42 min for T2. Further detail of our performance testing for the values of T1, T2, D1 and D2 can be found in the supplemental materials (Figures S2 and S3 show the results of systematic variation of these values) and an example of thunderstorm clustering carried out on thunderstorms occurring on 28 June 2012 is shown in Figure 2. On this day, there were both supercell thunderstorms and MCS (Clark & Smart, 2016) which makes this a good example to show that different thunderstorm types are correctly clustered by this method.
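The arithmetic linking D1, D2 and T2 in this section can be checked with a few lines of Python. The function name `travel_time_min` is a hypothetical helper; the numerical values are those quoted in the text.

```python
def travel_time_min(distance_km, speed_kmh):
    """Minutes needed to cover distance_km at a constant speed_kmh."""
    return 60.0 * distance_km / speed_kmh


D1_KM = 14.5                           # selected cluster radius
d2_start_km = 2 * D1_KM                # starting point for D2 testing: 29 km
t2_min = travel_time_min(37.0, 53.0)   # selected D2 and optimum speed

print(d2_start_km)     # 29.0
print(round(t2_min))   # 42 (unrounded value is about 41.9 min)
```

Because T2 is derived from D2 and an assumed representative speed, any change to either value changes T2 in proportion.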

| Thunderstorm variables
Initial investigation of the variables produced for the thunderstorm catalogue showed that some of the values obtained were unreasonably high. This affected thunderstorm variables which were calculated, rather than directly observed, such as speed (calculated from duration and distance), flashes per minute (from duration and flash count), distance travelled and direction (calculated from the mean locations of the first 10% and last 10% of flashes). This inflation of calculated values was identified as affecting storms with very low numbers of flashes because such flashes can be relatively randomly (and widely) distributed in space but clustered closely in time. It was also clear that each variable spanned a very large range of values. Some very high values, for example for duration, were found upon detailed investigation to be correct and to have been produced by infrequent severe storms.
The thunderstorm catalogue was therefore divided into three categories as follows:
• Isolated outbreaks, which contain fewer than 10 flashes. The calculated variables, which are shown to be unreliable for such storms with low flash counts, are excluded from further analysis.
• Active thunderstorms, which are defined as those where the variables are less than 400 min duration, 200 km distance or 300 total flashes (but more than for isolated outbreaks). These thresholds were obtained by identifying the outliers in each of the three variables and removing them from this category.
• Severe thunderstorms, which are identified as the positive outliers for the variables of duration, distance and flash count.
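The three categories can be written as a simple decision rule. The Python sketch below uses the thresholds given above; `classify_storm` is a hypothetical name. Two readings are assumptions rather than statements from the text: a storm is treated as severe if it reaches the outlier threshold in any one of the three variables, and a value exactly on a threshold is treated as an outlier.

```python
def classify_storm(flash_count, duration_min, distance_km):
    """Assign a storm to one of the three catalogue categories.

    Assumption: reaching ANY one threshold marks a storm as severe, so
    active storms are below all three thresholds.
    """
    if flash_count < 10:
        return "isolated outbreak"
    if duration_min >= 400 or distance_km >= 200 or flash_count >= 300:
        return "severe"
    return "active"


print(classify_storm(5, 12, 30))      # isolated outbreak
print(classify_storm(120, 90, 45))    # active
print(classify_storm(650, 480, 260))  # severe
```

Testing the flash count first means that a storm with few flashes is never promoted to the severe category on the strength of an inflated calculated distance.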
Despite dividing the thunderstorms into three separate categories, all three categories show the same distribution of positive skew with the vast majority of thunderstorms having lower values for each variable. This is shown in the histograms for active thunderstorms (Figure 3) and in Figures S4-S11. Figure 3 also shows scatter plots for all storm variables against thunderstorm area (km²). We present only the active storm plots because the distributions are similar for all three thunderstorm types. This shows that there is generally a positive correlation (increasing values) for distance, duration and total flashes against thunderstorm area. Flashes per minute do tend to increase with thunderstorm area but there appears to be an additional cluster of very high values associated with small storm areas. This is also present in the speed versus area plot (panel e) which also includes some unrealistically high speeds suggesting that, despite creating the sub-group of isolated outbreaks, there remain some storms within the active dataset which, as a result of being relatively short lived, produce some anomalously high calculated values.

| Spatial distributions
The spatial density distributions of thunderstorm initiation points (location of the first cluster centroid) for each thunderstorm type by season for the 11-year period are shown in Figures 4 and 5. The density scale for each figure represents the value or number of thunderstorms in each class. The classes are divided into 10 equal intervals, which means that whilst the absolute values may be different for each map, their relative distributions can be compared.
All three thunderstorm types show similar seasonal variation in initiation points as might be expected, namely that during spring (MAM), thunderstorms are formed over land mass areas with the highest densities found inland over continental Europe. Spring density in England is concentrated in central England (density of up to 60 for active storms) and on the east side of Ireland (up to 30). In summer (JJA) this pattern of higher densities for active storms moves slightly further east and severe storms occur in low density in the UK, Ireland and the southern tip of Norway (up to a density of 16 severe thunderstorms whereas this reaches 80 in the southeast corner of the study area). In autumn (SON) thunderstorm formation shifts to coastal and near coastal areas, concentrated along the European coastline (densities up to 52 for active storms and 5 for severe). Some lesser density 'hot spots' are found along west-facing coastlines, although less so for severe storms, and for active storms and severe storms initiation point density spreads north into inland areas of England. Lastly in winter (DJF) higher density areas are restricted to more localized density 'hot spots' on northern and western facing coastlines with densities of up to 21 in Germany, 34 in Norway, 17.5 in Scotland, 7 in Ireland and 10 in France (English Channel). Severe storms rarely occur in winter and while we can see hotspots on the Northwest coast of the Netherlands, we also see some forming over ocean areas in the Atlantic and off the coast of southern Ireland, possibly associated with returning polar maritime air masses which have been identified as producing increased convective available potential energy in the southern North Sea and adjacent coastal areas (Holley et al., 2014). Figure 6 presents thunderstorm track-roses (windrose style plots) showing the distributions of speed and direction of movement for active and severe storms for each season. 
This information can be used to anticipate how a thunderstorm is likely to behave once formed. In spring, summer and autumn most storms move from the southwest at speeds of between 30 and 50 km/h. However, in winter this changes to tracks typically coming from the west with a higher proportion of storms moving at faster speeds (>60 km/h). When considering this information in conjunction with Figures 5 and 6 we can see that these fast westerly winter thunderstorms often develop on northwest-facing coastlines (active storms) and over the Atlantic (severe storms). Their veered steering behaviour (from west to east) is partly governed by the influence of an on-average stronger jet stream in winter combined with a change to surface forcing which is more commonly associated with warming from the sea surface and release of potential instability due to uplift over hills, leading to a greater proportion of local thunderstorm formation rather than import from the south or south-west. The severe storms show the biggest shift towards west-east movement in winter with almost half of storms moving in this direction and all of them at over 50 km/h, whereas 25% of winter active storms move from the west of which approximately half reach speeds over 50 km/h. No severe storms move from the east in winter and the next most common direction of origin is from northwest, whereas active storms more commonly move from the southwest.
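The track-rose summaries discussed here are, in essence, two-way frequency tables of direction of origin against speed. The Python sketch below shows one way such a table could be built. The eight compass sectors and the speed class boundaries (30, 50 and 60 km/h) are assumptions chosen to match the speeds quoted in the text, not the binning used for Figure 6, and the function names are hypothetical.

```python
from collections import Counter

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def sector(bearing_from_deg):
    """Map a direction of origin in degrees to one of eight 45-degree sectors."""
    return SECTORS[int(((bearing_from_deg % 360.0) + 22.5) // 45.0) % 8]


def speed_class(speed_kmh):
    """Assumed speed classes for illustration only."""
    if speed_kmh < 30:
        return "<30"
    if speed_kmh < 50:
        return "30-50"
    if speed_kmh <= 60:
        return "50-60"
    return ">60"


def track_rose(storms):
    """storms: iterable of (bearing_from_deg, speed_kmh) pairs.
    Returns the percentage of storms in each (sector, speed class) bin."""
    counts = Counter((sector(b), speed_class(s)) for b, s in storms)
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.items()}
```

Applied separately to each season and storm category, a table of this kind gives the proportions quoted above, such as the share of winter storms arriving from the west at more than 50 km/h.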

| Dangerous thunderstorms
Thunderstorms can pose a particular flash flooding hazard when they are slow moving and persist for a sufficient duration to produce a large amount of rain over the same region. It is, however, important to note that any duration of heavy rainfall may produce flash flooding and, whilst these short events are more frequent, this section will focus on the potentially most extreme of such events. In this instance, therefore, dangerous thunderstorms were defined as those with a duration of at least 60 min but speed less than 10 km/h. Thunderstorms with fewer than 10 lightning flashes (isolated storms) were not included because of the likelihood that less electrically active storms may introduce unrealistic statistics, although their speeds are often calculated as extremely fast moving in any event. This means it is difficult to capture behaviour characteristics of low lightning thunderstorms when reliant on lightning flash data alone. These dangerous thunderstorm criteria also mean it is possible that thunderstorms where individual cells form repeatedly in a similar location and move away quickly one after another may be captured as 'slow moving' despite individual cells being fast moving. However, multiple cells moving over the same region also have the potential to produce dangerous amounts of precipitation and will not be removed from this classification. Figure 7 displays both the initiation points of dangerous storms as well as their track-roses showing speed and directions of movement for each season. In spring there are hot spots for dangerous storms in both the southeast and southwest of England (density up to 3.7); there is also a hotspot inland in the southeast of Ireland (density up to 2.5). Most dangerous storms occur inland from the European coastline but also around Normandy and Paris.
The absolute numbers for these dangerous storms are relatively low compared with active, isolated and severe storms; however, over 11 years, the climatological distribution has value for proper preparedness. For example, as the density over London in spring is up to 3.76 storms over 11 years, it might be considered a proportionate response to ensure that adequate drainage repairs and cleaning take place at the start of the spring. In the summer the density generally increases for all inland areas, including those inland from the European coastline. There is a hotspot (density 11.2, namely one per year) on the north-eastern coast of England. During the spring and summer, there is no dominant direction of movement for dangerous slower moving thunderstorms, unlike active and severe storms which have a clear southwesterly preference. This is perhaps not surprising since low speed of movement is often associated with an indistinct or undefined track.
In autumn, dangerous storms are reduced in frequency and shift to the south coast of England, the density reaching two in coastal areas around the English Channel and up to 3.5 inland of the European coastline. The increase in dangerous thunderstorms in Europe shows that, despite the shift to thunderstorm formation in southern coastal areas of the UK, there remains some continental mechanism for formation of dangerous slow-moving thunderstorms. Dangerous thunderstorms travel most frequently from the northwest with direction of movement frequency generally decreasing anti-clockwise so that thunderstorms that travel from the north and northeast are the least frequent. In winter dangerous thunderstorms remain coastal features but occur around north and west-facing coastlines with the direction of movement often being from between the northwest and the southwest. However, there are some very low-speed values. Density values for dangerous thunderstorms in winter are very low and produce a maximum density of 0.6 for the whole 11-year period.
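The selection rule for dangerous thunderstorms used in this section can be stated compactly in code. The Python sketch below applies the three criteria given in the definition (at least 10 flashes, duration of at least 60 min, speed below 10 km/h); the function name `is_dangerous` and the example values are hypothetical.

```python
def is_dangerous(flash_count, duration_min, speed_kmh):
    """Slow-moving, long-lived storm with enough flashes for reliable statistics."""
    return flash_count >= 10 and duration_min >= 60 and speed_kmh < 10


# Made-up example storms: (flash count, duration in min, speed in km/h)
storms = [(45, 95, 6.0), (45, 95, 28.0), (8, 70, 4.0), (200, 40, 3.0)]
dangerous = [s for s in storms if is_dangerous(*s)]
print(dangerous)  # [(45, 95, 6.0)]
```

Because the rule works on whole-storm speed, a sequence of fast cells forming repeatedly over one location can still pass the filter, which is the intended behaviour described above.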

| Annual distributions
The annual distribution of the different thunderstorm types is shown in Figure 8a. All four types of thunderstorms have the expected peak in thunderstorm occurrence in July. Active storms and isolated outbreaks also have a secondary peak in winter (December and January). Active thunderstorms produce a larger peak in frequency in July compared to isolated storms. However, between September and April isolated thunderstorms overtake active ones to produce the greatest frequency of storms from autumn to spring. These isolated thunderstorms are fairly short-lived convective events with the odd pockets of electrical activity which occur fairly randomly over a region but not in enough quantity to form a cohesive thunderstorm which produces a track.
The annual median distributions of characteristics for each type of thunderstorm are shown in Figure 8b. These distributions show that severe storms produce the largest flash counts and intensity (flashes per minute) in summer as well as lasting for a longer duration. However, during the summer these storms move more slowly so have a lower median distance and area value. In the winter the storms speed up and cover a larger distance but have a lower intensity. Active storms are also faster moving in the winter but do not persist for quite as long as they do in the summer. The influence of the jet stream likely increases storm speed but as the storms travel inland from the west/northwest-facing coastlines they are no longer fuelled by the warm oceans (relative to cold polar air) and are therefore not sustained.

| Weather patterns: The role of synoptic climatology
The thunderstorms were each assigned to a surface weather pattern, namely the pattern designated for the date that the thunderstorm was formed based on patterns defined by Neal et al. (2016) and shown in Figure 9. Track-roses were produced for thunderstorm speed and direction of movement for each weather pattern (Figure 10). Table 1 groups the weather patterns where thunderstorm behaviour is similar into six broad classifications.
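The assignment step amounts to a lookup of each storm's formation date in a daily weather pattern series, followed by aggregation per pattern. The Python sketch below illustrates this with mean speed as the aggregated variable. The function name, the data layout and all dates, pattern numbers and speeds in the example are invented for illustration and are not taken from the catalogue.

```python
from collections import defaultdict
from datetime import date


def mean_speed_by_pattern(storms, daily_pattern):
    """storms: iterable of (formation_date, speed_kmh) pairs.
    daily_pattern: dict mapping a date to its weather pattern number.
    Storms formed on dates without a pattern are skipped."""
    totals = defaultdict(lambda: [0.0, 0])  # pattern -> [sum of speeds, count]
    for formed_on, speed_kmh in storms:
        pattern = daily_pattern.get(formed_on)
        if pattern is None:
            continue
        totals[pattern][0] += speed_kmh
        totals[pattern][1] += 1
    return {p: total / n for p, (total, n) in totals.items()}


# Invented example data
daily_pattern = {date(2010, 1, 1): 3, date(2010, 1, 2): 7}
storms = [(date(2010, 1, 1), 20.0), (date(2010, 1, 1), 30.0),
          (date(2010, 1, 2), 60.0), (date(2010, 1, 5), 99.0)]
print(mean_speed_by_pattern(storms, daily_pattern))  # {3: 25.0, 7: 60.0}
```

A storm that persists past midnight is attributed here to the pattern on its formation date only, in line with the assignment rule described above.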

| Dominant southeast (pattern 27)
Patterns 17 and 27 produce thunderstorms with a dominant direction of movement from the southeast which aligns with the weather patterns where flow originates from central and southeast Europe. In the summer this flow type will behave similarly to the Spanish Plume bringing warm moist air to the south of the study area.
Pattern 27 has a low mean flash count of 151 (relative to the other examples in Table 2), a medium mean duration and flashes per minute, but has the second fastest mean speed of 28.9 km/h. This low mean flash count is likely due to the location and season of the greatest frequency of thunderstorm occurrence which for pattern 27 is shown to be in coastal regions, particularly northerly facing coastlines in the summer (Hayward et al., 2022), where thunderstorms are much less likely to be as electrically active as land-based summer thunderstorms.

| Dominant southwest (pattern 1)
Most weather patterns produce a dominant frequency of direction from the southwest. Patterns 2, 7, 12, 15, 19, 21 and 22 produce a clear south-westerly flow through most of the study area whilst in patterns 3, 5, 8, 10, 11, 20, 24, and 29 the flow direction varies regionally. The largest number of thunderstorms are likely occurring in the region where the flow direction is south-westerly, which for pattern 8 is in the south of the study area but for pattern 10 is in the northeast. In the northeast, we would expect the greatest density of thunderstorms to occur in the winter whilst in the south this would be spring and summer (Figure 4). Pattern 1 produces a northwesterly flow which does not agree well with the track-rose's clear frequency preference of storms moving from the southwest. It is not clear whether this is the result of limitations to weather pattern generalization or whether variations (or transitional stages between patterns during the course of the day) might produce south-westerly flow in some parts of the study area.

| Ranges between southwest and southeast (pattern 16)
The track-rose for pattern 16 shows the greatest frequency of thunderstorms originating from a southwest or southerly direction. Pattern 16 is identified as the Spanish Plume weather pattern (Hayward et al., 2022; Wilkinson & Neal, 2021) producing thunderstorms in the summer by the mechanism of a low-level southerly flow bringing warm moist air up from Iberia. Thunderstorms also frequently originate from a south-easterly direction; these storms are most likely from the southeast of the study area where the position of the high pressure means that this area may receive a south-easterly flow. The mean value for flash count for thunderstorms occurring during weather pattern 16 is 597 which is the largest mean flash count for all weather patterns shown in Table 2. The range is also the highest, showing that the storms can also be much less or much more electrically active than this. These storms also last the longest and have the highest mean flashes per minute but move slowly with almost the slowest mean speed of 25 km/h.

| Mostly west (pattern 14)
The patterns which produce a west-east direction of movement all show a clear westerly flow, although for patterns 14 and 30 this westerly flow is concentrated in the south of the study area (where it is likely that the highest number of thunderstorms are located). Pattern 14 is identified by Hayward et al. (2022) as a pattern frequently producing thunderstorms in the southeast of the UK during the summer, autumn and winter. This pattern is likely providing air that is cooler than the ocean areas in the winter and autumn and cooler than the continental land areas in the summer, as it originates from the north and northwest. Daytime warming over land areas may also trigger convection during spring and autumn, producing thunderstorms which can move out to sea and persist overnight. Patterns 23, 26 and 30 are also identified by Hayward et al. (2022) as frequently producing thunderstorms along northwest-facing coastlines of the UK and Ireland year-round, where the airmass (originating from the north and northwest Atlantic) is cold relative to the sea, and in continental regions and the English Channel in the autumn and winter. In these regions during winter, storms originating from the north-northwest and west may penetrate a short way inland under these weather patterns. The mean values for thunderstorm variables in pattern 14 are fairly average relative to the other examples in Table 2 and broadly similar to pattern 27. Mean speed, however, is the second fastest in Table 2 (32 km/h). These faster-moving storms are likely the result of the speed of the flow, which would explain how the colder air is able to penetrate further south whilst remaining cold enough to trigger instability over the landmass and inshore waters.

TABLE 1 Classification of thunderstorm direction of movement by weather pattern.

| Thunderstorm movement | Pattern number |
| --- | --- |
| Ranges between southwest and southeast | 16 |
| Mostly west | 4, 13, 14, 23, 26, 30 |
| Large directional range between east, south and west | 6, 9, 28 |
| Inconsistent | 18, 25 |

Note: The direction given is the direction from which the storm moves. Numbers in bold are also detailed in Table 2.

| Large directional range between east, south and west (pattern 6)
There is a large range of movement direction with a similar frequency for patterns 6, 9 and 28, which includes easterly, southerly and westerly (northerly directions being less frequent). Unsurprisingly, therefore, there is little similarity between these patterns, save that the flow direction may vary depending on location within the study area. In pattern 28, for example, the north of the study area receives an easterly flow whereas the south receives a southerly flow. In pattern 6, the high-pressure area is directly over the UK, with flow from the southwest in the north of the study area and from the northeast in the south. For the thunderstorm variables, pattern 6 produces similar means to pattern 16, with the second highest flash count (430), duration and intensity (1.37) but the slowest speed (22 km/h). Wilkinson and Neal (2021) identified pattern 6 as occurring on 28 June 2011, when a Spanish Plume event from Iberia caused very severe thunderstorms over the southeast of the UK, as described by Sibley (2012). On this day thunderstorms moved from the southwest, and the event is well known to have produced intense lightning and damage. Wilkinson and Neal (2021) acknowledge that this case, with the diagnosis of pattern 6 on this day, may demonstrate the limitations of classifying very variable synoptic conditions into 30 general patterns, as this Spanish Plume event does not fit well with pattern 6. Given the extreme nature of the thunderstorms on 28 June 2011, this event may very well bias variables such as flash count, and indeed the median value is much lower than the mean.

TABLE 2 Summary statistics (mean, median, standard deviation and interquartile range) for thunderstorms that occur under popular weather patterns. Summary statistics are calculated for each variable (flash count, duration, area (km²), flashes per minute and speed) for all active and severe thunderstorms.

| Inconsistent (pattern 18)
Patterns 18 and 25 also produce a variable frequency in movement direction and feature high pressure situated directly over the UK. In this case isobars show that the direction of flow will be different depending on location within the study area. Furthermore, these two patterns have a low sample size which may contribute to greater variability. Upon inspection of the initiation points of thunderstorms under these two weather patterns it is clear that they do not form over the UK land area at all and are concentrated over the European land areas.
There are also thunderstorms forming, in a somewhat random geographic pattern, over the oceans in the study area, with a small cluster over the Atlantic to the northwest of Scotland. This shows that thunderstorms very rarely form within the regions covered by the high pressure itself. Pattern 18 has the largest number of thunderstorms in this group but, compared with the other examples in Table 2, has the lowest mean flash count (47), the shortest mean duration and the lowest intensity, together with the fastest-moving thunderstorms (37 km/h).
The mean speeds differ by only 15 km/h across the examples discussed here and by about 25 km/h across all the weather patterns (from 22.56 km/h under pattern 6 to 47.96 km/h under pattern 30). There may not be a significant difference between most of the mean speeds produced under different patterns (many being clustered around 30 km/h), but it can be useful to note the patterns which produce the greatest differences.

| Thunderstorm tracking method and accuracy of thunderstorm catalogue
Use of the thunderstorm catalogue to research storm behaviour requires a reasonable degree of certainty that the data collected for thunderstorm variables such as speed and direction of movement are accurate. It is clear from investigation of this dataset that caution is required when relying on thunderstorm variables which have been calculated, rather than directly observed, for low flash thunderstorms. The grouping of spatially disparate lightning flashes that occur close together in time has led to these variables (speed, flashes per km, flashes per minute, distance and direction of movement) producing potentially unrealistic results. Some small storms or pockets of electrical activity may be grouped together by the tracking algorithm. For example, where a small number of lightning flashes occur seconds apart but 25 km apart, this will produce an unrealistic storm speed estimate. Direction and distance are calculated from the difference between the first 10% and last 10% of flashes which, where there are very small numbers of flashes, can produce inconsistent and somewhat random statistics. When using this thunderstorm catalogue it is therefore important to consider this issue and choose a filtering method which balances the need to exclude inaccurate data against the need to retain accurate data. Filtering criteria should be considered in light of the use to which the data is to be put, and no blanket recommendation can be proposed in this regard. For this study, all thunderstorms with a flash count of less than 10 were separated (isolated outbreaks) and we did not use any of the variables calculated for them. Despite segregating the isolated outbreaks, there remained some unrealistic values for speed in the remaining (active) thunderstorms (Figure 3) that needed to be removed prior to creating the active thunderstorm track-roses. Further work reliant on this dataset may include comparison with rainfall radar to determine whether these isolated outbreaks are part of the same or separate cumulonimbus.
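To illustrate why low flash counts are problematic, the calculation described above can be sketched in Python. This is a minimal sketch under stated assumptions and is not the R code used to build the catalogue: the function names, the use of group centroids for the first and last 10% of flashes, the great-circle (haversine) distance and the way elapsed time is measured are all illustrative choices. Only the 10% rule and the threshold of 10 flashes come from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0
MIN_FLASHES = 10  # storms below this count were treated as isolated outbreaks


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing (0 = north, 90 = east) from point 1 to point 2."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    x = math.sin(dlmb) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return (math.degrees(math.atan2(x, y)) + 360) % 360


def centroid(group):
    """Mean time, latitude and longitude of a group of flashes (assumption)."""
    return tuple(sum(f[i] for f in group) / len(group) for i in range(3))


def track_summary(flashes):
    """Summarise one storm from flashes given as (time_s, lat, lon) tuples."""
    flashes = sorted(flashes)           # order by time
    n = len(flashes)
    k = max(1, (n + 9) // 10)           # 10% of flashes, rounded up, at least one
    t0, lat0, lon0 = centroid(flashes[:k])
    t1, lat1, lon1 = centroid(flashes[-k:])
    distance = haversine_km(lat0, lon0, lat1, lon1)
    heading = initial_bearing_deg(lat0, lon0, lat1, lon1)
    hours = (t1 - t0) / 3600
    return {
        "flash_count": n,
        "distance_km": distance,
        "from_deg": (heading + 180) % 360,  # direction the storm moves FROM
        "speed_kmh": distance / hours if hours > 0 else float("nan"),
        "reliable": n >= MIN_FLASHES,
    }


# Hypothetical isolated outbreak: three flashes 5 s apart spanning roughly 25 km.
outbreak = [(0, 52.0, 0.0), (5, 52.0, 0.18), (10, 52.0, 0.365)]
summary = track_summary(outbreak)
```

With only three flashes, the first and last 10% each reduce to a single flash, so the computed speed is thousands of km/h. The `reliable` flag marks the storm as an isolated outbreak whose calculated variables should not be used.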
Similarly, at the other end of the spectrum, there are infrequent severe thunderstorms that produce a large amount of lightning and persist for hours (sometimes 24 h), and which visual inspection of the flash distributions shows to be mesoscale convective systems (MCSs). These systems often persist from afternoon until late the following morning, covering large distances. In this case, the variables (and calculated variables) are reliable, but their inclusion with the main body of the thunderstorm catalogue skews any summary statistics or probabilistic analysis, potentially producing misleading information for warning, forecasting and decision makers in relation to storms that occur more frequently. These infrequent severe storms can also be analysed separately, ensuring that more frequent, less severe storms do not mask the potential for intense lightning and precipitation events. In this paper severe storms were treated separately, and it was shown that the annual distribution of variables differs between severe and active storms. Here, severe storms were defined by identifying those with large outlying values for the thunderstorm variables. This is not the only way to define or subdivide the thunderstorm catalogue, and for different studies there may be more appropriate ways; for example, Gray and Marshall (1998) use specific qualifying definitions, such as size, as a basis for identifying MCSs.
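One way to separate storms with large outlying values can be sketched as follows. The text does not state the exact outlier rule, so the upper Tukey fence used here (third quartile plus 1.5 times the interquartile range) is an assumption, as are the function names, the variable names and the sample values.

```python
import statistics


def upper_fence(values, k=1.5):
    """Upper Tukey fence: Q3 + k * IQR (assumed rule, not taken from the paper)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def split_severe(storms, variables=("flash_count", "duration_min")):
    """Split storms into (active, severe); severe exceeds a fence on any variable."""
    fences = {v: upper_fence([s[v] for s in storms]) for v in variables}
    active, severe = [], []
    for storm in storms:
        is_severe = any(storm[v] > fences[v] for v in variables)
        (severe if is_severe else active).append(storm)
    return active, severe


# Hypothetical storms: six ordinary ones and one long-lived, lightning-rich system.
storms = [
    {"flash_count": 12, "duration_min": 20},
    {"flash_count": 15, "duration_min": 25},
    {"flash_count": 20, "duration_min": 30},
    {"flash_count": 25, "duration_min": 35},
    {"flash_count": 30, "duration_min": 40},
    {"flash_count": 40, "duration_min": 45},
    {"flash_count": 5000, "duration_min": 600},
]
active, severe = split_severe(storms)

mean_all = statistics.mean(s["flash_count"] for s in storms)
mean_active = statistics.mean(s["flash_count"] for s in active)
```

In this hypothetical sample the single extreme storm raises the mean flash count of the whole set far above that of the remaining storms, which is the skew that motivates treating severe storms separately.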
This study has used the thunderstorm catalogue to investigate the behaviour and distribution of thunderstorms, the results of which show that the tracking method and catalogue can be confidently relied upon. The spatial distribution of thunderstorm initiation points (Figures 4-6) follows current understanding of seasonal variations in instability. The summer preference for thunderstorm formation over land, which shifts to coastal and ocean areas in the autumn and winter, is also found in multiple European-based studies (Anderson & Klugmann, 2014; Enno et al., 2020; Holley et al., 2014; Holt et al., 2001; Wilkinson & Neal, 2021). Analysis of thunderstorm direction of movement under different weather patterns shows that there is good agreement between the two for the majority of weather patterns. Further, pattern 16 has been identified as a pattern that produces Spanish Plumes (Wilkinson & Neal, 2021), which are known to produce severe storms in the study area (Gray & Marshall, 1998; Lewis & Gray, 2010). The statistics produced for pattern 16 are consistent with the presence of severe storms occurring on some days, showing larger mean flash counts, duration and distance travelled. For weather patterns where there is spatial variation in flow direction across the study area, we can also see that there is greater variation in thunderstorm direction of movement. However, some weather patterns produce a consistent direction of thunderstorm movement despite regional variation in flow. In this event, it is possible that the subregion of the study domain that receives the flow direction most similar to the dominant thunderstorm movement produces the greatest frequency of thunderstorms. The only weather pattern where there is a clear difference between the dominant thunderstorm direction of movement and the general surface airflow pattern is pattern 1. The reason for this discrepancy is unknown: it may be (a) the result of difficulties generalizing synoptic conditions into 30 general patterns or (b) that pattern 1 tends to produce thunderstorms that do not form a distinct 'track'. However, it is important to note that surface-level flow is not the main control on thunderstorm movement direction, owing to the depth of thunderstorm cumulonimbus clouds. The steering level for thunderstorms whose movement is subject to flow direction is generally considered to be 6 km above mean sea level (Bertram & Mayr, 2004). There are also other mechanisms which influence storm movement, such as topographic barriers, and some thunderstorms grow large enough to move under their own energy and therefore deviate from the flow.

| Application of thunderstorm tracking data
Understanding the behaviour of thunderstorms improves our ability to predict them. Lightning and thunderstorm day climatologies provide information on where and when thunderstorms occur and how frequently (Anderson & Klugmann, 2014; Enno et al., 2020; Hayward et al., 2022; Holt et al., 2001) but do not provide insight into factors such as storm movement, intensity or duration. Some lightning climatology research has used the ratio of lightning flash density to thunderstorm days to determine whether a region has high flash density due to frequent thunderstorm occurrence or due to infrequent but very intense thunderstorms. One example is the Congo basin (Soula et al., 2016), where the high flash density was caused by infrequent but very severe storms. This is important information from a hazard planning perspective, as it indicates that thunderstorm day frequency was the most important metric for anticipating the recurrence rate of thunderstorms as well as the potential for extreme weather hazards when they do occur. The thunderstorm track catalogue, on the other hand, allows storms of different severity to be directly identified and separately analysed to provide greater understanding of probable hazard exposure.
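The ratio mentioned above can be illustrated with a short sketch. The function name and the two regions are hypothetical, and the numbers are invented purely to show how the same flash density can arise from different storm regimes.

```python
def flashes_per_thunderstorm_day(flash_density, thunderstorm_days):
    """Ratio of annual flash density (flashes per km2 per year) to
    annual thunderstorm days; a higher value means more intense storm days."""
    if thunderstorm_days <= 0:
        raise ValueError("thunderstorm_days must be positive")
    return flash_density / thunderstorm_days


# Two hypothetical regions with the same annual flash density.
frequent_weak = flashes_per_thunderstorm_day(flash_density=10.0, thunderstorm_days=50)
rare_intense = flashes_per_thunderstorm_day(flash_density=10.0, thunderstorm_days=10)
```

Both regions would look identical on a flash density map, but the second concentrates its lightning into far fewer days. A track catalogue avoids this ambiguity because each storm's flash count and duration are recorded directly.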
Probabilistic data on thunderstorm behaviour can be used to aid early warning of thunderstorm hazards by identifying the region most likely to be impacted once a lightning strike is detected. To accomplish this, thunderstorms should be subdivided at least by season and region, but also by weather pattern. As shown here, most weather patterns produce a clear dominant direction of thunderstorm movement, despite surface-level flow direction not necessarily being the main control. Weather patterns may reflect similar atmospheric conditions, and therefore correlation between these and probabilistic data could be useful to aid prediction of thunderstorm behaviour. Further work producing regional and seasonal thunderstorm behaviour statistics for weather patterns could be used alongside statistics on the risk of thunderstorm occurrence by weather pattern (previously investigated in Hayward et al., 2022) to produce an early warning/forecasting decision support tool.
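As a sketch of how such probabilistic data might be derived from a track catalogue, the following Python example tabulates the empirical frequency of movement direction for each weather pattern, in the manner of a track-rose. The eight compass sectors, the function names and the sample storms are assumptions made for illustration; they are not values from the catalogue.

```python
from collections import Counter, defaultdict

SECTORS = ("N", "NE", "E", "SE", "S", "SW", "W", "NW")


def sector(from_deg):
    """Map a direction in degrees (direction the storm moves FROM) to a 45-degree sector."""
    return SECTORS[int(((from_deg % 360) + 22.5) // 45) % 8]


def direction_probabilities(storms):
    """storms: iterable of (weather_pattern, from_deg) pairs.
    Returns {pattern: {sector: relative frequency}}."""
    counts = defaultdict(Counter)
    for pattern, from_deg in storms:
        counts[pattern][sector(from_deg)] += 1
    return {
        pattern: {s: c[s] / sum(c.values()) for s in SECTORS}
        for pattern, c in counts.items()
    }


# Hypothetical storms: (weather pattern number, direction of origin in degrees).
storms = [(16, 200), (16, 230), (16, 180), (16, 140), (14, 270), (14, 260)]
probs = direction_probabilities(storms)
```

Given the weather pattern forecast for a day, a table of this kind gives the relative likelihood of each direction of origin, which could then be refined by season and region as suggested above.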
Further work utilizing this new dataset needs to be conducted to investigate how thunderstorms may behave differently around geographic features such as slopes, mountains and coastlines, and how changes in land use influence storm movement. In Brazil, an increase in thunderstorm days has been shown to result from increased urbanization (Pinto, 2015), and greater understanding of how this affects thunderstorm behaviour may allow us to understand potential future changes with land use. The interaction of thunderstorm behaviour with urban heat islands is therefore a subject which could benefit from future application of the catalogue. The thunderstorm (lightning footprint) shapefiles produced by this research (to provide the thunderstorm area variable) can enable comparison between thunderstorm extent and rainfall severity to assess the impact of thunderstorms on flash flooding events and support catchment management. Lastly, changes in the frequency and severity of thunderstorm activity as a result of climate change are likely, although how they will change is as yet uncertain (Allen, 2018; Brooks, 2013; Taszarek et al., 2019). A warming climate does have the potential to alter atmospheric circulation and jet stream variability. The effect of melting Arctic sea ice, for example, has been modelled to alter mid-latitude circulation, although as it is part of a complex system the potential effects of this forcing mechanism have not yet been conclusively observed (Barnes & Screen, 2015; Blackport & Screen, 2021). An increase in lightning and thunderstorm activity in the British Isles and northern Europe, as well as around areas of increased elevation, has been modelled (Kahraman et al., 2022). Despite this predicted increase in thunderstorms and lightning, there has been an observed decline in thunderstorm days in Oxford, England (Burt, 2021), which has been linked to the decline in occurrence of thunderstorm-producing synoptic set-ups (Lamb weather types). Understanding the distributions of thunderstorm variables by weather pattern or type may therefore provide insight into future thunderstorm behaviour where the variability of weather pattern occurrence, as a result of climate change, can be modelled. Further work investigating the impact of global warming on weather pattern frequency should consider that these synoptic types represent one variable which may be used to predict future thunderstorm hazards, and perhaps consider them in tandem with variables such as land use changes (urban heat islands) and modelled CAPE predictions.

| SUMMARY AND CONCLUSION
This study used ATDnet lightning flash data to identify thunderstorms occurring over an 11-year period between 2008 and 2018 over north-western Europe. Thunderstorms were identified by grouping lightning flashes based on their spatio-temporal proximity to one another. Once the flashes were grouped, shapefiles of the thunderstorms were created and thunderstorm variables such as flash count, duration and direction of movement were calculated. Preliminary analysis of this data has produced the following findings:
• Thunderstorms with low flash counts can produce unreliable estimates of calculated (rather than directly observed) variables such as speed, direction and distance travelled, as a result of grouping flashes which are more disparate in space than in time. Care should be taken when analysing calculated variables, and consideration given to filtering out low flash thunderstorms or treating them separately where appropriate.
• Rare severe thunderstorms, although producing more reliable parameters, can also introduce skew into any summary statistics that may be derived from the thunderstorms' variables and therefore, where appropriate, should also be treated separately.
• Thunderstorm formation and behaviour vary spatially by season; for example, in summer most thunderstorms form over land areas, but in winter thunderstorm formation occurs in highest density along north- and northwest-facing coastlines. These winter thunderstorms move faster, and the predominant direction of movement shifts from the southwest in summer to the west in winter. This shift occurs for a greater proportion of thunderstorms in the severe category.
• The thunderstorm variables allow the identification of thunderstorms that pose a risk from a specific hazard such as flash flooding. This study identified slow-moving thunderstorms (classified as dangerous storms) that have the potential to deliver large amounts of intense rain in a concentrated area. This is particularly dangerous for urban areas, where drainage systems can be quickly overwhelmed and rainwater is unable to soak into impermeable materials such as roads, causing disruption to transport. Some river catchment systems can be prone to flashy behaviour, meaning the time it takes for rainwater to reach the river channel is very short. This makes them vulnerable to overtopping their banks during periods of high peak stream flow due to heavy rainfall. Dangerous storms are identified as occurring relatively frequently in some locations which are also vulnerable to high rainfall events, namely southwest England (steep-sided valleys) and southeast England (urban areas). This data, along with the seasonal distributions of dangerous thunderstorms, may assist with preparedness against rainfall hazards.
• Severe storms are most intense during the summer (based on flashes per minute) and persist for longer but have a shorter median distance travelled, likely because they are slower moving (have a lower median speed). Some severe thunderstorms (such as supercells) can move over very large distances, but these occur less frequently and thus do not affect this median value. Some further work may be warranted in distinguishing between the different types of severe storm and how these types might affect their behaviour characteristics. In winter, severe and active storms tend to move faster but are shorter-lived.
• There is good agreement between weather pattern type flow direction and the direction of thunderstorm movement for nearly all weather types. This enhances confidence in the calculated direction variable but also indicates that this information can be used to provide probabilistic data on areas at risk based on weather pattern type, for early warning purposes.
The results show that the thunderstorm variables and statistics contained in this catalogue can be confidently relied upon. They also show that investigating this dataset can provide insights that can be useful for early warning systems, forecasting and hazard mitigation, for all of which further work should be carried out.