Upward Lightning at Wind Turbines: Risk Assessment From Larger‐Scale Meteorology

Abstract
Upward lightning (UL) has become a major threat to the growing number of wind turbines producing renewable electricity. It can be much more destructive than downward lightning due to the large charge transfer involved in the discharge process. Ground‐truth lightning current measurements indicate that less than 50% of UL could be detected by lightning location systems (LLS). UL is expected to be the dominant lightning type during the cold season. However, current standards for assessing the risk of lightning at wind turbines mainly consider summer lightning, which is derived from LLS. This study assesses the risk of LLS‐detectable and LLS‐undetectable UL at wind turbines using direct UL measurements at instrumented towers. These are linked to meteorological data using random forests. The meteorological drivers for the absence/occurrence of UL are found from these models. In a second step, the results of the tower‐trained models are extended to a larger study area (central and northern Germany). The tower‐trained models for LLS‐detectable lightning are independently verified at wind turbine sites in this area and found to reliably diagnose this type of UL. Risk maps based on cold season case study events show that high probabilities in the study area coincide with actual UL flashes. This lends credibility to the application of the model to all UL types, increasing both risk and affected areas.


Introduction
arXiv:2301.03360v1 [stat.ML] 9 Jan 2023
The growing importance of renewable energy production has recently led to a notable increase in the number of wind turbines (e.g., Pineda et al., 2018). Since those structures are commonly taller than 100 m, they facilitate the initiation of upward lightning (UL), which propagates from the tall structure towards the clouds (Berger, 1967). A tall structure is more prone to UL because it is exposed to a stronger electric field than the ground. Structures shorter than 100 m mainly experience downward lightning (DL), with leaders propagating from the clouds towards the earth's surface (e.g., Rakov and Uman, 2003).
As wind turbines are getting taller, UL is the major weather-related cause of severe damage to them (e.g., Rachidi et al., 2008; Montanyà Puig et al., 2016; Pineda et al., 2018; Matsui et al., 2020; Zhang and Zhang, 2020). It can be much more destructive than DL because its initial continuous current (ICC) lasts approximately ten times longer than the current of DL.
Ground-truth lightning current measurements at the specially instrumented tower on top of the Gaisberg mountain (Salzburg, Austria) reveal that more than 50 % of UL is not detected by conventional lightning location systems (LLS). The reason is that LLS cannot detect a particular subtype of UL that has only an ICC (Diendorfer et al., 2015; March et al., 2016). Although other towers, such as the Säntis Tower in Switzerland, provide ground-truth lightning current data for LLS-detectable UL (UL-LLS), the Gaisberg Tower is the only instrumented tower in Europe providing full information on the occurrence of both UL-LLS and LLS-non-detectable UL (UL-noLLS).
Standards for lightning protection of wind turbines (e.g., IEC61400-24, 2019) severely underestimate the occurrence of UL at wind turbines since they currently rely on only three factors: the height of the wind turbine, the local annual flash density derived from LLS, and an environmental term involving factors like terrain complexity or altitude (Rachidi et al., 2008; Pineda et al., 2018). Lightning activity in summer clearly dominates the annual local flash density due to large amounts of DL caused by deep convection. However, UL is expected to be the dominant lightning type at wind turbines, with a tendency to be even more important in the colder season (Diendorfer, 2020; Rachidi et al., 2008). Further, the risk assessment standards cannot account for UL-noLLS; they can only account for UL-LLS, given that a tall structure is present.
The major objective of this study is to assess the risk of UL-LLS and UL-noLLS at wind turbines over a larger domain. The actual occurrence of UL can be analyzed from direct measurements at only very few locations. Even though LLS networks might allow UL-LLS at tall structures to be analyzed, the lightning current measurements show that a significant proportion of UL is missed. Since conventional LLS cannot capture the full risk of UL at wind turbines, this study uses a new approach.
It uses machine learning techniques to link the occurrence of UL to the larger-scale meteorological setting. The occurrence of UL can only be provided by ground-truth lightning current measurements. These form the basis for building and training the statistical models that are eventually used to assess the risk of UL over the whole study domain. Specifically, this study employs conditional inference random forests (Hothorn and Zeileis, 2015), which account for highly nonlinear and complex interactions between the atmosphere and the incidence of UL at tall structures. Achieving the major objective requires several steps.
From lightning current measurements at two instrumented towers in Austria (Gaisberg Tower) and Switzerland (Säntis Tower), two models are constructed: one for UL-LLS and one for UL-LLS + UL-noLLS. These models shall first establish whether there is a relationship between larger-scale meteorological variables and the occurrence of UL, and second demonstrate how well larger-scale meteorology can serve as a diagnostic tool to infer the occurrence of UL.
The availability of UL-LLS data helps to verify whether the results from the instrumented towers are transferable. The idea is to extract wind turbine locations within the study domain and to identify all lightning strikes to them in the colder season (ONDJFMA) using LLS data. Reliably diagnosing UL-LLS from larger-scale meteorology in combination with ground-truth lightning current measurements lends greater reliability to the results when, in a final step, the risk of UL-noLLS is assessed, which cannot be verified using LLS data.
The remainder of the paper is organized as follows. Section 2 introduces the two instrumented towers providing the necessary ground-truth data for this study. The first is the Gaisberg Tower, providing both UL-LLS and UL-noLLS, and the second is the Säntis Tower, providing only UL-LLS. This section further introduces the identification of lightning at wind turbines in the study domain as well as the meteorological data used.
Section 3 summarizes the procedures and major findings from the two instrumented towers. Section 3.1 describes the basic principle of constructing a random forest model. Section 3.2 presents the performance of the models at the instrumented towers. Section 3.3 introduces the most important larger-scale meteorological variables that lead to a higher risk of UL.
Section 4 then presents the results of extending the models from the instrumented towers to the larger study domain to find regions with a higher risk of experiencing UL. Section 4.1 diagnoses UL-LLS at wind turbines and presents case studies. Section 4.2 illustrates and discusses the risk of UL-LLS and UL-LLS + UL-noLLS at wind turbines using the whole period of consideration.
Section 5 concludes and summarizes the most important findings.

Data
This study combines five different data sources: UL data measured directly at the Gaisberg Tower in Austria (Diendorfer et al., 2009) and at the Säntis Tower in Switzerland (Romero et al., 2012); LLS data measured remotely by the European Cooperation for Lightning Detection (EUCLID, Schulz et al., 2016); larger-scale meteorological variables from the reanalysis database ERA5 (Hersbach et al., 2020); wind turbine locations identified using the OpenStreetMap database.

Direct UL measurements at instrumented towers
Figure 1 shows two of the very few towers instrumented for the direct measurement of currents from UL. These are the Gaisberg Tower (1 288 m amsl, 47°48′N, 13°60′E) and the Säntis Tower (2 502 m amsl, 47°14′N, 9°20′E). Lightning at the instrumented towers is almost exclusively UL. The Gaisberg Tower recorded a total of 819 UL events between 2000 and 2015. The Säntis Tower recorded 692 UL events between 2010 and 2017.
A sensitive shunt-type sensor at Gaisberg measures all types of upward flashes regardless of the current waveform, i.e., both UL-LLS and UL-noLLS. However, the inductive sensors employed at Säntis cannot measure upward flashes with only an ICC (approximately 50 %, Diendorfer et al., 2015).
Direct UL current measurements are the crucial prerequisite for constructing the random forest models, which are extended to the larger study domain after being trained on the tower data. Combining data from both towers allows the two types of models to be constructed, which shall diagnose UL-LLS and UL-LLS + UL-noLLS, respectively.

UL-LLS at wind turbines and study domain
Remotely detected lightning data by the LLS EUCLID and wind turbine locations derived from OpenStreetMap serve as verification of the statistical models assessing the risk of UL-LLS for the selected study domain.
Within the study domain of 50°N-54°N and 6°E-16°E, 27 814 wind turbines had been installed by the end of 2020 (Fig. 1).
After the exact locations of these wind turbines have been extracted, lightning strikes detected by EUCLID within a circular area of 0.003° radius (approximately 300 m) around each turbine are identified and assumed to be UL. EUCLID detects DL with a high flash detection efficiency of more than 90 % (Schulz et al., 2016). As mentioned, UL might be detected less efficiently (< 50 %, Diendorfer et al., 2015).
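The matching of LLS-detected strikes to turbine locations can be illustrated with a minimal sketch. It assumes that distances are computed with the haversine formula and a 300 m radius; the function names, the Earth radius constant, and all coordinates are hypothetical and not taken from the study.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def strikes_near_turbines(strikes, turbines, radius_m=300.0):
    """Return the strikes lying within radius_m of at least one turbine."""
    return [
        s for s in strikes
        if any(haversine_m(s[0], s[1], t[0], t[1]) <= radius_m for t in turbines)
    ]


# Hypothetical (lat, lon) pairs inside the study domain, for illustration only.
turbines = [(52.000, 9.000), (53.500, 7.250)]
strikes = [(52.001, 9.001), (52.100, 9.000), (53.500, 7.253)]
matched = strikes_near_turbines(strikes, turbines)
```

The brute-force comparison is adequate for a handful of points. For the 27 814 turbines of the study domain, a spatial index or a pre-filter on latitude and longitude would likely be needed.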
Because of its destructive potential and its severe underestimation in current lightning protection standards, UL shall be explicitly accounted for when investigating the lightning risk at wind turbines in the study domain. The tower-trained models are based on UL data from throughout the year. However, as UL is dominant over DL in the colder season, only the months from October to April between October 2018 and December 2020 are considered in the verification part of the study. Further, since DL is dominant in the warmer season, extracting lightning strikes to wind turbines over the whole year would likely make the distinction between DL and UL ambiguous.

Meteorological data
ERA5 provides hourly reanalyses of the state of the atmosphere. It has a horizontal resolution of 31 km (grid cell size of 0.25° x 0.25°) and 137 vertical levels. This study uses 35 directly available and derived variables at the surface, on model levels, and integrated vertically. These reflect variables relevant for cloud electrification, lightning, and thunderstorms (Morgenstern et al., 2022). A full list of variables can be found in Appendix A. The data are bilinearly interpolated in space and time to each Gaisberg and Säntis Tower UL observation as well as to each grid cell within the study domain in the verification part of this study.
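The interpolation step can be sketched as follows, assuming bilinear interpolation between the four surrounding grid points followed by linear interpolation between two consecutive hourly fields. The function names and all numbers are hypothetical; the exact interpolation routine used in the study is not specified in the text.

```python
def bilinear(x, y, x0, x1, y0, y1, f00, f10, f01, f11):
    """Bilinearly interpolate a field at (x, y) inside the cell [x0, x1] x [y0, y1].

    f00 = f(x0, y0), f10 = f(x1, y0), f01 = f(x0, y1), f11 = f(x1, y1).
    """
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    bottom = f00 * (1 - tx) + f10 * tx
    top = f01 * (1 - tx) + f11 * tx
    return bottom * (1 - ty) + top * ty


def linear_in_time(t, t0, t1, v0, v1):
    """Linearly interpolate between the values v0 at time t0 and v1 at time t1."""
    w = (t - t0) / (t1 - t0)
    return v0 * (1 - w) + v1 * w


# Hypothetical 0.25 degree cell (lon 13.0-13.25, lat 47.75-48.0) and corner values.
value_first_hour = bilinear(13.1, 47.8, 13.0, 13.25, 47.75, 48.0, 10.0, 20.0, 30.0, 40.0)
value_next_hour = 24.0  # assumed spatially interpolated value one hour later
# Flash observed 20 minutes after the first hourly field.
value_at_flash = linear_in_time(20, 0, 60, value_first_hour, value_next_hour)
```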

Methodological procedures and findings from the instrumented towers
This section provides background on the basic methods as well as important outcomes of the analysis at the instrumented Gaisberg and Säntis Towers. Three aspects are covered: first, the principle of how the basic model, i.e., a random forest, is constructed; second, the performance of the models; and third, which variables are most important for identifying conditions favorable or unfavorable for UL.

Construction and verification of the tower-trained random forests
A machine learning technique that has recently been widely adopted in various scientific fields is used to link larger-scale meteorology and the occurrence of UL at the instrumented towers. Random forests are highly flexible and able to handle nonlinear effects and capture complex interactions in the stated modeling problem (Strobl et al., 2009).
The occurrence versus non-occurrence of UL is a binary classification problem, which is tackled using 35 larger-scale meteorological variables (predictors). A random forest links the meteorological predictors to situations with or without UL at the Gaisberg or Säntis Tower. It combines predictions from several decision trees, each learned on a randomly chosen subsample of the input data.
Specifically, the trees in the random forest are constructed by capturing the association between the binary response and each of the predictor variables using permutation tests (also known as conditional inference, see Strasser and Weber (1999)).
The idea is that, in each step of the recursive tree construction, the predictor variable with the highest (most significant) association with the response variable is selected. Then, the dataset is split with respect to this predictor variable in order to separate the different response classes as well as possible. Splitting is repeated recursively in each of the subsets of the data until a certain stopping criterion (e.g., regarding significance or subsample size) is met. The forest combines 500 such trees, where each tree is learned on a random subsample of two-thirds of the full data set and only six randomly selected predictors are considered in each split. Finally, the random forest averages the predictions from the ensemble of trees, which stabilizes and enhances the predictive performance. See Hothorn et al. (2006) and Hothorn and Zeileis (2015) for more details on the algorithm and implementation.
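The variable selection step inside one tree node can be sketched as below. This is a simplified illustration, not the implementation referenced above: it assumes a Monte Carlo permutation test with the absolute difference of class means as test statistic, and the predictor names and toy data are hypothetical.

```python
import random
from statistics import mean


def abs_mean_diff(values, labels):
    """Absolute difference between the class means of one predictor."""
    with_ul = [v for v, y in zip(values, labels) if y == 1]
    without_ul = [v for v, y in zip(values, labels) if y == 0]
    return abs(mean(with_ul) - mean(without_ul))


def permutation_p_value(values, labels, n_perm, rng):
    """Monte Carlo p-value: share of label shuffles at least as extreme as observed."""
    observed = abs_mean_diff(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_mean_diff(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_split_variable(predictors, labels, mtry=6, n_perm=199, seed=0):
    """Pick the most significant predictor among mtry randomly drawn candidates."""
    rng = random.Random(seed)
    names = sorted(predictors)
    candidates = rng.sample(names, min(mtry, len(names)))
    p_values = {
        name: permutation_p_value(predictors[name], labels, n_perm, rng)
        for name in candidates
    }
    return min(p_values, key=p_values.get), p_values


# Toy data (hypothetical): 20 situations with UL (1) followed by 20 without (0).
labels = [1] * 20 + [0] * 20
predictors = {
    "conv_precip": [3.0 + 0.1 * i for i in range(20)] + [0.1 * i for i in range(20)],
    "noise": [i % 5 for i in range(40)],
}
best, p_values = select_split_variable(predictors, labels)
```

In a full tree, the selected variable would then be used to split the data, and the procedure would be repeated in each subset until the stopping criterion is met.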
To validate the resulting models, the input data are split into training and testing samples. The models are trained on the training data, and their diagnostic ability is assessed on the unseen testing data. Leave-one-out cross-validation is used to validate the models for UL-LLS and UL-LLS + UL-noLLS. The first model, for UL-LLS, uses both Säntis and Gaisberg data to increase the size of the training data. The flash type that cannot be detected at the Säntis Tower is left out of the Gaisberg data during training to ensure consistency. The second model, for UL-LLS + UL-noLLS, uses only Gaisberg data, as only the Gaisberg Tower provides full information on all subtypes of UL. The model response (i.e., whether UL occurred or not) is sampled such that the two classes are balanced, i.e., situations with and without UL are present in equal proportions. To assess the models' performance, the models diagnose the conditional probability of UL on data not considered during training, i.e., on the respective day left out. We call the probability conditional because of the balanced response setup. To diagnose the conditional probability of UL on days without UL as well, days without UL from each season are randomly sampled between 2000 and 2017. High diagnostic ability means high probabilities whenever UL occurred at Gaisberg or Säntis in the particular situation (i.e., a high true positive rate) and low probabilities whenever no UL occurred (i.e., a low false positive rate).
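A minimal sketch of the validation setup, assuming that each UL day is left out once and that the remaining UL situations are paired with an equally sized random sample of situations without UL. The data layout (tuples of day and a single feature value) and the function names are hypothetical.

```python
import random


def balanced_sample(ul_cases, no_ul_cases, rng):
    """Draw equally many cases with and without UL (balanced response)."""
    n = min(len(ul_cases), len(no_ul_cases))
    return rng.sample(ul_cases, n), rng.sample(no_ul_cases, n)


def leave_one_day_out(ul_cases, no_ul_cases, seed=0):
    """Yield (held_out_day, train_ul, train_no_ul, test_ul) for every UL day.

    Each case is a (day, feature) tuple.
    """
    rng = random.Random(seed)
    for day in sorted({d for d, _ in ul_cases}):
        remaining = [c for c in ul_cases if c[0] != day]
        test_ul = [c for c in ul_cases if c[0] == day]
        train_ul, train_no_ul = balanced_sample(remaining, no_ul_cases, rng)
        yield day, train_ul, train_no_ul, test_ul


# Toy data (hypothetical): three UL days and ten situations without UL.
ul_cases = [("2019-03-04", 3.1), ("2019-03-04", 3.4), ("2020-02-11", 2.9), ("2020-02-17", 4.0)]
no_ul_cases = [(f"no-UL day {k}", 0.1 * k) for k in range(10)]
folds = list(leave_one_day_out(ul_cases, no_ul_cases))
```

Each fold would be used to train one model on the balanced training sample and to diagnose the conditional probability on the day left out.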

Performance of the tower-trained random forests
The tower-trained random forest models can reliably diagnose both UL-LLS and UL-LLS + UL-noLLS when validated on unseen, withheld data from the towers. Figure 2 summarizes the cross-validated diagnostic ability according to the random forests for UL-LLS + UL-noLLS (Gaisberg) and UL-LLS (Gaisberg + Säntis). Both model ensembles show similarly good diagnostic performance. The diagnosed median conditional probabilities are about 0.8 given that UL was observed in the respective situation (minute). This indicates a high true positive rate. Similarly, for situations without lightning (Fig. 2, right), the conditional probabilities are low, indicating a low false positive rate.
The fact that the random forest including UL-noLLS has the highest diagnostic ability demonstrates that the proportion of UL that cannot be detected by conventional LLS can indeed be reliably diagnosed from larger-scale meteorology alone. This supports the idea of also investigating the risk of the unverifiable UL-noLLS and not only of UL-LLS.

Meteorological drivers for UL-LLS at the instrumented towers
Random forests allow the influence of individual variables on the models' diagnostic performance to be assessed. This is done by computing the so-called permutation variable importance. The idea is to break up the relationship between the response variable and one predictor variable by neglecting its information when assessing the models' diagnostic performance. The information of one predictor variable is neglected by permutation, i.e., by randomly shuffling its values and then assessing how much the diagnostic performance decreases. Figure 3 visualizes the median permutation variable importance computed from 100 different random forest models for UL-LLS. Each of the 100 models is based on a balanced proportion of situations with UL and randomly chosen situations without UL. Results for the UL-LLS and UL-LLS + UL-noLLS models are very similar.
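The principle of permutation variable importance can be sketched as follows. The sketch assumes accuracy as the performance measure and uses a simple threshold rule as a hypothetical stand-in for a trained forest; the performance measure used in the study may differ.

```python
import random
from statistics import mean


def accuracy(model, rows, labels):
    """Fraction of situations for which the model returns the observed class."""
    return mean([1.0 if model(r) == y else 0.0 for r, y in zip(rows, labels)])


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one predictor at a time."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importance = {}
    for name in rows[0]:
        drops = []
        for _ in range(n_repeats):
            column = [r[name] for r in rows]
            rng.shuffle(column)  # break the link between this predictor and the response
            permuted = [{**r, name: v} for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importance[name] = mean(drops)
    return importance


def toy_model(row):
    """Hypothetical stand-in for a trained forest: it only looks at conv_precip."""
    return 1 if row["conv_precip"] > 2.0 else 0


# Toy data (hypothetical): 20 situations with UL followed by 20 without.
rows = (
    [{"conv_precip": 3.0 + 0.1 * i, "noise": i % 5} for i in range(20)]
    + [{"conv_precip": 0.1 * i, "noise": i % 5} for i in range(20)]
)
labels = [1] * 20 + [0] * 20
importance = permutation_importance(toy_model, rows, labels)
```

A predictor that the model does not use receives an importance of zero, whereas shuffling an informative predictor reduces the accuracy noticeably.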
Convective precipitation has the largest influence on the occurrence of UL according to the random forests based on direct observations from the Gaisberg and the Säntis Tower (Fig. 3). Neglecting the information of this driver variable reduces the diagnostic performance most. The second and third most important variables are the maximum updraft velocity and convective available potential energy (CAPE), respectively. Summary statistics of the three most important variables show that CAPE is rather low when UL occurs at both the Säntis Tower and the Gaisberg Tower (median value of 68 J kg⁻¹). Convective precipitation has a median value of 3.8 mm and the maximum vertical updraft velocity a median of −1.5 m s⁻¹ (negative values indicate upward motion). All values are larger in magnitude than the average over every single hour in the considered time range. However, they are not exceptionally high compared to situations with deep convection, in which CAPE values in particular are commonly higher than 500 J kg⁻¹. An important reason might be that UL at the instrumented towers is distributed approximately equally throughout the year, whereas intense thunderstorms with deep convection and high CAPE values occur mainly in summer. This might further suggest that UL requires many different processes to interact to form favorable conditions, which might be even more complex than the conditions favoring deep convection.
Other influential variables are the maximum precipitation rate, the vertical size of the thundercloud, the amount of ice crystals and solid hydrometeors, and the 2 m dewpoint temperature.

UL at wind turbines
The extraction of wind turbine locations and identification of lightning strikes to them within 300 m in the colder season (ONDJFMA) shows that there are regions within the study domain that experience UL more frequently than others (see Fig. 4).
Interestingly, regions which are more often affected by UL (panel (b), dark pink) coincide with regions with many wind turbines.However, in general it can be observed that regions with a high number of wind turbines (panel (a), dark green) do not necessarily coincide with a high number of UL as can be seen in the North-Eastern parts of the study domain, for instance.
The following sections present and discuss the results when extending the findings from the instrumented towers to the study domain in which wind turbine locations are extracted and the lightning activity to them is analyzed.

Diagnosing UL-LLS at wind turbines from larger-scale meteorological conditions
The random forest models for UL-LLS and UL-LLS + UL-noLLS based on data from the two instrumented towers identified the larger-scale meteorological variables that are most important for distinguishing situations with and without UL. Now, the tower-trained random forest models are applied to the larger study domain to assess the risk of UL at wind turbines.
Lightning measurements from LLS data shall verify the results at identified wind turbine locations.
The following results are based on a procedure similar to the one described in Sect. 3.2, except that each grid cell (31 km x 31 km) of the study domain is used as test data instead of the cross-validated data from the instrumented towers. To increase the robustness of the results, again 100 different random forest models based on observations from the Gaisberg and the Säntis Tower are used to diagnose the conditional probability of UL in the selected case studies over the study domain. The results in this section visualize the median conditional probabilities diagnosed by the random forest models.
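Taking the median over the model ensemble can be sketched for a small grid as follows. The nested-list layout and the three toy probability maps are hypothetical; in the study, 100 models and the full ERA5 grid of the domain would be used.

```python
from statistics import median


def median_probability_map(model_maps):
    """Cell-wise median over a list of equally shaped 2-D probability grids."""
    n_rows, n_cols = len(model_maps[0]), len(model_maps[0][0])
    return [
        [median([m[i][j] for m in model_maps]) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy example (hypothetical): three models diagnosing a 2 x 2 grid for one hour.
model_maps = [
    [[0.1, 0.9], [0.5, 0.2]],
    [[0.3, 0.8], [0.4, 0.2]],
    [[0.2, 0.7], [0.6, 0.9]],
]
median_map = median_probability_map(model_maps)
```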

Case studies: UL-LLS at wind turbines
To illustrate the diagnostic ability of the tower-trained random forests for UL-LLS on days with UL events, three case study days are selected from the colder seasons between 2018 and 2020 in the study domain. The selected case study days are characterized by weather situations typical of the colder seasons in the mid-latitudes. The atmosphere in the transition seasons and in winter tends to be highly variable and influenced by the succession of cyclones and anticyclones determining the meteorological setting (Perry, 1987).
In particular, the development and progression of mid-latitude cyclones provide favorable conditions for so-called wind-field thunderstorms (Morgenstern et al., 2022). This thunderstorm type is associated with, among other things, strong updrafts, high amounts of precipitation, and low but non-zero CAPE.
The first case study is considered in more detail with regard to the drivers identified at the instrumented towers (Fig. 3). Figure 5 illustrates the larger-scale isotherm locations, the spatial distribution of convective precipitation, the maximum updraft velocity, and CAPE on 4 March 2019 at 13 UTC and 14 UTC. LLS-detected lightning events at the identified wind turbines within the particular hour are indicated as dark gray dots. Figure 6 visualizes the conditional probability diagnosed by the random forest models in red colors for all three case study days. Panels (a) and (b) show the results for the case study discussed in Fig. 5. The diagnosed pattern results from combining the influence of the three driver variables. This suggests that no single variable is responsible for the resulting probability map; rather, an interaction of different influential variables yields areas with an increased risk of experiencing UL.
The yellow symbols again show lightning strikes over the considered hour. Identified lightning events in yellow require a wind turbine within a maximum distance of 300 m, as described in Sect. 2. All other tall structures that might have experienced UL are not considered in this figure. The diagnosed probabilities themselves do not depend on wind turbine locations, meaning that high probabilities might be diagnosed even where no wind turbine is installed. Grid cells without any wind turbines are shaded in gray.
All three case study days in Fig. 6 show that areas with an increased diagnosed probability of UL coincide well with identified lightning events in the respective hour over the study domain. In all three case studies there is a clear separation between areas with a very low diagnosed risk and areas with a very high diagnosed risk of experiencing UL.
On 11 February 2020, shown in panels (c) and (d) of Fig. 6, the study domain is in a strong westerly flow, again associated with locally increased convective precipitation, CAPE, and strong updrafts (not shown here). On 17 February 2020, the study domain is crossed by a cold front at higher altitudes (above 500 hPa). Despite the different meteorological situation, the conditions are again similar to those of the other case studies, showing increased values of the three driver variables that strongly influence the diagnosed conditional probability.

Risk assessment of UL at wind turbines
Identifying areas with an increased risk of UL due to larger-scale meteorological conditions is a valuable step towards the risk assessment of lightning at wind turbines. The case studies clearly demonstrate that observed lightning at wind turbines coincides with areas of increased probability diagnosed by the random forest models. The following analysis considers all events within the considered period in which lightning at wind turbines was identified. Not only shall the models for UL-LLS provide a risk assessment; the random forests for UL-LLS + UL-noLLS are now additionally applied to the study domain and the considered time period.
The considered study period, comprising the transition seasons and winter from 2018 to 2020, contains a total of 185 event days with 1 027 single flash hours and 18 602 single flash events. These numbers serve as a measure to verify the probabilities diagnosed by the random forest models. Note that these numbers are a lower limit of the flashes that actually occurred. The uncertainty of manually identifying flashes at wind turbines as well as the uncertainty of detecting UL by the LLS suggest a significantly larger number of actual lightning events at wind turbines. Further, this verification approach exclusively considers lightning at wind turbines and neglects all other tall structures in the study domain, such as radio towers, that might be affected by UL. In the following, all days within the considered study period are taken as new data for the random forest models to diagnose the conditional probabilities on an hourly basis.
The objective is to identify regions that, according to the random forest models, are more frequently affected by a higher risk of UL than other regions. For this purpose, the number of hours in which the conditional probability exceeds the threshold of 0.5 is counted in each ERA5 grid cell (0.25° x 0.25°).
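The counting of exceedance hours can be sketched as below, assuming that the diagnosed probabilities are available as one 2-D grid per hour. The nested-list layout and the toy values are hypothetical.

```python
def count_exceedance_hours(hourly_maps, threshold=0.5):
    """Count, per grid cell, the hours in which the probability exceeds the threshold."""
    n_rows, n_cols = len(hourly_maps[0]), len(hourly_maps[0][0])
    counts = [[0] * n_cols for _ in range(n_rows)]
    for grid in hourly_maps:
        for i in range(n_rows):
            for j in range(n_cols):
                if grid[i][j] > threshold:
                    counts[i][j] += 1
    return counts


# Toy example (hypothetical): three hourly probability maps on a 2 x 2 grid.
hourly_maps = [
    [[0.8, 0.1], [0.5, 0.6]],
    [[0.7, 0.2], [0.4, 0.9]],
    [[0.3, 0.1], [0.6, 0.7]],
]
exceedance_hours = count_exceedance_hours(hourly_maps)
```

A value exactly equal to the threshold is not counted here, following the wording "exceeds"; whether the study treats such ties differently is not stated.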

Risk assessment of UL-LLS at wind turbines
Figure 7 (a) illustrates that some regions in the considered study domain have a higher risk of experiencing UL-LLS than others. The western and southwestern parts of the study domain have a considerably higher probability of UL-LLS. This is also in agreement with panel (b) in Fig. 4, which shows the actually observed hours in which at least one lightning event at a wind turbine occurred within the respective grid cell.
Interestingly, areas with higher UL-LLS probabilities in Fig. 7 roughly coincide with regions of elevated topography in the southern third of the domain (cf. Fig. 1). Possible explanations are an increased lightning-effective height of the turbines (e.g., Shindo, 2018) and increased chances of thunderstorm formation through orographic lifting and thermally induced breezes (Kirshbaum et al., 2018). Sea breezes might also explain the higher probabilities in the northwesternmost, sea-covered part of the domain.
The successful transfer of the UL-LLS model trained with meteorological data on direct tower measurements to a larger region and its independent verification on wind turbines shows the potential of our approach to be able to produce regionally varying risk maps, which might in turn lead to regionally varying (voluntary or enforced) lightning protection standards for wind turbines.

Risk assessment of UL-LLS + UL-noLLS at wind turbines
The successful transfer of the tower-trained and verified UL-LLS model to a larger domain lends credence to taking the same step with the tower-trained model for all upward lightning (UL-LLS and UL-noLLS) although no data exist for an independent verification.
Panel (b) in Fig. 7 indicates that more flashes are expected when additionally accounting for the LLS-non-detectable UL flash type. The pattern of areas with an increased risk of experiencing UL is similar, even though some of the more frequently affected areas are enlarged. This suggests that similar mechanisms resulting from larger-scale meteorology lead to both the UL-LLS and UL-noLLS flash types.
The risk increases most markedly in regions with elevated topography in the southern part of the domain and in the coastal northwesternmost region.

Conclusions
Upward lightning (UL) initiating at tall structures such as wind turbines is much more destructive than downward lightning (DL). Each UL flash starts with an initial continuous current (ICC) lasting about ten times longer than in DL, transferring much more charge to the tall structure. Further, direct upward lightning measurements suggest that less than 50 % of UL events can be detected by most lightning location systems (LLS), since they are not able to spot UL with only an ICC.
UL directly measured at the instrumented tower at Gaisberg has little seasonal variation. However, current lightning protection standards are based on the annual flash density derived from LLS data, which is clearly dominated by DL in the warm season. UL-noLLS is completely neglected and UL in the cold season is highly underestimated. Basic knowledge about the occurrence of UL is still incomplete, impeding a proper risk assessment of UL at wind turbines.
The missing consideration of UL-noLLS and of the importance of the cold season for UL will therefore considerably underestimate the risk of UL to wind turbines. This study combines rare direct UL measurements with larger-scale meteorological data in a machine learning model in order to estimate the risk of all UL (UL-LLS and UL-noLLS) at wind turbines.
The first step consists of training and validating two different random forest models based on long-term observations from two specially instrumented towers. One model accounts only for UL-LLS and one model accounts for UL-LLS + UL-noLLS.
The model input data are direct UL measurements from the Gaisberg Tower (Austria, 2000-2015) and from the Säntis Tower (Switzerland, 2010-2017). While the sensor at the Gaisberg Tower also measures UL-noLLS, the sensor at the Säntis Tower misses most of them.
In a second step, the random forest models are extended to a larger study domain (50°N-54°N and 6°E-16°E) to identify areas with an increased risk of UL in the colder season (ONDJFMA). As a verification, all lightning strikes at wind turbines in this domain are extracted from LLS and OpenStreetMap data and compared to the probabilities diagnosed by the random forests. Results show that UL can be reliably diagnosed by the tower-trained random forest models at the Gaisberg and Säntis Tower.
The larger-scale meteorological drivers are large amounts of (convective) precipitation, strong vertical updraft velocities, and slightly increased CAPE. Further, the vertical extent of the cloud as well as the amount of ice crystals and solid hydrometeors are important variables.
The extension of the random forests to a larger domain shows that the probability maps coincide with observed lightning strikes at wind turbines. Extending the models trained at the Gaisberg Tower including UL-noLLS flashes reveals that areas with an increased risk of UL are expected to experience UL even more often.
The western and southern part of the domain in North-West Germany with elevated topography and the coastal region in its northwesternmost part are most at risk of UL at wind turbines. This study demonstrates that direct UL measurements at an instrumented tower can be reliably modeled from larger-scale meteorological conditions in a machine learning model (random forest). The study also proposes a novel way of justifying the transfer of that model to a larger region by using UL-LLS data at wind turbine locations. Consequently, regionally detailed risk maps of UL at wind turbines can be produced.

Table A1. Table of large-scale variables taken from ERA5 and variables derived from ERA5. The derived variables (indicated in italics) are suggested to be potentially important in the charging process of a thundercloud or for the development of convection.

Figure 1. Geographic overview of the instrumented tower locations (Gaisberg and Säntis) as well as the study domain (box). Green dots are manually identified wind turbine locations based on © OpenStreetMap 2020. Right: topographic map of the study domain showing altitude above mean sea level. Data taken from the Shuttle Radar Topography Mission (Farr and Kobrick, 2000).
Between 2000 and 2015, the Gaisberg Tower experienced 247 unique days with UL events. Between 2010 and 2017, the Säntis Tower experienced 186 unique days. Combining UL days from both towers yields 406 unique days with UL. Each training data set leaves out one of the 247 (406) days with UL, which is then used as test data. This is repeated until each of the 247 (406) days has been left out once, which results in 247 (406) different models, each trained on equal numbers of situations with and without UL.
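
The leave-one-day-out procedure described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `leave_one_day_out_splits`, the placeholder day labels and the size of the no-UL pool are assumptions, and each day is treated as a single sample for simplicity, whereas the study balances individual situations.

```python
import random

def leave_one_day_out_splits(ul_days, no_ul_pool, seed=0):
    """Yield (held_out_day, train_ul, train_no_ul) once per UL day.

    ul_days:    identifiers of days with UL (hypothetical labels)
    no_ul_pool: identifiers of situations without UL to sample from
    """
    rng = random.Random(seed)
    for held_out in ul_days:
        # Training UL days are all UL days except the held-out test day.
        train_ul = [d for d in ul_days if d != held_out]
        # Balance the classes: draw as many no-UL samples as UL samples.
        train_no_ul = rng.sample(no_ul_pool, len(train_ul))
        yield held_out, train_ul, train_no_ul

# Toy data sized like the Gaisberg case (247 UL days); the pool size is assumed.
ul_days = [f"ul_day_{i}" for i in range(247)]
no_ul_pool = [f"no_ul_{i}" for i in range(5000)]
splits = list(leave_one_day_out_splits(ul_days, no_ul_pool))
```

Each element of `splits` corresponds to one model to be trained; the held-out day never appears in its own training set.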

Figure 2. Distributions of diagnosed conditional probabilities in situations with or without UL events. Left: conditional UL probability given that UL was observed in the particular minute (true positive) based on Gaisberg data including all subtypes of UL. Center: conditional UL probability given that UL was observed in the particular minute based on Gaisberg and Säntis data combined. Right: conditional UL probability on randomly sampled days without UL events (false positive).

Figure 3. Median permutation variable importance according to 100 different random forests based on balanced proportions of situations with and without UL at the Gaisberg and Säntis Tower.
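
The idea of a median permutation importance can be illustrated with a small sketch. It rests on stated assumptions: a fixed toy classifier `toy_model` stands in for a random forest, the data are synthetic, and the 100 repetitions are repeated shuffles of one model rather than 100 separately trained forests as in the study. All names are hypothetical.

```python
import random
from statistics import median

def accuracy(model, X, y):
    """Fraction of rows for which the model reproduces the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_features, rng):
    """Drop in accuracy after shuffling each feature column in turn."""
    base = accuracy(model, X, y)
    importances = []
    for j in range(n_features):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the label
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(base - accuracy(model, X_perm, y))
    return importances

def toy_model(row):
    # Stand-in classifier: uses only feature 0 and ignores feature 1.
    return int(row[0] > 0.5)

rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]

# Repeat the permutation 100 times and take the median per feature.
runs = [permutation_importance(toy_model, X, y, 2, rng) for _ in range(100)]
median_importance = [median(r[j] for r in runs) for j in range(2)]
```

In this toy setting the ignored feature has zero importance, while shuffling the informative feature lowers the accuracy noticeably.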

Figure 4. Panel (a): number of wind turbines per grid cell derived from © OpenStreetMap 2020 data. Panel (b): number of hours per grid cell with lightning at wind turbines derived from EUCLID data.

Figure 5. Larger-scale meteorological setting on 4 March 2019 over the study domain. The left column illustrates the setting at 13 UTC, the right column at 14 UTC. From top to bottom: spatial distributions of isolines of the 850 hPa temperature (in intervals of 1 K), convective precipitation, the maximum large-scale updraft velocity (negative values indicate upward motion) and CAPE. Darker colors indicate higher magnitudes. Dark gray dots in all panels are flashes within the considered hour and ERA5 grid cell derived from LLS EUCLID data.

Figure 6. Median diagnosed conditional probability of UL according to 100 random forest models based on Gaisberg and Säntis Tower data (red areas). Yellow symbols are flashes within the considered hour derived from EUCLID data. Gray shaded areas are grid cells without wind turbines.
The meteorological setting is determined by the passage of a cold front ahead of a trough around noon. Densely packed isotherms at 850 hPa crossing Northern and Central Germany from West to East indicate the approximate location of the cold front in panels (a) and (b). The cold front implies locally enhanced amounts of convective precipitation in (c) and (d), strong updrafts indicated by large negative values in (e) and (f), and CAPE in (g) and (h) that is slightly increased but in general low in comparison to deep convection in summer. All three variables reach their maxima in slightly different areas within the study domain affected by the cold front. Convective precipitation shows increased values along the cold front, whereas the other two variables have more locally concentrated areas of maximum values (e.g., maximum updraft velocity in North/Central Germany).

Figure 7. Panels (a) and (b): potential maps for UL in the colder season (ONDJFMA) from 2018 to 2020. Orange colors are the median number of hours per grid cell exceeding a conditional probability of 0.5 according to 100 random forest models. Panel (a) shows results according to models based on Gaisberg and Säntis data combined. Panel (b) shows results according to models based on Gaisberg data also including the UL-noLLS. Relative proportions of the 12480 hours in total are given as reference.
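
One possible reading of the quantity mapped in these panels is sketched below for a single grid cell: count, for each model, the hours whose diagnosed probability exceeds the threshold, then take the median of those counts across models. This is an assumption about the aggregation order (the alternative would be to take the median probability across models first and threshold afterwards). The function name and the toy probabilities are hypothetical.

```python
from statistics import median

def median_exceedance_hours(probs_by_model, threshold=0.5):
    """Median over models of the number of hours above the threshold.

    probs_by_model: one list of hourly probabilities per model,
                    all for the same grid cell.
    """
    counts = [sum(p > threshold for p in hourly) for hourly in probs_by_model]
    return median(counts)

# Toy example: 3 models, 4 hours (the study uses 100 models and 12480 hours).
toy = [
    [0.1, 0.6, 0.7, 0.4],   # 2 hours above 0.5
    [0.2, 0.55, 0.3, 0.9],  # 2 hours above 0.5
    [0.6, 0.7, 0.8, 0.1],   # 3 hours above 0.5
]
hours = median_exceedance_hours(toy)
fraction = hours / len(toy[0])  # relative proportion of all hours
```

Raising `threshold` to 0.8 gives the stricter variant of the same count.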

Figure A1. Panels (a) and (b): maps for the potential of UL in the colder season (ONDJFMA) from 2018 to 2020. Orange colors are the median number of hours per grid cell exceeding a conditional probability of 0.8 according to 100 random forest models. Panel (a) shows results according to models based on Gaisberg and Säntis data combined. Panel (b) shows results according to models based on Gaisberg data also including the UL-noLLS. Relative proportions of the 12480 hours in total are given as reference.