##### 4.1.1. Classification and Regression Tree Analysis

[37] An early ominous finding was the low correlation between each of the 33 predictors (Table 1) and the predictand (maximum interval). This was manifested in the difficulty each scheme had predicting the outlier values of maximum interval (e.g., Figure 4). The classification and regression tree (CART) analysis [*Breiman et al.*, 1984; *Venables and Ripley*, 1997; *Burrows et al.*, 2004] will be described first. The first step was to create the tree by recursively splitting nodes (i.e., decision points). The nodes are created from predictors in the dependent data set, and at each node, additional child nodes then are created. As each node is created, SPSS determines whether the node terminates to provide a final value for the predictand (maximum interval). At this point, the decision tree likely has overfit the data set. This leads to a “pruning” process in which simpler trees are created by removing nodes of lesser importance. From the set of pruned trees, an optimal tree is selected that best describes the dependent data set while not overfitting the data.
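The recursive splitting step described above can be illustrated with a minimal sketch in pure Python. This is illustrative only: it splits a single predictor by minimizing the residual sum of squares at each candidate threshold, whereas full CART [*Breiman et al.*, 1984] handles many predictors, grows the tree recursively, and applies cost-complexity pruning. The storm values below are hypothetical, not from the dependent data set.

```python
def rss(values):
    """Residual sum of squares about the mean of a node's predictand values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Find the threshold on one predictor x that most reduces the combined
    RSS of the two child nodes (the core CART splitting criterion)."""
    pairs = sorted(zip(x, y))
    best_threshold, best_total = None, rss(y)
    for i in range(1, len(pairs)):
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        total = rss(left) + rss(right)
        if total < best_total:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best_total = total
    return best_threshold, best_total

# Hypothetical example: predictor = storm duration (min),
# predictand = maximum interflash interval (min).
duration = [12, 15, 20, 35, 40, 55]
max_interval = [2, 3, 2, 8, 9, 10]
threshold, child_rss = best_split(duration, max_interval)
```

Applied recursively to each child node, and followed by pruning of low-importance nodes, this yields a decision tree of the kind shown in Figure 7.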

[38] The simplicity of this scheme is a positive trait for use in an operational setting since it does not require interpretation of a multivariate regression model. Unfortunately, although CART would be easy to implement operationally, it only provided 56% accuracy and very little precision (Table 2). The actual decision tree (Figure 7) shows the cause of the poor accuracy. Of the seven termination nodes giving the forecast maximum interval, only one is longer than 10 min. This automatically causes CART to miss the three longest maximum intervals in the 16 independent storms. With this bias toward the shorter and more numerous maximum intervals, the CART scheme does not provide a safe cessation forecast.

[39] The CART analysis is heavily weighted toward the time delay between the last CG strike and the last (and final) IC flash of a storm since three of the first four decision nodes use this predictor (Figure 7). This choice specifically addresses characteristics of individual storms. Two of the final three predictors, shear through 6 km and the best lifted index, are environmental parameters, suggesting that the shear that can tilt an updraft and the available instability contribute to lightning activity. The final node, storm duration, is another storm specific parameter that appears in several of the schemes discussed later. It is unfortunate that the optimal decision tree (Figure 7) includes a predictor (time delay between last CG and last IC) that is poorly correlated to the maximum interval (R is −0.1). However, none of the other predictors yielded better results. Of the remaining predictors, instantaneous storm duration had the best correlation (R = 0.21). Operationally, use of this predictor would require CART to be rerun as the storm persists. Although the CART analysis hints that individual storm predictors are the most effective to use, our scheme was hampered by the predictors available in real time being poorly correlated with the predictand (maximum interval).

[40] The decision tree shown in Figure 7 was the best of several that were developed. The tree creation process was repeated numerous times to determine which parameters would produce the best decision tree. These repetitions varied the number of termination nodes as well as how easily a node could split into additional child nodes. The low correlation between the predictors and the predictand limited CART's versatility; although many variations were attempted, none were accurate or effective.

##### 4.1.2. Multiple Linear Regression Schemes (Sounding Only Regression, Sounding and Storm Regression, and Experimental Regression)

[41] Three variants of multiple linear regression [*Chambers and Hastie*, 1992; *Gardner et al.*, 1995; *Wilks*, 2006] were used to select the best combination of predictors for our cessation schemes. These were the experimental regression (ER), sounding only regression (SOR), and sounding and storm regression (SSR). SOR and SSR were developed with the data available to the 45WS forecasters in real time. ER was developed as a “what if” scheme to observe the effect of including parameters not available in real time, such as intracloud flash rate and initiation altitude. Their inclusion attempts to include some information about storm dynamics.

[42] The SPSS software uses a “forward conditional” stepwise selection process with a test for backward elimination in developing the regression model. The first predictor variable selected produces the greatest reduction in the residual sum of squares (or residual deviance (RD)), i.e., the predictor that explains the most variation in the maximum interval. The algorithm next selects the predictor that, together with the first, further reduces the RD by the greatest amount. At each step, the algorithm performs a backward check to determine if the additional predictor causes any previously selected predictor to become insignificant. If this occurs, that predictor is removed. This process continues until the RD can no longer be reduced by a significant amount, or until no predictors remain.
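The forward-conditional selection loop described above can be sketched as follows. This is a simplified stand-in for the SPSS procedure: it uses a fractional reduction in residual deviance as the stopping rule rather than *p* value entry and removal tests, and the backward-elimination check is omitted for brevity. The predictor names and synthetic data are hypothetical.

```python
import numpy as np

def fit_rss(X, y, cols):
    """Least-squares fit of y on the selected columns (plus an intercept);
    returns the residual sum of squares, i.e., the residual deviance (RD)."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_stepwise(X, y, names, min_frac_drop=0.05):
    """Greedy forward selection: at each step add the predictor that most
    reduces the RD, stopping when the fractional reduction is too small.
    (SPSS instead tests significance at each entry and removal step.)"""
    chosen, remaining = [], list(range(X.shape[1]))
    current = fit_rss(X, y, chosen)
    while remaining:
        best_rd, best_j = min((fit_rss(X, y, chosen + [j]), j) for j in remaining)
        if current - best_rd < min_frac_drop * current:
            break
        chosen.append(best_j)
        remaining.remove(best_j)
        current = best_rd
    return [names[j] for j in chosen]

# Hypothetical example: the first column dominates the predictand.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=60)
selected = forward_stepwise(X, y, ["duration", "shear_6km", "lifted_index"])
```

The first predictor selected is the one explaining the most variance in the predictand, mirroring the behavior described for the SOR, SSR, and ER variants.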

[43] Multiple regression schemes were created for each of the three variants. These schemes were based on adjusting the *p* value threshold for determining which predictors were chosen for the equations as well as the *p* value threshold for determining when a predictor should be removed from the regression model. The *p* value for allowing a predictor to be chosen varied from 0.1 to 0.4 in increments of 0.05. Additionally, the *p* value for discarding a predictor varied from 0.15 to 0.45. Optimally, the *p* values should be relatively small, indicating strong choices for the regression model. However, with our data set, the regression variants performed best under less stringent conditions. SOR had the least strict values of 0.40 and 0.45, although this was partly expected due to the sounding only parameters likely having little relevance to lightning cessation later in the day. The SSR and ER variants were better constrained, with 0.25 and 0.30 used for the *p* value thresholds. The discussions below of the three multiple linear regression variants describe the best versions of each scheme. ER is the worst of all the regression variants that were tested. Its equation (1) is given below with the following definitions: the average interflash time is the median of the times between consecutive flashes, storm duration is the length of time between the first and last flash in the storm, and the number of midlevel LDAR sources counts sources between 7 and 9 km altitude.

Table 2 shows a poor 44% accuracy and an R^{2} value of 0.54. However, ER does produce excellent precision. The median error is only 0.1 min less than the maximum interval. Although ER ends the advisories early, its high precision in forecasting the maximum interval is a positive trait. This suggests that if the non-real-time data could be acquired, efforts should be made to refine this version of our multiple linear regression schemes. Also, ER, the percentile method, and the event time trend (the latter two described later) are the only schemes to correctly forecast at least one of the three outlier maximum intervals, while the percentile method is the only scheme to forecast all of the outliers.

[44] The remaining two multiple linear regression schemes are SOR and SSR, given by equations (2) and (3), respectively, where CCL is the convective condensation level.

SOR only uses prestorm environmental parameters derived from the morning KXMR sounding. Our objective was to determine if the prestorm environment alone could provide information about how long lightning persists in a thunderstorm. SSR is similar to SOR, except that any of the parameters available to forecasters in real time (Table 1) could be selected. Aside from the ability to select different predictors, the development of SSR was identical to SOR.

[45] It is useful to discuss the predictors selected for the above three equations. The first term on the right side of the ER equation, the average interflash time, explains the most variance. Although Figure 3 showed a class of storms with a sudden end to lightning activity (dashed line), only 11 of our 116 storms exhibit this trend. These 11 storms require the use of maximum interval to ensure safety. This suggests that, should it be available in real time, average interflash time may be able to account for the three trends seen in Figure 3 due to ER's high precision. It is interesting that while ER could select any predictor, including those not available in real time, the average interflash time was the only non-real-time predictor chosen. Although ER's accuracy is poor, this single predictor markedly improves ER's precision compared to the other schemes (Table 2). This suggests that the average interflash time might serve as a crude indicator of a storm's dynamical and microphysical processes.

[46] Storm duration explains the second most variance in the ER and SSR approaches. It even becomes the linchpin of a separate scheme discussed later. Storm duration gives insight into whether a storm is a short-lived “pulse” storm, part of a multicellular structure, or associated with a charged anvil cloud. Storm duration is an instantaneous variable, changing as the storm produces more lightning.

[47] Two parameters shared by the ER and SSR schemes are closely related, the number of midlevel sources (ER) and the average LDAR source height (ER and SSR). The altitude of lightning sources is related to the strength of the storm's updraft. More sources at higher altitudes suggest a vigorous updraft and therefore a storm that is still intensifying or in the mature stage. Thus, cessation is unlikely when altitude values are high. Alternatively, high-altitude sources can come from anvil lightning, but anvil lightning can be distinguished from lightning in a convective cell by using radar observations.

[48] It is interesting to note that both ER and SSR include a stability parameter, MUCAPE and best lifted index, whereas SOR does not. SOR uses less direct measures related to stability, the convective condensation level (CCL) and the height of the −40°C isotherm. ER and SSR, which have more dynamic predictors, can “afford” to include a less useful stability parameter. Strong instability leads to stronger updrafts and lightning activity, but is less useful for cessation.

[49] SOR and SSR both share the shear predictor through 6 km, while no similar parameter exists in ER, possibly due to similar information being embedded within the average interflash time predictor. Although 6 km shear ranks last in both SOR and SSR, it is a reasonable choice. With the appropriate amount of shear, a storm can develop a tilted updraft that will not “rain out” as quickly. This provides a more intense updraft and a better opportunity for the storm to rise above the freezing level with more hydrometeors available for charging. The fact that SSR contains the maximum vertically integrated liquid predictor supports this hypothesis. Additionally, SOR curiously selects the wind direction predictor. This may occur because storm development in central Florida is governed by the sea breeze during the warm season months. An easterly (onshore) wind at CCAFS/KSC generally leads to weaker storms, while a westerly wind enhances the east coast sea breeze front and the probability of stronger thunderstorm development in the area [*Arritt*, 1993].

[50] Finally, ER and SSR share one last similar predictor, CG rate and total CG, respectively. These parameters were not expected to be selected since it was assumed that the LDAR observations would provide more information.

[51] To our surprise, SOR and SSR tie in accuracy (Table 2), both yielding 75%. However, it is no surprise that SOR yields poor results since there was little expectation that the prestorm environment would provide much information about cessation within a specific, future storm. We had expected that allowing SSR to select any of the candidate predictors would provide improved forecasts. When comparing the underforecast and overforecast errors of both regression variants, neither is promising. SSR improves only SOR's R^{2} value, 0.295 versus 0.08. These values suggest that neither scheme can produce safe forecasts of lightning cessation.

[52] The results indicate that all of the predictors in Table 1, while being the best available, are poorly correlated to the maximum interval. Thus, no combination of predictors, whether in the prestorm environment or during the storm, has a significant chance of safely predicting lightning cessation. Too much important information, such as microphysical activity, is not available. This may explain why ER yields greater precision since its selection of average interflash time may parameterize this information in some way. In summary, the parameters available for the regression schemes are not sufficient for forecasting cessation.

##### 4.1.3. Event Time Trend

[53] Given the poor performance of the CART analysis and all three regression schemes, we devised several other methods as described in section 3. The event time trend (ETT) was developed because storm duration (time from the first to last lightning activity) was selected in the CART, ER, and SSR procedures. Operationally, a forecaster would have to update the forecast as the storm persisted over longer times. Several trend lines relating storm duration to maximum interval were developed (not shown). Equation (4) describes the most successful version, where durations are given in minutes.

In spite of its simplicity, ETT produces 81% accuracy. Also, ETT, ER, and PM are the only procedures that correctly predict one of the three outlier maximum interval events in the independent data set. The storm duration predictor in (4) provides some insight into the nature of the storm, including broad assumptions about its microphysical structure. Small durations are associated with short-lived pulse storms with a brief charging period, while long durations indicate multicellular storms with greater charging or storms with a long-lived, charged anvil. ETT initially appears to be the most balanced between accuracy and precision. That is, it gives forecasters a modest level of confidence that cessation has occurred, while simultaneously providing some precision by not greatly overforecasting the maximum interval. This is somewhat misleading since ETT produces a few large underforecast errors that counteract larger overforecast errors. The underforecast errors are partly explained by a scatterplot between storm duration and maximum interval (not shown) that shows storm duration to be poorly correlated with maximum interval. Although ETT successfully predicts one outlier, most of its success is due to the majority of maximum intervals being small.
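Since equation (4) is a simple trend line relating instantaneous storm duration to maximum interval, its construction can be sketched as follows. The duration and interval values below are hypothetical stand-ins, not the dependent data behind equation (4), so the fitted coefficients are illustrative only.

```python
import numpy as np

# Hypothetical (storm duration, maximum interflash interval) pairs, in
# minutes; the actual equation (4) was fit to the 100 dependent storms.
duration = np.array([10.0, 20.0, 30.0, 45.0, 60.0, 90.0])
max_interval = np.array([3.0, 4.0, 6.0, 7.0, 10.0, 14.0])

# Least-squares linear trend: forecast interval = slope * duration + intercept.
slope, intercept = np.polyfit(duration, max_interval, deg=1)

def ett_forecast(storm_duration_min):
    """Forecast the maximum interflash interval from storm duration so far."""
    return slope * storm_duration_min + intercept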

##### 4.1.4. Lag Time From the Storm's Maximum Height of the Maximum dBZ to the Last Flash (MZM)

[54] Since the three maximum interval schemes described above fail to predict all three outliers, we reconsidered our decision to use maximum interval as the predictand. This reconsideration led to the maximum height of the greatest dBZ (MZM) scheme, which developed a cubic relation between the maximum height of the storm's maximum dBZ value and the time to the last flash (Figure 8, equation (5)). MZM is the only scheme that explicitly attempts to forecast a storm's last flash. It utilizes the time delay from when the greatest reflectivity core reaches its highest altitude to the time of the last flash. Equation (5) describes the period of time to wait for additional lightning to occur after the most recent flash. A drawback to MZM is that it must be recalculated when the greatest dBZ value or its height changes, much like ETT with the instantaneous storm duration. Similar approaches were attempted using the number of CG strikes, IC flash rate, and the percentage of CG to IC flashes (not shown). However, none of the individual parameters had the forecast utility of MZM.
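The cubic relation of equation (5) can be sketched with a polynomial fit. The height and lag values below are hypothetical, chosen only to show the form of the fit; equation (5) itself was derived from the dependent storms (Figure 8).

```python
import numpy as np

# Hypothetical (peak height of the storm's maximum dBZ [km],
# lag from that peak to the last flash [min]) pairs.
peak_height_km = np.array([4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
lag_to_last_flash = np.array([5.0, 8.0, 14.0, 20.0, 30.0, 38.0, 50.0])

# Cubic least-squares fit, analogous in form to equation (5).
coeffs = np.polyfit(peak_height_km, lag_to_last_flash, deg=3)
mzm_wait = np.poly1d(coeffs)  # minutes to wait after the most recent flash
```

As noted above, the fit must be reevaluated whenever the greatest dBZ value or its height changes during the storm.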

[55] A perceived advantage of the MZM scheme is that it attempts to include storm dynamics by utilizing radar data. Relatively intense storms have stronger updrafts [e.g., *Byers and Braham*, 1949], and considerable previous research has studied updraft characteristics, including their size [*Auer and Marwitz*, 1968], vertical velocity [*Battan and Theiss*, 1970; *Marwitz*, 1973; *LeMone and Zipser*, 1980; *Xu and Randall*, 2001], and temperature [*Davies-Jones and Henderson*, 1973]. Furthermore, the electrification process is linked to the storm's updraft [*Gunn*, 1956; *Paluch and Sartor*, 1973; *Stolzenburg et al.*, 1998]. These studies indicate that a storm's updraft is an integral factor in producing lightning and that much of the lightning is contained within this core region [*Carey and Rutledge*, 1998]. We hoped that using the lag time approach would address the problem of forecasting the greatest maximum interval storms (the outliers). Since lag time is unrelated to maximum interval, MZM might be able to discern differences between the independent storms (Table 2).

[56] The MZM scheme is partially successful (Table 2). Its accuracy of 88% makes it the second most accurate scheme. Its underforecast error also is small, with the median error only 3.6 min, making it one of the most precise schemes. The largest underforecast error is only 6.5 min, which is half that of the next closest scheme, aside from the Percentile Method described next. The trade-off for good accuracy (i.e., correctly ending an advisory at or after cessation) and small underforecast error is a large median overforecast time of 12 min and the greatest overall overforecast of 44 min (Table 2). Thus, the MZM scheme provides high confidence that an advisory will be canceled safely after lightning cessation. However, the time savings over current schemes is minimal.

##### 4.1.5. Percentile Method (PM)

[57] The final scheme tested, the Percentile Method (PM), also is the simplest. A scatterplot of maximum intervals for the 100 dependent storms was prepared and then divided into percentiles (Figure 9). The 16 independent storms then were verified against these percentile values. Figure 9 clearly shows why the previous schemes poorly forecast the outlier events. Since most of the dependent storms had a maximum interval less than 10 min, the schemes were skewed to underpredict the outliers.

[58] When applied to the 16 independent storms, the 99.5th percentile is the most successful, always ending lightning advisories after cessation, and not before. The 95th percentile version performs fairly well, with an accuracy of 88%. The major limitation of PM is that it produces very large time errors. The 99.5th percentile version has a median forecast error of 21.2 min, i.e., waiting 21.2 min too long to end an advisory. However, PM is the only method to correctly wait for cessation to occur in all 16 independent storms, including the outliers.
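The percentile method reduces to a single threshold computed once from the dependent storms. A minimal sketch, using a hypothetical maximum-interval sample in place of the 100 dependent storms:

```python
import numpy as np

# Hypothetical maximum intervals (minutes): mostly small values with a
# few large outliers, mimicking the skewed distribution in Figure 9.
max_intervals = np.array([2.0, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9, 10, 12, 25, 32, 40])

# Fixed wait times applied to every storm: end the advisory only after
# this much time has elapsed since the most recent flash.
wait_95 = np.percentile(max_intervals, 95)
wait_995 = np.percentile(max_intervals, 99.5)
```

No storm-by-storm recalculation is needed; the forecaster simply chooses the percentile matching the acceptable risk, which is the flexibility discussed below.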

[59] PM's large overforecast errors are not desirable, but the scheme does have several desirable characteristics. First, it is simple to implement. It does not require monitoring individual storms to obtain input for an equation, and it does not have to be recalculated as the storm evolves in time (e.g., ETT and MZM). The forecaster simply selects which percentile to use and applies it to all storms.

[60] This leads directly to another excellent trait, flexibility. PM can be adjusted for risk. If the safety risk is not stringent (i.e., when people outdoors are not involved), a lower percentile can be used to reduce the length of lightning advisories. However, when personnel safety is involved, the 99.5th percentile version offers excellent confidence that lightning has ended. Additionally, since the scheme uses one value every time without calculations, it can be used during morning planning for afternoon activities. Lastly, more storms can easily be added in the future to create a more robust data set.