### Abstract

[1] Although the generation and propagation mechanisms for whistlers are fairly well understood, the location and extent of the lightning source region for the whistlers observed at a given station are currently unknown. The correlation of whistler observations against global lightning data allows an estimate of the size and position of the source region. For whistlers detected at Tihany, Hungary, an area of positive correlation with radius of ∼1000 km was found to be centered on the conjugate point. Although the maximal sample correlation coefficient was relatively low, *r* = 0.065, it has a high statistical significance, indicating that it is extremely improbable that the whistlers and lightning in this region are actually uncorrelated. Other smaller areas of positive correlation were found further afield in South America and the Maritime Continent. Lightning in the northern hemisphere displayed a negative correlation with whistlers at Tihany.

### 1. Introduction

[2] Whistlers are dispersed Very Low Frequency (VLF) emissions observed both on the ground and in space. The mechanism expounded in the pioneering work of *Storey* [1953] is generally acknowledged to account for their production. In this classical model, a whistler is initiated by a lightning stroke, which produces an intense pulse of electromagnetic radiation with broad spectral content, brief duration and peak power ∼20 GW, an appreciable fraction of which lies in the VLF range. Within the Earth-ionosphere waveguide this pulse, or sferic, is not significantly dispersed and can travel considerable distances with low attenuation ∼1 dB/Mm [*Bernstein et al.*, 1974]. However, some portion of the initial energy may penetrate upward through the ionosphere and enter the magnetosphere. In the inhomogeneous and anisotropic magnetospheric plasma the waves propagate along magnetic field lines in the whistler mode. Since the whistler mode is dispersive, the propagation delay varies with frequency, transforming the initial pulse into a complex tone with a unique frequency-time signature determined by the magnetic field strength and plasma density along the path traversed through the magnetosphere. An example of a whistler spectrogram is presented in Figure 1.

[3] The morphology of whistlers varies appreciably with latitude. Whereas at *L* ≲ 2.4 whistlers are generally observed as isolated traces, corresponding to propagation along a single path through the magnetosphere, at higher latitudes whistlers are commonly multipath, where a single lightning stroke produces a signal which propagates along multiple magnetospheric paths. The whistlers considered in this study conform to the former category.

#### 1.1. Ducting

[4] Because of anisotropy in the whistler mode dispersion relation, the passage of the waves through the magnetosphere is roughly aligned with the magnetic field. If, in addition, the waves are trapped within a field-aligned plasma density irregularity, or duct, the wave normal direction is more strongly confined to the magnetic field. While in the duct the signal should also be amplified by wave-particle interactions [*Brice*, 1960; *Liemohn*, 1967; *Pasmanik et al.*, 2002]. There may also be unducted portions of the propagation path through the ionosphere and magnetosphere. It is thought that all whistlers observed on the ground are ducted.

[5] When a whistler is received on the ground it has completed a journey with at least three major components: (1) sub-ionospheric propagation from the source to the foot point of the duct, (2) field-aligned propagation to the opposite hemisphere, and (3) sub-ionospheric propagation from the ionospheric duct exit point at the conjugate foot point to the receiver.

[6] Because the refractive index of the neutral atmosphere, *μ* ∼ 1, is much lower than that in the plasma medium, *μ* ≫ 1, waves entering the ionosphere from below are refracted into a transmission cone defined by wave normal directions which lie close to the vertical. In order for a whistler to become trapped in a duct, the refracted wave normal must lie within the trapping cone, which is symmetric around the magnetic field direction [*Helliwell*, 1965]. At low latitudes there is little or no overlap between the transmission and trapping cones. As a consequence, whistlers are most commonly observed at middle to high geomagnetic latitudes, where the magnetic field has a large inclination, the field lines are approximately vertical and the transmission and trapping cones intersect. The intensity of the magnetic field regulates the width of the transmission cone and therefore also has an influence on the degree of overlap. Closer to the poles, where the magnetic field strength at ionospheric altitudes is higher, the whistler mode refractive index in the ionosphere is lower and the transmission cone is broader. Regions of depleted magnetic field strength are thus not favorable locations for the foot point of a duct.

[7] Only those waves incident upon the ionosphere from above with small wave normal angle with respect to the vertical are able to refract from the plasma medium to the neutral atmosphere. All other waves are reflected back into the magnetosphere. Thus the most favorable conditions for the whistlers to penetrate to the ground also occur at higher latitudes.

[8] Ducted whistlers retain the smallest wave normal angles and it is thus generally believed that almost all whistlers which enter into the waveguide at the conjugate point are ducted. In the absence of a duct the waves are magnetospherically reflected and may undergo numerous reflections, possibly forming a uniform band of wave energy [*Sonwalkar and Inan*, 1989]. Since whistler mode waves interact with the ambient magnetospheric plasma, causing electrons to be scattered into the loss cone, these magnetospherically reflected whistlers play a significant role in regulating the population of the radiation belts [*Lauben et al.*, 1999; *Johnson et al.*, 1999; *Rodger et al.*, 2004].

[9] Access to a duct need not necessarily occur at its base, as it is feasible for a whistler to be trapped after leaking in through the side of the duct [*Strangeways and Rycroft*, 1980]. The termination altitude of the duct strongly influences the proportion of wave energy which is trapped through either its side or base, where the side is favored for ducts extending to lower altitudes [*Strangeways*, 1981]. Similarly, waves may leak out of a duct when their wavelength becomes comparable to the width of the duct, and those which exit on the lower latitude side of the duct may still have near-perpendicular incidence on the ionosphere [*Strangeways*, 1986].

[10] Although the majority of lightning is confined to the tropics and sub-tropics [*Christian et al.*, 2003], these regions do not play a major role in generating whistlers. Whistler observations near the geomagnetic equator are rare and thought to arise because of propagation in the waveguide from an ionospheric exit point at higher latitudes [*Koster and Storey*, 1955; *Rao et al.*, 1974; *Helliwell*, 1965]. There is evidence to suggest that a low-latitude cutoff exists for whistlers at around 16° geomagnetic latitude, which may be due to the paucity of ducts at low latitudes [*Rao et al.*, 1974], but might also be due to matching conditions at the ionospheric boundary. *Thorne and Horne* [1994] concluded that VLF signals launched at an invariant latitude below 15° (equivalent to *L* ∼ 1.1) remained trapped in the ionosphere.

#### 1.2. Source Region

[11] The model outlined above is broadly accepted and little subsequent work has been done to verify it in the light of improved equipment and analysis techniques. However, despite the relatively long history of investigations into the whistler phenomenon, some of the details of the chain between the initial lightning discharge and the reception of its dispersed electromagnetic signature are still unclear. The efficiency of transionospheric leakage and transformation to the whistler mode, known to depend on ionospheric conditions and magnetic field inclination, has been examined and it has been established that the majority of lightning strokes generate an upgoing, incipient whistler detectable on Low Earth Orbit (LEO) satellites [e.g., *Hughes*, 1981; *Li et al.*, 1991; *Hughes and Rice*, 1997; *Holzworth et al.*, 1999]. These whistlers are predominantly unducted and the proportion which are ducted to the conjugate hemisphere is currently unknown. Experimental verification of the typical characteristics of the lightning source region is also scarce and inconclusive.

[12] The size and location of the effective source region are a matter of contention. It is well known that sferics can travel enormous distances in the Earth-ionosphere waveguide with minimal attenuation. Whistlers have been associated with lightning strokes occurring more than 2000 km from the duct footprint [*Weidman et al.*, 1983; *Carpenter and Orville*, 1989; *Li et al.*, 1991; *Clilverd et al.*, 1992; *Holzworth et al.*, 1999]. The lightning source region for whistlers detected at a given location may thus be rather large. Furthermore, it is assumed that this source region is centered on the magnetic conjugate point. Yet this assumption has not been validated. *Yoshino* [1976] observed that the majority of whistlers observed in Sugadaira, Japan, occurred when there was thick cloud cover within ∼500 km southwest of the conjugate point. In contrast, *Chum et al.* [2006], in a more recent analysis based on isolated pairs of lightning strokes and fractional hop whistlers detected on LEO satellites, found that the point at which the sferic pulse penetrates the ionosphere is ≲1500 km from the discharge. Their data included both ducted and non-ducted whistlers, although it is probable that the majority of the whistlers were unducted [*Hughes and Rice*, 1997]. *Collier et al.* [2006] employed a source region with radius 600 km (selected for rather pragmatic reasons: their lightning data did not extend more than ∼600 km south of the conjugate point) in a case study which suggested that whistlers at Tihany, Hungary, are more likely to arise from strokes to the southeast of the conjugate point. Both *Yoshino* [1976] and *Collier et al.* [2006] found that the most effective source region was displaced from the conjugate point toward the magnetic pole.

[13] Since propagation in the waveguide may occur both before and after the signal's passage through the magnetosphere, it is in principle possible for both the causative discharge and the VLF receiver to be significantly distant from the foot points of the guiding magnetic field line. Indeed, the path through the magnetosphere need not have a foot point at either the source or the receiver, but may be displaced with respect to both [*Clilverd et al.*, 1992]. *Allcock and McNeill* [1966] found that transmission loss was minimized for paths aligned along either the magnetic meridian of the source or the receiver. This stands to reason since these configurations result in the minimum loss of power due to the cylindrically symmetric expansion of the wavefront in the waveguide. A path at the magnetic longitude of the source, however, appears to be the most favorable [*Shimakura et al.*, 1987; *Ladwig and Hughes*, 1989]. Furthermore, on the basis of the overlap of transmission and trapping cones, strokes at *L* higher than the duct foot point are most likely to produce whistlers [*Helliwell*, 1965, Figure 3–23].

[14] Since the power of a signal radiating symmetrically in the Earth-ionosphere waveguide is inversely proportional to the distance from the source, it is not unreasonable to suppose that the likelihood of duct excitation by a given lightning discharge is highest in close proximity to the stroke. Consequently one would suppose that a region of limited extent around the conjugate point represents the area in which a whistler's causative lightning stroke is most likely to have been located. Certainly this is a reasonable assumption if there is an appreciable level of lightning activity in the vicinity of the conjugate point. For regions of regular thunderstorm activity, the number of lightning strokes within a reasonably large area around the foot point of a field line far exceeds the number of whistlers recorded near the conjugate point, indicating that lightning is a necessary but not sufficient condition for the generation of a ducted whistler. If, however, the conjugate point is located in a region of infrequent lightning then it is probable that a realistic source region should have considerably greater range. Proximity is thus not the only consideration and there must be other factors involved: the total power radiated and the discharge type (cloud-to-cloud (CC) or cloud-to-ground (CG)) may also play a role.

[15] In some instances it is possible to identify the causative sferic for two-hop whistlers [*Carpenter*, 1959; *Carpenter and Orville*, 1989]. The location of the causative sferic is traditionally consistent with propagation times calculated using the DE–1 diffusive equilibrium model [*Park*, 1972] for the distribution of electron density along the magnetic field lines and the simplified Appleton-Hartree dispersion relation. It has, however, been demonstrated, using a more realistic electron density distribution and an improved dispersion relation, that the calculated nose time may differ by up to 500 ms [*Lichtenberger et al.*, 2008b]. There is also uncertainty regarding the distance that the signal propagated in the waveguide before and after passing through the magnetosphere. Furthermore, the global flash rate of 44 ± 5 s^{−1} [*Christian et al.*, 2003] gives a typical interval of ∼20 ms between sferics, leading to significant ambiguity in the selection of the correct sferic among a number of potential candidates. In addition, because of sub-ionospheric propagation conditions, the actual causative sferic may or may not be visible on a spectrogram. It is thus possible that the causative sferic identified using traditional techniques may not be valid, but simply occurs with serendipitous timing. In either case, in the absence of further information it is almost impossible to uniquely determine the location of the discharge associated with a given whistler.

[16] This paper is an attempt to obtain a better understanding of the positions of causative lightning strokes. The application of whistlers as a remote-sensing tool [e.g., *Carpenter*, 1963] would be greatly enhanced by more accurate knowledge of the initial wave source and the parameters of the path taken through the magnetosphere.

### 3. Results

[32] The efficacy of a given region of the globe in producing whistlers is assessed by performing a correlation between whistler incidence at Tihany and lightning occurrence within that region. The objective of this analysis is to identify those portions of the globe for which this correlation is high.

[33] The lightning data were first projected onto a 3° by 3° spatial grid. It is possible that the limited location accuracy of WWLLN may influence the statistical analysis of the lightning-whistler correlations, but the use of a relatively coarse grid ensures that the effects of spatial uncertainty are minimized. The whistler and lightning data sets were then prepared for the correlation analysis by dividing the time period between 1 January 2003 and 19 May 2005 into intervals of length Δ*t* = 1 min. The number of events during each interval was then determined. Finally, the event counts were reduced to Boolean values, simply indicating the presence or absence of activity, but not its absolute intensity.
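
The binning and Boolean reduction described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `boolean_sequence`, the representation of epochs as seconds from the start of the window, and the sample timestamps are all assumptions made for the example.

```python
def boolean_sequence(event_times, t_start, t_end, dt=60.0):
    """Return one 0/1 flag per interval of length dt seconds.

    event_times: event epochs in seconds (hypothetical input format).
    A flag is 1 if at least one event falls inside the interval.
    """
    n_bins = int((t_end - t_start) // dt)
    flags = [0] * n_bins
    for t in event_times:
        i = int((t - t_start) // dt)
        if 0 <= i < n_bins:
            flags[i] = 1  # presence only; the event count is discarded
    return flags


# Made-up epochs (seconds) within a 5 min window.
whistlers = [12.0, 14.5, 130.2]
lightning = [11.1, 75.0, 129.4, 129.9, 290.0]

w = boolean_sequence(whistlers, 0.0, 300.0)
s = boolean_sequence(lightning, 0.0, 300.0)
print(w)  # [1, 0, 1, 0, 0]
print(s)  # [1, 1, 1, 0, 1]
```

In the study one such lightning sequence would exist for every grid cell, each compared against the single whistler sequence from Tihany.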

[34] The latitudinal variation in the area of the cells in the lightning grid was not taken into account: cells at higher latitudes, which represent smaller areas, are treated in the same way as equatorial cells. For example, the cells at the conjugate latitude are 16% smaller than those at the equator. This implies that, all else being equal, there would be an increase in lightning counts in cells closer to the equator and a reduced count for higher latitude cells. However, the fact that this analysis is based on the presence or absence of lightning, rather than the absolute lightning count, implies that this does not have a significant influence on the results and the small bias is largely negated by the Boolean treatment of the data.
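
The dependence of cell area on latitude can be checked with a short calculation. The sketch below assumes a spherical Earth, for which the area of a cell of fixed longitudinal width is proportional to the difference of the sines of its bounding latitudes. The band limits of 33° and 36° are an assumption based on a 3° cell centered at 34.5°; the exact percentage depends on which latitude is taken as representative, so the result need not reproduce the figure quoted above exactly.

```python
import math


def cell_area_ratio(lat_south, lat_north, ref_south=-1.5, ref_north=1.5):
    """Area of a grid cell relative to a reference cell of equal
    longitudinal width, on a sphere. Latitudes are in degrees."""
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    ref = math.sin(math.radians(ref_north)) - math.sin(math.radians(ref_south))
    return band / ref


# Hypothetical band limits for the cell containing the conjugate point;
# by symmetry the same value applies in either hemisphere.
ratio = cell_area_ratio(33.0, 36.0)
print(f"area relative to an equatorial cell: {ratio:.3f}")
```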

[35] The selection of an appropriate value for Δ*t* was based on two considerations: (1) the delay between the causative lightning stroke and the reception of a whistler at Tihany is typically ∼1 s and (2) computational efficiency (reducing Δ*t* results in more time-consuming calculations). The choice of Δ*t* should be sufficiently long to minimize the risk of a lightning stroke and the associated whistler being allocated to different time intervals. Yet, Δ*t* should be sufficiently short to reduce the risk of chance coincidence. The hazard of a temporal mismatch might be reduced by applying an offset, equal to the typical whistler propagation time, to either the whistler or lightning time sequence. However, the fact that Δ*t* is substantially longer than the typical whistler propagation time suggests that the probability of a lightning stroke and the resulting whistler being assigned to separate intervals is very small indeed, so that there would be negligible gain from such a transformation.
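
The size of the mismatch risk can be estimated with a one-line calculation. The sketch below assumes that a stroke is equally likely to occur at any instant within its interval, which is an idealization introduced here; under that assumption the chance that the whistler lands in the following interval is simply the ratio of the delay to Δ*t*.

```python
def straddle_probability(delay_s, dt_s):
    """Probability that two events separated by delay_s seconds fall in
    different bins of length dt_s, assuming the first event is uniformly
    distributed within its bin."""
    return min(delay_s / dt_s, 1.0)


# Delay of ~1 s and bin length of 1 min, as quoted in the text.
p = straddle_probability(1.0, 60.0)
print(f"{p:.3f}")  # 0.017
```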

[36] To illustrate the typical form of the data, a contingency table for the cell centered at 34.5°S 28.5°E, which contains the conjugate point, is presented in Table 1. It is apparent that the marginal totals are appreciably asymmetric. Applying the Fisher exact test for count data (appropriate for an unbalanced contingency table) to the data in Table 1 yields a vanishingly small *p* value, indicating that the null hypothesis (odds ratio equal to unity) can be readily rejected: there is a significant relationship between lightning and whistler activity, although in the overwhelming majority of intervals there was an absence of either phenomenon.

Table 1. Contingency Table Reflecting the Proportion of Time Intervals for Which Whistlers Were Observed at Tihany and Lightning was Detected in the Cell Centered at 34.5°S 28.5°E. Counts are Normalized Relative to the Total Number of Time Intervals, 1,251,360.

| | No Lightning | Lightning | Total |
|---|---|---|---|
| No whistlers | 0.938 | 0.002 | 0.940 |
| Whistlers | 0.060 | 0.001 | 0.060 |
| Total | 0.997 | 0.003 | 1 |
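
As a rough check on Table 1, the odds ratio and the phi coefficient (the Pearson correlation of two binary variables) can be computed directly from the four cell proportions. The sketch below is not a substitute for the Fisher exact test, which requires the raw counts, and because the published proportions are rounded to three decimals the results are only indicative. The variable names are chosen for this example.

```python
import math

# Cell proportions from Table 1 (rounded to three decimals in the source).
p_none = 0.938        # no whistlers, no lightning
p_light_only = 0.002  # lightning without whistlers
p_whist_only = 0.060  # whistlers without lightning
p_both = 0.001        # whistlers and lightning

# Odds ratio; a value of 1 corresponds to the null hypothesis.
odds_ratio = (p_none * p_both) / (p_light_only * p_whist_only)

# Marginals recomputed from the cells so that they are self-consistent.
row_no_w = p_none + p_light_only
row_w = p_whist_only + p_both
col_no_l = p_none + p_whist_only
col_l = p_light_only + p_both

# Phi coefficient of the 2x2 table.
phi = (p_none * p_both - p_light_only * p_whist_only) / math.sqrt(
    row_no_w * row_w * col_no_l * col_l
)
print(f"odds ratio ~ {odds_ratio:.1f}, phi ~ {phi:.3f}")
```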

[38] Figure 5 illustrates the above scheme, comparing the number of observed whistlers to lightning strokes within 600 km of the conjugate point for a 2-h period on 26 February 2004. The histograms reflect counts accumulated in 1 min bins, indicating both total counts (empty bars) as well as a Boolean count (shaded bars). Below the histograms are rug plots indicating the precise epoch of each of the whistler or lightning events. Of the 120 intervals, 48% had whistlers and 78% had lightning. Intervals for which condition (1) was satisfied are indicated by blue diamonds. The condition is true for 56% of the intervals.

[39] It is interesting to note that although the frequency of lightning strokes within the chosen region is roughly constant (a few strokes per 1 min interval) for the full period illustrated in Figure 5, few whistlers are observed during the first hour but many during the second. This effect is due to the passage of the day-night terminator: the latter portion of the data corresponds to night conditions.

[40] Although conceptually revealing, an analysis on the basis of (1) was problematic because no means was readily available to evaluate uncertainties. The relationship between whistler and lightning activity was thus assessed using the Pearson product-moment correlation coefficient, *ρ*, being the ratio of the covariance of the two sequences to the product of their respective standard deviations. The value of *ρ* is an indication of the extent to which variations in two variables occur in concert. The correlation coefficient is defined only if each of the standard deviations is both finite and non-zero. The sample correlation coefficient, *r*, is a consistent estimator of the population correlation coefficient, *ρ*. The value of *r* lies in the range [−1, +1], where the limits correspond to perfect correlation (+1) or anti-correlation (−1). If the variables are independent then the correlation is zero. A vanishingly small value for *r* does not, however, necessarily imply that two quantities are unrelated. Two standardized variables are uncorrelated if the expected value of their product equals the product of their expected values. They are independent if their joint probability distribution function is the product of their individual probability distribution functions.
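
The definition above translates directly into code. The following is a minimal sketch of the sample correlation coefficient applied to two Boolean sequences; the function name `pearson_r` and the sample sequences are assumptions made for illustration, not data from the study.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation: covariance divided by the product of
    the standard deviations (common normalization factors cancel)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("sequences must be non-empty and of equal length")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # A constant sequence has zero standard deviation.
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


# Made-up Boolean sequences: 1 marks an interval containing activity.
w = [1, 0, 1, 0, 0, 1, 0, 0]  # whistlers
s = [1, 1, 1, 0, 0, 1, 0, 1]  # lightning
r = pearson_r(w, s)
print(f"r = {r:.2f}")  # r = 0.60
```

A grid cell with no lightning at all during the study period yields a constant sequence, for which the coefficient is undefined; the sketch raises an error in that case.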

[41] In general, the correlation between two sequences may be calculated at various lags, where the sequences are shifted relative to each other. However, since this analysis is concerned with the corresponding time intervals in the two sequences, only zero lag is considered. If, however, the interval Δ*t* were made so short that whistlers and the causative lightning strokes frequently occurred in different intervals, then non-zero lags would have to be accounted for.

[42] The correlation between the Boolean sequences illustrated in Figure 5 is 0.18. This statistic, however, applies to only a limited time interval and an extended region around the conjugate point. Figure 6 displays the correlation between the whistler sequence at Tihany and the lightning sequence in each of the grid cells spanning the globe. Data are plotted only for cells in which the correlation is significant at the *α* = 0.01 level. It is apparent that there are regions of non-vanishing correlation, both positive and negative, but that the majority of the cells have values close to zero.
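
To give a sense of scale for the significance threshold, the sketch below uses the standard large-sample test of the null hypothesis *ρ* = 0. It is not necessarily the test applied in the study, and it assumes that the time intervals are statistically independent, which is an idealization. The interval count is taken from Table 1 and the value of *r* is the maximum quoted in the abstract.

```python
import math
from statistics import NormalDist

n = 1_251_360  # number of 1 min intervals (Table 1)
r = 0.065      # maximal sample correlation coefficient
alpha = 0.01   # significance level

# Two-sided critical value under the normal approximation.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Smallest |r| that would be judged significant for this sample size.
r_crit = z_crit / math.sqrt(n)

# Test statistic for the observed r under the null hypothesis rho = 0.
t_stat = r * math.sqrt((n - 2) / (1 - r * r))

print(f"critical |r| ~ {r_crit:.4f}")
print(f"test statistic for r = {r}: {t_stat:.1f} (critical value {z_crit:.2f})")
```

With more than a million intervals, even correlations of a few thousandths exceed the threshold, which is why numerically small coefficients can still be highly significant.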

### 4. Discussion

[43] A preliminary examination of the data in Figure 6 reveals that the correlation is: (1) generally positive (negative) in the southern (northern) hemisphere; (2) enhanced in a compact region around the conjugate point; (3) relatively high over regions in South America, southern Africa and the Maritime Continent.

[44] There is a distinct transition in the sign of the correlation coefficient across the geographic equator: whereas in the southern hemisphere correlation is predominantly positive, it is principally negative north of the equator. There is no corresponding reversal across the geomagnetic equator. This can be explained by the seasonal difference in lightning activity in the two hemispheres: whistlers in Tihany are most prevalent during the southern hemisphere summer [*Collier et al.*, 2006], which is a time of profuse lightning activity south of the equator, but little lightning in the northern hemisphere. The situation is reversed during the southern hemisphere winter. The occurrence of Tihany whistlers is thus in phase with seasonal lightning activity in the southern hemisphere, but in antiphase with the northern hemisphere lightning season.

[45] The spatially coherent region of positive correlation is centered on the conjugate point and extends to a radius of approximately 1000 km. This indicates that the majority of whistlers are well correlated with lightning activity within a few hundred km of the conjugate point. This region should encompass most of the causative lightning discharges and is thus interpreted as the principal source region for whistlers at Tihany. This finding concurs with the mechanism outlined by *Storey* [1953], but places a reasonable upper bound on the extent of the principal source region.

[46] The extended regions of relatively large positive correlation in tropical South America, southern Africa and over the Maritime Continent might initially appear to be artifacts of the analysis procedure, since it seems somewhat unlikely that whistlers are triggered by lightning strokes occurring at such great distances from the foot point of the magnetic field line. However, sferics may travel enormous distances in the waveguide before they become too attenuated to generate detectable whistlers [*Li et al.*, 1991; *Holzworth et al.*, 1999; *Meredith et al.*, 2006; *Chum et al.*, 2006]. The strokes identified by WWLLN, and hence those considered in this analysis, are those with an above average intensity, so it is not implausible that they radiate sufficient energy to trigger a whistler at distances ∼12000 km. This suggests that sferics generated by lightning over South America and the Maritime Continent are able to survive propagation across the Atlantic and Indian Oceans. The waveguide propagation time for this distance is ∼30 ms, much shorter than the typical magnetospheric propagation time for whistlers received at Tihany, which is ∼500 ms. Furthermore, there is persuasive evidence that, for locations which have a dearth of lightning activity in the vicinity of their conjugate point, the majority of whistlers originate from distant lightning strokes (A. B. Collier et al., Spatial distribution of causative lightning discharges for whistlers observed at Dunedin, New Zealand, manuscript in preparation, 2009). The extreme negative correlations achieved over the Maritime Continent north of the equator may be attributed to the non-uniform spatial distribution of the WWLLN detection efficiency, which is enhanced in this vicinity [*Rodger et al.*, 2006, Figure 12].

[47] A comparison of Figure 6 with global lightning activity [*Christian et al.*, 2003; *Rodger et al.*, 2006] reveals that regions of high lightning activity do not necessarily engender high correlation. The area of maximal lightning activity in equatorial Africa is uncorrelated with the whistlers. Maximal correlation occurs off the southeast coast of South Africa, an area of only moderate lightning activity. The active areas in tropical South America and the Maritime Continent are associated with a lesser degree of positive correlation. The area of positive correlation over South America is more extensive than that over the Maritime Continent, possibly due to the smaller land-to-ocean ratio in the latter area, since greater lightning activity generally occurs over land.

[48] Although the spatial distribution of the correlation coefficients in Figure 6 is suggestive of a pattern which is in accord with expectation, the range is rather low, from a minimum of −0.037 to a maximum of 0.065. Under conventional conditions a correlation of 0.065 would be deemed inconclusive. Two topics should therefore be addressed: why are the correlations so low and might one still ascribe any significance to them?

[49] The limited range of sample correlation coefficients may be attributed to a variety of sources: the appreciable mismatch in occurrence frequency between lightning and whistlers; the division of the lightning data among a number of cells; the imperfect efficiency of both the whistler and lightning detection systems; failure to isolate periods during which either system was not fully functional; or the inability of WWLLN to distinguish between CG and CC discharges.

[50] Of the reasons mooted above, the first two are the most compelling. In general, the number of lightning strokes exceeds the number of whistlers during a given interval [*Chum et al.*, 2006]. This analysis proceeds on the premise that every whistler is generated by a lightning stroke but not every lightning stroke results in a whistler. The relatively low correlation even around the conjugate point arises principally due to the fact that a given lightning stroke rarely results in a whistler detectable at the conjugate point. The mean flash rate around the conjugate point is roughly 10 km^{−2} year^{−1} [*Christian et al.*, 2003; *Collier et al.*, 2006], so that a region of radius 1000 km centered on the conjugate point has on average ∼3600 flashes per hour. Values for the average flash multiplicity (number of strokes per flash) vary appreciably according to the observation technique. If one assumes a conservative mean multiplicity of 2.5 [*Rakov and Huffines*, 2003], then this translates to ∼9000 strokes per hour. The average whistler detection rate at Tihany is ∼24 per hour. Neglecting imperfections in the detection systems, it is possible to conclude that only approximately one in every 375 lightning strokes produces a whistler detectable on the ground. It is difficult to achieve a substantial correlation between two phenomena whose frequencies of occurrence differ by at least two orders of magnitude!
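
The arithmetic in this paragraph can be reproduced in a few lines. The input values are those quoted in the text; the variable names and the use of a 365.25 day year are assumptions of this sketch.

```python
import math

flash_density = 10.0       # flashes km^-2 year^-1 around the conjugate point
radius_km = 1000.0         # radius of the source region
multiplicity = 2.5         # strokes per flash
whistlers_per_hour = 24.0  # average detection rate at Tihany
hours_per_year = 365.25 * 24

area_km2 = math.pi * radius_km ** 2
flashes_per_hour = flash_density * area_km2 / hours_per_year
strokes_per_hour = flashes_per_hour * multiplicity
strokes_per_whistler = strokes_per_hour / whistlers_per_hour

print(f"flashes per hour:     {flashes_per_hour:.0f}")
print(f"strokes per hour:     {strokes_per_hour:.0f}")
print(f"strokes per whistler: {strokes_per_whistler:.0f}")
```

The results agree with the rounded figures in the text: roughly 3600 flashes and 9000 strokes per hour, or about one whistler for every 375 strokes.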

[51] The distribution of the causative lightning strokes over a number of spatial cells also has a deleterious effect on the correlation. This is illustrated schematically in Figure 7. The horizontal grids represent a sequence of time intervals. Shaded cells indicate intervals which contain activity, while empty cells are inactive. Above the dashed line is a hypothetical sequence of whistlers, while below the line are two scenarios for gridding the lightning data. In the first scenario all of the causative strokes are located within a single spatial cell and the correlation with the whistler sequence is perfect. However, in the second scenario the original cell is divided into four smaller cells. Now the lightning strokes are divided into four separate sequences according to their location within each of the smaller cells. The correlations between each of the four new lightning sequences and the original whistler sequence are now significantly less than unity. The analogous effect applies to the data presented in Figure 6, where the causative strokes are distributed over numerous spatial cells.
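
The dilution effect can be demonstrated numerically. In the sketch below, a hypothetical whistler sequence is first compared with a lightning sequence from a single cell containing every causative stroke, and then with four sub-cell sequences among which the same strokes are shared out in rotation. The sequences and the allocation rule are invented for illustration and do not reproduce Figure 7.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical whistler activity over 12 intervals.
whistlers = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1]

# Scenario 1: all causative strokes fall in one cell.
single_cell = list(whistlers)

# Scenario 2: the same strokes are shared among four smaller cells.
sub_cells = [[0] * len(whistlers) for _ in range(4)]
k = 0
for i, active in enumerate(whistlers):
    if active:
        sub_cells[k % 4][i] = 1
        k += 1

r_single = pearson_r(whistlers, single_cell)
r_sub = [pearson_r(whistlers, cell) for cell in sub_cells]
print(f"single cell: {r_single:.2f}")
print("sub-cells:  ", [round(r, 2) for r in r_sub])
```

Every sub-cell sequence is still positively correlated with the whistlers, but each coefficient is well below the value of unity obtained for the undivided cell.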

[52] It would indeed be of interest to conduct a similar analysis using a data set which differentiates between CG and CC discharges, since this might resolve the controversy regarding which of CG or CC strokes is more likely to yield a whistler. Although whistlers are associated with the most energetic lightning strokes [*Helliwell*, 1965, p. 121], which are most likely to be CG strokes, there is conflicting evidence [*Ferencz et al.*, 2007] which suggests that the primary source of whistlers is CC strokes, although this may be partly due to the higher incidence of CC lightning.

[53] There is a subtle northwest-southeast orientation of the region of enhanced correlation, suggesting that strokes which are displaced parallel to the horizontal projection of the magnetic field (the local declination is −27°) relative to the conjugate point are preferred. This result concurs with that of *Yoshino* [1976], who found that strokes located between the conjugate point and the magnetic pole were most likely to generate whistlers. Furthermore, although *Strangeways* [1981] found that strokes at lower *L* were more likely to excite a duct, *Helliwell* [1965] indicates that matching between the neutral atmosphere and the ionosphere is best when waves approach the entry point from higher magnetic latitudes. Our results are not able to confirm or refute either of these hypotheses.

#### 4.1. Equatorial Correlation

[54] The regions of high lightning activity in the tropics have flash rates in excess of 10 km^{−2} year^{−1} [*Christian et al.*, 2003]. A 3° by 3° cell therefore experiences around 10^{6} flashes per year or, on average, one flash every 30 s. Therefore every time interval which contains a whistler at Tihany is very likely to also have a lightning discharge associated with it in these high activity cells. So one might be tempted to attribute the elevated correlations over South America and the Maritime Continent to coincidence due to the profusion of lightning in these areas. However, two factors oppose this line of reasoning. Firstly, in keeping with the logic of (1), the intervals which do not have whistlers, yet do have lightning activity, should make a negative contribution toward the correlation. Secondly, the diurnal variation in the correlation over these regions mirrors that around the conjugate point. Maximal correlations in these remote areas are not contemporaneous with peak local lightning activity.
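
The flash count quoted for a tropical grid cell follows from a short calculation. The sketch below assumes a near-equatorial cell, so that the convergence of the meridians can be ignored, and an approximate length of 111.2 km per degree; both are assumptions of this example rather than values taken from the text.

```python
km_per_degree = 111.2   # approximate length of one degree of arc on Earth
cell_side_deg = 3.0     # grid resolution
flash_density = 10.0    # flashes km^-2 year^-1 in high activity regions
seconds_per_year = 365.25 * 86400

# Near-equatorial cell treated as a square.
cell_area_km2 = (cell_side_deg * km_per_degree) ** 2
flashes_per_year = flash_density * cell_area_km2
mean_interval_s = seconds_per_year / flashes_per_year

print(f"cell area:        {cell_area_km2:.3g} km^2")
print(f"flashes per year: {flashes_per_year:.3g}")
print(f"mean interval:    {mean_interval_s:.0f} s")
```

The result is of order 10^6 flashes per year and one flash roughly every 30 s, so a 1 min interval in such a cell will almost always contain lightning.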

[55] The region of maximal correlation near the conjugate point is one in which there is only relatively low lightning activity. It is thus evident that the positive correlation in this area is not simply due to an overwhelming frequency of lightning fortuitously timed with respect to the whistlers. This fact may be further appreciated with reference to Figure 8, which represents the relationship between the correlation coefficient and the level of lightning activity. It is immediately evident that regions with high lightning density do not necessarily produce above average correlation. The grid cells with the highest correlation actually have very low stroke density. The 0.95 quantile of stroke density occurs at 0.243 km^{−2} year^{−1}. The mean correlation coefficient for cells with stroke densities greater than the 0.95 quantile is 0.0032, which indicates that the regions of greatest lightning activity are only marginally correlated with whistlers at Tihany.

#### 4.2. Diurnal Variation

[56] The fact that whistlers are observed at Tihany with much greater probability during the hours of darkness [*Collier et al.*, 2006] degrades the quality of a correlation based on data from all local times, since during daylight any lightning stroke, no matter how favorably situated, is unlikely to generate a whistler. The data were thus decomposed into 8 units corresponding to consecutive 3 h intervals. The correlation analysis was then repeated, yielding the results displayed in Figure 9.
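The decomposition into 3 h units can be sketched as follows for a single grid cell. This is a minimal illustration on synthetic data, not the processing actually used: the 1 min interval length, the probabilities and the function names are all assumptions made for the example.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation; returns nan if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def correlation_by_utc_bin(hours, whistler, lightning, bin_hours=3):
    """Correlate two Boolean sequences separately within each UTC bin."""
    n_bins = 24 // bin_hours
    groups = [([], []) for _ in range(n_bins)]
    for h, w, s in zip(hours, whistler, lightning):
        k = int(h // bin_hours) % n_bins
        groups[k][0].append(w)
        groups[k][1].append(s)
    return [pearson(w, s) if w else float("nan") for w, s in groups]

# Synthetic example: strokes only produce whistlers outside 06:00-15:00 UTC.
random.seed(0)
hours, whistler, lightning = [], [], []
for minute in range(20 * 1440):               # 20 days of 1 min intervals (assumed)
    hour = (minute / 60.0) % 24.0
    stroke = random.random() < 0.3
    night = hour < 6.0 or hour >= 15.0
    caused = night and stroke and random.random() < 0.5
    background = random.random() < 0.02       # whistlers unrelated to this cell
    hours.append(hour)
    lightning.append(int(stroke))
    whistler.append(int(caused or background))

r_by_bin = correlation_by_utc_bin(hours, whistler, lightning)
for k, r in enumerate(r_by_bin):
    print(f"{3 * k:02d}:00-{3 * k + 3:02d}:00 UTC  r = {r:+.3f}")
```

In this toy setting the three daytime bins show correlations near zero while the remaining bins do not, which is the qualitative behavior the decomposition is designed to expose.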

[57] During the period 06:00 to 15:00 UTC, when whistlers are scarce at Tihany, there is little significant correlation anywhere. However, between the mid-afternoon and predawn hours the pattern of relatively high correlation emerges. This regimen applies not only around the conjugate point but also over South America and the Maritime Continent. The fact that the variation of the correlation over these remote regions follows the same diurnal pattern as that near the conjugate point suggests two things: the effectiveness of lightning strokes over these regions is heightened when the ionosphere over Tihany and its conjugate point is in darkness, and the impulses from the distant strokes travel a significant distance in the waveguide before entering the magnetosphere. Furthermore, it is important to note that the peaks in lightning activity over the Maritime Continent and South America occur at roughly 08:00 UTC and 20:00 UTC respectively [*Price*, 1993]. For the Maritime Continent, the period of enhanced correlation does not correspond to the peak in lightning activity, indicating again that it is ionospheric transparency over the source and receiver that is operative.

[58] Various observations of whistlers indicate that paths at magnetic longitudes nearest to the source or receiver are most favorable [*Allcock and McNeill*, 1966; *Shimakura et al.*, 1987; *Ladwig and Hughes*, 1989; *Clilverd et al.*, 1992]. If one were to apply this, for example, to causative lightning strokes over South America then two options exist: either the whistlers are ducted to the northern hemisphere from around the lightning discharge location and then propagate in the waveguide to Tihany, or the sferic travels in the waveguide and enters a duct in the vicinity of Tihany's conjugate point. One might fairly readily discard the first option in light of the low-latitude cutoff for whistlers. This then leaves the latter option, which is also compatible with the fact that the illumination of the ionosphere over Tihany and its conjugate point appears to control the correlation over South America.

#### 4.3. Correlation and Causation

[59] The foregoing analysis indicates that the region of highest correlation between lightning activity and Tihany whistlers is located within a region centered on the conjugate point. One is thus inclined to infer a deterministic relationship between the two phenomena. However, even a strong correlation between two quantities does not automatically imply that there is a causal relationship between them. Certainly it is the case that statistically independent quantities are always uncorrelated, but the converse is not necessarily true. However, a positive correlation does indeed suggest a cause-and-effect relationship and may be taken as non-conclusive evidence of such.

[60] Furthermore, although correlation does not imply causation, a correlation that supports an existing hypothesis is more compelling than one obtained on an investigative basis. By analogy, it is more impressive for a golfer to hit a hole-in-one immediately after predicting the feat than simply doing so by chance. In this study the positive correlation surrounding the conjugate point supports the existing theory advanced by *Storey* [1953].

#### 4.4. Statistical Significance

[61] In assessing the plausibility of these results the statistical significance of the calculated correlation coefficients becomes relevant. Figure 6 reflects only those cells which have *p* < 0.01, where the *p* value represents the probability of achieving a result at least as extreme as that obtained assuming that the null hypothesis (no correlation) is, in fact, true.

[62] One in one hundred random data sets might be expected to yield correlations exceeding a given value at the *α* = 0.01 significance level on the basis of chance alone. The likelihood of this occurring might be enhanced by the selection of an area of the globe for which there exists a qualitative similarity between lightning incidence and whistler occurrence. However, the fact that elevated correlations are observed in a spatially coherent region surrounding the conjugate point suggests that this is not a chance occurrence and supports the existing hypothesis regarding the formation of whistlers.

[63] Various caveats apply to the interpretation of the correlation coefficient associated with time sequences. The statistical significance of *r* depends on the sample size or, more specifically, the number of independent observations. The standard technique for calculating the correlation *p* value assumes that the populations from which the samples are drawn are normally distributed and that the samples within each of the sequences are independent. Regarding the first requirement, the two sequences used for calculating the correlation coefficient consist of Boolean samples, so the individual samples are not drawn from a normal distribution. It can nevertheless be shown that the resulting statistic *r* has a normal distribution. This issue will be addressed in a forthcoming publication. The second requirement, that each of the series used to calculate the correlation coefficient represents a sequence of statistically independent samples, is rarely satisfied for time series, which generally possess an appreciable degree of autocorrelation. It is quite clear that the lightning data are not independent in this sense, since the presence of lightning in a given time interval certainly increases the likelihood of lightning in the subsequent intervals. The uncertainties in *r* may be underestimated as a result. Autocorrelation does not, however, have an effect on the calculated correlation coefficient itself.

[64] The situation may be remedied by employing an effective sample size, *N*′, to estimate the number of degrees of freedom, thereby making allowance for autocorrelation within the data. The effective sample size is a function of the nominal sample size and the first-order autocorrelation of both of the variables [*Dawdy and Matalas*, 1964]. Application of an effective sample size should only be considered if the autocorrelation estimates are statistically significant. The distribution of autocorrelations for sequences of *N* random samples is approximately normal with zero mean and variance 1/*N* [*Panofsky and Brier*, 1958; *Chatfield*, 1975]. Thus, with a probability of roughly 0.95, the autocorrelation of a random sequence lies within ±2/√*N* of zero. A value lying outside this range is significant at the *α* = 0.05 level. The autocorrelation of a random sequence of length 1251360 is thus likely to be very small indeed. Although it is possible to obtain spurious autocorrelation values for short random sequences, for sequences of reasonable length, this contingency does not arise.
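The autocorrelation check and the effective sample size can be sketched in a few lines. The expression used below, the nominal size multiplied by (1 − r₁ₓr₁ᵧ)/(1 + r₁ₓr₁ᵧ), is a commonly quoted form attributed to *Dawdy and Matalas* [1964]; that it is exactly the expression applied in this study is an assumption, as are the function names and the synthetic sequences.

```python
import math
import random

def lag1_autocorrelation(x):
    """First-order (lag 1) sample autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def autocorrelation_bound(n):
    """Approximate 95% bound on the autocorrelation of n random samples."""
    return 2.0 / math.sqrt(n)

def effective_sample_size(n, r1_x, r1_y):
    """Effective sample size from the lag 1 autocorrelations (assumed form)."""
    return n * (1.0 - r1_x * r1_y) / (1.0 + r1_x * r1_y)

def persistent_boolean(n, p_stay, rng):
    """Two-state sequence that keeps its previous value with probability p_stay."""
    state, out = 0, []
    for _ in range(n):
        if rng.random() > p_stay:
            state = 1 - state
        out.append(state)
    return out

rng = random.Random(0)
n = 20000
random_seq = [int(rng.random() < 0.5) for _ in range(n)]
clustered_seq = persistent_boolean(n, 0.9, rng)   # mimics clustered lightning activity

r1_random = lag1_autocorrelation(random_seq)
r1_clustered = lag1_autocorrelation(clustered_seq)
bound = autocorrelation_bound(n)
n_eff = effective_sample_size(n, r1_clustered, r1_clustered)

print(f"bound = {bound:.4f}, random r1 = {r1_random:+.4f}, clustered r1 = {r1_clustered:+.4f}")
print(f"effective sample size for two clustered series: {n_eff:.0f} of {n}")
print(f"bound for N = 1251360: {autocorrelation_bound(1251360):.5f}")
```

For the nominal sample size of 1251360 the bound is roughly 0.0018, which illustrates why even modest autocorrelation in the lightning sequences is statistically significant.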

[65] The nominal sample size, based on the number of time intervals spanning the period of interest, was *N* = 1251360. The resulting values of the effective sample size varied from 502691 to 1251948, with the lowest values corresponding to those areas with the highest lightning activity. A reduction in the effective sample size results in a larger sample standard deviation, which, in turn, broadens the confidence intervals. The average effective sample size was 〈*N*′〉 = 1116992 ± 1972, which does not differ significantly from the nominal sample size. The width of the confidence intervals is adjusted by the factor

$$\sqrt{N / N'} \qquad (2)$$

Since the ratio *N*/*N*′ is at most ∼2, the factor in (2) is ∼1.4 or less, and on average is very close to unity. Therefore autocorrelation does not have a dramatic effect on the width of the confidence intervals.

[66] In principle the most robust technique for determining the significance of *r* would be to use a permutation test. This would obviate the need for computing an effective sample size. For the large number of cells (120 × 60) and the long sequences for each cell (1251360 elements) the computations would be both formidable and arduous. However, to illustrate that the use of a parametric test in this case is not inappropriate, we compared the confidence intervals obtained for the correlation coefficient between two synthetic sequences. The 99.0% confidence interval obtained under the assumption that the correlation coefficient has a normal distribution is [0.599, 0.601], centered on the estimate of 0.600. Several options exist for constructing the bootstrap confidence interval: a range of 2.576 standard deviations either side of the mean yields [0.598, 0.602], while the empirical 0.5% and 99.5% percentiles are [0.598, 0.602], both of which are in good agreement with the parametric estimate.
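A comparison of this kind can be reproduced in miniature. The sketch below is not the computation reported above: it assumes Gaussian synthetic sequences with a population correlation of 0.6, a far shorter sequence length, and the Fisher *z* transformation for the parametric interval, so its intervals are much wider than those quoted. All names are illustrative.

```python
import math
import random
import statistics

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_interval(r, n, z_crit=2.576):
    """Parametric 99% interval for r via the Fisher z transformation."""
    z = math.atanh(r)
    half = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)

def bootstrap_correlations(x, y, n_boot, rng):
    """Correlation coefficients of n_boot resamples of the (x, y) pairs."""
    n = len(x)
    out = []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)      # resample pairs with replacement
        out.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    return out

rng = random.Random(1)
n = 1000                                      # assumed length, far shorter than the real series
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.6 * a + 0.8 * rng.gauss(0.0, 1.0) for a in x]   # population correlation 0.6

r = pearson(x, y)
parametric = fisher_interval(r, n)

boot = bootstrap_correlations(x, y, 1000, rng)
mean_b, sd_b = statistics.fmean(boot), statistics.stdev(boot)
normal_boot = (mean_b - 2.576 * sd_b, mean_b + 2.576 * sd_b)
cuts = statistics.quantiles(boot, n=200)      # 199 cut points at 0.5% steps
percentile_boot = (cuts[0], cuts[-1])         # empirical 0.5% and 99.5% percentiles

print(f"r = {r:.3f}")
print(f"parametric           : [{parametric[0]:.3f}, {parametric[1]:.3f}]")
print(f"bootstrap, 2.576 sd  : [{normal_boot[0]:.3f}, {normal_boot[1]:.3f}]")
print(f"bootstrap, percentile: [{percentile_boot[0]:.3f}, {percentile_boot[1]:.3f}]")
```

The bootstrap resamples pairs of observations jointly, so it preserves the dependence between the two sequences while making no assumption about the distribution of *r*.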

[67] Once the effective sample size is known, determination of the *p* value associated with the correlation coefficient is straightforward [*Fisher*, 1941, p. 186]. Figure 10a plots the relationship between the sample correlation coefficient and the associated *p* value for all cells on the geographic grid. The dashed horizontal line indicates the 0.01 significance level. The null hypothesis may be rejected at the 0.01 level for all points lying below this line. It is thus apparent that correlations with absolute magnitude greater than 0.002 are statistically significant and contradict the null hypothesis. Locations which conform to this criterion for statistical significance occupy 49% of the area of the Earth.

[68] Confidence intervals may be readily computed provided that the populations are normally distributed and pairs of observations are selected at random. Figure 10b relates the 99.0% confidence intervals to the calculated values of *r*. The half-width of the confidence interval is 0.00230. It is clear that only a small range of correlation coefficients (delimited by the vertical dotted lines) have a confidence interval which includes zero, the value of *r* expected under the null hypothesis.
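The significance threshold and the width of the confidence interval quoted above can be recovered with a short calculation. The sketch assumes that the Fisher *z* transformation is used with the nominal sample size; whether this matches the exact procedure of *Fisher* [1941] applied here is an assumption, and the function names are illustrative.

```python
import math
from statistics import NormalDist

def correlation_p_value(r, n_eff):
    """Two-sided p value for the null hypothesis of zero correlation."""
    z = abs(math.atanh(r)) * math.sqrt(n_eff - 3)
    return math.erfc(z / math.sqrt(2.0))      # equals 2 * (1 - Phi(z))

def critical_correlation(n_eff, alpha=0.01):
    """Smallest |r| that is significant at level alpha (two-sided)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.tanh(z_crit / math.sqrt(n_eff - 3))

N = 1251360                                   # nominal sample size from the text
r_crit = critical_correlation(N)
p_max = correlation_p_value(0.065, N)         # maximal coefficient quoted in the abstract

print(f"critical |r| at alpha = 0.01: {r_crit:.5f}")
print(f"p value for r = 0.065       : {p_max:.3g}")
```

With the nominal sample size the critical value is approximately 0.0023, in line with both the 0.002 threshold and the ±0.00230 interval. Substituting a smaller effective sample size for `N` raises the threshold accordingly.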

### 5. Conclusions

[69] A correlation analysis may suggest the possibility of a deterministic relationship between two phenomena. Motivated by such an indication, a valid theory may be constructed on the basis of other factors beyond mere correlation. If, however, a correlation analysis corroborates some aspect of an existing theory, then this is very compelling evidence in favor of the veracity of the theory.

[70] Whistler data from Tihany have been correlated against the corresponding global lightning data from WWLLN. The objective of this analysis was to examine the co-occurrence of lightning and whistler activity, not to obtain a one-to-one relationship between individual whistlers and the causative lightning discharges. The investigation was thus not focused on the absolute number of whistlers or strokes in a given interval but on whether or not there was such activity. The whistler and lightning counts were therefore reduced to Boolean sequences, simply indicating the presence or absence of activity in a given interval. The use of intervals of reasonable duration and Boolean values partially ameliorated the imperfections of the lightning and whistler detection systems.
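The reduction to Boolean sequences and the resulting correlation can be illustrated on a toy example. The counts below are invented for the illustration and the function names are hypothetical. For Boolean data the Pearson coefficient coincides with the phi coefficient of the 2 × 2 contingency table, which makes explicit how intervals with lightning but no whistler reduce the correlation.

```python
import math

def to_boolean(counts):
    """Reduce per-interval counts to presence (1) or absence (0) of activity."""
    return [1 if c > 0 else 0 for c in counts]

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def phi_coefficient(w, s):
    """Phi coefficient from the 2 x 2 table of two Boolean sequences."""
    n11 = sum(1 for a, b in zip(w, s) if a and b)            # whistler and lightning
    n10 = sum(1 for a, b in zip(w, s) if a and not b)        # whistler only
    n01 = sum(1 for a, b in zip(w, s) if not a and b)        # lightning only
    n00 = sum(1 for a, b in zip(w, s) if not a and not b)    # neither
    den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / den

# Invented counts for ten consecutive intervals in one grid cell.
whistler_counts = [0, 3, 0, 0, 1, 0, 7, 0, 0, 2]
lightning_counts = [0, 5, 1, 0, 2, 0, 4, 0, 3, 0]

w = to_boolean(whistler_counts)
s = to_boolean(lightning_counts)
r = pearson(w, s)
phi = phi_coefficient(w, s)
print(f"Pearson r = {r:.4f}, phi = {phi:.4f}")
```

The mismatched intervals enter the numerator through the product `n10 * n01`, so a cell with abundant lightning but few coincident whistlers is penalized rather than rewarded.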

[71] The region of highest correlation is centered on the conjugate point with a radial extent of ∼1000 km. This suggests that the majority of causative strokes are located close to the conjugate point, and is in accord with the generally accepted theory of whistler generation. This can thus be regarded as the principal source region for whistlers recorded at Tihany. There are also, however, areas of positive correlation further afield, indicating that sferics may travel great distances in the waveguide before entering a duct. The fact that the diurnal variation in the correlation coefficient in these remote regions is congruent with that around the conjugate point indicates that the characteristics of the ionosphere in the vicinity of the receiver's meridian are operative in determining the generation of a whistler by a remote lightning stroke.

[72] The principal source region is also not circular, but exhibits an elongation along an axis in the northwest-southeast direction. The declination at the conjugate point is −27°, so that the axis is roughly aligned with the local magnetic meridian. Furthermore, the region of positive correlation appears to extend to a greater distance in the southeasterly direction, which is toward the magnetic pole, consistent with the fact that the majority of whistlers at Tihany propagated at *L* higher than that of the station. These facts support the model for coupling between the atmosphere and ducts presented by *Helliwell* [1965, Figure 3–23], which suggests that at low and medium latitudes, the coupling improves for waves originating from the poleward side of the foot point of the duct. This should be contrasted with the study by *Strangeways* [1981], who contended that the transmission of whistlers through the ionosphere is most effective when the lightning discharge is equatorward of the ionospheric footprint of the duct.

[73] Although the correlation between whistlers and lightning is strongly significant over 49% of the area of the Earth, the correlation coefficients are small, with the implication that only a minute fraction of the variance in either of the variables can be explained by linear regression on the other variable. However, this was not the objective of this study, which was simply intended to identify those regions of the globe with the highest correlation.
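As a rough illustration of how small the explained variance is, taking the maximal coefficient quoted in the abstract gives a coefficient of determination of

$$r^2 = (0.065)^2 \approx 0.0042,$$

so that even in the best-correlated cell a linear regression accounts for only about 0.4% of the variance.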

[74] A possible source of concern with the local significance tests on the correlation coefficients for points which are likely to exhibit spatial correlation is whether or not the null hypothesis is rejected for a significant fraction of the globe. To assess this issue, global significance might be determined using Monte Carlo resampling as described by *Livezey and Chen* [1983]. However, it is readily apparent from Figures 6 and 10a that the regions which violate the null hypothesis constitute only a fraction of the globe. Furthermore, the dominant region is centered on that portion of the globe which is magnetically conjugate to the receiver, which is the area from which whistlers were a priori assumed to originate.
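The simplest form of a global significance check treats the grid cells as independent tests, in which case the number of cells that pass a local test by chance follows a binomial distribution. The sketch below implements only this independent-cell version. It ignores both the spatial correlation noted above, which is what the Monte Carlo approach is intended to handle, and the variation of cell area with latitude. The function names and the 0.05 global level are assumptions.

```python
import math

def binomial_pmf(k, n, p):
    """Binomial probability mass, evaluated in log space for stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def chance_threshold(n_tests, alpha_local, alpha_field=0.05):
    """Smallest count k with P(X >= k) <= alpha_field, X ~ Binomial(n_tests, alpha_local)."""
    cdf = 0.0
    for k in range(n_tests + 1):
        cdf += binomial_pmf(k, n_tests, alpha_local)
        if 1.0 - cdf <= alpha_field:
            return k + 1
    return n_tests + 1

n_cells = 120 * 60                  # grid dimensions quoted in the text
alpha_local = 0.01                  # local significance level
expected_by_chance = n_cells * alpha_local
threshold = chance_threshold(n_cells, alpha_local)

print(f"cells expected to pass by chance: {expected_by_chance:.0f}")
print(f"count needed for global significance (independent cells): {threshold}")
```

Because neighboring cells are not independent, the count required in practice would be larger than this independent-cell value, which is the motivation for resampling.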

[75] The technique outlined here promises to provide a convincing indication of the most likely source of whistlers detected at a given location. The present analysis should be regarded as preliminary. Since the efficiency of WWLLN has improved significantly over the period for which whistler data are available, the quality of the correlations is not consistent over the duration. Because the quality and reliability of the WWLLN data are not always well known, it may be worthwhile to repeat this analysis using lightning data from a different source such as ZEUS (http://sifnos.engr.uconn.edu/system.htm).