A Clustering Poisson model for characterizing the interarrival times of sferics



[1] The noise waveform of atmospheric radio noise below 100 MHz is typically impulsive in nature. The impulses are caused by atmospheric events, mainly lightning strokes, that create electromagnetic emissions known as sferics. Sferic impulses in the noise waveform are seen to cluster in groups, indicating an underlying clustering process related to the physical characteristics of the lightning mechanism. The objective of this work is the statistical modeling of the clustering of noise impulses in atmospheric radio noise in the range 10 Hz to 60 kHz (denoted low-frequency noise). Based on hundreds of hours of impulse interarrival time measurements made by the Stanford Radio Noise Survey System on such noise, a new Clustering Poisson atmospheric noise model is developed to describe the clustering process. This new statistical model is based on several previously known statistical-physical models of atmospheric radio noise, but in addition to these models it takes into account the clustering of sferic impulses. It is shown that the clustering model accurately characterizes the impulse interarrival time distributions found in low-frequency radio noise data.

1. Introduction

[2] The noise waveform of atmospheric radio noise below 100 MHz, as seen at a receiving antenna, is typically impulsive in nature. The impulses are caused by atmospheric events, mainly lightning strokes, that create electromagnetic emissions known as sferics. It is also seen in the noise waveform that these impulses tend to cluster in groups, indicating an underlying clustering process related to the physical characteristics of the lightning mechanism.

[3] The objective of this work is the statistical modeling of the clustering of noise impulses in atmospheric radio noise in the range 10 Hz–60 kHz (denoted low-frequency noise). Based on hundreds of hours of impulse interarrival time measurements made by the Stanford Radio Noise Survey System on such noise, a new Clustering Poisson atmospheric noise model is developed to describe the clustering of impulses seen in noise data. This model is based on several previously known statistical-physical models, but in addition takes into account the clustering of sferic impulses. The term “Clustering Poisson” is used to describe the model because noise source events occur as Poisson processes that are triggered by another, independent Poisson process, and thus the noise impulses are seen to occur in bursts, or clusters. It is shown that the clustering model accurately characterizes the interarrival time distributions of the impulses in low-frequency radio noise data.

[4] Section 2 begins with an overview of 10 Hz–60 kHz radio noise, which includes most or all of the extremely-low frequency (ELF; 3 Hz–3 kHz), very-low frequency (VLF; 3 kHz–30 kHz) and low frequency (LF; 30 kHz–300 kHz) bands. It follows with a description of the measurement systems used to collect the noise data, then discusses the impulsive characteristics of the noise as well as the impulse interarrival statistics. Section 3 presents an overview of existing atmospheric noise models and defines the new clustering model. Section 4 derives the impulse interarrival time distribution predicted by the clustering model and shows that the predicted distribution closely matches that of the measured data. Section 5 discusses the physical justification of the model, and Section 6 summarizes the conclusions of this work.

2. ELF/VLF/LF Noise

[5] ELF/VLF/LF radio noise consists of both man-made and natural electromagnetic signals. Examples of man-made signals are power line harmonics, radio communication signals, and interference from electrically powered machinery; naturally occurring noise includes sferics, whistlers, polar chorus and auroral hiss [Helliwell, 1965].

[6] Sferics are typically the dominant source of naturally occurring low-frequency radio noise [Volland, 1995]. Even though lightning activity occurs mainly at low latitudes, sferics can propagate for thousands of kilometers with little attenuation, and as a result they are seen in noise data worldwide. The amount of sferic activity in a given noise sample depends on the worldwide distribution of lightning relative to the receiver location, with nearby thunderstorms contributing a great deal and distant storms contributing less.

[7] A sample VLF spectrogram for 8 s of Arrival Heights, Antarctica, data is shown in Figure 1. The first 1 s includes a calibration tone: harmonics spaced 250 Hz apart with 0.1 pT magnitudes. There is a highpass filter with a cutoff frequency of 5 kHz to remove power line harmonics. The vertical lines are sferics, each representing a lightning stroke that could be many thousands of kilometers away. The line at 10 kHz is an instrumentation signal, and the dashed horizontal lines between 10 kHz and 14 kHz on the order of 1 s in length are from the Omega navigation system [Kirby, 1970]. The horizontal lines above 15 kHz are from communication systems, primarily phase shift keyed digital systems transmitting on the order of 100–200 baud.

Figure 1.

Arrival Heights VLF spectrogram, 08 May 1995, at 16:05 UTC. The vertical lines are sferics, and a distinctive cluster (or multiple clusters) of these sferics can be seen occurring just after the 7-s time mark.

[8] A sample VLF spectrogram for 8 s of Grafton, New Hampshire, data is shown in Figure 2. A great deal of sferic activity is seen, and the sferics are strong enough to clearly dominate the calibration tone. This is because Grafton is much closer to thunderstorm activity than Arrival Heights, and in addition July is the peak of the North American storm season and 00 UTC is the peak of the North American diurnal cycle [Chrissan and Fraser-Smith, 1996a, 1996b, 1997].

Figure 2.

Grafton, New Hampshire, VLF spectrogram, 20 July 1988 at 00:05 UTC. The vertical lines are sferics. Note that many clusters of sferics can be distinguished in this spectrogram.

2.1. Noise Measurement System

[9] The data used for the analysis are from the Stanford ELF/VLF Radio Noise Survey [Fraser-Smith and Helliwell, 1985; Fraser-Smith et al., 1988]. During the years 1985–1986, a noise survey system of eight ELF/VLF (10 Hz–32 kHz) radio noise measurement stations (or radiometers) was installed at a variety of high-latitude and midlatitude sites, in an effort to fill large gaps in the information available on radio noise in the ELF/VLF frequency range. A number of other ELF/VLF measurement systems have been implemented, but this was the most advanced system of its kind in terms of its geographic coverage and continuity of simultaneous data collection.

[10] The radiometers were installed at Arrival Heights, Antarctica (AH; 78°S, 167°E); Dunedin, New Zealand (DU; 46°S, 170°E); Grafton, New Hampshire (GN; 44°N, 72°W); Kochi, Japan (KO; 33°N, 133°E); L'Aquila, Italy (AQ; 42°N, 13°E); Søndrestromfjørd, Greenland (SS; 67°N, 51°W); Stanford, California (SU; 37°N, 122°W); and Thule, Greenland (TH; 77°N, 69°W). Most of the stations operated much longer than program expectations, and the systems at Stanford, Arrival Heights and Søndrestromfjørd are still operating. A complete technical description of the radiometers is provided by Fraser-Smith and Helliwell [1985].

[11] Hundreds of hours of ELF/VLF time series data were recorded by the radiometers, and some additional low-frequency (LF) data were recorded at Thule in the range 30–60 kHz (i.e., in the lower part of the LF range; 30–300 kHz). These ELF/VLF/LF time series data are used to derive the impulse interarrival time distributions against which the Clustering Poisson (CP) model is tested. (Note that for the rest of this paper, the terms ELF, VLF and LF refer to the individual frequency bands, while the term low-frequency refers to all of them collectively.) The radiometers' time-capture interval for one data segment is 1 min, and the data from multiple time-capture intervals are used to obtain distributions with larger sample sizes.

2.2. Time Series Data

[12] ELF broadband time series data are collected for 1 min every half hour and are sampled at 1 kHz by a 14 bit analog-to-digital (A/D) converter. A 400 Hz low-pass filter is inserted before the A/D converter to prevent aliasing. The VLF and LF broadband time series data are collected for 1 min every hour; the sampling rate for these data is 62.5 kHz and the A/D converter is 16 bits. The antialiasing filter is set at either 20 kHz or 30 kHz depending on which part of the frequency spectrum is to be analyzed. (The 30 kHz filter allows some aliasing at lower frequencies but does not distort the 20 kHz–30 kHz range.) The LF data at Thule, once downconverted, are processed in the same way as the VLF data.

[13] Since wideband time series data generally contain significant man-made interference, the noise statistics are usually analyzed within a relatively narrow frequency range. In such narrowband analysis, the broadband VLF or LF time series data are digitally downconverted (i.e., frequency-translated) using various center frequencies, and low-pass-filtered using various cutoff frequencies. Statistics are then derived from the resulting low-pass equivalent signals. For the ELF band, the raw data are analyzed without frequency translation.
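The narrowband processing described above can be sketched in a few lines of Python. This is only an illustration: the function name `downconvert`, the moving-average low-pass filter, and the 6.25 kHz test tone are assumptions chosen for simplicity, and the filters actually used in the analysis are not specified here.

```python
import cmath
import math

def downconvert(x, fs, fc, window):
    """Translate real samples x (sample rate fs, Hz) from center frequency fc
    to baseband, then smooth with a moving average of length `window`.

    The moving average is a stand-in low-pass filter (an assumption).
    Returns the complex low-pass equivalent samples.
    """
    # Mixing with 2*exp(-j*2*pi*fc*t) moves the band around +fc down to 0 Hz.
    mixed = [2.0 * xk * cmath.exp(-2j * math.pi * fc * k / fs)
             for k, xk in enumerate(x)]
    # Low-pass step: average each run of `window` consecutive samples.
    out = []
    for start in range(len(mixed) - window + 1):
        out.append(sum(mixed[start:start + window]) / window)
    return out

# Synthetic check: a unit cosine at fc should map to a value close to 1 + 0j.
fs, fc = 62500.0, 6250.0
tone = [math.cos(2 * math.pi * fc * k / fs) for k in range(100)]
baseband = downconvert(tone, fs, fc, window=5)
```

With these values the double-frequency term left by the mixer has a period of exactly five samples, so the five-point average removes it; a real analysis would use a properly designed low-pass filter.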

[14] A narrowband signal n(t) can be written

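$$n(t) = n_I(t)\cos(2\pi f t) - n_Q(t)\sin(2\pi f t) \tag{1}$$

where nI(t) is the inphase component, nQ(t) is the quadrature component, and f is the center frequency. The signals nI(t) and nQ(t) are then low-pass signals with bandwidth much smaller than f. The complex-analytic representation is written

$$\tilde{n}(t) = n_I(t) + j\,n_Q(t) = A(t)\,e^{j\Theta(t)} \tag{2}$$

where the magnitude (or envelope) A(t) is

$$A(t) = \sqrt{n_I^2(t) + n_Q^2(t)} \tag{3}$$

and the phase Θ(t) is

$$\Theta(t) = \arctan\!\left(\frac{n_Q(t)}{n_I(t)}\right) \tag{4}$$

Using this latter notation the signal n(t) can also be written

$$n(t) = \Re\!\left(A(t)\,e^{j[2\pi f t + \Theta(t)]}\right) \tag{5}$$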

where ℜ() indicates the real part of the argument.

[15] The noise envelope A(t) of atmospheric noise is a random process that is impulsive in nature; it is this signal from which the impulse interarrival statistics are determined. Note for the analyses in this paper that the noise envelope process A(t) is normalized to its average magnitude value E(∣A(t)∣). The common root-mean-square (RMS) normalization is not used because large impulses can significantly skew the RMS value relative to the background noise level.
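As a small illustration of the envelope definition and the mean-magnitude normalization, the sketch below computes A(t) and Θ(t) from paired inphase and quadrature samples. The function names and the two-sample input are hypothetical, and `math.atan2` is used as a four-quadrant form of the arctangent.

```python
import math

def envelope_and_phase(n_i, n_q):
    """Return the envelope A and phase Theta for paired I/Q samples."""
    env = [math.hypot(i, q) for i, q in zip(n_i, n_q)]
    # atan2 resolves the quadrant that a plain arctan(n_Q / n_I) cannot.
    phase = [math.atan2(q, i) for i, q in zip(n_i, n_q)]
    return env, phase

def normalize_by_mean(env):
    """Scale the envelope so that its average magnitude equals 1."""
    mean = sum(env) / len(env)
    return [a / mean for a in env]

# Hypothetical I/Q samples, for illustration only.
env, phase = envelope_and_phase([3.0, 0.0], [4.0, 1.0])
normalized = normalize_by_mean(env)
```

After this normalization a threshold of, say, 10 corresponds to 10 times the average magnitude of the noise, which is how the thresholds in the following sections are expressed.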

2.3. Impulse Interarrival Distributions

[16] Hundreds of hours of ELF/VLF/LF data were analyzed to determine impulse interarrival distributions. In each case, a set of threshold values (and their negatives) were chosen as illustrated in Figure 3, which shows a plot of 52 s of ELF broadband time series data. For each occurrence of an upward crossing of a given positive threshold value or a downward crossing of the respective negative threshold value, the following are noted: (1) time of occurrence of the threshold crossing, linearly interpolated from the two samples between which the level is crossed, (2) the maximum (or minimum) extent of the sferic, and (3) the amount of time the noise remains above (or below) the threshold crossing (i.e., the duration, also linearly interpolated from the two closest sample values). Crossings less than 1 ms apart are assumed to be from the same impulsive waveform and are ignored. For a given ±threshold value, all positive and negative sferics are treated together, and the time differences are determined from one sferic to the next. The result is a set of impulse spacings, or interarrival times.
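The threshold-crossing procedure can be sketched as follows. This is a minimal illustration, not the processing code used for the survey data: the function names are hypothetical, only the crossing times (item 1) are extracted, and a crossing is dropped when it falls within 1 ms of the previously kept crossing, which is one possible reading of the 1 ms rule.

```python
def crossing_times(samples, fs, threshold):
    """Times (s) of upward crossings of +threshold and downward crossings of
    -threshold, linearly interpolated between the two bracketing samples."""
    times = []
    for k in range(1, len(samples)):
        prev, cur = samples[k - 1], samples[k]
        if prev < threshold <= cur:        # upward crossing of +threshold
            level = threshold
        elif prev > -threshold >= cur:     # downward crossing of -threshold
            level = -threshold
        else:
            continue
        frac = (level - prev) / (cur - prev)
        times.append((k - 1 + frac) / fs)
    return times

def interarrival_times(times, min_gap=1e-3):
    """Drop crossings closer than min_gap to the last kept one, then
    return the spacings between the remaining crossings."""
    kept = []
    for t in times:
        if not kept or t - kept[-1] >= min_gap:
            kept.append(t)
    return [b - a for a, b in zip(kept, kept[1:])]

# Hypothetical samples at 1 Hz with one positive and one negative impulse.
times = crossing_times([0.0, 2.0, 0.0, 0.0, -2.0, 0.0], fs=1.0, threshold=1.0)
gaps = interarrival_times(times)
```

Positive and negative impulses are pooled into one list of times, matching the treatment of a given ±threshold pair described above.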

Figure 3.

Stanford ELF broadband data, 01 April 1990, at 01:04 UTC. Dotted lines are shown at ±5 and ±10 times the average magnitude of the noise.

[17] If successive impulses are indexed by integer k and the interarrival time between impulses k and k + 1 is denoted $T_I(k)$, then the set of successive interarrival times $T_I(k)$ forms a discrete random process. The probability distribution function (PDF) of this random process, $F_{T_I(k)}(t)$, is defined as the probability that the interarrival time $T_I(k)$ takes on a value less than or equal to some given value t: $F_{T_I(k)}(t) = P(T_I(k) \le t)$. Since the PDF is assumed not to depend on the index k, this expression can be simplified to $F_{T_I}(t) = P(T_I \le t)$ and can be empirically estimated from a histogram of the interarrival times of the data. The probability density function (pdf) of $T_I$, $f_{T_I}(t)$, is the derivative of the PDF with respect to t and represents the probability density over the range of t (i.e., the probability that $T_I$ falls within the range a to b is the integral of the pdf over the range a to b). The pdf $f_{T_I}(t)$ is the statistical characterization function used in the remainder of this paper.

[18] For every data sample analyzed, the resulting interarrival time pdf $f_{T_I}(t)$ was found to have the general form of Figure 4; this figure shows $f_{T_I}(t)$ for Arrival Heights December 1995 data in the 25.5–27.5 kHz noise band, with the impulse threshold set at 20 times the average magnitude of the noise.

Figure 4.

Impulse interarrival probability density function (pdf) for Arrival Heights, December 1995, for the 25.5–27.5 kHz noise band, with the impulse detection threshold set at 20 times the average magnitude of the noise.

[19] Since Figure 4 has a logarithmic ordinate axis, functions of the form $c\,e^{-\lambda t}$ appear as straight lines with slope −λ and intercept log c. Thus an exponential pdf, indicative of a simple Poisson process, would appear as a straight line. In all the data and for all threshold values, an approximately straight line is found for times greater than roughly 1 s. For values of $T_I$ between 0 and 1 s, however, the curve bends more steeply downward. This implies clustering, since it indicates that many of the impulses are bunched together with short times between them. Section 4 shows that a Clustering Poisson assumption closely predicts the behavior of the pdf $f_{T_I}(t)$.
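The straight-line behavior of an exponential pdf on a logarithmic ordinate can be checked with a short sketch. The function `log_pdf_slope` and the synthetic EXP(2) samples are assumptions for illustration only; no measured data are involved.

```python
import math
import random

def log_pdf_slope(samples, t_min, t_max, n_bins):
    """Least-squares slope of ln(empirical pdf) versus t on [t_min, t_max)."""
    width = (t_max - t_min) / n_bins
    counts = [0] * n_bins
    for s in samples:
        if t_min <= s < t_max:
            b = min(int((s - t_min) / width), n_bins - 1)
            counts[b] += 1
    # Bin centers and log densities; empty bins are skipped.
    pts = [(t_min + (b + 0.5) * width, math.log(c / (len(samples) * width)))
           for b, c in enumerate(counts) if c > 0]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    return sxy / sxx

# Synthetic simple-Poisson interarrival times with rate 2 per second.
rng = random.Random(1)
samples = [rng.expovariate(2.0) for _ in range(50000)]
slope = log_pdf_slope(samples, 0.0, 2.0, 20)   # should be close to -2
```

Applied to clustered data, the same fit restricted to times above about 1 s would estimate the slope of the straight tail, while the short-time region would depart from that line.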

2.4. Correlations Between Impulses

[20] Since the magnitude and direction of each sferic's threshold crossing is noted, it is possible to check for correlations between maximum amplitude levels of adjacent impulses and also between maximum amplitudes and interarrival times. In all the data samples examined and at a variety of threshold levels, however, no significant correlations were found in adjacent amplitude levels or between amplitudes and interarrival times, validating certain independence assumptions of the model to be proposed in Section 3.3. The correlation coefficient between adjacent interarrival times is consistently found to be approximately 0.15–0.20.
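A correlation coefficient between adjacent values of a sequence, such as successive interarrival times or successive peak amplitudes, can be computed as in the sketch below. The function name `lag1_correlation` and the test sequences are hypothetical.

```python
import math

def lag1_correlation(x):
    """Pearson correlation coefficient between x[k] and x[k + 1]."""
    a, b = x[:-1], x[1:]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Simple sequences with known answers, for illustration only.
r_up = lag1_correlation([1.0, 2.0, 3.0, 4.0])         # steadily increasing
r_alt = lag1_correlation([1.0, 2.0, 1.0, 2.0, 1.0])   # alternating
```

Values near zero indicate no linear dependence between neighbors; a positive value of the size reported above for interarrival times indicates that short gaps tend to be followed by short gaps.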

3. Clustering Poisson Model

[21] This section presents a statistical-physical radio noise model, denoted the “Clustering Poisson” model, and shows that this model predicts the statistical features of atmospheric radio noise very accurately. The term “Clustering Poisson” is used to describe the model because noise source events occur as Poisson processes that are triggered by another, independent Poisson process, and thus the noise impulses are seen to occur in bursts, or clusters.

[22] A number of statistical-physical noise models (i.e., based on the underlying physical process that creates the noise) have been developed over the past four decades; each can be roughly categorized into one of two types: (1) simple enough to provide concise, closed form answers but only somewhat representative of the true physical situation, and (2) representative of the true physical situation but very complicated to use. The second type often uses specific storm distribution and propagation information and calculates the noise characteristics at a given point numerically [e.g., Warber and Field, 1995], whereas the first type usually assumes independence in space and time of the source distribution (a condition known not to be true in practice). The new model partially bridges the gap between the two types by replacing independence in time with a Clustering Poisson assumption.

[23] In this paper, the model is verified by comparing it to the impulse interarrival distributions of atmospheric radio noise. For an analysis of the model with respect to other statistical features of atmospheric radio noise, e.g., correlations between impulses, amplitude probability distributions (APDs), and power spectral densities, see Chrissan [1998]. It is shown that the impulse interarrival time distributions predicted by the model match those seen in the data very closely, thus verifying the accuracy of the Clustering Poisson assumption.

3.1. Review of Existing Models

[24] Most impulsive noise models do not take impulse interarrival time dependencies into account, but two statistical-physical models that do address impulse clustering are those of Furutsu and Ishida [1961] and Giordano and Haber [1972]. Furutsu and Ishida address only the clustering of pilots and leaders (on the order of milliseconds) in an individual lightning stroke, however, so the analysis does not apply to impulse distributions. Giordano and Haber model impulse clustering by assuming that with each independent impulse there is a finite probability of a similar dependent impulse a time τi later, where τi is an appropriately specified random variable. The analysis is not extended to multiple dependent impulses due to the added mathematical complexity. Both models provide several key features upon which the new clustering model is based, such as specification of the source distribution and propagation characteristics.

[25] Nikias and Shao [1995] use an adapted form of Giordano's model (without impulse clustering), in addition to a method in Zolotarev [1986] for the model of point sources of influence, to prove that atmospheric noise has an α-stable distribution. The notation of Nikias and Shao is convenient for describing the nonclustering model and is thus used as the basis for describing both the nonclustering model and the new clustering model.

3.2. A Nonclustering Statistical-Physical Model

[26] This section defines a nonclustering statistical-physical Poisson model that is common in principle to the Giordano and Haber model. Begin by assuming that the received noise is the superposition of many impulses produced by many sources in a region encompassing the receiver location. Given this assumption, if the exact source distribution were known, as well as the exact time that each source emits an impulse and the exact waveform of each impulse at the receiver (including knowledge of time delay), the characteristics of the received noise waveform could be completely determined. The problem, of course, is that it could only be determined numerically and would not in general yield simple, closed form results. In order to proceed to such closed form results, exact knowledge of the source characteristics must be replaced with simplified statistical approximations and expectations.

[27] The region Ω encompassing the receiver location is defined on $\mathbb{R}^n$, where $\mathbb{R}$ denotes the real axis and n ∈ {1,2,3} is the dimension of the space (i.e., a line ($\mathbb{R}^1$), a plane ($\mathbb{R}^2$), or a volume ($\mathbb{R}^3$)). The receiver is at the origin and each source i is at a position $x_i$ a distance $|x_i|$ from the receiver. It is assumed that all sources have similar enough waveform generation mechanisms that their emitted waveforms may be modeled as $a_i D(t; \theta_i)$, where the random amplitudes $a_i$ are independent and identically distributed (i.i.d.) with pdf $f_a(a)$ (denoted i.i.d. ∼ $f_a(a)$) and the random parameters $\theta_i$ are i.i.d. ∼ $f_\Theta(\theta)$. The $\theta_i$ represent arbitrary mappings from a probability space to an ensemble of possible waveforms; they are chosen to be scalars for simplicity.

[28] The effect of the transmission medium is modeled as the combination of a power law attenuation factor ν and a linear, time-invariant (LTI) filtering factor $h(t; \theta)$ such that the impulse $a_i D(t; \theta_i)$ appears as $(c_1/|x_i|^{\nu})\, a_i\, D(t; \theta_i) * h(t; \theta_i)$ at the receiver, where $c_1$ is a positive constant and ν > 0. Since $\theta_i$ is arbitrary, the convolution $D(t; \theta_i) * h(t; \theta_i)$ may be defined as $E(t; \theta_i)$ without loss of generality. Further defining the quantity

equation image

and letting the random variable N be the number of contributing impulses at the time of observation, the received waveform Y is then

equation image

The time ti > 0 is defined as the difference between the observation time t = 0 and the source emission time, so it is larger for older impulses (i.e., t is a measure of negative time).

[29] It now remains to specify N. In order to attain the most tractable results while maintaining a reasonably accurate physical representation, N is assumed to be the number of events generated by a Poisson process in space and time with source density function ρ(x,t). The value ρ(x,t) dxdt is then the probability that a noise impulse will be emitted from the (infinitely small) region defined by the line, square or cube (for n = 1,2,3) with far corners x and x + dx, and during the (infinitely small) interval [t,t + dt] prior to observation. The term ρ(x,t) may take a general form over x ∈ Ω and t ∈ [0,∞), but for simplicity it is approximated as

$$\rho(x, t) = \frac{\rho_0}{|x|^{\mu}} \tag{8}$$

where ρ0, μ > 0 are constants. Thus the sources are defined to be independent both in time and in direction from the receiver. The exponent μ defines a variation in source density with distance, providing an added degree of freedom with minimal added complexity.

[30] The definition of ρ(x,t) in equation (8) is the key approximation for making further analyses reasonably simple and tractable, but it should be noted that this is not the true physical source distribution of atmospheric radio noise. Various storm centers are active at various times of day and year [Chrissan and Fraser-Smith, 1996a, 1996b, 1997; Goodman and Christian, 1993], so for a given receiver location there are certain directions from which heavier sferic activity arrives than others. Nonetheless, equation (8) has been used in well-known atmospheric noise models [Middleton, 1977] and has been shown to be reasonably accurate.

[31] The commonly used temporal independence approximation, however, is not always accurate. Bursts of sferics on the order of 1 s in length are clearly seen in spectrograms of ELF/VLF/LF time series data at all locations and times (see Figure 1), and they have considerable effect on the time series data. The model presented below adds impulse clustering properties to the model just described in a way that accurately represents this clustering of sferics.

3.3. Definition of the Clustering Poisson Model

[32] Instead of specifying the received waveform Y to be the superposition of impulses distributed throughout the region, specify it to be the superposition of clusters distributed throughout the region, where a cluster is defined as a burst of impulses from a given location over a short time period. (Clusters will be specified in detail in the next two sections.) The resulting waveform at the receiver is then

equation image

where Nc is the number of clusters, Nk is the number of impulses in cluster k, and the indices k,i on a, τ and θ refer to the ith impulse of the kth cluster (i = 0 refers to the impulse that starts the cluster). All the impulses in cluster k are modeled as occurring at the same location xk. The ak,i and θk,i are all independent, an assumption that is verified by the lack of correlations between impulses discussed in Section 2.4. Every cluster is assumed to be independent of every other in both space and time. The Xk's are the waveforms of the individual clusters.

3.4. Specification of Clusters

[33] The number of clusters Nc is analogous to N in equation (7) and the clusters have the source density of equation (8), but the statistical properties of the clusters themselves still need to be specified. The specification chosen must accurately model the physical mechanism of lightning and should also lead to tractable results.

[34] In order to develop a physically accurate model, several features of lightning must be considered. From Uman's [1969] classic text, it is known that a complete cloud to ground discharge, called a flash, is made up of one or more intermittent, partial discharges, called strokes. Each stroke involves a complex process whereby either a stepped leader or a dart leader initiates a strong return stroke, transferring large amounts of charge between the Earth and its atmosphere. A significant percentage of flashes contain many strokes; one data sample in Uman [1969] indicates that 40 percent of all flashes contain at least five strokes and some contain as many as 19. In addition, 50 percent of those flashes with five or more strokes have a duration of more than 400 ms. Data histograms in Uman [1969] and Warber and Field [1995] showing typical strokes-per-flash distributions indicate that the number of strokes per flash can be reasonably modeled as a geometric random variable; thus the number of impulses per cluster in the Clustering Poisson model is defined to be a geometric random variable as well.

[35] Now the timing of impulses within a cluster must be specified. Every cluster has at least one impulse, which is the start of the cluster. The interarrival times Ti between any additional impulses in a given cluster are specified as i.i.d. random variables with pdf

$$f_{T_i}(t) = \lambda_1 e^{-\lambda_1 t}, \quad t \ge 0 \tag{10}$$

that is, an exponential distribution (denoted EXP(λ1)). Each cluster then is essentially a variable length Poisson process with rate λ1 impulses per second, and as stated above all the clusters are independent of the main process and of each other. Similar cluster specifications have found use in modeling computer failures [Lewis, 1964], earthquakes [Vere-Jones, 1970], and neural impulses [Grüneis et al., 1989].

3.5. Length of Clusters

[36] This section specifies the probability mass function (pmf) of the Nk's and determines the pdf of the cluster lengths TLk. The Nk's are modeled as i.i.d. geometric random variables with pmf

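$$P(N = n) = \frac{\lambda_L}{\lambda_1 + \lambda_L}\left(\frac{\lambda_1}{\lambda_1 + \lambda_L}\right)^{n}, \quad n = 0, 1, 2, \ldots \tag{11}$$

where the index k is omitted since this pmf applies for all k. The parameter λL is now shown to be the reciprocal of the expected cluster length.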

[37] If N = 0, there are no impulses after the first impulse and the cluster length is zero. For a given N > 0, the length TL of the cluster is the sum of N independent exponential random variables with parameter λ1, and therefore it is known to have the Erlang density [Leon-Garcia, 1989]:

$$f_{T_L|N}(t \mid N) = \frac{\lambda_1^{N}\, t^{N-1}\, e^{-\lambda_1 t}}{(N-1)!}, \quad t > 0 \tag{12}$$

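Thus the pdf $f_{T_L}(t)$ for t > 0 can be determined by conditioning on N:

$$f_{T_L}(t) = \sum_{n=1}^{\infty} P(N = n)\, f_{T_L|N}(t \mid n) \tag{13}$$
$$= \frac{\lambda_L \lambda_1^{2}}{(\lambda_1 + \lambda_L)^{2}}\, e^{-\lambda_1 t} \sum_{n=1}^{\infty} \frac{1}{(n-1)!}\left(\frac{\lambda_1^{2}\, t}{\lambda_1 + \lambda_L}\right)^{n-1} \tag{14}$$
$$= \frac{\lambda_1}{\lambda_1 + \lambda_L} \cdot \frac{\lambda_1 \lambda_L}{\lambda_1 + \lambda_L}\, \exp\!\left(-\frac{\lambda_1 \lambda_L}{\lambda_1 + \lambda_L}\, t\right) \tag{15}$$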

where (15) is obtained using the exponential summation formula. Note that the rest of the probability on TL is at the point TL = 0 with probability P(TL = 0) = λL/(λ1 + λL).

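[38] Since λ1/(λ1 + λL) is the probability that N > 0, it is apparent from equation (15) that $T_L$ conditioned on N > 0 is EXP(λ1λL/(λ1 + λL)) distributed. In addition, since the expected value of an EXP(λ) random variable is 1/λ, it follows from (15) that

$$E[T_L] = \frac{\lambda_1}{\lambda_1 + \lambda_L} \cdot \frac{\lambda_1 + \lambda_L}{\lambda_1 \lambda_L} = \frac{1}{\lambda_L} \tag{16}$$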
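The cluster specification of Sections 3.4 and 3.5 can be checked with a short Monte Carlo sketch: a geometric number of follow-on impulses, with stopping probability λL/(λ1 + λL) after each impulse, and EXP(λ1) spacings. The function name `sample_cluster` and the parameter values are illustrative assumptions, not values fitted to data.

```python
import random

def sample_cluster(lam1, lam_l, rng):
    """Impulse times of one cluster, measured from the cluster start.

    After every impulse the cluster stops with probability
    lam_l / (lam1 + lam_l); otherwise another impulse follows after an
    EXP(lam1) spacing.
    """
    p_stop = lam_l / (lam1 + lam_l)
    times = [0.0]                      # the impulse that starts the cluster
    while rng.random() >= p_stop:
        times.append(times[-1] + rng.expovariate(lam1))
    return times

rng = random.Random(0)
lam1, lam_l = 10.0, 2.0                # hypothetical rates (per second)
lengths = [sample_cluster(lam1, lam_l, rng)[-1] for _ in range(20000)]
mean_length = sum(lengths) / len(lengths)   # should be near 1 / lam_l
frac_zero = sum(1 for x in lengths if x == 0.0) / len(lengths)
```

With these rates the sample mean of the cluster length should fall near 1/λL = 0.5 s, and the fraction of single-impulse clusters near λL/(λ1 + λL) = 1/6.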

[39] An equivalent cluster specification results by defining clusters to be Poisson processes of random length $T_L'$, where $T_L'$ is EXP(λL). In other words, it is equivalent to define either (1) the Nk's as geometric random variables, so that the cluster length TL given Nk is Erlang distributed as in (12), or (2) the variables $T_L'$ as the durations of rate λ1 Poisson processes, such that $N_k$ given $T_L'$ is a Poisson($\lambda_1 T_L'$) random variable. With definition 2, the pmf of Nk is

$$P(N_k = n) = \int_0^{\infty} P(N_k = n \mid T_L' = t)\, f_{T_L'}(t)\, dt \tag{17}$$
$$= \int_0^{\infty} \frac{(\lambda_1 t)^{n} e^{-\lambda_1 t}}{n!}\, \lambda_L e^{-\lambda_L t}\, dt \tag{18}$$
$$= \frac{\lambda_L \lambda_1^{n}}{(\lambda_1 + \lambda_L)^{n+1}} \tag{19}$$

where the integral in (18) is solved using item (860.07) in Dwight [1969]. Since (19) and (11) are identical and the interarrival specifications are the same as well, definitions 1 and 2 are equivalent. The variable $T_L'$, however, is not the length of the cluster; it is simply the length of an underlying Poisson process. The variable TL (which is less than or equal to $T_L'$) is the true length of the cluster, for it indicates the location of the last impulse.

4. Impulse Interarrival Distribution of Clustered Poisson Noise

[40] This section shows that the Clustering Poisson model accurately predicts the impulse interarrival time distributions of low-frequency noise. Predicted interarrival distributions are determined from the model, and these are shown to match the distributions derived from the data.

[41] As defined in Section 3.4, the clustering model proposes that the number of follow-on impulses (after the first) within a given cluster is a geometric random variable, and that the interarrival times between adjacent impulses within each cluster are independent and EXP(λ1) distributed. The total process is the superposition of all the impulses in all the clusters, and the clusters themselves are occurrences of a Poisson random process with parameter λ2 clusters per second.

[42] Since the occurrence of an impulse is defined as the crossing of a threshold level as in Figure 3, and we are only concerned here with impulse interarrival times, the individual impulses can be represented for simplicity as Dirac delta functions. The total process can then be expressed by the clustering model as

$$y(t) = \sum_{k} \sum_{i=0}^{N_k} \delta(t - t_k - \tau_{k,i}) \tag{20}$$

where Nk is the number of follow-on impulses in cluster k, tk is the start time of cluster k, and τk,i is the time from the start of cluster k to the ith impulse of cluster k. Note that τk,0 = 0 for all k, and also that Nk = 0 for clusters with no follow-on impulses.
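A realization of this impulse process can be generated with the sketch below, which may be useful for producing synthetic interarrival times. It assumes cluster starts at rate λ2, EXP(λ1) spacings within a cluster, and a stopping probability λL/(λ1 + λL) after each impulse; the function name and all parameter values are hypothetical.

```python
import random

def simulate_impulses(lam1, lam2, lam_l, duration, rng):
    """Sorted impulse times for clusters that start in [0, duration).

    Impulses of late clusters may fall slightly beyond `duration`;
    they are kept, which is a simplification at the edge of the record.
    """
    p_stop = lam_l / (lam1 + lam_l)
    impulses = []
    t_start = rng.expovariate(lam2)        # first cluster start
    while t_start < duration:
        t = t_start
        impulses.append(t)                 # impulse that starts the cluster
        while rng.random() >= p_stop:      # follow-on impulses
            t += rng.expovariate(lam1)
            impulses.append(t)
        t_start += rng.expovariate(lam2)   # next cluster start
    impulses.sort()
    return impulses

rng = random.Random(2)
times = simulate_impulses(lam1=10.0, lam2=1.0, lam_l=5.0,
                          duration=5000.0, rng=rng)
gaps = [b - a for a, b in zip(times, times[1:])]   # interarrival times
rate = len(times) / 5000.0                         # impulses per second
```

For these rates the mean impulse rate should be near λ2(1 + λ1/λL) = 3 impulses per second, that is, the cluster rate times the expected number of impulses per cluster.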

[43] If a stopwatch is started at an arbitrary point in time and stopped at the next occurrence of an impulse, the time shown on the stopwatch is the waiting time for the next impulse, or the forward recurrence time. The forward recurrence time is a random variable $T_W$ with marginal pdf $f_{T_W}(t)$.

[44] If the stopwatch is instead started at the occurrence of an arbitrary impulse and stopped at the next impulse, the time on the stopwatch is the impulse interarrival time. This random variable is denoted $T_I$, with marginal pdf $f_{T_I}(t)$. The following sections derive both $f_{T_W}(t)$ and $f_{T_I}(t)$ for the Clustering Poisson model, and then compare the derived $f_{T_I}(t)$ to the interarrival distributions seen in the data.

4.1. Waiting Time pdf

[45] The pdf $f_{T_W}(t)$ of the forward recurrence time of the process y(t) is most easily calculated by setting the time origin (t = 0) to be the stopwatch's starting time, then determining the probability that there are no impulses in the time span [0,t]. It follows that 1 − P(no impulses in [0,t]) is the probability that the waiting time is less than t, so differentiating this quantity results in the pdf $f_{T_W}(t)$.

[46] In order to find P(no impulses in [0,t]), the pmf of N0, the number of clusters active at time zero, must be calculated. This pmf is calculated in three steps: (1) The number of clusters N created in the range [−T,0] is a Poisson(λ2T) random variable. If ordering is neglected, the starting times of these clusters are uniformly distributed over [−T,0] and are independent of each other. The probability PA that any given cluster Xk with starting time tk ∼ U([−T,0]) is active at t = 0 can then be calculated by conditioning on tk:

equation image

(2) Now condition on N: given N clusters with starting times uniformly and independently distributed over [−T,0], each with a probability of surviving past t = 0 given by (21), then the number of clusters surviving at t = 0 is a binomial random variable:

$$P(N_0 = k \mid N = n) = \binom{n}{k} P_A^{k} (1 - P_A)^{n-k}, \quad k = 0, 1, \ldots, n \tag{22}$$

(3) Since N is Poisson(λ2T), the final result is

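$$P(N_0 = k) = \sum_{n=k}^{\infty} \binom{n}{k} P_A^{k} (1 - P_A)^{n-k}\, \frac{(\lambda_2 T)^{n} e^{-\lambda_2 T}}{n!} \tag{23}$$
$$= \frac{(\lambda_2 T P_A)^{k} e^{-\lambda_2 T}}{k!} \sum_{n=0}^{\infty} \frac{[\lambda_2 T (1 - P_A)]^{n}}{n!} \tag{24}$$
$$= \frac{(\lambda_2 T P_A)^{k} e^{-\lambda_2 T P_A}}{k!} \tag{25}$$

where (24) results from a change of variables n → k + n. From (25) it is clear that N0 is a Poisson random variable with a parameter equal to λ2T scaled by the probability (21), a well-known result for a Poisson process modulated by an indicator function. Setting T→∞, the final result is that N0 is a Poisson(λ2/λL) random variable.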
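The Poisson character of N0 can be illustrated with a small simulation. In this sketch a cluster is counted as active at t = 0 when an EXP(λL) duration drawn for it exceeds its age; this reading of "active" is an assumption, as are the function name and the parameter values. The look-back window is made long enough that older clusters have a negligible chance of surviving.

```python
import random

def count_active_clusters(lam2, lam_l, window, rng):
    """One draw of N0: clusters started in [-window, 0] still active at 0."""
    active = 0
    age = rng.expovariate(lam2)            # age of the most recent start
    while age < window:
        if rng.expovariate(lam_l) > age:   # duration outlasts the age
            active += 1
        age += rng.expovariate(lam2)       # step back to the previous start
    return active

rng = random.Random(4)
lam2, lam_l = 3.0, 2.0                     # hypothetical rates (per second)
draws = [count_active_clusters(lam2, lam_l, 10.0, rng) for _ in range(5000)]
mean_n0 = sum(draws) / len(draws)
var_n0 = sum((d - mean_n0) ** 2 for d in draws) / (len(draws) - 1)
```

For a Poisson random variable the mean and variance coincide, so both sample statistics should fall near λ2/λL = 1.5 for these rates.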

[47] Each of the N0 clusters active at t = 0 has at least one impulse remaining. In addition, from the memoryless property of exponential random variables, the first impulse after time zero in each of these clusters has a time of occurrence that is EXP(λ1) distributed. Therefore, in order for there to be no impulses in [0,t], all N0 active clusters must have their next impulse occur after t. Conditioning on N0 results in

equation image
equation image

Taking into account that there must be no new clusters in [0,t] as well,

equation image

and so the forward recurrence pdf is finally found by negating the derivative of equation (28):

equation image
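The steps above can be written out as a worked sketch. It uses only what the text states: N0 is Poisson(λ2/λL), each active cluster's next impulse is EXP(λ1) distributed, and new clusters start as a Poisson process of rate λ2. The symbols P_0(t) and f_W(t) are introduced here for illustration, and the exact form and line breaks of (26) to (29) may differ.

```latex
\begin{align*}
P(\text{no impulses from active clusters in } [0,t])
  &= \sum_{k=0}^{\infty} e^{-k\lambda_1 t}\,
     \frac{e^{-\lambda_2/\lambda_L}\,(\lambda_2/\lambda_L)^{k}}{k!}
   = \exp\!\left[-\frac{\lambda_2}{\lambda_L}\left(1 - e^{-\lambda_1 t}\right)\right], \\
P_0(t) &= e^{-\lambda_2 t}\,
  \exp\!\left[-\frac{\lambda_2}{\lambda_L}\left(1 - e^{-\lambda_1 t}\right)\right], \\
f_W(t) = -\frac{dP_0(t)}{dt}
  &= \left(\lambda_2 + \frac{\lambda_1 \lambda_2}{\lambda_L}\, e^{-\lambda_1 t}\right) P_0(t).
\end{align*}
```

The first line is the probability generating function of a Poisson variable evaluated at e^{−λ1 t}; the factor e^{−λ2 t} in the second line is the probability that no new cluster starts in [0,t].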

4.2. Interarrival Time pdf

[48] The interarrival pdf equation image calculation is similar to the forward recurrence calculation except that there is by definition an impulse at time t = 0. It is not known, however, whether this impulse is the first impulse of a new cluster or an additional impulse from an established cluster, so it is necessary to condition on both possibilities.

[49] First calculate the expected value of the number of impulses in a cluster, which is 1 + E[Nk]. Using the pmf (19), this value is

$$1 + E[N_k] = 1 + \frac{\lambda_1}{\lambda_L} \qquad (30)$$

Furthermore, if the process is modeled as existing for all time, the probability that any single impulse is the first of its cluster is the reciprocal of (30). Using both this information and equation (28), in addition to the fact that a new cluster at time t = 0 must not cause an impulse in (0,t], results in

equation image

Negating the derivative of (31) then gives the interarrival pdf

equation image

[50] Note in summary that λ1 is the average rate of impulses within a cluster, λ2 is the average number of clusters per second and λL is the reciprocal of the average cluster length.
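To make the roles of the three parameters concrete, the following is a minimal simulation sketch rather than the authors' procedure. It assumes that cluster start times form a Poisson process of rate λ2, that each cluster has an exponentially distributed length with mean 1/λL, that one impulse occurs at the cluster start, and that further impulses follow as a Poisson process of rate λ1 until the cluster ends. The function name and the parameter values are hypothetical.

```python
import random


def simulate_impulses(lam1, lam2, lam_l, duration, seed=0):
    """Simulate one realization; return (sorted impulse times, impulses per cluster)."""
    rng = random.Random(seed)
    times, sizes = [], []
    t_cluster = rng.expovariate(lam2)            # start time of the first cluster
    while t_cluster < duration:
        length = rng.expovariate(lam_l)          # assumed exponential cluster length
        cluster = [t_cluster]                    # assumed: one impulse at the cluster start
        t = t_cluster + rng.expovariate(lam1)    # EXP(lam1) spacing within the cluster
        while t < t_cluster + length:
            cluster.append(t)
            t += rng.expovariate(lam1)
        times.extend(cluster)
        sizes.append(len(cluster))
        t_cluster += rng.expovariate(lam2)       # EXP(lam2) spacing between cluster starts
    times.sort()                                 # clusters overlap, so merge by sorting
    return times, sizes


# Placeholder values: lam1 and lam_l follow the orders of magnitude quoted in the
# text; lam2 is arbitrary.
lam1, lam2, lam_l = 8.0, 0.5, 1.0
times, sizes = simulate_impulses(lam1, lam2, lam_l, duration=20000.0)
interarrivals = [b - a for a, b in zip(times, times[1:])]
print("mean impulses per cluster:", sum(sizes) / len(sizes))
print("model value 1 + lam1/lam_l:", 1 + lam1 / lam_l)
```

Under these assumptions the simulated mean number of impulses per cluster should be close to 1 + λ1/λL, and a histogram of `interarrivals` shows the short within-cluster spacings alongside the longer gaps between clusters.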

4.3. Data Analysis

[51] Now that equation image has been determined, it is compared to the impulse spacings seen in actual low-frequency noise data. Figure 5 shows the fit of equation (32) to the interarrival time pdf of Arrival Heights noise data in the 25.5–27.5 kHz range, for the whole month of December 1995, during the hours 08–16 UTC of each day. The four lines in the plot are for threshold levels of 5, 10, 15 and 20 times the expected value of the magnitude of the noise envelope A(t). The accurate fit of equation (32) to the data is quite apparent from the dotted lines in the figure; the parameter values used for each dotted line are listed in Table 1.

Figure 5.

Fit of the Clustering Poisson impulse interarrival pdf, equation (32) (dotted lines), to measured Arrival Heights impulse interarrival data. The threshold levels are 5, 10, 15 and 20 times the expected value of the magnitude of the noise (E∣A∣).

Table 1. Parameter Values of the Clustering Poisson pdf's in Figure 5
Threshold    λ1, s−1    λ2, s−1    λL, s−1

[52] Many more data samples were examined in addition to those in Figure 5, and all of them have interarrival pdf's of the same form regardless of threshold level (as long as the threshold is sufficiently above the background noise level to enable the impulses to be distinguished, as in Figure 3). The parameter λ2 varies with location, season, and time of day, but the average cluster length is always on the order of 1 s and λ1 is on the order of 5–10 s−1.
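The measured interarrival times depend on how impulses are detected relative to the threshold. The sketch below shows one plausible way to extract them from a sampled envelope. It assumes that an impulse is registered at each upward crossing of a threshold set to a multiple of the mean envelope magnitude; this detection rule, the function name, and the sample values are assumptions for illustration, not a description of the survey system's processing.

```python
def interarrival_times(envelope, sample_rate_hz, multiple):
    """Return spacings (in seconds) between upward crossings of multiple * mean(|A|)."""
    mean_magnitude = sum(abs(a) for a in envelope) / len(envelope)
    threshold = multiple * mean_magnitude
    crossing_times = []
    above = False
    for i, a in enumerate(envelope):
        now_above = abs(a) > threshold
        if now_above and not above:              # count only the upward crossing
            crossing_times.append(i / sample_rate_hz)
        above = now_above
    return [b - a for a, b in zip(crossing_times, crossing_times[1:])]


# Toy envelope: flat background with three impulses (one lasting two samples).
envelope = [1.0] * 100
for index in (10, 11, 40, 90):
    envelope[index] = 50.0

print(interarrival_times(envelope, sample_rate_hz=10.0, multiple=5))   # [3.0, 5.0]
print(interarrival_times(envelope, sample_rate_hz=10.0, multiple=20))  # []
```

Raising the threshold multiple removes weaker impulses, which is why the fitted parameters are tabulated separately for each threshold level.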

5. Physical Justification of the Model

[53] Since the Clustering Poisson equation (20) of Section 4,

equation image

differs from equation (9) of Section 3.3:

equation image

it remains to specify the connection between the two. Note that y(t) in (33) is a waveform that represents the detection of impulses above a certain threshold, while Y in (34) is the received signal at a given point in time.

[54] Begin by defining the range r to be the distance from a given cluster's source to the receiver (r = ∣x∣), and note that it depends only on ∣a∣ and r whether or not an impulse crosses a given threshold Yth in the received waveform; i.e., it can be specified with no essential loss of generality that c1 = 1 and max ∣Ek,ik,i)∣ = 1, independent of θ. The probability that an impulse from the cluster crosses Yth, as a function of r and Yth, is then

equation image

However, it is not known from the received waveform how many impulses and/or entire clusters have gone undetected because of small values of a. In order to proceed further, it is approximated that for a given Yth, there exists some cutoff distance rc such that all clusters with r > rc can be ignored. This cutoff distance is chosen to be

equation image

for some constant R0.

[55] For the region r < rc, the spatial density of clusters is given by equation (8):

equation image

but some of the clusters (especially for r near rc) will still go undetected due to propagation attenuation. To compensate for propagation effects, an effective source density is defined as

equation image

[56] Using this effective source density, it is assumed that all clusters with r < rc are detected. Thus the effective number of spatial cluster sources for threshold Yth, in units of 1/s, is

equation image

and carrying out this integration gives

equation image

a polynomial in Yth. To confirm the validity of this approximation, the values of relative threshold level and λ2 listed in Table 1 were fit to equation (40) by taking the logarithm of both sides of (40) and using a linear regression. The exponent n − μ − ν = 0.73 was found to give a very good fit, and it is physically justified by the expected values of n ≈ 2, ν ≈ 1.0, and μ a small positive number. Thus a physical connection between the parameters of (33) and (34) is shown. (The first term in parentheses in (40) is 15.3, but this number is irrelevant since ρ0 is unknown, R0 is arbitrary and only relative threshold levels are used.) It is seen from the values of λ1 and λL in Table 1 that the expected number of impulses per cluster, 1 + λ1/λL, is on the order of 5–10.
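The log-log regression described above can be sketched as follows. The code fits a generic single-term power law y = c x^p by ordinary least squares on the logarithms. The data are synthetic, generated from an arbitrary exponent chosen only so that the fit can be checked; they are not the Table 1 values, and the function name is hypothetical.

```python
import math


def fit_power_law(x, y):
    """Least-squares fit of log y = log c + p log x; returns (c, p)."""
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    s_xx = sum((a - mean_x) ** 2 for a in log_x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    p = s_xy / s_xx                      # slope of the log-log line
    c = math.exp(mean_y - p * mean_x)    # intercept mapped back to a prefactor
    return c, p


relative_thresholds = [5.0, 10.0, 15.0, 20.0]   # threshold multiples used in Figure 5
true_prefactor, true_exponent = 2.0, -0.5       # arbitrary synthetic values
synthetic_rates = [true_prefactor * th ** true_exponent for th in relative_thresholds]

c, p = fit_power_law(relative_thresholds, synthetic_rates)
print("fitted prefactor:", c, "fitted exponent:", p)
```

Because only relative threshold levels enter the fit, the prefactor absorbs any unknown constants, and the slope of the log-log line is the quantity of interest.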

6. Conclusions

[57] This paper introduces a statistical-physical Clustering Poisson model for atmospheric noise. According to the model, the sources of the noise are clusters of impulses with independent and identically distributed waveforms. Cluster occurrence is a spatial and temporal Poisson process with source distribution independent of direction and time. The impulses within clusters are defined as variable length Poisson processes. The Clustering Poisson model is justified by the physical properties of lightning, and in addition, it is derived in conjunction with a statistical analysis of hundreds of hours of globally collected ELF/VLF/LF radio noise data.

[58] The model is verified by comparing its predicted impulse interarrival distributions to those of measured atmospheric radio noise. Given the accuracy with which the predicted distributions fit the actual data, the Clustering Poisson model proves to be a strong candidate for characterizing the clustering of sferics in atmospheric noise.


[59] This research was sponsored by the Office of Naval Research through grant N00014-92-J-1576. Logistic support for the measurements at Søndrestromfjørd, Greenland, and Arrival Heights, Antarctica, was provided by the National Science Foundation through NSF cooperative agreement ATM 88-22560, and NSF grants DPP-8720167 and OPP-9119552, respectively. Current logistics support is being provided through grants ATM-9813556 and OPP-0138126.