Neural networks for automated classification of ionospheric irregularities in HF radar backscattered signals

Abstract

[1] The classification of high frequency (HF) radar signals backscattered from ionospheric irregularities (clutter) into those suitable, or not, for further analysis is a time-consuming task, even for experts in the field. We tested several different feedforward neural networks on this task, investigating the effects of network type (single layer versus multilayer) and number of hidden nodes on performance. As expected, the multilayer feedforward networks (MLFNs) outperformed the single-layer networks. The MLFNs achieved performance levels of 100% correct on the training set and up to 98% correct on the testing set. The comparable figures for the single-layer networks were 94.5% and 92%, respectively. When measures of sensitivity, specificity, and proportion of variance accounted for by the model are considered, the superiority of the MLFNs over the single-layer networks is much more striking. Our results suggest that such neural networks could aid many HF radar operations, such as automated frequency search and space weather monitoring.

1. Introduction

[2] Neural networks have many potential applications in signal processing. For example, Lippmann [1987], in an introduction to computing with neural networks, describes a number of applications to signal processing. Gorman and Sejnowski [1988] have shown that neural networks can discriminate sonar signals from a mine and a similarly shaped rock with a high degree of precision. Lapedes and Farber [1987] have demonstrated that neural networks can be used to predict points in highly chaotic time series; Boone et al. [1990] have shown that neural networks can perform as well as trained human experts in detecting certain nodules in radiological data. Other examples of neural networks performing signal processing tasks can be found in the special section on neural networks in the July 1988 issue of IEEE Transactions on Acoustics, Speech, and Signal Processing.

[3] In this paper we describe the application of neural networks to the problem of classifying high frequency (HF) radar backscattered returns. Networks were trained to discriminate “usable” from “unusable” radar signals backscattered from ionospheric density irregularities, or clutter. Examples of these are given in section 2. Such networks could be used in many radar operations, e.g., automated frequency search for optimal backscattered radar signals, classification of ionospheric/magnetospheric regions or boundaries, etc. Section 2 gives a brief description of the radar system and how the signals were processed. Section 3 describes the networks that were used for the discrimination task. Section 4 discusses implementation issues. Finally, section 5 presents the results of the experiments with a number of different networks and an analysis of the classification strategy developed by the networks.

2. Radar System

[4] The radar data used in this study were collected by the Goose Bay HF radar operated by the Johns Hopkins University Applied Physics Laboratory. The radar, which operates at frequencies from 8 to 20 MHz, consists of 16 log-periodic antennas that are electronically steerable to produce 16 beams, or viewing angles [Greenwald et al., 1985]. Each beam consists of up to 75 range gates, or intervals. The Goose Bay radar is part of the worldwide SuperDARN radar network, which is used to study the high-latitude ionosphere at E and F layer altitudes (100 km to 500 km) [Greenwald et al., 1995]. The radar detects backscattered signals from a “soft target,” that is, ionospheric density irregularities (clutter) with a spatial periodicity equal to half the radar wavelength. In addition, the backscattering process is coherent, and the backscattered signal is proportional to the square of the number density. A detailed analysis of the backscattering process can be found in Walker et al. [1987].

[5] For the purpose of the analysis in this paper, it is sufficient to note that the radar utilizes a multipulse technique to determine the complex autocorrelation function (ACF) at each range gate along each of the 16 beams [Farley, 1972; Greenwald et al., 1985]. From these ACFs, several parameters can be extracted and subsequently used to derive physically important quantities. For example, the ACFs can be used to calculate the Doppler velocity, from which the bulk motion of the ionospheric plasma and the ionospheric electric potential perpendicular to the magnetic field line (convection patterns) can in turn be derived. The ACFs also give the spectral width, which can be used to infer certain ionospheric and/or magnetospheric regions and boundaries. A description of the data analysis procedures can be found in Baker et al. [1988].

[6] If we denote the j-th sample of the backscattered signal at time delay t after transmission by

$$V_j(t), \qquad j = 1, 2, \ldots, N, \tag{1}$$

then the autocorrelation function (ACF), R, is given by

$$R(t, kT) = \frac{1}{N} \sum_{j=1}^{N} V_j(t)\, V_j^{*}(t + kT), \qquad k = 0, 1, \ldots, M-1, \tag{2}$$

where the asterisk indicates the complex conjugate, k is the lag number, M is the maximum lag number, N is the number of samples, and T is the elemental time spacing (lag). It is the discrete values of the ACF that serve as inputs to the neural networks.
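As an illustration, the lagged-product estimate of the ACF defined above can be written out directly. The following Python sketch is not the radar's actual processing code; the array layout and names are assumptions:

```python
import numpy as np

def estimate_acf(samples: np.ndarray, max_lag: int) -> np.ndarray:
    """Estimate the complex ACF, R(kT) = (1/N) * sum_j V_j(t) V_j*(t + kT).

    `samples` is an (N, M) complex array whose element [j, k] holds the
    j-th sample sequence at delay t + k*T for a single range gate.
    """
    return np.array([
        np.mean(samples[:, 0] * np.conj(samples[:, k]))
        for k in range(max_lag)
    ])
```

For a pure tone with a random phase per sample sequence, this estimator returns a complex exponential in lag, the undamped limit of the damped sinusoid that characterizes a usable ACF.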

[7] For this study, we selected Goose Bay radar data recorded in 1986–1987. During this period, the radar used a seven-pulse pattern, resulting in 17 lag numbers (M = 17) [Baker et al., 1988]. The number of pulses in the pattern, and hence M, can vary with the mode of operation; for example, the radar recently experimented with an eight-pulse pattern with 22 lags. N typically ranged from 50 to 60 for a 5-min integration time per beam. Compared to the present SuperDARN radars, the Goose Bay radar during 1986–1987 had lower output power, resulting in somewhat noisier ACFs, which are particularly suitable for demonstrating the capability of the neural networks: if they work well with a lower power radar, they should do as well, if not better, with a higher power radar.

[8] Figure 1 shows typical ACFs in the data set. Each curve has two parts: the real part (solid line) and the imaginary part (dashed line), corresponding to the complex autocorrelation function that results from the complex electromagnetic signal. Ideally, the real and imaginary parts of a “usable” ACF behave approximately like exponentially damped cosine and sine functions, respectively. In reality, the ACFs can deviate from this ideal shape for several reasons, including (1) interference from the radar's own or other transmitters and (2) signal levels below the system noise level, e.g., due to incoherent scattering, ionospheric absorption, etc. [Hanuise et al., 1985]. A small amount of deviation or noise can usually be tolerated, and the “ideal” curve can be recovered through a least squares fit. However, too many bad data points or too much deviation renders an ACF “unusable” for further analysis. The criteria for selecting usable and unusable ACFs are the same as those used in the standard DARN/SuperDARN ACF analysis [Baker et al., 1988]. Figure 1 shows examples of usable ACFs in the left column and unusable ACFs in the right column. Identifying usable and unusable ACFs is a hard problem even when done manually by scientists trained in the field; designing a conventional computer algorithm to automate the procedure is harder still. The radar can produce up to 75 of these ACFs, each comprising 17–22 complex lag values, every 1–7 s, all year round. Thus a tremendous amount of work is required to weed out unusable ACFs in the course of radar data analysis, and this kind of task is particularly appropriate for neural networks.
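The ideal “usable” shape described above can be generated synthetically. A minimal sketch, in which the decay rate and Doppler frequency are illustrative parameters rather than fitted radar values:

```python
import numpy as np

def ideal_acf(n_lags: int, decay: float, omega: float) -> np.ndarray:
    """Ideal 'usable' ACF model: the real part is an exponentially damped
    cosine and the imaginary part an exponentially damped sine."""
    k = np.arange(n_lags)
    return np.exp(-decay * k) * np.exp(1j * omega * k)
```

A least squares fit of such a model to a measured ACF is what yields the Doppler velocity (from the phase progression) and spectral width (from the decay) discussed in section 2.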

Figure 1.

Typical ACFs. The four curves in the first column are “usable” ACFs; the four in the second column are “unusable” ACFs. Solid and dashed lines indicate real and imaginary parts of the ACFs, respectively.

3. Methodology

[9] The networks used in this paper are feedforward networks composed of an input layer, an intermediate or hidden layer, and an output layer. All units in any given layer are connected to all units in the layer above; there are no other connections (see Figure 2). Input units perform no computations but serve only to distribute the input to the computing units (neurons) in the hidden layer above. Units in the hidden layer have no direct connections to the outside world; after processing their inputs, they pass their results to the units of the output layer.

Figure 2.

Typical feedforward network composed of an input layer (level 0), a hidden layer (level 1), and an output layer (level 2). Flow is unidirectional, bottom to top.

[10] In addition to the feedforward mechanism described above, errors in the output are propagated backward from the output layer to the hidden layer and from the hidden layer to the input layer. During this learning phase the connection weights are modified so as to minimize the total error in the output of the network.

[11] The network learns to map a set $I = \{I^{(1)}, I^{(2)}, \ldots, I^{(n_p)}\}$ of $n_p$ input vectors to a set $T = \{T^{(1)}, T^{(2)}, \ldots, T^{(n_p)}\}$ of $n_p$ corresponding target vectors. During the learning phase the network builds its own internal representation of the mapping. Experience has shown that the representations learned by the networks have the important property of generalization, i.e., the networks are able to map input vectors not in the training set {I, T} to the appropriate output vectors.

3.1. Feedforward Mechanism

[12] We now give a more detailed exposition of the operation of the network. Let Ii(p) denote the i-th element of the p-th input vector; wji(1) the weight (strength) between the j-th node of the hidden layer (layer 1) and the i-th node of the input layer (layer 0). The corresponding weight for the connection between the hidden layer and the output layer (layer 2) will be denoted by wji(2).

[13] For a given input vector $I^{(p)} = (I_1^{(p)}, \ldots, I_{n_i}^{(p)})$ the output vector of the input units is given by the vector $O^{(p,0)} = I^{(p)}$. Here $n_i$ is the number of input units and the superscript 0 in $(p,0)$ serves to denote that this is output from level 0. The input $I_j^{(p,1)}$ to the j-th node of the hidden layer (level 1) is given by

$$I_j^{(p,1)} = \sum_{i=1}^{n_i} w_{ji}^{(1)}\, O_i^{(p,0)}. \tag{3}$$

The processing units in the networks used in this study compute their output by applying a sigmoidal, or logistic, function f to their input:

$$f(x) = \frac{1}{1 + e^{-x}}. \tag{4}$$

In addition, all hidden and output units have associated biases, which modulate the sensitivity of the unit to its input. Thus the output Oj(p,1) of the j-th hidden node is given by

$$O_j^{(p,1)} = f\!\left(I_j^{(p,1)} + b_j^{(1)}\right), \tag{5}$$

where bj(1) is the bias of the j-th hidden node. We note that the bias term can be interpreted as the weight of a connection between a virtual input node which is always “on” (and is usually called the true node) and the j-th unit. Thus the bias is just another learnable weight.

[14] Similarly, the input and output of the j-th unit of the output layer (level 2) are

$$I_j^{(p,2)} = \sum_{i=1}^{n_h} w_{ji}^{(2)}\, O_i^{(p,1)} \tag{6}$$

and

$$O_j^{(p,2)} = f\!\left(I_j^{(p,2)} + b_j^{(2)}\right), \tag{7}$$

where nh is the number of units in the hidden layer. This completes the forward pass for the input vector I(p).
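The forward pass of equations (3)–(7) is compact enough to sketch in a few lines of NumPy. This is a minimal illustration, not the original implementation; the layer sizes and variable names are ours:

```python
import numpy as np

def logistic(x):
    # Eq. (4): sigmoidal activation function.
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w1, b1, w2, b2):
    """Forward pass for one input vector.

    x: (n_i,) input; w1: (n_h, n_i) weights and b1: (n_h,) biases for
    the hidden layer; w2: (n_o, n_h) and b2: (n_o,) for the output layer.
    """
    o1 = logistic(w1 @ x + b1)    # hidden-layer output, eqs. (3)-(5)
    o2 = logistic(w2 @ o1 + b2)   # output-layer output, eqs. (6)-(7)
    return o1, o2
```

Note that the biases enter simply as additive terms before the activation, consistent with treating each bias as the weight of a connection from an always-on “true node.”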

3.2. Learning Mechanism

[15] We want to minimize the total error

$$E = \frac{1}{2} \sum_{p=1}^{n_p} \sum_{j=1}^{n_o} \left(T_j^{(p)} - O_j^{(p,2)}\right)^2, \tag{8}$$

where no is the number of output units.

[16] Using equations (6) and (7) in (8), we see that E is a function of the wji(2); differentiating E with respect to wji(2) gives

$$\frac{\partial E}{\partial w_{ji}^{(2)}} = -\sum_{p=1}^{n_p} \delta_j^{(p,2)}\, O_i^{(p,1)}, \tag{9}$$

where we have used the fact that f′(x) = f(x)(1 − f(x)) and we define δj(p,2) to be

$$\delta_j^{(p,2)} = \left(T_j^{(p)} - O_j^{(p,2)}\right) O_j^{(p,2)} \left(1 - O_j^{(p,2)}\right). \tag{10}$$

Since the gradient points in the direction of maximum increase, we wish to move in the opposite direction, so the weight change Δwji(2) is

$$\Delta w_{ji}^{(2)} = \eta \sum_{p=1}^{n_p} \delta_j^{(p,2)}\, O_i^{(p,1)}, \tag{11}$$

where η is a learning rate parameter. Thus the expression for the weight wji(2) at the (t + 1)−st presentation of the set of training vectors is given by

$$w_{ji}^{(2)}(t+1) = w_{ji}^{(2)}(t) + \Delta w_{ji}^{(2)}(t) + \alpha\, \Delta w_{ji}^{(2)}(t-1). \tag{12}$$

In equation (12), the last term, with momentum coefficient α, enhances the stability of the computations by suppressing high frequency fluctuations in the weights.

[17] The dependence of E on the wji(1) is more complicated but easily expressed using equations (3) and (5)–(8). From these we find that

$$\frac{\partial E}{\partial w_{ji}^{(1)}} = -\sum_{p=1}^{n_p} \delta_j^{(p,1)}\, O_i^{(p,0)}, \tag{13}$$

where

$$\delta_j^{(p,1)} = O_j^{(p,1)} \left(1 - O_j^{(p,1)}\right) \sum_{m=1}^{n_o} w_{mj}^{(2)}\, \delta_m^{(p,2)}. \tag{14}$$

Thus the equation for the new value of wji(1) is

$$w_{ji}^{(1)}(t+1) = w_{ji}^{(1)}(t) + \eta \sum_{p=1}^{n_p} \delta_j^{(p,1)}\, O_i^{(p,0)} + \alpha\, \Delta w_{ji}^{(1)}(t-1). \tag{15}$$
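The complete learning rule can be sketched as a single batch update. The following Python sketch implements the steepest-descent equations of this section; for brevity the momentum term is omitted, and the learning rate value is illustrative rather than taken from the paper:

```python
import numpy as np

def logistic(x):
    # Eq. (4): sigmoidal activation.
    return 1.0 / (1.0 + np.exp(-x))

def train_step(X, T, w1, b1, w2, b2, eta=0.05):
    """One steepest-descent pass over the whole training set.

    X: (n_p, n_i) inputs; T: (n_p, n_o) targets. The momentum terms of
    eqs. (12) and (15) are omitted here. Returns the updated parameters
    and the pre-update total error E of eq. (8).
    """
    o1 = logistic(X @ w1.T + b1)       # hidden outputs, eq. (5)
    o2 = logistic(o1 @ w2.T + b2)      # network outputs, eq. (7)
    E = 0.5 * np.sum((T - o2) ** 2)    # total error, eq. (8)
    d2 = (T - o2) * o2 * (1.0 - o2)    # output deltas, eq. (10)
    d1 = o1 * (1.0 - o1) * (d2 @ w2)   # hidden deltas, eq. (14)
    w2 = w2 + eta * d2.T @ o1          # eq. (11)
    b2 = b2 + eta * d2.sum(axis=0)     # bias as a learnable weight
    w1 = w1 + eta * d1.T @ X           # eq. (15), without momentum
    b1 = b1 + eta * d1.sum(axis=0)
    return w1, b1, w2, b2, E
```

Iterating `train_step` over many presentations of the training set drives E downhill, which is all that the learning phase requires.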

4. Implementation

[18] We selected 350 ACFs from the 1986–1987 Goose Bay radar database for this study. Each of these ACFs was manually identified as usable or unusable. Out of these 350 ACFs, we randomly selected 200 for the training data set and 150 for the testing data set. The training set consists of 101 usable and 99 unusable ACFs. The testing data set consists of 123 usable and 27 unusable ACFs (unusable ACFs were much less common in the data than usable ones).

[19] Each ACF contains 17 pairs of discrete points, as discussed in section 2, resulting in 34 points per ACF. It is these 34 values that serve as input to the networks. Each input was normalized to the range [−1, 1]. The number of hidden nodes was varied from 0 to 15. Since our networks are currently used to classify the inputs into only two classes (usable and unusable), only one output node was needed. This node outputs a 1 for a usable ACF and a 0 for an unusable ACF. In general, usable ACFs are indicated by well-defined damped sinusoidal curves. The unusable signals are characterized by a variety of deviations, perhaps reflecting the variety of sources of the disturbances, e.g., incoherent scattering, ionospheric absorption, transmitter interference, etc. Thus unusable ACFs are of a more diverse character than usable ones. This difference is evident in the networks' behavior, discussed in section 5.
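The text does not spell out the exact normalization. One plausible sketch, scaling each 34-point input vector by its largest absolute value so that all inputs fall in [−1, 1] (an assumption, not the paper's documented scheme):

```python
import numpy as np

def normalize_inputs(acf_points):
    """Scale the 34 real/imaginary ACF values into [-1, 1] by dividing
    by the largest absolute value (one plausible normalization; the
    exact scheme used in the paper is not specified)."""
    v = np.asarray(acf_points, dtype=float)
    peak = np.max(np.abs(v))
    return v / peak if peak > 0 else v
```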

5. Results and Discussion

[20] The networks were trained with the training data set described in section 4. Networks with 0, 3, 5, 8, 10, and 15 hidden nodes were used. We will refer to networks with no hidden nodes as perceptrons and those with hidden nodes as multilayer feedforward networks (MLFNs). It is well known that MLFNs can learn more complex mappings than can perceptrons [e.g., Rumelhart and McClelland, 1987]. We used the perceptrons to give us a basis for quantifying the additional power obtained by using hidden nodes in this problem. It should be noted that if the output node of the perceptron simply outputs its input (i.e., the activation function is the identity function), then the error to be minimized is

$$E = \frac{1}{2} \sum_{p=1}^{n_p} \left(T^{(p)} - \sum_{i=1}^{n_i} w_i I_i^{(p)} - b\right)^2, \tag{16}$$

which is identical to the quantity minimized when a linear regression is applied to the training set. The only difference is that the regression coefficients are now found by the iterative steepest descent method described in section 3 rather than by inverting a correlation matrix.

[21] Figure 3 shows learning curves on the training set for the two perceptrons and for a typical MLFN with five hidden nodes. The two perceptrons differ only in that one used a linear transformation (i.e., the identity function) for its activation function, while the other used the logistic transformation of equation (4); all MLFNs used the logistic transformation. The learning curves begin at values of 38–58% correct, all move above 80% correct after 25 presentations of the training set, and all have nearly reached their final values by 100 presentations. The lower curve represents the linear perceptron, which eventually converged to 92.5% correct. The middle curve represents the nonlinear perceptron, which eventually converged to 94.5% correct. The top curve represents an MLFN with five hidden nodes; it, like all the MLFNs used in this study, eventually converged to 99.5–100% correct. Thus, it is clear that the MLFNs are superior to the perceptrons in learning the classification task.

Figure 3.

Network learning curves for the linear perceptron, the nonlinear perceptron and a typical MLFN with five hidden nodes.

[22] The superiority of the MLFNs over the two perceptrons becomes more apparent when each is tested against data in the testing set. The linear perceptron correctly classified 90.67% of the testing set; the nonlinear perceptron, 92%. The MLFNs averaged greater than 96% correct, with a range from 94% to 98%. (Figure 4 shows the worst case; the best case; the average over 10 different starting networks for 3, 5, 8, 10, and 15 hidden node MLFNs; and a one standard deviation band around the average.)

Figure 4.

Percent correct classification of MLFNs on the testing set as a function of the number of hidden nodes. The middle curve is an average of results for ten MLFNs with different initial weights. The dashed lines on either side of the average are a one standard deviation band. The dotted curves indicate the best and worst performance of the ten networks for 3, 5, 8, 10, and 15 hidden nodes.

[23] Further analysis showed clear differences in sensitivity and specificity of the various network types. Sensitivity is a measure of accurately detecting a usable ACF when a usable ACF was in fact present. The sensitivity of the linear perceptron was 95.9% (it correctly classified 118 of 123 usable ACFs); that for the nonlinear perceptron was 98.4% (121 of 123); and that for the best MLFNs was 100%. Specificity is a measure of how well the networks correctly classify unusable ACFs. The specificity of the linear perceptron was only 66.7% (it correctly classified 18 of 27 unusable ACFs); that for the nonlinear perceptron was 63% (17 of 27); and that for the best MLFNs was 88.9% (24 of 27). The worst MLFN had a sensitivity of 100% and a specificity of 66.7%. Thus the worst MLFN did as well as the best perceptron.
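The sensitivity and specificity figures quoted above follow directly from the confusion counts. A small helper (ours, for illustration only):

```python
def sensitivity_specificity(predicted, actual):
    """Confusion-count measures for a binary usable (1) / unusable (0) labeling.

    Sensitivity: fraction of truly usable ACFs classified usable.
    Specificity: fraction of truly unusable ACFs classified unusable.
    """
    true_pos = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    true_neg = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    n_pos = sum(1 for a in actual if a == 1)
    n_neg = sum(1 for a in actual if a == 0)
    return true_pos / n_pos, true_neg / n_neg
```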

[24] The results are captured in the receiver operating characteristic (ROC) curves of Figures 5 and 6. Each curve shows the hit rate (sensitivity) as a function of the false alarm rate (1 − specificity). It is clear that the ROC curve of the MLFN is far closer to that of a perfect discriminator than is that of the perceptron. These conclusions are amplified in Figures 7 and 8, in which sensitivity, specificity, proportion of variance accounted for, and percent correct are shown as functions of the usable/unusable threshold value for both the linear perceptron and the best MLFN. It is particularly notable that for a threshold of 0.5, the MLFN accounted for 83.8% of the output variance while the perceptron accounted for only 49.1%.
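An ROC curve like those of Figures 5 and 6 is traced by sweeping the usable/unusable threshold over the network's continuous output. A sketch (variable names are ours):

```python
import numpy as np

def roc_points(outputs, labels, thresholds):
    """Hit rate (sensitivity) versus false alarm rate (1 - specificity),
    obtained by sweeping the decision threshold over the network's
    continuous output. `labels` are 1 (usable) / 0 (unusable)."""
    outputs = np.asarray(outputs)
    labels = np.asarray(labels)
    points = []
    for th in thresholds:
        pred = outputs >= th
        hit_rate = np.mean(pred[labels == 1])     # sensitivity
        false_alarm = np.mean(pred[labels == 0])  # 1 - specificity
        points.append((float(false_alarm), float(hit_rate)))
    return points
```

A perfect discriminator passes through the point (0, 1); the closer a network's curve comes to that corner, the better its usable/unusable separation.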

Figure 5.

False alarm rate versus sensitivity for a linear perceptron. False alarm rate is the probability of predicting “usable” when a usable ACF is not present. Sensitivity is the probability of predicting “usable” when a usable ACF is present.

Figure 6.

False alarm rate versus sensitivity for the best MLFN. False alarm rate is the probability of predicting “usable” when a usable ACF is not present. Sensitivity is the probability of predicting “usable” when a usable ACF is present.

Figure 7.

Sensitivity (squares), specificity (pluses), proportion of all cases correct (triangles), and proportion of variance in output accounted for by the network (diamonds), as a function of the threshold chosen for classification of predicted output for the linear perceptron.

Figure 8.

Sensitivity (squares), specificity (pluses), proportion of all cases correct (triangles), and proportion of variance in output accounted for by the network (diamonds), as a function of the threshold chosen for classification of the network's output, for the best multilayer feedforward network (MLFN).

6. Conclusions

[25] We have demonstrated that classification of radar backscattered signals is a task for which neural networks are very well suited. Furthermore, neural networks with hidden nodes substantially outperform those without hidden nodes. The improvement in performance extends to both sensitivity and specificity measures—MLFNs outperformed perceptrons, and perceptrons performed as well as a multiple linear regression analysis in their ability to discriminate between usable and unusable ACFs.

[26] The original goal of this research was to demonstrate that neural networks could operate at a level of performance high enough to be a real aid in the automation of the classification task. This goal was clearly met.

[27] It would be interesting to see whether more sophisticated neural networks with additional output nodes could perform more complicated classification schemes with the radar signals. For example, André et al. [2002] recently introduced an ACF classification scheme based on spectral width and its standard deviation, the latter being used to infer the presence of single- or multicomponent spectra. They showed that each class of ACF preferentially occurs at certain local times and that the physical processes in those regions can be inferred from the ACF properties. Their ACF class with multiple components would probably be classified as unusable in our scheme (which is based on the standard SuperDARN ACF analysis algorithm), but these ACFs turn out to be quite useful for identifying the presence of certain wave activities in the magnetosphere and/or ionosphere at those local times [André et al., 2002]. It would be interesting to see whether a neural network could reproduce their classification scheme. Similarly, it would be interesting to ascertain whether or not neural networks could identify other sources of unusable ACFs, such as incoherent scattering, absorption, interference from transmitters, etc. Finally, we note that the space physics community has recently been paying increasing attention to space weather. Neural networks are particularly suited to tasks of this kind, which require (1) a fair amount of automation and (2) a mapping from a set of inputs to a set of outputs when the mapping algorithm itself is too complicated to design and/or the exact relationships between inputs and outputs are not yet well understood.

Acknowledgments

[28] We would like to thank Kile Baker for many discussions and suggestions. This work was supported by NASA grant NAG5-10971 and NSF grants ATM-9819705 and ATM-0000256. SuperDARN is an NSF upper atmosphere facility supported by NSF grant ATM-9812078.
