#### 2.1. Probability Density Functions

[4] The fundamental difficulty with separation of signals of ionospheric scatter and ground scatter origin is that in terms of line-of-sight velocity and spectral width, there is no distinguishing feature of ionospheric scatter [*Baker et al.*, 1988]. That is, the distribution of *w* and *v* values produced by ionospheric scatter covers the entire range *v* > 0 and *w* > 0 up to the maximum values measurable and overlaps with the distribution of *w* and *v* values produced by ground scatter. Therefore, one can only speak of the probability *P* that a given echo is of ground scatter origin *G* given particular values of width *w* and velocity *v*. In Bayesian statistics, this probability is written as *P*(*G*∣*w*, *v*) and is referred to as the posterior probability of ground scatter conditional upon the values of *w* and *v*. This probability is distinguished from the probability of ground scatter regardless of the values of *w* and *v*, which is written as *P*(*G*) and is referred to as the prior probability of ground scatter. Similarly, *P*(*w*, *v*∣*G*) is the posterior probability of *w*, *v* conditional upon the echo being of ground scatter origin *G* and *P*(*w*, *v*) is the prior probability of *w*, *v* regardless of the origin of the echo. These four probabilities are related by Bayes' theorem [cf. *Schmidt*, 1969],

$$P(G \mid w, v) = \frac{P(w, v \mid G)\, P(G)}{P(w, v)}. \tag{1}$$

[5] To determine the relevant probabilities necessary to calculate *P*(*G*∣*w*, *v*), we acquire data from the Kapuskasing, Ontario, Canada HF Radar (49.39°N, 82.32°W geographic coordinates; 60.06°N, 9.22°W AACGM magnetic coordinates) and the Saskatoon, Saskatchewan, Canada HF Radar (52.16°N, 106.53°W geographic coordinates; 61.34°N, 45.26°W AACGM magnetic coordinates) over the following time intervals: 25 March 2001 0000–2400 UT, 27 March 2001 0000–2400 UT, 11 June 2001 0000–2400 UT, and 22 December 2001 0000–2400 UT. These intervals, 48 h under equinoctial conditions and 24 h each under summer and winter solstice conditions, were chosen to eliminate seasonal and local time biases. The specific dates were chosen following a visual inspection of the data to select intervals with a high occurrence rate of both ground and ionospheric backscatter. These intervals yield a total of 386,081 individual ACFs and XCFs.

[6] We first determine *P*(*w*, *v*) by calculating the joint histogram of the backscatter spectral parameters *w* and *v* in 10 m/s × 10 m/s bins and normalizing by the total number of ACFs to obtain the probability density function. We assume that the total prior joint probability distribution function *P*(*w*, *v*) is the sum of two component distribution functions, the joint distributions of *w* and *v* with ground scatter origin *G* and with ionospheric scatter origin *I*,

$$P(w, v) = P(w, v, G) + P(w, v, I). \tag{2}$$

This assumption ignores a third possibility, mixed scatter, in which scattering from both the ground and the ionosphere occurs at the same range along two different ray paths. However, unless the signals from the different paths are of comparable power, we expect the autocorrelation function to display the characteristics of the dominant signal. Therefore, the signal may be classified as being predominantly of ionospheric scatter or ground scatter origin, and we neglect the case of equal strength mixed scatter as being of low probability. Using the total probability theorem, equation (2) can be written as

$$P(w, v) = P(w, v \mid G)\, P(G) + P(w, v \mid I)\, P(I). \tag{3}$$

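Since the posterior probability density functions are normalized to 1, integrating the ground scatter component *P*(*w*, *v*, *G*) = *P*(*w*, *v*∣*G*)*P*(*G*) over all *w* and *v* yields *P*(*G*). We therefore obtain *P*(*G*) and *P*(*w*, *v*∣*G*) from

$$P(G) = \iint P(w, v, G)\, dw\, dv \tag{4}$$

and

$$P(w, v \mid G) = \frac{P(w, v, G)}{P(G)}. \tag{5}$$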

[7] Based on observations by *Breech et al.* [2003] that the probability density function of the electric field in near-Earth space obeys an exponential distribution, we assume a similar distribution of the ionospheric electric field and consequently of *v*. We allow the probability density function to vary freely with *w*. Therefore,

$$P(w, v \mid I) = A_{I}(w)\, e^{-b_{I}(w)\, v}, \tag{6}$$

where we determine the values of *A*_{I}(*w*) and *b*_{I}(*w*) by least squares fitting. A priori, we have no similar information on the functional dependence of *P*(*w*, *v*∣*G*) on *v*. Therefore, after least squares fitting of candidate functions to the actual distribution, we assess the merit of each candidate function on the basis of a chi-square goodness-of-fit test. Once *P*(*w*, *v*∣*G*), *P*(*G*), and *P*(*w*, *v*) are determined, we calculate the posterior probability *P*(*G*∣*w*, *v*) of ground scatter from equation (1). We then establish a criterion for identification of a particular signal as ground scatter based on *P*(*G*∣*w*, *v*) for that signal.
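The steps described in this section can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the function names, the `max_value` histogram limit, and the log-linear form of the least squares fit are all hypothetical choices, since the text specifies only the 10 m/s bin size, the normalization by the number of ACFs, and the use of least squares fitting.

```python
import numpy as np

def joint_pdf(w, v, bin_size=10.0, max_value=500.0):
    """Joint histogram of (w, v) normalized by the number of samples.

    max_value is a hypothetical upper limit; samples outside the
    histogram range are dropped, so the result sums to 1 only if
    every sample falls inside [0, max_value].
    """
    edges = np.arange(0.0, max_value + bin_size, bin_size)
    counts, _, _ = np.histogram2d(w, v, bins=[edges, edges])
    return counts / len(w), edges

def fit_exponential(v_centers, p_row):
    """Fit p = A * exp(-b * v) for one width bin.

    Assumption: ordinary least squares on log(p), using only
    non-empty bins. Other least squares variants are possible.
    """
    mask = p_row > 0
    slope, intercept = np.polyfit(v_centers[mask], np.log(p_row[mask]), 1)
    return np.exp(intercept), -slope

def posterior_ground(p_wv_given_g, p_wv_given_i, p_g):
    """Posterior P(G | w, v) from Bayes' theorem, with P(w, v)
    expanded by the total probability theorem and P(I) = 1 - P(G)."""
    num = np.asarray(p_wv_given_g, dtype=float) * p_g
    total = num + np.asarray(p_wv_given_i, dtype=float) * (1.0 - p_g)
    out = np.full(total.shape, np.nan)  # NaN where no data exist
    np.divide(num, total, out=out, where=total > 0)
    return out
```

Bins in which both component distributions vanish carry no information about the origin of the echo, so the sketch returns NaN there rather than an arbitrary probability.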

#### 2.2. Error Probabilities

[8] Once we determine a criterion for identification of a particular signal as ground scatter, we examine the probability of false positive errors (actual ionospheric scatter erroneously identified as ground scatter) and false negative errors (actual ground scatter erroneously identified as ionospheric scatter) in the identification of ground scatter in the SuperDARN dataset. The probability of a false positive error is given by

and the probability of a false negative error is given by

Both of these probabilities are a measure of the overlap between *P*(*w*, *v*∣*G*) and *P*(*w*, *v*∣*I*).

[9] We consider the *P*(*G*∣*w*, *v*) = 0.5 contour to be the optimal threshold for routine identification of a particular signal as ionospheric scatter or ground scatter. In the case of *P*(*G*) ≈ *P*(*I*), using the criterion *P*(*G*∣*w*, *v*) ≥ 0.5 to identify a signal as ground scatter minimizes the total error *P*(false positive) + *P*(false negative). Choosing a higher or lower threshold on *P*(*G*∣*w*, *v*) to identify a signal as ground scatter reduces or increases, respectively, the false positive rate while having the opposite effect on the false negative rate. In certain applications, however, such a trade-off may be desirable. Therefore, we also analyze the effect of different criteria on the false positive and false negative rates.
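The trade-off between the two error rates can be illustrated with a short Python sketch. It rests on an assumption that should be read as such: the false positive probability is taken to be the fraction of *P*(*w*, *v*∣*I*) lying in the region identified as ground scatter, and the false negative probability the fraction of *P*(*w*, *v*∣*G*) lying outside it. The function name and the binned inputs are hypothetical.

```python
import numpy as np

def error_probabilities(p_wv_given_g, p_wv_given_i, p_g, threshold=0.5):
    """False positive and false negative probabilities for a threshold
    on P(G | w, v).

    Assumption: both inputs are binned distributions that each sum
    to 1, and the errors are the portions of each distribution that
    fall on the wrong side of the threshold.
    """
    g = np.asarray(p_wv_given_g, dtype=float)
    i = np.asarray(p_wv_given_i, dtype=float)
    joint_g = g * p_g
    total = joint_g + i * (1.0 - p_g)
    # Empty bins get posterior 0; they contribute nothing to either error.
    posterior = np.divide(joint_g, total, out=np.zeros_like(total),
                          where=total > 0)
    is_ground = posterior >= threshold
    false_positive = i[is_ground].sum()   # ionospheric, called ground
    false_negative = g[~is_ground].sum()  # ground, called ionospheric
    return false_positive, false_negative
```

Raising `threshold` shrinks the region identified as ground scatter, which lowers the false positive value and raises the false negative value, in line with the behavior described above.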

#### 2.3. Sensitivity of Results to *P*(*G*)

[10] Experienced users of SuperDARN data are aware that the prior probability of ground scatter *P*(*G*) depends on environmental factors that vary with time, such as operating frequency and ionospheric density. Under the assumption that these factors affect only the prior probability of ground scatter and not the posterior distributions of *v* and *w* in the case of ground or ionospheric scatter, *P*(*w*, *v*∣*G*) or *P*(*w*, *v*∣*I*), equations (1) and (3) permit us to recalculate the criterion for identifying a signal as ground scatter for different hypothetical values of *P*(*G*) other than the value of *P*(*G*) actually observed in this study. In this way, we study the sensitivity of the results obtained to *P*(*G*).
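One way to see how the criterion shifts with *P*(*G*) is to rewrite it as a cutoff on the likelihood ratio. With *P*(*I*) = 1 − *P*(*G*), the condition *P*(*G*∣*w*, *v*) ≥ *t* is algebraically equivalent to *P*(*w*, *v*∣*G*)/*P*(*w*, *v*∣*I*) ≥ [*t*/(1 − *t*)] × [(1 − *P*(*G*))/*P*(*G*)]. The Python sketch below applies this for hypothetical priors; the function names are illustrative, and the two conditional distributions are assumed to be unchanged as *P*(*G*) varies, as stated above.

```python
import numpy as np

def likelihood_ratio_cutoff(p_g, threshold=0.5):
    """Value of P(w, v | G) / P(w, v | I) at which the posterior
    P(G | w, v) equals `threshold`, for prior P(G) = p_g."""
    return (threshold / (1.0 - threshold)) * ((1.0 - p_g) / p_g)

def ground_mask(p_wv_given_g, p_wv_given_i, p_g, threshold=0.5):
    """Boolean mask of bins identified as ground scatter.

    The inequality is cross-multiplied so that bins with
    P(w, v | I) = 0 do not cause a division by zero. Bins with
    P(w, v | G) = 0 are never identified as ground scatter.
    """
    g = np.asarray(p_wv_given_g, dtype=float)
    i = np.asarray(p_wv_given_i, dtype=float)
    lhs = g * p_g * (1.0 - threshold)
    rhs = i * (1.0 - p_g) * threshold
    return (lhs >= rhs) & (g > 0)

# Hypothetical three-bin example showing the criterion moving with P(G).
g = np.array([0.6, 0.3, 0.1])
i = np.array([0.1, 0.2, 0.7])
masks = {p_g: ground_mask(g, i, p_g) for p_g in (0.2, 0.5, 0.9)}
```

In this toy example, a small prior requires a large likelihood ratio before a bin is identified as ground scatter, while a large prior admits bins in which the ionospheric distribution dominates.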

#### 2.4. Validation Using Backscatter Virtual Height

[11] Finally, we use the additional information present in the cross-correlation functions to calculate the backscatter virtual height to provide a physical validation of the posterior probability of ground scatter *P*(*G*∣*w*, *v*) produced by this method. Backscatter virtual height is the apparent height at which backscatter appears to occur assuming a straight-line propagation path from the scatterer to the receiver. For given range *r* and elevation angle ɛ of arrival of the ray at the receiver, the virtual height is *h* = *r* sin ɛ. *Yeoman et al.* [2008a, 2008b] observed that for -hop *F* region ionospheric scatter the most likely virtual height *h** as a function of range *r* is given by

This model does not apply to ground scatter. As seen in Figure 1, the virtual height of ground scatter at range *r* will be greater than the virtual height of ionospheric scatter at the same range. Therefore, if *P*(*G*∣*w*, *v*) is a valid indicator of the probability of ground scatter, then we expect that the average relative virtual height anomaly (*h* − *h**)/*h** will increase with increasing probability of ground scatter *P*(*G*∣*w*, *v*). Note, however, that the efficacy of this validation will be limited by the aliasing of elevation angles above a certain critical angle that is dependent upon the radar operating frequency and the separation of the main and interferometer antenna arrays [*Baker and Greenwald*, 1988]. At an operating frequency of 10 MHz, which was the dominant operating frequency during the data intervals chosen for study, the critical angle is 45° for both the Saskatoon and Kapuskasing radars. This effect appears as a random error with a systematic negative bias in the determination of the virtual height. In the next section we present the results of these analyses.
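The validation quantities in this subsection can be computed with a few lines of Python. The sketch below is illustrative only: the function names and the number of probability bins are hypothetical, and the model virtual height *h** must be supplied by the caller, since the model itself is not reproduced in the code.

```python
import numpy as np

def virtual_height(r, elevation_deg):
    """Virtual height h = r sin(elevation) for a straight-line path.
    The result has the same units as r."""
    return r * np.sin(np.radians(elevation_deg))

def relative_anomaly(h, h_model):
    """Relative virtual height anomaly (h - h*) / h*."""
    return (h - h_model) / h_model

def mean_anomaly_by_probability(p_ground, anomaly, n_bins=10):
    """Average anomaly in equal-width bins of P(G | w, v) on [0, 1].

    n_bins is a hypothetical choice. Bins without samples are NaN.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # digitize is 1-based; clip so that p = 1.0 falls in the last bin.
    idx = np.clip(np.digitize(p_ground, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=anomaly, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return np.divide(sums, counts, out=np.full(n_bins, np.nan),
                     where=counts > 0)
```

If the posterior probability is a valid indicator, the array returned by `mean_anomaly_by_probability` would be expected to increase from the low-probability bins to the high-probability bins, subject to the elevation angle aliasing noted above.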