Analysis of wideband forward looking synthetic aperture radar for sensing land mines



[1] Signal processing algorithms are considered for the analysis of wideband, forward looking synthetic aperture radar data, with the goal of sensing metal and plastic land mines, with principal application to unpaved roads. Simple prescreening algorithms reduce the search space required for a subsequent classifier. The classifier employs features based on viewing the target at multiple ranges, with classification implemented via a support vector machine (SVM) and a relevance vector machine (RVM). Concerning classifier training, we consider cases for which training is performed on both mine and nonmine (clutter) data. In addition, motivated by the fact that the clutter statistics may vary significantly between the training and testing data, we also consider an RVM implementation in which we train only on mine data.

1. Introduction

[2] There are many scenarios for which one must contend with sensing land mines. After or during a conflict, one may deploy airborne sensors to determine where possible mine fields reside, with these interrogated subsequently via ground-based sensors. While airborne mine detection (via radar [Carin et al., 1999] or infrared [Larive et al., 1999] sensors) is very challenging, the fact that one is often only interested in sensing a mine field, rather than each individual mine, implies that one need not detect every target to detect the mine field (i.e., the detection probability for individual mines may be low, and one may still detect high-density mine fields).

[3] Once on the ground, there are many options for mine detection via a handheld sensor, including radar [Bourgeois and Smith, 1996; Montoya and Smith, 1999], electromagnetic induction [Geng et al., 1999; Carin et al., 2001], vapor sensors [Cumming et al., 2001], quadrupole resonance [Garroway et al., 2001] and acoustic sensors [Sabatier and Xiang, 2001; Scott et al., 2001]. For ground-based sensing, unlike airborne sensing of mine fields, one typically requires a high probability of detection on individual mines. However, one advantage often available is that the sensor is close to the mine (i.e., there is a small “standoff”) and one can move slowly. By contrast, the airborne sensor operates at a large standoff and must quickly interrogate large areas of terrain to define possible mine fields.

[4] There are problems in land mine sensing for which one must move quickly, with a very high detection probability and a low false alarm rate (FAR) on individual mines, while also requiring significant standoff. The example of interest to this paper is vehicle-mounted sensing of land mines situated in an unpaved road [Kositsky and Milanfar, 1999]. One requires that the sensor provide standoff capability such that the vehicle has time to slow down and stop when a mine is detected. In addition, any missed mine has the potential of inflicting significant harm, necessitating a very high detection probability on individual mines. The requirements of speed also imply a low-FAR requirement (such that one need not stop the vehicle for many false alarms).

[5] Radar represents one of the few sensors that provides both standoff and the potential of sensing buried land mines. The sensor bandwidth determines the down-range resolution [Soumekh, 1994], and the sensor aperture defines the cross-range resolution [Soumekh, 1994]. Since it is often difficult to deploy a large real aperture, we here consider a synthetic aperture, constituting forward looking synthetic aperture radar (SAR). In particular, we consider data measured by SRI International (Menlo Park, California) using a forward looking SAR mounted to the top of a van. The sensor has bandwidth covering 0.3–3 GHz, the total synthetic aperture is 4 m, and VV and HH scattering data are measured. A detailed discussion of the sensor may be found in the work of Kositsky and Milanfar [1999]. Because of the time required to collect data via this prototype system, SAR data are available for three sensor ranges with respect to a given target, corresponding to ranges of 10, 15, and 20 m (rather than for a continuous set of ranges as the vehicle moves down the road). Since the sensor is at a fixed height on the vehicle, these ranges correspond to sensing at different target sensor angles, providing different information that may be fused for target classification. It has been demonstrated computationally [Vitebskiy and Carin, 1996; Geng and Carin, 1999] that there is often significant information available from viewing the target at different angular orientations.

[6] Using the prototype SRI sensor, we have three SAR images for each portion of the roadway, corresponding to the aforementioned three ranges. In addition, at each range, both VV and HH SAR imagery is collected, thereby allowing classification based on a total of six images. Since in practice the vehicle is moving, it is desirable to implement simple “prescreeners” to define portions of the road that appear to be clearly mine-free, with the other regions investigated subsequently via a more sophisticated classifier. We here consider a sequence of prescreeners, of increasing complexity, to sequentially prune the road area. The prescreeners are similar to those considered previously in the context of airborne SAR sensing of unexploded ordnance (UXO) [Sullivan et al., 2000; Dong et al., 2001].

[7] For those regions that pass all the prescreeners, and therefore require further consideration, we consider several classifier implementations. In particular, the support vector machine (SVM) [Burges, 1998; Cristianini and Shawe-Taylor, 2000; Scholkopf and Smola, 2002] represents a recent advance in the area of classifier design. The SVM defines a hyperplane in feature space, by which one partitions feature space into regions characteristic of each of the binary decisions (for the problem of interest here, the binary case of mine and no-mine). It is well known that a hyperplane classifier is often overly limiting, in that the optimal decision surface is often more complicated than a hyperplane. As discussed further below, the SVM introduces a “kernel,” which implies that the SVM hyperplane is implemented in a new generally high-dimensional space, while in the original feature space, nonhyperplane decision surfaces are manifested.

[8] As indicated above, the SVM is based on binary classification, although authors have considered extension to the M-ary problem (M classes) [Scholkopf and Smola, 2002]. There are situations for which one wishes to define the classifier on the basis of training data from a single class (e.g., using only mine data for training). During the testing phase, the classifier asks whether given testing data is similar to the data used for training, and if not, it is deemed “novel” (i.e., not characteristic of a mine in the problem considered here). Novelty detection has been considered in the context of SVMs, but the solutions are often ad hoc [Smola et al., 1999; Scholkopf et al., 1999]. We therefore also consider the newly developed relevance vector machine (RVM) [Tipping, 2001], which constitutes a Bayesian generalization of the SVM. As demonstrated below, the RVM allows a general and powerful novelty detector design. This is motivated by the fact that clutter statistics often vary significantly from site to site, and therefore classifier performance may not be robust if one trains using clutter data (in addition to mine data). We also note that the RVM is more general than the SVM in that it allows consideration of arbitrary kernels, while the SVM is only rigorously applicable to “Mercer” kernels [Burges, 1998; Cristianini and Shawe-Taylor, 2000; Scholkopf and Smola, 2002].

[9] Considering feature extraction, for the SVM and RVM classifiers, we employ simple mine signature templates with which the SAR imagery is correlated. We discuss several designs for these filters, as well as means by which all six SAR images of a given roadway may be fused to yield a cumulative feature vector.

[10] The remainder of the paper is organized as follows. In section 2 we present example SAR imagery measured by the SRI system for several land mines to give a better sense of the problem under study. Prescreeners, feature extraction, the SVM, and the RVM classifiers are discussed in section 3. In section 4, example results are presented as a function of the mine type (metal and plastic) and for different burial depths, considering two soil conditions. A summary is provided in section 5, wherein conclusions and suggestions for future research are provided.

2. Properties of the SAR Data

[11] SRI International collected SAR imagery for metal and plastic antitank mines on the surface and buried at several depths at two locations in the United States. One site was arid and represented a desert scenario (sandy soil), with the other site more temperate and characterized by clay-like soil.

[12] In Figures 1a and 1b we present example SAR imagery for VV and HH polarization, respectively, considering a fixed sensor range position. These data correspond to the arid site, and most mines in this example are metal-cased antitank mines (approximately cylindrical, with 16 cm radius and 15 cm height). Note that each fixed position of the sensor van yields an image extending almost 30 m down the road. Therefore the position of the sensor with respect to a given mine is dependent upon where the mine exists in range on the road. The closest a given mine comes to the sensor is ∼10 m, and since the vehicle moves in discrete 5 m range increments, each mine is observed at distances of approximately 10, 15, and 20 m (with the exact distances dependent on exactly where the mine is in range).

Figure 1.

Example SAR imagery collected at the arid site (magnitude). The top figure is HH imagery, and the bottom is VV. The blue background corresponds to weak signal strength, with a strong signal in red. The vertical axes correspond to cross-range (parallel to the linear synthetic aperture), and the horizontal axes represent down-range. The circles denote mine locations, where “M-S” identifies a metal mine on the surface and “P-S” denotes a plastic mine on the surface. The white points and asterisks locate the positions of fiducials when present (note the two fiducials at an approximate range of 9 m).

[13] From Figure 1 we also note that fiducial targets were placed on the road to aid alignment of the data with ground truth. The plastic and metal mines were placed along a diagonal row (see Figure 1), on the surface, flush buried, 5 cm deep, 10 cm deep, and 15 cm deep. In Figure 1, data are shown for surface metal and plastic mines. These examples are presented because the mine signatures are relatively easy to observe in the raw imagery, while for buried targets the signatures are generally weaker, particularly so for plastic mines when there is low mine-soil electrical contrast.

3. Classifier Design

3.1. Prescreeners

[14] Multistage prescreening is undertaken to identify possible mine locations, and regions that pass the prescreeners are considered subsequently by more sophisticated classifiers. The following prescreeners were used sequentially to define a region of interest.

3.1.1. Amplitude Threshold

[15] The amplitude of the SAR image is subjected to a threshold, and all points exceeding the threshold are marked. A clustering algorithm [Maulik and Bandyopadhyay, 2002] is used to cluster the positions in the image that exceed the amplitude threshold. If the size of a given cluster is large enough spatially, it is declared a possible mine. The choice of the amplitude threshold and the threshold on the size of the cluster are determined on the basis of training data (discussed further in section 4). For the results presented here the radius used to define the cluster threshold is 20 cm, commensurate with the nominal mine size.
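As a concrete sketch of this stage, the following Python fragment thresholds an amplitude image and groups the marked pixels into clusters. The threshold values and the simple 4-connected component labeling are illustrative stand-ins: the paper uses the clustering algorithm of Maulik and Bandyopadhyay [2002], with thresholds set from training data.

```python
from collections import deque

def amplitude_prescreener(image, amp_thresh, min_pixels):
    """Flag clusters of above-threshold pixels that are spatially
    large enough to be consistent with a mine. `image` is a 2-D list
    of SAR amplitudes; `amp_thresh` and `min_pixels` are hypothetical
    thresholds (set from training data in the paper)."""
    rows, cols = len(image), len(image[0])
    hot = [[image[r][c] > amp_thresh for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if hot[r][c] and not seen[r][c]:
                # Breadth-first search over 4-connected neighbors
                comp, q = [], deque([(r, c)])
                seen[r][c] = True
                while q:
                    y, x = q.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and hot[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if len(comp) >= min_pixels:  # spatial-extent test
                    clusters.append(comp)
    return clusters
```

In practice `min_pixels` would be derived from the 20 cm cluster radius and the image pixel spacing.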

3.1.2. Standard Deviation

[16] The mean amplitude is computed for each cluster that passes the above test; the associated amplitude standard deviation is also computed. A given cluster is considered for further interrogation if the associated standard deviation is less than a prescribed threshold. This prescreener is useful for elimination of randomly spread out clutter, which might pass through the amplitude prescreener. For the data considered, it has been found that this prescreener contributes significantly in reducing the false alarms.
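A minimal sketch of this stage (the standard deviation threshold is a hypothetical value, determined from training data in the paper):

```python
import math

def std_dev_prescreener(amplitudes, std_thresh):
    """Pass a cluster for further interrogation only if the spread of
    its pixel amplitudes is small; diffuse clutter tends to show a
    large spread. `amplitudes` is the list of pixel amplitudes within
    one cluster."""
    n = len(amplitudes)
    mean = sum(amplitudes) / n
    std = math.sqrt(sum((a - mean) ** 2 for a in amplitudes) / n)
    return std < std_thresh
```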

3.1.3. Template Based on Spatial Extent of Energy in the Image

[17] The final prescreener considers the energy contained in the two regions defined by the template in Figure 2. Let Eout represent the energy outside the elliptical template, with Ein denoting the energy inside. If the ratio Eout/Ein is less than a prescribed threshold, the region is then sent for consideration by a more sophisticated classifier. This prescreener is motivated by the fact that the mine responses often have an ellipse-like energy spread. Example signatures are presented in Figure 3, with the vertical direction in Figures 2 and 3 parallel to the SAR aperture. The resolution in the vertical (cross-range) direction is dictated primarily by the size of the synthetic aperture, and the resolution in the horizontal (down-range) direction is dictated primarily by the sensor bandwidth. The system operates from ∼300–3000 MHz, and therefore it is characterized by relatively good down-range resolution, and the relatively small aperture size yields relatively poor cross-range resolution (sensor details are discussed in the work of Kositsky and Milanfar [1999]). This motivates the ellipse shape in Figure 2, which accounts for differences in down-range/cross-range resolution. The dimensions of the ellipse were chosen to match those of the mine signatures (on the basis of observing data like that in Figure 1; this is discussed further in section 4).
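The energy-ratio test can be sketched as follows; the ellipse semi-axes and the ratio threshold are hypothetical parameters (the paper sizes the ellipse from observed mine signatures, as discussed in section 4):

```python
def energy_ratio_prescreener(image, center, a_down, b_cross, ratio_thresh):
    """Compare energy outside vs. inside an elliptical template
    centered on a candidate region. `a_down` is the down-range
    (column) semi-axis, `b_cross` the cross-range (row) semi-axis.
    Returns True when E_out / E_in falls below the threshold."""
    cy, cx = center
    e_in = e_out = 0.0
    for r, row in enumerate(image):
        for c, v in enumerate(row):
            inside = ((r - cy) / b_cross) ** 2 + ((c - cx) / a_down) ** 2 <= 1.0
            if inside:
                e_in += v * v
            else:
                e_out += v * v
    return e_out / e_in < ratio_thresh if e_in > 0 else False
```

Energy concentrated inside the ellipse yields a small ratio and a pass; energy spread outside it yields a large ratio and a rejection.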

Figure 2.

Template applied in one of the prescreeners, used to eliminate clutter that occupies a spatial extent in the SAR imagery too large or too small to be consistent with a mine. The sizes of the template components are dictated by the expected extent of a mine in the imagery.

Figure 3.

Average land mine signatures in the SAR image, for VV and HH polarization, for nominal ranges from the sensor of 10, 15, and 20 m.

[18] As indicated in section 2, six images are available for each region on the road, corresponding to VV and HH imagery at three distinct ranges. When the classification is based on both VV and HH SAR data, prescreening is performed for each of these six images, and a given region is passed on for further processing if it passes the prescreening test for all images. This is referred to as “and” fusion since all tests must be passed. There are many other options one may consider for fusing these multiple images in the prescreening phase [Duda et al., 2001].
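The multistage pruning and the “and” fusion rule can be sketched generically; the stage predicates below are placeholders for the three prescreeners described above:

```python
def passes_all_stages(region_chip, stages):
    """Apply prescreener stages in order, short-circuiting on the
    first failure (the multistage pruning described in the text).
    `stages` is a list of predicates on the region chip."""
    return all(stage(region_chip) for stage in stages)

def and_fusion(regions_by_image, stages):
    """'and' fusion: a road region survives only if it passes every
    stage in all six images (VV/HH at ~10, 15, 20 m).
    `regions_by_image` maps a (polarization, range) key to that
    image's chip of the region."""
    return all(passes_all_stages(chip, stages)
               for chip in regions_by_image.values())
```

Other fusion rules (e.g., majority voting) would replace only the outer `all` and are among the alternatives noted in Duda et al. [2001].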

3.2. Feature Extraction

[19] In Figure 3 we plot the average land mine signatures in the SAR image, for VV and HH polarization, for nominal ranges from the sensor of 10, 15, and 20 m (within the approximations of the range positions discussed in section 2). These images are averaged across all mine types and all depths (these data are for the arid site, although the templates are very close to those found for the temperate site as well). We utilize these three average images as templates employed in feature extraction. Let TVV(d) represent the average VV template (image) when the sensor is approximately at a distance d from the mine, where d = 10, 15, or 20 m. Analogous templates are defined for THH(d). The templates are normalized for unit energy (the inner product of each template with itself is 1).

[20] It is of interest to note in Figure 3 that the down-range (horizontal) characteristics of the signature appear to be represented by two strong responses, particularly for the VV polarization. These correspond to scattering from the leading and trailing edges of the mine relative to the SAR aperture. This reflects that the system has good range resolution for the relatively large antitank mines considered. By contrast, if the target size is small relative to resolution afforded by the system bandwidth (e.g., a small rock or antipersonnel mine), then the signature's down-range characteristics are typically represented by a single strong signature. This down-range resolution relative to antitank mines plays a key role in classification performance. If one considers a wider bandwidth system operating at higher frequencies, better resolution is possible, but then signatures from small clutter sources (roughness) may present more confusion to the classifier. Moreover, soil attenuation typically increases with increasing frequency.

[21] Assume a given region R on the road passes the prescreening tests. Region R has an associated set of six SAR images. For example, let IVV(R, d) represent the VV SAR image for region R, corresponding to the sensor being approximately distance d from R. Again, we have images for d approximately equal to 10, 15, and 20 m. The images IVV(R, d) and IHH(R, d) are also normalized.

[22] The feature f^I_{VV,d} associated with VV imagery at an approximate range of d is defined as the maximum correlation between IVV(R, d) and TVV(d), where the maximum is taken across all possible relative (two-dimensional) shifts of these two images. The different shifts are implemented efficiently via a fast Fourier transform (FFT). In this manner we define a six-dimensional feature vector for each region R that passes the prescreeners; this vector is composed of f^I_{VV,d} and f^I_{HH,d} for d = 10, 15, and 20 m.
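This feature computation can be sketched in pure Python. The paper implements the shifts via an FFT for efficiency; a direct sum over circular shifts, equivalent for that case, is shown here for clarity:

```python
import math

def unit_energy(img):
    """Normalize a 2-D image (list of lists) to unit energy, as done
    for both the templates and the region chips."""
    e = math.sqrt(sum(v * v for row in img for v in row))
    return [[v / e for v in row] for row in img]

def max_shift_correlation(image, template):
    """Maximum correlation between a region chip and a template over
    all 2-D circular shifts (the feature defined in the text). Direct
    O(N^2) evaluation; an FFT gives the same result faster."""
    image, template = unit_energy(image), unit_energy(template)
    rows, cols = len(image), len(image[0])
    best = float("-inf")
    for sy in range(rows):
        for sx in range(cols):
            corr = sum(image[r][c] * template[(r + sy) % rows][(c + sx) % cols]
                       for r in range(rows) for c in range(cols))
            best = max(best, corr)
    return best
```

Because both inputs have unit energy, the feature is bounded above by 1, with equality when the chip is a shifted copy of the template.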

[23] Several comments are in order before proceeding. Note that for a given polarization (VV or HH) and a particular d, we have a single template TVV(d) or THH(d) for both the plastic and metal mines and for all of the mine depths (surface, flush buried, 5, 10, and 15 cm). Clearly, plastic and metal mines often have very different signatures, and in addition the signatures are typically a function of the mine depth [Vitebskiy and Carin, 1996; Geng and Carin, 1999]. The simplified template design was motivated by the fact that there were a limited number of metal and plastic mines considered in the test, and the number of examples for each depth was also relatively small. Therefore it is anticipated that classifier performance may be improved when a larger data set is available for algorithm training and testing, thereby allowing design of a more specific set of target templates.

[24] For a given range d and polarization VV or HH, the templates were designed by averaging the SAR images across mine type (plastic/metal) and target depth. Rather than averaging, there are other options one may consider. For example, one may employ a principal components analysis (PCA) [Belhumeur et al., 1997] to design an orthonormal set of templates for each d and polarization. We considered this and found that the data were typically characterized by a single principal component (corresponding to a single large eigenvalue in a singular value decomposition [Belhumeur et al., 1997] of the imagery), and this principal eigenvector was very similar to the average SAR templates considered here.
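The leading principal component referred to above can be computed, for example, by power iteration on flattened, mean-removed signature vectors; the sketch below is illustrative only and is not the specific PCA implementation used in the cited work:

```python
import math

def principal_component(vectors, iters=100):
    """Power iteration for the leading principal component of a set
    of flattened signature vectors. Applies covariance-times-vector
    products without forming the covariance matrix explicitly."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    centered = [[v[i] - mean[i] for i in range(dim)] for v in vectors]

    def cov_mul(x):
        out = [0.0] * dim
        for v in centered:
            dot = sum(v[i] * x[i] for i in range(dim))
            for i in range(dim):
                out[i] += dot * v[i]
        return out

    x = [1.0] * dim
    for _ in range(iters):
        x = cov_mul(x)
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]
    return x
```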

3.3. Support Vector Machine Classifier

[25] Let Rn represent the nth region that passes the prescreeners, for which there is an associated six-dimensional feature vector fn, as defined in section 3.2. The goal is to distinguish those fn associated with mines from those associated with clutter (the latter, for example, is representative of surface roughness, subsurface wet regions, rocks, etc.). Assume we have a set of N training examples {fn, ln}n=1,N, where ln represents the known label of fn, where ln = +1 corresponds to a mine and ln = −1 corresponds to clutter. On the basis of this training data, the goal is to design a classifier c(f) that maps a given feature vector f into the proper label l, i.e., c(f) → ±1. We here effect c(f) via the support vector machine (SVM) and relevance vector machine (RVM). There are numerous tutorials available on the SVM as well as books [Burges, 1998; Cristianini and Shawe-Taylor, 2000; Scholkopf and Smola, 2002; Smola et al., 1999; Scholkopf et al., 1999], and therefore the discussion here seeks to deliver the basic ideas and assumptions of this algorithm. Concerning the RVM, this is a newer algorithm, with a complete discussion found in the work of Tipping [2001].

[26] Both the SVM and RVM define the classifier in the form

$$c(f) = \sum_{n=1}^{N} w_n K(f, f_n) + w_0 \qquad (1)$$

where K(f, fn) is termed a “kernel,” quantifying the similarity between feature vector f and the nth training feature vector fn. The SVM construction places restrictions on the kernel form. In particular, for the SVM K(f, fn) must be a “Mercer” kernel [Burges, 1998], which means that it must be expressible in the form

$$K(f, f_n) = \langle \phi(f), \phi(f_n) \rangle \qquad (2)$$

where 〈ϕ(f), ϕ(fn)〉 is an inner product between the vectors ϕ(f) and ϕ(fn). The function ϕ(f) corresponds to a general mapping of f to an arbitrary vector space, even possibly of infinite dimension (although f is clearly of finite dimension). The choice of the vector mapping ϕ(f) is arbitrary, and therefore there are an infinite set of possible Mercer kernels, with popular choices discussed in the work of Burges [1998] and Cristianini and Shawe-Taylor [2000]. It is important to note from equation (1) that the classifier does not explicitly employ ϕ(f) or ϕ(fn) but rather the final inner product between these two. Therefore, although ϕ(f) may constitute a large or even infinite-dimensional vector, the classifier only requires the scalar K(f, fn).
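As a small numerical check of the Mercer property (using a standard textbook kernel, not one specific to this paper): for 2-D inputs the degree-2 homogeneous polynomial kernel equals an ordinary inner product after an explicit map ϕ, even though the classifier never forms ϕ(f) itself.

```python
import math

def poly2_kernel(x, y):
    """Degree-2 homogeneous polynomial kernel, a standard Mercer
    kernel: K(x, y) = (x . y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    """Explicit map to the 3-D space in which this kernel is an
    ordinary inner product: phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))
```

For a Gaussian (radial basis function) kernel, the corresponding ϕ maps to an infinite-dimensional space, yet the classifier still needs only the scalar K(f, f_n).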

[27] The SVM designs a hyperplane classifier in the space ϕ(f), with this yielding general decision surfaces (typically not hyperplanes) in the original feature space of f. Specifically, the SVM finds, in the space ϕ(f), the hyperplane that maximizes the “margin” between the training data associated with the +1 and −1 labels (see Figure 4). Let D+1 represent the minimum distance in ϕ(f) space between the hyperplane H and the training feature vector from the +1 class that is closest to H. The distance D−1 is similarly defined for the −1 class, and the margin is defined as D+1 + D−1. Design of H becomes an optimization problem with the goal of maximizing the margin, solvable via quadratic programming [Cristianini and Shawe-Taylor, 2000]. If it is not possible to draw a hyperplane in the space ϕ(f) that perfectly separates the +1 and −1 training data, then errors occur during training. The SVM training then becomes a constrained optimization problem, solved via Lagrangian methods, wherein the goal is to define the H in the space ϕ(f) that maximizes the margin while simultaneously minimizing the error realized on the training set. One must set a parameter to control the relative strengths of these two goals while training the algorithm.

Figure 4.

Margin of a hyperplane (here in two dimensions) for separation of two classes of data. This example assumes that the data are perfectly separable by a hyperplane, although this is not required of the SVM.
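For a fixed hyperplane, the margin quantity that the SVM maximizes can be computed directly; this sketch evaluates the geometry only and is not the SVM optimization itself:

```python
import math

def margin(w, b, pos, neg):
    """Geometric margin of the hyperplane w.x + b = 0 with respect to
    two point sets. Returns (D_plus, D_minus, margin), where D_plus
    and D_minus are the minimum point-to-hyperplane distances for the
    +1 and -1 classes, and margin = D_plus + D_minus."""
    norm = math.sqrt(sum(wi * wi for wi in w))

    def dist(x):
        return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

    d_pos = min(dist(x) for x in pos)
    d_neg = min(dist(x) for x in neg)
    return d_pos, d_neg, d_pos + d_neg
```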

[28] An important feature of the final SVM, after training, is that most of the weights wn in equation (1) are zero. The SVM is therefore termed a “sparse” classifier since most of the training examples {fn, ln}n=1,N are not utilized in the final classifier c(f). As indicated by the above discussion, the training examples that matter most to the design of the hyperplane H are those that are nearest to H (along the “support” of H, thus the name support vector machine). These typically represent a small percentage of the total training set. This sparseness property is very important in the subsequent classifier performance on testing data since the sparse representation mitigates “overfitting” of c(f) to the training data [Burges, 1998; Cristianini and Shawe-Taylor, 2000; Scholkopf and Smola, 2002].

3.4. Relevance Vector Machine Binary Classifier

[29] The RVM employs the same basic model as in equation (1), but now there are no restrictions with regard to the kernel K(f, fn). In particular, K(f, fn) is viewed as a general basis function, with functional dependence for a given testing vector f related to training example fn in an arbitrary manner (no restriction to Mercer kernels). In addition, the RVM employs a Bayesian statistical approach rather than optimization of a hyperplane. In particular, the probability that feature vector f is associated with label l is defined by the logistic link function [Tipping, 2001]

$$p(l = +1 \mid f, T, w) = \sigma\!\left(\sum_{n=1}^{N} w_n K(f, f_n) + w_0\right), \qquad \sigma(x) = \frac{1}{1 + e^{-x}} \qquad (3)$$

where T represents the training data {fn, ln}n=1,N, and the vector w represents the N + 1 weights in equation (1). We also have p(l = −1∣f, T, w) = 1 − p(l = + 1∣f, T, w).
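The logistic-link computation can be sketched as follows, with placeholder weights; in the RVM the weights come from the Bayesian training described below, and the radial basis function kernel shown is one of the kernels used in the paper's experiments:

```python
import math

def rbf(x, y, gamma=1.0):
    """Radial basis function kernel (gamma is a hypothetical width
    parameter)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def rvm_posterior(f, weighted_examples, w0, kernel):
    """Map the kernel expansion of equation (1) through the logistic
    link to obtain p(l = +1 | f). `weighted_examples` is a list of
    (w_n, f_n) pairs; only the few with nonzero weight need appear,
    reflecting the sparseness of the trained RVM."""
    s = w0 + sum(w * kernel(f, fn) for w, fn in weighted_examples)
    return 1.0 / (1.0 + math.exp(-s))
```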

[30] The RVM treats the weights as a random process, with statistics dependent on the properties of the training data T. In particular, the weights are characterized by p(w∣T, α), where the (N + 1)-dimensional vector α (one component of α associated with each component of w) defines the parameters associated with the density function for w. The classifier is “regularized” by introducing a prior density function for the parameters α, with which one may impose desirable constraints on the form of p(w∣T, α). Recognizing that sparseness is a desirable property of the final classifier, the density function associated with α, denoted p(α), is designed to impose a sparse solution (most weights in w being zero). There has been much work on the design of sparseness priors, with a Laplacian model [Tipping, 2001] for p(α) representing a popular example. The Laplacian prior is used for all results presented here.

[31] The final probability calculated for classification is represented as

$$p(l = +1 \mid f, T) = \iint p(l = +1 \mid f, T, w)\, p(w \mid T, \alpha)\, p(\alpha)\, dw\, d\alpha \qquad (4)$$

and p(l = − 1∣f, T) = 1 − p(l = + 1∣f, T). An approximate but accurate iterative procedure [Tipping, 2001] has been developed for computation of the integrals in equation (4), yielding a relatively simple algorithm.

[32] Note that the SVM yielded sparseness indirectly, through design of a maximum margin hyperplane. The RVM Bayesian construct builds in sparseness explicitly via the sparseness prior p(α). This prior yields weights that are zero with very high likelihood, as defined by p(w∣T, α). Only a small number of weights have a significant likelihood of being nonzero, and these are deemed the most “relevant” for the goal of classification, thus the name relevance vector machine.

[33] The advantages of the RVM vis-à-vis the SVM are (1) there are no limitations on the type of kernel that may be used, (2) we obtain an explicit probability that a given feature vector is associated with a given class, and (3) as demonstrated when presenting results, the RVM typically yields a sparser representation. As discussed in the next section, the RVM also permits a very natural design of a novelty detector, where ad hoc techniques are required to realize an SVM novelty detector [Smola et al., 1999; Scholkopf et al., 1999].

3.5. RVM Novelty Detector for Mines

[34] In the above discussion of the SVM and RVM algorithms, we have assumed binary labels l (e.g., l = +1 for mines and l = −1 for nonmines or clutter) with training performed on data from both label types. Often we have a limited set of measured training data, and therefore there is danger of statistical differences between the training and testing data. The mine types of interest are often few in number (e.g., for sensing of mines in an unpaved road, we are typically most interested in a small set of antitank mines, with antipersonnel mines of less concern for vehicle traffic). Consequently, even with a small training set it is reasonable to expect that one can capture the statistical properties of the mine signatures as a function of soil type and target depth. Clutter, however, comes in infinite varieties and may vary substantially from site to site and even along different sections of the same road. Therefore we have explored mine detection via a novelty detector.

[35] In a novelty detector we assume that we have training data from one label type, e.g., {fn, ln = +1}n=1,N when training is performed on class l = +1, here characteristic of mines. The SVM, which is based on defining a hyperplane between two classes of data, is not particularly well suited to novelty detection. The RVM, by contrast, is extended to novelty detection in a simple manner. In particular, the probability that a feature vector f is associated with class l = +1 is defined exactly as in equation (4). When we train the RVM novelty detector, the goal is to realize p(l = +1∣f, T) → 1 when f is a member of the training set, i.e., when fT. Implicitly, if a testing vector f is similar to the training set, p(l = +1∣f, T) will be large and bounded above by 1. By contrast, if f is not characteristic of T, and therefore “novel” and representative of clutter, p(l = + 1∣f, T) will be small and bounded below by 0. Therefore the RVM design of a novelty detector represents a modest change from the original binary RVM [Tipping, 2001], but it allows consideration of the realistic case for which one may have limited a priori knowledge of the clutter characteristics.
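Operationally, thresholding the novelty detector's posterior trades detection probability against false alarms; sweeping the threshold traces a receiver operating characteristic. The sketch below scores false alarms as a fraction of clutter declarations (the paper's FAR is defined per unit road area, so this is a simplification):

```python
def pd_far(mine_probs, clutter_probs, threshold):
    """Detection probability and false-alarm fraction of the novelty
    detector at one threshold on p(l = +1 | f, T). `mine_probs` and
    `clutter_probs` are posteriors computed for known mine and
    clutter regions, respectively."""
    pd = sum(p >= threshold for p in mine_probs) / len(mine_probs)
    far = sum(p >= threshold for p in clutter_probs) / len(clutter_probs)
    return pd, far
```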

[36] It is important to note that there are other options one may consider for design of a novelty detector. For example, using the feature vectors from the training set one may design a Gaussian mixture model (GMM) [Roberts et al., 1998] and perform classification using the associated likelihood function. The GMM does not explicitly impose sparseness in the model design, and there is often a question of the number of Gaussians to use in the GMM. A tradeoff between model complexity and accuracy may be achieved by considering, e.g., the minimum description length (MDL) method [Hansen and Yu, 2001]. We have selected the novelty detector RVM because it affords a direct comparison to the binary RVM and SVM, without requiring density function estimation.

4. Example Results

4.1. Description of Experiments and Algorithm Implementation

[37] The SRI system was used to collect data at the two sites discussed in section 2 (one arid, the other temperate). For both the metal and plastic land mines, ∼70% of the mines were on the surface, flush buried, or at a depth of 5 cm; the remaining mines were at depths of 10 and 15 cm. There were a total of 60 metal mines and 30 plastic mines. The data were collected by traversing the road in both directions, effectively doubling the number of mine encounters.

[38] The mine templates are designed using ∼10% of the measured data, with the signatures chosen for this purpose selected randomly. The dimensions of the prescreener in Figure 2 were also selected from this same data; to avoid overtraining, we did not examine all of the measured data when optimizing the properties of the templates or of the prescreeners. The data used for these purposes were not used again in any phase of subsequent classifier training or testing. Regions that passed the prescreener were correlated with the templates, and the correlations were used as features for the SVM and RVM classifiers. As discussed further below, when the SVM and RVM training and testing were applied to data measured at a fixed site (temperate or arid), the training and testing were performed using a leave-one-out procedure. Specifically, if N signatures (feature vectors) are available for classification after the prescreener, N − 1 of these signatures are used for training, and then the algorithm is tested on the remaining feature vector. This is repeated N times.
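The leave-one-out loop can be sketched generically; `train_fn` and `predict_fn` below are placeholders standing in for SVM/RVM training and prediction:

```python
def leave_one_out(features, labels, train_fn, predict_fn):
    """Leave-one-out evaluation as described in the text: train on
    N - 1 feature vectors, test on the held-out one, and repeat N
    times. Returns the fraction of held-out vectors classified
    correctly."""
    correct = 0
    for i in range(len(features)):
        train_f = features[:i] + features[i + 1:]
        train_l = labels[:i] + labels[i + 1:]
        model = train_fn(train_f, train_l)
        correct += predict_fn(model, features[i]) == labels[i]
    return correct / len(features)
```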

[39] We also present results below for which algorithm training is based on data from the arid site, and testing is performed using data from the temperate site. Let Na represent the number of feature vectors available from the arid site, and let Nt represent the same for the temperate site. In this case the training and testing data are entirely separate, and leave-one-out training is not used. Specifically, when training on arid data and testing on temperate data, training is performed with all Na arid site feature vectors, and testing is performed with all Nt temperate site feature vectors.

[40] All SVM/RVM results were computed using a radial basis function kernel [Burges, 1998; Cristianini and Shawe-Taylor, 2000; Scholkopf and Smola, 2002], although we found for the data considered that the particular kernel selected did not affect results significantly (i.e., we also considered polynomial kernels, with very little difference in classification performance). In section 4.6 below we provide further details on the number of parameters estimated when training the SVM and RVM classifiers.
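For reference, the radial basis function kernel takes the standard form K(f, fn) = exp(−γ‖f − fn‖²). A minimal sketch follows; the kernel width γ is a hypothetical placeholder, since its value is not reported in the text.

```python
import numpy as np

def rbf_kernel(f, fn, gamma=1.0):
    """Radial basis function kernel K(f, fn) = exp(-gamma * ||f - fn||^2).
    gamma is an illustrative placeholder, not a value from the paper."""
    d = np.asarray(f, dtype=float) - np.asarray(fn, dtype=float)
    return float(np.exp(-gamma * np.dot(d, d)))
```

The kernel equals 1 when the two feature vectors coincide and decays monotonically with their separation.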

4.2. Binary Classifier With Prescreener

[41] As indicated by the example imagery in Figure 1, the metal mines typically produce a much stronger signature than their plastic counterparts. This was particularly true for the arid site, for which the dielectric contrast between the plastic mines and the soil was generally weak. In particular, measurements of the soil electrical properties at the arid site indicated a dielectric constant of approximately ɛr = 3 (with very low loss), which is very similar to the electrical properties of most plastics used to constitute plastic land mines. Consequently, almost all of the plastic land mines were missed by the prescreener, and the overall classification performance for plastic mines, when a prescreener was used, was very poor. We note that the plastic mine signature strength does not always increase for data from the temperate site, for which the soil-mine dielectric contrast may be larger; this is because the soil at the temperate site is typically more lossy than that at the arid site.

[42] It is important to emphasize that the plastic mines are missed by the prescreeners because of their associated weak signature strength. However, as we indicate in section 4.3, the spatial SAR signatures of plastic land mines, although weak, are often distinctive, thereby affording the potential for classification. Specifically, in section 4.3, we present classification results for plastic mines without using a prescreener. In this case the plastic mine classification performance is encouraging, using an RVM or SVM classifier. This indicates that improved plastic mine detection is possible (1) if prescreener performance can be improved or (2) if the entire domain is processed with the RVM/SVM classifier directly, avoiding the prescreener altogether. The principal reason for considering prescreeners is to reduce the domain over which sophisticated classifiers are needed, enhancing operational speed.

[43] In Figure 5 we present classification results for metal antitank mines. The mines are approximately cylindrical in shape, with ∼16 cm radius and 15 cm height (see section 2). Note that, because a prescreener is employed, the detection probability (probability of correctly detecting a metal mine) does not go to unity, since some mines are missed in the prescreener stage and therefore are never seen by the subsequent classifier. In Figure 5a, results are presented for the arid site, and in Figure 5b for the temperate site. Note that the peak detection probability is higher for the arid site. This is attributed to the increased loss (attenuation) associated with the soil at the temperate site, which makes the sensing of buried mines more difficult. These results are for metal antitank mines buried at two depths: flush to the air-soil interface and at a depth of 5 cm (from the top of the mine).

Figure 5.

Binary RVM classification results for metal antitank mines employing the prescreeners. The mines are buried at two depths: flush to the air-soil interface and at a depth of 5 cm (from the top of the mine). Classification results are presented for VV and HH polarizations separately, as well as for the fusion of both. (a) Arid site. (b) Temperate site.

[44] In Figure 5, three results are plotted, for processing VV and HH polarized SAR imagery, as well as for the fusion of these two polarizations. For the fused results, any potential target detected by the prescreener in either polarization is subsequently considered by the classifier. As indicated in section 3.2, a three-dimensional feature vector is employed when processing the HH polarized data, with a separate three-dimensional feature vector employed for the VV polarized imagery. When performing fusion, these two vectors are concatenated to produce a six-dimensional vector.
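The fusion step amounts to a simple concatenation of the two per-polarization feature vectors. A minimal sketch, with purely hypothetical numeric values for illustration:

```python
import numpy as np

# Hypothetical three-dimensional feature vectors (correlation features)
# for one candidate target; the values are illustrative only.
f_hh = np.array([0.8, 0.5, 0.3])          # HH polarized imagery
f_vv = np.array([0.7, 0.6, 0.2])          # VV polarized imagery

# Fused feature: concatenation yields a six-dimensional vector
f_fused = np.concatenate([f_hh, f_vv])
```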

[45] The receiver operating characteristic (ROC) curves were produced as follows. We consider all the data that pass the prescreener. For each case that passes the prescreener, we define an associated feature vector f, where f is three- or six-dimensional, depending on whether it is associated with HH polarized data, VV polarized data, or the fusion of the two. Let the set S = {ln, fn}n=1,N represent all feature vectors and their associated labels (ln is binary, corresponding to mine or clutter). To generate classifier statistics, we perform leave-one-out training and testing. In particular, N − 1 examples in S are used to train the classifier, and testing is performed on the remaining example in S. Assume the nth example (ln, fn) in S is used for testing, and therefore the other N − 1 examples are used to define the classifier, with the classifier represented as cn(f). The testing output of the classifier, applied to feature vector fn, is denoted cn(fn). By rotating N times which example is held out for testing, we obtain the N classifier outcomes {cn(fn)}n=1,N. For the RVM classifier, cn(f) represents a probability ranging from 0 to 1, with the l = +1 class ideally associated with probabilities near 1 and the l = −1 class ideally associated with probabilities near 0. To perform classification, we define a threshold T: if cn(fn) > T, feature vector fn is deemed associated with the l = +1 class (here defined to correspond to mines); otherwise the data are associated with the l = −1 class (representative of clutter). Since the true identity of fn is known via the set S, we may score algorithm performance. Specifically, if ln = +1 is improperly associated with the l = −1 class (because cn(fn) < T), the probability of correctly identifying a land mine is reduced. If ln = −1 is improperly associated with the l = +1 class (because cn(fn) > T), a false alarm occurs, since a clutter example is incorrectly declared a mine.
By varying the threshold T, we consider a range of detection probabilities and false alarm rates (FARs), constituting the ROC. The FAR is quantified here as the number of false alarms per square meter interrogated.
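The threshold sweep can be sketched as follows. This is a simplified illustration, not the authors' code; `area_m2` (the interrogated area used to normalize the FAR) is an assumed input, and labels follow the convention above (+1 for mines, −1 for clutter).

```python
import numpy as np

def roc_points(outputs, labels, area_m2):
    """Sweep the threshold T over the classifier outputs c_n(f_n); at each
    T, record the detection probability (fraction of mines with output > T)
    and the FAR (clutter declarations per square meter)."""
    outputs = np.asarray(outputs, dtype=float)
    labels = np.asarray(labels)
    pd, far = [], []
    for T in np.sort(np.unique(outputs)):
        declared = outputs > T                # declared mines at this T
        pd.append(float(np.mean(declared[labels == +1])))
        far.append(float(np.sum(declared[labels == -1])) / area_m2)
    return pd, far
```

Lowering T raises both the detection probability and the FAR; the resulting (FAR, Pd) pairs trace out the ROC.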

[46] From Figure 5 we note that the overall classification performance for the VV and HH polarizations is comparable. In addition, the results of fusing the HH and VV data are not significantly different from the VV or HH results considered separately. The results in Figure 5 have been computed via an RVM; we compare RVM and SVM performance in the next section.

4.3. Binary Classifier Without Prescreener

[47] The results in section 4.2 employed a prescreener. As indicated, since the plastic mines are characterized by such weak signatures, they are typically missed by the prescreener. A separate issue is whether the plastic land mine signature is sufficiently distinctive to afford classification in spite of its relatively weak strength. Note that the features discussed in section 3.2 normalize the SAR data for the region under consideration, and therefore the weak signal strength can be overcome. In this section we therefore consider classifier performance, for both metal and plastic mines, without a prescreener. Now the probability of detection in the ROC reaches an upper bound of unity (no mines are “lost” to a prescreener).

[48] In Figure 6 we present results for metal mines only, at three depths: on the soil surface, flush buried, and at a depth of 5 cm, with results presented for the temperate site (VV polarization). The purpose of this figure is to examine the relative performance of the RVM and SVM classifiers. Identical radial basis function (RBF) kernels K(f, fn) were used in both the RVM and SVM. In Figure 6 we note that the ROCs computed via the RVM and SVM are almost identical. The principal distinction between these results is that the RVM yields a much sparser classifier than the SVM. In particular, on average 34% of the training examples are used as SVM support vectors, while only 14% of the training examples are used as RVM relevance vectors. Enhanced sparseness is expected to yield better robustness to differences between the training and testing data [Burges, 1998; Cristianini and Shawe-Taylor, 2000; Scholkopf and Smola, 2002; Tipping, 2001], and therefore all remaining results are presented for the RVM classifier.

Figure 6.

Binary classification results for metal mines only, without a prescreener, as computed via RVM and SVM classifiers. The mines are at three depths: on the soil surface, flush buried, and at a depth of 5 cm, with results presented for the temperate site (VV polarization).

[49] In Figure 7 we compare classification performance for both metal and plastic mines (each considered separately), for VV polarization and for the temperate site. It is interesting to note that, while the performance of the classifier for the metal mines is typically better than that for the plastic mines, the results are comparable. This underscores that the plastic mines appear to have distinct (but weak) SAR signatures relative to clutter. However, it must be emphasized that the road lanes considered at both test sites were relatively smooth and devoid of significant subsurface clutter, since this was a first test in a relatively benign environment. The ability to classify plastic mines is likely to deteriorate as the characteristics of the clutter become more random and nonstationary. In a separate example below we train the classifier on one site (arid) and test on the other (temperate). This provides a better examination of robustness since the training and testing data are distinct.

Figure 7.

Binary RVM classification performance for metal and plastic mines, for fusion of VV and HH polarization, without a prescreener. Mine depths: surface, flush buried, 5, 10, and 15 cm, with results corresponding to the temperate site.

[50] The results in Figure 7 considered mine depths of surface, flush buried, 5, 10, and 15 cm. To separate out the depth dependence of the classification performance, for the plastic mines in Figure 8, we show these same results for plastic land mines, but now separate ROCs are presented for plastic mines that are on the surface, flush buried, and 5 cm deep, with a comparison to considering all depths. We note, as expected, a substantial degradation in ROC performance when the deeper mines are considered, particularly for 0.04–0.07 false alarms per square meter.

Figure 8.

As in Figure 7, for the plastic mines, with results separated out as a function of mine depth.

4.4. Novelty Detector

[51] The previous results were based on a binary classifier, for which a priori knowledge of the target and clutter characteristics was assumed (when training). The range of possible clutter examples is essentially infinite. Therefore, in the context of a binary classifier, one must be concerned that the type of clutter used for algorithm training may be different from that observed during testing. In this case one would expect degradation in classifier performance. As indicated in section 3.5, the RVM classifier affords the opportunity to design a “novelty detector.”

[52] In the next set of results the classifier is trained using data associated with mines only, and no clutter data are observed prior to testing. Specifically, along the lines of the discussion in section 4.1, let So = {fn}n=1,No represent the set of features associated with mines only, where we assume No such examples. We use No − 1 of these feature vectors to design an RVM novelty detector, and this detector is then applied to the remaining mine example in So and to all of the clutter examples (not seen while training). This is done No times, once for each of the No mine signatures, thereby generating RVM novelty detector outputs for all target and clutter examples (in fact, if there are Nc clutter examples, this process generates No × Nc outputs for the clutter examples). The RVM outputs are thresholded as in the binary case, thereby yielding a ROC curve.
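The novelty detector's leave-one-out loop can be sketched as below; this is an illustrative skeleton, with `train_fn` and `score_fn` as hypothetical stand-ins for RVM novelty detector training and evaluation.

```python
import numpy as np

def novelty_loo(mine_feats, clutter_feats, train_fn, score_fn):
    """Train No detectors, each on No - 1 mine examples (no clutter seen
    during training), and score each detector on the held-out mine and on
    all clutter examples, yielding No * Nc clutter outputs in total."""
    no = len(mine_feats)
    mine_out = np.empty(no)
    clutter_out = []
    for i in range(no):
        mask = np.arange(no) != i
        model = train_fn(mine_feats[mask])    # mines only
        mine_out[i] = score_fn(model, mine_feats[i])
        clutter_out.extend(score_fn(model, c) for c in clutter_feats)
    return mine_out, np.array(clutter_out)
```

Thresholding the pooled outputs then yields the ROC, as in the binary case.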

[53] To limit the number of plots, we again show results for the temperate site, although analogous behavior has been observed for the arid site. In Figure 9 we consider plastic and metal mines (separately) for all depths (surface, flush buried, 5, 10, and 15 cm). These results are for the fusion of VV and HH SAR imagery. A careful comparison of Figures 7 and 9 reveals that the binary classifier does, as expected, perform better than the novelty detector. However, the novelty detector performs only slightly worse (comparable FAR when the detection probability is maximized), despite the fact that it used no clutter data during training. This relative performance of the binary classifier and novelty detector was observed consistently for all the data considered here (as a function of test site, mine type, polarization, and target depth).

Figure 9.

Novelty detector RVM results for plastic and metal mines, for fusion of VV and HH imagery. Mine depths: surface, flush buried, 5, 10, and 15 cm. These results correspond to the temperate site.

4.5. Examination of Robustness

[54] All of the previous results corresponded to training and testing at the same site (arid or temperate). It may be argued that these results are optimistic in the sense that the training and testing data are likely similar. To provide a better test of algorithm robustness, we now consider training the algorithm using data from the arid site, with testing performed using data from the temperate site. This test is likely to provide a better measure of expected performance since it accounts for anticipated differences between training and testing data (here the training and testing data are entirely distinct).

[55] The ROC results for this case are depicted in Figure 10. In this example, all mines are buried: flush buried and 5, 10, and 15 cm deep (from the top of the mine). These results demonstrate the anticipated better performance for metal mines but nevertheless reveal reasonable performance for plastic mines as well. These results are for a novelty detector (no training on nonmine signatures); similar levels of performance were observed for the binary classifier. We focus on the novelty detector in Figure 10 because it is the most realistic case in general, given the variability of clutter statistics; in these relatively benign tests the clutter was less variable, and therefore the binary classifier was also relatively robust when trained and tested on different sites.

Figure 10.

Novelty detector RVM results for metal and plastic mines, with algorithm training based on data from the arid site, and testing performed using data from the temperate site.

4.6. Setting and Learning of Algorithm Parameters

[56] In the context of implementing the algorithms discussed above, one must set several parameters in the prescreener, and the classifier parameters must be learned in the training phase. It is of interest to examine the number of parameters to be set and learned. Concerning the parameters of the prescreener, as indicated in section 4.1, ∼10% of the data were used to design the template in Figure 2, and these data were not used in subsequent SVM/RVM training and testing. We set the thresholds used for the three prescreeners in section 3.1 by examining data from this same 10% subset.

[57] In the context of training an SVM or RVM, weights must be estimated for each feature vector in the training set (wn in equation (1)). The number of weights required for this purpose is different for each of the examples presented above. For example, consider the case of metal mines only at the arid site, without a prescreener, for mine depths of surface, flush, and 5 cm (Figure 6). In this case, 153 signatures pass the prescreening phase, and the number of training patterns in the leave-one-out procedure is therefore N = 152. Of the 153 signatures that pass the prescreener, 27 are associated with mines. As discussed in the work of Burges [1998], Cristianini and Shawe-Taylor [2000], and Scholkopf and Smola [2002], the SVM and RVM yield sparse classifiers, thus reducing the likelihood of overtraining. Of the weights wn estimated by these algorithms, typically only ∼30% are nonzero for the SVM, while the RVM typically requires only ∼15% of the weights to be nonzero (for the data considered here, although such percentages are consistent with previous studies [Tipping, 2001]). This sparseness essentially reduces the complexity of the classifier, yielding relatively simple decision surfaces in feature space, thereby mitigating overtraining.
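The sparseness figures quoted above correspond to a simple count of effectively nonzero weights. A minimal sketch, where the tolerance `tol` is a hypothetical numerical cutoff (not a value from the paper):

```python
import numpy as np

def weight_sparsity(weights, tol=1e-8):
    """Fraction of training weights w_n that are effectively nonzero
    (~30% for the SVM and ~15% for the RVM in the cases reported above).
    tol is an illustrative numerical cutoff."""
    w = np.asarray(weights, dtype=float)
    return float(np.count_nonzero(np.abs(w) > tol)) / w.size
```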

5. Conclusions

[58] We have addressed the problem of sensing metal and plastic antitank land mines using a forward looking radar system. The focus has been on feature selection and subsequent classifier design. The features are based on a simple correlation procedure, yielding a low-dimensionality feature vector (three-dimensional for HH or VV polarization and six-dimensional for polarization fusion). Support vector machine (SVM) and relevance vector machine (RVM) classifiers have been examined, yielding virtually identical results. The advantages of the RVM are twofold: (1) the number of RVM relevance vectors employed is consistently much smaller than the number of SVM support vectors, yielding a sparser classifier that is typically more robust to differences between training and testing data [Tipping, 2001]; and (2) the RVM is naturally amenable to the design of a novelty detector, implying that the classifier is designed on the basis of training data from one class (mine data) while, during testing, being able to distinguish mines from clutter (the latter not observed during training). For the examples presented here, the performance of the novelty detector was only slightly worse than that of the binary classifier (trained using mine and clutter data). Although this cannot be proven with the limited data under study, it is anticipated that the novelty detector will generalize better to different sites with distinct clutter statistics. This is an issue worthy of further study, requiring a more extensive database.


Acknowledgments

[59] Joel Kositsky of SRI International has provided significant insight over the course of numerous discussions. His inputs have contributed significantly to our research progress.