Aircraft tracking in infrared imagery with adaptive learning and interference suppression

Airborne target tracking is a crucial part of infrared imaging guidance. In contrast to visual tracking tasks, the target in infrared imagery shows different visual patterns. Moreover, severe background clutter and frequent occlusion caused by infrared interference make it a challenging task. Recently, discriminative correlation filter (DCF)-based trackers have shown impressive performance. However, the features adopted in DCF-based trackers are either handcrafted or pre-trained on a different task, and are therefore not closely tied to the domain-specific video. To address this problem, we propose to make full use of online training to learn domain-specific features. By integrating the correlation filter layer into the convolutional neural network, the feature domain and the response maps of the DCF can be optimized iteratively in the initial frame. Meanwhile, by measuring the peak strength of the response maps, further adjustments to the feature domain can be made during the tracking process to achieve a sharper peak and suppress the interference region. Evaluations are conducted to demonstrate the validity of the proposed aircraft-tracking algorithm.
Introduction: Airborne target tracking based on infrared technology is capable of working in various weather conditions. However, infrared images of the airborne target often have low signal-to-noise ratios. Moreover, the tracking process is accompanied by frequent interference caused by clouds or infrared decoys, as shown in Figure 1, which makes robust aircraft tracking a challenging task.
Recently, DCF-based trackers have achieved great success due to their robustness and high speed. Feature representations play a critical role in DCF-based trackers. Bolme et al. [1] first applied correlation filters to visual tracking, adopting single-channel grayscale as the feature representation. Henriques et al. [2] extended the work of [1] by employing multi-dimensional HOG features and achieved a significant improvement. With the integration of convolutional neural network (CNN) features, DCF-based trackers have achieved a further remarkable gain. Danelljan et al. proposed a continuous formulation to fuse multiresolution CNN feature maps [3], achieving outstanding performance in the Visual Object Tracking Challenge.
For thermal infrared object tracking, Liu et al. [4] combine multilayer CNN features extracted from the pre-trained VGG-Net with kernelized correlation filters (KCF) [2] trackers, and the response maps from multiple trackers are fused to consolidate the performance of the ensemble tracker. To obtain richer feature representations, Li et al. [5] design a Siamese CNN that integrates multi-level features, and propose a spatial-aware network to further improve the performance. Further, they propose to learn two complementary feature models composed of infrared-specific discriminative features and fine-grained correlation features from a larger infrared dataset, providing a performance boost [6].
Although the integration of CNN features provides an additional performance boost, the CNN features are usually pre-trained on large datasets, which might not be optimal for aircraft tracking in infrared imagery. Motivated by recent progress in formulating correlation filters as a differentiable layer and by the training mechanism of correlation filters, we propose to combine the learning ability of CNNs and the high detection efficiency of kernelized correlation filters (KCF) in a unified framework that learns, online, domain-specific features that fit the correlation filters. Compared with DCFNet [7], the proposed method does not require pre-training on large datasets. The shifted versions of the target in the initial frame are constructed as the training data, so the training of the network is consistent with the training of the correlation filters. Therefore, the features obtained from the network are tightly coupled with both the current video domain and the correlation filter-based tracker. The optimization of the network is based on the response maps of the correlation filters, which enables tailoring the feature space to the current video domain within the discriminative correlation filter framework. Furthermore, we can adaptively tune the network after encountering large fluctuations in the response maps caused by background clutter or infrared decoys, in order to suppress the interference region. Experiments are conducted to demonstrate the improvement in tracking performance over the baseline method.
Fig. 1 The aircraft-tracking process faces the challenges of poor signal-to-noise ratios and frequent interference
Aircraft tracking algorithm: In the implementation of DCF, the correlation filters $w$ are obtained by solving the following ridge regression problem:

$$\min_{w} \; \| w \star f(x) - y \|_2^2 + \lambda \| w \|_2^2, \tag{1}$$

where $\star$ denotes circular correlation, $f(x)$ denotes the feature of the image patch $x$, $\lambda$ refers to a regularization factor, and $y$ stands for a Gaussian label. The solution can be derived as

$$\mathcal{F}(w) = \frac{\mathcal{F}(f(x)) \odot \mathcal{F}^{*}(y)}{\mathcal{F}(f(x)) \odot \mathcal{F}^{*}(f(x)) + \lambda}, \tag{2}$$

where $\mathcal{F}(\cdot)$ represents the FFT, $*$ denotes the complex conjugate, and $\odot$ stands for the element-wise product. After obtaining the correlation filters $\mathcal{F}(w)$ and the feature $f(z)$ of the new frame, the aircraft's position is estimated based on the response map $r$, which is given by

$$r = \mathcal{F}^{-1}\left( \mathcal{F}^{*}(w) \odot \mathcal{F}(f(z)) \right), \tag{3}$$

where $\mathcal{F}^{-1}(\cdot)$ refers to the inverse FFT.

The feature space $f(\cdot)$ adopted in Equation (1) is crucial to the generation of the response map. For an ideal feature space, the response map should approximate the Gaussian label, with high response values on the target and low values elsewhere. By integrating the correlation filter layer into the convolutional neural network, we can take advantage of the learning ability of the CNN to project $x$ into various feature spaces, and choose the feature space that best fits the current video domain.

In our formulation of the training process, the inputs of the network consist of a fixed original sample $x_0$ and a shifted sample $z_i$. The network is implemented as a Siamese architecture with tied parameters. The training is achieved by minimizing

$$L = \sum_{i=1}^{M} \| r_i - y_i \|_2^2, \tag{4}$$

where $r_i$ is the response map of Equation (3) computed for $z_i$ with the filters learned from $x_0$, and $y_i$ and $M$ stand for the Gaussian label and the count of shifted samples. The backpropagation gradients with respect to $f(x_0)$ and $f(z_i)$ follow the derivation in [7].

After training the Siamese network in the initial frame, we adopt one branch to extract the features of subsequent frames. During the tracking process, the aircraft may encounter clouds or infrared decoys, resulting in large fluctuations in the response map.
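The closed-form filter solution, the response map, and the shifted-sample loss described above can be illustrated with a minimal NumPy sketch. This is not the authors' MatConvNet implementation. The function names, the label width `sigma`, the summation over feature channels, and the use of circular shifts (`np.roll`) to build the shifted samples are all assumptions made for illustration. The feature extractor is passed in as a hypothetical callable `feat` that returns an array of shape (C, H, W).

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Gaussian label y whose peak is wrapped to index (0, 0)."""
    h, w = shape
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    g = np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2.0 * sigma ** 2))
    return np.roll(g, (-(h // 2), -(w // 2)), axis=(0, 1))

def train_filter(feat_x, y, lam=1e-4):
    """Closed-form filter in the Fourier domain; feat_x has shape (C, H, W)."""
    X = np.fft.fft2(feat_x, axes=(-2, -1))
    Y = np.fft.fft2(y)
    # Feature energy, summed over channels (assumed multi-channel extension).
    energy = np.sum(X * np.conj(X), axis=0).real
    return X * np.conj(Y) / (energy + lam)

def response_map(W, feat_z):
    """Response map: inverse FFT of conj(F(w)) * F(f(z)), summed over channels."""
    Z = np.fft.fft2(feat_z, axes=(-2, -1))
    return np.real(np.fft.ifft2(np.sum(np.conj(W) * Z, axis=0)))

def shifted_sample_loss(patch, shifts, feat, sigma=2.0, lam=1e-4):
    """Sum of squared errors between responses and shifted Gaussian labels."""
    y = gaussian_label(patch.shape, sigma)
    W = train_filter(feat(patch), y, lam)           # filters from the fixed sample x_0
    loss = 0.0
    for dy, dx in shifts:
        z_i = np.roll(patch, (dy, dx), axis=(0, 1))  # circularly shifted sample z_i
        y_i = np.roll(y, (dy, dx), axis=(0, 1))      # label shifted the same way
        r_i = response_map(W, feat(z_i))
        loss += float(np.sum((r_i - y_i) ** 2))
    return loss
```

With an identity feature extractor such as `feat = lambda p: p[None, :, :]`, the response to a shifted copy of the training patch peaks at the applied shift and the loss is close to zero. In the proposed method, the loss would instead be backpropagated through a learned `feat` to adapt the feature space.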
To address this problem, we propose to fine-tune the network according to the peak strength of the response maps, and employ the peak-to-sidelobe ratio (PSR) [1] as the measurement, which is given by

$$\mathrm{PSR} = \frac{g_{\max} - \mu_s}{\sigma_s},$$

where $g_{\max}$ is the maximum value of the response map, and $\sigma_s$ and $\mu_s$ represent the standard deviation and the mean value of the sidelobe surrounding the peak. We then use the variance of the PSR to measure the fluctuation. Let $\sigma^2_{n-1}$ denote the variance of the PSR over the historical frames and $\sigma^2_n$ the variance of the current frame's PSR. If $\sigma^2_n$ exceeds $\sigma^2_{n-1}$ and the PSR of the current frame is lower than the historical average, the network is further adjusted to suppress the interference region. The overall procedure of the aircraft-tracking algorithm is shown in Figure 2.
The purpose of interference suppression is to suppress the response value of the background region when a large fluctuation in the response map is observed. Since the generation of the response map is closely related to the feature space, the feature space can be adjusted under the guidance of the ideal response map. After the adjustment, the response value in the target region of the actual response map is enhanced, while that in the background region is suppressed. Interference suppression thus guides the adjustment of the feature space during tracking, improves the adaptability of the tracker to environmental changes, reduces the risk that the response value of the background region exceeds that of the target region, and thereby improves the tracking performance.
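The PSR-based trigger described above can be sketched as follows. The exact variance definitions are not reproduced here, so this sketch makes two labeled assumptions: the sidelobe is taken as every pixel outside an 11 x 11 window around the peak, and the current frame's variance is taken as the squared deviation of the current PSR from the historical mean. The function names are hypothetical.

```python
import numpy as np

def psr(response, exclude=5):
    """Peak-to-sidelobe ratio (g_max - mu_s) / sigma_s of a 2-D response map."""
    r = np.asarray(response, dtype=float)
    py, px = np.unravel_index(np.argmax(r), r.shape)
    mask = np.ones(r.shape, dtype=bool)
    # Assumed sidelobe: all pixels outside a (2*exclude+1)^2 window around
    # the peak; the window is clipped at the borders and does not wrap.
    mask[max(py - exclude, 0):py + exclude + 1,
         max(px - exclude, 0):px + exclude + 1] = False
    sidelobe = r[mask]
    return float((r[py, px] - sidelobe.mean()) / (sidelobe.std() + 1e-12))

def should_suppress(psr_history, psr_now):
    """Decide whether to fine-tune the network for interference suppression.

    Assumed rule: trigger when the squared deviation of the current PSR from
    the historical mean exceeds the historical variance AND the current PSR
    is below the historical mean.
    """
    hist = np.asarray(psr_history, dtype=float)
    mu = hist.mean()
    var_hist = hist.var()
    var_now = (psr_now - mu) ** 2
    return bool(var_now > var_hist and psr_now < mu)
```

A multimodal response map, as produced by a decoy, raises the sidelobe standard deviation and therefore lowers the PSR, which is what makes the ratio a usable indicator of interference.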
Experimental results: For the feature learning process, we adopt a learning rate of 1e-3 to minimize the loss function in Equation (4). The number of shifted samples and the maximum number of iterations are set to 256 and 10, respectively. We adopt the Xavier method to initialize the weights of the network. Training is carried out using stochastic gradient descent (SGD) with a momentum of 0.9 and a weight decay of 0.0005. For the correlation filter part, the regularization parameter λ is 1e-4. The learning rate of the model update is 0.02, which is consistent with the parameters in KCF. The experiments are conducted on real infrared imagery and synthetic infrared imagery (Table 1). We implement our tracker in MATLAB using the MatConvNet toolbox. The experiments are performed on a PC with an Intel i3-4030U 1.9 GHz CPU and 4 GB of RAM.
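For reference, the optimizer settings above correspond to an update of the following form. This is a generic sketch in NumPy, not the MatConvNet code used in the experiments. The Glorot-uniform variant of Xavier initialization, the (kh, kw, c_in, c_out) weight layout, and the function names are assumptions.

```python
import numpy as np

def xavier_uniform(shape, rng):
    """Xavier (Glorot) uniform init for conv weights shaped (kh, kw, c_in, c_out)."""
    kh, kw, c_in, c_out = shape
    fan_in, fan_out = kh * kw * c_in, kh * kw * c_out
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape)

def sgd_momentum_step(w, grad, velocity, lr=1e-3, momentum=0.9, weight_decay=5e-4):
    """One SGD step with momentum and L2 weight decay (values from the text)."""
    velocity = momentum * velocity - lr * (grad + weight_decay * w)
    return w + velocity, velocity
```

Starting from `velocity = np.zeros_like(w)`, the step would be repeated for at most 10 iterations on the initial frame, with `grad` supplied by backpropagation through the correlation filter layer.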
To design a shallow network with fewer parameters, we conduct experiments with basic network modules, including convolutional layers, local response normalization (LRN), and rectified linear units (ReLU). The parameters of the convolutional layers are listed in Table 2. The network consisting of convolutional layer 1, LRN, ReLU, and convolutional layer 2 achieves the best tracking performance, as shown in Figure 3, and is selected as the feature extraction network.
We evaluate the proposed aircraft tracking algorithm in two variants, KCF_AL (KCF with adaptive learning) and KCF_ALIS (KCF with adaptive learning and interference suppression), against KCF [2], ECO [3], DCFNet [7], SiamFC [8], SiamRPN [9], DaSiamRPN [10], HCFT [11], BACF [12], TLD [13], Struck [14], OAB [15], SemiT [17], and the trackers in the tracking benchmark [16]. Overall performance is summarized by precision plots and success plots [16], as shown in Figure 4. Compared with the baseline method (KCF), the proposed trackers (KCF_AL, KCF_ALIS) achieve a significant improvement in tracking performance. For a more intuitive analysis, we visualize the tracking results and response maps of KCF, KCF_AL, and KCF_ALIS, as shown in Figure 5. After replacing the HOG features (KCF) with the proposed feature learning module (KCF_AL, KCF_ALIS), the features learned through the network help the tracker discriminate the target from the background clutter. In KCF_ALIS, the interference suppression module is activated after frame 123. Although the response maps of KCF_AL and KCF_ALIS point to the same regions, KCF_ALIS has lower response values in the background area, consolidating the tracking results of KCF_AL. In addition, a qualitative comparison is conducted to further analyze the effect of interference suppression, as shown in Figure 6. In frame 71, the peak strength is weakened by the interference, and the response map shows a multimodal distribution. In the subsequent frame 72, without interference suppression, the response value of the interference region surpasses that of the target region. As a result, the tracking result drifts to the region of the decoy. In contrast, after suppressing the interference region, its response value decreases, alleviating the model drift problem. The visualization of the tracking results is shown in Figure 7.
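The precision and success plots used above can be computed as in the following sketch, which follows the usual benchmark convention: precision is the fraction of frames whose center location error is within a pixel threshold, and success is the fraction of frames whose bounding-box overlap exceeds an overlap threshold. The (x, y, w, h) box format, the threshold grids, and the function names are assumptions for illustration.

```python
import numpy as np

def center_error(pred, gt):
    """Euclidean distance between box centers; boxes are rows of (x, y, w, h)."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    pc = pred[:, :2] + pred[:, 2:] / 2.0
    gc = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pc - gc, axis=1)

def iou(pred, gt):
    """Intersection over union of axis-aligned boxes with positive area."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union

def precision_curve(pred, gt, thresholds=np.arange(0, 51)):
    """Fraction of frames with center error <= each pixel threshold."""
    err = center_error(pred, gt)
    return np.array([(err <= t).mean() for t in thresholds])

def success_curve(pred, gt, thresholds=np.linspace(0, 1, 21)):
    """Fraction of frames with overlap > each overlap threshold."""
    ov = iou(pred, gt)
    return np.array([(ov > t).mean() for t in thresholds])
```

Trackers are commonly ranked by the precision value at a 20-pixel threshold and by the area under the success curve, which can be approximated here by `success_curve(pred, gt).mean()`.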
Conclusion: In this article, an aircraft-tracking algorithm based on kernelized correlation filters is presented. In order to learn features fitting the current video domain, we integrate a correlation filter layer into the KCF tracking framework. Thus we can optimize the feature domain and the response map in a unified framework. To address the issue of interference, we propose to further tune the network to suppress the interference region based on the peak strength of the response maps. The combination of adaptive learning in the initial frame and interference suppression during the tracking process improves the tracking performance of the baseline method.