Salient target detection in hyperspectral image based on visual attention

Salient target detection in hyperspectral images is a significant task in image segmentation, target tracking, image classification and so on. Many existing saliency detection algorithms for hyperspectral images cannot present the boundary of the salient target well, and their description of the target is incomplete. A method based on visual attention to detect the salient target of a hyperspectral image is proposed in this paper. In this method, the frequency-tuned (FT) saliency detection model is combined with spectral saliency to detect targets in hyperspectral images. The FT model is used to obtain targets with clear borders, and the spectral information is fully exploited to improve the accuracy of target detection. Firstly, FT is used to detect the saliency of the hyperspectral image and a saliency map is generated. Then, the similarity of the spectral information of the hyperspectral image is measured, and the spectral saliency is obtained by calculating the spectral angle distance between spectral vectors. Finally, the FT saliency map and the spectral saliency map are combined to form the final saliency map. Experimental results show that our method is superior to other methods in salient target detection of hyperspectral images, with better precision-recall curves and F-measure.


INTRODUCTION
A hyperspectral image contains tens or hundreds of contiguous narrow spectral bands centred on uniformly distributed wavelengths ranging from the visible to the near-infrared spectrum [1,2]. This spectral response is associated with the materials of the objects in the imaged scene [3], which makes hyperspectral imaging widely used in areas such as remote sensing [4,5], object detection [6,7], agricultural applications [8,9] and other fields. However, the large volume of spectral information increases the difficulty of processing. To reduce redundant information, many hyperspectral applications obtain a region of interest (ROI) through visual saliency detection [10]. The rich spectral information is an important tool for solving saliency detection and other visual tasks. The human visual system can quickly find conspicuous areas in natural scenes and focus attention on the important parts. Saliency detection models [11][12][13] simulate the visual attention mechanism [14] to explain a given scene, which has broad practical value. Visual attention is divided into two mechanisms: bottom-up and top-down. The bottom-up mechanism is data-driven: it uses low-level features such as the colour, lightness, and edges of the image to determine the difference between the target area and its surroundings, calculates the saliency of each image area, and guides the viewer's gaze to prominent areas in the scene [15]. The top-down mechanism is a goal-driven process that uses human cognition to identify the current goal and obtain a salient goal map. In the development of saliency detection, the Itti method can be called pioneering work: Itti et al. [11] extract low-level features such as colour, intensity, and orientation, use centre-surround difference operations at multiple scales to generate feature maps that reflect the saliency measure, and combine these feature maps to obtain the final saliency map. Itti's method inspired many subsequent studies [16][17][18], and saliency detection has developed steadily since.
The field of computer vision mainly focuses on bottom-up visual saliency. Top-down visual saliency depends on the structure of the human brain, which is not yet sufficiently understood, so its principle of action cannot be revealed in depth; research on it in computer vision is correspondingly rare.
Visual saliency detection can quickly select regions of interest from a natural scene, thereby facilitating subsequent calculations, so it also benefits some applications of hyperspectral data processing. In recent years, some researchers have applied saliency models to target detection in hyperspectral images and achieved good results, and salient target detection in hyperspectral images has gradually received attention. Some studies [19][20][21][22][23] have shown that using the spectral features provided by hyperspectral images can improve the accuracy of salient target detection. Most of these models are extensions of the Itti model [19,20]. By processing the hyperspectral image with a saliency detection method, the region of interest can be obtained quickly, while the spectral information of the salient object provides additional useful information for detecting the salient target. However, some problems remain, such as incomplete detection of objects and blurred target edges.

Proposed work and contributions
This paper aims to efficiently extract the salient target region from a hyperspectral image. The frequency-tuned (FT) saliency detection model is combined with the spectral features of the hyperspectral image to obtain a salient object map with a distinct boundary. Specifically, the frequency-tuned saliency detection algorithm is first applied to the hyperspectral image, which quickly and effectively yields the salient target area with a clear boundary. Then, the similarity of the spectral information is measured, and the spectral angle distance between spectral vectors is calculated to obtain the spectral saliency map; the spectral information supplements the detection process to obtain a more accurate target. Finally, the saliency map of the hyperspectral image is obtained by fusing the FT saliency map with the spectral saliency map. Compared with traditional pixel-level operations, this method takes full advantage of both the spectral and the spatial information of the target. In addition, we demonstrate the effectiveness and superiority of our method through experiments. The contributions of this paper are as follows:
1. By introducing the FT model into hyperspectral image saliency detection, targets with clear boundaries can be obtained;
2. By combining the FT saliency detection model with spectral saliency, the spectral characteristics are fully exploited to obtain more accurate targets.
The rest of this paper is arranged as follows: the related work is introduced in Section 2 and the details of our method are presented in Section 3. Experiments are described and analysed in Section 4 and the conclusions are drawn in Section 5.

RELATED WORK
In this section, we briefly introduce some classic saliency detection methods and the related literature on salient target detection in hyperspectral images.

Saliency detection method
Traditional saliency detection methods can be divided into three categories: biologically motivated, purely computational, and those combining both aspects. For biologically motivated methods, the typical representative is the Itti model. Itti et al. [11] extract the colour, intensity, and orientation features of the image, use the centre-surround differences of multi-scale image features to define the saliency of each feature, and merge the resulting conspicuity maps to obtain the final saliency map. Building on the Itti model, Frintrop et al. [12] proposed a method that uses square filters to calculate the centre-surround differences and integral images to speed up the calculation. Among purely computational methods, Hou et al. [13] proposed a saliency model based on spectral residuals, which approaches image saliency from the perspective of exploring background attributes. Gao and Vasconcelos [16] obtained the saliency of the image by maximizing the mutual information between the centre and surround feature distributions. The Itti model and these other methods all predict salient points: in early visual saliency detection, saliency was defined as the prediction of fixation points, and the detected result was an approximate region around the salient points. Achanta et al. [17] proposed a salient region detection model based on frequency tuning, which uses the difference between the pixel colour and the average image colour to define pixel saliency directly. The significance of this saliency map lies in its practical application: saliency detection shifted from the traditional evaluation of gaze point prediction to the prediction of a binary map of the object region, bringing saliency detection closer to practical use. The third category combines biological and computational models. Harel et al. [18] used Itti's method to generate feature maps, and then used graph-theory-based methods to perform the normalization.

Salient objects detection in hyperspectral image
So far, salient target detection has been well developed and widely applied, but applying it to hyperspectral images remains challenging. Liang et al. [19] first tried to combine hyperspectral data with salient target detection by converting the hyperspectral image (HSI) into a three-colour image and then applying Itti's model to it. They also used spectral band opponency and the original spectral characteristics to replace the colour opponent components in the Itti model. Y. Cao et al. [20] proposed a method to detect targets in hyperspectral images by extending Itti's model. It uses four different features (the three component features, direction, spectral angle, and visible spectral band components) to obtain four feature maps, which are then summed into a saliency map to give a more robust result. This method is more suitable for high-dimensional reflectance vectors. H. Yan et al. [21] used a spectral gradient contrast method to calculate the salient objects of the hyperspectral image, which solved the sensitivity to high-contrast edges that arises in detection with the Itti model. Steven et al. [22] combined the Itti model with dimensionality reduction by principal component analysis for visualization, and were the first to define spectral saliency, as the extent to which a certain group of pixels stands out from an image in terms of reflectance. With this definition, saliency detection can be extended from ordinary images to spectral images. To sum up, the existing methods of hyperspectral image saliency detection still have shortcomings such as incomplete detection of objects and blurred object boundaries.

FIGURE 1 Framework of our proposed model

PROPOSED METHODOLOGY
The process of the proposed salient target detection method for hyperspectral images based on visual attention is shown in Figure 1. Firstly, for an input hyperspectral image, the FT algorithm is used to detect the salient region and obtain the saliency map; at the same time, the spectral angular distance is used to measure the similarity of the spectral information and obtain the spectral saliency map; then, the FT saliency target map is fused with the spectral saliency map to get the final saliency map of the hyperspectral image.

FT's method
Many saliency detection models suffer from disadvantages such as low resolution (e.g. the Itti and SR models), poorly defined boundaries (e.g. the CA model), and slow computation (e.g. the CA model) when generating the saliency map [17]. The main reason is the use of inappropriate spatial frequency ranges. The FT saliency detector has several advantages: it emphasizes the largest salient objects, highlights entire salient regions uniformly, establishes clear boundaries of salient objects, disregards high frequencies arising from noise and texture, and efficiently outputs full-resolution saliency maps. To get better results, the FT model is used for saliency detection in hyperspectral images in our proposed method. The FT algorithm uses the colour and lightness features to calculate the saliency of each pixel relative to its neighbourhood. The whole image is used as the neighbourhood, so that more spatial frequency information can be obtained. Since the region of interest contains a wide range of frequencies, the algorithm uses a band-pass filter to preserve a large amount of frequency content. The algorithm takes $W_{lc}$ as the low-frequency cut-off value and $W_{hc}$ as the high-frequency cut-off value. To highlight large regions of interest, $W_{lc}$ must be low enough. To keep boundaries clear, high frequencies need to be preserved. However, to avoid disturbance from noise and texture, the highest frequencies need to be neglected. Thus, the range $[W_{lc}, W_{hc}]$ is wide, and combining the outputs of several band-pass filters is appropriate.
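A Difference of Gaussians (DoG) filter is selected as the band-pass filter, which is given by Equation (1).
$$ DoG(x, y) = \frac{1}{2\pi}\left[\frac{1}{\sigma_1^2} e^{-\frac{x^2 + y^2}{2\sigma_1^2}} - \frac{1}{\sigma_2^2} e^{-\frac{x^2 + y^2}{2\sigma_2^2}}\right] \qquad (1) $$
where $\sigma_1$ and $\sigma_2$ are the standard deviations of the two Gaussians ($\sigma_1 > \sigma_2$).
Based on the above arguments, selecting appropriate values of $\sigma_1$ and $\sigma_2$ is important, because they determine which frequencies the band-pass filter retains for the saliency values. When the standard deviations of the two Gaussians are in the ratio 1:1.6 [24], the DoG is considered the most suitable operator for detecting lightness changes. However, a filter of practical length may not meet this need, so $\sigma_1$ is set to infinity. To suppress noise, a small Gaussian kernel is used.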
In our method, salient region detection in the hyperspectral image based on the FT saliency map is carried out in the following steps.
First, a Gaussian filter is applied to the hyperspectral image to obtain a Gaussian-blurred image $I_g$, which suppresses the adverse effect of noise on subsequent processing, as in Equation (2). Second, the image $I_g$ is transformed into the Lab colour space. Here, we use the lightness and colour features to detect the saliency of the image.
$$ I_g = I * G \qquad (2) $$
where $I$ is the input image, $*$ denotes convolution, and $G$ is a Gaussian filter whose size is generally 3 × 3 or 5 × 5; we choose 5 × 5 here, as in Achanta et al. [17], so that our method can be evaluated objectively later. Its formula is as follows:
$$ G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} \qquad (3) $$
where $\sigma$ is the standard deviation of the Gaussian. Then, the mean value of the entire image is calculated for each feature, assuming the size of the input image is m × n × 3:
$$ \bar{I} = [\bar{L}, \bar{a}, \bar{b}]^T = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} I_{Lab}(i, j) \qquad (4) $$
where $I_{Lab}(i, j)$ denotes the Lab feature vector of pixel $(i, j)$ and $\bar{L}$, $\bar{a}$, $\bar{b}$ are the mean values of each feature in the Lab colour space.
Next, the Euclidean distance between each feature of a pixel in the Lab colour space and the corresponding mean value of the entire image is calculated as the saliency value of the pixel for that feature, as shown in Equations (5)-(7).
Finally, the final saliency value of each pixel is obtained through feature fusion, and a saliency map is generated. Because some of the values calculated by Equation (8) are larger than 255, the map cannot be displayed directly. Therefore, the values are normalized before display.
where $S_L(i, j)$, $S_a(i, j)$, $S_b(i, j)$ are the saliency values of pixel $(i, j)$ under each feature.
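As an illustration, the FT steps described in this subsection can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the input is assumed to be already converted to Lab (for example with a library routine such as `skimage.color.rgb2lab`), the 5 × 5 Gaussian is approximated by a separable binomial kernel, and the per-pixel saliency is taken as the Euclidean norm of the difference to the mean Lab vector, as in the original FT formulation of Achanta et al. [17]. The function names `blur5` and `ft_saliency` are hypothetical.

```python
import numpy as np

def blur5(channel):
    """Separable 5x5 binomial (Gaussian-like) blur with edge replication."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    m, n = channel.shape
    padded = np.pad(channel, 2, mode="edge")
    # Filter along the horizontal axis, then along the vertical axis.
    rows = sum(k[i] * padded[:, i:i + n] for i in range(5))
    return sum(k[i] * rows[i:i + m, :] for i in range(5))

def ft_saliency(lab):
    """lab: float array of shape (m, n, 3) in Lab space. Returns a map in [0, 1]."""
    lab = np.asarray(lab, dtype=float)
    # Blur each channel to suppress noise and fine texture.
    blurred = np.stack([blur5(lab[..., c]) for c in range(3)], axis=-1)
    # Mean Lab vector of the whole image (the "neighbourhood" is the full image).
    mean_vec = lab.reshape(-1, 3).mean(axis=0)
    # Euclidean distance of every blurred pixel to the mean vector.
    sal = np.linalg.norm(blurred - mean_vec, axis=-1)
    # Min-max normalization so the map can be displayed.
    span = sal.max() - sal.min()
    return (sal - sal.min()) / span if span > 0 else np.zeros_like(sal)
```

For a hyperspectral cube, the function would be applied to the Lab conversion of a colour rendering of the cube (such as the sRGB image); how that rendering is produced is outside this sketch.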

The saliency map of spectral angle
The spectral data contains a lot of information that is difficult for human eyes to observe. When spectral saliency is used, the rich information embedded in the spectral data can be fully exploited [25]. Following general multi-scale operations, the difference between the spectral response of a pixel and that of its neighbourhood can be calculated. To use the spectral features to supplement the information of the detected target, spectral saliency is used in our method and is obtained in the following steps.
Calculation of the spectral angular distance. A hyperspectral image is a three-dimensional data cube whose third dimension is the spectral dimension; the response of any spatial point across the spectral bands forms a curve, which is its spectral vector. The spectral angular distance (SAD) measures the similarity between two spectral vectors $M_c$ and $M_s$ and is used to obtain the spectral saliency. SAD is computed via Equation (9).
$$ SAD(M_c, M_s) = \arccos\left(\frac{M_c \cdot M_s}{\|M_c\| \, \|M_s\|}\right) \qquad (9) $$
Normalization of the spectral feature map. The "centre-surround" difference is used to calculate the spectral characteristics through multi-scale operation. The spectral feature maps of different scales are fused and normalized to obtain the spectral feature map. The spectral saliency map is obtained as Equation (10).
$$ Sal_{Spectral} = \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} \mathcal{N}(SAD(c, s)) \qquad (10) $$
where $\oplus$ denotes the fusion of maps across scales and $\mathcal{N}(\cdot)$ is the map normalization operator proposed by Itti et al. [11], which globally promotes maps in which a small number of conspicuous locations is present, while globally suppressing maps that contain numerous comparable peak responses. Centre-surround is implemented as the difference between fine and coarse scales: the centre is a pixel at scale $c \in \{2, 3, 4\}$, and the surround is the corresponding pixel at scale $s = c + \delta$, with $\delta \in \{3, 4\}$.
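The spectral angle itself is simple to compute: it is the angle between two spectral vectors, i.e. the arccosine of their normalized dot product. The following sketch shows this for single vectors and, through broadcasting, for a whole data cube against a reference spectrum. It is an illustration only; the function name `spectral_angle` and the small constant `eps` (which guards against division by zero for all-zero spectra) are assumptions of this sketch, and the multi-scale centre-surround pyramid is not reproduced here.

```python
import numpy as np

def spectral_angle(m_c, m_s, eps=1e-12):
    """Angle in radians between spectral vectors stored along the last axis."""
    m_c = np.asarray(m_c, dtype=float)
    m_s = np.asarray(m_s, dtype=float)
    dot = np.sum(m_c * m_s, axis=-1)
    norms = np.linalg.norm(m_c, axis=-1) * np.linalg.norm(m_s, axis=-1)
    # Clip to [-1, 1] so rounding errors cannot push arccos out of its domain.
    cos = np.clip(dot / np.maximum(norms, eps), -1.0, 1.0)
    return np.arccos(cos)
```

Because the angle depends only on the direction of the vectors, two spectra that differ by a positive scale factor (for example the same material under brighter illumination) have a distance close to zero. Passing a centre cube and a surround cube of the same shape `(m, n, bands)` returns an `(m, n)` map of per-pixel angles.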

Generation of saliency maps
The map of salient target region and the spectral saliency map obtained using spectral information are combined to generate the final saliency map, as shown in Equation (11).
where $Sal_{FT}$ is the normalized FT saliency target map and $Sal_{Spectral}$ is the normalized spectral saliency map. The final saliency map not only retains clear boundaries, but also makes the target object more complete with the supplement of spectral information. The introduction of spectral information means that, beyond the visual cues given by colour and lightness, the corresponding material properties can also be extracted, which provides visual saliency detection with additional information beyond human visual ability.
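To make the fusion step concrete, the sketch below normalizes both maps to [0, 1] and combines them. The combination rule shown here, a weighted sum followed by renormalization, is only one simple possibility chosen for illustration and should not be read as the rule of Equation (11); the function names `min_max` and `fuse` and the parameter `weight` are hypothetical.

```python
import numpy as np

def min_max(x):
    """Rescale an array to [0, 1]; a constant array maps to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def fuse(sal_ft, sal_spectral, weight=0.5):
    """Assumed fusion rule: weighted sum of the two normalized maps."""
    combined = weight * min_max(sal_ft) + (1.0 - weight) * min_max(sal_spectral)
    return min_max(combined)
```

Normalizing each map before combining keeps either source from dominating merely because of its numeric range (angles in radians versus Lab distances).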

Dataset description
The HS-SOD hyperspectral image dataset [26,27] is used to evaluate our proposed method. HS-SOD is a hyperspectral salient object detection dataset containing 60 hyperspectral images. The dataset has three parts. The first part is the hyperspectral images, which are saved in the ".mat" format and cover the spectral range 380-780 nm. The second part is the sRGB colour images corresponding to the hyperspectral images, and the third part is the corresponding ground-truth binary images of the salient objects. Some of the images are shown in Figure 2(a).

Visual effect
Images from the HS-SOD dataset are used to evaluate the performance of the proposed salient target detection method. Our method combines the FT algorithm with SAD-based spectral saliency and is called FT-SAD (FS) in this experiment. We compare it with FT-ED (FE), which combines the FT model with a spectral saliency map calculated using the Euclidean distance, and with other combinations, to further demonstrate the superiority of the FS algorithm. Salient target detection results by Itti [11], FT [17], SSO [19] and our proposed method are shown in Figure 2. Itti [11] used Gaussian sampling to construct Gaussian pyramids of the colour, lightness and direction of the image, and then used the pyramids to calculate the image feature maps. The final visual saliency map is obtained by fusing the feature maps of different scales. FT [17] used the difference of Gaussians (DoG) to calculate the saliency map. SSO [19] combined the Itti saliency detection model with spectral saliency and tested various methods on hyperspectral images of natural scenes. In Figure 2, images in column (a) are the original images; images in column (b) are the ground truth; images in column (c) are salient target detection results based on the Itti model [11]; images in columns (d)-(f) are salient target detection results based on SED-OCM-GS [19], SED-OCM-SAD [19], and SSO [19]; images in column (g) are salient target detection results based on the FT model; images in columns (h) and (j) are the results based on the combination of FT and other spectral saliency; images in column (i) are salient target detection results based on our proposed method.
As can be seen from Figure 2, all the methods can detect salient targets. However, the results in columns (c)-(f) are blurred. The results in column (g) have clear boundaries, but the description of the object is incomplete because only the FT saliency detection algorithm is applied to the hyperspectral image. The results in column (i) have clear boundaries and describe the target object more completely, mainly because we use the FT saliency model and add the detection of spectral information. A saliency detection algorithm on its own can quickly find the salient target, but its description of the target is incomplete; when the spectral information is added, the salient target is detected more fully. In general, by combining the FT model with the spectral characteristics, our method achieves better visual results than the earlier methods and also improves the accuracy of object detection. However, for the image containing people (the fifth picture) or the image with a complex background (the sixth picture), the target cannot be detected well, and the background interferes with the detection process. Overall, Figure 2 shows that FS gives the best salient target detection among the compared methods.

Objective evaluation
Precision-recall curves are calculated to provide a quantitative analysis of the Itti, SSO, FT, FE, FS, and FES methods. Precision is the percentage of real object pixels among all detected pixels, as shown in Equation (12). Recall is the percentage of real object pixels that have been detected, as shown in Equation (13).
$$ \mathrm{Precision} = \frac{tp}{tp + fp} \qquad (12) $$
$$ \mathrm{Recall} = \frac{tp}{tp + fn} \qquad (13) $$
where tp, fp, and fn represent the numbers of true positives, false positives, and false negatives, respectively. The precision-recall curves of the different methods are shown in Figure 3. From Figure 3, we can see that our proposed FS method achieves higher precision and a better recall rate than the other methods. The reason is the introduction of spectral features, which helps to improve the detection of salient objects. That is, by combining the traditional saliency detection method with spectral information, our proposed method gives better results than traditional saliency detection designed for natural images.
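The two measures can be computed directly from a saliency map and its binary ground truth. In the sketch below, the saliency map is binarized at a given threshold and the pixel counts are compared; sweeping the threshold over the range of saliency values is one common way to trace a precision-recall curve, and is an assumption here rather than a description of the exact protocol used in the experiments. The function name `precision_recall` is hypothetical.

```python
import numpy as np

def precision_recall(saliency, ground_truth, threshold):
    """Precision and recall of a saliency map binarized at `threshold`."""
    detected = np.asarray(saliency) >= threshold
    truth = np.asarray(ground_truth).astype(bool)
    tp = np.sum(detected & truth)    # object pixels that were detected
    fp = np.sum(detected & ~truth)   # background pixels wrongly detected
    fn = np.sum(~detected & truth)   # object pixels that were missed
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return float(precision), float(recall)
```

For example, with a 2 × 2 map `[[0.9, 0.8], [0.1, 0.7]]`, ground truth `[[1, 0], [0, 1]]` and threshold 0.5, three pixels are detected, two of them correctly, and no object pixel is missed, so precision is 2/3 and recall is 1.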
In some situations, recall alone cannot comprehensively evaluate the proposed method. As discussed by Liu et al. [28], the recall rate is not as important as precision for attention detection. For example, a 100% recall rate can be achieved by simply selecting the whole image. Therefore, the F-measure is used to further analyse the superiority of our method. The F-measure is the weighted harmonic mean of recall and precision under a non-negative weight $\beta$, as shown in Equation (14).
$$ F_{\beta} = \frac{(1 + \beta^2) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}} \qquad (14) $$
In our experiment, $\beta^2$ is set to 0.3, as in Achanta et al. [17], to weigh precision more than recall. Figure 4 shows the F-measure results of our method and the other methods. As shown in Figure 4, our proposed FS method is superior to the traditional saliency detection algorithms and some previous algorithms, which indicates the advantage of incorporating spectral information measures into saliency detection.
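The F-measure is the weighted harmonic mean $(1 + \beta^2) P R / (\beta^2 P + R)$ of precision $P$ and recall $R$. A short sketch with $\beta^2 = 0.3$ as the default illustrates why it penalizes the trivial "select the whole image" solution; the function name `f_measure` and the parameter name `beta_sq` are hypothetical.

```python
def f_measure(precision, recall, beta_sq=0.3):
    """Weighted harmonic mean of precision and recall; beta_sq is beta squared."""
    denom = beta_sq * precision + recall
    if denom == 0:
        return 0.0  # both precision and recall are zero
    return (1.0 + beta_sq) * precision * recall / denom
```

If the object covers 10% of the image and the whole image is selected, recall is 1 but precision is only 0.1, and the F-measure stays close to the precision (about 0.13) rather than being rewarded for the perfect recall.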
Our proposed FS algorithm combines the traditional FT saliency detection model with spectral saliency, which improves the final experimental results. First, the FT algorithm yields a target object with a clearly defined boundary, which improves the accuracy of target detection; second, the added spectral information further complements the target information. The combination therefore improves the accuracy of target detection and outperforms the other methods in the experimental evaluation. In terms of visual effect, the FS method obtains clearer target edges than the previous methods, which makes target detection more accurate, and it highlights the complete target more comprehensively than the FT model alone. From an objective point of view, we quantitatively analysed the precision-recall curves and F-measure of the FS method; Figures 3 and 4 show that the FS method performs better than the other methods.

CONCLUSIONS
A method based on visual attention to detect the saliency target of hyperspectral image is proposed in this paper. In this method, we combine the frequency-tuned model with the spectral feature of hyperspectral image. FT algorithm is used to detect the salient region of the hyperspectral image to get the saliency map and the spectral angular distance is used to measure the similarity of the spectral information to get the spectral saliency map. The final saliency map of hyperspectral image is obtained by the fusion of FT's saliency target map and spectral saliency map.
Our proposed method has two advantages. First, by introducing FT model, the visual saliency detection algorithm can quickly obtain the salient target region with good boundary. Second, the combination of FT saliency detection model with spectral saliency can make the target detection more accurate because the spectral information can be used as supplement information. Experiments show the visual and objective effectiveness of the proposed method in salient target detection. However, the proposed method cannot effectively suppress the interference of background to the detection target in the case of complex background. In the future, we will make efforts to improve the detection of saliency objects in hyperspectral image while suppressing the influence of background and try to use other saliency models in hyperspectral target detection.