Infrared imaging enhancement through local window-based saliency extraction with spatial weight

Infrared image enhancement is an effective way to address contrast reduction and detail degradation in infrared imagery. An infrared enhancement approach based on local saliency extraction is proposed here. First, saliency maps are extracted within a local window by incorporating a spatial weight. Second, as the window size changes, potential targets and details of different sizes can be extracted. Treating the window sizes as scales, saliency maps are obtained and infrared images are enhanced at different scales; finally, multi-scale fusion is used to achieve the enhancement. Eight popular infrared enhancement approaches are introduced for comparison. Subjective qualitative observation experiments show that our strategy based on local saliency analysis and multi-scale fusion can extract potential targets and areas of varying size from the source images, yielding a good enhancement effect. Meanwhile, we introduce six objective evaluation methods to measure the results, and the evaluation data prove the effectiveness of the proposed algorithm. The experiments also indicate the real-time processing capability of the proposed method. Finally, the proposed method is successfully applied in a hardware system and shows good performance.


INTRODUCTION
Because of its passive (non-active) imaging characteristics, infrared imaging is widely applied in modern military and civilian fields [1,2]. However, infrared imaging also has disadvantages, such as low target-background contrast and blurred or degraded target edges and details. Infrared image enhancement is therefore a very effective way to improve image quality. Research on infrared image enhancement is a hot topic and can be roughly divided into spatial domain enhancement and frequency domain enhancement [3][4][5]. Because of their fast implementation, histogram equalization (HE)-based methods have been successfully applied. HE methods usually rescale the dynamic range and the grey-scale histogram distribution [6,7]. However, HE approaches can lead to weak enhancement, loss of detailed information, and amplification of image noise. Considering these disadvantages of HE, several improved algorithms have been designed, including an enhancement method using an adjacent-blocks-based modification for local HE [8], contrast limited adaptive histogram equalization (CLAHE) [9], adaptive gamma correction based on cumulative HE [10] and adaptive histogram partition and brightness correction (AHPBC) [11]. To enhance contrast and details, Ashiba et al. propose three ideas for infrared enhancement [12]: the first makes use of gamma correction with histogram matching; the second depends on hybrid gamma correction with CLAHE; the third utilises a trilateral enhancement comprising three stages: segmentation, enhancement and sharpening. These methods can highlight the target areas well, but they may create negative effects, such as an enhanced background and amplified noise. For the problem of low dynamic range, researchers propose a local edge-preserving filter-based method (LDPF) [13], which highlights details well.
In LDPF, the authors use entropy-guided gamma correction to adjust the base layer, but gamma correction may amplify noise and distort the image. Moreover, to improve the visual effects of degraded scenes, a water-surface infrared image enhancement method based on radiative transfer theory has been proposed [14]. Some multi-level or multi-scale spatial domain enhancement methods, such as the multi-scale top-hat transform (MSTH) method [15], multi-level image enhancement [16] and sub-band decomposition [17], can all produce results with good visual effect. However, these algorithms may produce unsatisfactory results, amplifying the grey levels of the target area and introducing noise, and such multi-level or multi-scale methods work relatively slowly. Image enhancement approaches based on convolutional neural networks (CNN) and deep learning have also been studied recently [18][19][20]. Inspired by Retinex theory, Zhang and his partners build a simple yet effective network for kindling the darkness (KinD) [18], which performs well in contrast and detail enhancement. An image enhancement method using zero-reference deep curve estimation (Zero-DCE) has been proposed [20], which formulates enhancement as a task of image-specific curve estimation with a deep network. Kuang et al. present a single infrared image enhancement method using a deep CNN [19], which produces results with good contrast. CNNs and deep learning are powerful, but they need accurate, suitable and sufficient training data. The data available for infrared enhancement may not satisfy these demands, so deep learning-based methods need more data to be improved for real applications.
Visual effect is one of the most important factors for infrared enhancement. Drawing on human visual mechanisms, saliency analysis can reflect the ability of objects in a scene to attract visual attention. Thus, visual saliency is introduced into infrared enhancement. Visual saliency has been a research hotspot in computer vision in recent years [21]. Many studies show that this theory has been applied well in infrared enhancement and related tasks, including object detection [22], image fusion and image enhancement [23].
Here, we propose a local saliency extraction-based infrared image enhancement method. The first main contribution and novelty is a grey-distance idea based on a local window with spatial weight, which is utilised for saliency extraction. Second, to emphasise potential targets and details of different sizes, we achieve enhancement through fusion at different scales based on multi-window saliency extraction.

THEORY AND METHOD
When observing a natural scene or an image, the human visual system (HVS) quickly scans the whole scene or image and focuses on the regions of interest [24]. To quantitatively measure how much those areas attract the attention of the HVS, researchers assign different weight values to those regions to form a map of the same size, named the visual saliency map. Simulating this ability of the HVS, we design an algorithm for visual saliency extraction to improve the effectiveness of our enhancement.
Here, the saliency map V is obtained with our visual saliency extraction algorithm. V ∈ [0,1] denotes the weight distribution with which image f attracts the attention of the HVS. For an arbitrary pixel (i,j), a value V(i,j) closer to 1 indicates more attention from the HVS; on the contrary, a value V(i,j) closer to 0 means the pixel is more likely to be ignored by the HVS. We propose an algorithm based on pixel value distance and a local window with spatial weight.

Local window-based saliency extraction
Since the HVS is sensitive to image contrast, researchers present a histogram-based contrast (HC) method to obtain a saliency map from the colour statistics of the original image [25], operating in L*a*b* space. Inspired by this work, we use pixel value distance in infrared images to achieve saliency extraction. The saliency value V(i,j) at pixel (i,j) is defined as:

V(i,j) = Σ_{∀(x,y)∈f} D_g(f_ij, f_xy)    (1)

where f denotes the original image, and (x,y) is an arbitrary pixel within f. f_ij and f_xy represent the grey value at the corresponding pixel, and ∀(x,y) ∈ f means any arbitrary pixel (x,y) in f. So, according to Equation (1), every pixel (x,y) in f contributes to the saliency of (i,j). D_g(⋅) denotes the grey value distance between the two values in the bracket, that is, D_g(f_ij, f_xy) = |f_ij − f_xy|. There are potential targets of different sizes, so we intend to highlight features of different targets, especially small ones. Since global saliency extraction as determined by Equation (1) cannot be adjusted for different target sizes, this kind of approach sometimes fails. We therefore design a local strategy. In addition, considering the spatial correlation between neighbouring pixels, pixel-to-pixel correlations need to be taken into account when calculating pixel differences. Thus, we impose a spatial weight matrix W on the local saliency extraction. The spatial relationship is shown in Figure 1, and Equation (1) is rewritten as follows:

V(i,j) = Σ_{(x,y)∈∁_U Ω} W(Ω, xy) · D_g(f̄_Ω, f_xy)    (2)

where U denotes a local window of size M×M centred at (i,j), and Ω denotes a local disk-like region of radius r, also centred at (i,j), with Ω ⊂ U. ∁_U Ω means the area of U except Ω. f̄_Ω denotes the average grey value within the local region Ω. In Equation (2), W(Ω, xy) represents the spatial weight for the position (x,y), which is determined by the spatial distance between Ω and (x,y).
Because of the spatial correlation between neighbouring pixels, we assign small weights to near pixels and large weights to distant ones; that is, the nearest pixel should receive the smallest weight. For an arbitrary pixel (x,y) outside Ω in Figure 1(b), the corresponding W(Ω, xy) is defined through σ², the dilation factor, and D_s(Ω, xy), the spatial distance between Ω and pixel (x,y) measured in pixels, as given in Equation (3). From the definition, the closer (x,y) is to Ω, the smaller W(Ω, xy) becomes, and hence the smaller the influence of f_xy on f̄_Ω, which includes f_ij. The nearest pixel to Ω is just one pixel away, i.e. it satisfies D_s(Ω, xy) = 1. Since the maximum weight is 1, decreasing it by one or two orders of magnitude is enough to keep this influence small, giving the range W = 0.01-0.1. We select the middle value W = 0.05 as the weight for the nearest pixel from Ω in U. Substituting W = 0.05 into Equation (3) fixes σ² = 4.0697. The computation is operated within an M × M local window as shown in Figure 1. The saliency value V(i,j) is calculated as follows:

• U is segmented from image f as shown in Figure 1(a).
• In the centre of U, a disk-like region Ω with radius r is extracted.
• Using Equation (2), the saliency value of the centre pixel is calculated for patch U and treated as V(i,j).
Thus, following the above three steps, V(i,j) is obtained. To get the whole map V, we move the local window pixel-by-pixel. Proper sizes M and r highlight salient objects, regions and pixels, and M is an odd number. Finally, the function for V can be shortly rewritten as func(⋅) in Equation (5). The final saliency map or matrix V needs to be normalized.
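As an illustration, the window-sliding procedure above can be sketched in Python. The exact weight formula and σ² of Equation (3) are not reproduced in the extracted text, so `spatial_weight` below is an assumed saturating function, scaled only to satisfy the stated boundary condition W = 0.05 for the nearest pixel; the border padding and the distance measure `D_s` are likewise illustrative choices, not the authors' implementation.

```python
import numpy as np

def spatial_weight(D_s, w_near=0.05):
    """Assumed saturating weight: smallest (w_near) at the nearest pixel
    (D_s = 1) and approaching 1 for distant pixels, as the text describes.
    The paper's exact formula and its sigma^2 = 4.0697 are not reproduced;
    sigma2 here is chosen only so that W(1) = w_near."""
    sigma2 = 1.0 / (2.0 * np.log(1.0 / (1.0 - w_near)))
    return 1.0 - np.exp(-np.asarray(D_s, dtype=np.float64) ** 2 / (2.0 * sigma2))

def local_saliency(f, M=15, r=1):
    """Sketch of Equation (2): for every pixel, sum the weighted grey
    distances between the mean of the central disk Omega and the window
    pixels outside Omega, sliding the window pixel-by-pixel."""
    f = f.astype(np.float64)
    H, W = f.shape
    h = M // 2
    pad = np.pad(f, h, mode='reflect')        # border handling (assumed)
    V = np.zeros_like(f)

    # Geometry of one M x M window: disk mask Omega (radius r) and an
    # approximate pixel distance D_s from each outside pixel to Omega.
    yy, xx = np.mgrid[-h:h + 1, -h:h + 1]
    d = np.sqrt(yy ** 2 + xx ** 2)
    omega = d <= r
    D_s = np.maximum(np.ceil(d - r), 1.0)
    W_sp = spatial_weight(D_s) * ~omega       # zero weight inside Omega

    for i in range(H):
        for j in range(W):
            U = pad[i:i + M, j:j + M]
            f_omega = U[omega].mean()         # mean grey value within Omega
            V[i, j] = np.sum(W_sp * np.abs(U - f_omega))

    V -= V.min()                              # normalise V into [0, 1]
    return V / V.max() if V.max() > 0 else V
```

Sliding the window in pure Python is slow; a practical implementation would vectorize the inner loop, which matters for the real-time use discussed later.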

Analysis for local calculation
In Section 2.1, we introduced the local calculation and the spatial weight. What is the role of these two designs? We briefly analyze this question here. Figure 2 shows examples of spatial weight matrices generated by Equation (3). Examples of saliency maps are shown in Figure 3, comparing our local method with the global idea [25]; (e) is the global method's result. Characteristics of different sizes can be well extracted with proper parameters under the local strategy, which benefits the subsequent image enhancement, and r = 1 is the most suitable for our enhancement. We will further discuss this in Section 3. Thus, Equation (5) is further improved into Equation (6).

Enhancement
We can generate a saliency map V by utilizing the above theory, and in our design, areas and pixels with relatively salient values should be highlighted. With a proper size of the local window, potential targets and details of different sizes can be well extracted. Thus, we adopt a multi-window design producing multiple saliency maps to achieve enhancement for the potential targets and details of different sizes. Different target regions can be highlighted at different scales, which are determined by the selection of the local window M × M. For these multiple windows, we assume n scales, and the local window at each scale k (1 ≤ k ≤ n) is determined as follows:

M_k = M_0 + (k − 1)·∆m    (7)

where M_0 represents the initial or basic size, and ∆m is the incremental size step. Thus, using multiple saliency analyses, the kth (1 ≤ k ≤ n) saliency map V_k is produced according to Equation (6); this gives Equation (8). E_k denotes the intermediate result with the potential target region to be enhanced at the kth scale, and E_k is defined by:

E_k = V_k ∘ f    (9)

where ∘ denotes the component-wise multiplication operator. As M_k changes, potential target regions of different sizes are extracted for our enhancement. Furthermore, the intermediate result with enhanced target regions at multiple scales, E_s, is obtained by Equation (10). Making use of Equation (10), we aim to enlarge the contrast of the potential target areas while keeping the general background information. Finally, a fusion strategy is designed:

E = α1·f + α2·E_s    (11)

where E denotes the final result, f is the original infrared image, E_s is from Equation (10), and α1 and α2 are adjusting factors. A smaller α1 and larger α2 produce a result with brighter or darker salient regions, which is particularly useful for further target detection and recognition, but they also lead to a poor visual effect in background regions. So, the two factors should be balanced.
Usually, α1 = α2 = 0.5 balances the enhancement and the visual effect.
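The multi-scale pipeline of Equations (7)-(11) can be sketched as follows. The merging rule of Equation (10) is not spelled out in the extracted text, so a plain average over scales is assumed here, and the fusion of Equation (11) is taken as the linear combination α1·f + α2·E_s; `saliency_fn` stands for any single-scale saliency extractor such as func(⋅) of Equation (6).

```python
import numpy as np

def multiscale_enhance(f, saliency_fn, M0=15, n=3, dm=10, a1=0.5, a2=0.5):
    """Sketch of the multi-scale enhancement. saliency_fn(f, M) must
    return a saliency map V_k in [0, 1] for window size M."""
    f = f.astype(np.float64)
    E_k_list = []
    for k in range(1, n + 1):
        M_k = M0 + (k - 1) * dm        # Equation (7): window size at scale k
        V_k = saliency_fn(f, M_k)      # Equation (8): saliency map at scale k
        E_k_list.append(V_k * f)       # Equation (9): component-wise product
    E_s = np.mean(E_k_list, axis=0)    # Equation (10): merge scales (assumed mean)
    E = a1 * f + a2 * E_s              # Equation (11): fuse with the source
    return np.clip(E, 0, 255)          # assumes 8-bit grey range
```

With a1 = a2 = 0.5 the output keeps the source image's general appearance while lifting salient regions, mirroring the balancing argument above.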

Implementation
In Figure 4, we show a flow diagram of the proposed approach. Our method mainly contains three steps as follows:

FIGURE 4: The flow diagram of our approach

Step 1. Generate saliency maps for n scales according to Equation (8), referring to Section 2.1.
Step 2. Apply Equation (9) to produce the images with enhanced potential target regions, and obtain the multi-scale enhanced target regions E_s using Equation (10).
Step 3. Fuse the original image f with E_s using Equation (11) to obtain the final result E.
In our method, the parameters to be set include the local window M (initial size M_0, number of scales n, and incremental size step ∆m) and the radius r.

EXPERIMENTAL RESULTS AND ANALYSIS
We run MATLAB R2017a on a personal computer with a dual-core CPU (2.2 GHz) for the experiments.
The parameters M and r should be set. The size of the window M is closely related to the locality of the target in the infrared image: a very small M cannot highlight the differences between centre pixels and surrounding pixels, while a very large M cannot characterize local regions. Moreover, since the local window needs a central pixel, the window size M should be an odd number. The parameter r mirrors the difference between the pixel values within the r radius and those of the rest. If r becomes larger, the saliency map becomes more blurred. A blurred saliency map is good for large objects, while a less blurred one is more suitable for small targets in the infrared image. According to our experience, we set the local window parameters M_0 = 15, n = 3 and ∆m = 10, and r = 1. We will discuss them in Section 4.
To prove the effectiveness of the proposed algorithm, eight popular infrared enhancement approaches are selected for comparison: HE [6], radiative transfer and edge weight analysis (RTEWA) [14], CLAHE [9], multi-scale top-hat transform (MSTH) [15], adaptive histogram partition and brightness correction (AHPBC) [11], the local edge-preserving filter method (LDPF) [13], and two deep learning-based methods, KinD [18] and Zero-DCE [20]. Because enhanced versions of low-contrast infrared images are lacking, visible images are usually used for training deep learning-based methods. In this paper, we use the two approaches as released, since their authors provide the code and training process.
Meanwhile, to verify the nine algorithms, six images are adopted, as shown in Figure 5; they are collected from open-source datasets, especially from [14,15,26]. These images contain potential targets of different sizes, including buildings, a bridge, ships, vehicles and persons. We have tried our best to adjust the parameters of each method to obtain its best performance. Figures 6-7 show the corresponding results.

Subjective qualitative observation
HVS inspection is one of the most valid ways to evaluate the quality of enhanced images. Thus, we perform a subjective qualitative observation of the results produced by the nine algorithms on the six images. As shown in Figure 5, the six original images are collected from open-source datasets, especially from [14,15,26]. In Figure 6, (a0) (b0) (c0) are three source images named buildings on the other side, bridge and traffic, and road and buildings. In Figure 7, (a0) (b0) (c0) are three source images named a ship, a yacht, and two yachts. In both figures, (ai) (bi) (ci) (i = 1, 2,…, 9) are the corresponding results using the RTEWA, HE, CLAHE, MSTH, AHPBC, LDPF, KinD, Zero-DCE methods, and the proposed algorithm, respectively. Figures 6(a1)-(a9) show the enhanced images corresponding to Figure 6(a0) using the nine methods. According to these results, (a1) looks good with appropriate contrast but unclear details, and there are some shadows in (a1). In (a2) and (a3), the HE and CLAHE methods lead to worse visual effects, as some regions seem over-enhanced. (a4) is better than (a2) and (a3), which means MSTH performs better than HE and CLAHE, but MSTH fails to enhance some buildings. The result of AHPBC produces a good cargo ship but over-enhanced buildings, while LDPF makes the image low in contrast, with greatly smoothed details. (a8) has poor contrast, and (a7) seems better than (a8) but contains background noise. The result in (a9) looks the best, as the proposed algorithm strikes a good balance between enhancing potential targets and suppressing the background.
In Figure 6(b0), the potential targets consist of bridges, vehicles, and buildings, and there exist both small and large target regions with rich details. The enhanced results are shown in (b1)-(b9). In Figures 6(b2) and (b5), the HE and AHPBC algorithms over-enhance some areas. According to (b6), LDPF's result shows greatly smoothed details with low contrast. Meanwhile, CLAHE leads to a worse visual effect with noise amplification. Images (b1) and (b7) look better; both improve the contrast of details. In image (b9), the proposed method also performs well, enhancing potential targets and improving contrast. Figure 6(c0) has several potential targets, including four people and details of buildings. The enhanced results are shown in Figures 6(c1)-(c9). In this figure, small targets are more prominent, especially the four people. The HE and CLAHE methods produce much noise, leading to uncomfortable visual effects in (c2) and (c3). (c4) is better than (c2) and (c3), but the road has grey distortion, and some areas are over-highlighted. In (c6), the contrast looks low and the details are greatly smoothed. (c8) has poor contrast. Image (c1) looks better than (c5) and (c6), though its contrast is still unsatisfactory. Image (c7) also produces a good result, but with some background noise. For our algorithm, (c9) keeps more details than (c1) and (c7) and enhances target regions better. The proposed method achieves the best visual effects. Figure 7(a0) contains a ship with abundant details, and Figures 7(a1)-(a9) show the results of all nine methods. The HE and CLAHE algorithms both enhance the potential target areas, but they create a lot of noise; (a2) in particular is over-enhanced. According to image (a4), the MSTH method enhances the ship well, but it also produces a distorted background, such as the sea spray. The result of LDPF in (a6) has low contrast.
The background of (a1) seems too dim. The results in (a5) and (a9) have better visual effects. (a8) has somewhat low contrast, while (a7) looks very good, with good contrast and abundant details.
The potential target in Figure 7(b0) is a yacht, which is small compared with the previous image, and the differences between the nine methods' results in Figures 7(b1)-(b9) are even starker. For HE, CLAHE, and LDPF, images (b2), (b3) and (b6) enhance the target so poorly that the yacht can hardly be found. In image (b4), the yacht is enhanced greatly, while the background is fuzzy, and in (b5) and (b7) much noise is produced. (b8) has low contrast. (b9) is better than (b1) in contrast. The proposed method preserves more details and achieves a better visual effect than the other methods.
The potential targets in Figure 7(c0) are two small assault boats, which look too vague to distinguish in the original image, so they need to be greatly enhanced. The results of all nine algorithms are shown and compared in Figures 7(c1)-(c9). In (c2), the HE method creates an over-enhanced result. According to (c3) and (c6), the results of CLAHE and LDPF cannot highlight the targets enough. In (c4) and (c7), the background is amplified. (c1), (c5), and (c9) perform better than the others. The contrast of (c9) looks the best and strikes a good balance between potential target enhancement and background suppression.

Further subjective experiment
In addition to the above intuitive visual assessment, a subjective evaluation experiment is needed to evaluate the enhancement effect. Following Ref. [27], our subjective metric includes four factors: Description of infrared image (DIR), Contrast (Con), Edge Feature (EF) and Background Noise (BN). DIR denotes the similarity of characteristic information between the result and the source image. Con means the contrast level of the result; we consider the contrast of the whole image as well as local regions. EF represents the sharpness of edges and contours of objects and reflects the enhancement of target details. BN mirrors the negative effects of an enhancement method, especially the noise level. We use four levels, from worst to best, to score each factor: '1' means very bad, '2' means not good, '3' means acceptable but not very good, and '4' means very good. For every participant, the source image and the results produced by all methods are shown together, and a score is given based on the above four factors, so we obtain an average score of each factor for each enhanced image. We invited a total of 50 people from our college to join the study. The subjective results for Figures 6 and 7 are given in Tables 1 and 2: Table 1 lists the subjective evaluation for Figure 6(ai) (bi) (ci) (i = 1, 2,…, 9), and Table 2 for Figure 7(ai) (bi) (ci) (i = 1, 2,…, 9). By design, every figure gets a score between 1 and 4, and a higher score means a better subjective assessment. From Table 1, we can analyze the four factors. For the 'DIR' factor, HE and the proposed method are better. Our approach performs best in the 'Con' comparison. For the 'EF' factor, RTEWA and our method seem better, and the proposed algorithm produces the best results in the 'BN' comparison. Overall, the proposed method is the best in the subjective experiment.
According to Table 2, we can also analyze the four factors. For the 'DIR' factor, the proposed method performs best. RTEWA and our approach are outstanding in the 'Con' comparison. In the 'EF' factor, AHPBC and our method get better scores. In the 'BN' comparison, the proposed method, RTEWA, and MSTH are the top three. In general, the proposed approach performs best in subjective evaluation.

Objective evaluation
We introduce six objective evaluation methods: the linear index of fuzziness (LIF) [28], the measure of enhancement (EME) [29], information entropy (Entropy) [30], the lightness order error (LOE) [31], the natural image quality evaluator (NIQE) [32] and the tone-mapped image quality index (TMQI) [33]. These can all be used for objective evaluation of infrared enhancement. Next, the definition of each metric is introduced. The LIF metric is defined in Equation (12), where f(x,y) denotes the grey value at pixel (x,y), min() and max() are operators that calculate the minimum and maximum value, respectively, and W and H represent the width and height of the image (in pixels). The smaller the LIF value, the better the enhanced image.
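Equation (12) itself is not reproduced above, but a common form of the linear index of fuzziness that is consistent with the min()/max() operators mentioned can be sketched as follows (an illustrative assumption, not necessarily the exact formula of Ref. [28]):

```python
import numpy as np

def lif(f):
    """One common form of the linear index of fuzziness: map grey values
    to a membership mu in [0, 1] via the image min/max, then average
    2 * min(mu, 1 - mu). 0 = crisp image, 1 = maximally fuzzy."""
    f = f.astype(np.float64)
    mu = (f - f.min()) / (f.max() - f.min() + 1e-12)   # membership in [0, 1]
    return 2.0 * np.mean(np.minimum(mu, 1.0 - mu))
```

A perfectly bimodal (crisp) image scores near 0, while mid-grey pixels push the score towards 1, matching the "smaller is better" reading used in the tables.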
The measure of enhancement (EME) evaluates the dynamic range of the whole image [29]; a larger EME means a larger dynamic range, which indicates better quality of the enhanced result. In the EME approach, image f is divided into N = z1 × z2 blocks, each block is denoted B(p, q) (p = 1, 2,…, z1; q = 1, 2,…, z2) with a size of m1 × m2, and EME is determined by Equation (13), where f_max,B(p,q) and f_min,B(p,q) represent the maximum and minimum of B(p, q), respectively, and a tiny constant is added to avoid an ill-posed division.
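A minimal sketch of the EME computation of Equation (13), assuming the usual 20·log10 contrast ratio per block as in Ref. [29] and a tiny constant `eps` to avoid division by zero:

```python
import numpy as np

def eme(f, z1=4, z2=4, eps=1e-4):
    """Measure of enhancement: split the image into z1 x z2 blocks and
    average the log contrast ratio max/min of each block."""
    f = f.astype(np.float64)
    H, W = f.shape
    bh, bw = H // z1, W // z2                      # block size m1 x m2
    total = 0.0
    for p in range(z1):
        for q in range(z2):
            B = f[p * bh:(p + 1) * bh, q * bw:(q + 1) * bw]
            total += 20.0 * np.log10((B.max() + eps) / (B.min() + eps))
    return total / (z1 * z2)
```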
The third metric is information entropy, which measures the information the image contains: the larger the entropy value, the more information the image contains and the better the image. The entropy is defined by Equation (14), where L represents the number of grey levels of f and i is a grey value; for example, if f is an 8-bit image, L = 2^8 = 256. P_f(i) is the probability of grey value i in the whole image f. The LOE metric is based on the lightness order error between the original image and its enhanced version, and is defined in Equation (15), where W and H represent the width and height of the image (in pixels), (x,y) denotes a pixel coordinate, and RD represents a difference matrix of the lightness; further details can be found in Ref. [31]. The smaller the LOE value, the better the lightness order and the enhanced result. The NIQE evaluates image quality by measuring the distance between a quality-aware natural scene statistics (NSS) feature model and a multivariate Gaussian (MVG) fit to the features. The expression of NIQE is complex; more details can be found in Ref. [32]. A smaller NIQE value means better quality of the enhanced image.
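The entropy of Equation (14) can be computed directly from the grey-level histogram; a minimal sketch:

```python
import numpy as np

def entropy(f, L=256):
    """Shannon entropy of the grey-level histogram:
    -sum over i of P_f(i) * log2(P_f(i))."""
    hist = np.bincount(f.astype(np.int64).ravel(), minlength=L)
    p = hist / hist.sum()
    p = p[p > 0]                       # empty bins contribute 0 * log 0 := 0
    return float(-np.sum(p * np.log2(p)))
```

A constant image yields 0 bits, while an image using all 256 grey levels equally often would reach the maximum of 8 bits.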
By combining a multi-scale signal fidelity measure and a naturalness measure, the TMQI is defined as:

TMQI = a·S^α + (1 − a)·N^β    (16)

where a ∈ [0,1] adjusts the relative importance of the two terms, and the sensitivities are determined by α and β. N is the statistical naturalness measure, and S is the overall structural fidelity [33]. These six objective metrics are utilized to evaluate all results of the above six images processed by the nine algorithms. Since there is a lot of data, we list four tables. Tables 3 and 5 show the former three metrics (LIF, EME, and entropy) for Figures 6 and 7, respectively. Tables 4 and 6 show the latter three metrics (LOE, NIQE, and TMQI) for Figures 6 and 7, respectively.
In Table 3, smaller LIF, larger EME, and larger entropy denote better image quality, respectively. From Table 3, we can compare the proposed method with the others. The proposed method obtains the smallest LIF value, which indicates that our algorithm is the best. For EME, the proposed method is the second best; HE appears better on the EME metric, but the subjective evaluation showed that HE creates over-enhanced results. Finally, the entropy metric indicates that the proposed method achieves almost the second-best performance, while KinD achieves the best.
In Table 4, smaller LOE, smaller NIQE, and larger TMQI indicate better image quality, respectively. For LOE and NIQE, HE seems the best; we believe the over-enhanced images affect these two metrics. LDPF also performs well, and the proposed method ranks third. In the TMQI comparison, our method performs the best.
According to the results of the six metrics for Figure 6 in Tables 3 and 4, each method has its own advantages, but the proposed one obtains the overall best performance.
In Table 5, smaller LIF, larger EME, and larger entropy denote better image quality, respectively. From Table 5, the proposed method and RTEWA are the top two in the LIF comparison. For EME, the Zero-DCE method seems the best, and many methods create results worse than the source image; the proposed method does not perform well enough here. In the entropy comparison, Zero-DCE performs best, and our method sits in the middle.

TABLE 3: The calculated former three metrics for Figure 6; smaller LIF, larger EME, and larger entropy indicate better image quality, respectively

TABLE 4: The calculated latter three metrics for Figure 6; smaller LOE, smaller NIQE, and larger TMQI indicate better image quality, respectively

TABLE 5: The calculated former three metrics for Figure 7; smaller LIF, larger EME, and larger entropy indicate better image quality, respectively

TABLE 6: The calculated latter three metrics for Figure 7; smaller LOE, smaller NIQE, and larger TMQI indicate better image quality, respectively

In Table 6, smaller LOE, smaller NIQE, and larger TMQI indicate better image quality, respectively. For LOE, HE and the proposed method perform the best. In the NIQE comparison, HE looks the best, and our method sits in the middle. For the TMQI metric, our algorithm is the best one.
According to the results of the six metrics for Figure 7 in Tables 5 and 6, each method has its advantages. Overall, the proposed method and Zero-DCE are the two best.
Because of the large amount of data, the tables are easy to misread. Following Ref. [34], we use box plots over all test images so that the performance measures can be observed more easily. The numerical results are shown visually in Figure 8, where (a)-(f) correspond to LIF, EME, entropy, LOE, NIQE, and TMQI, respectively. The source image is included as a reference and is ignored when ranking the methods.
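A box plot summarizes each metric's distribution with a five-number summary; the statistics behind Figure 8 can be sketched as follows (using the common Tukey 1.5·IQR whisker convention, an assumption about how the plots were drawn):

```python
import numpy as np

def box_stats(values):
    """Five-number summary behind a box plot:
    (lower whisker, Q1, median, Q3, upper whisker)."""
    v = np.asarray(values, dtype=np.float64)
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    lo = v[v >= q1 - 1.5 * iqr].min()    # lower whisker (Tukey convention)
    hi = v[v <= q3 + 1.5 * iqr].max()    # upper whisker; points beyond are outliers
    return lo, q1, med, q3, hi
```

Collecting, say, the LIF values of one method over all six test images and passing them to `box_stats` reproduces the box drawn for that method in Figure 8(a).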
As shown in Figure 8(a), the smallest LIF value belongs to the proposed method, which also has the smallest mean LIF; this indicates the best performance of our approach. Figure 8(b) shows the box plot of EME over the test images: the Zero-DCE method obtains the best performance, and the proposed method ranks seventh in this category. The entropy box plot is shown in Figure 8(c), where KinD obtains the highest value and the proposed method is the third highest among the enhancement methods. The LOE measures are shown in Figure 8(d): MSTH and the proposed method have the smallest LOE, indicating that these two perform best. The NIQE box plot is shown in Figure 8(e), where HE and AHPBC seem the best, but the proposed method, KinD, MSTH, and CLAHE are very close to them. Figure 8(f) shows the box plot of TMQI: the proposed method, RTEWA, HE, MSTH, and LDPF have the largest TMQI values, and according to the mean TMQI, the proposed approach ranks first. Overall, the box plots confirm the relative superiority of the proposed method.

Efficiency
For the efficiency comparison, we also provide a table analyzing the different methods. From Table 7, HE and CLAHE are the first and second most efficient methods; they finish the enhancement rapidly. The proposed method ranks third among the nine algorithms, and the RTEWA method is close to it. MSTH, LDPF, and KinD run slowly, and Zero-DCE and AHPBC are much slower. From the above comparison, we can see the differences in efficiency between the methods, especially the differences in order of magnitude. By comparison, our method is relatively fast, and the current speed can support a real system with algorithm acceleration.

Parameters
In Section 2.1, we introduced the local calculation and the spatial weight. According to Section 2.4, to implement our method, we need to select several important parameters, namely the size of the local window M and the radius r.
Since the window needs a central pixel, the window size M is generally odd. M is closely related to the locality of targets in the infrared image: if the window is too small, it cannot quantitatively reflect the differences between the pixels around the centre, and if it is too large, it cannot reflect the information characteristics of the local area. The local difference radius r refers to the difference between the pixel values within radius r and the remaining pixel values.

FIGURE 9
Saliency maps with different r (r = 1, 3, 5, 7, 9 and 11), M = 35

The larger this difference radius is, the less accurately the saliency map can highlight the information characteristics of the local area. According to our experimental experience, the saliency map becomes ambiguous as r grows, and we find r = 1 or 2 works best. Figure 9(a) shows an original image, and Figures 9(b)-(g) correspond to r = 1, 3, 5, 7, 9 and 11 with M = 35. The results show that as r becomes larger, the saliency map becomes more blurred, the information characteristics of the local area are highlighted less accurately, and the enhancement of small targets degrades: the outlines of the characters in Figures 9(b)-(g) grow progressively blurred. When M or r changes, the enhanced infrared image also changes. As shown in Figure 10, we run experiments on six images and evaluate the enhanced results with the objective metric LIF. The relationship between LIF and M is shown in Figure 11(a); since a smaller LIF indicates better image quality, the optimal M = 35, and the enhancement result degrades when M grows larger. The relationship between LIF and r is shown in Figure 11(b); again because a smaller LIF indicates better quality, the optimal r = 1 or 2, and the result degrades when r is large.
From the above analysis, we set the local window M with initial size M0 = 15, scales n = 3 and incremental step ∆m = 10 (so M = 15, 25, 35), and r = 1.
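The parameter choices above can be made concrete with a sketch. The exact saliency formula is defined in Section 2.1; here we assume, for illustration only, a Gaussian spatial weight and a saliency defined as the weighted absolute difference between each pixel and its window neighbours (the helper names `gaussian_weights`, `local_saliency`, and `multiscale_saliency` are ours, not the paper's):

```python
import numpy as np

def gaussian_weights(M, sigma):
    """Normalised M x M Gaussian spatial-weight matrix, centre excluded."""
    r = M // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    w = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    w[r, r] = 0.0                     # the centre pixel is not its own neighbour
    return w / w.sum()

def local_saliency(img, M=15, sigma=None):
    """Spatially weighted absolute difference between each pixel and its
    neighbours inside an M x M window (M odd)."""
    img = img.astype(np.float64)
    if sigma is None:
        sigma = M / 6.0               # assumed default; not from the paper
    r = M // 2
    w = gaussian_weights(M, sigma)
    pad = np.pad(img, r, mode='reflect')
    sal = np.zeros_like(img)
    H, W = img.shape
    for dy in range(-r, r + 1):       # loop over all window offsets
        for dx in range(-r, r + 1):
            shifted = pad[r + dy:r + dy + H, r + dx:r + dx + W]
            sal += w[dy + r, dx + r] * np.abs(img - shifted)
    return sal

def multiscale_saliency(img, M0=15, n=3, step=10):
    """Fuse saliency over window sizes M0, M0+step, ... (here by averaging;
    the paper's fusion rule may differ)."""
    maps = [local_saliency(img, M0 + i * step) for i in range(n)]
    return np.mean(maps, axis=0)
```

With M0 = 15, n = 3 and step = 10, `multiscale_saliency` covers exactly the window sizes M = 15, 25, 35 selected above.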

Further attempts in a hardware system
To test our algorithm further, we apply it to a hardware system. The method is coded and run on an FPGA (Field Programmable Gate Array) platform. Figure 12 shows the setup of the whole system, including the infrared camera (infrared lens, camera component), the FPGA-based processing circuit, data transmission, and the output display. The camera is a long-wave infrared (LWIR) camera purchased from Zhejiang Zhao Sheng Technology CO., LTD (http://www.suncti.com/). This uncooled 384×288 IRFPA camera has a detector pixel size of 17 μm, operates in the 8-14 μm band, and captures 30 frames per second. In our setup, the focal length is 10 mm, and the camera needs only two dry batteries to work. With the manufacturer's support, the FPGA acquires frames quickly, so our main work is code optimization on the FPGA. To accelerate processing, we precompute and store the spatial weight matrices for the local windows, keep the scale number n at no more than 3, and approximate some values, such as π and σ².
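Precomputing the spatial weight matrices for the FPGA typically means converting them to fixed-point lookup tables, since floating-point arithmetic is expensive in logic. A minimal sketch of the idea, assuming a Q16 fractional format (the format and helper names are our illustration, not the actual FPGA implementation):

```python
import numpy as np

def to_fixed_point(w, frac_bits=16):
    """Quantise a normalised float weight matrix to integers (Q16 LUT)."""
    return np.round(w * (1 << frac_bits)).astype(np.int64)

def weighted_sum_fixed(vals, qw, frac_bits=16):
    """Integer multiply-accumulate followed by a right shift, mirroring
    what a DSP block on the FPGA would compute."""
    acc = int((vals.astype(np.int64) * qw).sum())
    return acc >> frac_bits

# Example: a small normalised weight kernel and a flat patch of value 100.
w = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=np.float64)
w /= w.sum()
patch = np.full((3, 3), 100)
result = weighted_sum_fixed(patch, to_fixed_point(w))
```

Because the weights are normalised, the fixed-point weighted sum of a flat patch reproduces the patch value, which makes this an easy sanity check for the stored LUTs.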
At present, the algorithm achieves about 20-30 frames per second in this system, which meets the real-time requirement of our application. However, there is room to accelerate the method further, especially for high-resolution cameras. Moreover, the whole system can be further assembled and integrated into a standalone module.

CONCLUSION
In this paper, we have designed a local saliency extraction-based infrared enhancement algorithm. Saliency analysis is performed within a local window with spatial weight, so potential targets and details of different sizes can all be enhanced as the local window size changes.

FIGURE 12
Setup of our infrared imaging system

Utilizing multi-window saliency extraction, we achieve enhancement through fusion at different scales. Eight popular infrared enhancement approaches are introduced for comparison. Subjective qualitative observation indicates that the proposed method achieves the best visual effects, and the subjective experiment confirms this. Six objective evaluation methods are selected to measure the results, and a large amount of evaluation data proves the effectiveness of the proposed algorithm. Box plots are introduced for easy observation and comparison, and the graphs of the six objective evaluation methods demonstrate the relative superiority of the proposed method. In the efficiency comparison, the proposed method ranks third among the nine approaches. The parameters are also discussed, and our illustration indicates that a proper M and a small r should be adopted. The experiments and discussions demonstrate that the local saliency-based strategy can extract feature information of different sizes from source images to obtain a good enhanced result. Since our method obtains a good enhanced result from a single image, it is suitable for single-image processing and applications such as target detection, scene surveillance, and other related fields, which will be our next step.
One key open problem is the automation of the algorithm, especially the selection of the local window size. In the future, the local window size should be made self-adaptive by automatically analyzing image content.
We have tested the proposed algorithm in the hardware system, but its application in high-speed imaging systems is still limited. Thus, further acceleration of our method will also be a focus of future work.