Window-aware guided image filtering via local entropy

Guided image filtering is one of the most widely used techniques in computer vision. However, it commonly leads to over-smoothed edges and a distorted appearance when tackling intricate texture patterns and complex noise. In this paper, a window-aware image filtering framework based on the bilateral filter guided by the local entropy is presented. The key idea of the authors' proposed approach is to design a novel guidance input and a non-box filtering window. Specifically, using the Gaussian spatial kernel and the local entropy, a Gaussian entropy filter (GEF) that can maintain image feature details and yield a robust guidance input for the bilateral filter (BF) is constructed. Meanwhile, based on an intensity-similarity strategy, a local non-box filtering window is designed for the further preservation of edge structures. The authors' approach not only inherits the advantages of the bilateral filter, i.e. simplicity, parallelisation and ease of programming, but is also more powerful than the bilateral filter and its variants. In addition, the guided entropy filter and the non-box window can also be transplanted to other local filters and can effectively improve the filtering effects. Qualitative and quantitative experimental results demonstrate that the authors' approach performs well in image denoising, texture (or background) smoothing, edge extraction and other image processing applications.


INTRODUCTION
Image smoothing aims to remove interference information, such as noise and unexpected texture background, and is an important operator for maintaining image quality in image transmission and preservation. So far, although many existing image filtering methods can remove unexpected details well, they commonly fail to preserve small- and even medium-scale structures. Among these methods, the bilateral filter (BF) [1] is widely used in various image processing tasks due to its simplicity, parallelisation and ease of programming. The success of BF originates from the collaboration between the Gaussian spatial kernel [2], responsible for smoothing out interference, and the Gaussian range kernel, responsible for maintaining image features. However, for images contaminated by high-frequency noise or coarse textures, bilateral filtering, like many other filtering methods, still has trouble reaching a proper balance between detail preservation and image smoothing. To alleviate this problem, simple and effective improvements of BF have been proposed over the past two decades by designing a proper guidance input. For example, local schemes [3][4][5] attempt to improve the distance norm of the Gaussian range kernel of BF by utilising the information in the filtering neighbourhood. Non-local methods, such as Means-BF [6,7] and NLBF [8,9], attempt to obtain the input of the range kernel through mean filtering or other non-linear filtering approaches. These approaches are able to yield high-fidelity results for processing low-frequency noise or low-scale texture background, but they do not work well when removing complex texture backgrounds.
The recently proposed multi-layer frameworks based on guided BF can be exploited to better understand the low-frequency part of an image than the aforementioned single improvement of the range kernel. The multiple resolution bilateral filter (MBF) [10] applies a wavelet filter to approximate the low-frequency part of the image and achieves blending filtering. Similarly, Khadidja et al. [11] utilise the wavelet transform to deal with the low-frequency part of the image and combine it with BF to perform image denoising. It is worth mentioning that the joint bilateral filtering of [12] introduces the local image entropy to construct a threshold function, which can provide self-adaptive parameters for different types of images. Although these multi-layer frameworks of BF provide a more accurate output, they commonly introduce additional parameter settings for the guidance image.

In order to reduce the parameter settings of the guidance image and provide a robust guidance input for BF, we have designed an effective Gaussian entropy filter (GEF). The GEF can avoid setting the spatial and range kernel bandwidths as in existing bilateral filtering. Meanwhile, we introduce a local non-box filtering window for our guided bilateral filtering to retain low-scale significant features.
The main contributions of our proposed method are as follows.
• We construct a Gaussian spatial entropy filter to generate a robust guided image. The spatial entropy filter can utilise the grey information of pixels while avoiding the setting of the range bandwidth, which improves the efficiency of the guided filter.
• We provide a simple window-aware strategy to find non-box filtering windows. All significant structure details in a non-box window can be effectively preserved by weighting the similar pixels in the non-box window.
• Our proposed window-aware strategy reduces the sensitivity of the filtering parameters due to the similarity of pixels in the filtering window, which is conducive to tuning the experimental parameters for the best final results.
The dual improvements from the robust guidance and the sensitivity-aware filtering window can handle different kinds of image interference while preserving visually significant details (see Figure 1). More results of both quantitative and qualitative comparisons are included in Section 5.
The remainder of this paper is organised as follows. In Section 2, we briefly review the related works, including the basic Gaussian filter, the bilateral filter and their extensions. In Section 3, the definitions of BF, the Gaussian filter, the local entropy, the intensity-aware window and our window-aware guided bilateral filtering are detailed. The implementation of our approach is analysed in Section 4. In Section 5, we demonstrate some applications of our method compared with several similar methods, followed by the conclusion in Section 6.

RELATED WORKS
Image filtering is a fundamental technique in image processing. There is a large literature on this topic, and representative approaches include the wavelet transform [13], adaptive total variation [14], clustering learning [15] and low-rank learning [16], etc. Recently, non-local methods, such as NLM [17] and BM3D [18], have achieved promising results in image denoising by capturing the similarity between patches within a broader context. In this section, we focus on the most relevant works on kernel filtering, including the Gaussian filter, the bilateral filter and their extensions. The Gaussian filter (GS) [2] is a conventional low-pass filter, which is commonly combined with other methods to handle high-level noise or coarse texture. For example, Gaussian mixture models [19][20][21] integrate two or more Gaussian filters, whose parameters are obtained by learning-based methods. The combination strategy is also applied to image restoration and reconstruction [22]. Another idea is to incorporate statistical information, such as Bayesian rules [23,24], hierarchical clustering [25] and Monte Carlo methods [26]. These approaches attempt to learn the noise model under a certain probability condition, for the purpose of reducing the error caused by data discretisation. These methods leverage the robustness of the Gaussian kernel to smooth the image, but it is difficult to simultaneously achieve noise removal and feature preservation with a single Gaussian kernel.
As an improvement of the Gaussian filter, the bilateral filter [1] provides better performance in edge-preserving filtering. There are also many other combinatorial techniques based on BF. For example, Yang et al. [27] decomposed the noisy image into components of different frequencies and direction response sub-bands, and then applied a statistical model in BF to obtain the on-the-ground details. To restore the natural information of an image, Mayank et al. [28] combined BF with the minimum mean square error to reduce noise. Although the technique of mixed filters can improve the filtering performance of BF, how to select a proper weight for each constituent filter is an issue that has still not been tackled. In addition, there are improvements [29][30][31][32] that select parameters and guidance inputs by utilising learning methods, yet such methods may involve high computational complexity and difficulty in model verification, which cannot be neglected.
Recently, frameworks based on joint bilateral filtering [33][34][35][36] have demonstrated that a proper design of the guidance input is an effective amelioration. In [37], Song et al. designed distinct patches according to image details and used the statistics of these patches as the guidance input for texture filtering, which can effectively remove image textures. Mayank and Bhupendra [38] proposed a different guidance input using the spatial gradient, which differentiates edges and textures. In the work of Nagashettappa et al. [39], a partial differential equation is regarded as an approximation of the guidance input, and the second derivative of the pixel value is employed to reflect the edge information, which can easily enhance edge details. The methods mentioned above share a common target, that is, to better flatten the noisy areas of the guidance image, but this leads to over-blurred structure details. Unlike the previous techniques, our method focuses on retaining the feature details of the guidance input rather than smoothing it, and on feeding BF with sufficient feature information to achieve better edge preservation.

METHODOLOGY
In this paper, to achieve considerable smoothness and edge preservation in image filtering, we first slightly pre-process the original input with Gaussian range kernel filtering. Then a local entropy filter is designed to generate a robust guidance for bilateral filtering, and meanwhile a window-aware strategy is adopted to optimise the filtering windows. Based on the robust guidance input and the optimal filtering window, the final guided image filtering yields more accurate results. The whole pipeline of our approach is shown in Figure 2.

Bilateral filter and guided bilateral filter
Given an input image I, the output J_p at each pixel p obtained by the bilateral filter (BF) [1] can be expressed as follows:

    J_p = (1/k_p) Σ_{q∈Ω_p} g_s(∥q − p∥) g_r(∥I_q − I_p∥) I_q,    (1)

where I_q is the intensity value of the noisy image at pixel q and Ω_p represents the box window centred at p with the size (2r + 1) × (2r + 1). The output J_p at pixel p is a weighted average of all pixels I_q in the neighbourhood Ω_p. The spatial kernel g_s and the range kernel g_r are both Gaussian functions, which are written as

    g_s(x) = exp(−x² / (2 d_s²)),    (2)

    g_r(x) = exp(−x² / (2 d_r²)).    (3)

The normalising term k_p is defined as

    k_p = Σ_{q∈Ω_p} g_s(∥q − p∥) g_r(∥I_q − I_p∥),    (4)

where ∥ ⋅ ∥ represents the L2 norm, and the parameters d_s and d_r control the bandwidths of the spatial and range kernels, respectively. When g_r(∥I_q − I_p∥) ≡ 1, Equation (1) reduces to the Gaussian spatial filter (GS); when g_s(∥q − p∥) ≡ 1, it reduces to the Gaussian range kernel filter (GF).
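For concreteness, the filter above can be sketched in a few lines of Python. This is a brute-force illustration under our own choices of boundary handling (reflective padding) and loop structure, not the authors' released code:

```python
import numpy as np

def bilateral_filter(I, r=3, d_s=2.0, d_r=0.1):
    """Brute-force bilateral filter on a float image in [0, 1].

    r   : window radius, so the box window is (2r+1) x (2r+1)
    d_s : bandwidth of the Gaussian spatial kernel
    d_r : bandwidth of the Gaussian range kernel
    """
    H, W = I.shape
    # Pad by reflection so every pixel has a full (2r+1) x (2r+1) window.
    P = np.pad(I, r, mode="reflect")
    # Precompute the spatial kernel g_s(||q - p||) over window offsets.
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    g_s = np.exp(-(x**2 + y**2) / (2 * d_s**2))
    J = np.zeros_like(I)
    for i in range(H):
        for j in range(W):
            patch = P[i:i + 2*r + 1, j:j + 2*r + 1]
            # Range kernel g_r(||I_q - I_p||) compares intensities to the centre.
            g_r = np.exp(-(patch - I[i, j])**2 / (2 * d_r**2))
            w = g_s * g_r
            J[i, j] = np.sum(w * patch) / np.sum(w)  # divide by k_p
    return J
```

A small d_r keeps edges sharp (pixels across an edge get near-zero range weight), while a large d_r degrades the filter towards plain Gaussian smoothing.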
For a better filtering effect, the guided bilateral filter (GBF) utilises another filtering output G as the input of the Gaussian range kernel, which can be expressed as

    J_p = (1/k_p) Σ_{q∈Ω_p} g_s(∥q − p∥) g_r(∥G_q − G_p∥) I_q.    (5)

Constructing a robust G is the core of guided filtering, and it is also one of the key schemes proposed in this paper.

Gaussian spatial entropy filter
Many studies [36,37,40] have shown that an effectively constructed guidance image should preserve image features well rather than merely provide a smoother filtering output. Considering that the image entropy [41] can well measure the intensity energy of pixels, we use the local entropy and the Gaussian spatial kernel to construct a GEF, which behaves like BF with good preservation of edge structures while avoiding the parameter setting of the Gaussian range kernel. Given a neighbourhood Ω_p centred at a pixel p, the local entropy [40] of the pixel p can be defined as

    E_p = −Σ_{k=0}^{L−1} p_k log p_k,  p_k = N_k / N_r,    (6)

where p_k is the probability of grey level k appearing in the neighbourhood Ω_p, N_k represents the number of pixels with the grey level k, N_r is the number of all pixels in Ω_p, and L is the total number of grey levels.
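The local entropy is computed directly from the window histogram. A minimal Python sketch (the base-2 logarithm and reflective padding are our own choices for illustration):

```python
import numpy as np

def local_entropy(I, r=4, levels=256):
    """Local entropy E_p of an 8-bit image over a (2r+1) x (2r+1) window."""
    H, W = I.shape
    P = np.pad(I, r, mode="reflect")
    E = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            patch = P[i:i + 2*r + 1, j:j + 2*r + 1]
            # p_k = N_k / N_r : fraction of window pixels at grey level k
            counts = np.bincount(patch.ravel(), minlength=levels)
            p = counts[counts > 0] / patch.size
            E[i, j] = -np.sum(p * np.log2(p))  # Shannon entropy in bits
    return E
```

Flat regions yield entropy near zero, while textured or edge regions yield high entropy, which is exactly the property the guidance construction exploits.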
According to the image entropy, the output of GEF at the pixel p can be expressed as

    G_p = (1/k_p^e) Σ_{q∈Ω_p} g_s(∥q − p∥) E_q I_q,  k_p^e = Σ_{q∈Ω_p} g_s(∥q − p∥) E_q,    (7)

where E_q represents the information entropy of pixels in the window Ω_p, and I_q belongs to the grey-level interval [0, L − 1]. From Equation (7), we can find that GEF does not need any parameter setting except the window radius r. Meanwhile, GEF also provides a robust output, due to the small fixed spatial bandwidth of the Gaussian kernel. Figure 3 exhibits the robustness of GEF. We can see from the grey histograms shown in Figure 3 that the information distribution of the output is stable when the window radius r is larger than 4. The robustness can be easily inferred from Equation (2): since for ∥q − p∥ > 4 the weight produced by the spatial kernel g_s(∥q − p∥) is almost zero, a large window radius has no effect on the output of GEF at positions where the weight is zero.

FIGURE 3 Using GEF with different radius r for the noisy image. (a) Noisy input. (b)-(e) Filtered images and the corresponding grey histograms below. As can be seen from the grey-scale histograms in (b)-(e), when the window radius r is increased, the filtered image is not over-smoothed.
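As a rough illustration only: one plausible realisation of a GEF replaces BF's range kernel with the local entropy weight. The weighting form below is our assumption for the sketch, not a verbatim transcription of the authors' Equation (7):

```python
import numpy as np

def gaussian_entropy_filter(I, E, r=4, d_s=2.0):
    """Sketch of a GEF: spatial Gaussian weights modulated by local entropy.

    I : float image in [0, 1]; E : local-entropy map of I (same shape).
    The entropy term standing in for the range kernel is an assumption;
    as in the paper, only the window radius r needs tuning.
    """
    H, W = I.shape
    PI = np.pad(I, r, mode="reflect")
    PE = np.pad(E, r, mode="reflect")
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    g_s = np.exp(-(x**2 + y**2) / (2 * d_s**2))
    G = np.zeros_like(I)
    for i in range(H):
        for j in range(W):
            patch = PI[i:i + 2*r + 1, j:j + 2*r + 1]
            e = PE[i:i + 2*r + 1, j:j + 2*r + 1]
            w = g_s * e
            s = np.sum(w)
            # Fall back to the input where all entropy weights vanish.
            G[i, j] = np.sum(w * patch) / s if s > 0 else I[i, j]
    return G
```

High-entropy (feature-rich) pixels dominate the average, so the guidance retains detail instead of flattening it.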
GEF, with its robust output and few parameter settings, can better serve GBF, as determined by the experiments in Section 5.
In general, a larger filtering window is used in image filtering for better smoothness, yet a large window tends to cause over-smoothing. For this reason, many techniques have been developed to select filtering windows for edge preservation, such as side windows [42,43] and edge-aware windows [44]. However, these sub-window improvements do not provide significant structure protection for fine details. To retain low-scale structural details, we propose a simple and effective window-aware technique that finds a non-box window to redefine the intensity-similar information of each central pixel.

Intensity-aware window
Assuming the filtering window Ω_p is centred at each pixel p with the size (2r + 1) × (2r + 1), the pixels q whose intensities I_q are similar to I_p can be collected according to the following formula:

    ∥I_q − I_p∥ ≤ ε,  q ∈ Ω_p,    (8)

where ε is a threshold that determines the intensity-similar window Γ_p at the pixel p. By Equation (8), we obtain the window Γ_p as follows:

    Γ_p = {q ∈ Ω_p : ∥I_q − I_p∥ ≤ ε}.    (9)

The intensity-aware window Γ_p is an irregular region, but it better reflects the information of the central pixel p, as shown in Figure 4. Clearly, the intensities of pixels in the region marked with red '1' are more similar to I_p.
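The selection of Γ_p is a simple thresholded comparison. A Python sketch (the clipping at image borders is our own handling):

```python
import numpy as np

def intensity_window_mask(I, p, r=4, eps=0.12):
    """Boolean mask of the non-box window Γ_p inside the box window Ω_p.

    Pixels q with |I_q - I_p| <= eps are kept (marked True).
    """
    i, j = p
    H, W = I.shape
    # Clip the box window Ω_p to the image bounds.
    i0, i1 = max(i - r, 0), min(i + r + 1, H)
    j0, j1 = max(j - r, 0), min(j + r + 1, W)
    box = I[i0:i1, j0:j1]
    return np.abs(box - I[i, j]) <= eps
```

On a step edge, the mask hugs the side of the edge that the centre pixel belongs to, which is exactly why fine structures survive the averaging.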
We perform BF and window-aware bilateral filtering on the noisy image cat at noise levels σ = 10 and 20, and compute the resulting PSNR, respectively. Compared with the box filtering window, the filtered image obtained by window-aware bilateral filtering has better quality than that by BF with the same parameters (see Figure 5). Furthermore, as can be seen from Figure 5, the optimal PSNR of the filtered image obtained by using intensity-aware windows is significantly better than that by using box windows at a higher noise level, which demonstrates that intensity-aware windows can enhance the preservation of local details. Figure 6 shows a visual comparison of the filtered outputs of BF and window-aware bilateral filtering with different range bandwidths d_r when the noise level is σ = 20, from which we can see that the box-filtered image becomes blurred as the bandwidth increases. For non-box window bilateral filtering, however, the smoothness provided by the spatial kernel is much less sensitive to the bandwidth d_r, which further protects fine edge details (see the cat's whiskers in Figure 6).

Window-aware guided bilateral filtering
According to the intensity-aware window Γ_p and a robust grey-scale image guidance G produced by GEF, the output at each pixel p of our proposed window-aware guided bilateral filtering (WGBF) is

    J_p = (1/k_p) Σ_{q∈Γ_p} g_s(∥q − p∥) g_r(∥G_q − G_p∥) I_q.

Due to the high preservation of the intensity-aware window for similar features, WGBF may produce some local singular values when encountering strong interference, such as high-frequency noise. In order to address this issue, we adopt a median alternative strategy, by which the median intensity I_med in the box window Ω_p replaces the singular output J_p of WGBF to suppress the mutation. Thus, the final optimal output J*_p of WGBF based on the intensity-aware window can be rewritten as

    J*_p = J_p,   if ∥I_p − I_med∥ ≤ ε′,
    J*_p = I_med, otherwise,

where ε′ represents the intensity similarity between pixels in Ω_p and the pixel p. The intensity similarity is determined as in Equation (8).
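A single-pixel sketch of this filtering-with-fallback step in Python. The singularity test (comparing the centre pixel with the box-window median) is our assumption for illustration, since the exact criterion is a design detail of the method:

```python
import numpy as np

def wgbf_pixel(I, G, p, r=4, d_s=2.0, d_r=0.1, eps=0.12):
    """One-pixel sketch of window-aware guided bilateral filtering (WGBF).

    I : noisy input, G : guidance (e.g. a GEF output), floats in [0, 1].
    The average runs only over the intensity-similar window Γ_p; when the
    centre pixel looks like an outlier relative to the box-window median,
    the median replaces the output (this check is an assumption here).
    """
    i, j = p
    i0, i1 = max(i - r, 0), min(i + r + 1, I.shape[0])
    j0, j1 = max(j - r, 0), min(j + r + 1, I.shape[1])
    box_I, box_G = I[i0:i1, j0:j1], G[i0:i1, j0:j1]
    I_med = float(np.median(box_I))
    if abs(I[i, j] - I_med) > eps:           # centre pixel looks singular
        return I_med
    yy, xx = np.mgrid[i0 - i:i1 - i, j0 - j:j1 - j]
    g_s = np.exp(-(xx**2 + yy**2) / (2 * d_s**2))
    g_r = np.exp(-(box_G - G[i, j])**2 / (2 * d_r**2))
    mask = np.abs(box_I - I[i, j]) <= eps    # Γ_p, cf. Equation (8)
    w = g_s * g_r * mask
    return float(np.sum(w * box_I) / np.sum(w))
```

Because the centre pixel always belongs to Γ_p, the weight sum is never zero; the median branch handles impulse-like outliers that Γ_p would otherwise preserve.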
For an image with high-frequency noise or large-scale background textures, a slight pre-processing filtering is required to improve the guidance image G, which weakens the difference between the central pixel p and any pixel q in Ω_p, so that the filtering window can be accurately determined.
The pseudo code of our algorithm is detailed in Algorithm 1, and our code is publicly available.

Parameters analysis
From what has been discussed above, our approach for image smoothing consists of image pre-processing, guidance image filtering and guided window-aware filtering.
In the first stage, the Gaussian range kernel filter (GF) is chosen for image pre-processing because of its capability of edge preservation. The pre-processing stage is intended to slightly smooth the interference and facilitate subsequent operations. The parameters d_s and d_r for GF are set to relatively small values. For instance, we select the window radius r = 3 and d_r = 0.1 for a noisy image with noise level σ = 30 and for most conventional textured images, as shown in Figure 1 (Harp). For images of other scales, only d_r needs to be fine-tuned.
In the second stage, only the window radius r of GEF needs to be tuned. According to the analysis in Section 3.2, we fix the window radius r = 4 for images of any scale.
In the third stage, due to the performance of the intensity-aware window, the window radius r has less effect on WGBF and can be set to 9. The parameter d_r can be the same as the value in the first stage. In addition, we define the thresholds ε = ε′ = 0.12 + 0.002(σ − 10), which can basically satisfy different noise levels σ. When dealing with texture images, the fine-tuning strategy is similar to that for noise levels. In specific applications, the parameter d_r can be tuned appropriately.
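The noise-adaptive threshold can be computed directly. A one-liner in Python, reading the formula as ε = ε′ = 0.12 + 0.002(σ − 10) (our reconstruction of the garbled expression):

```python
def similarity_threshold(sigma):
    """Intensity-similarity threshold for Gaussian noise level sigma,
    following eps = 0.12 + 0.002 * (sigma - 10) from the parameter analysis."""
    return 0.12 + 0.002 * (sigma - 10)
```

So σ = 10 gives ε = 0.12 and σ = 30 gives ε = 0.16: noisier inputs admit a wider similarity band, keeping enough pixels inside Γ_p for the average to stay stable.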

Performance analysis of WGBF
Given the same values of r and d_r, the bilateral filter (BF), GBF and window-aware guided bilateral filter (WGBF) are performed on a noisy signal input. We can find from Figure 7 that the signal filtered by BF is obviously not as smooth as that by GBF and WGBF, and the edge preservation of GBF and WGBF is also far better than that of BF. In terms of visual perception, the smoothness of GBF and WGBF is the same, while WGBF provides significantly better preservation of low-scale features [see the left of 'Low' in Figure 7(b) and (c)], due to the contribution of the intensity-aware window. More experimental results of WGBF for image filtering are exhibited in Section 5.

Computational complexity
We analysed the computational complexities of our algorithm, BF and its improvements, and measured the corresponding running time of the various methods in the same operating environment. Our proposed algorithm consists of three stages, and the calculation in each stage is equivalent to that of BF. Therefore, the computational complexity is approximately O(3N ⋅ r²), where N is the number of pixels in the input image I and r is the window radius. We compared the computational complexity of related improvements of BF: BF [1], the robust bilateral filter (RBF) [8], the optimally weighted bilateral filter (WBF) [45], the multi-resolution bilateral filter (MBF) [10] and the entropy-based bilateral filter (EBF) [12]. MBF has the highest time complexity, which is about O(N²(L ⋅ log₂ N + r²)), followed by EBF at O(N(r² + L_e² + L_w²)); the others are almost the same. Here, L, L_e and L_w are the wavelet filter length, the window size of the entropy and the window size of the Wiener filter, respectively. We implemented the above-mentioned methods in MATLAB 2019a on an Intel(R) CPU (3.6 GHz) and obtained the timing data shown in Table 1. The computational complexity of the proposed method may not be optimal, but the subsequent applications show that the experimental results of our method are far better than those of similar improvements of BF.

Evaluation
We introduce the common objective metrics Peak Signal-to-Noise Ratio (PSNR) [46], Structural Similarity Index Measure (SSIM) [47] and Edge Preservation Index (EPI) [48] to measure the quality of filtered images. PSNR and SSIM can be used to evaluate a filtered image against a reference, while EPI is adopted as a local similarity measure for images without references, such as real-world images and texture images. PSNR is computed by

    PSNR = 10 log₁₀(S² / MSE),  MSE = (1/N) Σ_{i=1}^{N} (x_i − y_i)²,

where MSE is the mean square error between the reference image x and the filtered image y, N is the number of pixels in the image x, and S is the maximum grey value of all pixels in x. When the image x has 8-bit grey levels within the range [0, 255], we take S = 255. SSIM is defined as

    SSIM = ((2 μ_S μ_{S₁} + C₁)(2 σ_{S S₁} + C₂)) / ((μ_S² + μ_{S₁}² + C₁)(σ_S² + σ_{S₁}² + C₂)),

where μ_S and μ_{S₁} are the mean grey values of the patch sets S and S₁ that represent the structure regions of I and J, respectively; σ_S and σ_{S₁} represent the standard deviations of the sets S and S₁, and σ_{S S₁} is the covariance between S and S₁. C₁ and C₂ are constants, which are usually set to small values. EPI is expressed as

    EPI = Γ(Δs − μ_{Δs}, Δŝ − μ_{Δŝ}) / √(Γ(Δs − μ_{Δs}, Δs − μ_{Δs}) ⋅ Γ(Δŝ − μ_{Δŝ}, Δŝ − μ_{Δŝ})),

where μ_{Δs} and μ_{Δŝ} are the mean values in the region of interest (ROI) of the original image s and the filtered image ŝ after applying the Laplace operator Δ, respectively, Γ(a, b) = Σ_{(i,j)∈ROI} a(i, j) b(i, j), and s(i, j) and ŝ(i, j) represent the pixel values at position (i, j) of the original and filtered images, respectively. Since texture filtering should not only remove some texture details but also maintain local structure features, computing EPI over a chosen ROI is appropriate for texture image quality assessment.
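The PSNR computation is straightforward; a minimal Python helper for 8-bit images:

```python
import numpy as np

def psnr(x, y, S=255.0):
    """Peak signal-to-noise ratio between reference x and filtered y.

    S is the maximum grey value (255 for 8-bit images).
    """
    mse = np.mean((x.astype(float) - y.astype(float)) ** 2)
    return 10.0 * np.log10(S ** 2 / mse)
```

Higher PSNR means the filtered image is closer to the reference; a smaller residual error always yields a larger value.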

Image denoising
Some test images provided by Kodak and real-world images from PolyUDataset are selected. The real-world noisy images are captured by a camera with a higher ISO, and the pattern of real-world noise differs from that of Gaussian noise but resembles that of some image textures. We compared our method with the bilateral filter based on side windows (SWBF) [42] and its recent improvement, the entropy-based bilateral filter (EBF) [12], as well as one of the state-of-the-art denoising algorithms, BM3D [18]. All experiments are carried out using the optimal parameter settings and the provided test database. For visual contrast, we exhibit the denoised results of the image girl at the Gaussian noise level σ = 30 and of a real-world image (2500 × 1872) processed by the above methods (see Figures 8 and 9). We also experimented with all the mentioned methods on the test images; the results are reported in Table 2.

FIGURE 9 Local zoomed-in denoised images with the best PSNR by different methods for a real-world image. The area of the white box should be well smoothed, while the details in the purple one need to be preserved as much as possible.

TABLE 2 The results of PSNR, SSIM and EPI in ROI, provided by EBF [12], BM3D [18], SWBF [42] and our proposed algorithm for the test images.

The results in Table 2 show that BM3D has an unparalleled advantage over other methods in processing images with Gaussian noise, while our method performs better on real-world images. When dealing with high-frequency noise or unnecessary background texture, BM3D tends to highlight the high-frequency region, which may cause a plaque phenomenon [see the yellow circle in Figure 8(f)] due to its enhancement strategy, and it may detract from the artistic effect of images. In contrast, our method focuses on the protection (not 'enhancement') of details: some rough textured background in the noisy image can be better smoothed while significant details are preserved [see the zoomed white flat background and curve textures in Figure 9(h)].
The results of EPI also show that our method is superior to other similar filters, since the regions of interest containing important texture features can only be well preserved by our proposed intensity-aware windows.

Image detexturing
We chose two texture images with fine and complex texture patterns, and compared our algorithm with some excellent detexturing methods: BF [1], ROG [49], kRTV [50] and RGF [51]. Meanwhile, we zoomed into the area with more feature details and relatively weak texture as the ROI to compute the EPI of each filtered image. Figures 10 and 11 present the filtered results of images with fine and coarse-scale textures, respectively. From Figures 10 and 11, we can see that the performance of our algorithm is almost close to that of ROG and kRTV in terms of overall texture removal, and far better than that of BF and RGF. Nevertheless, the zoomed region of interest shows that our approach preserves the details better, such as the retention of the hair in Figure 10 and the teeth in Figure 11. The values of EPI also demonstrate that our method achieves the desired retention of significant details on fine and complex-scale texture images.

Image background smoothing
Background smoothing mainly refers to removing existing feature details that affect some special applications of computer vision. Similar to textures, some of the background details may be unimportant textures, but some are important feature details, such as fish scales, uneven surfaces etc. In this paper, we employed BF [1], RGF [51], ROG [49] and kRTV [50] to smooth the background and obtain a clean image. Generally, the texture filtering methods kRTV and ROG can smooth a background without large-scale interruption very well, yet they can also easily blur some low-scale structure details or form artifacts [see the eye areas in Figures 12 and 13(d) and (e)]. Conversely, RGF entirely fails to remove most of the background textures, due to the reinforcement of all details. Since our method emphasises the preservation of features based only on the similarity of details, we can tune the intensity threshold to control the smoothness for the removal of similar coarse background details, while preserving the fine-scale features that differ from the background details, as shown by the freckles and water waves in Figures 12 and 13(f) and (l). The intuitive experimental results demonstrate the superiority of our proposed algorithm for background smoothing.

Edge extraction
Our algorithm also benefits the pre-processing step of edge extraction, especially when the image contains noise or complex textures. In order to illustrate the effectiveness of our method in edge extraction, we conduct experiments on both a noisy image (Parrot) and a textured image (Sheep), whose edges are extracted by TV [52]. We compare the edges extracted from the original image, the image pre-processed by BF and the image pre-processed by our method. We can find plenty of false edges in Figures 14 and 15.

FIGURE 10 Fine texture filtering. The yellow box area is the region of interest, which includes many details but few intrusive textures, and the purple box area is the zoomed region of interest. The numerical result below each sub-figure (b)-(h) is the corresponding value of EPI.

FIGURE 11 Complex texture filtering. The yellow box area is the region of interest, which includes lots of details but few intrusive textures, and the purple box area is the zoomed region of interest. The numerical result below each sub-figure is the corresponding value of EPI.

HDR tone mapping
A High Dynamic Range (HDR) [53] image records the actual scene, whose pixel values representing brightness far exceed the grey level 255 and are difficult to display on a screen. Linear decoding methods are usually used for HDR images, yet they make some regions too bright or too dim. The way to balance colour brightness is called tone mapping, which decomposes an HDR image into two layers: a base layer and a detail layer. The base layer is compressed with a certain contrast factor and the detail layer remains unchanged. Generally, an effective filter is needed to obtain the base layer, removing noise while maintaining edge features. Based on the filtered base layer, HDR tone mapping can be expressed as

    J = c ⋅ I_b + I_d,

where I_b is the output of the HDR image filtering, and I_d is the difference between the original HDR image and the filtered image I_b. The compression factor c, belonging to (0, 1), determines the dynamic compression range. We compared our approach with ROG [49] and BF [1], which perform well on image edges. As can be seen from the same zoomed region in Figure 16(b) and (d), the HDR image processed by WGBF is much clearer than the HDR images processed by BF and ROG.
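The two-layer decomposition above can be sketched in a few lines of Python; the base filter is passed in as a callable standing in for an edge-preserving filter such as WGBF:

```python
import numpy as np

def tone_map(I, base_filter, c=0.3):
    """Two-layer HDR tone-mapping sketch.

    The base layer I_b comes from an edge-preserving filter; the detail
    layer I_d = I - I_b is kept unchanged and the base layer is
    compressed by the factor c in (0, 1).
    """
    I_b = base_filter(I)          # base layer from the filter
    I_d = I - I_b                 # detail layer = original minus base
    return c * I_b + I_d
```

A better base filter leaves edges in I_b rather than I_d, so compressing the base does not wash out the fine detail, which is why the choice of filter matters here.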

Multi-scale detail enhancement
Multi-scale detail enhancement requires the outputs of coarse-scale and fine-scale detail enhancement, obtained by image filtering with two different parameter-scale settings, and then takes the average of these two results as the final enhancement. The preservation of details during the filtering process is essential for the final image enhancement. In the multi-scale detail enhancement experiment, we compared the efficient edge-preserving methods BF [1] and ROG [49] with our WGBF. The enhanced results are shown in Figure 17. In contrast, as can be seen from the zoomed image, BF and ROG cannot effectively preserve the fine-scale details and cause low-scale textures to be blurred.
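The two-scale averaging described above can be sketched as follows. The unsharp-style boost with factor k is our own illustrative choice; the paper itself only specifies filtering at two parameter scales and averaging the two enhanced results:

```python
import numpy as np

def multiscale_enhance(I, smooth, fine_params, coarse_params, k=2.0):
    """Average of fine-scale and coarse-scale detail enhancements.

    smooth(I, **params) is any edge-preserving filter (e.g. WGBF), called
    once with fine-scale and once with coarse-scale parameters.
    """
    outs = []
    for params in (fine_params, coarse_params):
        base = smooth(I, **params)
        outs.append(base + k * (I - base))   # amplify the detail layer
    return 0.5 * (outs[0] + outs[1])
```

If the filter blurs fine structure into the base layer, that structure gets boosted as if it were noise, so the edge-preserving quality of the underlying filter again determines the result.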

CONCLUSION
Image filtering actually incorporates different topics, while existing research works tend to make improvements only in one specific direction, such as denoising, which makes them hard to adapt to various practical uses. Our work attempts to generalise the task of image smoothing by combining the image entropy and the Gaussian filter in a conceptually simple way, where the proposed local entropy filter strives to protect the detailed information while the Gaussian filter ensures the smoothing effect. The adaptability and robustness of the bilateral filter inspire and enable us to realise a more generalised algorithm for smoothing out image interference. To verify the effectiveness of our method, we conduct extensive experiments on images with different kinds of interference and compare our algorithm with methods aiming at different targets. We first show that our algorithm achieves similar or even better results than techniques designed for a specific task, and then demonstrate the generalisation ability of our method by using noisy textured images as inputs, where our algorithm outperforms the single-target algorithms by a large margin. In addition, our designed intensity-aware window can not only improve bilateral filtering but also be applied to other local filters (e.g. Gaussian, mean and other window filters) for detail preservation.
Although our method inherits the advantages of the bilateral filter, under this framework it also suffers from the problem of parameter regulation. In our future work, we would like to integrate the intensity-aware window into other excellent algorithms to simplify the tuning process and enable our method to better handle interference at different scales. Besides, we also intend to apply our smoothing algorithm to various types of applications, such as detail enhancement, image extraction and image decomposition.