An effective weighted vector median filter for impulse noise reduction based on minimizing the degree of aggregation

Impulse noise is regarded as an outlier in the local window of an image. To detect noise, many proposed methods are based on aggregated distance, including spatially weighted aggregated distance, n nearest neighbour distance, local density, and angle-weighted quaternion aggregated distance. However, these methods ignore the weight of each pixel or have limited adaptability. This study introduces the concept of degree of aggregation and proposes a weighting method to obtain the weight vector of the pixels by minimizing the degree of aggregation. The weight vector obtained has larger components on the signal pixels than on the noisy pixels. It is then fused with the aggregated distance to form a weighted aggregated distance that can reasonably characterise noise and signal. The weighted aggregated distance, along with an adaptive segmentation method, can effectively detect the noise. To further enhance noise detection and removal, an adaptive selection strategy is incorporated to reduce the noise density in the local window. Finally, the detected noisy pixels are replaced with the weighted channel combination optimization values. The experimental results demonstrate the validity of the proposed method by showing better performance in terms of both objective criteria and visual effects.

Because of the characteristics of impulse noise, both component-wise methods and vector-based methods need to define a noise-like feature to evaluate each pixel and to conduct the image denoising, such as noise detection and noise removal. In component-wise methods, the sum of scalar absolute differences is usually used as the noise-like feature. In vector-based methods, the sum of vector absolute differences (Euclidean distance, city block distance, quaternion distance) is usually used as the noise-like feature.
Analysing the above methods, it can be found that aggregated distance [40] has been widely used as a noise-like feature in impulse noise removal. The aggregated distance of a pixel is computed as the sum of distances from the pixel to its surrounding pixels. However, the distance to the noisy pixels does not have much meaning compared with the distance to the signal pixels in the colour space because impulse noise usually does not have valid information for the original image. Therefore, some improved versions of aggregation distance are proposed.
The rank-ordered absolute differences (ROAD) [41] and fast averaging peer group filter (FAPGF) [32] have used the local density of local pixel groups in the colour space as a noise-like feature. The ROAD value of a pixel is computed as the sum of the Euclidean distances between the pixel and the four closest (in colour space) pixels in the local window. Meanwhile, FAPGF first defines a threshold distance in the colour space and then counts the other pixels in the window that lie within this threshold distance; this count is used as the independent variable of a function that serves as the noise-like feature. Although the local density information can reflect the characteristics of the pixel (the higher the local density, the more signal-like; the lower, the more noise-like) and does reduce the bad influence of noisy pixels, such information greatly depends on parameters and therefore cannot easily adapt to different noise levels.
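As a concrete illustration, the ROAD feature described above can be sketched in a few lines. The function name and window layout below are illustrative, not taken from [41]:

```python
import numpy as np

def road(window, m=4):
    """Rank-Ordered Absolute Differences of the centre pixel of a colour
    window: the sum of the m smallest Euclidean distances (in colour
    space) from the centre pixel to its neighbours.

    window: (k, k, 3) array; the centre pixel is window[k//2, k//2].
    """
    k = window.shape[0]
    centre = window[k // 2, k // 2].astype(float)
    pixels = window.reshape(-1, 3).astype(float)
    dists = np.linalg.norm(pixels - centre, axis=1)
    dists = np.delete(dists, (k // 2) * k + k // 2)  # drop the centre itself
    return float(np.sort(dists)[:m].sum())
```

A flat window yields a ROAD value of 0, while an impulse at the centre yields a large value; this separation is exactly what the feature relies on.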
The adaptive vector median filter (AVMF) [31] uses, as its noise-like feature (a reduced aggregated distance), the Euclidean distance between the current colour vector and the mean vector of several sample vectors close to the median vector, and replaces each noisy pixel whose noise-like feature exceeds a fixed threshold with the output of the standard VMF. Meanwhile, the fast peer group filter for VMF (FPGFVMF) [27] regards the current centre pixel in the local window as a signal pixel only if the number of surrounding pixels close enough to it (in colour space) is sufficiently large. Both of these noise-like features and switching mechanisms use fixed thresholds, so they too depend heavily on parameter settings.
The above improved versions of aggregated distance can be regarded as kinds of weighted aggregated distance, but they heavily depend on parameters. To weight the pixels reasonably and detect the noise, the concept of a degree of aggregation is proposed, based on which a weight vector is calculated by minimizing the degree of aggregation. The weight vector has larger components on the signal pixels than on the noisy pixels. These weights are then used to form a weighted aggregated distance, which serves as the noise-like feature in this study, and an adaptive data segmentation method for the weighted aggregated distance sequence is designed to adaptively detect the noisy pixels. In the image recovery phase, we use values optimised by channel combination to replace the noisy pixels. Experimental results show that the proposed method compares favourably with other state-of-the-art colour image denoising methods.
The rest of this study is organised as follows. Section 2 discusses the noise analysis and detection method. Section 3 describes the noise removal method. Section 4 presents the experimental results. Finally, the conclusion is drawn in Section 5.

Degree of aggregation and weighted aggregated distance
In the local window (for convenience of description, the window size is set to 3×3 hereinafter), the pixel group in the window is mapped to nodes in the colour space. These nodes are connected by undirected edges, where the value of each edge is the Euclidean distance between the two nodes it joins; the nodes and edges therefore form a complete graph. The adjacency matrix D of the graph is built as

D_{i,j} = \lVert P_i - P_j \rVert_2,   (1)

where D_{i,j} is the colour distance between the ith pixel P_i and the jth pixel P_j. The adjacency matrix D is a real symmetric matrix with nonnegative elements and zero diagonal. The aggregated distance of the ith pixel is defined as

G_i = \sum_{j=1}^{n} D_{i,j},

that is, the sum of the distances from the pixel to its neighbouring pixels; in the matrix D it is the sum of the elements of the ith row (or column). However, the distance (in colour space) from a pixel to the noisy pixels in the window is not as meaningful as its distance to the signal pixels, because impulse noise carries no valid information about the original image. Ideally, the aggregated distance would include only the distances to the signal pixels and exclude those to the noisy pixels, but we do not know in advance which pixels are noisy and which are signal. To reduce the influence of the noisy pixels as much as possible, we propose the concept of the degree of aggregation. First, define the weight vector of the nodes W^T = (w_1, w_2, \ldots, w_n) subject to the constraint

\sum_{i=1}^{n} w_i = 1, \quad w_i \ge 0.   (2)

The initial value of each weight component is 1/n. The degree of aggregation X is defined as

X = W^T D W = \sum_{i \neq j} w_i w_j D_{i,j} + \sum_{i=1}^{n} w_i^2 D_{i,i}.   (3)

The rightmost part of Equation (3) contains two terms: the first is the weighted sum of the off-diagonal elements of the adjacency matrix D, and the second is the weighted sum of the diagonal elements. If each node is weighted appropriately, X should be small. If we take the weight vector W as a variable and minimise X, the optimal weight vector is obtained.
However, since the diagonal elements of the adjacency matrix D are 0, only the first term of Equation (3) remains while the second term vanishes. If W is taken as a variable and X is minimised directly, the weight flows freely among the components of W: X attains its theoretical minimum of 0 while W is non-unique and meaningless. If, however, the diagonal elements of D are modified so that the matrix becomes positive definite, the unique optimal W is easily obtained. The adjacency matrix can be reasonably modified in two ways:
1. Set each diagonal element to be slightly larger than the sum of the elements in the same row (or column):

D^1_{i,i} = \sum_{j=1}^{n} D_{i,j} + \Delta.   (4)
2. Set each diagonal element to be slightly larger than the negative of the minimum eigenvalue M_v of D:

D^1_{i,i} = -M_v + \Delta.   (5)
Here Δ is a very small positive value. With either modification the diagonal elements become positive, and the modified positive definite adjacency matrix D^1 is obtained (D^1 equals D except for its diagonal). The degree of aggregation becomes

X = W^T D^1 W = \sum_{i \neq j} w_i w_j D_{i,j} + \sum_{i=1}^{n} w_i^2 D^1_{i,i}.   (7)

When the degree of aggregation in Equation (7) is minimised, the first term enlarges the gap between the weight components, while the second term shrinks each weight component; in other words, the penalty factor (the diagonal element of D^1) also controls the sensitivity of the weight vector. If there is no noise in the local window, its degree of aggregation is small; the presence of noise increases it. When X is minimised, the optimal weight vector places more weight on the signal pixels than on the noisy pixels. Finally, the weight vector is merged into the aggregated distances to give the weighted aggregated distances

G^w_i = \sum_{j=1}^{n} w_j D_{i,j}.   (8)

The following example illustrates the weighting method. Consider nine pixels in a 3×3 window, and suppose first that all nine are signal pixels. With the initial weight vector W^T = (1/9, ..., 1/9), the initial weighted aggregated distances computed by Equation (8) are (0.89, 1.12, 1.12, 1.12, 1.34, 1.12, 1.34, 1.28, 1.28), and by Equation (7) the degrees of aggregation under the two modifications (Equations 4 and 5) are 2.36 and 1.645, respectively. Minimising the degree of aggregation yields the optimal weight vector, with new minima of 2.28 and 1.58; the minimised degree of aggregation is only slightly smaller than the original one, as expected for a noise-free window.

Now suppose that the first six pixels are signal pixels while the remaining three are noisy. With the initial weight vector, the initial degrees of aggregation computed by Equation (7) are 119.851 and 95.142. After minimising the degree of aggregation, the optimal weight vector is obtained, and the new minima are 54.870 and 54.664, both substantially smaller than the initial values. The optimal weighted aggregated distances are then computed by Equation (8). It can be seen that the weighted aggregated distances used as a noise-like feature separate noise from signal far more distinctly than the unweighted aggregated distances do.

It should be mentioned that the classic VMF can be regarded as a reduced version of our method. Consider the first setting of the diagonal elements (Equation 4, the aggregated distance). If the off-diagonal elements of D^1 are ignored (set to 0), only the diagonal elements (the unweighted aggregated distances) remain, giving a diagonal matrix D^2. Performing the same constrained minimum-aggregation optimization for D^2 and applying the Lagrange multiplier method yields the optimal weight vector

W^1_i = \frac{1 / D^2_{i,i}}{\sum_{j=1}^{n} 1 / D^2_{j,j}}.   (10)

In Equation (10), the pixel with the minimum unweighted aggregated distance, i.e. the pixel selected by the VMF, receives the largest component of W^1. In this sense, the VMF considers only the aggregated distances (diagonal elements) and ignores the off-diagonal elements of the adjacency matrix.
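To make the weighting step concrete, the following sketch builds the adjacency matrix, applies the first diagonal modification (Equation 4), minimises the degree of aggregation under the sum-to-one constraint, and returns the weighted aggregated distances of Equation (8). For simplicity it uses the closed-form equality-constrained minimiser W = D1^{-1}·1 / (1^T D1^{-1} 1) and then clips negative weights, rather than the Goldfarb-Idnani active-set QP solver used in the paper to enforce W ≥ 0 exactly, so it is an approximation of the actual optimisation:

```python
import numpy as np

def weighted_aggregated_distances(pixels, delta=0.01):
    """Sketch of the weighting step: build the adjacency matrix D of
    pairwise colour distances, make it positive definite by setting each
    diagonal entry to the row sum plus delta (modification 1), minimise
    the degree of aggregation X = W^T D1 W subject to sum(W) = 1, and
    return the weighted aggregated distances G_w[i] = sum_j W[j]*D[i,j].
    Clipping negative weights is a crude stand-in for the W >= 0
    constraint enforced in the paper by an active-set QP solver."""
    pixels = np.asarray(pixels, dtype=float)
    n = len(pixels)
    D = np.linalg.norm(pixels[:, None, :] - pixels[None, :, :], axis=2)
    D1 = D.copy()
    np.fill_diagonal(D1, D.sum(axis=1) + delta)   # Equation (4)
    w = np.linalg.solve(D1, np.ones(n))           # equality-constrained minimiser
    w = np.clip(w, 0.0, None)                     # approximate W >= 0
    w /= w.sum()
    return D @ w, w
```

On a window with one colour outlier, the outlier receives the smallest weight (here driven to zero), so the signal pixels' weighted aggregated distances effectively exclude their large distances to the outlier, which is the behaviour the method aims for.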

Noise detection
Noise detection is based on adaptive segmentation of the noise-like feature. When pixels contaminated by impulse noise are present in a window, the complete graph becomes confused: the aggregated distances of both the noise nodes and the remaining signal nodes increase, but the increase for the signals is much smaller than that for the noise (shown in the example in Section 2.1). A segmentation method that adaptively divides the aggregated distance sequence into a signal part and a noise part can solve this problem; the weighted aggregated distance introduced in Section 2.1 is the data sequence to be segmented in order to find the noisy pixels. The segmentation of this sequence is not straightforward, however, because the former portion of the sequence, representing the signals' noise-like feature values, is usually small and convergent, while the following portion, representing the noises' noise-like feature values, is usually large and divergent: the weighted aggregated distances of the signal nodes cluster naturally, while those of the noisy nodes diverge naturally (which is in the nature of noise). To address this problem, a segmentation method based on the sigmoid function is effective. The sigmoid function can be expressed as

f(x) = \frac{1}{1 + e^{-x}}.   (11)

The curve of the sigmoid function is shown in Figure 1, where the red squares on the curve represent the noises, the green squares represent the signals, and the blue square represents the segmentation point P.
We want the signal part of the weighted aggregated distance sequence, which is small and convergent, to fit the high-slope area of the sigmoid function (near the centre point, on the negative half-axis), and the remaining noise part, which is larger and divergent, to fit the gentle-slope area (far from the centre point, on the positive half-axis). The point P is taken as the segmentation point separating noise from signal.
Because the standard sigmoid function takes most of its range of slopes over the domain [0, 4], which is sufficient for the segmentation, a scale transformation is performed on the weighted aggregated distance sequence. The adaptive segmentation algorithm consists of the following four steps: (1) the weighted aggregated distances are sorted in ascending order; (2) the sorted distances are linearly scaled into [0, 4]; (3) each scaled distance is mapped through the sigmoid function f(x) of Equation (11); (4) the optimal point P is obtained by applying the maximum inter-class variance method to the mapped values.
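The four segmentation steps can be sketched as follows; `otsu_threshold` is a plain one-dimensional implementation of the maximum inter-class variance criterion, written here for illustration:

```python
import numpy as np

def otsu_threshold(values):
    """Maximum inter-class variance (Otsu) split of a 1-D sample.
    Returns the index (in sorted order) of the first element of the
    second class."""
    v = np.sort(np.asarray(values, dtype=float))
    best_k, best_var = 0, -1.0
    for k in range(1, len(v)):
        a, b = v[:k], v[k:]
        # inter-class variance, up to a constant factor
        var = len(a) * len(b) * (a.mean() - b.mean()) ** 2
        if var > best_var:
            best_var, best_k = var, k
    return best_k

def segment(distances):
    """Four-step adaptive segmentation of a weighted aggregated distance
    sequence: sort ascending, scale into [0, 4], map through the sigmoid
    f(x) = 1 / (1 + exp(-x)), then split the mapped values by maximum
    inter-class variance. Returns the smallest distance classified as
    noise."""
    d = np.sort(np.asarray(distances, dtype=float))
    scaled = 4.0 * (d - d.min()) / (d.max() - d.min())
    mapped = 1.0 / (1.0 + np.exp(-scaled))
    split = otsu_threshold(mapped)
    return d[split]
```

For a sequence with a small, convergent signal part and a large, divergent noise part, the split lands between the two clusters.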
In addition, a double-ended threshold strategy is used to supplement the adaptive segmentation method: if the weighted aggregated distance of a pixel is no more than a threshold T_1, the pixel is classified as a signal pixel; if it exceeds a threshold T_2, the pixel is classified as a noisy pixel.

Adaptive selection strategy
In the image recovery phase, the recovery strategy can greatly influence the recovery effect, and whether or not already-recovered values are reused must be considered. If the restored part of the window keeps the original values, the image cannot easily be recovered from heavy noise. Conversely, if the recovered values are always used in place of the original ones, the details and edges of the image may be misinterpreted as noise because of the uneven noise distribution. To balance these concerns, [42] proposed a recovery strategy named coarse-to-fine detection: obvious noise is detected and recovered in the coarse stage, reducing the noise density, and the less obvious noise left behind is detected and recovered in the fine stage. The problem lies in assuming that the coarse detection is accurate; the quality of recovery cannot easily be guaranteed when the noise level is high, and using a less obvious noisy pixel as a replacement will negatively affect the fine detection. Consider the following processing window, where V_s is an original value, V_d is a recovered value, and V_c is the centre point being processed. In the processing window, a recovered value usually has better properties, and the probability that a recovered pixel is a noisy pixel is low. In a sense, using recovered values in the local window artificially reduces the noise density, which is extremely important under heavy noise. Consequently, we propose the following adaptive selection mechanism to adaptively replace some original values with recovered values.
where S is the Euclidean distance between the original value and the recovered value at the same image position, and H is a positive threshold.
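A minimal sketch of the selection mechanism, assuming a replacement probability of the form p = min(S/H, 1). This form matches the behaviour described in the parameter discussion (replacement with probability 1 once S exceeds H), but the exact form of Equation (12) may differ:

```python
import random

def select_value(original, recovered, H=100.0, rng=random.random):
    """Adaptive selection: replace the original pixel by its recovered
    value with a probability that grows with the colour-space distance S
    between them. Assumed form: p = min(S / H, 1), so replacement is
    certain once S exceeds the threshold H."""
    S = sum((a - b) ** 2 for a, b in zip(original, recovered)) ** 0.5
    p = min(S / H, 1.0)
    return recovered if rng() < p else original
```

Passing a deterministic `rng` makes the behaviour reproducible for testing; in use, the default random source gives the probabilistic replacement.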

Replacement with value of channel combination optimization
Most methods choose an "optimal" pixel as the output of the filter, but this solution has two drawbacks:

1. The pixel may not have a neighbour that is exactly equal to it in the original uncorrupted image.
2. The values on some channels of the pixel may be good, while those on the other channels may be quite different from the correct values.
In sum, the channels of a pixel cannot easily be evaluated as a whole. To combine the appropriate channels of different pixels, we propose a combinatorial optimization strategy: candidate vectors are assembled by taking the R channel from the rth pixel, the G channel from the gth pixel, and the B channel from the bth pixel, where r, g, and b are the serial numbers of the window pixels and r, g, b ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8}. The scheme (r*, g*, b*) whose candidate has the smallest weighted aggregated distance is regarded as the optimal combination, and the corresponding colour vector P*_t is taken as the ideal filter output.
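The combinatorial search over the 9 × 9 × 9 channel assignments can be sketched directly. The scoring below uses the weighted aggregated distance of each candidate to the window pixels, with the weight vector from Section 2.1 passed in; the function name is illustrative:

```python
import itertools
import numpy as np

def channel_combination(window_pixels, weights):
    """Sketch of channel-combination optimization: build a candidate
    colour vector by taking the R channel from pixel r, G from pixel g,
    and B from pixel b (r, g, b in {0, ..., 8}), and keep the candidate
    with the smallest weighted aggregated distance to the window."""
    P = np.asarray(window_pixels, dtype=float)  # (9, 3)
    w = np.asarray(weights, dtype=float)        # (9,)
    best, best_score = None, np.inf
    for r, g, b in itertools.product(range(9), repeat=3):
        cand = np.array([P[r, 0], P[g, 1], P[b, 2]])
        score = float(w @ np.linalg.norm(P - cand, axis=1))
        if score < best_score:
            best, best_score = cand, score
    return best
```

With only 729 candidates per window, the brute-force search is cheap; downweighted noisy pixels contribute little to the score, so their channels are avoided unless they happen to match the signal.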

Noise model
To evaluate the restoration effects of different methods, an artificial impulse noise model capable of simulating natural impulse noise is needed. A widely used noise model is

X_{i,j} = \begin{cases} N_{i,j} & \text{with probability } r, \\ O_{i,j} & \text{with probability } 1 - r, \end{cases}

where r represents the noise probability, O_{i,j} represents the uncontaminated pixel value at position (i, j) of the image, and N_{i,j} represents the noise value at position (i, j). The simulation of the noise value can be divided into two steps. First, the three colour channels independently receive noise corruption with probability r; then, for each contaminated pixel, a correlation factor (for example 0.5) is used to simulate channel correlation: if a pixel is contaminated on at least one channel, each of its remaining uncontaminated channels is corrupted with probability 0.5. If the correlation factor is 0, the corruption of the three colour channels is independent, and each channel receives impulse noise with probability r. We use four widely used objective criteria for evaluating restoration effects, namely, PSNR (peak signal-to-noise ratio), SSIM (structural similarity) [43], NCD (normalised colour distance), and FSIMc (feature similarity) [44]. The equations for the first three criteria (PSNR, SSIM, NCD) are as follows:
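A sketch of this two-step noise model; the function name and the uniform noise-value distribution over [0, 255] are illustrative assumptions (the model above only specifies when a channel is corrupted, not the distribution of the noise value):

```python
import numpy as np

def add_impulse_noise(image, r, corr=0.5, seed=0):
    """Simulate the channel-correlated impulse noise model described
    above: each channel is independently corrupted with probability r
    (here replaced by a uniform random value in [0, 255]); then, for
    every pixel hit on at least one channel, each remaining clean
    channel is corrupted with probability corr. corr=0 gives fully
    independent channel noise."""
    rng = np.random.default_rng(seed)
    out = image.astype(float).copy()
    hit = rng.random(image.shape) < r              # step 1: per-channel hits
    pixel_hit = hit.any(axis=2, keepdims=True)     # pixel corrupted somewhere
    extra = (~hit) & pixel_hit & (rng.random(image.shape) < corr)  # step 2
    corrupt = hit | extra
    out[corrupt] = rng.uniform(0, 255, size=int(corrupt.sum()))
    return out, corrupt
```

With r = 0.3 and corr = 0.5, the expected per-channel corruption rate is about r + (1 - r)(1 - (1 - r)^2)·corr ≈ 0.48, noticeably above r itself because of the channel correlation.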

Parameter selection
The parameters T_1 and T_2 used for impulse noise detection represent two kinds of deterministic judgment based on the numerical amplitude of the weighted aggregated distance G_w. When G_w is less than T_1, the pixel can safely be regarded as a signal pixel; when G_w is greater than T_2, the pixel can be regarded as a noisy pixel. T_1 is usually small and T_2 is usually large; in this study, T_1 and T_2 are set to 25 and 280. If G_w lies between T_1 and T_2, the noise is detected by the adaptive detection mechanism described above. As for the difference S between the original value and the recovered value at the same image position, the parameter H, which plays an adjusting and thresholding role in the adaptive selection strategy, controls the sensitivity of the difference S in the selection probability (Equation 12) and reduces the noise density of the local image blocks. The larger S is, the less reliable the original pixel at that position is; when S is greater than H, the original pixel is considered extremely unreliable and is replaced by the restored pixel with probability 1. Our extensive experiments show that the denoising performance is not sensitive to the setting of H; for convenience, H is set to 100.

Performance comparison
The proposed algorithm is compared with other effective image restoration algorithms, namely, AWQD [38] and L0TV [45], on the natural images, partly shown in Figure 2, taken from the CSIQ database (http://vision.okstate.edu/csiq). Random impulse noise with a density ranging from 0.1 to 0.5 is added to each image, and the recovery quality is compared by using four objective criteria (PSNR, SSIM, NCD, FSIMc). Tables 1-4 show the performance of the proposed method and the other methods on the Shroom image under the different objective criteria and noise levels (0.1 to 0.5). On the PSNR and NCD criteria, the proposed method outperforms the other algorithms at every noise level. When the noise density is as high as 0.5, L0TV shows a higher score in terms of SSIM and FSIMc, but it performs poorly in texture protection compared with the proposed method and produces fake colour blocks. To reflect the robustness and universality of the proposed algorithm, Tables 5-8 present the comprehensive performance of the aforementioned algorithms on the four objective criteria for the test images with r = 0.3. Figure 3 visually compares the performance of the aforementioned methods with r = 0.3. The proposed1 method shows better detail protection at low noise levels, while the proposed2 method shows higher robustness at high noise levels. AWQD demonstrates good recovery performance at the 0.1 noise level and preserves both the colour and the image structure well; however, its recovery ability is greatly reduced as the noise increases, because the detection threshold set by this algorithm is difficult to control and its coarse-to-fine detection and recovery may fail to achieve a good denoising effect. In the coarse detection stage under high-density noise, the image has a high noise density, and the recovery value of the coarse stage is very likely to be a less obvious noise.
As the noisy pixels remaining for the fine stage become less obvious, the original signal pixels appear correspondingly more anomalous, leading to errors in fine detection and producing many structures that do not exist in the original image. L0TV is less sensitive to the noise density because it is based on data-fidelity regularization and is robust at high noise levels. However, at low or medium noise density the image is excessively smoothed, and L0TV lacks a sensitive detail protection mechanism, as seen in Figure 4 (the left image shows the L0TV result and the right image the proposed method's result). Furthermore, when the local noise similarity is high, L0TV produces many fake colour blobs, as shown in Figure 5. Table 9 lists the running times of the different methods when denoising the test images corrupted by impulse noise with r = 0.2. The test is performed on a desktop computer running 64-bit Windows 7 with a 3.2 GHz Intel Core (TM) i7-4790 CPU and 16 GB memory. The main computational cost of the proposed algorithm comes from solving the constrained quadratic minimization of Equation (7) (in this study we use the Goldfarb-Idnani active-set dual method (http://www.javaquant.net/papers/goldfarbidnani.pdf)). Although the running time of the proposed algorithm is longer than that of all compared methods except AWQD, it remains competitive when the denoising performance is taken into consideration.

CONCLUSION
In this paper, the degree of aggregation is proposed, based on which we propose a weighting method that obtains an optimal weight vector by minimizing the degree of aggregation. This weight vector, which has larger components on the signal pixels than on the noisy pixels, is integrated with the aggregated distance to form a weighted aggregated distance. Subsequently, an adaptive segmentation method based on the sigmoid function is proposed to classify each pixel as a signal pixel or a noisy pixel. To reduce the local window noise density and avoid edge distortion, an adaptive selection strategy is also incorporated. Finally, the detected noisy pixels are replaced by the values of the channel combination optimization. Experimental results show that, compared with other methods, the proposed algorithm can effectively protect image details and ensure the quality of noise removal. One limitation of the algorithm, however, is its high computational complexity; it may be accelerated by a faster method of constrained quadratic minimization, which we leave as future work.

A.1 Positive definite matrix proof of proposed1
Define S_m as the sum of the elements of the mth row of D:

S_m = \sum_{j=1}^{n} D_{m,j}.   (A.1)

The modified matrix A is

A_{i,j} = D_{i,j} \ (i \neq j), \qquad A_{i,i} = S_i + \Delta.   (A.2)

According to Equation (A.2), the quadratic form of A can be rewritten as

W^T A W = \sum_{i<j} D_{i,j} (w_i + w_j)^2 + \Delta \sum_{i=1}^{n} w_i^2,

which is composed entirely of square terms, and Δ is greater than 0. Therefore, when W is not the zero vector, W^T A W must be positive. In conclusion, A is a positive definite matrix.

A.2 Positive definite matrix proof of proposed2
Since D is a real symmetric matrix, we have

D = P^T \Lambda P,   (A.5)

where P is an orthogonal matrix and Λ is a diagonal matrix whose diagonal elements are the eigenvalues of D.
The modified matrix B can be written as

B = D + (\Delta - M_v) I = P^T (\Lambda + (\Delta - M_v) I) P,

where I is the identity matrix and M_v is the minimum eigenvalue of D. The eigenvalues of B are therefore \lambda_i - M_v + \Delta \ge \Delta. Because Δ > 0, all eigenvalues of B are greater than 0, so B is a positive definite matrix.
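Both diagonal modifications can also be checked numerically; the sketch below builds each modified matrix from a symmetric nonnegative distance matrix with zero diagonal and verifies positive definiteness through its eigenvalues (the function name is illustrative):

```python
import numpy as np

def make_pd(D, mode=1, delta=1e-3):
    """Apply one of the two diagonal modifications proved positive
    definite in the appendix. Mode 1 sets each diagonal entry to its row
    sum plus delta (Equation 4 / proof A.1); mode 2 adds
    (delta - min eigenvalue) to the whole diagonal (Equation 5 /
    proof A.2)."""
    D1 = D.astype(float).copy()
    if mode == 1:
        np.fill_diagonal(D1, D.sum(axis=1) + delta)
    else:
        D1 += (delta - np.linalg.eigvalsh(D).min()) * np.eye(len(D))
    return D1
```

Mode 1 makes the matrix strictly diagonally dominant with a positive diagonal; mode 2 shifts all eigenvalues so the smallest becomes delta. Either way the smallest eigenvalue of the result is positive.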