Depth map super-resolution via shape-adaptive non-local regression and direction-based local smoothness

In this letter, a novel single depth map super-resolution algorithm is proposed, which combines a non-local prior and a local smoothness prior. Unlike colour-guided methods, the proposed method does not need a corresponding colour image to aid depth map super-resolution. To explore the non-local self-similarity in the depth map, a shape-adaptive adjusted non-local regression is constructed using shape-adaptive similar patch groups. This prior makes full use of the non-local information of the depth map and alleviates the effect of irrelevant pixels.


Introduction:
To acquire high-quality depth maps, depth map super-resolution (SR) methods have been proposed. These methods can be classified into colour-guided [1][2][3] and non-colour-guided methods [4][5][6]. Colour-guided methods recover depth maps by using the corresponding colour image as a powerful aid. However, in many cases the corresponding colour image is not available. Non-colour-guided methods can infer a high-resolution (HR) depth map from a single low-resolution (LR) input without the assistance of an additional intensity image. Because of this advantage, in this letter we focus on the non-colour-guided setting.
Non-local self-similarity is an important property of depth maps: patches with similar patterns recur across the image and are strongly correlated. Many effective methods have been built on this property [7,8], among which non-local means (NLM) is a representative work. Although these methods exploit the correlation between patches, the non-local feature is formulated using only the centre pixel of each patch, so the non-local self-similarity cannot be fully exploited. Furthermore, in regions that are not locally smooth, the current pixel and its neighbouring pixels may differ greatly in grey value and neighbourhood structure. Even though small weights are assigned to these neighbouring pixels, such irrelevant pixels still blur the estimated depth map. Edges play an important role in texture-less depth maps. To restore these structures, most methods penalise local smoothness along the horizontal and vertical directions (e.g., [4,6]). Although these methods obtain good results, depth maps contain edges with many different orientations, so restricting smoothness to two directions discards directional information. Besides, these methods usually consider only the local structure of the depth map and ignore the correlation between patches.
In this letter, we propose a novel single depth map SR method that requires no colour-image guidance. The proposed method combines the advantages of non-local self-similarity and local directional smoothness. To fully exploit the non-local self-similarity of the depth map, we introduce the adjusted non-local regression (ANLR) prior [9], which can make full use of the non-local information of the depth map. However, this prior assigns a weight to every similar pixel, including irrelevant ones, so when a patch contains considerable variations the estimated result is blurred. To solve this problem, we propose the shape-adaptive (SA) ANLR prior, which alleviates the effect of irrelevant pixels and suppresses blurring artifacts. To capture different directional information, we first extract directional features of the depth map using the curvelet transform and then build a direction-based local smoothness prior.
Optimisation algorithm: Depth map SR can be formulated as the following minimisation problem:
$$\hat{X} = \arg\min_{X} \; \| Y - DHX \|_2^2 + \gamma R(X)$$
where Y is the observed LR depth map, X is the HR depth map to be estimated, D and H stand for the downsampling and blurring operators, γ is the regularisation parameter, and R(X) is the image prior. An effective prior is important for SR performance. The proposed method combines two effective priors, non-local self-similarity and local smoothness, for depth map recovery.
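The observation model above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: it assumes H is a 7 × 7 Gaussian blur with standard deviation 1.15 (the degradation used in the experiments), symmetric boundary padding, and D as plain decimation that keeps every s-th pixel; Y denotes the observed LR depth map, the prior R(X) is passed in as an arbitrary callable, and all function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(size=7, sigma=1.15):
    """Normalised 2-D Gaussian kernel (defaults mirror the 7 x 7, sigma = 1.15 blur)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def blur(X, kernel):
    """H: filter X with a symmetric kernel, padding borders symmetrically (assumption)."""
    r = kernel.shape[0] // 2
    P = np.pad(X.astype(float), r, mode="symmetric")
    out = np.zeros(X.shape, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out += kernel[i, j] * P[i:i + X.shape[0], j:j + X.shape[1]]
    return out


def downsample(X, s):
    """D: keep every s-th pixel along both axes."""
    return X[::s, ::s]


def sr_objective(X, Y, s, gamma, prior, kernel=None):
    """||Y - D H X||_2^2 + gamma * R(X) for a user-supplied prior R."""
    if kernel is None:
        kernel = gaussian_kernel()
    residual = Y - downsample(blur(X, kernel), s)
    return float(np.sum(residual ** 2) + gamma * prior(X))
```

Because the Gaussian kernel is symmetric, correlation and convolution coincide, so the double loop is a valid implementation of H. An LR test input at factor 4 would be generated as `downsample(blur(X_true, gaussian_kernel()), 4)`.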
Non-local self-similarity: Since the ANLR prior uses a patchwise strategy, it fully exploits the non-local self-similarity and the non-local information of the depth map. However, when the target pixel lies in an edge region, its estimated value will be inaccurate because of the irrelevant pixels in its similar pixel group. To solve this problem, the SA-ANLR prior is adopted, in which an SA patch replaces the fixed-size square patch. Since the shape of the patch adapts to the image structures [10], the SA patch avoids smoothing across edges. To extract SA patches efficiently, the anisotropic local polynomial approximation-intersection of confidence intervals (LPA-ICI) technique [11] is used. For each target pixel x ∈ X, we extract a target patch of size b × b centred at the target pixel, and an SA mask is obtained for the target patch from its shape-adaptive neighbourhood. For each target patch, its N closest patches $p_x^n$ (n = 1, ..., N) are found, and the SA similar patch group (SA-SPG) $G_x = [M(p_x^1), \ldots, M(p_x^N)]$ is obtained by masking these similar patches, where M(·) is the operator that extracts the homogeneous pixels in the SA mask. The SA-SPG is shown in Figure 1. Because of the patchwise strategy [9], each pixel belongs to B = b² overlapped patches, which is an overlap-based scheme. Let $M_{x_m}$ be the SA mask of the mth neighbour $x_m$ of the target pixel x; in the patch centred at $x_m$, the target pixel occupies position $b^2 - m + 1$. Because there are N similar patches for each overlapped patch, each pixel has BN similar pixels. We call these BN similar pixels an SA overlap-based similar pixel group (SA-OSPG):
$$\tilde{\mathbf{x}} = \left[ R_{b^2}(G_x^1), R_{b^2-1}(G_x^2), \ldots, R_{b^2-m+1}(G_x^m), \ldots, R_{1}(G_x^{B}) \right]$$
where $G_x^m$ is the SA similar patch group of the mth neighbour of the target pixel x, and $R_{b^2-m+1}$ extracts the $(b^2-m+1)$th pixel of each patch in $G_x^m$. The SA-OSPG $\tilde{\mathbf{x}}$ is shown in Figure 1, where the black entries denote irrelevant pixels.
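The gathering of an SA-OSPG can be sketched as below. This is a hypothetical illustration under stated assumptions, not the authors' implementation: patch similarity is measured by the sum of squared differences inside a win × win search window, the target pixel must lie at least 2⌊b/2⌋ pixels from every border, and `sa_mask` is a placeholder for the LPA-ICI shape-adaptive mask (LPA-ICI itself is not implemented here). Similar pixels whose position falls outside the neighbour's SA mask are set to NaN, playing the role of the black (irrelevant) entries in Figure 1. Defaults follow the experimental settings (b = 7, N = 10, 13 × 13 window).

```python
import numpy as np


def similar_patch_centers(X, center, b, N, win):
    """Centres of the N b x b patches in a win x win search window around `center`
    closest (sum of squared differences) to the patch at `center`.
    The reference patch itself is included with distance 0."""
    h = b // 2
    H, W = X.shape
    r0, c0 = center
    ref = X[r0 - h:r0 + h + 1, c0 - h:c0 + h + 1]
    s = win // 2
    cands = []
    for r in range(max(h, r0 - s), min(H - h, r0 + s + 1)):
        for c in range(max(h, c0 - s), min(W - h, c0 + s + 1)):
            p = X[r - h:r + h + 1, c - h:c + h + 1]
            cands.append((float(np.sum((p - ref) ** 2)), r, c))
    cands.sort()
    return [(r, c) for _, r, c in cands[:N]]


def sa_ospg(X, x, b=7, N=10, win=13, sa_mask=None):
    """Overlap-based similar pixel group of target pixel x = (row, col).

    Each of the B = b*b patches containing x is centred at a neighbour
    m = x + (dr, dc); for each one we find N similar patches and read the pixel
    at the position x occupies inside patch m. `sa_mask(m)` is a hypothetical
    callable returning a bool (b, b) SA mask for the patch centred at m; when x
    falls outside that mask its similar pixels are marked irrelevant (NaN).
    """
    h = b // 2
    r, c = x
    group = []
    for dr in range(-h, h + 1):              # raster order over the B neighbours
        for dc in range(-h, h + 1):
            m = (r + dr, c + dc)
            pos = (h - dr, h - dc)           # position of x inside the patch at m
            keep = sa_mask is None or bool(sa_mask(m)[pos])
            for qr, qc in similar_patch_centers(X, m, b, N, win):
                group.append(X[qr - dr, qc - dc] if keep else np.nan)
    return np.array(group, dtype=float)
```

With raster ordering, the first neighbour (offset (-h, -h)) holds x in the last patch position, which matches the index $b^2 - m + 1$ used above.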
Before constructing the SA-ANLR prior, the similarity weight of each pixel in an SA-OSPG needs to be calculated. However, similarity weights computed from the original pixel pairs may be unstable in some cases. To improve robustness, we first filter each pixel in X: pixel x is redefined as the weighted average of the pixels in its shape-adaptive neighbourhood,
$$x^{w} = \frac{\sum_{x_j \in U_x^{+}} w(j)\, x_j}{\sum_{x_j \in U_x^{+}} w(j)}$$
where w(j) is the weight of the neighbour $x_j$ and $U_x^{+}$ is the shape-adaptive neighbourhood of x. The similarity weights are then computed from these redefined pixels. Finally, the SA-ANLR prior is formulated with a diagonal weighting matrix whose main-diagonal element for pixel x is $w_x = \alpha / \max(\alpha, \mathrm{var}(\tilde{\mathbf{x}})^{\beta})$, where α and β are positive constants and $\mathrm{var}(\tilde{\mathbf{x}})$ is the variance of the SA-OSPG $\tilde{\mathbf{x}}$ of x. The weight $w_x$ reflects the reliability of the SA-OSPG, which enhances the robustness of the estimate: when the variance is small, the SA-OSPG is reliable and a large weight is imposed.
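The two per-pixel quantities defined above, the shape-adaptive weighted pixel $x^w$ and the reliability weight $w_x$, can be sketched as follows. The boolean `mask` stands in for the neighbourhood $U_x^{+}$ (which would come from LPA-ICI), and uniform weights w(j) are used by default; both are assumptions, since the letter does not specify w(j) here. NaN entries (irrelevant pixels) in the group are ignored when computing the variance.

```python
import numpy as np


def sa_weighted_pixel(X, x, mask, weights=None):
    """x^w: weighted average of X over the shape-adaptive neighbourhood of x.

    `mask` is a bool (b, b) array centred on x standing in for U_x^+;
    `weights` defaults to uniform, an assumption for w(j).
    """
    h = mask.shape[0] // 2
    r, c = x
    patch = X[r - h:r + h + 1, c - h:c + h + 1].astype(float)
    w = np.ones_like(patch) if weights is None else np.asarray(weights, dtype=float)
    w = w * mask                       # zero weight outside the SA neighbourhood
    return float(np.sum(w * patch) / np.sum(w))


def reliability_weight(group, alpha, beta):
    """w_x = alpha / max(alpha, var(group) ** beta); NaN (irrelevant) entries ignored."""
    v = float(np.nanvar(group))
    return alpha / max(alpha, v ** beta)
```

Because $w_x$ saturates at 1 whenever $\mathrm{var}^{\beta} \le \alpha$, only groups with a large spread are down-weighted, which matches the behaviour described above.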
Local smoothness: Edges play an important role in depth maps and have significant directional characteristics. To preserve them, most methods enforce local smoothness along the horizontal and vertical directions. Because depth maps contain edges of many orientations, these methods lose directional information. In this letter, we use a local directional smoothness prior to model this characteristic. Directional smoothness means that the intensity difference between adjacent pixels of the depth map along a specific direction is small. It represents depth map structures at the pixel level and complements the non-local prior. The curvelet transform [12], a multi-scale directional transform, has very high directional sensitivity [13]. Since it represents edges precisely, we use it to extract directional features of depth maps. For a given depth map X, the curvelet coefficients are obtained by
$$Q = \Phi(X)$$
where Φ(·) is the curvelet transform. Q is a set of curvelet coefficient matrices of X, $Q = \{Q_{s,l}\}$, where s and l index scales and directions, respectively. The depth map is decomposed into five scales. The coefficient matrices at the first scale represent the smooth content of the depth map, while those from the second to the fifth scale are high-frequency coefficients, each matrix corresponding to one direction. Since finer scales carry more directional information, we consider only the finest scale, which has 64 coefficient matrices. Owing to the symmetry of curvelet coefficients, we use only half of these matrices and divide them into 16 direction subsets $Z_k$, k = 1, ..., 16.
The multi-directional matrix $A_k$ of the input image in the kth direction is formulated as
$$A_k = \Phi^{-1}(Z_k)$$
where $\Phi^{-1}(\cdot)$ is the inverse curvelet transform. The directional feature $A_k$ represents the directional intensity of X in the kth direction: a large value of $A_k(i, j)$ means that X(i, j) carries more feature information in the kth direction.
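NumPy has no curvelet transform, so the sketch below swaps in a simpler stand-in with the same structure, purely to illustrate the "split into directional subsets, then invert each one" idea: the high-frequency part of the 2-D Fourier spectrum is divided into 16 angular wedges over [0, π) (opposite frequencies are folded together, echoing the use of only half of the finest-scale coefficients), each wedge plays the role of a subset $Z_k$, and the magnitude of its inverse transform plays the role of $A_k$. The radial cut-off `r_min`, the use of magnitudes and the function names are assumptions; this is not the curvelet decomposition used in the letter.

```python
import numpy as np


def directional_features(X, K=16, r_min=0.25):
    """Fourier-wedge stand-in for the curvelet step (NOT a curvelet transform).

    Returns A with shape (K, H, W): A[k] is the magnitude of the inverse FFT of
    the high-frequency coefficients whose frequency angle lies in
    [k*pi/K, (k+1)*pi/K). Angles are folded modulo pi because the spectrum of a
    real image is conjugate-symmetric.
    """
    H, W = X.shape
    F = np.fft.fft2(X)
    fy = np.fft.fftfreq(H)[:, None]            # vertical frequencies (cycles/pixel)
    fx = np.fft.fftfreq(W)[None, :]            # horizontal frequencies
    radius = np.hypot(fx, fy) / 0.5            # 1.0 at the Nyquist frequency of an axis
    theta = np.mod(np.arctan2(fy, fx), np.pi)  # fold f and -f onto the same angle
    A = np.empty((K, H, W))
    for k in range(K):
        lo, hi = k * np.pi / K, (k + 1) * np.pi / K
        wedge = (theta >= lo) & (theta < hi) & (radius >= r_min)  # stand-in for Z_k
        A[k] = np.abs(np.fft.ifft2(F * wedge))                     # stand-in for A_k
    return A


def principal_direction(A):
    """Index k of the strongest directional response at each pixel."""
    return np.argmax(A, axis=0)
```

Note that wedge k collects intensity variation along frequency angle kπ/K, so it responds to edges oriented perpendicular to that angle; a column-periodic pattern (vertical stripes), for instance, concentrates its energy in wedge 0.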
In this letter, the direction-based local smoothness prior penalises the directional differences $G_k X$, each weighted by $W_L^k$, whose first term is the depth weight. However, the depth weight alone cannot reflect the rich structures of depth maps, so we introduce the directional feature $A_k$ into the weight $W_L^k$. A large value of $A_k(i, j)$ means that the kth direction is the principal direction of X(i, j); in this way, the directional features give the principal direction at each pixel. Along the principal direction, adjacent pixels are more similar, so a higher weight should be given to $G_k X$. To achieve this, we add a weighting parameter $H(A_k)$ to the depth weight, where $H(A_k)$ extracts the directional feature value for pixels in edge regions.
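Because the exact expression of this prior is not reproduced here, the sketch below only illustrates its general shape under an assumed weighted-ℓ1 form, $\sum_k \| W_L^k \odot G_k X \|_1$, with $G_k$ taken as a forward difference along an integer pixel offset. Four directions (0°, 45°, 90°, 135°) replace the letter's 16 curvelet directions, borders are treated as periodic, and the weight maps are supplied by the caller (for example, a depth weight combined with $H(A_k)$); all of these are simplifying assumptions, and the names are hypothetical.

```python
import numpy as np

# (row, col) offsets for 0, 45, 90 and 135 degrees; a simplification of 16 directions.
OFFSETS = [(0, 1), (-1, 1), (1, 0), (1, 1)]


def directional_difference(X, offset):
    """G_k X: neighbour along `offset` minus the pixel itself (periodic borders)."""
    dr, dc = offset
    return np.roll(X, shift=(-dr, -dc), axis=(0, 1)) - X


def directional_smoothness(X, weights):
    """Weighted l1 penalty sum_k ||W_k * G_k X||_1, one weight map per offset."""
    return float(sum(np.sum(np.abs(W * directional_difference(X, off)))
                     for W, off in zip(weights, OFFSETS)))
```

Putting a large weight on the direction along which a pixel's neighbours are most similar, and a small one across an edge, lets the penalty smooth along edges without smoothing across them.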
Finally, by combining the non-local self-similarity prior and the local smoothness prior, we obtain a joint prior model for R(X). The resulting objective is an ℓ1-related minimisation problem, which we solve with the split Bregman iteration [14].
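To illustrate the split Bregman idea [14] on a problem small enough to show in full, the sketch below applies it to one-dimensional total-variation denoising, $\min_x \frac{\mu}{2}\|x - f\|_2^2 + \|Dx\|_1$, rather than to the joint SR model. The ℓ1 term is split off through an auxiliary variable d ≈ Dx, the quadratic x-subproblem is solved exactly, d is updated by soft thresholding, and a Bregman variable b accumulates the constraint residual. The penalty parameter `rho` and the iteration count are illustrative assumptions.

```python
import numpy as np


def shrink(v, t):
    """Soft thresholding: elementwise minimiser of t*|d| + 0.5*(d - v)**2."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def split_bregman_tv1d(f, mu, rho=1.0, iters=200):
    """Solve min_x (mu/2)||x - f||^2 + ||D x||_1, D = forward difference."""
    n = f.size
    D = np.diff(np.eye(n), axis=0)            # (n-1, n): (D x)_i = x_{i+1} - x_i
    A = mu * np.eye(n) + rho * D.T @ D        # normal equations of the x-subproblem
    d = np.zeros(n - 1)                       # auxiliary variable, d ~ D x
    b = np.zeros(n - 1)                       # Bregman variable
    x = f.astype(float).copy()
    for _ in range(iters):
        x = np.linalg.solve(A, mu * f + rho * D.T @ (d - b))
        Dx = D @ x
        d = shrink(Dx + b, 1.0 / rho)
        b = b + Dx - d
    return x
```

In the SR setting the x-subproblem would additionally contain the data term with DH and the SA-ANLR term, but the alternation between an exact quadratic solve, a shrinkage step and a Bregman update is the same.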
Experimental results: We use six data sets from the Middlebury benchmark database, namely Art, Book, Moebius, Reindeer, Laundry and Dolls, to evaluate the performance. Because processing the full-size test images is time-consuming, we use only a cropped region of each image. To generate LR depth maps, the ground-truth images are blurred by a 7 × 7 Gaussian kernel with a standard deviation of 1.15 and then downsampled to the desired resolution. We test the up-sampling performance at factors of 2 and 4. We compare the proposed method with four methods: two colour-guided methods, anisotropic total generalised variation (TGV) [15] and static and dynamic guided filtering (SDF) [16], and two non-colour-guided methods, edge-guided (EG) [5] and ScSR [17]. The number of similar patches N is 10, the patch size is 7 × 7, and the search window is 13 × 13. The control parameters γ and λ are set to 0.074 and 0.016, respectively. Table 1 presents the quantitative up-sampling results, using mean absolute error (MAE) as the evaluation metric; the first two methods in the table are colour-guided. For a scaling factor of 2, our method obtains the lowest MAE and performs even better than the colour-guided methods. For a scaling factor of 4, our method also outperforms the other single depth map SR methods. Figure 2 shows visual comparisons at 4× up-sampling for the image Art. The TGV [15] method produces texture-copy artifacts in some regions because it does not account for the inconsistency between the colour image and the depth map, as the corresponding error maps confirm. SDF [16] uses static and dynamic guidance to eliminate this inconsistency, but it fails in some cases. The result of ScSR [17] contains unpleasant artifacts, and its error map shows noticeable errors near edges caused by ringing.
The result of EG [5] has jagged artifacts where an accurate edge is not available. Its error map shows large errors around the depth discontinuities, because the EG method depends heavily on accurate edges. Our method has much smaller errors around the depth map edges.
To evaluate the effectiveness of each component, we also compare our method with DCLS (only the direction-based local smoothness prior is used), ANLR and SA-ANLR. Table 2 presents the quantitative depth recovery results for 2× up-sampling, and Figure 3 gives a visual comparison (the patches in the red boxes are highlighted regions). Table 2 and Figure 3 show that DCLS gives the worst results. SA-ANLR produces better results than ANLR, which demonstrates the effectiveness of the SA-ANLR prior. The proposed method obtains the best performance and reconstructs the depth map with sharp edges, which shows that combining SA-ANLR and DCLS is effective for restoring accurate depth maps.
In general, the proposed method significantly improves SR performance. Its drawback is computational complexity, since the local structure analysis, the search for non-local similar pixels and the iterative optimisation of the objective function are time-consuming. In future work, we will try to improve the efficiency of the proposed method.
Conclusion: In this letter, we have introduced a novel method for single depth map recovery that does not require a corresponding colour image. The proposed method adopts a joint non-local self-similarity and local smoothness strategy. To fully exploit the non-local self-similarity in the depth map, we build an SA-ANLR prior, which alleviates the effect of irrelevant pixels and recovers sharp edges. To characterise different local edge information, we use a direction-based local smoothness prior. The proposed method combines the advantages of the two priors and can recover fine structures. In both quantitative and qualitative evaluations, the proposed method achieves good performance.