Artefact-free image stitching via a better normed seam-cutting energy function

Image stitching, as an important field of computer graphics and vision, has received much attention in recent years. Image stitching techniques are generally decomposed into two phases: image alignment, which aligns target images with the reference images, and image composition, which fixes ghosting and visual artefacts. This work proposes a new strategy for the seam-cutting method that provides visually appealing results. Seam-cutting is one of the most influential methods in image composition; it can relieve artefacts and produce plausible results. However, state-of-the-art seam-cutting approaches often lead to undesirable seams in challenging scenes. Here, the authors put forward a novel seam-cutting method by defining a new energy function. This method uses the 5/2 power of the L1 norm as the colour difference, which magnifies the weight of colour distinctions so as to avoid undesirable seams and artefacts. The proposed method can be easily implemented. The test images are collected from publicly available challenging datasets and taken by ourselves. Experiments demonstrate that the proposed method can create comparable or even better stitching results compared to other state-of-the-art seam-cutting approaches.


INTRODUCTION
Image stitching is a well-studied problem with widespread applications [1][2][3][4][5]. Stitching algorithms often produce visual artefacts in dynamic scenes with moving objects or in static scenes with objects at different depths. Because of parallax, images cannot always be stitched cleanly, and noticeable artefacts usually remain in the stitched results. Therefore, image stitching remains a challenging problem.
Traditionally, the stitching problem is decomposed into two stages: image alignment and composition. Within these two stages, there are two main approaches to artefact-free results; ideally, both should be used jointly for the best outcome. The first is to align the images as accurately as possible with spatially varying warps, such as dual-homography warping (DHW) [6], SVA [7], as-projective-as-possible (APAP) warping [8] and robust elastic warping (REW) [9]; APAP is a representative work. These methods can achieve high-precision local alignment and are usually effective when the images have small parallax. However, most of these stitching methods cannot produce visually pleasant results for images with large parallax. The second is to use advanced image composition methods, such as blending [10,11] and seam-cutting [12,13], which are powerful tools for handling parallax and can produce visually plausible stitching results. The seam-cutting method determines, for each pixel of the final result, which image it comes from. Representative works include [14][15][16].

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2020 The Authors. IET Image Processing published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.
Note that the seam-cutting method can relieve artefacts of image stitching and create better results. In recent work, Li et al. [17] proposed a perception-based seam-cutting method to produce acceptable stitching results. Liao et al. [18] proposed a quality-based iteration method to find an optimal seam. These methods can update the location of the current seam. However, they have limitations: (1) human perception is difficult to capture with a sigmoid metric, and the quality of the new seam might be worse in complex salient or textured regions; (2) it is difficult to choose appropriate parameters; (3) the iterative method has high computational complexity.

FIGURE 1
Comparison of different seam-cutting methods. Zoom in to observe the difference. Top: seam and stitching result of the classic seam-cutting method. Bottom: seam and stitching result of our seam-cutting method. The conventional seam-cutting method produces tearing in the red rectangle; our method has no perceptible artefacts

Recent image stitching methods commonly focus on better alignment. While these methods can obtain better alignment, they still cannot create artefact-free stitching results. Inspired by the work of [17] and our own observations, the seam over the whole overlapping area should avoid complex salient or textured regions. We observed experimentally that the 5/2 power of the L1 norm is the most effective among all the norm and sigmoid functions we tested. The ideal seam should pass through aligned regions, and the goal of our study is artefact-free image stitching; the proposed method works well for artefacts caused by misalignment. The fundamental mathematical model of this paper comes from [12]. Our approach is based on the observation that different norms and functions differ in how well they distinguish colour and gradient changes, and we want the seam to pass through aligned regions as much as possible. The motivation of our scheme is that the novel energy function should search for a better seam, which is an effective strategy for stitching images under large and complex parallax. This paper finds the optimal stitching seam with the graph-cut algorithm [19] and uses Poisson fusion [11] to eliminate the stitching line. The comparison in Figure 1 demonstrates that different energy functions produce different seams and stitching results in seam-cutting methods.
This paper performs comparative experiments on publicly available challenging datasets, including Parallax [15] and Seagull [16]. In our experiments, we use the same global homography to align the candidate image with the reference image, and we use uniform metric criteria to evaluate seam quality along the stitching seam. Our stitching method can be easily implemented and produces pleasant stitching results. Experiments clearly show that we produce visually plausible results in most cases.
The remainder of the paper is organized as follows. Section 2 briefly reviews related work on image stitching. Section 3 introduces our proposed method. Section 4 presents the experiments. We conclude in Section 5.

RELATED WORK
Image stitching is a popular field in computer graphics and vision. A description of fundamental concepts and many associated approaches is available in a comprehensive survey [20].
In this section, we give a brief review of related work from the viewpoint of artefact-free image stitching.

Traditional techniques
The traditional method is the mainstream in the field of image stitching. Homographic stitching methods. When the images are captured from a common camera centre or depict approximately planar scenes, methods based on a single homography are valid and can create artefact-free stitching results. But these conditions are easily violated in practice, and such methods will produce artefacts if parallax exists.
Spatially varying methods. Spatially varying methods are introduced to create artefact-free stitching results. Gao et al. [6] proposed dual-homography warping to construct mosaics; it can generate artefact-free stitching results when the scene can be divided into two planes. Lin et al. [7] proposed a stitching method based on a smoothly varying affine field. Zaragoza et al. [8] proposed APAP warping to stitch images flexibly; it can yield highly accurate stitching results and significantly reduce artefacts. Joo et al. [21] proposed a unified moving direct linear transformation (DLT) framework integrating point and line correspondence cues, which can obtain accurate and parallax-robust alignment for artefact-free results. Li et al. [22] presented a dual-feature warping method using both line correspondences and sparse feature matches. Li et al. [9] proposed a robust elastic warping method that achieves accurate alignment; however, it can only handle small parallax. Li et al. [23] presented a bundled robust alignment model for stitching images directly from pixels. Li et al. [24] used a triangular facet approximation method to align images, achieving locally adaptive alignment for both perspective and non-perspective images. Lee et al. [25] proposed a stitching method based on the novel concept of warping residuals, which can alleviate parallax artefacts and produce artefact-free results.
Shape-preserving methods. Chang et al. [26] presented a shape-preserving half-projective warp to create natural-looking stitching results. Lin et al. [27] combined two stitching fields to construct better panoramas. Chen and Chuang [28] proposed a local warp model to obtain visually plausible results. Zhang et al. [29] introduced a mesh-based framework that consists of alignment and a set of prior constraints to optimize alignment and regularity. Li et al. [30] developed a quasi-homography warp to balance perspective and projective distortion in the non-overlapping region. Xiang et al. [31] combined a global similarity constraint with projective warping through a new weight integration strategy. Liu and Chai [32] proposed a shape-optimizing and illumination-smoothing multiple-image stitching method, which creates artefact-free results by solving the problems of projective distortion and colour difference. Liao and Li [33] proposed two single-perspective warps for image stitching, which reduce projective distortion and can therefore create artefact-free stitching results.
Seam-based stitching methods. Gao et al. [14] proposed a seam-driven strategy based on a set of homographies. Zhang and Liu [15] combined projective transforms and content-preserving warping in a hybrid alignment model, which can handle large parallax and alleviate local distortion. Lin et al. [16] explicitly used the estimated seam to guide local alignment, improving the seam quality over each iteration. Herrmann et al. [34] proposed an approach that uses multiple registrations and an improved Markov random field (MRF) energy to generate good stitching results. Herrmann et al. [35] took an object-centred approach and leveraged recent advances in object detection for seam finding; one strategy they employed was to modify the energy function used in the seam-cutting stage. Hejazifar [36] proposed a fast seam-finding method that avoids undesirable seams and ghosting by considering the grey-weighted distance and the gradient-domain region of differences. Li et al. [17] proposed a perception-based seam-cutting approach that takes human perception into account in the energy minimization. Liao et al. [18] proposed an iterative seam estimation method to find an optimal seam. Our method falls into this category.

Deep learning techniques
Recently, deep learning has become a popular research topic. Since ground truth cannot easily be defined, there are no end-to-end deep learning methods in the image stitching field. To the best of our knowledge, deep learning is more commonly used in partial stages of image stitching, such as feature extraction, image warping and image blending. For example, stitching techniques based on traditional features often fail on low-texture scenes or scenes with inapparent context. To tackle these limitations, researchers have turned to deep learning, which can automatically learn features from large amounts of data; good features can greatly improve alignment performance. Yan et al. [37] employed SIFT and a single-hidden-layer feed-forward neural network to better align images, with higher alignment accuracy and faster speed. Sampetoding et al. [38] utilized a deep learning-based global feature descriptor to match points. Hoang et al. [39] utilized a convolutional neural network (CNN) architecture based on multi-scale keypoint detection and feature description to match and select corresponding points. Kang et al. [40] developed a two-step image alignment method based on deep learning and iterative optimization. Detone et al. [41] proposed an end-to-end deep convolutional neural network to estimate homographies. Kang et al. [42] presented homography estimation based on a hybrid framework that incorporates a deep learning approach and energy minimization. Gada et al. [43] used neural networks and field-programmable gate arrays (FPGAs) to speed up the image stitching process. Poursaeed et al. [44] proposed an end-to-end neural network architecture to estimate fundamental matrices without relying on feature correspondences. Bayer et al. [45] developed an imitation deep learning framework to align 2D colour funduscopic images.
Because of the difficulty of defining ground truth for image stitching, deep learning techniques still struggle to stitch images end to end.

PROPOSED METHOD
In this section, we first describe the conventional seam-cutting method used in image stitching. Then, we provide a detailed presentation of our proposed method. Finally, we show our method framework.

Conventional seam-cutting method
Here, let us briefly describe the conventional seam-cutting method. More associated details of the conventional seam-cutting method can be found in [12,13].
To keep this paper concise, we take two images as an example. Consider a pair of images I0 and I1 aligned with a single homography, where I0 is the reference image and I1 is the candidate image. The overlapping region of I0 and I1 is denoted Ω, and N is a four-connected neighbourhood system defined on Ω. We define a label set l(p) ∈ {0, 1}, where p is a pixel in Ω. The seam-cutting method assigns a label l(p) to each pixel p: if l(p) = 0, the pixel p comes from I0; if l(p) = 1, it comes from I1. The selected pixel is placed in the final stitched image. If the labels of adjacent pixels p and q are not equal, that is, l(p) ≠ l(q), the seam passes between pixels p and q.
The goal of the seam-cutting method is thus to assign a labelling l to the pixels of Ω. This labelling problem on an MRF can be solved by minimizing a global energy function with two terms, a data term and a smoothness term. The total energy function E(l) can be written as

$$E(l) = \sum_{p \in \Omega} E_{\text{data}}(p, l(p)) + \lambda \sum_{(p,q) \in N} E_{\text{smoothness}}(p, q, l(p), l(q)),$$

where E_data is the data term, which reflects the saliency of a pixel p with label l(p); E_smoothness is the smoothness term, which represents the discontinuity of adjacent pixels p and q; and λ is a weight, usually set to 1 in our method.
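As a minimal illustration (our sketch, not the authors' implementation), the total energy of a candidate labelling over a toy overlap can be evaluated as follows; `data_cost` and `smooth_cost` are placeholders standing in for E_data and E_smoothness:

```python
def total_energy(labels, data_cost, smooth_cost, lam=1.0):
    """Evaluate E(l) = sum_p E_data + lam * sum_{(p,q) in N} E_smoothness
    for a 2D grid of 0/1 labels with a four-connected neighbourhood."""
    h, w = len(labels), len(labels[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            energy += data_cost((y, x), labels[y][x])
            # count each neighbouring edge exactly once (right and down)
            if x + 1 < w:
                energy += lam * smooth_cost((y, x), (y, x + 1),
                                            labels[y][x], labels[y][x + 1])
            if y + 1 < h:
                energy += lam * smooth_cost((y, x), (y + 1, x),
                                            labels[y][x], labels[y + 1][x])
    return energy
```

A seam-cutting solver searches over labellings to minimize this quantity.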
Following the formulation introduced by [12], the data term E_data(p, l(p)) is defined as

$$E_{\text{data}}(p, l(p)) = \begin{cases} \mu, & (p \in A \text{ and } l(p) = 1) \text{ or } (p \in B \text{ and } l(p) = 0), \\ 0, & \text{otherwise}, \end{cases}$$

where A is the border of I0 and Ω, B is the border of I1 and Ω, and μ is a sufficiently large constant. E_data reflects the saliency of a pixel p with label l(p) and determines which pixels are likely to produce a good result. The smoothness term E_smoothness(p, q, l(p), l(q)) measures the discontinuity of neighbouring pixels p and q over the four-connected neighbourhood N:

$$E_{\text{smoothness}}(p, q, l(p), l(q)) = \begin{cases} 0, & l(p) = l(q), \\ \|I_0(p) - I_1(p)\|_2 + \|I_0(q) - I_1(q)\|_2, & l(p) \neq l(q). \end{cases}$$

If l(p) ≠ l(q), the smoothness term is the colour-based difference of the overlapped pixels, characterized by the L2 norm. The smoothness term therefore plays an extremely important part in the total energy function. Because the graph-cut algorithm [19] optimizes over all pixels and finds the minimum in polynomial time, this paper uses graph cuts [19] to minimize the total energy function.
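A sketch of how these two terms might be realized in code (μ is a hand-picked large constant standing in for the infinite border penalty, and the images are dictionaries mapping pixel coordinates to RGB triples; these representations are ours, not the paper's):

```python
import math

MU = 1e6  # large constant: border pixels must keep their source image

def data_cost(p, label, A, B, mu=MU):
    """E_data: heavy penalty if a pixel on border A (of I0) is labelled 1
    or a pixel on border B (of I1) is labelled 0; zero elsewhere."""
    if (p in A and label == 1) or (p in B and label == 0):
        return mu
    return 0.0

def l2_smooth_cost(p, q, lp, lq, img0, img1):
    """Conventional E_smoothness: when the seam separates p and q,
    charge the L2 colour difference of the overlapped pixels."""
    if lp == lq:
        return 0.0
    l2 = lambda a, b: math.sqrt(sum((ca - cb) ** 2 for ca, cb in zip(a, b)))
    return l2(img0[p], img1[p]) + l2(img0[q], img1[q])
```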

Our smoothness term
Obviously, different energy functions will find different seams. Our method is built upon the observation that, as illustrated in Figure 2, the final stitched image can still contain noticeable artefacts. Motivated by [17] and this observation, we argue that a good seam should pass through only the most unnoticeable parts of the image. Although [17] uses a saliency measure and human perceptual properties to define the colour difference, noticeable artefacts may still appear in the final result. From a visual point of view, stitched results based on the 5/2 power of the L1 norm are better than those of other norm-based seam-cutting methods, which produce severe artefacts; hence we choose the 5/2 power of the L1 norm. We therefore want to define a function whose cost approaches zero in aligned regions while being large enough in misaligned regions, so that aligned and misaligned positions can be distinguished successfully.
Note that x is a six-component colour difference (in R, G and B) of images I0 and I1 at pixel p in Ω. The Euclidean-metric (L2 norm) function can be expressed as

$$f_{E}(x) = \|x\|_2.$$

As Figure 3 shows, it cannot distinguish misaligned regions, and in Figure 4 the seam is not good because the Euclidean metric does not penalize misalignment strongly enough. The sigmoid-metric function can be expressed as

$$f_{S}(x) = \frac{1}{1 + e^{\,\tau - \|x\|_2}},$$

where τ is the threshold maximizing the between-class variance in Otsu's algorithm. The value range of this function is between 0 and 1. This function has some limitations: when there are many objects in the scene, it cannot obtain a good seam (in Figure 4, the seam is not visually pleasing), and it contains parameters that need to be set manually.
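In code, the two baseline metrics look as follows (a sketch; `tau` stands for the Otsu threshold and is assumed precomputed rather than derived here):

```python
import math

def euclidean_metric(x):
    """L2 norm of a colour-difference vector x."""
    return math.sqrt(sum(c * c for c in x))

def sigmoid_metric(x, tau):
    """Sigmoid of the L2 colour difference, with output in (0, 1).
    tau plays the role of the Otsu threshold: differences well below
    tau map near 0, differences well above tau map near 1."""
    return 1.0 / (1.0 + math.exp(tau - euclidean_metric(x)))
```

Note the saturation: once the difference exceeds τ by a few units, the sigmoid output is nearly 1 regardless of how severe the misalignment is, which is one reason it can fail in scenes with many objects.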
In particular, the image content is nonuniform, which means that the penalty value in the misaligned region is bigger than the one in the aligned region. In Figure 3, the black area indicates that the pixels are equal, and the white area indicates that the pixels are not equal. We desire that the colour difference is distinguished as much as possible. So, we define a better normed function.
The selection of fractional-order is essential for quantifying the colour difference. Our energy function can enforce the differentiation of the misaligned region in the overlap region and the function can successfully avoid seam in the misaligned region. Fortunately, we found a fractional-order norm function is a suitable quality metric for our purpose.
Thus, the smoothness term can be re-written as

$$E'_{\text{smoothness}}(p, q, l(p), l(q)) = \begin{cases} 0, & l(p) = l(q), \\ \left( \|I_0(p) - I_1(p)\|_1 + \|I_0(q) - I_1(q)\|_1 \right)^{5/2}, & l(p) \neq l(q). \end{cases}$$

We briefly illustrate the computation of the new smoothness term with an example. Given two pixels p and q in the overlapped region with L1 colour difference d, the new cost is d^{5/2}, which exceeds the original cost whenever d > 1. The larger the difference between the two pixels, the larger our computed value, because the new smoothness term gives a large enough penalty.
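A sketch of the proposed term (our code; we apply the 5/2 exponent to the summed L1 difference, which is one plausible reading of the definition):

```python
def l1(a, b):
    """L1 colour difference of two RGB triples."""
    return sum(abs(ca - cb) for ca, cb in zip(a, b))

def new_smooth_cost(p, q, lp, lq, img0, img1, power=2.5):
    """Proposed E'_smoothness: the 5/2 power of the L1 colour difference
    when the seam crosses the edge (p, q); zero when labels agree."""
    if lp == lq:
        return 0.0
    return (l1(img0[p], img1[p]) + l1(img0[q], img1[q])) ** power
```

The exponent shrinks sub-unit differences (0.5^{5/2} ≈ 0.18) and amplifies large ones (4^{5/2} = 32), which is exactly the aligned-versus-misaligned separation the energy function is after.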
The improved energy function E′(l) can be written as

$$E'(l) = \sum_{p \in \Omega} E_{\text{data}}(p, l(p)) + \lambda \sum_{(p,q) \in N} E'_{\text{smoothness}}(p, q, l(p), l(q)).$$

Algorithm 1: Our stitching method
Input: Reference image I0 and candidate image I1.
Output: Stitched image.
1: Detect and match feature points of I0 and I1 with SIFT [46].
2: Calculate a global optimal homography via DLT [48] and align I0 and I1.
3: Calculate the intersection of the aligned images to obtain the overlapping region Ω and its boundaries A and B.
4: Calculate the data term E_data.
5: Calculate the proposed smoothness term E′_smoothness in (9).
6: Combine E_data and E′_smoothness into the MRF energy function E′(l) in Equation (12).
7: Minimize E′(l) with the graph-cut algorithm [19] to find the seam.
8: Use Poisson fusion [11] to blend the aligned images.

Our stitching pipeline
Our stitching pipeline is illustrated in Figure 5, and the detailed steps are given in Algorithm 1. In the alignment stage, we detect and match features with SIFT [46], since [47] compared SURF and SIFT and found that SIFT performed well. We estimate a single optimal homography via DLT [48], then use it to align the images and obtain the overlapping region. In the seam-finding stage, we solve the MRF energy function, which contains the data term and our proposed smoothness term, with the graph-cut algorithm [19]. In the blending stage, we use Poisson fusion [11] to construct the final stitching results.
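The seam-finding step can be illustrated end to end on a toy overlap with a generic max-flow/min-cut solver (a self-contained sketch using Edmonds-Karp; the actual graph construction and solver of [19] are more involved):

```python
from collections import deque, defaultdict

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow on a residual-capacity dict-of-dicts."""
    flow = 0.0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, parent  # parent now holds the source side of the cut
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= aug
            cap[v][u] = cap[v].get(u, 0.0) + aug
        flow += aug

def find_seam_labels(h, w, A, B, smooth):
    """Binary seam labelling by min-cut: pixels in A are forced to I0,
    pixels in B to I1, and cutting an edge (p, q) costs smooth(p, q)."""
    cap = defaultdict(dict)
    big = 1e9
    for p in A:
        cap['s'][p] = big
    for p in B:
        cap[p]['t'] = big
    for y in range(h):
        for x in range(w):
            p = (y, x)
            for q in ((y, x + 1), (y + 1, x)):  # right and down neighbours
                if q[0] < h and q[1] < w:
                    wgt = smooth(p, q)
                    cap[p][q] = cap[p].get(q, 0.0) + wgt
                    cap[q][p] = cap[q].get(p, 0.0) + wgt
    _, reach = max_flow(cap, 's', 't')
    # pixels still reachable from the source stay with I0 (label 0)
    return {(y, x): 0 if (y, x) in reach else 1
            for y in range(h) for x in range(w)}
```

On a 1 × 4 strip with seam-crossing costs 5, 1, 5 along the row, the cut falls on the cheapest edge, just as the energy minimization intends.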

Experimental details
Our goal is to stitch images with reduced noticeable artefacts. We validate the effectiveness and innovations of our method with several experiments, performed on a PC with a 3.0 GHz CPU and 16 GB RAM. The test images are collected from publicly available datasets, Parallax [15] and Seagull [16], which can be downloaded from their project websites. We also captured images of natural scenes with parallax using mobile phones. In all examples, feature points are detected and matched using SIFT [46], because [47] found that SIFT generally gives the best results.
For the compared methods, we consider Parallax-tolerant [15], Seagull [16], the conventional seam-cutting method, the perception-based seam-cutting method [17] and the evaluation-based seam-cutting method [18]. Note that this paper only shows some representative results; the supplementary materials are provided via the website link.1

Ablation study
We performed an ablation experiment on the new smoothness term in the MRF energy function. Different smoothness terms yield different seams, and our proposed smoothness term gives better performance. These results are shown in Figures 6 and 7.

Visual evaluation
Ideally, images would be stitched against a standard dataset with an evaluation benchmark. Unfortunately, it is difficult to define ground truth for image stitching, and there is no benchmark superior to visual evaluation. Objective evaluation of image stitching algorithms therefore remains a roadblock.
We demonstrate the advantages of our method on several examples, highlighting a few notable regions in Figures 8 and 9. To keep the paper concise, we show representative stitching results for only two image pairs; more stitching results and intermediate results can be found in the supplementary material. Figures 8 and 9 show results of six methods on the Parallax datasets [15], which contain 33 challenging image pairs with a significant amount of parallax. The results of the Parallax-tolerant [15] and Seagull [16] methods appear good at first glance, but artefacts appear in the stitching results, as indicated by the red rectangle in Figure 9. For the conventional seam-cutting method, the traffic light is duplicated, as indicated by the red rectangle in Figure 8, and the lightning rod is torn, as indicated by the red rectangle in Figure 9. For the perception-based seam-cutting method [17], the shrub is duplicated in Figure 9. For the evaluation-based seam-cutting method [18], the traffic light is duplicated in Figure 8 and the streetlight is cropped, as indicated by the red rectangle in Figure 9. Our method handles these challenging examples and produces visually acceptable, artefact-free results, as indicated by the green rectangles in Figures 8 and 9. Note that the source code of Parallax-tolerant [15] and Seagull [16] is not available, so we only demonstrate the intermediate seam results of the other three methods and our method. Our seam results are visually plausible for the given examples.

Seam quality assessment
To demonstrate its effectiveness, we compare the final stitching seam quality of our method with the conventional seam-cutting method and the perception-based seam-cutting method [17]. For fairness, all methods operate on the same global homography alignment.

FIGURE 10
The intermediate seam results. (a) Conventional seam-cutting method. (b) Perception-based seam-cutting method [17]. (c) Evaluation-based seam-cutting method [18]. (d) Our method

FIGURE 11
The intermediate seam results on the image pair 'No. 24' of the Parallax-tolerant datasets [15]. (a) Conventional seam-cutting method. (b) Perception-based seam-cutting method [17]. (c) Evaluation-based seam-cutting method [18]. (d) Our method

In this experiment, we set a 15 × 15 local patch centred at p_i along the final stitching seam. Following the seam quality introduced by [16], the metric is written as

$$Q_{\text{seam}} = \frac{1}{N} \sum_{i=1}^{N} \left( 1 - \mathrm{ZNCC}(p_i) \right),$$

where N is the total number of pixels on the final stitching seam and ZNCC(p_i) is the zero-normalized cross-correlation score between the local patches in the two images. The seam quality is shown in Table 1, in the columns 'STD', '[17]' and 'Ours'; the seam quality of the conventional seam-cutting method serves as the 'STD' value. A smaller value usually indicates a better stitching seam, and the smallest value is in bold. We can see that our method improves seam quality. In some examples (i.e. 10, 11, 22, 25, 26, 30), the seam quality of [17] is better than ours because these images have less saliency and context, but our stitching results are all visually plausible. We provide a visual comparison of some of these examples (i.e. 10, 11) in Figure 12.

FIGURE 13
Comparison between different parameters λ. The different parameters obtain the same seam quality score; the parameter λ is not likely to affect the seam and stitching result

All 30 pairs of images are given in the supplementary material.
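The seam-quality computation can be sketched as follows (our code; `patch_pairs` holds the pairs of flattened grey-level patches sampled around each seam pixel from the two images, and the patch sampling itself is omitted):

```python
import math

def zncc(a, b):
    """Zero-normalized cross-correlation of two equal-length patches;
    1.0 means identical up to brightness/contrast, -1.0 anti-correlated."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 1.0  # flat patches: treat as a perfect match
    return sum(x * y for x, y in zip(da, db)) / denom

def seam_quality(patch_pairs):
    """Mean (1 - ZNCC) over the patches along the seam; lower is better."""
    return sum(1.0 - zncc(a, b) for a, b in patch_pairs) / len(patch_pairs)
```

A perfectly aligned seam (identical patches on both sides) scores 0, while patches that disagree push the score towards 2.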

Parameter validation
In Figure 13, we experimentally find that the parameter λ is unlikely to affect the results, so we set λ = 1 in our experiments. More validation results are provided in the supplementary material.

CONCLUSIONS
In this paper, we proposed an efficient image stitching method based on a better normed energy function. Experiments show that our method can often produce artefact-free results in challenging scenes. The visual and quantitative evaluations demonstrate that our method creates acceptable stitching results without complex saliency detection and is comparable to other seam-based image stitching methods. In future work, we will concentrate on more parallax challenges and on deep learning-based techniques to improve stitching approaches.