An adaptive level set method based on joint estimation dealing with intensity inhomogeneity

Automatic object segmentation has been a challenging task due to intensity inhomogeneity. The traditional way is to eliminate the intensity inhomogeneity, which causes the object to lose useful intensity information. The authors propose an adaptive level set method for the segmentation of intensity-inhomogeneous images. Firstly, global and local features are utilised to collaboratively estimate the image, which serves to compensate for intensity inhomogeneity: the local estimation retains detailed spatial information, and the global estimation mainly contains the regional information of the partitioned object. Then, during the construction of the energy functional, the joint estimation is introduced to create the external energy. To acquire the precise location of the boundary, a weighting factor indicated by the gradient is introduced into the internal energy. Finally, after the numerical calculation of the energy functional by the additive operator splitting algorithm, this method achieves the desired performance in terms of accuracy and robustness. Experimental results verify that this method outperforms the comparative methods and can be applied to many real-world scenarios.


INTRODUCTION
Image segmentation occupies a fundamental position in the field of machine vision. As seen from recent research on surface defect detection, for example, leather defects [1], solder ball cracking [2], image acquisition enhancement [3], weld defects [4], stacked substrate counting [5], and so on, object extraction and image segmentation remain the base tasks and critical issues. In practical or industrial scenarios, camera imperfections and uneven illumination make intensity inhomogeneity inevitable in images; that is, different contrasts may appear at different positions or even inside a single object. These unavoidable variations lead to poor segmentation and identification of image objects.
full-resolution feature maps. U-Net [7] concatenates features in the channel dimension to form richer features. The major contribution of PSPNet [8] is global pyramid pooling, which scales the feature map to several different sizes so that the features carry better global and multi-scale information, improving segmentation accuracy. DeepLab [9] uses dilated (atrous) convolution to expand the receptive field and obtain more contextual information, which is important for acquiring the features of various objects; the effect of the larger receptive field is that more global information of the image can be grasped when the feature map is reduced to a small size. Mask R-CNN [10] combines object detection with semantic segmentation, outputting the object category, bounding box, and mask, and can accurately locate the position of the object in the feature map. However, in the aforementioned segmentation models, many details and tiny objects cannot be processed effectively, whereas U-2-Net [11] excels at preserving details and edges. From its structure, we can see that U-2-Net obtains as much semantic information as possible by stacking more sequence-to-sequence models (i.e. encoder and decoder) and stacking the blocks of the network for a larger gain. This architecture makes it the most successful and applicable salient segmentation model in recent years. However, deep learning still faces the following limitations. (1) Data-set requirements and labelling cost [12]: deep learning relies on a great amount of data, and the cost of labelling the data to define the ground truth is relatively large. (2) High hardware requirements [13]: deep learning requires high computing power, and ordinary CPUs can no longer meet its requirements.
(3) Complicated model design [14,15]: the design of deep learning models is very complex, requiring much manpower, material resources, and time to develop new algorithms and models. Most research and applications can only use ready-made models.
As an implicit active contour model, the level set method (LSM) is well studied for its ability to track dynamic interfaces and shapes. An LSM generally determines the driving energy to define a level set functional (LSF) and then solves a partial differential equation of the LSF to conduct the contour evolution. Recently, various research works have focused on tackling intensity inhomogeneity; they can be classified into three categories.
The first category uses local information to construct the energy functional [16–21]. Generally, spatially varying region information [16], local intensity means [17], and local intensity variations [19,20] are efficient features for dealing with intensity inhomogeneity. However, these methods assume that intensity inhomogeneity mainly affects non-contour pixels; they are therefore extremely sensitive to initialisation and sometimes fail. The second category combines global and local intensity information to construct the energy functional [22–27]. Both an average convolution operator and the average mean are used to represent and describe the features of the original image [25]. Notably, the first two categories apply only statistical constants (local or global information): they aim to obtain intensity differences, averages, and variations to enhance the contrast between foreground and background, and they ignore the pre-processing of intensity inhomogeneity. The third category constructs the energy functional based on image estimation and treats intensity inhomogeneity as a component [28], approximating image regions linearly [28–32] or nonlinearly [33–36]. The essential step of this category is to model intensity inhomogeneity as a bias field and define a region descriptor for images. Ali et al. [29] proposed a linear region descriptor for LSM on intensity-inhomogeneous images. Unfortunately, the energy model they utilise is mostly aimed at noise and is inefficient at dealing with intensity inhomogeneity, which remains the key issue. Zhang et al. [32] used the intensity average to linearly estimate the image; since the intensity difference varies across regions, depending solely on the average intensity is unreliable. The LATE method [33] uses nonlinear estimation based on Taylor expansion to deal with intensity inhomogeneity.
However, this method still needs local region information to measure the intensity inhomogeneity, and it brings a huge loss of raw intensity information because the estimation process severely eliminates the intensity inhomogeneity. Wang et al. [36] introduced posterior probability to construct a nonlinear LSF; however, the model considers only the probability of pixels belonging to the background or foreground, so it is best suited to images where the probability distributions of high and low intensities tend to be equal (as in the images shown in their experiment).
Aiming at eliminating the annoying intensity inhomogeneity and improving the accuracy of object recognition in computer vision systems with high adaptability and performance, this paper combines the characteristics of the second and third categories above and proposes an efficient LSM that introduces joint estimation to acquire precise object segmentation. The main contributions of our method are summarised as follows:
• The joint estimation effectively compensates for the disturbance of intensity inhomogeneity on objects; it directly utilises the intensity average and the spatially varying average to linearly estimate the original image, so that both global and local information contribute to the segmentation.
• Our estimation descriptor does not require approximating the degree of intensity inhomogeneity, which is difficult to evaluate precisely; it avoids the loss of object intensity information and guarantees accurate segmentation.
• We introduce a weighting internal energy that serves as a regularising term rather than as driving energy as in [37] to control the contour evolution. It reduces the iteration time and makes the final contour locate the object boundary accurately.
• As the parameters in the image estimation can be obtained automatically, no manual estimation or intervention is needed, and the proposed method has strong adaptability to computer vision applications.

Image estimation
The estimation problem of the image region is treated as linear [28] or nonlinear [33]. For linear estimation, Li et al. [28] defined the image approximation as I(x) = b(x) I_0(x) + n(x), where I_0 denotes the original image, n is the noise, and b(x) is a matrix of intensity inhomogeneity with the same dimensions as the original image. Replacing I_0 with a local intensity level constant C_i, the simplified form is I(x) = b(x) C_i + n(x), where C_i is the average mean of the partitioned areas. A nonlinear estimation model, LATE [33], utilises Taylor expansion to estimate images; the average mean of the inner and outer regions is selected as the constant term of Taylor's formula. As mentioned above, the general way is to denote the image descriptor by the product of the intensity inhomogeneity b and the piecewise constants mean_i(x) or C_i, performing an image estimation. These constants mean_i(x) or C_i can be computed easily and accurately, while the intensity inhomogeneity b is difficult to evaluate accurately and needs to be defined manually. It is this imprecise definition of intensity inhomogeneity that has a great adverse impact on image estimation, makes the LSMs poorly adaptive, and leads to information loss of the object intensity.
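As a toy illustration of this linear formation model (our own sketch, not the authors' implementation), the following builds a synthetic observed image from a piecewise-constant ideal image, a smooth multiplicative bias field b(x), and additive noise n(x):

```python
import numpy as np

def synthesize_inhomogeneous(shape=(64, 64), c1=200.0, c2=50.0,
                             noise_sigma=2.0, seed=0):
    """Toy forward model I(x) = b(x) * C_i + n(x)."""
    rng = np.random.default_rng(seed)
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Piecewise-constant ideal image: a disc (C_1) on a background (C_2).
    ideal = np.where((yy - h / 2) ** 2 + (xx - w / 2) ** 2 < (h / 4) ** 2,
                     c1, c2)
    # Smooth multiplicative bias field b(x) ramping linearly from 0.5 to 1.0.
    bias = 0.5 + 0.5 * xx / (w - 1)
    observed = bias * ideal + rng.normal(0.0, noise_sigma, shape)
    return observed, ideal, bias

observed, ideal, bias = synthesize_inhomogeneous()
# The same object now spans a wide intensity range because of b(x),
# which is exactly what defeats threshold-style segmentation.
```

Because the bias multiplies the region constant, the object's left and right halves end up at very different observed intensities even though the ideal image is constant there.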

Weighting internal energy
The internal energy, which controls the direction of contour evolution [38,37], generally includes an area term and a length term. The normal direction is derived from the gradient. Consequently, in [37], the method substitutes a weighting factor generated from the gradient into the traditional length term. In [37], both a weighting length term and a weighting area term are used to control the level set evolution. However, the weighting area term is unnecessary, as there exists the relationship Area(inside(C)) ≤ c·(Length(C))^(N/(N−1)). Hence, we apply only the weighting length term to construct the internal energy (details in Appendix I). Our weighting internal energy achieves desirable performance in improving contour smoothness and fitting degree.

Relevant LSMs
These relevant methods are briefly introduced here and compared with our method in the experiment section. LBF [17] uses a convolution kernel to obtain local features for processing intensity inhomogeneity, given by Equation (5), where the constants f_i are derived from local details of the image and I_0 denotes the original image. Wang et al. proposed the LGIF [24] model, combining global and local information.
The model is a simple combination of the CV model (the first two terms in Equation (6)) and the LBF model (the last two terms of Equation (6)), so the driving energy originates from the original images with intensity inhomogeneity. LATE [33] uses the local intensity mean and a Taylor model to non-linearly estimate the original image, and its energy model is shown in Equation (7).
A linear likelihood model, LSACM [32], is introduced, with the driving energy function given by Equation (8). The related limitations of existing estimation are analysed in Section 2.1.

Joint estimation of image region
Li et al. [28] ideally assumed each segmented region to be a constant, although a real region is never strictly constant. Nevertheless, under this assumption, we can utilise constants to estimate image regions to decrease the interference of the intensity inhomogeneity. Naturally, how to define the constants so that the raw information of the ideal image is preserved becomes a significant problem deserving consideration.
Let Ω be the image domain; the contour Ψ divides Ω into Ω_1 and Ω_2, where Ω_1 and Ω_2 are inside and outside the contour, respectively. Let φ : Ω → ℝ be a Lipschitz function; the two disjoint regions are defined as Ω_1 = {x ∈ Ω : φ(x) > 0} and Ω_2 = {x ∈ Ω : φ(x) < 0}. The region descriptor is difficult to define, as intensity inhomogeneity may lead to an overlap of the intensity distributions. Since each object in the ideal image tends to be a constant, the major task in estimating the image region is to find a proper constant to depict the region. Our work combines local and global information of the image region to define the estimation, that is, finding two constants to represent the local and global information of the image region. The two constants correspond to the spatially varying average f and the average mean C, respectively. f is used to conduct the local estimation, which acquires the spatial average of the image region, and the global estimation related to C mostly acts as a regularising term to guarantee the estimation accuracy of image regions.
We use Gaussian convolution to generate the constant f; the Gaussian kernel K_σ is used to complete the convolution process. Because the local estimation relies on fine details, it can introduce estimation blur. Hence, the global constant C essentially provides the regularising information. The two estimations are defined, respectively, as I_L(x) = f_1(x) H(φ) + f_2(x) (1 − H(φ)) and I_G(x) = C_1 H(φ) + C_2 (1 − H(φ)), where I_L(x) represents the local estimation, I_G(x) is the global estimation, and H(φ) denotes the Heaviside function. The parameters f_1, C_1 and f_2, C_2 are constants that represent the features in Ω_1 and Ω_2. Then, we combine I_L(x) and I_G(x) to define the joint estimation as I(x) = m I_L(x) + (1 − m) I_G(x), where m is the weighting factor between the global and local estimation; different segmentation effects can be obtained by adjusting the value of the parameter m.
The optimal values of C_1, C_2, f_1, and f_2 are C_1 = ∫ I_0 H(φ) dx / ∫ H(φ) dx, C_2 = ∫ I_0 (1 − H(φ)) dx / ∫ (1 − H(φ)) dx, f_1 = K_σ * (H(φ) I_0) / (K_σ * H(φ)), and f_2 = K_σ * ((1 − H(φ)) I_0) / (K_σ * (1 − H(φ))), where K_σ is the Gaussian kernel with standard deviation σ and the symbol * is the convolution operator; Appendix II explains the derivation.
The basic principle is to use the local estimation I_L(x) and the global estimation I_G(x) for cooperative estimation, with the global estimation mainly acting as a regularising term. To control the estimation weight, a factor m is used in the final estimated image I(x). The joint estimation effectively avoids the direct evaluation of intensity inhomogeneity, and the information loss caused by eliminating intensity inhomogeneity is indirectly compensated. Our joint estimation therefore achieves a more prominent benefit than the traditional estimation that eliminates intensity inhomogeneity.
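A minimal NumPy sketch of this joint estimation (our own illustration: a plain separable Gaussian convolution stands in for K_σ, a precomputed Heaviside mask H is assumed, and the small eps guard is ours):

```python
import numpy as np

def gaussian_smooth(a, sigma):
    """Separable Gaussian convolution standing in for K_sigma * a."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, a)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, out)

def joint_estimate(img, H, sigma=3.0, m=0.7, eps=1e-8):
    """I(x) = m * I_L(x) + (1 - m) * I_G(x) for a Heaviside mask H in [0, 1]."""
    # Spatially varying averages f_1, f_2: ratios of smoothed products.
    f1 = gaussian_smooth(H * img, sigma) / (gaussian_smooth(H, sigma) + eps)
    f2 = gaussian_smooth((1 - H) * img, sigma) / (gaussian_smooth(1 - H, sigma) + eps)
    # Global region means C_1, C_2.
    C1 = (H * img).sum() / (H.sum() + eps)
    C2 = ((1 - H) * img).sum() / ((1 - H).sum() + eps)
    I_L = f1 * H + f2 * (1 - H)
    I_G = C1 * H + C2 * (1 - H)
    return m * I_L + (1 - m) * I_G
```

With m close to 1 the estimate follows local detail; with m close to 0 it is regularised toward the region means.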

Construction of LSF
The driving energy F(φ) is generally made up of three terms, F(φ) = E(φ) + R(φ) + P(φ). The external energy term E(φ) plays the most important role in this model, and the image estimation model is applied within it. For the internal energy R(φ), the length term with weighting factors is used to control the accurate extraction of the target boundary. Finally, a penalty term P(φ) is added to reduce the computational complexity of the level set initialisation.
For the external energy, we introduce the estimated image I(x) into the external energy; that is, we obtain the difference between the estimated image and the constants linked to the original image. This approach guarantees that the information of the original image and the estimated image synergise in overcoming intensity inhomogeneity. The external energy E(φ) is given by Equation (16), where λ_1 and λ_2 are constants controlling the weights of the information inside and outside the level set contour, and ω is the weight of the different parts of the external energy. The constants C_1 and C_2 are denoted by Equations (12) and (13). Minimising the last two terms of the energy functional in Equation (16), with details briefly discussed in Appendix II, d_1 and d_2 are defined by Equations (17) and (18), respectively. The intensity feature of the image obtained by estimation guides the evolution to fit the target boundary.
The weighting internal energy is used to promote the evolution speed and improve the target segmentation accuracy. In its expression, k is a constant representing the number of contours adjacent to the current contour, and the local edge information [39] used to construct the weighting factor is derived from these adjacent contours. The specific form of the weighting factor [37] is a function of φ and k that denotes the average strength of the edge operators adjacent to the current contour.
An adjacent contour of the current contour Ψ (indexed by a constant n) carries the gradient information through the normal N⃗ = ∇φ/|∇φ|, which helps to generate a new contour in the direction of the zero level set of φ. The value of n is negative inside the current contour Ψ and positive outside it. The edge operator g is the classical formula g = 1/(1 + |∇(K_σ * I_0)|²). The gradient vector flow (GVF) field is a static external force field V⃗ = [u(x, y), v(x, y)], where u and v are the derivatives in the x and y directions, respectively. The average built-in parameter is given by the alignment between the normal field of the current contour and the GVF field of the adjacent contour, where V⃗ is the GVF field of the original image. If the normal field N⃗ of the current contour coincides with the distribution of V⃗, the weighting factor approaches 1.
Generally, re-initialisation leads to a heavy computational burden [40]. The energy penalty term P(φ) instead forces |∇φ| to stay close to 1 rather than explicitly re-initialising the function, which eliminates contour islands and decreases the re-initialisation complexity.
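A common form of such a penalty (our assumption here, in the style of distance-regularised level sets, not necessarily the authors' exact term) is P(φ) = (μ/2) ∫ (|∇φ| − 1)² dx, which vanishes exactly when φ is a signed distance function. A minimal sketch:

```python
import numpy as np

def penalty_energy(phi, mu=1.0):
    """Distance-regularisation penalty (mu/2) * mean((|grad phi| - 1)^2)."""
    gy, gx = np.gradient(phi)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    return 0.5 * mu * np.mean((mag - 1.0) ** 2)
```

For a signed distance function the penalty is zero, so the evolution needs no separate re-initialisation step.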
Combining the external energy, internal energy, and penalty energy, we model the energy functional utilising global and local estimation as Equation (25), where μ is the weight of the internal energy.

Numerical calculation method
Appropriate numerical calculation is an essential step in solving complex LSFs. The upwind method, the difference method [41], and the Newtonian fast algorithm [42] are general approaches. However, for problems formulated as the optimisation of a cost function, numerical optimisation typically relies on first-order and second-order information to guide the evolution [43]. We apply a special algorithm that greatly reduces this expensive burden. C_i, d_i, and f_i are obtained as above by minimising the relative energy terms. The evolution of the contour is given by an initial contour at time t of the function φ(t, x, y), and the optimisation can be defined by parameterising the descent direction with an artificial time t [44]. According to the chain rule [45], minimising the energy functional of Equation (25) with respect to φ, with λ_1 = λ_2 = 1, yields the gradient descent function of Equation (26), where ∇ is the gradient operator and div(·) is the divergence operator. The function H_ε(φ) is the differentiable version of the Heaviside function, and the Dirac delta function δ_ε(φ) is the derivative of H_ε(φ), defined by Equation (28). Let the latter term of Equation (26) be F(x, y). In the following process, the discretisation of φ is first required, and the iteration termination condition that φ becomes stable must be satisfied. We therefore first discretise ∇φ in the x and y directions to avoid poorly implemented derivative operations and to obtain a more suitable numerical calculation scheme; accordingly, the term |∇φ| is written in the two-dimensional format |∇φ| = √(φ_x² + φ_y²). Defining E = 1/|∇φ|, so that div(∇φ/|∇φ|) = ∂_x(φ_x E) + ∂_y(φ_y E), the two-dimensional gradient descent function can be rewritten as Equation (30). We use A_l to define the relationship between the current contour φ^n and the contour φ^{n+1} after evolution, and rewrite the contour evolution equation into a discrete format.
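The smoothed Heaviside, its Dirac-delta derivative, and the curvature term div(∇φ/|∇φ|) used above can be sketched as follows (our own NumPy illustration; the arctan-based smoothing and ε = 1 are assumed choices):

```python
import numpy as np

def heaviside(phi, eps=1.0):
    """Smoothed (differentiable) Heaviside H_eps."""
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))

def dirac(phi, eps=1.0):
    """Smoothed Dirac delta: the derivative of H_eps."""
    return (eps / np.pi) / (eps ** 2 + phi ** 2)

def curvature(phi):
    """div(grad phi / |grad phi|) via central differences."""
    gy, gx = np.gradient(phi)
    mag = np.sqrt(gx ** 2 + gy ** 2) + 1e-8  # avoid division by zero
    return np.gradient(gx / mag, axis=1) + np.gradient(gy / mag, axis=0)
```

The Dirac factor confines the update to a band around the zero level set, while the curvature term is exactly the length-shortening force of the internal energy.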
According to the characteristics of the central difference method [46] and the additive operator splitting algorithm [29], we utilise the discrete relationship between φ^n and φ^{n+1} to replace ∂φ/∂t, and central finite differences to define the discrete relationships in the horizontal and vertical directions. The following difference steps can then be obtained: Equation (31) is the discrete central-finite-difference form of Equation (30), where the A_l are tri-diagonal matrices for the discretisation, given by Equations (32) and (33); A_1(φ^n) represents the horizontal correlation and A_2(φ^n) the vertical correlation. The converged solution φ^n yields the optimal segmentation.
By shifting the terms of Equation (31), the intuitive evolution can be obtained. The relationship between the previous iterate φ^n and the next iterate φ^{n+1} can be regularised by first forming the update φ̂^n = φ^n + ΔtF(x, y) (Equation (35)); with this simplified expression φ̂^n, the tri-diagonal systems are solved to obtain φ^{n+1}. The overall algorithm flow of this paper is presented in Algorithm 1.
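One AOS step can be sketched as follows (our own simplified illustration with unit diffusivity in place of the full curvature-weighted terms; each direction is solved semi-implicitly with the Thomas algorithm for the tri-diagonal systems, then the two results are averaged):

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    n = len(d)
    cp = np.zeros(n)
    dp = np.zeros(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def aos_step(phi, dt):
    """One additive-operator-splitting step for phi_t = laplacian(phi)."""
    out = np.zeros_like(phi)
    tau = 2.0 * dt  # AOS doubles the step per direction, then averages
    for axis in (0, 1):
        p = phi if axis == 1 else phi.T
        n = p.shape[1]
        a = np.full(n, -tau); a[0] = 0.0        # sub-diagonal
        c = np.full(n, -tau); c[-1] = 0.0       # super-diagonal
        b = np.full(n, 1.0 + 2.0 * tau)         # main diagonal
        b[0] = b[-1] = 1.0 + tau                # Neumann boundary condition
        res = np.array([thomas(a, b, c, row) for row in p])
        out += res if axis == 1 else res.T
    return out / 2.0
```

Because each 1D system is solved implicitly, the scheme remains stable for large Δt, which is the main reason AOS reduces the iteration count compared with explicit updates.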

Parameter setting
We summarise the selection of parameters. The relevant parameters for the common situation are listed in Table 1; these are general choices across different LSMs. λ is the energy factor controlling the weight of the object and non-object region energies. Normally, the inner and outer regions of the contour are equally important for segmentation, so λ is usually set to 1; however, as Table 1 shows, some methods (e.g. LSACM) need to adjust λ to obtain better results. μ is the parameter that controls the weight of the internal energy, which is generally derived from the contour length. μ is usually 0.0001 × 255 × 255 or 0.0002 × 255 × 255 for most of the comparative methods, while LSACM needs to adjust the internal energy ratio to obtain better results.
ω is the external energy factor, taking values in [0, 1]. In two-phase segmentation, if one external energy factor is ω, the other is (1 − ω). Thus, when we lean more on local information during the segmentation process, the value of ω that controls the local energy term lies in [0.5, 1]. LBF, LSACM, and LATE have only one type of external energy, so this value is not set and defaults to 1.
σ is the size of the Gaussian kernel. For images with dense details and high quality, more local information is needed, so σ is small, usually less than 10. For images with intensity inhomogeneity and sparse distributions, global information plays a vital role, so σ is usually greater than 10 (e.g. for LSACM and LATE, as they mainly deal with severely degraded images).
m is a parameter unique to the proposed method, which controls the proportion of local estimation and global estimation. The adjustment of m is similar to that of ω: a larger m represents a greater proportion of local features in the energy functional. Different from σ, whose value affects the accuracy of target segmentation, m only controls the proportions within the energy functional and does not affect the collection of image features.

Performance evaluation of parameters
We first analyse the effect of varying one parameter while fixing the other. In this subsection, the chosen image is a severely intensity-inhomogeneous image with two mushrooms (Figure 1), in which the background has an intensity distribution similar to the object's.

Experimental analysis of σ
The parameter σ is the scale parameter of the Gaussian kernel, which greatly influences the performance of the estimation. In our method, σ affects the global estimation, and the blur of the estimated image reduces as σ decreases. When we segment images with σ in [4, 5], the final segmentation can identify the object in general but cannot fit the real target. σ ranging from 3 to 5 achieves satisfactory results, with σ = 3 the most frequent choice. σ is not only a parameter that controls proportion; it also controls the convolution operation and affects the feature-collection process. The larger its value, the more unstable the target segmentation accuracy: in low-quality images, a larger convolution kernel is used to reduce the impact of noise and intensity inhomogeneity, which makes the energy functional less able to fit the object boundary. If the details inside the object are sparse, the value should be smaller, which reduces the interference of non-targets; when the object has more internal details, a larger value should be selected.

Experimental analysis of m
The parameter m is the other vital parameter of the proposed method; it adjusts the estimation weights of the local and global information, and m takes values in [0, 1]. A small value of m implies collecting image information with a small weight on local features and a large weight on average means. On the contrary, a larger m implies collecting local features with a large weight and regularising details with a small weight. Specifically, when the target image has more details or interference, m is set larger (as in Figure 1b). Due to the existence of m, the σ in our method can be set to a smaller value, avoiding the instability of segmentation accuracy caused by a large kernel. This is a vital reason why our joint-estimation-based method has high segmentation accuracy and strong robustness. Additionally, for images with complex targets, m is set relatively larger to preserve local details while extracting global targets.

Comparison with state-of-the-art LSMs
In this part, we demonstrate three test sets comparing the methods LBF [17], LGIF [24], LCV [25], LSACM [32], LATE [33], and the proposed method on normal, natural, and medical images. The first column of numbers in Tables 2–7 refers to the images in the corresponding figures. To quantify the performance comparisons, segmentation accuracy is measured by the index JSC = A(O_m ∩ O_t) / A(O_m ∪ O_t), where O_m and O_t are the segmented and real target areas, respectively, A(O_m ∩ O_t) is the intersection area of O_m and O_t, and A(O_m ∪ O_t) is the union area. A JSC value of 1 represents the ideal situation. Furthermore, the running time of each method is also compared. The best data in the tables are emphasised in bold. We use '—' to omit the running time of segmentations whose JSC value is below 0.7, as they are meaningless to the analysis.

Test Set 1: The performance on normal images. In experiment 1, the six methods are applied to normal images; the segmentation results are shown in Figure 2, and Tables 2 and 3 record the JSC value and running time corresponding to each image in Figure 2. It can be seen that LGIF, LCV, LSACM, and our method have a similar effect: the basic contour perfectly identifies the target, and the boundary of the targets is accurately segmented. LBF and LATE also achieve favourable contour segmentation, except that LBF may produce redundant contour parts (Figure 2a row 4). Because of the strong object edges and image contrast, the different methods show nearly similar performance. Even so, as LATE eliminates intensity inhomogeneity, which brings a heavy loss of intensity information, it becomes relatively ineffective and may produce missing segmentation (Figure 2e row 5). Quantitatively, Table 2 shows that the JSC values of all methods remain above 0.98 on all images. Table 3 lists the execution time of the different methods on these images; LCV, LSACM, and our method have relatively short running times. Our method and LSACM are roughly the same on this index, while our segmentation accuracy and target contour smoothness are both better than those of the LCV method. It is worth emphasising that our method matches the state-of-the-art methods on normal images while our execution time is clearly the shortest.

Test Set 2: The performance on natural images. For a more challenging case, experiment 2 applies the six methods to natural images, as shown in Figure 3.
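The JSC index used throughout these comparisons can be computed directly from binary masks; a minimal sketch (our own illustration):

```python
import numpy as np

def jsc(mask_pred, mask_true):
    """Jaccard similarity coefficient between two binary masks."""
    inter = np.logical_and(mask_pred, mask_true).sum()
    union = np.logical_or(mask_pred, mask_true).sum()
    return inter / union if union else 1.0
```

Identical masks score 1, disjoint masks score 0, and partial overlaps fall in between, which is why 0.7 serves as a reasonable cut-off for a meaningful segmentation.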
Tables 4 and 5 give the JSC values and running times for the image processing in Figure 3. The drawback of the LBF, LGIF, and LCV models is that they consider only statistical information, which makes them unable to fit real-world objects accurately; they can only identify the main part of objects. Meanwhile, due to the inaccurate evaluation of intensity inhomogeneity, the LSACM and LATE methods can complete accurate segmentation but sometimes cause missing segmentation (Figure 3d row 5, Figure 3e row 5). Different from LSACM and LATE, our method has no need to evaluate the degree of intensity inhomogeneity and consequently does not lead to missing segmentation. The advantages of the weighting factor are vividly reflected in the intensity-inhomogeneous images: our final contours are smooth without sharp burrs. Due to the joint estimation, our method performs much better than the other methods in boundary segmentation, producing clear boundaries and fewer isolated island areas (bold data in Tables 4 and 5). Table 4 compares the segmentation accuracy of the different methods: for intensity inhomogeneity in natural images, LBF, LCV, and LGIF cause locally redundant segmentation, that is, wrong segmentation inside targets. Our model is superior to the other methods in JSC and achieves the shortest execution time in the majority of experiments; Table 5 confirms that our method has a relatively short running time compared with the comparative methods. The experimental results show that the traditional methods are not suitable for intensity-inhomogeneous images and result in missing segmentation, while our method outperforms the others from segmentation accuracy to contour smoothness, with short running time and fast convergence.
Test Set 3: The performance on medical images. For another challenging case, experiment 3 illustrates the six comparative results on medical images, as shown in Figure 4. Tables 6 and 7 give the JSC value and running time for the images in Figure 4, respectively. The LBF, LGIF, and LCV methods have poor robustness and struggle to stabilise during the contour evolution, while LSACM and LATE perform relatively better. However, as the object regions are all treated with the same intensity inhomogeneity, LATE fits the target better than LSACM but may cause missing segmentation (Figure 4e rows 2 and 5). LSACM (Figure 4d rows 5 and 8) is slightly better than LATE regarding missing segmentation, but its fitting degree to objects is worse. The proposed method is capable of extracting the edges of objects and achieves the best fitting degree. We also demonstrate the ability of our method to handle weak boundaries (see Figure 4f); as the results show, it achieves satisfactory performance in detecting accurate boundaries and fitting the desired objects.
The JSC values in Table 6 match the results in Figure 4, especially when the targets in the image are complicated. LBF, LGIF, and LCV perform poorly as they utilise only the average mean or pixel intensity. In Table 7, our running time is again the shortest. Considering the JSC values comprehensively, under high segmentation accuracy, the final segmentation contour of our method fits the target more closely with better smoothness.

Comparison with deep learning method
We compare the most classical deep-learning-based segmentation models, Mask R-CNN, DeepLab, PSPNet, and U-2-Net, with our method. As transfer learning is a general way to address missing data or small data sets, and the test images in this experiment are mainly derived from the PASCAL VOC data set, the corresponding networks pre-trained on PASCAL VOC are selected. The deep learning methods run on a GPU and our method on a CPU, so a running-time comparison would be meaningless and is omitted. Figure 5 shows the results of the four deep learning methods and the proposed method, and Table 8 gives the JSC value corresponding to each segmentation result. It can be seen from these results that most deep learning methods are poorly adaptable. Since the features learned on the training set are fixed, the closer the features of a test image are to the training set, the better the segmentation obtained; deep learning methods give unsatisfactory segmentation accuracy for small targets and for images whose features differ greatly from the training set. Secondly, the measurement standard of deep learning methods on large data sets is mainly the total accuracy over all test images, but this does not mean that an excellent segmentation is obtained for each image, as shown by the segmentation results of images (7), (8), (9), and (12) in Figure 5a and the corresponding images (1), (4), (7), (12), (14), and (15) in Figure 5b,c. The unsatisfactory segmentation of the former three methods is mainly due to the mismatch between the test images and the training images. Among these deep-learning-based segmentation models, U-2-Net, as a method for salient objects, achieves the best performance. However, due to its exclusive concern with object saliency, it can make small mistakes; for example, in image (13) in Figure 5d, it ignores the goose itself and extracts only the goose's most prominent eye.
As Table 8 shows, in most cases our method achieves the best segmentation accuracy compared with these deep learning models; in the remaining cases, it still approaches their performance. The proposed method thus provides high segmentation accuracy together with desirable adaptability.
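The JSC reported in Table 8 is the Jaccard similarity coefficient, the ratio of intersection to union between a predicted mask and the ground-truth mask. A minimal NumPy sketch (the mask shapes and values below are purely illustrative):

```python
import numpy as np

def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    """Jaccard similarity coefficient (JSC) between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

# toy example: two overlapping 4x4 square masks
a = np.zeros((8, 8), dtype=bool); a[1:5, 1:5] = True
b = np.zeros((8, 8), dtype=bool); b[2:6, 2:6] = True
print(round(jaccard(a, b), 3))  # intersection 9 px, union 23 px -> 0.391
```

A JSC of 1.0 means the prediction matches the ground truth exactly; the values near 0.990 reported later indicate near-perfect overlap.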

Robustness of contour initialisation
To prove the robustness with respect to the initial contour, we test initial contours with varying positions and shapes, and satisfactory experimental results are obtained (see the final contours in Figure A.2(b)). The JSC values change only slightly and remain approximately 0.990. Our method shows the expected robustness for various contour shapes. The experimental results prove that our method is extremely robust to the initial contour, and differences in the initial contour do not affect the segmentation performance.
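The varied initial contours used in such a test can be generated as initial level set functions; a hedged sketch (the grid size, positions and radii are arbitrary choices for illustration, not the paper's settings):

```python
import numpy as np

def circle_phi(shape, center, radius):
    """Signed-distance level set: negative inside the circle, positive outside."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    return np.hypot(yy - center[0], xx - center[1]) - radius

def box_phi(shape, top_left, size, c=2.0):
    """Binary-step initialisation: -c inside a rectangle, +c outside."""
    phi = c * np.ones(shape)
    r0, c0 = top_left
    phi[r0:r0 + size[0], c0:c0 + size[1]] = -c
    return phi

# initial contours differing in position and shape
inits = [circle_phi((64, 64), (32, 32), 10),   # centred circle
         circle_phi((64, 64), (16, 48), 6),    # off-centre circle
         box_phi((64, 64), (40, 8), (12, 20))] # rectangle
```

Each array's zero level set is the initial contour; a robust method should evolve all three to essentially the same final segmentation.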

CONCLUSION
We have proposed an LSM that utilises joint estimation to ensure the segmentation accuracy of intensity-inhomogeneous images. The local statistical intensity information and the global intensity average are jointly incorporated to estimate the original image. We introduce a weighted internal term containing gradient information as a regularising term, and a specific additive operator splitting algorithm is designed for the calculation. Although our method is highly adaptable and achieves substantial performance, it still has limitations in the following aspects. On the one hand, the proposed method balances global and local features, which may reduce the accuracy of tiny object segmentation. On the other hand, the edge-based weighting factor in the energy functional and the discretisation of the derivatives in the numerical calculation inevitably increase the computing cost. In future research, we will focus on a more efficient way to extract the features of tiny objects.

APPENDIX A
Denote the area term $\mathrm{Area}(\mathrm{inside}(C))$ as $x$ and the length term $\mathrm{Length}(C)$ as $y$, where $C$ is the level set contour, and denote the energy term as $z$. In LSM, the task is to minimise the energy term made up of the area term $x$ and the length term $y$, as given by $\min z = x + y$. According to the relationship between the length term and the area term in [44, 47],
$$\mathrm{Area}(\mathrm{inside}(C)) \le c\,(\mathrm{Length}(C))^{N/(N-1)}, \tag{A.1}$$
where $N$ denotes the arbitrary dimension of the level set contour. That is, $x \le c \cdot y^{N/(N-1)}$. Substituting this inequality into the formula yields the new expression $z \le c \cdot y^{N/(N-1)} + y$. As $c \cdot y^{N/(N-1)} + y$ is monotonically increasing in $y$, finding the minimum of $z$ amounts to solving for the corresponding $y$. Thus, it is enough to use the regularising length term alone, which achieves the same benefit as using both the area and length terms.
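For $N = 2$, inequality (A.1) is the classical isoperimetric inequality with $c = 1/(4\pi)$ and exponent $N/(N-1) = 2$; a small numeric check (the specific constant $c$ below is particular to the 2-D case, not the general bound):

```python
import math

c = 1.0 / (4.0 * math.pi)  # 2-D isoperimetric constant

def bound(y, N=2):
    """Right-hand side c * y^(N/(N-1)) of inequality (A.1)."""
    return c * y ** (N / (N - 1))

# a disc of radius r attains equality: x = pi*r^2, y = 2*pi*r
r = 3.0
x, y = math.pi * r**2, 2 * math.pi * r
assert abs(x - bound(y)) < 1e-9

# z(y) = c*y^2 + y is monotonically increasing, so minimising z
# reduces to minimising the length term y alone
ys = [1.0, 2.0, 5.0, 10.0]
zs = [bound(t) + t for t in ys]
assert all(z1 < z2 for z1, z2 in zip(zs, zs[1:]))
```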

APPENDIX B
Denote by $N_i$ a constant (i.e. $C_i$, $f_i$, $d_i$) to be calculated; the optimal constant is chosen to minimise the energy functional. Taking the partial derivative of the energy functional with respect to $N_i$ and setting it to zero,
$$\frac{\partial E}{\partial N_i} = 0,$$
the optimal $N_i$ can be obtained. As the partial derivatives of the terms independent of $N_i$ are equal to zero, we consider only the energy functional terms linked to $N_i$: the optimal $C_i$ in Equation (16) minimises the equivalent of the CV model [44]; the optimal $f_i$ in Equation (16) minimises the equivalent of the LBF model as in Equation (5); and the optimal $d_i$ minimises the last two terms of Equation (16).
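For the CV-type term, setting the derivative to zero yields the region means as the optimal constants. A minimal sketch under the usual CV assumption that $H$ is a (here, sharp) Heaviside membership mask; the toy image and function name are illustrative, not the paper's implementation:

```python
import numpy as np

def optimal_region_constants(image, H):
    """Setting d/dC_i of  int (I - C1)^2 H + int (I - C2)^2 (1 - H)  to zero
    gives C1, C2 as the mean intensities of the two regions (CV model)."""
    H = H.astype(float)
    c1 = (image * H).sum() / max(H.sum(), 1e-12)              # mean inside
    c2 = (image * (1 - H)).sum() / max((1 - H).sum(), 1e-12)  # mean outside
    return c1, c2

# toy image: bright square (0.9) on a dark background (0.1)
img = 0.1 * np.ones((16, 16)); img[4:12, 4:12] = 0.9
H = np.zeros((16, 16)); H[4:12, 4:12] = 1.0
c1, c2 = optimal_region_constants(img, H)
print(round(c1, 3), round(c2, 3))  # -> 0.9 0.1
```

The optimal $f_i$ and $d_i$ follow the same pattern, with the derivative taken only over the terms of the functional in which they appear.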