Object scale selection of hierarchical image segmentation with deep seeds

Hierarchical image segmentation is a prevalent technique in the literature for improving segmentation quality, where the segmentation result needs to be searched at different scales of the hierarchy to identify objects represented from various scales. In this paper, a novel framework for improving the quality of object segmentation is presented. To this end, the authors first select the optimal segments among several hierarchical scales of the input image using simple mid-level features and dynamic programming. Simultaneously, deep seeds are localised on the input image for the foreground and background classes using a deep classification network and a saliency network, respectively. Then, a graphical model is constructed as a set of nodes that jointly propagate information from deep seeds to unmarked regions to obtain the final object segmentation. Comprehensive experiments are performed on different datasets for popular hierarchical image segmentation algorithms. The experimental results show that the proposed framework can significantly improve the quality of object segmentation at low computational costs and without training any segmentation network.


INTRODUCTION
Image segmentation is an important technique in computer vision that divides an image into several segments on the basis of specific properties, such as uniform colors or similar textures. Many segmentation algorithms have been proposed in the literature; however, an effective division of an image into meaningful segments that reflect human perception remains an open challenge [1][2][3][4][5]. Different definitions of the term "meaningful" have been reported, which leads to different approaches to segmenting an image properly. Some researchers prefer segmenting an image into many segments, whereas others tend to identify only a few segments of the image contents [4,6]. In the latter scenario, an algorithm can generate only a single segmentation result, which may be insufficient to solve the image segmentation problem.
To this end, researchers [7][8][9][10][11] have investigated partitioning an image into a single multiscale structure that aims to capture objects at all scales. As a dual representation, the Ultrametric Contour Map (UCM) was used [8][9][10][11] to explain […]. In recent studies [13][14][15], deep image classification networks were trained only from image-level labels and were successfully employed to retrieve seeds for object localisation. These seeds can be very useful for guiding a graphical model to propagate information from deep seeds to unmarked regions and obtain the final object segmentation. In this paper, we use a classification network to obtain activation maps and then combine these activation maps with a saliency map [16] to produce deep seeds.
Motivated by hierarchical segmentation methods and deep image classification networks, we introduce a novel framework that improves the quality of object segmentation by combining three major steps. First, the input image is passed through a hierarchical segmentation algorithm to predict a boundary map from a hierarchical segmentation tree, and dynamic programming is performed to search for an optimal tree cut based on segmentation quality. Simultaneously, the input image is passed through a deep classification network and a saliency network [16] to generate deep seeds. Lastly, a graphical model based on spatial constraints, appearance, and semantic content is proposed to spread information from the seeds to unmarked regions. We name the proposed method "Object Scale Selection with Deep Seeds (OSS-DS)" for object segmentation.
The major contributions of this work are as follows. First, a novel method is provided for processing the segmentation hierarchy by selecting regional thresholds. This method does not rely on any ground truth and can therefore be used in online processing with a small pool of simple features. Second, a graphical model is designed, and the corresponding loss function is solved to propagate information from deep seeds to unmarked regions and obtain the final object segmentation. Third, comprehensive experiments on several state-of-the-art hierarchical segmentation algorithms are performed to validate the proposed method. The remaining sections of this paper are organised as follows: Section 2 reviews related work. Section 3 presents the proposed method. Section 4 describes the experiments performed on open datasets, using state-of-the-art algorithms for comparison. Section 5 concludes the study.

Hierarchical segmentation
One drawback of unsupervised segmentation methods is that their parameters frequently need to be adjusted to control the number of segmented regions. In practice, no single threshold or parameter fits all objects and images well [17]. Instead of working at a single scale, a more flexible solution is to use the intermediate results of the merging process to build a partition hierarchy, which is often described as a tree structure. Hierarchical segmentation methods [8][9][10][11] address this issue by outputting multiple segmentations nested in a tree rather than a single segmentation, such that objects are properly segmented at certain tree levels. These algorithms group pixels into a hierarchy of regions at a sequence of granularities while preserving the spatial relationships among the regions.
Segmentation quality has been predicted with classification schemes in various studies. Ren et al. [18] used a linear classifier based on Gestalt features [19] to distinguish good segmentations from bad ones; their negative training data were generated by randomly placing a ground-truth mask over an image. Peng et al. [20] adopted a similar idea to select the parameter of graph-cut-based interactive segmentation: they computed segmentations with different parameter values and selected the highest-quality one. Instead of training a classifier on a large pool of cues as in [12], we use only three mid-level cues to evaluate segmentation quality. As a post-processing step, this evaluation of segmentation quality at different scales is independent of the segmentation algorithm. Because it imposes no specific information about the segmented targets, the proposed method can be applied to general-purpose segmentation tasks.
In a hierarchical structure, the segmentation results of most region-merging algorithms are not explicitly represented. Segmentation and scale selection are performed simultaneously on the basis of certain regional cues; therefore, integrating object information globally across different scales is a challenging task. Finding the right scale for each object is important. In the method of [12], all objects are aligned at the same scale/level. The proposed method instead uses hierarchical segmentation trees to find the correct object regions: rather than realigning object scales as in [12], we search for an optimal tree cut and find the best object regions based on their segmentation qualities.

Localisation seed with classification network
The most popular seed pixel selection methods are based on simple handcrafted criteria [21] (e.g. color, intensity, or texture), and similarity criteria over handcrafted features have also been used [22]. These settings tend to produce over-segmentation and poor segmentation. Kolesnikov et al. [13] instead proposed locating seed cues with classification networks in order to train segmentation networks. However, [13] obtained only small and sparse object seeds for supervision. To address this issue, Oh et al. [23] proposed a saliency model that uses the extent of an object as additional information. Wei et al. [24] proposed an adversarial erasing method that iteratively trains multiple classification networks to expand the discriminative regions. In our present work, we use an idea similar to that of [13] to generate seed cues for the foreground classes and adopt the saliency network [16] to localise the background class. With the development of deep learning in computer vision, weakly supervised semantic segmentation methods [13][14][15][25] have achieved remarkable progress. Unlike weakly supervised semantic segmentation approaches, the proposed approach does not train any segmentation network; it only uses a deep classification network to set up deep seeds on the input images and adopts an unsupervised process to obtain the final segmentation.

FIGURE 1
Overview of the proposed OSS-DS method. We first train a deep classification network to produce class-specific activation maps (for the person and bottle labels) and use a saliency network to produce a saliency map (for the background). With these maps, we obtain small and sparse seeds, which guide a graphical model over the optimal tree cut of high-quality region hierarchies to obtain the final segmentation result

PROPOSED METHOD
This section presents a technical description of the proposed method. The objective of our work is to partition an image into meaningful non-overlapping regions, each of which describes either an object or the background. Figure 1 illustrates an overview of the proposed method. First, we describe how we generate a boundary map with a hierarchical image segmentation algorithm and search for an optimal hierarchy segmentation based on segmentation quality. Then, we show how we obtain seed cues from a deep classification network and a saliency network [16]. Lastly, a graphical model is proposed to propagate information from the deep seeds to unmarked regions.

Object segmentation and hierarchical segmentation tree
We denote the input image as $I$. The goal is to divide image $I$ into non-overlapping regions $I = \{r_1, r_2, r_3, \dots, r_n\}$ such that each region describes an object or a part of an object. The partitions produced by machine algorithms can be grouped into three categories based on perceptual quality. Over-segmentation splits an object into many small pieces, and region boundaries are typically not smooth. In contrast, under-segmentation places different objects in the same region, leading to over-smoothed boundaries. An appropriate segmentation lies in between, with objects well segmented into exclusive regions.
Hierarchical segmentation describes an image at multiple scales, in which similar pixels are grouped into a hierarchy of regions. Tree nodes are constructed to represent meaningful structures that appear in the image. In the present work, a tree is constructed in which each region is represented as a node and the edges represent inclusion among regions (see Figure 2). Such a tree can be built in two ways: bottom-up or top-down. In the bottom-up approach, which is adopted in this work, the hierarchy is constructed by region-merging algorithms. These algorithms typically start from an initial set of regions corresponding to the finest possible partition. Adjacent regions are then iteratively merged, and each merged output region is represented by a new node that becomes the parent of the merged regions. In the top-down approach, the algorithm starts from the coarsest possible partition and divides it iteratively until a convergence criterion is met.
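The bottom-up construction described above can be sketched as a greedy region-merging loop. This is a minimal illustration, not the merging rule of any of the cited hierarchical methods: it assumes each leaf region is summarised by a single mean intensity and pixel count, uses the absolute difference of means as a hypothetical merging cost, and records a non-decreasing merge height for each new parent so that the tree can later be thresholded. The function name `build_merge_tree` is illustrative.

```python
def build_merge_tree(means, sizes, edges):
    """Greedy bottom-up region merging (illustrative sketch).

    means, sizes: per-leaf mean intensity and pixel count, indexed 0..n-1.
    edges: iterable of (a, b) pairs of adjacent leaf regions.
    Assumption: merging cost = |mean difference|; adjacency graph is connected.
    Returns (children, height, root).
    """
    mean = dict(enumerate(means))
    size = dict(enumerate(sizes))
    nbrs = {i: set() for i in mean}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    children = {}
    height = {i: 0.0 for i in mean}  # leaves sit at level 0
    next_id = len(means)
    while True:
        # Candidate merges: every adjacent pair of currently active regions.
        pairs = [(abs(mean[a] - mean[b]), a, b)
                 for a in nbrs for b in nbrs[a] if a < b]
        if not pairs:
            break
        cost, a, b = min(pairs)  # merge the most similar adjacent pair
        new = next_id
        next_id += 1
        size[new] = size[a] + size[b]
        mean[new] = (mean[a] * size[a] + mean[b] * size[b]) / size[new]
        children[new] = [a, b]
        # Keep heights monotone so that thresholding yields nested partitions.
        height[new] = max(cost, height[a], height[b])
        # The new parent inherits the neighbours of both merged regions.
        nbrs[new] = (nbrs[a] | nbrs[b]) - {a, b}
        for n in nbrs[new]:
            nbrs[n] -= {a, b}
            nbrs[n].add(new)
        del nbrs[a], nbrs[b]
    return children, height, next_id - 1


# Four leaf regions in a chain: two dark ones and two bright ones.
children, height, root = build_merge_tree(
    [0.0, 0.1, 5.0, 5.2], [1, 1, 1, 1], [(0, 1), (1, 2), (2, 3)])
print(children, root)
```

In this toy chain, the two dark leaves and the two bright leaves are merged first, and the root joins the two groups at the largest height.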
We denote a hierarchical segmentation tree as $T$, composed of $n$ nodes obtained from the image. Each tree node represents a single region of the image, and the root of the tree represents the entire image. Furthermore, each node has at least one child (except the leaves) and exactly one parent (except the root). The segmentation tree also has a useful property: a flat image segmentation $l_k$ can be generated by using a threshold as the tree cut. A segmentation tree and different cuts of the tree are shown in Figure 3. In the subsequent sections, we show how the quality of each region is obtained and how these qualities are used with dynamic programming to find an optimal tree cut.
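Because merge levels increase towards the root, a flat segmentation can be read off the tree with a single threshold: every maximal sub-tree whose level does not exceed the threshold becomes one segment. The sketch below assumes the tree is given as a `children` mapping plus a per-node `height` (merge level) that is monotone along every root-to-leaf path, as a UCM provides; the function name and data layout are illustrative.

```python
def cut_at_threshold(children, height, root, t):
    """Map each leaf to the segment node that contains it for a cut at level t.

    Assumes height[parent] >= height[child] for every edge (UCM-like levels).
    """
    # Top-down: stop descending once a node's merge level is <= t.
    segments, stack = [], [root]
    while stack:
        node = stack.pop()
        if node not in children or height[node] <= t:
            segments.append(node)
        else:
            stack.extend(children[node])
    # Label every leaf with the segment node that contains it.
    labels = {}
    for seg in segments:
        stack = [seg]
        while stack:
            n = stack.pop()
            if n in children:
                stack.extend(children[n])
            else:
                labels[n] = seg
    return labels


# Example tree: leaves 0-3, internal nodes 4 and 5, root 6.
children = {4: [0, 1], 5: [2, 3], 6: [4, 5]}
height = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.1, 5: 0.2, 6: 5.0}
print(cut_at_threshold(children, height, 6, 1.0))   # coarse cut: two segments
print(cut_at_threshold(children, height, 6, 0.15))  # finer cut: three segments
```

Raising the threshold moves the cut towards the root (fewer, larger segments); lowering it moves the cut towards the leaves.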

Unsupervised segmentation quality measure
In this study, we first evaluate segmentation quality at different scales as the basis for searching for the optimal tree cut. Studies [12][26][27][28] reveal that Gestalt rules are of great significance in describing segments. We therefore choose three image features: color histograms, texture histograms and the geometric size of regions. Specifically, color histograms are calculated for each channel in the CIE Lab color space. For texture histograms, the image is convolved with a bank of 38 filters [29], "which consists of an edge and a bar filters at six orientations and three scales, a Gaussian and a Laplacian of Gaussian filters". Based on these features, the following Gestalt principles are applied to evaluate a region's segmentation quality. Instead of computing a large pool of features to train a classifier [12], we use only these three features for evaluation.
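As a concrete illustration of the color feature, the sketch below builds one normalised histogram per CIE Lab channel for the pixels of a region. The bin count (defaulting to 30, borrowed from the graphical-model section) and the channel ranges (L in [0, 100], a and b in [-128, 128]) are assumptions, since this section does not specify them; pixels are assumed to be already converted to Lab. The constant at the top restates the filter-bank arithmetic quoted above: edge and bar filters at six orientations and three scales, plus a Gaussian and a Laplacian of Gaussian.

```python
# Edge + bar filters at 6 orientations and 3 scales, plus Gaussian and LoG.
FILTER_BANK_SIZE = 2 * 6 * 3 + 2  # = 38 filters

# Assumed value ranges for the L, a and b channels.
LAB_RANGES = [(0.0, 100.0), (-128.0, 128.0), (-128.0, 128.0)]


def lab_histograms(pixels, bins=30):
    """Per-channel normalised histograms for a region's (L, a, b) pixels."""
    hists = []
    for ch, (lo, hi) in enumerate(LAB_RANGES):
        h = [0.0] * bins
        for px in pixels:
            v = min(max(px[ch], lo), hi)  # clamp to the channel range
            idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
            h[idx] += 1.0
        total = sum(h)
        hists.append([c / total for c in h] if total else h)
    return hists


hl, ha, hb = lab_histograms([(50, 0, 0), (50, 0, 0), (100, 127, -128)], bins=4)
print(hl, ha, hb)
```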

Inter-region similarity
Distinct feature distributions in adjacent regions also indicate good segmentation quality. To measure this, we use the $\chi^2$ statistic. Let $H_L^x$ and $H_L^y$ be the color histogram bins of region $x$ and its neighbour $y$, respectively. The inter-region color similarity $f_{inter}^{lab}(x)$ of region $x$ is defined from the $\chi^2$ distances between $H_L^x$ and $H_L^y$, summed over the three color channels and all neighbours of $x$. The texture similarity $f_{inter}^{texture}(x)$ is defined analogously from $H_T^x$ and $H_T^y$, the texture histogram bins of regions $x$ and $y$, respectively. $\lambda_{lab}$ and $\lambda_t$ are the coefficients of these two terms; in our experiments, we set $\lambda_{lab} = 5$ and $\lambda_t = 10$. Small values of $f_{inter}^{lab}$ and $f_{inter}^{texture}$ represent high segmentation quality.
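The $\chi^2$ comparison of neighbouring histograms can be sketched as follows. This computes only the raw sum of $\chi^2$ distances over the channels and all neighbours of a region. How the paper's coefficients (set to 5 for color and 10 for texture) turn this sum into the inter-region similarity, where small values mean high quality, is not recoverable here, so the sketch is the distance ingredient of the measure rather than the measure itself. The symmetric form $\frac{1}{2}\sum (a-b)^2/(a+b)$ is a common $\chi^2$ histogram distance and is assumed here.

```python
def chi2_distance(h1, h2, eps=1e-12):
    """Symmetric chi-square distance between two normalised histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def inter_region_chi2(region, neighbours, hists):
    """Sum of chi-square distances over all channels and all neighbours.

    hists[r] is a list of per-channel histograms for region r (three for Lab,
    or one for a concatenated texture histogram). The paper's coefficient
    and mapping to the final similarity score are NOT applied here.
    """
    return sum(chi2_distance(h_x, h_y)
               for y in neighbours
               for h_x, h_y in zip(hists[region], hists[y]))


# Region 0 differs completely from neighbour 1 and matches neighbour 2.
hists = {0: [[1.0, 0.0]] * 3, 1: [[0.0, 1.0]] * 3, 2: [[1.0, 0.0]] * 3}
print(inter_region_chi2(0, [1, 2], hists))
```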

Geometry size of the region
Very large regions (i.e. a small number of segments) indicate under-segmentation, whereas very small regions (i.e. a large number of segments) indicate over-segmentation. To penalise both cases, we apply a size measure $f_{size}(x)$ to each region $x$ in the segmentation. It is computed from $N_{s_0}$, the number of regions in the lowest scale of the hierarchy, which contains the most regions and is used as a baseline; $N_{s_h}$, the number of regions in the present hierarchy level; $R$, the image size; and $R_x$, the area of region $x$. The value of this measure is large when there is under-segmentation or over-segmentation. The quality of each region is then calculated as the sum of the values of all three features:

$$Q(x) = f_{inter}^{lab}(x) + f_{inter}^{texture}(x) + f_{size}(x) \tag{4}$$

A small value of $Q(x)$ therefore indicates good segmentation quality for region $x$.

ALGORITHM 1 Dynamic programming in a tree (the forward pass)

Optimal tree cut using dynamic programming
Segmentations with a very large number of regions are not meaningful; therefore, we discard the first few (finest) scales and start from a scale with a medium number of regions. In existing datasets, segmentation ground truths contain fewer than 30 regions per image [30], which suggests that a meaningful segmentation for real applications should contain approximately 30 regions. In our experiments, we select candidate scales near the middle of the hierarchy so that the number of regions in $H_1$ (the anchor) is not much more than 30. To further refine segmentation quality, a smaller interval is used to select candidate scales around the anchor $H_1$.
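One hypothetical way to implement the anchor selection is sketched below. Given the number of regions at each level (ordered from finest to coarsest), the anchor is taken as the finest level with at most about 30 regions, and a small window of neighbouring levels around it forms the candidate set. Both the anchor rule and the window radius are assumptions; the text only states that the anchor should have not much more than 30 regions and that candidates are sampled at a smaller interval around it.

```python
def candidate_scales(region_counts, max_regions=30, radius=2):
    """Pick an anchor level and a window of candidate levels around it.

    region_counts[h] is the number of regions at hierarchy level h, ordered
    from the finest level (most regions) to the coarsest (fewest regions).
    Hypothetical rule: anchor = finest level with <= max_regions regions.
    """
    anchor = next((h for h, n in enumerate(region_counts) if n <= max_regions),
                  len(region_counts) - 1)
    lo = max(0, anchor - radius)
    hi = min(len(region_counts) - 1, anchor + radius)
    return anchor, list(range(lo, hi + 1))


counts = [500, 200, 80, 40, 28, 15, 6, 2, 1]
print(candidate_scales(counts))  # (4, [2, 3, 4, 5, 6])
```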
For each region in the candidate scales, the quality of the region $Q(x)$ and that of its sub-tree $Q(x_c)$ are computed using equation (4). The optimal cut can then be identified by dynamic programming on the tree structure, which consists of one forward pass and one backward pass. In the forward pass, we calculate the quality of each node $x$ and its sub-tree $x_c$ from the bottom to the top of the tree; Algorithm 1 summarises this procedure. In the backward pass, the qualities of the current node and its sub-tree are compared from the top to the bottom of the tree to determine the optimal cut. We start from the root and move down to the leaves, level by level and node by node, comparing the quality of the current node with that of its children. If the current node has better quality (i.e. a smaller $Q$) than its children, we select it as a node of the optimal cut; otherwise, we find the child with the best quality (see Algorithm 2). This method is highly efficient, with complexity $O(n)$, where $n$ is the total number of nodes.
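The two passes can be sketched as follows. This is a minimal sketch under an assumption the text leaves open: the quality of a sub-tree is taken to be the sum of the best qualities of its children, so a node is kept in the cut only when its own $Q$ is no larger than that of the best cut below it. Nodes are integers, `children` maps an internal node to its children, and `quality[x]` holds $Q(x)$, where smaller is better. Each node is visited once per pass, which gives the $O(n)$ cost stated above.

```python
def optimal_tree_cut(children, quality, root):
    """Dynamic programming over a tree: returns (cut_nodes, best_cost).

    Assumption: sub-tree cost = sum of the children's best costs.
    """
    # Forward pass (bottom-up): build a pre-order list, then visit it reversed
    # so that every child is processed before its parent.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))
    best = {}
    for node in reversed(order):
        kids = children.get(node, [])
        if kids:
            best[node] = min(quality[node], sum(best[k] for k in kids))
        else:
            best[node] = quality[node]
    # Backward pass (top-down): keep a node if its quality is no worse than
    # the best cut of its sub-tree, otherwise descend into its children.
    cut, stack = [], [root]
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        if not kids or quality[node] <= sum(best[k] for k in kids):
            cut.append(node)
        else:
            stack.extend(kids)
    return sorted(cut), best[root]


children = {0: [1, 2], 1: [3, 4]}
quality = {0: 10.0, 1: 5.0, 2: 2.0, 3: 1.0, 4: 1.0}
print(optimal_tree_cut(children, quality, 0))  # ([2, 3, 4], 4.0)
```

If region 1 had a better quality than the sum of its two children, the cut would stop at node 1 instead of descending to leaves 3 and 4.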

Deep seeds generation
We generate deep seeds by leveraging a deep convolutional neural network trained to solve the image classification task [13][14][15]. The deep classification network sets up deep seeds from discriminative regions under image-level supervision. However, image-level labels cannot directly provide the positions of semantic objects. We therefore use two methods to locate the background and foreground classes (see Figure 1). To locate the foreground classes, we adopt the CAM approach [31]. The standard VGG-19 network, initialised from the publicly available model [32], is used as the underlying classification network; in our experiments, the pre-trained VGG-19 model generated more accurate seeds than the VGG-16 model. To make the methodology presented in [31] applicable, we slightly modify the VGG-19 architecture and use this modified network to initialise our classification network. According to [32], fully connected layers destroy valuable localisation information. Therefore, we discard that section in favour of adaptive average pooling with LogSoftmax; the pooled tensor represents the image, which is then classified by a fully connected layer. The heat map for each object class is generated by applying the fully connected classifier, and a hard threshold is applied to the heat map to obtain the discriminative object regions. Specifically, we threshold the heat map of each foreground class at 20% of its maximum value, as recommended in [31], to produce its localisation cues.

ALGORITHM 2 Dynamic programming in a tree (the backward pass)
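The CAM step and the 20% threshold can be sketched in plain Python. In the CAM formulation, the heat map of a class is the weighted sum of the last convolutional feature maps, using that class's fully connected weights; the seed mask keeps the pixels above 20% of the map's maximum. Feature maps are assumed to be given as nested lists of shape (K, H, W) and the class weights as a list of K numbers; in practice these would come from the modified VGG-19, which this sketch does not reproduce.

```python
def class_activation_map(features, weights):
    """CAM for one class: sum over k of w_k * f_k(y, x) for K feature maps."""
    h, w = len(features[0]), len(features[0][0])
    return [[sum(wk * fmap[y][x] for wk, fmap in zip(weights, features))
             for x in range(w)]
            for y in range(h)]


def foreground_seed_mask(cam, ratio=0.2):
    """Keep pixels whose activation exceeds `ratio` times the map maximum."""
    peak = max(max(row) for row in cam)
    if peak <= 0:
        # No positive evidence for this class: no seeds.
        return [[False] * len(row) for row in cam]
    return [[v > ratio * peak for v in row] for row in cam]


features = [[[1.0, 0.0], [0.0, 0.0]],   # feature map k = 0
            [[0.0, 0.0], [0.0, 2.0]]]   # feature map k = 1
cam = class_activation_map(features, [1.0, 0.05])
print(foreground_seed_mask(cam))  # [[True, False], [False, False]]
```

Here the second feature map contributes only 0.1 at its peak, which is below 20% of the class maximum (1.0), so that pixel is not used as a seed.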
To localise the background, we rely on an alternative technique from [33]. Several recent computer vision studies have concentrated on detecting the most informative and attention-grabbing regions in a scene (i.e. salient objects). These salient object detection methods [16,34] have also evolved to uniformly highlight salient objects with pixel-accurate saliency values. We use the saliency network BASNet [16], normalise its saliency maps, and select background seeds as regions with low saliency values. We adopt a normalised saliency value of 0.01 as the threshold for background localisation cues (i.e. pixels whose saliency values are smaller than 0.01 are considered background). Figure 1 illustrates how the deep seeds for the foreground and background are stacked together in a single channel.
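Stacking the two kinds of cues into a single seed channel might look like the sketch below. The saliency map is min-max normalised, pixels with normalised saliency below 0.01 become background seeds (label 0), and each foreground mask contributes its class label. Two details are assumptions not stated in the text: pixels claimed by more than one label are left unlabelled, and unlabelled pixels use the hypothetical value 255.

```python
IGNORE = 255  # hypothetical value for pixels without a seed


def normalise(saliency):
    """Min-max normalise a 2D saliency map to [0, 1]."""
    lo = min(min(row) for row in saliency)
    hi = max(max(row) for row in saliency)
    span = hi - lo
    return [[(v - lo) / span if span else 0.0 for v in row] for row in saliency]


def stack_seeds(fg_masks, saliency, bg_thresh=0.01):
    """Single-channel seed map: 0 = background, c >= 1 = class c, IGNORE = none.

    fg_masks maps each foreground class label to a boolean 2D mask.
    """
    sal = normalise(saliency)
    seeds = []
    for y, row in enumerate(sal):
        out = []
        for x, s in enumerate(row):
            labels = [c for c, mask in fg_masks.items() if mask[y][x]]
            if s < bg_thresh:
                labels.append(0)
            # Assumption: keep only unambiguous pixels; conflicts stay unlabelled.
            out.append(labels[0] if len(labels) == 1 else IGNORE)
        seeds.append(out)
    return seeds


sal = [[0.0, 0.5], [1.0, 0.005]]
print(stack_seeds({1: [[False, True], [True, False]]}, sal))
```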

The proposed graphical model
We use a graphical model to propagate information from the seeds to unknown regions. We build a graph on the regions of the properly segmented image obtained from the optimal tree cut. Each region is represented in the graph as a vertex, and the similarity between two regions is represented as an edge (see Figure 1). We denote the properly segmented image as $I$ and its set of non-overlapping regions as $\{r_k\}$, satisfying $\bigcup_k r_k = I$ and $r_k \cap r_m = \emptyset$ for all $k \neq m$. The seeds of an input image are $S = \{s_i, l_i\}$, where $s_i$ denotes the seed pixels of category $i$ and $0 \le l_i \le L$ is the seed's category label (assuming there are $L$ categories and $l_i = 0$ for background). For each region $r_k$, we seek a category label $0 \le y_k \le L$. We use a graph-cut optimisation framework [35] to find the final labels by minimising the following energy:

$$E(Y) = \sum_{k} \psi_k^{seeds}(y_k) + \sum_{(k,m) \in \mathcal{N}} \psi_{k,m}(y_k, y_m \mid I) \tag{5}$$

where $\psi_k^{seeds}$ is a unary term for region $r_k$ based on the deep seeds, $\psi_{k,m}$ is a pairwise term between two regions $r_k$ and $r_m$, and $\mathcal{N}$ is the set of neighbouring region pairs. The unary term has two conditions. First, when a region $r_k$ overlaps with a seed $s_i$, the cost is zero when this region is assigned the label $l_i$. Second, if a region $r_k$ does not overlap with any seed, every seed label present in the image is assigned to it with the same probability, where $|\{l_i\}|$ denotes the number of seed labels appearing in the image. This exclusive information is useful for reducing false-positive predictions. The pairwise term $\psi_{k,m}$ represents the similarity between two regions. We apply the pairwise term to neighbouring regions and consider a simple appearance similarity between them, using color and texture histograms built for each region $r_k$ as follows. The color histogram $h_{lab}(r_k)$ of region $r_k$ is built in the CIE Lab color space and is evenly divided into 30 bins.
For the texture histogram $h_t(r_k)$, the image is convolved with a bank of 38 filters [29], including Gaussian and Laplacian of Gaussian filters and edge and bar filters at three scales and six orientations. In the color and texture histograms, all bins are concatenated and normalised. The pairwise term $\psi_{k,m}(y_k, y_m \mid I)$ is defined using the indicator $[y_k \neq y_m]$, where $[\cdot]$ is 1 if the condition is true and 0 otherwise, together with the color and texture coefficients $\beta_{lab}$ and $\beta_t$, which we set to 10 and 15, respectively. This definition means that neighbouring regions with different labels incur a higher cost when their appearance is closer. In addition, $D_{k,m}$ is defined from the Euclidean distance between the SURF [36] descriptors and the SIFT [37] distance of the two regions $r_k$ and $r_m$, weighted by factors $\alpha_1$ and $\alpha_2$ that adjust the distance and satisfy $\alpha_1 + \alpha_2 = 1$.
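The unary and pairwise terms can be sketched as plain functions. Both are hedged reconstructions. The unary term follows the two conditions described above: the zero cost for overlapping seed labels comes from the text, while the large finite stand-in `BIG` for excluded labels and the cost $-\log(1/|\{l_i\}|)$ for the equal-probability case are assumptions. The pairwise term uses the Potts-style indicator on differing labels and an exponential decay of histogram distances weighted by the two coefficients that the text sets to 10 (color) and 15 (texture); this exponential form is also an assumption. `descriptor_distance` illustrates the weighting of the SURF and SIFT distances by two weights that sum to one. All function and parameter names are hypothetical.

```python
import math

BIG = 1e6  # hypothetical finite stand-in for a forbidden (infinite-cost) label


def unary_costs(region_pixels, seeds, num_labels):
    """Cost of each label 0..num_labels for one region (0 = background).

    region_pixels: set of (row, col) pixels of the region.
    seeds: dict (row, col) -> seed label.
    """
    image_labels = set(seeds.values())
    overlap = {seeds[p] for p in region_pixels if p in seeds}
    costs = []
    for label in range(num_labels + 1):
        if overlap:
            # Region touches seeds: zero cost for any overlapping seed label.
            costs.append(0.0 if label in overlap else BIG)
        elif label in image_labels:
            # Unseeded region: equal probability for labels seen in the image.
            costs.append(-math.log(1.0 / len(image_labels)))
        else:
            # Assumption: labels absent from the image are excluded.
            costs.append(BIG)
    return costs


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_cost(y_k, y_m, hlab_k, hlab_m, ht_k, ht_m, b_lab=10.0, b_t=15.0):
    """Potts-style cost: only label changes are charged, more for similar regions.

    The exponential form is an assumption, not the paper's exact equation.
    """
    if y_k == y_m:
        return 0.0
    return math.exp(-b_lab * euclidean(hlab_k, hlab_m)
                    - b_t * euclidean(ht_k, ht_m))


def descriptor_distance(d_surf, d_sift, alpha1=0.5):
    """Weighted SURF/SIFT distance with weights alpha1 + alpha2 = 1."""
    return alpha1 * d_surf + (1.0 - alpha1) * d_sift


seeds = {(0, 0): 0, (5, 5): 2}
print(unary_costs({(0, 0), (0, 1)}, seeds, 3))
print(unary_costs({(9, 9)}, seeds, 3))
```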
Minimising the labelling energy in equation (5) is NP-hard. It can be solved approximately with the expansion and swap move algorithms [35], which compute minimum cuts on the defined graphical model. The nodes in the graph are the regions, which are connected to their neighbours by n-links.
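The paper minimises the energy with the expansion and swap moves of [35]. Implementing those graph-cut moves is beyond a short example, so the sketch below swaps in iterated conditional modes (ICM), a much simpler greedy optimiser that only reaches a local minimum. It shows how unary and pairwise terms interact on the region adjacency graph; it is not the paper's solver, and the toy costs are made up for illustration.

```python
def icm(unary, neighbours, pairwise, max_iters=20):
    """Iterated conditional modes on a region graph (stand-in for graph cuts).

    unary[k][l]: cost of label l for region k.
    neighbours[k]: regions adjacent to region k.
    pairwise(k, m, y_k, y_m): cost of the edge between regions k and m.
    """
    # Start from the best label of each region under the unary term alone.
    labels = [min(range(len(u)), key=u.__getitem__) for u in unary]
    for _ in range(max_iters):
        changed = False
        for k in range(len(unary)):
            def local(l):
                # Energy terms that involve region k when it takes label l.
                return unary[k][l] + sum(pairwise(k, m, l, labels[m])
                                         for m in neighbours[k])
            best = min(range(len(unary[k])), key=local)
            if local(best) < local(labels[k]):
                labels[k] = best
                changed = True
        if not changed:
            break
    return labels


def potts(k, m, a, b):
    """Unit cost whenever two neighbouring regions take different labels."""
    return 0.0 if a == b else 1.0


# Three regions in a chain; the middle one is unseeded and slightly prefers
# background, but its two seeded neighbours pull it to label 1.
BIG = 1e6
unary = [[BIG, 0.0], [0.4, 0.6], [BIG, 0.0]]
neighbours = [[1], [0, 2], [1]]
print(icm(unary, neighbours, potts))  # [1, 1, 1]
```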

EXPERIMENTS
In this section, we evaluate the proposed method on the output of hierarchical segmentation methods and analyse its effectiveness in object segmentation. The goal is to prove that the proposed method can improve object segmentation and reflect improved vision tasks at high levels.

Dataset and evaluation
We benchmark the performance of the proposed method on the following three annotated datasets. The MSRC 21-class dataset [41] contains 591 natural images with 23 object categories. The PASCAL VOC 2012 dataset [42] contains 20 foreground object categories and 1 background category; for the segmentation task, it contains 1,464 training, 1,449 validation and 1,456 test images. Following common practice, we use the augmented data [43], which contains 10,582 images, as the training set. The MS COCO dataset [44] includes 80k training images with image-level labels and 40k validation images, with 81 categories. The quantitative evaluation of segmentation on these datasets is performed with the following standard metrics.

Segmentation covering (SC) [9]: "It is a metric of the similarity between segmentations of ground truth and proposed. It gives lower value to proposed segmentations with lower similarity and gives higher value to proposed segmentations with high similarity." It has the following definition:

$$SC(S_{prop} \rightarrow S_{gt}) = \frac{1}{N} \sum_{R \in S_{gt}} |R| \max_{R' \in S_{prop}} \frac{|R \cap R'|}{|R \cup R'|}$$

where $N$ represents the total number of pixels in the image, and $S_{gt}$ and $S_{prop}$ represent the ground truth and proposed segmentations, respectively.

Variation of information (VI) [9]: "It is a metric of the relative entropy between proposed segmentations and ground truth segmentations. It assigns small value to a higher similarity between proposed segmentations and ground truth segmentations". It has the following definition:

$$VI(S_{prop}, S_{gt}) = H(S_{prop} \mid S_{gt}) + H(S_{gt} \mid S_{prop})$$

where $H(S_{prop} \mid S_{gt})$ and $H(S_{gt} \mid S_{prop})$ are conditional image entropies.

Probabilistic Rand index (PRI) [9]: "It is a metric of probability that a pair of pixels are consistently grouped between segmentations of ground truth and proposed. Probabilistic Rand Index gives higher value when segmentations of ground truth and proposed are closer." It has the following definition:

$$PRI(S_{prop}, \{S_{gt}\}) = \frac{1}{T} \sum_{k<m} \left[ l_{km} p_{km} + (1 - l_{km})(1 - p_{km}) \right]$$

where $l_{km}$ is the event that pixels $k$ and $m$ have the same label, $p_{km}$ is its probability, and $T$ is the total number of pixel pairs.

Jaccard similarity coefficient [45]: "The Jaccard index in the context of object segmentation is often referred to as Intersection over Union (IoU) between the machine and the groundtruth results":

$$J = \frac{|P_{prop} \cap P_{gt}|}{|P_{prop} \cup P_{gt}|}$$

where $P_{prop}$ and $P_{gt}$ refer to the positive pixels of the proposed segmentation and the ground truth, respectively. For a review of this measure, see [45]. In this experiment, only Jaccard index values above 0.5 are displayed, for better visualisation. All measures are reported at the Optimal Dataset Scale (ODS). We also use the mean intersection over union (mIoU).
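For readers who want to reproduce such numbers, three of the metrics can be sketched for a single image given as flat label lists of equal length. The sketch follows the standard definitions (SC as the covering of the ground truth by the proposed segmentation, VI with natural logarithms) and omits PRI, which additionally needs the probabilities $p_{km}$ estimated from several ground truths. The function names are illustrative.

```python
import math
from collections import Counter


def segmentation_covering(prop, gt):
    """SC(S_prop -> S_gt): size-weighted best IoU of each ground-truth region."""
    n = len(gt)
    gt_sizes, prop_sizes = Counter(gt), Counter(prop)
    joint = Counter(zip(gt, prop))  # overlap size of each (gt, prop) region pair
    best = {}
    for (g, p), inter in joint.items():
        iou = inter / (gt_sizes[g] + prop_sizes[p] - inter)
        best[g] = max(best.get(g, 0.0), iou)
    return sum(gt_sizes[g] * best[g] for g in gt_sizes) / n


def variation_of_information(prop, gt):
    """VI = H(S_prop | S_gt) + H(S_gt | S_prop), in nats."""
    n = len(gt)
    joint = Counter(zip(prop, gt))
    p_prop, p_gt = Counter(prop), Counter(gt)
    vi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        vi -= p_ab * (math.log(p_ab / (p_gt[b] / n)) +
                      math.log(p_ab / (p_prop[a] / n)))
    return vi


def jaccard(prop_mask, gt_mask):
    """IoU of two binary masks given as flat boolean lists."""
    inter = sum(1 for a, b in zip(prop_mask, gt_mask) if a and b)
    union = sum(1 for a, b in zip(prop_mask, gt_mask) if a or b)
    return inter / union if union else 1.0


gt = [0, 0, 1, 1]
print(segmentation_covering([0, 0, 0, 1], gt), variation_of_information([0, 0, 0, 0], gt))
```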

Segmentation techniques
Five popular hierarchical segmentation algorithms are selected to produce UCMs to validate the proposed method; we chose them for their publicly available code and state-of-the-art performance:
1. gPb-UCM [9]: a commonly used hierarchical segmentation method, which produces different segmentation scales by applying different thresholds to the UCM.
2. MCG [10]: the Multi-scale Combinatorial Grouping (MCG) framework, which uses multi-scale information from the UCM and a fast normalised cuts technique to obtain region segmentations.
3. SCG [10]: a single-resolution version of MCG, which has competitive results and is faster than MCG.
4. PMI [11]: an unsupervised boundary detection algorithm named Pointwise Mutual Information (PMI), which uses different local information embedded in a matrix of

Results
In general, hierarchical segmentation algorithms do not directly provide high-level segmentation results for object detection. However, object regions can be extracted from hierarchical segmentation results through several post-processing steps. Segmentation here does not target a specific application; rather, it provides well-grouped pixels for further analysis, which is what hierarchical segmentation was originally designed for. In this section, experiments are performed to validate the improvement produced by the proposed algorithm. For comparison, ODS is applied to choose the optimal segmentation scale in the UCM maps. Tables 1 and 2 show the evaluation results of different hierarchy methods on the MSRC and PASCAL VOC 2012 datasets, respectively. These tables show the segmentation results of each hierarchy method with and without the proposed method as a post-processing step; we also compare our results with those of other hierarchical segmentation algorithms that reported results on these datasets. The improvements achieved by the proposed method in SC and VI are evident across the different segmentation methods. PRI is known to suffer from a small range [9], so the values of the different methods are close in our tests. Only MCG, SCG and COB provided results on the COCO dataset, so Table 3 shows the evaluation results of these hierarchy methods. In particular, the segmentation results of COB outperform those of the other algorithms, and PMI receives the largest improvement. To further validate the quality of the object segmentation generated by the proposed method, the Jaccard index, defined as the size of the intersection of two pixel sets divided by the size of their union [45], is used to evaluate region quality. Figure 5 shows the object quality of the proposed method (solid lines) and of the corresponding segmentation methods (dashed lines) on these datasets.
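ODS fixes a single UCM threshold for the whole dataset, whereas an optimal image scale would choose a threshold per image. The sketch below is a simplified version of the ODS choice: it assumes per-image scores have already been computed at each candidate threshold and that higher scores are better (for VI, where lower is better, the comparison would be reversed). It averages per-image scores, which may differ from the exact aggregation used by benchmark tools.

```python
def ods_threshold(thresholds, scores):
    """Dataset-wide best threshold under a simple mean-of-images rule.

    scores[i][j] is the score of image i at thresholds[j] (higher is better).
    Returns (threshold, mean score at that threshold).
    """
    means = [sum(img[j] for img in scores) / len(scores)
             for j in range(len(thresholds))]
    j_best = max(range(len(thresholds)), key=means.__getitem__)
    return thresholds[j_best], means[j_best]


thresholds = [0.1, 0.2, 0.3]
scores = [[0.50, 0.70, 0.60],   # image 1
          [0.60, 0.65, 0.90]]   # image 2
print(ods_threshold(thresholds, scores))
```

Note that image 1 alone would prefer the threshold 0.2; ODS trades this off against the other images.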
The Jaccard index values are sorted in ascending order, and only values with J > 0.5 are plotted for better visualisation. Therefore, a wider range on the x-axis means that more images have Jaccard index values above 0.5, and higher values on the y-axis indicate better results. The proposed method based on COB outperforms the other methods on these datasets. Overall, the proposed algorithm produces a consistent improvement in segmentation quality. Additionally, Figure 4 shows that the segmentation quality of the optimal cut of UCMs is better than that of the superpixel generation methods SLIC [57] and Graph-Seg [40], and also better than that of using only a single scale of UCMs. Adding deep seeds to the optimal cut then yields high-precision segmentation. Moreover, we compare the results of applying these post-processing steps to superpixels [40]. Figure 6 shows the effectiveness of using the optimal cut of high-quality region hierarchies, rather than superpixels [40], to produce high-precision segmentation.
In the following sections, we compare our approach with other approaches on the three datasets. Note that most weakly supervised approaches report results only on the PASCAL VOC 2012 dataset, whereas the MSRC dataset is mainly used by superpixel-based approaches; we therefore compare our approach with both categories. In addition, few approaches report results on the COCO dataset. Therefore, we cannot compare the same set of approaches on all three datasets.

Comparison on MSRC-21 dataset
We first provide several segmentation examples on MSRC-21 to illustrate the performance of the proposed method visually. Some results of the proposed method on the MSRC-21 dataset are presented in Figure 7, and these examples show that the proposed method produces better segmentation results. To compare the algorithms quantitatively on MSRC-21, Table 4 provides the mean intersection over union (mIoU) and pixel accuracy over all image labels for the proposed method (OSS-DS based on COB) and the compared algorithms. In Table 4, each column gives the accuracy of each algorithm for one semantic class, and the last column gives the average accuracy over all classes. Table 4 indicates that the proposed method outperforms the other methods. The best segmentation performance in the tables is shown in bold.
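The mIoU and pixel accuracy reported in the tables can be computed from a confusion matrix. The sketch below assumes flat lists of predicted and ground-truth class indices and skips classes absent from both prediction and ground truth; benchmark implementations usually accumulate the confusion matrix over the whole dataset before computing the scores, which this function also supports if all pixels of all images are passed at once.

```python
def confusion_matrix(pred, gt, num_classes):
    """cm[g][p] counts pixels of ground-truth class g predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, g in zip(pred, gt):
        cm[g][p] += 1
    return cm


def miou_and_pixel_accuracy(cm):
    """Mean IoU over classes that occur, and overall pixel accuracy."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                 # class-c pixels predicted as others
        fp = sum(row[c] for row in cm) - tp  # other pixels predicted as class c
        if tp + fp + fn:
            ious.append(tp / (tp + fp + fn))
    correct = sum(cm[c][c] for c in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return sum(ious) / len(ious), correct / total


cm = confusion_matrix(pred=[0, 1, 1, 1], gt=[0, 0, 1, 1], num_classes=2)
print(miou_and_pixel_accuracy(cm))
```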

Comparison on PASCAL 2012 VOC dataset
We first present several segmentation examples on the VOC 2012 dataset to demonstrate the performance of the proposed method visually. Some results of the proposed method on the VOC 2012 dataset are provided in Figure 8, and these examples indicate that the proposed method achieves better segmentation results. We also compare the performance of the proposed method with weakly supervised semantic segmentation methods on VOC 2012. Unlike these methods, the proposed method does not train any segmentation network; the weakly supervised approaches take more computation time and require special equipment such as a high-end GPU. Table 5 provides the mIoU and pixel accuracy over all image labels for the proposed method (OSS-DS based on COB) and the compared methods. According to Table 5, the proposed method achieves results comparable to those of weakly supervised semantic segmentation methods.

Comparison on COCO dataset
Next, we perform a set of experiments on the COCO dataset to demonstrate the generality of the proposed method. In contrast with PASCAL VOC, most COCO samples are non-iconic images collected in complex natural contexts. Table 6 gives the per-class IoU of BFBP [50], SEC [13], DSRG [14] and the proposed method OSS-DS. The proposed method achieves comparable results in large-size categories, such as outdoor, animal, and vehicle. However, it performs poorly in small-size categories, such as indoor and kitchenware, because deep seeds cannot be placed precisely in small-size categories, so seeds from large-size categories spread their information into small-size categories.

FIGURE 6
Effectiveness of using the optimal cut with deep seeds. From left to right: original image, ground truth, optimal cut, optimal cut with deep seeds, superpixel [40] and superpixel with deep seeds

Comparison on lung dataset
Lastly, we perform a set of experiments on a lung dataset to verify the performance of the proposed method in the medical field. Segmentation is an essential step in the diagnostic and treatment stages, and an accurate segmentation algorithm can make a big difference in patients' lives [58,59]. Lung CT image segmentation is a necessary initial step for lung image analysis: it is a prerequisite for accurate downstream analysis such as lung cancer detection. Early detection of lung cancer could reduce the mortality rate and increase the patient's survival rate, because treatment at that stage is more likely to be curative. However, designing an effective lung segmentation method is a challenging problem, especially for abnormal lung parenchyma tissue, where the nodules and blood vessels need to be segmented together with the lung parenchyma. Moreover, the lung parenchyma needs to be separated from the bronchus regions, which are often confused with lung tissue. The lung segmentation dataset, available from Kaggle Data [60], is a collection of 2D and 3D images with manually segmented lungs. We use 70% of the data as the training set and the remaining 30% as the test set. The size of each image is 512 × 512. For this dataset, we modified the U-Net [61] to perform image classification by replacing the last 1 × 1 convolution with a global average pooling layer followed by a fully connected layer that outputs a single value in (0, 1). The global average pooling layer computes the mean value across all spatial dimensions of the input and is used to recover a class activation map.

FIGURE 9
Segmentation results on the lung dataset. From left to right: original image, ground truth, UCM map, optimal cut and OSS-DS results

To obtain deep seeds from a network trained for image classification, we used class activation maps (CAMs) following the work of Zhou et al. [62]. A hard threshold, set at 20% of the maximum value, is then applied to the heat map of each foreground class to obtain the discriminative object regions that serve as localisation cues. Lastly, the deep seeds guide a graphical model constructed over the optimal segments, which propagates information from the deep seeds to unmarked regions to obtain the final object segmentation. Figure 9 shows some segmentation outputs of the proposed method on the lung dataset. For comparison, the optimal dataset scale (ODS) is used to choose the optimal scale of segmentation in the UCM maps. Table 7 shows the evaluation results of different hierarchical segmentation methods on the lung dataset: our results based on COB outperform the other algorithms, and PMI receives the largest improvement. Table 8 compares the quantitative results of using the proposed method as post-processing against using U-Net alone to obtain the final results. The proposed method gives more accurate segmentation, which can help in cancer detection and treatment.
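The seed-extraction step for the lung dataset (a global average pooling and fully connected classification head, CAMs following Zhou et al., and a 20% hard threshold) can be illustrated with a small NumPy sketch. It assumes the last convolutional feature maps of the classification network (shape K × H × W) and the weights of its single-output fully connected layer are already available as arrays; the network itself is omitted, and the function names and the integer upsampling factor are hypothetical. The sketch also shows why the global average pooling layer makes CAMs possible: pooling and the fully connected layer are both linear, so the classifier's logit equals the spatial mean of the CAM plus the bias.

```python
import numpy as np

def classify_with_gap(features, w, b):
    """GAP + single-output FC head: features (K, H, W) -> probability in (0, 1)."""
    pooled = features.mean(axis=(1, 2))             # global average pooling: one value per channel
    logit = float(w @ pooled + b)                   # fully connected layer with one output
    return 1.0 / (1.0 + np.exp(-logit))             # sigmoid maps the score into (0, 1)

def class_activation_map(features, w):
    """CAM: weighted sum of the last feature maps, using the FC weights as channel weights."""
    return np.tensordot(w, features, axes=(0, 0))   # (K,) x (K, H, W) -> (H, W)

def cam_seeds(cam, ratio=0.2, upsample=1):
    """Foreground seeds: pixels whose CAM value is at least `ratio` times the CAM maximum."""
    if upsample > 1:
        # Nearest-neighbour upsampling back to image resolution (assumed integer factor).
        cam = np.repeat(np.repeat(cam, upsample, axis=0), upsample, axis=1)
    peak = cam.max()
    if peak <= 0:
        return np.zeros(cam.shape, dtype=bool)      # no positive evidence, so no seeds
    return cam >= ratio * peak

# Toy feature maps with K = 2 channels on a 2 x 2 grid.
features = np.zeros((2, 2, 2))
features[0, 0, 0] = 1.0
features[1, 1, 1] = 0.5
w = np.array([2.0, 1.0])
prob = classify_with_gap(features, w, b=0.0)
cam = class_activation_map(features, w)
seeds = cam_seeds(cam, ratio=0.2, upsample=2)       # 4 x 4 boolean seed mask
print(prob, cam, seeds.astype(int), sep="\n")
```

A threshold relative to each map's own maximum (rather than an absolute value) keeps the seeds comparable across images whose CAMs differ in overall magnitude.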
From these experiments, we found that the quality of the segmentation generated by hierarchical segmentation algorithms depends largely on the UCM result, and that the quality of the final object segmentation improves after applying the proposed post-processing to the UCM. We implemented the algorithm in MATLAB R2018b and C++ (via MEX files) on a Windows system with an Intel Core i5 3 GHz CPU and 8 GB of memory. For each image, the proposed method takes about 3 s in total.
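In the lung-dataset comparison, each UCM is cut at the optimal dataset scale (ODS), that is, a single threshold shared by the entire dataset, as opposed to the optimal image scale (OIS), which picks a threshold per image. The following is a minimal sketch, assuming a per-image quality score (for example IoU against the ground truth) has already been computed for every candidate threshold, and taking ODS as the threshold with the best mean score. Boundary benchmarks such as BSDS500 instead compute ODS from counts accumulated over all images. The function names are hypothetical.

```python
import numpy as np

def ods_threshold(scores, thresholds):
    """Optimal dataset scale: one threshold shared by every image.

    scores[i, j] is the quality of image i when its UCM is cut at thresholds[j].
    """
    scores = np.asarray(scores, dtype=np.float64)
    mean_per_threshold = scores.mean(axis=0)        # average quality over images
    j = int(np.argmax(mean_per_threshold))
    return thresholds[j], float(mean_per_threshold[j])

def ois_thresholds(scores, thresholds):
    """Optimal image scale: each image gets its own best threshold (never worse than ODS)."""
    scores = np.asarray(scores, dtype=np.float64)
    best = scores.argmax(axis=1)
    return [thresholds[j] for j in best], float(scores.max(axis=1).mean())

# Toy scores for two images and three candidate UCM thresholds.
thresholds = [0.1, 0.3, 0.5]
scores = [[0.2, 0.8, 0.5],
          [0.6, 0.5, 0.9]]
print(ods_threshold(scores, thresholds))
print(ois_thresholds(scores, thresholds))
```

The gap between the two numbers indicates how much a fixed scale loses on images whose objects appear at a different level of the hierarchy, which is the situation the proposed scale selection targets.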

CONCLUSION
In this work, we addressed the problem of object segmentation by post-processing the output of hierarchical image segmentation algorithms with deep seeds. To achieve high-quality segmentation, simple mid-level features were used to describe the quality of an object region. In addition, an optimal segment from several hierarchical scales of the input image was selected using dynamic programming. Then, deep seeds were located in the input image using a deep classification network and a saliency network. Lastly, the graph labelling problem was solved with expansion and swap move algorithms to propagate information from the seeds to unmarked regions. We performed experiments on four segmentation datasets. The results showed that the proposed method achieves higher-precision image segmentation than state-of-the-art algorithms. In future work, we will explore more features to describe the quality of regions. Furthermore, we will focus on developing more effective strategies to improve the quality of seeds, particularly for small-size categories.
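The exact dynamic-programming objective is not given in this section, so the sketch below shows only the generic bottom-up recursion for choosing an optimal cut through a region tree. It assumes each region has an additive quality score (for instance, one derived from mid-level features) and that the children of a node partition their parent. Each node is then either kept whole or replaced by the best cuts of its children. The tree, the scores, and the name `optimal_cut` are hypothetical toy values.

```python
from typing import Dict, List, Tuple

def optimal_cut(children: Dict[int, List[int]],
                score: Dict[int, float],
                root: int) -> Tuple[float, List[int]]:
    """Return (best total score, selected region ids) for the subtree rooted at `root`.

    The selected regions always partition the root region, assuming children partition their parent.
    """
    kids = children.get(root, [])
    if not kids:
        return score[root], [root]                  # a leaf can only be kept as it is
    total, chosen = 0.0, []
    for c in kids:                                  # best cut of each child subtree
        s, regions = optimal_cut(children, score, c)
        total += s
        chosen.extend(regions)
    if score[root] >= total:
        return score[root], [root]                  # the merged region is at least as good
    return total, chosen

# Toy hierarchy: region 0 splits into 1 and 2; region 1 splits into 3 and 4.
children = {0: [1, 2], 1: [3, 4]}
score = {0: 1.0, 1: 0.5, 2: 0.6, 3: 0.4, 4: 0.3}
print(optimal_cut(children, score, 0))
```

Because each subtree is solved once and reused by its parent, the recursion visits every node exactly once, so the cost is linear in the number of regions in the hierarchy.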