Double-weighted patch-based label fusion for MR brain image segmentation

In recent decades, a large number of label fusion methods have been introduced and applied to magnetic resonance brain image segmentation. This paper proposes a double-weighted label fusion method to improve the segmentation accuracy of magnetic resonance brain tissues. In weighted label fusion methods, different weights are assigned to the different neighbouring pixels around the central test pixel. The weight of a specific neighbouring pixel is determined by the structural similarity between the atlas patch centred on that neighbouring pixel and the patch of the image to be segmented centred on the central test pixel, which is referred to as a patch-based label fusion scheme. The proposed label fusion method adds a second, new type of weight to the neighbouring pixels around the central test pixel; this new type of weight is calculated from the atlas information itself. Segmentation experiments on different brain tissues show that our method improves segmentation performance.


INTRODUCTION
Magnetic resonance (MR) imaging plays a crucial role in pathology detection, the study of brain organisation, and human brain mapping [1], owing to its non-invasiveness, good spatial resolution, and fast acquisition. Every day, a vast amount of medical image data is produced, and the development of accurate, robust, and reliable segmentation techniques for the automatic extraction of anatomical structures has become an important challenge in quantitative MR analysis. Many methods for automated brain structure segmentation have been introduced in the past few decades. Among these, multi-atlas-based methods have proved very effective. An atlas consists of two parts: the MR brain image and the corresponding label image. Commonly used label fusion methods are majority voting (MV) and its improved variant, weighted majority voting (WMV) [6]. The MV and WMV methods are based on a global weight, that is to say, every label of one atlas has the same weight in the label fusion process. Global weighted label fusion does not take into account the local characteristics of different positions in the image, so local weighted label fusion methods were proposed. In local weighted label fusion methods, different labels of one atlas have different weights, and it has been shown that local weighted label fusion outperforms the global one in practice [7].
In addition to simple weighting schemes, multi-atlas label fusion methods can be classified into two categories: (1) classification-based label fusion methods and (2) patch-based label fusion methods. Classification-based label fusion methods treat each atlas label as a labelled sample and train a classifier on these samples using machine learning methods. The trained classifier is then used to classify the pixels of the image to be segmented to obtain the final segmentation result. Several machine learning methods have been used for classification-based label fusion, such as label fusion with randomised forests [8,9]. A support vector machine classifier combined with an augmented feature vector has been used for cardiac MR image segmentation [10]. A segmentation framework based on a conditional random field (CRF), with a random forest encoding the likelihood function, has been proposed for brain tissue segmentation [11]. Neural networks and deep learning techniques are also used as classifiers for label fusion [12][13][14][15][16]. For example, a voxel-wise residual network with a set of effective training schemes has been used for brain segmentation [12]; a convolutional neural network has been proposed for the automatic segmentation of MR brain images into a number of tissue classes [13]; a brain tumour segmentation method has been developed by integrating fully convolutional neural networks and CRFs in a unified framework to obtain segmentation results with appearance and spatial consistency [14]; a supervised artificial neural network framework combined with volumetric shape models has been proposed for brain segmentation [15]; and a convolutional neural network that combines both convolutional and prior spatial features has been proposed for accurate segmentation of the sub-cortical brain structures [16].
In recent years, classification-based label fusion methods have been favoured by researchers because they can achieve better segmentation results. However, these methods are complex: a network structure must be designed and a large number of training samples is required, which is time-consuming.
Patch-based label fusion methods were inspired by the success of the non-local strategy and patch-based methods in imaging applications, e.g. image denoising [17,18], image restoration [19], target detection [20], and image classification [21]. These methods are a particular type of weighted voting label fusion [10,22,23]. The main idea is to allow multiple candidate labels (usually from a search neighbourhood) on each atlas and to aggregate them based on non-local means. Each voxel of an MR brain image is represented by a small image patch centred on that voxel. Similar image patches from the search neighbourhood of each atlas image are aggregated at the voxel to be labelled in the target MR image. The more similar a patch around a voxel in an atlas image is to the patch around the voxel to be labelled in the target image, the higher the weight used to propagate its label to that voxel. Finally, all selected image patches from a subset of atlases, with their weighted labels, are fused to estimate the label of the voxel to be labelled in the target image.
As can be seen from the above description, it is crucial for patch-based label fusion methods to determine the similarity between image patches from each atlas and the image patch centred on the voxel to be labelled in the target MR image. These methods show two advantages: (i) this type of approach drastically increases the number of atlas labels involved in the label fusion process, and (ii) compared with a single voxel, an image patch contains richer context information, so it can be used to produce more reliable weights for the atlas labels [22]. To obtain more reliable label weights, different methods have emerged in recent years to measure the similarity (the weight of the label) between two image patches. The simplest and most direct is the non-local patch-based label fusion (NPLF) method proposed by Coupé et al. [22], which directly uses the Euclidean distance between image patches as their similarity and depends heavily on intensity-based similarity measurement between patches. To make full use of other useful information when measuring similarity, Bai et al. proposed combining intensity, gradient, and contextual information into an augmented feature vector and incorporating it into the label fusion process for segmentation [10]. Other, richer image features used to measure the weight of an atlas label are deep features obtained by deep learning [24][25][26]. Yang et al. proposed a deep architecture that integrates a feature extraction subnet and an NPLF subnet in a single network [24]; the learned deep features are mainly used to measure the label weight and can also be used for atlas selection. Sparse representation methods are widely used in medical image segmentation [27][28][29][30][31][32][33][34][35][36][37], and they are also used to measure the weight of the atlas label [27][28][29][30][31].
Their main idea is that each image patch in the image to be segmented can be reconstructed as a sparse linear superposition of patches from the atlas images, and the reconstruction coefficients can be used to derive the weights of the atlas labels. Tong et al. proposed using the image intensity patch library directly to construct a dictionary [28]. To limit the possibility of the patch-based similarity measurement being wrongly guided by the presence of multiple anatomical structures in the same image patch, Wu et al. proposed to divide each atlas image patch into a set of partial image patches according to the labels of that patch [30]. Because the image information is divided into different patterns, the dictionary constructed from these divided atlas patches makes the label weights more reliable. To make full use of the prior information provided by the atlas and to obtain more reliable weights for the atlas labels, Yan et al. proposed combining transformed intensity information with label information from the image patch to construct a new dictionary [31]. Experimental results show that the weights of atlas labels obtained from the new dictionary are more reliable and that more accurate segmentation results can be obtained.
Although patch-based label fusion methods are very popular and can produce accurate segmentation results, they determine the weight of each label solely from the similarity of image patches. The reliability of the atlas label itself has been neglected, especially the reliability of labels at the boundary of the target brain tissue. Due to the errors caused by the interpolation algorithm in the atlas registration process and by human factors in the process of atlas delineation, the label information on the boundary of the target brain tissue is unreliable, so the reliability of the atlas label itself should be measured. This paper proposes a novel double-weighted label fusion method for MR brain image segmentation. In the proposed method, we introduce a new label weight to measure the reliability of the atlas label itself, and we use the MV label fusion method [6], the NPLF method [22], and the sparse patch-based label fusion (SPLF) method [28] to demonstrate the performance of our double-weighted label fusion scheme. These methods have been successfully applied to the label fusion problem, but to the best of our knowledge they have never considered the reliability of the atlas label information itself in the process of label fusion. The proposed method estimates the reliability of each atlas label based on the local information of the atlas itself, so each atlas label has two weights in the label fusion process: one comes from the similarity between the image patch of the atlas and the image patch from the image to be segmented, and the other comes from the reliability of the label itself. Experimental results show that multi-atlas label fusion based on these two weights can obtain more accurate segmentation results for brain tissues.
In the remainder of this paper, we first introduce the methodology of the patch-based label fusion method and how we add a new weight to each atlas label. The proposed method was evaluated on seven different brain tissues in the Information Extraction from Images (IXI) [38,39] and Internet Brain Segmentation Repository (IBSR) [40] datasets. We studied the influence of different parameters and compared the performance of the proposed methods with that of patch-based label fusion techniques on different subjects of the IXI and IBSR datasets. Finally, we discuss the strengths and weaknesses of the proposed method and conclude the paper.

METHODS
The basic principle of the patch-based label fusion method [31] is that the central voxels of similar patches are considered to belong to the same structure [22]. This type of method assigns higher weights to the central voxels of similar patches and small weights to dissimilar patches, so the central voxels of similar patches from the atlas contribute more to the final result of label fusion. The assumption of our method is that atlas labels on the boundary of brain tissues are not very reliable, due to the errors introduced by interpolation during image registration and by human factors when the outline of the target tissue of the atlas is drawn. In the label fusion process, these labels deserve particular attention, so we assign a new weight to them according to their context information extracted from the atlas. After this assignment, each atlas label has two weights in the label fusion process: the first comes from the similarity between the image patch extracted from the atlas and the image patch extracted from the image to be segmented, and the second comes from the reliability of the label calculated from the context information of the atlas.
In the proposed method, a new weight is added to each atlas label, which means that each label has two weights in the label fusion process. Depending on how the first weight (image patch similarity) is calculated, and in order to verify the effectiveness of the proposed double-weighted label fusion, there are three different implementations of the proposed method: the double-weighted MV label fusion method [6] (DOM), the double-weighted NPLF method [22] (DON), and the double-weighted SPLF method [28] (DOS).
For the MV label fusion method [6], different labels of different atlases have the same weight in the label fusion process. For the NPLF method [22], the weight of one atlas label comes from the similarity between the patch centred on this label and the patch centred on the voxel to be labelled in the target image. For the SPLF method [28], the basic assumption is that the target patch to be labelled can be represented by a few representative atoms of a dictionary extracted from the same structure of different atlases [31]. After the coding of the target patch, the target voxel is labelled based on these coding coefficients (the label weights) and the dictionary [31]. This paper assigns a new weight to each atlas label in the MV label fusion [6], NPLF [22], and SPLF [28] methods, respectively. The new label weight is calculated from the context information of the atlas itself. After assigning this new weight, the MV label fusion [6], NPLF [22], and SPLF [28] methods have two weights for each label in the label fusion process. We call the new MV label fusion method DOM, the new NPLF method DON, and the new SPLF method DOS. Figure 1 demonstrates the three different implementations of the proposed double-weighted label fusion method.

Single-weighted label fusion method
For single-weighted label fusion methods, each atlas label has only one weight in the weighted label fusion process. The MV label fusion [6], NPLF [22], and SPLF [28] methods are of this type. Next, we introduce specifically how these three methods perform label fusion with a single label weight.
For the image to be segmented T, our goal is to determine the segmentation result image L_T. Before multi-atlas label fusion, registration is required. The symbols I = {I_s | s = 1, …, N} and L = {L_s | s = 1, …, N} denote the N registered atlas images and label maps, respectively. The size of the target image T and of the N registered atlas images and label maps is denoted r_t × r_t × r_t. This paper mainly studies the segmentation of a single target brain tissue, so the value of each atlas label is either 0 or 1.
The label L_T(x, y, z) of the voxel located at position (x, y, z) of the target image T can be obtained by the MV label fusion method using Equation (1):

L_T(x, y, z) = 1 if Σ_s w_s(x, y, z) L_s(x, y, z) ≥ (1/2) Σ_s w_s(x, y, z), and 0 otherwise, (1)

where w_s(x, y, z) denotes the weight of the label L_s(x, y, z) of atlas (I_s, L_s) and (x, y, z) is the location in the image (1 ≤ s ≤ N, 1 ≤ x ≤ r_t, 1 ≤ y ≤ r_t, 1 ≤ z ≤ r_t). For the MV label fusion method, every label of the different atlases has the same weight w_s(x, y, z). For each target image voxel T(x, y, z), all the atlas patches within a certain search neighbourhood V(x, y, z) centred at position (x, y, z), denoted p_1, p_2, …, p_n with n = N × r_s^3, are used to compute the weight of the label centred in each corresponding atlas patch, where r_s × r_s × r_s is the size of the search neighbourhood. The target patch centred at voxel T(x, y, z) is denoted p_t. We arrange each image patch p_1, p_2, …, p_n and p_t into a column vector and use l_1, l_2, …, l_n to denote the labels of the voxels centred in the atlas patches p_1, p_2, …, p_n, respectively, and w_1, w_2, …, w_n to denote the weights of the corresponding labels l_1, l_2, …, l_n. All the image patches are of the same size, denoted r_p × r_p × r_p [31].
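For binary labels, weighted majority voting amounts to thresholding the weighted vote for label 1 at half of the total weight. The following minimal sketch illustrates the idea (the function name is ours, not from the paper); plain MV is the special case where every atlas receives the same weight.

```python
import numpy as np

def weighted_majority_vote(labels, weights):
    """Weighted majority voting for a single voxel with binary labels.

    labels  : the N atlas labels L_s(x, y, z), each 0 or 1
    weights : the N label weights w_s(x, y, z)
    Returns the fused label (0 or 1).
    """
    labels = np.asarray(labels, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # The voxel is labelled 1 when the weighted vote for label 1
    # reaches at least half of the total weight.
    return int(weights @ labels >= 0.5 * weights.sum())

# Plain MV: equal weights, 2 of 3 atlases vote 1, so the fused label is 1.
print(weighted_majority_vote([1, 1, 0], [1.0, 1.0, 1.0]))  # prints 1
```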
The difference between NPLF [22] and SPLF methods [28] is that the weights w 1 , w 2 , … , w n of labels l 1 , l 2 , … , l n are calculated in different ways.
The weights w_1, w_2, …, w_n calculated by the NPLF method [22] can be obtained by Equation (2):

w_i = exp(−‖p_t − p_i‖_2^2 / h^2), i = 1, …, n, (2)

where ‖·‖_2^2 is the squared L2-norm, normalised by the number of elements, and h denotes the bandwidth of the Gaussian kernel [22].
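The Gaussian similarity weighting above can be sketched as follows for patches flattened into vectors (the function name is ours; the normalised squared distance and bandwidth follow the description in the text):

```python
import numpy as np

def nplf_weights(p_t, patches, h):
    """Gaussian similarity weights between the target patch p_t and
    candidate atlas patches (one patch per row), as in non-local means.

    The squared L2 distance is normalised by the number of patch
    elements, and h is the bandwidth of the Gaussian kernel.
    """
    p_t = np.asarray(p_t, dtype=float)
    patches = np.asarray(patches, dtype=float)
    d2 = ((patches - p_t) ** 2).mean(axis=1)  # normalised squared distance
    return np.exp(-d2 / (h ** 2))

p_t = np.array([1.0, 2.0, 3.0])
patches = np.array([[1.0, 2.0, 3.0],   # identical patch -> weight 1
                    [4.0, 5.0, 6.0]])  # distant patch -> smaller weight
w = nplf_weights(p_t, patches, h=2.0)
```

An identical patch receives the maximum weight 1, and weights decay smoothly with distance, so dissimilar patches contribute little to the vote.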
The weights w_1, w_2, …, w_n calculated by the SPLF method [28] can be obtained by Equation (3):

min_W (1/2) ‖p_t − PW‖_2^2 + λ ‖W‖_1, W ≥ 0, (3)

where W denotes the sparse coefficients (a column vector) and P = [p_1, p_2, …, p_n] is the dictionary matrix. The regularisation parameter λ balances the relative contributions of the two terms and also controls the 'sparsity' of the linear model [28]. The sparse coefficients are used as the weights of the labels l_1, l_2, …, l_n, so w_i = W_i. After the weights w_1, w_2, …, w_n are obtained by the NPLF [22] or SPLF method [28], the label L_T(x, y, z) of the voxel located at position (x, y, z) of the target image T can be obtained using Equation (4):

L_T(x, y, z) = 1 if Σ_i w_i l_i ≥ (1/2) Σ_i w_i, and 0 otherwise. (4)

As Equation (4) shows, every label has only one weight in the label fusion process; the weight is derived from the similarity or relevance between the atlas image patch and the target image patch, and the value of each label is either 0 or 1 (we aim at single-target segmentation). Since the labels of the atlas are obtained by manual delineation, there will inevitably be some errors, especially on the boundary of the tissue, so the reliability of the atlas label itself should be questioned. Moreover, due to the errors introduced by interpolation in the registration process, there will be further errors in the labels on the tissue boundary of the atlas. Hence, the atlas labels on the tissue boundary are not very reliable. To address the reliability of atlas labels, especially on tissue boundaries, we propose a double-weighted label fusion method, in which a new type of weight is assigned to the labels on the brain tissue boundary of the atlas according to the context information of the atlas itself. These weights are mainly used to measure the reliability of the atlas labels.

Double-weighted patch-based label fusion method
For the double-weighted patch-based label fusion method, we assign a new weight to each label of the atlas, which means that each label will have two weights in label fusion process. One weight comes from the similarity or relevance between the image patch of the atlas and the target image patch, and the other weight comes from the atlas itself. Next, we will introduce how to get the weight related to the reliability of the label itself, and how to combine it with MV label fusion [6], NPLF [22], and SPLF methods [28].
According to the context information of the atlas itself, each atlas label is assigned a weight that is mainly used to measure its reliability. In the atlas, the boundaries between the different brain tissues and the background are perfectly sharp: there is no smooth transition between them, which inevitably loses some useful information, especially brain tissue information. This paper focusses on the background labels of the atlas, hoping to establish some smooth information between the background and the target.
In the label image L_s of atlas s, if the value of the label located at position (x, y, z) equals 0 (L_s(x, y, z) = 0), we add a weight to it to indicate its probability of being 0; if the value equals 1 (L_s(x, y, z) = 1), we add a weight to it to indicate its probability of being 1. We use A_s(x, y, z) to denote this probability. After this treatment, each atlas consists of three parts: the atlas image I_s, the label image L_s, and the label probability image A_s.
For each label L_s(x, y, z) of atlas s, we extract an image patch centred on this label from the atlas label image L_s. We use p_s(x, y, z) to denote this patch and r_q × r_q × r_q to denote its size. The value of A_s(x, y, z) can be calculated as in Equation (5):

A_s(x, y, z) = sum(p_s(x, y, z)) / r_q^3 if L_s(x, y, z) = 1, and 1 − sum(p_s(x, y, z)) / r_q^3 if L_s(x, y, z) = 0, (5)

where sum(p_s(x, y, z)) is the number of labels with value 1 in the image patch p_s(x, y, z). An example of calculating A_s(x, y, z) is shown in Figure 2. From Figure 2, we can see that each atlas label obtains a reliability after a simple calculation, and that this reliability is computed from the information of the atlas itself. After this operation, each atlas obtains a label probability image.
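The reliability weight can be computed with a few lines of numpy. The sketch below (function name ours) uses a 2-D patch for brevity, whereas the paper uses 3-D patches, and assumes the reliability is the fraction of patch voxels that agree with the centre label, which matches the description of Equation (5):

```python
import numpy as np

def label_probability(label_patch, centre_label):
    """Reliability weight for the atlas label at the patch centre.

    label_patch  : binary label patch extracted around the label
    centre_label : L_s(x, y, z), the label value at the patch centre
    The fraction of 1-labels in the patch estimates the probability of
    label 1; labels deep inside (or far outside) the tissue get weights
    close to 1, while labels on the boundary get weights near 0.5.
    """
    frac_ones = np.mean(np.asarray(label_patch, dtype=float))
    return frac_ones if centre_label == 1 else 1.0 - frac_ones

# A 3x3 patch straddling a tissue boundary: the centre label 1 is
# supported by only 6 of the 9 voxels, so its reliability is 6/9.
patch = np.array([[0, 1, 1],
                  [0, 1, 1],
                  [0, 1, 1]])
a = label_probability(patch, centre_label=1)
```

Labels far from the boundary (patches that are all 0 or all 1) get reliability 1, so the second weight only attenuates the boundary labels.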
Finally, the label L_T(x, y, z) of the voxel located at position (x, y, z) of the target image T can be obtained by method DOM, method DON, and method DOS, respectively, as shown in Equations (6) and (7). Equation (6) is the proposed DOM method:

L_T(x, y, z) = 1 if Σ_s w_s(x, y, z) A_s(x, y, z) L_s(x, y, z) ≥ (1/2) Σ_s w_s(x, y, z) A_s(x, y, z), and 0 otherwise, (6)

and Equation (7) covers the proposed DON and DOS methods:

L_T(x, y, z) = 1 if Σ_i w_i a_i l_i ≥ (1/2) Σ_i w_i a_i, and 0 otherwise, (7)

where the symbols a_1, a_2, …, a_n denote the label probabilities of the labels centred in the atlas patches p_1, p_2, …, p_n, obtained from the label probability images A_1, A_2, …, A_N. The only difference between the DON and DOS methods in Equation (7) is how the weight w_i is calculated.
From Equations (6) and (7), we can clearly see that each atlas label has two weights in the multi-atlas label fusion process. In Equation (6), we combine the new label weight with the MV label fusion method: w_s(x, y, z) is the first weight of the label located at position (x, y, z), A_s(x, y, z) is its second weight, L_s(x, y, z) is the value of the label, and s is the sequence number of the atlas. The value of A_s(x, y, z) is calculated by Equation (5), and the value of w_s(x, y, z) is the same as in Equation (1). In Equation (7), we also assign a new weight to every label participating in the label fusion process. The weights w_1, w_2, …, w_n in Equation (7) can be obtained by the NPLF method using Equation (2) or by the SPLF method using Equation (3). The symbols a_1, a_2, …, a_n are the second weights: they denote the label probabilities of the labels centred in the atlas patches p_1, p_2, …, p_n, obtained from the label probability images A_1, A_2, …, A_N. If the weights w_1, w_2, …, w_n in Equation (7) are obtained by the NPLF method using Equation (2), we call it the DON method; if they are obtained by the SPLF method using Equation (3), we call it the DOS method.
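The double weighting amounts to fusing with the product of the similarity weight and the reliability weight. A minimal sketch (function name ours, assuming the product-of-weights reading described above):

```python
import numpy as np

def double_weighted_fusion(labels, w, a):
    """Fuse candidate labels with two weights per label: the patch
    similarity weight w_i and the label reliability a_i.

    labels : candidate atlas labels l_1..l_n (0 or 1)
    w      : patch-similarity weights (e.g. from NPLF or SPLF coding)
    a      : label probabilities from the atlas label images
    """
    labels = np.asarray(labels, dtype=float)
    combined = np.asarray(w, dtype=float) * np.asarray(a, dtype=float)
    # Weighted vote with the product of the two weights.
    return int(combined @ labels >= 0.5 * combined.sum())

# Two candidates vote 1 but with low reliability (boundary labels), while
# one candidate votes 0 with high similarity and high reliability: the
# unreliable boundary votes are down-weighted, so the fused label is 0.
print(double_weighted_fusion([1, 1, 0], w=[0.6, 0.5, 0.9], a=[0.5, 0.5, 1.0]))
```

With single weights alone (a = 1 everywhere), the same example would be decided by the similarity weights only, which is exactly the behaviour the second weight is meant to correct at tissue boundaries.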

Datasets and metrics
To assess the performance of the proposed algorithms, we carry out a set of experiments on real datasets: the IBSR and IXI datasets. The IXI dataset (http://brain-development.org/ixi-dataset/), provided by Imperial College London [38,39], consists of 30 skull-stripped images with 83 manually labelled structures. The 30 volunteers include 15 males and 15 females, with ages ranging from 20 to 54.
To evaluate the performance of our proposed label fusion method, it is compared to several existing state-of-the-art label fusion methods using publicly available neuroimaging datasets.
Specifically, the MV label fusion method [6], NPLF [22], the sparse patch-based labelling method (SPLF) [28], similarity and truth estimation for propagated segmentations (STEPS) [41], and the HMAS method [30] are tested. The symbols DOM, DON, and DOS represent the three implementations of our proposed label fusion method. To measure segmentation performance, the Dice ratio is used, which measures the degree of overlap between two regions R_1 and R_2 as follows [22]:

Dice(R_1, R_2) = 2 |R_1 ∩ R_2| / (|R_1| + |R_2|),

where |·| denotes the volume of the region of one brain tissue, and R_1 and R_2 represent the same region of one brain tissue in two different images. In all the experiments, one atlas image is used as the image to be segmented, and the others in the dataset serve as atlases. We are mainly interested in the segmentation of subcortical structures in brain MR images and pick seven subcortical structures for the comparison: hippocampus (Hip), amygdala (Amy), caudate nucleus (Cau), putamen (Put), thalamus (Tha), nucleus accumbens (Acc), and pallidum (Pal). The experimental part consists of three parts: a comprehensive evaluation of the proposed label fusion method on the IXI dataset, an evaluation on the IBSR dataset, and a discussion of the influence of the parameters on the results.
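For binary segmentation masks, the Dice ratio can be computed directly from voxel counts; a minimal sketch (function name ours):

```python
import numpy as np

def dice_ratio(r1, r2):
    """Dice overlap between two binary segmentation masks:
    Dice = 2 |R1 ∩ R2| / (|R1| + |R2|)."""
    r1 = np.asarray(r1).astype(bool)
    r2 = np.asarray(r2).astype(bool)
    inter = np.logical_and(r1, r2).sum()
    return 2.0 * inter / (r1.sum() + r2.sum())

a = np.array([1, 1, 1, 0])
b = np.array([0, 1, 1, 1])
print(dice_ratio(a, b))  # 2*2 / (3+3) ≈ 0.667
```

A Dice ratio of 1 means perfect overlap between the automatic segmentation and the manual reference; 0 means no overlap.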

FIGURE 3
The segmentation results of hippocampus of majority voting (MV) and DOM methods

Experimental results on IXI dataset
A leave-one-out strategy is used to compare the label fusion performance of the MV, NPLF, SPLF, STEPS, HMAS, and our proposed label fusion methods. The average Dice ratios of the brain tissue segmentations on the IXI dataset are shown in Table 1; every result in the table is the average Dice ratio over 30 segmentation results of one brain subcortical structure. By comparing DOM versus MV, DON versus NPLF, and DOS versus SPLF, we can see clearly that the segmentation accuracy is improved by using double weights in the label fusion process. A comparison of the methods DOM, DON, and DOS shows that DOS obtains the best segmentation results, which indicates that the segmentation accuracy is affected by both weights in our method at the same time.
This indicates that if the reliability of all labels is improved, more accurate segmentation results will be obtained. The segmentation results of method DOS are better than those of the STEPS and HMAS methods, except for the hippocampus, amygdala, and accumbens, which are slightly worse. This again shows the good performance of our proposed double-weighted patch-based label fusion.

Experimental results on IBSR dataset
In order to further demonstrate the performance of our proposed method, we compared the results of the DOM method with the MV method, the DON method with the NPLF method, and the DOS method with the SPLF method on the IBSR dataset. The results of these methods are shown in Table 2; each result in the table is the average over 18 segmentation results of one subcortical structure. It can be seen from Table 2 that the segmentation results of DOM are better than those of MV, those of DON are better than those of NPLF, and those of DOS are better than those of SPLF. Together with the experimental results in Table 1, this shows again that the segmentation performance of our proposed double-weighted label fusion method is better than that of the single-weighted label fusion methods. In recent years, deep learning methods have demonstrated good performance in brain MR image segmentation. Here, our proposed method DOS is compared to deep learning methods, including MS-CNN [41], BrainSegNet [42], and M-net [43], and to the automatic segmentation tools FIRST [44] and FreeSurfer [45]. The results are shown in Table 3. The Dice ratio of our method is higher than that of FIRST and FreeSurfer for each structure, and the segmentation results of our method are better than those of the deep learning methods, except for the putamen. Figure 3 presents the segmentation results of a slice of the hippocampus from the IBSR dataset for the MV and DOM methods. It can be seen clearly from the circle marked in the figure that, at the boundary of the hippocampus, the double-weighted label fusion method achieves better segmentation results than the single-weighted label fusion method.

Influence of parameters
The main parameters of our method are the patch sizes r_p and r_q, the size of the search neighbourhood r_s, and the parameter h in Equation (2). The parameter h controls the Gaussian kernel in the NPLF method, and we set it to the same value as in [22]. In this part, we mainly discuss the influence of the new parameter r_q introduced by our method on the segmentation results. Figure 9 shows the influence of the three parameters r_p, r_q, and r_s on segmentation performance. Taking Figure 9(a) as an example, DON (5-7) means r_p = 5 × 5 × 5 and r_s = 7 × 7 × 7, and the legend 3 × 3 means r_q = 3 × 3 × 3. Figure 9(b-d) has the same meaning as Figure 9(a). From Figure 9, we can see that the parameter r_q still has a certain influence on the final results. As r_q increases, the segmentation accuracy decreases, but the influence is not significant.

FIGURE 7
The segmentation results of hippocampus of non-local patch-based label fusion (NPLF) and DON methods

FIGURE 8
The segmentation results of hippocampus of sparse patch-based label fusion (SPLF) and DOS methods

FIGURE 9
The average Dice ratio of different parameters in each method

CONCLUSION
This paper has mainly studied the calculation of label weights. We introduce a double-weighted label fusion method to improve segmentation performance. Compared with the original single-weighted label fusion methods, our method introduces a new label weight into the weighted label fusion process. Because of the complexity of brain structure and the errors introduced in the image registration and atlas drawing processes, the atlas labels at the boundary of brain tissue are not very reliable. The new weight is mainly used to measure the reliability of atlas labels, especially at the boundary of brain tissue, and it is calculated from the information of the atlas itself. The proposed method therefore not only improves the reliability of the label weights but also makes full use of the atlas information. Experiments on brain tissue segmentation in different MR images have shown that our method improves the segmentation accuracy of brain tissue after introducing double weights in the weighted label fusion process.
To date, there are two shortcomings in our method. First, the calculation of the second label weight is rather simple, and the atlas information is not fully utilised in the weight calculation. Second, in the multi-atlas label fusion process, the label of each pixel of the image to be segmented is determined independently, without any neighbourhood feedback or prior shape information. In future work, we will try to improve the segmentation accuracy of brain tissues in two ways: (i) making fuller use of the atlas information in the label weight calculation, to improve the reliability of the label weights and obtain more accurate segmentation results; and (ii) integrating the label weight information with high-level shape information.