Automated segmentation of liver and hepatic vessels on portal venous phase computed tomography images using a deep learning algorithm

Abstract Background CT image segmentation of the liver and hepatic vessels can facilitate liver surgical planning. However, the time-consuming process and inter-observer variation of manual segmentation have limited its wider application in clinical practice. Purpose This study aimed to propose an automated deep learning (DL) segmentation algorithm for the liver and hepatic vessels on portal venous phase CT images. Methods This retrospective study developed a coarse-to-fine DL-based algorithm that was trained, validated, and tested using 413, 52, and 50 private portal venous phase CT images, respectively. Additionally, the performance of the DL algorithm was extensively evaluated and compared with manual segmentation using an independent clinical dataset of preoperative contrast-enhanced CT images from 44 patients with hepatic focal lesions. The accuracy of DL-based segmentation was quantitatively evaluated using the Dice Similarity Coefficient (DSC) and complementary metrics [Normalized Surface Dice (NSD) and Hausdorff distance_95 (HD95) for liver segmentation, Recall and Precision for hepatic vessel segmentation]. The processing times of DL and manual segmentation were also compared. Results The DL algorithm achieved accurate liver segmentation with a DSC of 0.98, NSD of 0.92, and HD95 of 1.52 mm. DL-based segmentation of the hepatic veins, portal veins, and inferior vena cava attained DSCs of 0.86, 0.89, and 0.94, respectively. Compared with the manual approach, the DL algorithm achieved significantly higher accuracy for both liver and hepatic vessel segmentation (all p < 0.001) in the 44 independent clinical cases. In addition, the DL method significantly reduced the processing time of clinical postprocessing (p < 0.001). Conclusions The proposed DL algorithm potentially enables accurate and rapid segmentation of the liver and hepatic vessels using portal venous phase contrast-enhanced CT images.


INTRODUCTION
Accurate segmentation from medical images is a fundamental prerequisite for surgical planning. 1 Multiphase computed tomography (CT) remains the preferred imaging modality for hepatic lesions. 2,3 Due to the intricate hepatic anatomy, even experienced surgeons can overlook some critical structures, potentially affecting surgical decision-making. 4-10 Conventional image segmentation is manually performed by radiologists, which is time-consuming and subject to substantial inter-observer variation. 11 Since manual approaches cannot satisfy surgical planning within a short time, 12 automated segmentation has attracted increasing interest. [17][18][19] These algorithms automatically segment medical images in an end-to-end manner and progressively improve parameters to optimize the final segmentation, generally achieving better performance. Several CNN frameworks have been proposed for medical image segmentation, including fully convolutional networks (FCNs), 20 DeepLab, 21 dense convolutional networks (DenseNets), 22 residual networks (ResNets), 23 generative adversarial networks (GANs), 24 and U-shaped networks (U-Nets). 25 Among these, the U-Net architecture has gained wide popularity owing to its promising results. For example, the Attention-Based Residual U-Net proposed by Wang et al. 26 achieved automatic and accurate liver segmentation on the LiTS17 and SLiver07 datasets. Kitrungrotsakul et al. 27 also introduced a deep CNN for accurate hepatic vessel segmentation. Because imaging representations are complicated and changeable, the accuracy of DL algorithms relies upon the quantity and variety of training data; 28 however, these networks were developed on limited training samples from public datasets and evaluated using reference datasets with incomplete annotations. In particular, the annotation quality of datasets strongly affects the accuracy of DL algorithms, and treating incomplete annotations as the benchmark for training and evaluation may cause segmentation bias.
Hence, our study aimed to propose a DL algorithm for accurate and rapid liver and hepatic vessel segmentation based on sufficient CT data with high-quality annotations. Furthermore, the DL algorithm was clinically evaluated using preoperative CT data from patients with focal lesions.

METHODS AND MATERIALS
This study was approved by the institutional research ethical committee, and written informed consent was waived due to the retrospective nature of this study. We present our article in accordance with the STROBE reporting checklist.

2.1.1 The development and test datasets
The included CT scans were strictly selected according to the inclusion and exclusion criteria. Inclusion criteria: (1) DICOM format data with integrity; (2) slice thickness ≤2 mm and matrix size ≥512 × 512; (3) multiphase or portal venous phase-only contrast-enhanced CT images covering the entire liver. Exclusion criteria: (1) incomplete CT series; (2) inadequate scanning phases or ranges. Multiphase contrast-enhanced CT scans were acquired with a GE Discovery 16-slice CT scanner (GE Healthcare, Boston, USA). The portal venous phase images were stored in DICOM format with a slice thickness of 1.25 mm. All data were completely anonymized, and no patient-specific information was extracted or could be retrieved.
In the development dataset, 413, 52, and 50 CT scans were used for the training, validation, and testing of the DL algorithm, respectively. Additionally, its robustness and generalization were independently evaluated on 44 CT scans in the test dataset.

The annotation protocol and ground truth
To ensure the consistency of segmentation results, the radiologists involved in this project went through a training procedure and annotated several cases together before formal annotation. To obtain a reliable ground truth, each CT series was manually annotated by one of three experienced radiologists (Dr. S. L., Dr. J. P., and Dr. B. L., with 8, 10, and 12 years of experience in abdominal radiology, respectively) under the supervision of one expert radiologist (Dr. Z. B., with 15 years of experience in abdominal radiology). Firstly, all the images were randomly divided into three groups and evenly distributed among the radiologists. The preliminary masks marking the liver and hepatic vessels were generated and prelabeled using ITK-SNAP 3.6 software. To reconfirm the accuracy, the preliminary masks were independently double-checked and refined by one expert radiologist (Prof. X. L., with more than 20 years of experience in abdominal radiology). Any inaccuracies were corrected by the expert radiologist, after which the final segmentation masks served as the ground-truth reference. This annotation protocol was applied to all manual ground-truth generation in our study.

Data pre-processing and augmentation
Data pre-processing plays a crucial role in medical image segmentation tasks. The pre-processing of the CT images mainly included the following steps. Firstly, the images were reoriented to the LPI (Left-Posterior-Inferior) direction in the patient coordinate system. Then, the CT images were resampled to a fixed size, with a coarse input of 512 × 512 and a fine output of 256 × 256, to reduce the computational load. The Hounsfield values were windowed to the range from −200 to 350 to enhance the contrast and remove irrelevant tissues from the CT images. In addition, z-score normalization was applied based on the mean and standard deviation of the intensity values.
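The intensity steps of this pre-processing can be illustrated with a minimal sketch. It assumes the volume is already a NumPy array in Hounsfield units and that the z-score statistics are computed per volume after windowing; the text does not state where the mean and standard deviation come from, so this choice and the function name `preprocess_ct` are illustrative assumptions. Reorientation and resampling are omitted.

```python
import numpy as np

# Intensity window stated in the text (Hounsfield units).
HU_MIN, HU_MAX = -200.0, 350.0


def preprocess_ct(volume_hu: np.ndarray) -> np.ndarray:
    """Clip a CT volume to the HU window, then z-score normalize it.

    Assumption: mean and standard deviation are taken over the
    windowed volume itself (per-volume normalization).
    """
    clipped = np.clip(volume_hu.astype(np.float32), HU_MIN, HU_MAX)
    mean = clipped.mean()
    std = clipped.std()
    if std == 0:
        # A constant volume carries no contrast; return zeros
        # instead of dividing by zero.
        return np.zeros_like(clipped)
    return (clipped - mean) / std
```

After this step every voxel outside the window maps to the same value as the nearest window edge, so air and dense bone no longer dominate the intensity range seen by the network.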
To alleviate overfitting, data augmentation methods were used during training. Specifically, we augmented the training data by horizontal flipping, rotation with a maximum angle of 25°, shifting by horizontally and vertically translating the images (a maximum of 20 pixels), random occlusion by erasing a rectangular image region with a random value, and adding Gaussian noise to the data. The rotation and shift were used to increase the adaptability to different body positions in surgical scanning. The random occlusion aimed to reduce sensitivity to metal implant artifacts, and Gaussian noise was added to improve the robustness.
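A minimal sketch of several of these augmentations on a single 2D slice is given below. Only the 20-pixel shift limit comes from the text; the flip probability, erase fraction, and noise level are hypothetical parameters, and the function names are illustrative. Rotation is left out because it needs an interpolating resampler (for example `scipy.ndimage.rotate`), which would add a dependency to the sketch.

```python
import numpy as np

MAX_SHIFT_PX = 20  # maximum translation stated in the text


def shift_2d(img: np.ndarray, dy: int, dx: int, fill: float = 0.0) -> np.ndarray:
    """Translate img by (dy, dx) pixels, filling uncovered pixels with fill."""
    out = np.full_like(img, fill)
    h, w = img.shape
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out


def augment_slice(img, rng, noise_sigma=0.05, erase_frac=0.2):
    """Apply flip, shift, random occlusion, and Gaussian noise to one slice.

    noise_sigma and erase_frac are hypothetical values, not from the paper.
    """
    out = img.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip
    dy, dx = rng.integers(-MAX_SHIFT_PX, MAX_SHIFT_PX + 1, size=2)
    out = shift_2d(out, int(dy), int(dx))
    # Random occlusion: overwrite one rectangle with a single random value.
    h, w = out.shape
    eh, ew = max(1, int(h * erase_frac)), max(1, int(w * erase_frac))
    y0 = int(rng.integers(0, h - eh + 1))
    x0 = int(rng.integers(0, w - ew + 1))
    out[y0:y0 + eh, x0:x0 + ew] = rng.uniform(out.min(), out.max())
    # Additive Gaussian noise.
    noise = rng.normal(0.0, noise_sigma, size=out.shape).astype(np.float32)
    return out + noise
```

In a segmentation setting the geometric transforms (flip and shift) would also have to be applied to the label mask with the same parameters, while occlusion and noise would be applied to the image only.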

The proposed DL algorithm development
The DL algorithm was trained, validated, and tested on 413, 52, and 50 CT scans from the development dataset, respectively.
Our DL algorithm uses a whole-volume-based coarse-to-fine framework 29 mainly composed of coarse and fine segmentation stages (Figure 1). Briefly, the coarse segmentation process preliminarily extracted general features, such as position and contours, from the whole-volume CT images. The results were then refined in detail during the fine process. Theoretically, this coarse-to-fine segmentation pattern is highly efficient and adaptable to the large anatomical variations in hepatic regions.
The proposed U-Net deep learning algorithm consists of three major parts: the feature encoder module, the context extractor module, and the feature decoder module (Figure 2). The encoder module is composed of a convolutional multi-layer perceptron (ConvMLP) block, 30 and the decoder module of one residual convolution block. The details of the DL algorithm are described in Supplementary Materials 1.

Validation in preoperative clinical data
The performance of DL-based segmentation was extensively evaluated and compared with manual segmentation using 44 preoperative CT scans with focal lesions from real clinical scenarios. The DL-based automated segmentations were produced by the DL algorithm, while the manual segmentations were performed by the radiologists (Dr. S. L., Dr. J. P., and Dr. B. L.) in clinical practice using the Volume Viewer in Medical Image Processing Software (GE Medical Systems SCS, GE Healthcare, Boston, USA), which is widely used for medical image postprocessing. All 44 CT series were annotated in accordance with the standard annotation protocol (Section 2.1.2) to generate the ground truths, and all manual segmentations were independently performed blinded to the DL-derived results. Finally, the evaluation metrics and image processing time were quantitatively compared. The manual processing time per case was recorded from the first ROI setting until the final revision, while the DL processing time per case was measured from the initial input of images to the final output. Additionally, our proposed DL method was compared to the CNN 16 and Unet 26,27 networks based on the 44 CT series in the test dataset.

Inter-observer and inter-method agreement
To evaluate inter-observer variability among the radiologists in manual segmentation, manual segmentation was performed by three radiologists, Reader 1 (Dr. S. L.), Reader 2 (Dr. J. P.), and Reader 3 (Dr. B. L.), on a randomly selected subgroup of 40 CT series each. All manual segmentations were performed blinded to the results from the other radiologists. The inter-observer agreement of the DSC values was estimated using intraclass correlation coefficients (ICCs).
To assess the differences between the two methods, the DL-derived DSC values were compared to the averaged DSC of the manual segmentations from the three radiologists (Readers 1, 2, and 3). Another subgroup of 44 CT series was randomly selected for the inter-method analysis using Bland-Altman plots. All manual segmentations were performed blinded to the DL-derived results.

Evaluation metrics
The accuracy of DL-based segmentation was quantitatively evaluated using the Dice Similarity Coefficient (DSC), Normalized Surface Dice (NSD), and Hausdorff distance_95 (HD95). Unlike the liver parenchyma, the hepatic vessels are generally tube-structured branches with a highly skewed proportion of vessel to background voxels. Recall and Precision are more frequently adopted in vessel segmentation assessment because they exclude true-negative cases (pixels belonging to the background according to the GS) from comparisons. 31 Besides DSC, Recall and Precision were used to evaluate the segmentation performance for the hepatic veins, portal veins, and inferior vena cava in our study. The definitions of the evaluation metrics are given in Supplementary Materials 2.
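The overlap-based metrics can be sketched as follows, assuming the standard voxel-count definitions (the paper's own definitions are in its supplementary material and are not reproduced here). The function name `overlap_metrics` and the convention of returning 1.0 when a denominator is empty are illustrative assumptions. NSD and HD95 are surface-distance metrics and are not covered by this sketch.

```python
import numpy as np


def overlap_metrics(pred, gt) -> dict:
    """Compute DSC, Recall, and Precision for two binary masks.

    True negatives never enter any of the three formulas, which is why
    these metrics stay informative when background voxels dominate.
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = int(np.logical_and(pred, gt).sum())   # vessel predicted and present
    fp = int(np.logical_and(pred, ~gt).sum())  # over-segmentation
    fn = int(np.logical_and(~pred, gt).sum())  # under-segmentation
    # Assumed convention: an empty denominator counts as perfect agreement.
    dsc = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return {"dsc": dsc, "recall": recall, "precision": precision}
```

For example, a prediction with 2 true-positive, 1 false-positive, and 2 false-negative voxels gives DSC = 4/7, Recall = 1/2, and Precision = 2/3, regardless of how many background voxels surround the vessel.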

Statistical analysis
Data are presented as mean ± standard deviation. The inter-reader and inter-method agreements were assessed using the intraclass correlation coefficients (ICCs) with corresponding 95% confidence intervals (CI) and the Bland-Altman 95% limits of agreement (LOA), respectively. DSC was compared using the paired-samples t-test, while NSD, HD95, Recall, Precision, and processing time were compared using the paired-samples Wilcoxon signed-rank test. A p value < 0.001 was considered to indicate a statistically significant difference. Statistical analysis was performed using SPSS version 21.0 (IBM Corp., Armonk, NY, USA).
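The Bland-Altman limits of agreement reduce to a short calculation: the mean of the paired differences plus or minus 1.96 times their standard deviation. The sketch below assumes the sample (n − 1) standard deviation is used, which the text does not specify; the function name is illustrative, and the study itself used SPSS rather than this code.

```python
from statistics import mean, stdev


def bland_altman_loa(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias  = mean of the paired differences (a - b)
    limits = bias -/+ 1.96 * SD of the differences
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # assumption: sample SD (n - 1)
    return bias, bias - half_width, bias + half_width
```

Here `method_a` would hold the per-case DSC of the DL segmentation and `method_b` the per-case DSC averaged over the three readers; a bias near zero with narrow limits indicates close agreement between the two methods.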

Segmentation performance of the proposed network
After the initial training procedure using 413 scans, the accuracy of DL-based liver and hepatic vessel segmentation was evaluated; the results are shown in Table 2. Compared with the ground truth, representative DL-based segmentation results are illustrated in Figure 3.
Moreover, the integrated illustration of the DL-based segmentations for the liver and hepatic vessel is shown in Figure 3.

Validation in preoperative clinical data
In the independent test dataset of 44 preoperative CT scans with focal lesions, our DL algorithm statistically outperformed the manual method on liver segmentation (DSC 0.98 ± 0.01 vs. 0.97 ± 0.01, NSD 0.89 ± 0.05 vs. 0.79 ± 0.05, HD95 1.88 ± 0.58 vs. 3.07 ± 2.03 for DL vs. manual method, respectively, all p < 0.001) (Figure 4 and Table 4). The comparisons between the DL and manual method on liver segmentation are demonstrated in Figure 6.
Compared to other networks, our proposed method showed superior performance in liver and hepatic vessel segmentation (Table 6).

Inter-observer and inter-method agreement
We found excellent ICCs above 0.992 for the inter-reader assessment of the manual segmentations for liver and hepatic vessels (Table 7). In addition, there was an excellent agreement between the DL-derived and manual-derived results averaged over the three readers (Figure 7). The Bland-Altman 95% LOAs between
FIGURE 6 Comparisons between DL and manual method on the representative segmentation and 3D reconstruction results. Comparisons involve the segmentation performance of (I) Liver, (II) Hepatic vessels (hepatic vein, portal vein, and inferior vena cava), (III) Integrated liver and hepatic vessels. In each imaging group, each row illustrates the segmentation results and 3D reconstruction based on the segmentation of the ground truth (first row), DL algorithm (second row), and manual approach (third row). DL, deep learning; HV, hepatic vein; IVC, inferior vena cava; PV, portal vein.

DISCUSSION
In our study, we developed and validated a DL algorithm for automated liver and hepatic vessel segmentation using a large amount of annotated clinical CT data. Based on the original performance, our developed DL algorithm initially reached acceptable accuracy for both liver and hepatic vessel segmentation. In the independent comparison with the manual approach using preoperative images from clinical scenarios, DL-based segmentation quantitatively outperformed the manual method with higher accuracy and processing efficiency, clinically indicating the robustness and generalizability of our DL algorithm. Compared to other CNN and Unet networks, our proposed method also showed superior performance in liver and hepatic vessel segmentation. The inter-reader agreement assessments showed excellent results with regard to the ICC values. The DSC values obtained by DL-based segmentation showed close agreement with those derived manually by the radiologists, with a small bias and measurement error.
Recently, several automated methods have been proposed for liver and hepatic vessel segmentation. A multi-scale UNet proposed by Kushnure et al. 32 using 20 CT scans from the publicly available 3Dircadb dataset achieved a DSC of 0.971 for liver segmentation and reduced the computational complexity. Wang et al. 33 developed a multi-scale attention and deep supervision-based 3D UNet on three public datasets (including 131, 20 and 20 CT scans from LiTS17, SLiver07, and 3DIRCADb, respectively) with high accuracy (Dice of 0.9727, 0.9752, and 0.9691, respectively) for liver segmentation. Instead of using limited training data from public datasets with incomplete annotations, our DL algorithm was developed and evaluated using sufficient data with double-refined annotations from our institution, generating results closer to clinical practice. Moreover, data augmentation was conducted to enrich the imaging appearances of the liver and hepatic vessels, improving the robustness of training and avoiding overfitting. In addition to the high DSC attained by our DL algorithm, NSD and HD95 were included as additional metrics to comprehensively evaluate the DL-based liver segmentation for surgical planning purposes. Since DSC sometimes cannot reflect the boundary errors of segmentation, NSD, as a metric more sensitive to boundary errors, quantitatively assesses the errors occurring between the boundaries of the segmentation and the ground truth. 34 Especially in surgical planning such as needle trajectory planning, boundary errors are of vital importance and should be minimized. 35,36 The NSD and HD95 results complementarily indicated fewer boundary errors in the surface annotations and partly met the clinical demands for surgical planning.
Note: The inter-reader and inter-method agreements were assessed using the intraclass correlation coefficients (ICCs) with corresponding 95% confidence intervals (CI) and the Bland-Altman 95% limits of agreement (LOA), respectively. The Bland-Altman 95% LOA results are presented as the mean difference ± 1.96 × standard deviation (SD) of the difference and the limits of agreement are shown in parentheses. Abbreviations: CI, confidence intervals; DSC, dice similarity coefficient; ICC, intraclass correlation coefficient; LOA, limits of agreement; SD, standard deviation.
As for hepatic vascular segmentation, Kitrungrotsakul et al. 27 applied deep convolutional networks with multiple pathways to liver vessel segmentation and achieved an average DSC of 0.901 on the VASCUSYNTH simulation dataset and the highest Recall and Precision of 0.89 and 0.87 on the IRCAD dataset comprising 20 scans at 1% noise level, respectively. Hao et al. 37 proposed a dual-branch progressive 3D Unet for accurate segmentation of liver vessels and reached an average DSC and sensitivity of 75.18% and 78.84%, respectively, using the public dataset 3Dircadb. However, these algorithms were trained and validated on limited training samples using incomplete annotations as the benchmark for training and evaluation, potentially causing substantial bias. Unlike the liver parenchyma, the proportions of hepatic vessel and background voxels are highly unbalanced and skewed. Due to the low contrast with surrounding tissues, high noise, and irregular vessel shapes caused by nearby tumors, accurate liver vessel segmentation remains challenging. 38 To improve the segmentation accuracy with imbalanced classes, the Tversky loss function was used to adjust the weighting of over- or under-segmented foreground voxels based on the Dice loss function and to increase the penalty for misclassified voxels, training and optimizing the network to identify vessels with weak boundaries, high noise, or low contrast. Considering that our manual annotations were double-checked and refined by experienced radiologists, inaccurate annotations were limited as far as possible in the datasets, leading to higher-quality training and evaluation procedures. With accuracy comparable to previous results, the proposed DL algorithm effectively extracted liver vessel features from CT images and reconstructed the 3D positional relationships of liver vessels. Inevitably, some tiny errors and discontinuities occurred, mainly in distal branches of hepatic veins, which minimally affected the surgical planning procedures.
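The Tversky loss mentioned in this discussion can be sketched in its commonly used form, in which false positives and false negatives receive separate weights and equal weights of 0.5 recover the Dice loss. The weights `alpha` and `beta` below are hypothetical; the paper does not report the values it used, and this NumPy version is only a forward computation, not the training code of the study.

```python
import numpy as np


def tversky_loss(prob, target, alpha=0.3, beta=0.7, eps=1e-7):
    """Tversky loss = 1 - TP / (TP + alpha * FP + beta * FN).

    prob   : predicted foreground probabilities in [0, 1]
    target : binary ground-truth mask
    alpha, beta : hypothetical weights; beta > alpha penalizes missed
                  vessel voxels more than spurious ones.
    """
    prob = np.asarray(prob, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    tp = float((prob * target).sum())
    fp = float((prob * (1.0 - target)).sum())
    fn = float(((1.0 - prob) * target).sum())
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - index
```

With these example weights, a prediction that misses one vessel voxel is penalized more than one that adds one spurious voxel, which is the behavior that helps with thin, low-contrast branches in a heavily imbalanced volume.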
To clinically confirm the adaptability to surgical planning scenarios, the accuracy and efficiency of DL-based segmentation were independently evaluated and compared with the manual approach using preoperative CT data from patients with intrahepatic focal lesions. In this comparison, DL-based segmentation achieved statistically higher accuracy with lower dispersion in data distribution, contributing to robust segmentation results without inter-observer variability. Since various hepatic lesions within images did not significantly reduce the accuracy, the clinical generalizability was partly verified in CT images containing focal lesions, potentially allowing for larger-scale application to patients with suspicious focal lesions. Although manual delineation and reconstruction could possibly reach higher accuracy without limits on time and patience, similar accuracy is rarely reached in a clinically reasonable period, especially in surgical planning settings. 39 In our study, the mean processing time of the DL algorithm was 173 s, nearly six times faster than the 1032 s of the manual approach, representing higher efficiency in practice. Based on a coarse-to-fine pipeline, our algorithm rapidly smoothed the boundaries and surfaces of the segmentation, leading to initial 3D spatial visualization in the coarse stage; the errors and discontinuities occurring at the vessel branches were further suppressed to ensure the integrity of vascular branches in the fine stage. To reduce computation and improve spatial connections, the ConvMLP block was used to improve the network in a more lightweight and stage-wise manner.
There are several limitations in our algorithm. Firstly, it was developed and evaluated using single-center data. Though we utilized a sufficient amount of data with high-quality annotations, the broad applicability should be further assessed based on external data from multiple institutions. Secondly, our algorithm only offers segmentation of the liver and vessels, but not of hepatic lesions or liver segments. DL-based segmentation of hepatic focal lesions and functional segments has already been planned in our further research. Thirdly, only portal venous phase CT images were used in the algorithm training, potentially limiting the applications. Larger-scale applications to other imaging modalities require further training based on additional training data through transfer learning. Fourthly, our algorithm was not systematically compared with state-of-the-art networks. Lastly, a small portion of errors and discontinuities still exists in some vessel segmentations with lower accuracy, which needs to be addressed by optimizing our model.

CONCLUSION
In conclusion, we developed and evaluated a DL algorithm for automated liver and hepatic vessel segmentation based on large amounts of CT images.
Compared with the manual approach, our DL algorithm quantitatively showed higher accuracy with less time consumption. Our algorithm potentially serves as a practical CT-based tool to clinically assist surgical planning.

AUTHOR CONTRIBUTIONS
Xiaoguang Li led and coordinated this study. Shengwei Li and Lin Cheng collected the data and built the datasets. Fanyu Zhou, Yumeng Zhou, and Shengwei Li trained and developed the deep learning algorithm. Zhixin Bie led and performed the manual segmentation work with Jingzhao Peng and Bin Li. Shengwei Li performed data interpretation and statistical analysis under the supervision of Xiaoguang Li. Shengwei Li was the major contributor to writing and revising the manuscript. All authors were involved in critical revisions of the manuscript, and have read and approved the final version.

ACKNOWLEDGMENTS
This study has been supported by the National High Level Hospital Clinical Research Funding (BJ-2022-106).

CONFLICT OF INTEREST STATEMENT
The authors have no relevant financial or non-financial interests to disclose.

DATA AVAILABILITY STATEMENT
The imaging data from Beijing Hospital cannot currently be made publicly accessible due to privacy protection. Reasonable requests for the datasets and materials used in this study can be addressed to the corresponding author.

ETHICS STATEMENT
This study was approved by the Institutional Review Board of Beijing Hospital (IRB No. 2022-BJYYEC-361-01). This study was performed in line with the principles of the Declaration of Helsinki. Written informed consent was waived due to the retrospective nature of our study.

FIGURE 1 A schematic diagram of a whole-volume-based coarse-to-fine segmentation framework.

FIGURE 2 The architecture of the coarse-to-fine deep learning segmentation U-net framework.
After training, the accuracy of DL-based liver and hepatic vessel segmentation was quantitatively evaluated using 52 (validation) and 50 (test) CT scans, respectively. The DL-based liver segmentation achieved the highest accuracy with DSC of 0.98 ± 0.01, NSD of 0.92 ± 0.04, and HD95 of 1.52 ± 0.89 mm in 50 testing cases, while DSC of 0.98 ± 0.01, NSD of 0.89 ± 0.06, and HD95 of 1.88 ± 0.81 mm in 52 validation cases. The quantitative evaluation of the DL algorithm is shown in Table 2.
TABLE 2 Quantitative evaluation of the DL algorithm for liver segmentation.
Validation (n = 52): DSC 0.98 ± 0.01, NSD 0.89 ± 0.06, HD95 1.88 ± 0.81 mm
Test (n = 50): DSC 0.98 ± 0.01, NSD 0.92 ± 0.04, HD95 1.52 ± 0.89 mm
Note: Data are represented as mean ± standard deviation. Abbreviations: HD95, Hausdorff distance_95; DSC, dice similarity coefficient; NSD, normalized surface dice.
FIGURE 3 Representative segmentation and 3D reconstruction based on the developed DL algorithm. (I) DL-based liver segmentation and 3D reconstruction, (II) DL-based segmentation and 3D reconstruction of the hepatic vessels (hepatic vein, portal vein, and inferior vena cava), (III) Integrated illustration of the DL segmentation. In each imaging group, (a) Image of the ground truth segmentation, (b) 3D reconstruction based on the ground truth segmentation, (c) Image of the DL segmentation, (d) 3D reconstruction based on the DL segmentation. DL, deep learning; HV, hepatic vein; IVC, inferior vena cava; PV, portal vein.

Note: Data are represented as mean ± standard deviation. DSC = Dice similarity coefficient.
FIGURE 4 Violin plots of the comparisons between the DL algorithm and manual method on the liver segmentation performances (DSC, NSD, and HD95). **** represents p value < 0.0001. DL, deep learning; DSC, dice similarity coefficient; NSD, normalized surface dice.
TABLE 4 Quantitative comparison of segmentation performance between the DL and manual method.

FIGURE 5 Box and scatter plots of the comparisons between the DL algorithm and manual method on the hepatic vessel segmentation performances (DSC, Recall, and Precision). **** represents p value < 0.0001. (a) Hepatic vein; (b) Portal vein; (c) Inferior vena cava. DL, deep learning; DSC, dice similarity coefficient.

FIGURE 7 Bland-Altman plots for agreement between DSC values by the DL-based and manual segmentations for liver (a), hepatic vein (b), portal vein (c), and inferior vena cava (d). Solid lines indicate mean differences and dashed lines indicate the upper and lower 95% limits of agreement. DSC, dice similarity coefficient; DL, deep learning; SD, standard deviation.
TABLE 5 Comparison of the processing time between the DL and manual method.
Note: Data are represented as mean ± standard deviation while the data in parentheses are range. M ± SE = mean ± standard error; 95%CI = 95% confidence interval. Comparison of processing time was calculated via paired samples Wilcoxon signed rank test and p value < 0.001 shows a statistically significant difference.
TABLE 1 Characteristics of the private dataset and independent test dataset.
Note: Data are expressed as the number of cases while the data in parentheses are percentages. *Data are expressed as mean ± standard deviation.
TABLE 3 Quantitative evaluation of the DL algorithm for hepatic vessel segmentation.
Note: Data are represented as mean ± standard deviation while the data in parentheses are range. Comparison of DSC was calculated via the paired samples t-test, while the NSD, HD95, Recall and Precision comparisons were computed using paired samples Wilcoxon signed rank test. p values < 0.001 show statistically significant differences.
TABLE 6