Novel deep learning radiomics model for preoperative evaluation of hepatocellular carcinoma differentiation based on computed tomography data

Dear Editor, The evaluation of tumor differentiation is an urgent clinical issue that would facilitate the establishment of individualized therapeutic strategies. [1][2][3] Our team developed a deep learning radiomics model based on computed tomography (CT) data for preoperative evaluation of hepatocellular carcinoma (HCC) differentiation (low vs. high grade) and preliminarily explored the biological basis of the radiomics model.
We included 1047 patients from the First Affiliated Hospital, College of Medicine, Zhejiang University (Institution 1) and 187 patients from the Ningbo Medical Center Lihuili Hospital (Institution 2). Data from Institution 1 were divided into training and internal validation cohorts by stratified sampling at a 3:1 ratio, while data from Institution 2 constituted the independent test cohort (Figure S1). Patient characteristics are shown in Table 1; there were no significant differences in the distribution of clinical characteristics among the three cohorts.
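A stratified 3:1 split of this kind can be sketched as follows. This is only an illustration under stated assumptions: the function name `stratified_split`, the random seed, and the class counts in the toy labels are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.75, seed=0):
    """Return (train_idx, val_idx) so that each class keeps roughly the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # randomize within each class
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        val_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(val_idx)

# Hypothetical labels: 0 = low grade, 1 = high grade (class counts are made up).
labels = [0] * 600 + [1] * 447
train_idx, val_idx = stratified_split(labels)
```

Splitting within each class, rather than over the pooled cohort, is what keeps the grade distribution comparable between the training and internal validation cohorts.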
The radiomics pipeline (Figure 1) mainly involved data acquisition from CT images (Method S1), segmentation of regions of interest, feature extraction (Table S1) and selection, model construction and evaluation, and multiomics analysis (Method S2). In total, 707 radiomics features were extracted from CT image data; 614 were filtered out because of low reproducibility or high redundancy, and 25 features with a significant impact on the target were ultimately selected (Table S2). A radiomics signature was established using the random forest (RF) method (Table S3, Figure S2). The AUCs in the training, internal validation and external test cohorts were 0.82, 0.76 and 0.75, respectively (Figure S3). Violin plots of selected features are shown in Figure 4A. The accuracies of the radiomics signature in the training, validation and test cohorts were 0.75, 0.72, and 0.66, respectively; the sensitivity was 0. There were no significant differences between the deep learning model and radiomics signature, although the former had a slightly higher AUC. To assess how much value radiomics or deep learning adds beyond risk factors related to tumor morphology and size, three shape features (original_shape2D_Sphericity, original_shape2D_Elongation, original_shape2D_MajorAxisLength) were used to construct a morphological model (Figure S5).
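For readers less familiar with the AUC values reported above, the following minimal sketch computes an AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name `roc_auc` and the toy labels and scores are hypothetical; the study's own evaluation code is not described in this letter.

```python
def roc_auc(y_true, y_score):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities for six cases.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.2, 0.6, 0.7, 0.9, 0.3, 0.5]
auc = roc_auc(y_true, y_score)  # 8 of 9 positive/negative pairs are ranked correctly
```

The pairwise form is quadratic in cohort size, which is acceptable for cohorts of this scale and makes the interpretation of the metric explicit.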
Predictions based on clinical characteristics were obtained from a clinical model built with RF. After visualizing the predicted probabilities of the clinical model, radiomics signature, and deep learning model, we found that the three predictors showed good discriminatory power for groups with different pathologic grades (Figure 3B). The performance of the clinical model was unsatisfactory (Figure S6). Next, the clinical model, radiomics signature, and deep learning model served as base models whose predicted probabilities were input into a logistic regression model for multi-model prediction fusion. ROC curves of the fused model applied to the three cohorts are shown in Figure 3C (see also Table 2). The fused model showed the best performance in the training, validation, and test cohorts, with AUCs of 0.89, 0.83, and 0.80, respectively; accuracy of 0.82, 0.77, and 0.73, respectively; sensitivity of 0.85, 0.81, and 0.71, respectively; specificity of 0.76, 0.71, and 0.75, respectively; PPV of 0.84, 0.80, and 0.79, respectively; NPV of 0.78, 0.73, and 0.66, respectively; and F1 score of 0.77, 0.72, and 0.71, respectively. The calibration curves showed that the fused model had better concordance between predicted and actual probabilities than the other models (Figure 3D). Comparison of the decision curves of the four models in the test set indicated that the fused model had greater clinical utility (Figure 3E), and the integrated discrimination improvement (IDI) indicated that the predicted probabilities of the fused model were significantly improved compared to those of the other models (Figure S7). A nomogram for preoperative prediction of HCC pathologic grade was established based on the fused model (Figure 3F).
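The fusion step described above (base-model probabilities fed into a logistic regression) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the plain gradient-descent fit, the learning rate, the number of epochs, and the six rows of base-model probabilities are all assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    """Fused probability of the positive class for each row of base-model outputs."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

# Hypothetical base-model probabilities per patient:
# [clinical model, radiomics signature, deep learning model]
X = [
    [0.40, 0.20, 0.15],
    [0.55, 0.30, 0.25],
    [0.45, 0.35, 0.40],
    [0.50, 0.70, 0.65],
    [0.60, 0.80, 0.75],
    [0.35, 0.60, 0.85],
]
y = [0, 0, 0, 1, 1, 1]  # hypothetical pathologic grade labels

w, b = fit_logistic(X, y)
fused = predict_proba(X, w, b)
```

In this toy data the clinical column is nearly uninformative, so the fitted weights lean on the radiomics and deep learning columns. In practice the fusion model would be fitted on the training cohort only and then applied unchanged to the validation and test cohorts.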
A total of 69 patients with CT data were included in the multiomics analysis. After data preprocessing, 19723 genomics, 42807 transcriptomics, and 3658 proteomics variables with differential expression between high- and low-grade HCC (valid data > 80%) were extracted. Pearson's correlation coefficients between radiomics features and multiomics variables are shown as correlation heat maps (Figure 4A). The selected radiomics features reconstructed 65.54%, 64.65%, and 72.69% of the differentially expressed genes, transcripts, and proteins (Figure 4B). The coverage of each type of -omics was 60% with just 15 radiomics features. The radiomics-related multiomics variables showed significant differences between the different pathologic grades (high vs. low grade) (Figure 4C). The results of the gene enrichment analysis of 25 radiomics features are summarized in Figure 4D. In the enrichment result for wavelet_LL_firstorder_entropy, 21 GO terms and pathways were identified that are potentially related to HCC development. For example, wavelet_LL_firstorder_entropy was associated with abnormal alcohol dehydrogenase activity, which leads to abnormal development and cell apoptosis. Key genes associated with original_shape2D_sphericity were related to the phosphatidylinositol 3-kinase (PI3K)/protein kinase B (AKT) signaling pathway (Figure 4F), which is involved in apoptosis, cancer cell proliferation, DNA repair, and cancer differentiation, among other biological processes.
Figure 4 (caption fragment). Matrices of cancer-related biological processes covered by radiomics features at specific -omics levels (upper); and details of GO terms and pathways (lower). (E) Bubble chart of 10 important GO terms and pathways correlated with wavelet_LL_firstorder_entropy used to establish the radiomics signature. The biological process of each GO term or pathway is shown on the x-axis. (F) Key genes (red) in the phosphatidylinositol 3-kinase (PI3K)/protein kinase B (AKT) signaling pathway were reconstructed with original_shape2D_sphericity, which was used to establish the radiomics signature.
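The correlation-based coverage reported for the multiomics analysis can be illustrated with a small sketch. It assumes, purely for illustration, that an -omics variable counts as covered when its absolute Pearson correlation with at least one radiomics feature reaches a threshold; the threshold of 0.8, the function names, and the toy values are hypothetical, and the study's actual criterion (Method S2) is not given in this letter.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def coverage(radiomics, omics, threshold=0.8):
    """Fraction of omics variables with |r| >= threshold against any radiomics feature."""
    covered = [
        name
        for name, values in omics.items()
        if any(abs(pearson(values, feat)) >= threshold for feat in radiomics.values())
    ]
    return len(covered) / len(omics), covered

# Hypothetical per-patient values (five patients) for two features and four genes.
radiomics = {
    "feat_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feat_2": [5.0, 1.0, 4.0, 2.0, 3.0],
}
omics = {
    "gene_a": [2.0, 4.0, 6.0, 8.0, 10.0],     # tracks feat_1
    "gene_b": [3.0, 3.0, 3.0, 3.0, 3.0],      # constant, never covered
    "gene_c": [-5.0, -1.0, -4.0, -2.0, -3.0], # mirrors feat_2
    "gene_d": [3.0, 1.0, 2.0, 5.0, 4.0],      # only weakly related to either feature
}
fraction, covered = coverage(radiomics, omics)
```

Using the absolute value of the correlation treats strongly negative associations as coverage too, which matches the idea of a feature carrying information about a variable regardless of direction.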
In conclusion, we established a deep learning radiomics model that can be used for preoperative pathological grading of HCC and can serve as a noninvasive prediction tool to guide clinical decision-making.