Identification and Characterization of Robust Hepatocellular Carcinoma Prognostic Subtypes Based on an Integrative Metabolite‐Protein Interaction Network

Abstract Metabolite‐protein interactions (MPIs) play key roles in cancer metabolism. However, our current knowledge about MPIs in cancers remains limited due to the complexity of cancer cells. Herein, the authors construct an integrative MPI network and propose a MPI network based hepatocellular carcinoma (HCC) subtyping and mechanism exploration workflow. Based on the expressions of hub proteins on the MPI network, two prognosis‐distinctive HCC subtypes are identified. Meanwhile, multiple interdependent features of the poor prognostic subtype are observed, including hypoxia, DNA hypermethylation of metabolic pathways, fatty acid accumulation, immune pathway up‐regulation, and exhausted T‐cell infiltration. Notably, the immune pathway up‐regulation is probably induced by accumulated unsaturated fatty acids which are predicted to interact with multiple immune regulators like SRC and TGFB1. Moreover, based on tumor microenvironment compositions, the poor prognostic subtype is further divided into two sub‐populations showing remarkable differences in metabolism. The subtyping shows a strong consistency across multiple HCC cohorts including early‐stage HCC. Overall, the authors redefine robust HCC prognosis subtypes and identify potential MPIs linking metabolism to immune regulations, thus promoting understanding and clinical applications about HCC metabolism heterogeneity.

. Clinical and molecular differences between the two MPI-based HCC subtypes. a-b) KM plot of the differential prognosis between the two subtypes in a GEO HCC dataset GSE14520 (a) and the other two cancer types KIRC and UCEC from TCGA (b). The two subtypes were determined by the MIPros-based PC1-PC2. For subtype C1, PC1-PC2 <0; for C2, PC1-PC2>0. c) Enriched pathways of each PPI network module among the top-ranked subtype-relevant genes. Each network module is named by "G_" plus one hub gene name. d) Enriched pathways of metabolites interacting with proteins from each PPI network module in (c). e) A sub-network of proteins with frequent interactions with the subtype-relevant genes in the PPI network. f) Volcano plot of the proteomic differences between the two TCGA-HCC subtypes. g) Enriched pathways of the up-regulated proteins (FDR<0.01, |FC|>0.2) identified in (f). Figure S3. Molecular and clinical differences between the two HCC subtypes. a) DNA methylation based pathway level alterations between the two TCGA-HCC subtypes. Colors represent pathway categories. b-c) Boxplots of the DNA methylation levels of enzymes involved in lipid (b) and glucose (c) metabolism. (d) The sub-network of linoleic acid in the directed MPI network. Proteins are colored by the fold changes (FCs) in their mRNA expressions between the C1 and C2 subtypes in TCGA-HCC. e) Barplot showing the ability of the subtype C1 in ICGC-JP to accumulate (NES > 0) or consume (NES < 0) metabolites in different metabolism pathways in comparison with the subtype C2. Figure S4. Metabolomics differences between the C1 and C2 subtypes identified in the GSE76279 HCC cohort. a) Volcanol plot of the metabolomics differences. The differences were tested by wilcox-test, unpaired, and the P values were adjusted by false discovery rate (FDR). Red and blue points represent the metabolites are significantly up or down regulated (FDR<0.05 and |FC|>2) in the C1 subtype. Grey points represent the differences are not significant. b) A MPI sub-graph of four metabolites involved in primary bile acid biosynthesis. After removing the reversible reactions, these four metabolites were only linked to an enzyme BAAT in the primary bile acid biosynthesis pathway, and their corresponding accumulation score deltaM were less than 0 (see also Methods, deltaM=deltaP-deltaS, since only one enzyme BAAT which can catalyze the production of these metabolites were left on the sub-graph, so deltaP = FC of BAAT, deltaS=0 here) Figure S5. Interplay between metabolism and immune regulations. a) High confidential MPI prediction results for IKBKE. b) PPI sub-network about the immune-relevant proteins up-regulated in C1. Proteins are colored by the network modules. c) Pathways enriched by the members in the SRC-centered network module. d) The expressional differences in the immune-relevant proteins up-regulated in the subtype C1of four independent HCC cohorts. Figure S6. Further separation of the poor prognosis subtype in HCC into two subtypes based on TME compositions. a) Barplot of the GSEA-based pathway differences between the S1 and S2 subtype with respect to the mRNA expressions. b-c) Comparison with the Hoshida (b) and iCluster (c) subtypes in the TCGA-HCC dataset. The subtyping information was obtained by the TCGAbiolinks R package. d-e) KMplots of the survival rate differences among the Hoshida (d) or iCluster (e) based subtypes. f) KMplots of the survival rate differences among three further divided subtypes in the other three HCC cohorts. g) KMplots of the progression differences among three further divided subtypes in two HCC cohorts. Figure S7. Validation of the two main HCC subtype features. a) Western blot analysis of HIF1α in SNU449 cells under normoxia or hypoxia conditions. Vinculin and GAPDH were used as loading controls. b) GSEA based immune system pathway differences between the SNU449 cells cultured under 6 hours of hypoxia and normoxia conditions. c) Heatmap of genes involved in Th1 and Th2 cell differentiation and showing higher expressions in the hypoxia SNU449 cells (T-test, one-sided). d) GSEA based metabolism or immune system pathway alterations of the HepG2 cell lines cultured after 12 hours of hypoxia treatment. e) Volcano plot showing the metabolite alterations of the SNU449 cell lines cultured after 6/24/48 hours of hypoxia treatment (T-test, two-sided, n= 6 for each condition).The red and blue points respectively represent up or down regulated genes in the hypoxia group.
Supporting Tables   Table S1. Details about the MPI network. Table S2. High-confidential MPI prediction results for proteins involved in immune system and up-regulated in C1.

Pathway information
Pathway information was obtained from KEGG. The KGML annotation files for all human pathways were parsed by the KEGGgraph R package, the genes as well as reactions in the pathways were extracted.
Meanwhile, we also annotated the categories of the pathways according to the pathway catalogues in KEGG (https://www.kegg.jp/kegg/pathway.html#pk).

Genome-wide analysis of the pathway level differences between the two HCC subtypes
The mRNA expression differences between the two HCC subtypes were examined by Wilcox-test (un-paired) and log2FCs in terms of the mean mRNA expressions were calculated. Then, genes were ranked by the log2FCs, and the ranked gene list was utilized as the input for GSEA-based pathway enrichment analysis for all KEGG pathways.

Prognosis analysis for HCC subtypes
The prognosis differences among different HCC subtypes were examined by the Peto & Peto modification of the Gehan-Wilcoxon test, and the survival curves were fitted by the KM model.

Gene mutation analysis
The downloaded gene mutation data was in .maf format. MutSigCV-based significances of the mutations in the TCGA-HCC samples were downloaded from cBioportal. The mutated genes were ranked by q value (from smallest to biggest) and the top-50 genes were examined to see whether the mutations showed significant enrichment in one of the two TCGA-HCC subtypes based on hypergeometric distribution.

Proteomics analysis
We examined the protein level differences between the two subtypes based on the RPPA data (Wilcox-test, un-paired) and calculated the FC (difference value) between the two subtypes in terms of the mean protein level. The significantly up/down regulated proteins were identified, and their enriched pathways were examined by hypergeometric distribution.

DNA methylation analysis
We examined the DNA methylation level difference between the two TCGA-HCC subtypes for each gene (Wilcox-test, un-paired) and calculated the FC between two subtypes in terms of the mean methylation level.
Then, genes were ranked by the FC, and the ranked gene list was utilized as the input for GSEA-based pathway enrichment analysis for all KEGG pathways.

Prediction of the two HCC subtypes in the other HCC cohorts
The genes were ranked based on their importance estimated by the random forest method in the part "Identification of subtype-relevant genes". Then, we try to construct subtype predictors based on the TCGA-HCC gene expression matrix of the top-ranked genes. Comparing the accuracies of the predictors based on different number of top-ranked genes (Table S3), we found that the top-30 based predictor performed the best. Consequently, we utilized the top30-based predictor to identify the subtypes.

MPI prediction
Step

Step 2. Feature description
Six network association features were calculated to describe the correlations between a metabolite a and a protein b.
The first two features called NS1 and NS2 were based on the MPI and PPI networks (G.MPI and G.PPI). NS1 estimated whether the metabolite and the protein can directly interact with the same proteins, while NS2 estimated whether the metabolite and the protein can interact with the same proteins through one intermediary protein.
where Nei(G,i) means the neighbors of node i in the graph G, & means the interaction between two sets, |•| means the number of items in the set.
where Nei k (G,i) means the k-th neighbor of node i, and twoStepL(k,b,G) means the number of two-step length paths between node k and node b in the network G.
NS3 examined whether the candidate protein P was similar with the known MIPros of the metabolite m considering the protein sequences. ( 3 where SeqSim means the protein sequence similarity (estimated by the twoSeqSim function in the protr R package [1] ) NC1 and NC2 were based on a two-layer heterogeneous network (G.H, undirected) which contained metabolite-metabolite association (the first layer), PPIs (the second layer) and MPIs (links between layers).
Two types of G.H were constructed, where in G.H1, metabolites involved in the same reactions were connected with each other, while in G.H2, structure similar metabolites were connected with each other. The metabolite structure similarity was calculated based on the Tanimoto coefficient between the atom pair descriptors of the metabolites (the ChemmineR package [2] ).
Where edgeC calculated the edge connectivity, i.e., the minimum number of edges needed to remove to cut all paths from a to b on the network G.H1.
The last feature GS evaluated the GO-based functional similarity.
Where GOSim(i,j) calculated the sematic similarity (by GoSemSim R package [3] ) between GO items of two proteins.
Notably, when calculating the features for a reported MPI, this MPI was removed temporarily from the MPI network, thus evading the over-representation of positive cases.

Step3. MPI prediction model
The six dimensional features were calculated for metabolite-protein pairs in both positive and negative datasets. Next, 10% of both the positive and negative datasets were retained for an independent validation, and the left 90% were utilized to train a MPI prediction model. We employed RF to train the MPI prediction model by the randomForest R package. [4] Finally, we used the trained RF model to predict positive probabilities for samples in the validation dataset, and evaluated the prediction performance by ROC.

CCLE data analysis
We downloaded the mRNA expression and drug response data of different cancer cell lines from CCLE (https://portals.broadinstitute.org/ccle/data), and we extracted the data for liver cancer cell lines. Then, we utilized GL1 to classify the cell lines into the corresponding subtypes, and compared the drug sensitivity between the two subtypes for each drug.

Transcriptomics profiling of cell lines
The SNU449 cell lines treated under normoxia and hypoxia for 6 hours were collected and their RNAs were isolated using TRIzol reagent. Sequencing libraries were constructed from total RNA using SMART-RNAseq Library Prep Kit (Hangzhou KaiTai, AT4201). In briefly, the mRNAs were isolated from total RNA with Sera-Mag Magnetic Olido (dT) particles, and then chemically fragmented. The fragmented RNA was reverse-transcribed into cDNA using random primer containing a tagging sequence at their 3'ends.

Protein extraction and immunoblotting
For HIF-1α detection, culture medium was quickly removed and cells were immediately quenched in liquid nitrogen. Subsequently, cells were quickly scraped into tubes with 1× SDS-PAGE sample loading buffer.

Lipid extraction and fatty acids analysis
Cells used for fatty acids analysis were quickly washed with 5% D-mannitol-water (w/v) solution.
Subsequently, cells were immediately quenched in liquid nitrogen. Extraction solvent was prepared by adding FFA standards to methanol, including d3-hexadecanoic acid (C16:0) and d3-octadecanoic acid (C18:0). Liquid-liquid extraction was performed as described previously [5] with slight modifications. In brief, cells were quickly scraped into EP tubes with 1 mL extraction solvent, followed by vortexing for 30 s. Then, 1 mL chloroform was added into the tubes and vortexed for 30 s, followed by adding 400 µL ultrapure water and vortexing for 30 s. After letting samples sit for 10 min at room temperature, the mixture was centrifuged for 15 min at 13000 × g, 4 °C. 600 µL lower phase was transferred for vacuum drying and stored at −80°C until analysis.