Proteome‐based molecular subtyping and therapeutic target prediction in gastric cancer

Different molecular classifications for gastric cancer (GC) have been proposed based on multi‐omics platforms with the long‐term goal of improved precision treatment. However, the GC (phospho)proteome remains incompletely characterized, particularly at the level of tyrosine phosphorylation. In addition, previous multiomics‐based stratification of patient cohorts has lacked identification of corresponding cell line models and comprehensive validation of broad or subgroup‐selective therapeutic targets. To address these knowledge gaps, we applied a reverse approach, undertaking the most comprehensive (phospho)proteomic analysis of GC cell lines to date and cross‐validating this using publicly available data. Mass spectrometry (MS)‐based (phospho)proteomic and tyrosine phosphorylation datasets were subjected to individual or integrated clustering to identify subgroups that were subsequently characterized in terms of enriched molecular processes and pathways. Significant congruence was detected between cell line proteomic and specific patient‐derived transcriptomic subclassifications. Many protein kinases exhibiting ‘outlier’ expression or phosphorylation in the cell line dataset exhibited genomic aberrations in patient samples and association with poor prognosis, with casein kinase I isoform delta/epsilon (CSNK1D/E) being experimentally validated as potential therapeutic targets. Src family kinases were predicted to be commonly hyperactivated in GC cell lines, consistent with broad sensitivity to the next‐generation Src inhibitor eCF506. In addition, phosphoproteomic and integrative clustering segregated the cell lines into two subtypes, with epithelial–mesenchyme transition (EMT) and proliferation‐associated processes enriched in one, designated the EMT subtype, and metabolic pathways, cell–cell junctions, and the immune response dominating the features of the other, designated the metabolism subtype. Application of kinase activity prediction algorithms and interrogation of gene dependency and drug sensitivity databases predicted that the mechanistic target of rapamycin kinase (mTOR) and dual specificity mitogen‐activated protein kinase kinase 2 (MAP2K2) represented potential therapeutic targets for the EMT and metabolism subtypes, respectively, and this was confirmed using selective inhibitors. Overall, our study provides novel, in‐depth insights into GC proteomics, kinomics, and molecular taxonomy and reveals potential therapeutic targets that could provide the basis for precision treatments.


Correlation
Proteome cluster

Correlation
Proteome cluster

Cytoplasmic translation
Nucleosome assembly DNA packaging

Chromatin assembly
Ribonucleoprotein complex assembly

Cell division
Proteome cluster

Module royalblue 80 gene products
Proteome cluster z-score Ribonucleoprotein complex assembly L-leucine import across plasma membrane Module brown 306 gene products

Correlation
Proteome cluster

Mitochondrial transport
Oxidative phosphorylation

ATP metabolic process
Protein localization to lipid droplet Protein targeting to mitochondrion Endoplasmic reticulum to Golgi vesicle-mediated transport

Correlation
Proteome cluster

Intracellular protein transport
Regulation of protein localization to plasma membrane

Regulation of vesicle-mediated transport
Positive regulation of transport

Regulation of cell death
Endocytic recycling Module purple 77 gene products    z-score Negative regulation of stress fibber assembly z-score 0.4 0.6 0.8 1.0

Correlation
Module turquoise 523 gene prodcts mRNA processing RNA splicing

RNA destabilization
Nuclear-transcribed mRNA catabolic process RNA transport z-score   YPcluster signaling pathways between the integrative subgroups revealed by post-translational modifications signature enrichment analysis (PTM-SEA).Each column represents a single replicate of a particular cell line, while each row indicates the degree of enrichment for a particular pathway across all cell lines.The same gel was used for blotting FGFR2 (Figure S6d) as well as p-SRC, hence they share the same loading control.from the best fitting of NMF results.Both similarity within subtypes and the overall silhouette value reached 1.0.d Differential phosphorylation of mTOR_S2448, AKT_S473 and MAP2K1/2_S217/221 between the EMT and metabolism subgroups.Cell lysates were harvested from representative cell lines of the EMT subgroup (red) and metabolism subgroup (dark blue) and subjected to Western blotting as indicated.Data are representative of n = 2 independent experiments.The same gels were used for blotting ERBB2 (Figure S6d) and pAkt (Figure S8d), and FGFR2 (Figure S6d), p-SRC (Figure S7b) and p-mTOR (Figure S8d), hence they share the same loading controls.e IC50 values of BKM120 and MK2206 for inhibition of GC cell lines in the EMT and metabolism subgroups.The IC50 value was derived from dose response analysis using a MTS assay.

Figure S1 .
Figure S1.Phosphoproteomic data overview a Number of Ser/Thr/Tyr phosphosites identified across the cell line panel.b Number of Tyr phosphosites across the different GC cell lines.c Sample correlation analysis of the WP, PP and YP datasets.Cell lines with biological replicates were plotted together.Triplicates were used for the WP and PP datasets and duplicates for YP.Sample correlation heatmaps highlight the high consistency of the WP, PP and YP data (mean pearson score was 0.93, 0.82 and 0.72, respectively).

Figure S2 .
Figure S2.Subclassification based on WP and biological processes associated with each subcluster a Cophenetic correlation coefficient plots (which measure the stability of the clusters) show how the coefficients vary with different numbers of clusters (k = 2 to 5) and different numbers of the molecules selected using percentages (10 to 100%) of median absolute deviations (MADs).The highest cophenetic coefficient was observed for a cluster number of 3 with MAD of 10%.b Silhouette width score plot.Silhouette scores can range from −1 to +1, where a high value indicates that the object is well matched to its own subtype and poorly matched to Figure S3 Metagene-specific proteins and comparison of WP subclassification with other published taxonomies a Heatmap of the metagene matrix of WP subclassification.The most metagene-specific proteins were selected using the default method of NMF.Each column corresponds to a metagene for each WP cluster.Each row represents a protein.Rows were scaled from -1 to 1 according to the contribution of proteins to the corresponding metagene and ordered by hierarchical clustering.b Heatmap of the metagene composition in WP subclassification.The abundance of 42 metagene-specific proteins were scaled from -2 to 2 and plotted in the heatmap by row.GC cell line samples from three WP subgroups were plotted by column.c-f Comparison of WP subclassification with other GC molecular taxonomies.Comparison with TCGA subtype (c), KUGH subtype (d), Singapore subtype (e) and PUCH subtype classification (f).
Figure S4 Subclassification based on phosphoproteomics and tyrosine phosphorylation and biological processes associated with each subcluster a Cophenetic correlation coefficient plots for the PP dataset.b Silhouette width score plot from the best fitting of NMF results.c Differentially expressed phosphosites and functional annotation of the PP clusters.Left panel, Representative GO terms of each subcluster; Right panel, heatmap of phosphosites in each subcluster.d WGCNA and gene module functional annotation in the PP clusters of GC cell lines.Functional annotation of the five colour modules in two PP clusters.Left: the co-phosphorylation patterns of the phosphosites in the five modules.Middle: phosphorylation heatmap in the five modules across the cell line panel.Right: representative GO terms for each module.e Cophenetic correlation coefficients for the YP dataset.f Silhouette width score plot from the best fitting of NMF results.g Differentially regulated Tyr phosphosites and functional annotation between different YP clusters.Left panel, Representative GO terms associated with each phosphosite subcluster; Right panel, Phosphorylation heatmap of phosphosites in each subcluster.
Figure S5Functional annotation of phosphosites and Western blotting validation for the integrative subsets a Differential phosphosites annotated by PhosphoSitePlus database.The red scale bar represents the ratio of phosphosite level in one versus the other integrated subset.b Differential expression of specific markers between the integrative subgroups.Cell lysates were harvested from eight representative cell lines, including four from the EMT subgroup (red) and four from the metabolism subgroup (dark blue).The apoptosis inhibiting phosphosite RB1_T821 exhibited increased phosphorylation in the EMT subgroup.Data are representative of n = 2 independent experiments.c Differential autophagy between the integrative subgroups.Cell lysates were harvested from four representative cell lines, including two from the EMT subgroup (red) and two from the metabolism subgroup (dark blue).Enhanced autophagic activity in the EMT subgroup was indicated by increased accumulation of LC3-II following 4 hours of starvation and bafilomycin A1 (Bafil.)treatment, in comparison to the metabolism subgroup.Data are representative of n = 2 independent experiments.d Differentially enriched signaling pathways between the integrative subgroups revealed by post-translational modifications signature enrichment analysis (PTM-SEA).Each column represents a single replicate of a particular cell line, while each row indicates the degree of enrichment for a particular pathway across all cell lines.
Figure S6 Outlier kinases in the GC cell line panel and GC patient cohort a-c Outlier kinases of the WP (A), PP (B) and YP (C) datasets.Kinases with outlying high intensities are highlighted with the red shading.d Validation of expression of specific outliers across the cell line panel.Cell lysates were harvested from specific GC cell lines.Phosphorylated and/or total forms of CDK12, FGFR2 and ERBB2 were significantly elevated in specific GC cell lines as predicted by the MS analysis.Data are representative of n = 2 independent experiments.e MTS assay following selective kinase knockdown or inhibitor

Figure S7 aFigure S7 .
Figure S7 Figure S8 Kinomic subclassification and group specific vulnerabilities of GC cell line panel a Predicted differentially-activated kinases between the EMT and metabolism subgroups.The heatmap represents output obtained using the KSEA algorithm using normalized PP.Six activity matrices were used in total (see Methods).Heatmaps show kinases (rows) arranged by hierarchical clustering and kinase activity level for each kinase.Yellow represents high kinase activity whereas navy blue represents low activation level.Subclassification of the cell lines is shown at the top.b NMF classification using the WP kinome in GC cell lines.The heatmap presents the sample consensus obtained from NMF clustering, with the blue-to-red gradient denoting the consistency of the clustering results.The silhouette score of each sample after 100 times of NMF clustering is indicated with a green-white gradient.c.Silhouette width score plot treatment in the corresponding GC cell lines.Data are expressed relative to day 1, which was arbitrarily set at 1. Error bars represent the range of the mean from n=2 independent experiments.si: small interfering RNA; siOTP: ON-TARGETplus TM small interfering RNA.f Decreased phosphorylation at CSNK1D Ser384 in MKN28 cells after SR3029 treatment.Data represent the mean and variation of two independent MS-based phosphoproteomics experiments using DDA and DIA modes respectively.g CDK4/6 expression in the MKN28 and YCCEL1 cell lines.Protein expression data of CDK4/6 were extracted from WP data and compared between MKN28 and YCCEL1 cell lines.*, P<0.05; ***, P<0.001.h Genomic alterations in outlier kinases across 1512 GC patients.Outlier kinases with mutation or amplification in at least 5% of GC patients are depicted using Oncoprint by cBioPortal.Patients without mutation/amplification of these 32 outlier kinases were removed.Percentages on the left of the graph show the percentage of total DNA alterations in each outlier kinase across the GC cohort.i Kaplan-Meier survival analysis based on expression of the outlier kinase IGF1R at the mRNA level.j Kaplan-Meier survival analysis based on expression of the outlier kinase SRC at the protein level.