SWATH‐based proteomics reveals processes associated with immune evasion and metastasis in poor prognosis colorectal tumours

Abstract Newly emerged proteomic methodologies, particularly data‐independent acquisition (DIA) analysis–related approaches, could improve current gene expression–based classifications of colorectal cancer (CRC). Therefore, this study aimed to identify protein expression signatures using SWATH‐MS DIA and targeted data extraction, to aid in the classification of molecular subtypes of CRC and to advance diagnosis and the development of new drugs. For this purpose, 40 human CRC samples and 7 samples of healthy tissue were subjected to proteomic and bioinformatic analysis. The proteomic analysis identified three different molecular CRC subtypes: P1, P2 and P3. Significantly, the P3 subtype showed high agreement with the mesenchymal/stem‐like subtype defined by gene expression signatures and characterized by poor prognosis and survival. The P3 subtype was characterized by decreased expression of ribosomal proteins, the spliceosome, and histone deacetylase 2, as well as increased expression of osteopontin, SERPINA1 and SERPINA3, and proteins involved in wound healing, acute inflammation and the complement pathway. This was also confirmed by immunodetection and gene expression analyses. Our results show that these tumours are characterized by altered expression of proteins involved in biological processes associated with immune evasion and metastasis, suggesting new therapeutic options in the treatment of this aggressive type of CRC.


| INTRODUCTION
Globally, colorectal cancer (CRC) is the third most commonly diagnosed cancer in men and the second in women, with two million new cases and around 800 000 deaths in 2018. 1 Although CRC mortality has declined slightly over the past two decades, and despite advances in detection and surgical management, metastatic CRC (mCRC) is associated with poor prognosis, with 5-year survival rates ranging from 5% to 8%. 2 CRC is often classified into different phenotypes according to genetic alterations, including chromosomal instability (CIN) and microsatellite instability (MSI), and epigenetic changes such as silencing of genes due to methylation of the CpG islands (CIM) located in their promoters. 3 In addition, recent gene expression-based studies have classified CRC patients into distinct molecular subtypes. [4][5][6] More recently, the analysis of the interrelation of the proposed CRC subtypes has provided evidence for the existence of consensus molecular subtypes. 7 Interestingly, a stem-like/mesenchymal subtype associated with poor patient outcome was identified in all these gene-based classifications. 8 These molecular classifications can therefore identify different CRC phenotypes, which differ in prognosis and in response to specific treatments. This complexity is highly relevant for the development of more effective clinical treatments, through the identification of novel, more specific therapeutic targets and biomarkers that determine different phenotypes, which will lead to individualized treatment for patients with CRC.
Proteomics could expand our current knowledge of abnormal processes among different CRC subtypes by identifying new proteins or protein profiles to improve current classifications. In addition, proteomics could provide biomarkers that define differences in resistance to therapy, prognosis and metastatic spread in a specific subtype. Mass spectrometry (MS)-based technologies are powerful tools for the investigation of biomarkers in clinical samples, 9 and SWATH-MS (Sequential Window Acquisition of all THeoretical Mass Spectra-MS) is an innovative MS approach that allows the quantification of almost all peptides and proteins present in a single sample, making it useful for large sample cohort studies. 10,11 Therefore, the aim of our study was to identify and quantify differential expression of proteins in clinical samples of CRC using the SWATH-MS approach of proteomic analysis and to establish potential subtypes of CRC based on differential expression of proteins.

| Subjects
Forty patients over 18 years of age with resectable colorectal cancer who underwent surgery at Reina Sofía University Hospital (Córdoba, Spain) were included in this study (Table S1). In addition, 7 samples of healthy tissue from these patients (5 men and 2 women; mean age 67 ± 5 years) were used as controls. An independent cohort of 45 FFPE tumour samples was retrospectively analysed to confirm the association between CMS subtype and SRSF3 and SERPINA1 expression (Table S2). The study protocol was approved by the Ethics Committee of the Reina Sofía University Hospital, according to the Code of Ethics of the World Medical Association (Declaration of Helsinki), and signed informed written consent was obtained from each patient.

| Sample preparation and protein extraction
Fresh tissue samples were washed with phosphate-buffered saline (PBS) at pH 7.4 (Sigma-Aldrich), directly frozen in liquid nitrogen and stored at −80°C until further use. Protein lysates from fresh-frozen colorectal tissues were obtained by mechanical disruption of 100 mg of tissue with a homogenizer pestle using a sample grinding kit (GE Healthcare) in 300 μL of lysis buffer (20 mmol/L Tris-HCl, pH 7.6; 0.5 M sucrose; 0.15 M KCl; dithiothreitol; PMSF; and 1× anti-protease cocktail from Sigma-Aldrich), followed by incubation on ice for 10 minutes and centrifugation at 25 000 × g for 15 minutes at 4°C. Supernatants were collected, and total protein was quantified using the Qubit Protein Assay Kit (Thermo Fisher Scientific).

| Sample preparation for LC-MS analysis
Protein precipitation with trichloroacetic acid (TCA)/acetone was carried out to remove contaminants from the samples. Thus, TCA was added to the samples to a final concentration of 10% and was incubated on ice for 30 minutes. Then, precipitated proteins were collected by centrifugation for 30 minutes at 8000 × g and 4°C.
Pellets were resuspended in ice-cold acetone and incubated at −20°C overnight. After a second centrifugation step, protein was solubilized in 50 μL of 0.2% RapiGest SF (Waters) with 50 mmol/L ammonium bicarbonate. Total protein was measured using the Qubit Protein Assay Kit, and 50 μg of protein was trypsin-digested. 12

| Creation of the spectral library
In the SWATH pipeline, chromatogram traces for the peptide fragment ions are matched to the peptides and proteins contained in a peptide spectral library, and the fragment chromatogram traces are then used for peptide and protein quantitation. The SWATH method relies heavily on the peptide spectral library, which is previously established by shotgun proteomic analysis (data-dependent acquisition, DDA, runs) of the same pooled samples. Therefore, the 47 samples were mixed into 9 pools, which were analysed by LC-MS/MS for massive DDA protein identification (shotgun proteomics), to maximize the number of peptides and proteins contained in the spectral library. In this way, all samples are represented in the DDA runs used in the database search for building the spectral library.
Identification of peptides and proteins was carried out using the Protein Pilot software (version 5.0.1; Sciex) with a human Swiss-Prot database (March 2016). False discovery rate (FDR), calculated by Protein Pilot using the target-decoy database approach, was set at 0.01 for both peptides and proteins. MS/MS spectra of peptides were next used to create the spectral library for SWATH peak extraction using the PeakView software (version 2.1; Sciex) with the MS/MSALL with SWATH Acquisition MicroApp (version 2.0, Sciex). Peptides identified with greater than 99% confidence were added to the spectral library.
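The target-decoy idea behind the 0.01 FDR threshold can be illustrated with a minimal sketch. This is not Protein Pilot's implementation, which is not described here; it assumes the generic estimate in which the FDR at a given score is the ratio of decoy to target hits scoring at least as high. The function names and the `(score, is_decoy)` input format are hypothetical.

```python
def target_decoy_fdr(hits):
    """Return (score, is_decoy, fdr) tuples sorted by descending score.

    hits: list of (score, is_decoy) pairs (hypothetical input format).
    fdr is the running decoy/target ratio among all hits ranked so far.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    targets = 0
    decoys = 0
    annotated = []
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # With no target hit yet, the estimate is undefined; use 1.0.
        fdr = decoys / targets if targets else 1.0
        annotated.append((score, is_decoy, fdr))
    return annotated


def accept_at_fdr(hits, threshold=0.01):
    """Return scores of target hits ranked at or above the lowest-ranked
    position whose running FDR estimate is still <= threshold."""
    ranked = target_decoy_fdr(hits)
    cutoff_index = -1
    for i, (_, _, fdr) in enumerate(ranked):
        if fdr <= threshold:
            cutoff_index = i
    return [s for s, is_decoy, _ in ranked[:cutoff_index + 1] if not is_decoy]


# Toy example with made-up scores.
example = [(10, False), (9, False), (8, True), (7, False)]
accepted_strict = accept_at_fdr(example, threshold=0.01)
accepted_loose = accept_at_fdr(example, threshold=0.4)
```

In the toy example, the decoy ranked third raises the running estimate to 0.5, so a 0.01 threshold keeps only the two top-scoring targets, whereas a looser threshold also admits the fourth hit.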

| Relative quantification by SWATH acquisition
Forty-seven samples of colonic tissue (40 tumours and 7 healthy tissues) were evaluated using a data-independent acquisition (DIA) method. Samples were analysed by LC-MS as described above for the construction of the spectral library, but using a SWATH-MS acquisition method. The SWATH method comprised a TOF MS scan (350-1250 m/z, acquisition time 50 ms) followed by 50 windows of variable size (230-1500 m/z, with acquisition time of 90 ms) with a minimum size of 5 m/z. The SWATH variable window calculator from Sciex was used to adjust the width of these variable windows to ion density.
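The principle of adjusting window width to ion density can be sketched as follows. This is an illustrative assumption, not the algorithm of the Sciex calculator: window boundaries are placed at quantiles of a list of observed precursor m/z values so that each window holds a similar number of precursors, and a minimum width is then enforced. The function name, the input list and the example range are hypothetical.

```python
def variable_windows(precursor_mz, lower, upper, n_windows, min_width):
    """Split [lower, upper] into n_windows isolation windows holding roughly
    equal numbers of precursors, each at least min_width wide."""
    if n_windows * min_width > upper - lower:
        raise ValueError("range too narrow for the requested windows")
    mz = sorted(m for m in precursor_mz if lower <= m <= upper)
    if not mz:
        raise ValueError("no precursors inside the requested range")
    # Interior boundaries at equally spaced quantiles of the m/z values.
    bounds = [lower]
    for i in range(1, n_windows):
        bounds.append(mz[(i * len(mz)) // n_windows])
    bounds.append(upper)
    # Forward pass: push each boundary right to respect the minimum width.
    for i in range(1, n_windows):
        bounds[i] = max(bounds[i], bounds[i - 1] + min_width)
    # Backward pass: pull boundaries left so the last windows also fit.
    for i in range(n_windows - 1, 0, -1):
        bounds[i] = min(bounds[i], bounds[i + 1] - min_width)
    return [(bounds[i], bounds[i + 1]) for i in range(n_windows)]


# Evenly spread precursors give equal-width windows.
uniform = [400.0 + k for k in range(800)]
even_windows = variable_windows(uniform, 400.0, 1200.0, 8, 5.0)

# A dense cluster yields narrow windows there, never below the minimum width.
clustered = [500.0] * 100
dense_windows = variable_windows(clustered, 400.0, 600.0, 4, 5.0)
```

Dense m/z regions receive narrow windows and sparse regions wide ones, which is the general motivation for variable windows; the settings reported above (50 windows, minimum 5 m/z) would be passed as `n_windows=50` and `min_width=5.0`.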

| SWATH-MS data analysis
Data extraction from SWATH runs was carried out by PeakView using MS/MSALL with SWATH Acquisition MicroApp, resulting in a library containing 2915 proteins. Peptide retention times for each protein were realigned in each run according to indexed retention time (iRT) peptides (Biognosys AG, Schlieren/Zürich, Switzerland).
Chromatograms of the extracted ions were created for each selected ionic fragment. PeakView calculated a score and FDR for each assigned peptide using chromatographic and spectral components. MarkerView (version 1.2.1; Sciex) allowed signal normalization, and a t test was applied for testing differential abundance.
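Retention time realignment with iRT peptides can be illustrated with a minimal sketch. It assumes a simple linear calibration between the library iRT scale and the retention times observed in one run; the actual PeakView procedure is not described here and may differ. The function names and example values are hypothetical.

```python
def fit_irt_calibration(irt_values, observed_rts):
    """Least-squares line observed_rt = slope * irt + intercept,
    fitted on the iRT anchor peptides detected in one run."""
    n = len(irt_values)
    if n < 2 or n != len(observed_rts):
        raise ValueError("need at least two paired anchor peptides")
    mean_x = sum(irt_values) / n
    mean_y = sum(observed_rts) / n
    sxx = sum((x - mean_x) ** 2 for x in irt_values)
    if sxx == 0:
        raise ValueError("anchor iRT values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(irt_values, observed_rts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_rt(irt, slope, intercept):
    """Expected retention time in this run for a library peptide."""
    return slope * irt + intercept


# Made-up anchors: library iRT values and retention times (min) in one run.
slope, intercept = fit_irt_calibration([0.0, 50.0, 100.0], [10.0, 35.0, 60.0])
expected_rt = predict_rt(20.0, slope, intercept)
```

The predicted retention time then centres the extraction window for each library peptide in that run.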

| Gene expression-based classification into colon cancer subtypes
Total RNA was extracted from samples using the RNeasy Mini Kit (Qiagen) following the manufacturer's recommendations. We  (Table S2). The nSolver software (NanoString Technologies) was used for data analysis. Complement and immune-related gene expression in tumour subtypes was analysed by using

| Immunohistochemical analysis
Immunohistochemical (IHC) staining was performed by incubating 4 µm FFPE sections in 10 mmol/L citrate buffer (pH 6.0) at 120°C for 5 minutes for antigen retrieval. Endogenous peroxidase was neu- Individual cores were scored by trained pathologists (CVP and SGL). involved. Then, a cut-off (P-value < .05) was used to select the significant biological processes/pathways and genes/proteins. The STRING database (http://string-db.org) was also used to assess protein-protein interactions.

| SWATH-based proteomic analysis
A total of 40 human adenocarcinoma samples (see Table S1 for clinical and pathological characteristics) and 7 samples of healthy tissue were analysed. Samples of colonic tissue were grouped into 9 pools and subjected to shotgun proteomic analysis to construct the spectral library, as described in the Materials and Methods section. As a result, after integrating all nine data sets, 3080 proteins were identified (Table S4), with an FDR < 1% at both the protein and peptide levels. We quantified 2752 proteins across all 47 samples in the SWATH-MS analysis, with an FDR threshold of 5%. Table S5 lists the quantification values for these 2752 proteins.

| Differentially expressed proteins in SWATH-MS completely discriminate between CRC tissues and the normal tissues
As shown in Figure 1, protein expression profiles revealed by SWATH-MS completely separate CRC tissues from the normal tissues. The unsupervised hierarchical clustering analysis demonstrated a clear discrimination between these two groups of samples with different protein expression patterns (Figure 1A). Moreover, principal component analysis (PCA), which is another unsupervised method, confirmed that tumour and healthy tissues were clearly distinguished using the quantitative protein expression data obtained by SWATH-MS (Figure 1B).

| SWATH-MS analysis identifies three molecular subgroups of CRC
The unsupervised clustering analysis of SWATH-MS data from tumour samples also revealed that 3 subgroups of CRC, designated P1, P2 and P3, could be differentiated (Figure 2). These proteomic that has been reported as a subset of the CCS1 subtype. 6 The P2 subgroup showed a more heterogeneous pattern but contained a majority of samples from the CCS2 subgroup, which is related to microsatellite instability (MSI), 5 and also most of the goblet-like tumours, 4 which have been reported as a subset of the CCS2 subtype. 6 Finally, a third proteomic subtype (P3) was clearly differentiated from both P1 and P2 subtypes.
Importantly, this subtype included a majority of CCS3 tumours 5 that were also classified as stem-like tumours according to the classification of Sadanandam et al. 4 This stem-like/mesenchymal CRC subtype is a distinct set of highly aggressive CRC tumours associated with poor patient outcome. Remarkably, this P3/mesenchymal/stem-like subgroup was associated with a significantly lower 3-year overall survival rate when compared with the P1 or P2 subgroups (Figure S1A). However, this association was not observed with the gene expression classifiers (Figure S1B and S1C), indicating that the proteomic classification adds value for risk stratification.
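As an illustration of how a 3-year overall survival rate can be estimated per subgroup from censored follow-up data, the sketch below implements the Kaplan-Meier product-limit estimator. Its use here is an assumption: the statistical method behind Figure S1 is not stated in this section. The function names and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times: follow-up in months; events: True if death was observed,
    False if the patient was censored at that time.
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Survival probability at time t (step function, 1.0 before first death)."""
    s = 1.0
    for time, surv in curve:
        if time <= t:
            s = surv
        else:
            break
    return s


# Made-up follow-up data (months) for one subgroup.
curve = kaplan_meier([6, 12, 24, 40, 48], [True, False, True, False, False])
three_year_os = survival_at(curve, 36)
```

Curves estimated this way for P1, P2 and P3 would typically be compared with a log-rank test, which is not shown here.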

| The P3/mesenchymal/stem-like subgroup shows a distinct protein expression pattern compared to the other proteomic CRC subtypes
The proteomic expression profile of the mesenchymal/stem-like (P3) subgroup was analysed and compared with that of the other subgroups (P1 and P2). Owing to the large number of proteins obtained by MS, a volcano diagram was first made to select those proteins with statistically significant differential expression and at least a 2-fold change (Figure 3). As a result, 186 proteins were found with increased expression in the P3 subtype, compared to P1 and P2, and 379 proteins with decreased expression.
To analyse the most significant proteins in terms of expression, we considered the top 50 proteins showing the largest expression differences between P3 and both P1 and P2 subtypes (Figure 4). As can be observed, the protein expression pattern of the P3 subtype was clearly different from that of the other subtypes. Of these top 50 proteins, 30 were clearly up-regulated in the P3 subtype, whereas 20 were markedly down-regulated, compared with both P1 and P2 subtypes (Table 1).
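The volcano selection can be sketched as follows, using the thresholds given in the Figure 3 legend (P-value < .05; FC ≥ 2 or FC ≤ 0.5). This is a minimal illustration, not the authors' script: the P-values are assumed to come from a precomputed two-tailed t test, and the function names and labels are hypothetical.

```python
import math


def volcano_point(fold_change, p_value):
    """Coordinates on the volcano diagram: (log2 fold change, -log10 P)."""
    return math.log2(fold_change), -math.log10(p_value)


def classify_protein(fold_change, p_value, fc_cutoff=2.0, alpha=0.05):
    """Label a protein as 'up', 'down' or 'unchanged' in P3 vs P1 and P2.

    fold_change is the ratio of mean abundance in P3 to the mean
    abundance in P1 and P2 (an assumed definition).
    """
    if p_value >= alpha:
        return "unchanged"
    if fold_change >= fc_cutoff:
        return "up"
    if fold_change <= 1 / fc_cutoff:
        return "down"
    return "unchanged"


# Made-up values for illustration.
x, y = volcano_point(4.0, 0.01)
labels = [
    classify_protein(4.0, 0.001),   # large increase, significant
    classify_protein(0.25, 0.01),   # large decrease, significant
    classify_protein(1.5, 0.001),   # significant but below 2-fold
    classify_protein(4.0, 0.2),     # large change, not significant
]
```

Counting the 'up' and 'down' labels over all quantified proteins would give totals analogous to the 186 and 379 proteins reported above.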

| Biological processes and pathways altered in the P3/mesenchymal/stem-like subtype of CRC
Gene Ontology enrichment analysis was carried out (Figure S2).
As a result of this comparison of the P3 subtype with both P1 and P2 subtypes, we found key biological processes, molecular functions, cellular components such as ribosome and molecular pathways related to spliceosome, among others (Figure S2).  was down-regulated in this type of tumours. Human osteopontin is subject to alternative splicing, and the molecular size of this protein is known to be variable, ranging between 41 and 75 kD. 15 As illustrated in Figure 6, subtypes (Figure 6C). Finally, to explore whether there is also an association between our proteomic subtypes and CMS subtyping, 16 we made a retrospective analysis of 45 FFPE CRC tumour samples (Table S2) and, after their classification into CMS subtypes by IHC as described by Trinh et al, 14 the expression of SERPINA1 and SRSF3 was evaluated by IHC in each of the CMS1, CMS2/3 and CMS4 subtypes (Figure S4). As expected, these analyses confirmed the higher expression of SERPINA1 in the CMS4 subtype, whereas the expression of SRSF3 (spliceosome) was down-regulated in this type of tumours.
FIGURE 3 Volcano diagram of proteins with significant differential expression comparing P3 with the rest of proteomic subtypes in CRC. Volcano diagram resulted from comparison of subtypes P3 vs P1 and P2. Proteins are separated according to the log2 of the fold change (x-axis) and the -log10 of the P-values based on a two-tailed t test (y-axis). A total of 186 proteins (green dots) were found with increased expression in the P3 subtype, compared to P1 and P2, and 379 proteins (red dots) with decreased expression (P-value < .05; FC ≥ 2 or FC ≤ 0.5).
Therefore, the analysis of this additional cohort of tumour samples also indicated the association between proteomic subtyping and CMS classification. The above results validate the proteomic analysis and suggest novel biomarkers that can be useful tools for the molecular classification of CRC and also for the development of new therapeutic strategies in the mesenchymal CRC subtype of worse prognosis. Furthermore, our analyses demonstrate the added value of our proteomic data set, relative to published gene expression data sets generated for CRC.

| DISCUSSION
In this study, the proteomic analysis of 47 samples of colorectal tissue made it possible, first, to distinguish tumours from samples of healthy tissue used as a control and, second, to establish a classification of tumours based on their protein expression profiles. Our SWATH-MS data from tumour samples revealed 3 subgroups of CRC, designated P1, P2 and P3. The P1 subtype was highly similar to the CCS1 and TA subtypes, which have been associated with an epithelial gene signature and better prognosis. 6 On the other hand, the P2 subgroup was more heterogeneous in terms of molecular characteristics, and included tumours from the CCS2 subgroup and also most of the goblet-like tumours, suggesting that this proteomic subgroup may comprise MSI CRC tumours. 6 Importantly, the unsupervised classification based on the proteomic profiles confirmed in our cohort a subgroup of mesenchymal/stem-like tumours (P3), equivalent to that characterized by the expression of mesenchymal and stem genes, which is also associated with a poor prognosis and low patient survival. 16  Recent studies indicate that ribosome-independent functions may be involved in various physiological and pathological processes,  including tumorigenesis or tumour suppression. [25][26][27] In this study, we observed in P3 tumours a decreased expression of ribosomal proteins involved in tumour suppression, such as RPL6, 27 RPL23, 28,29 RPL26, 30 RPS3, 31 RPS14, 32 RPS15 and RPS20, 33 RPS25, 34 RPS26 35 and RPS27L. 36 In addition, elevated RPS27L expression in tumours has been related to a better prognosis in CRC patients. 37 Interestingly, RPL13A has been identified as a negative regulator of inflammatory proteins, suggesting that this ribosomal protein could be a repressor of inflammatory signalling. 38 The inflammatory response plays an essential role during tumorigenesis, and prolonged expression of inflammatory genes promotes tumour progression.
Therefore, and in agreement with the tumour suppressive function of ribosomal proteins, RPL13A not only protects host tissues from inflammatory injury, but also prevents cancerous growth of the inflamed cells. 38 Accordingly, in our study P3 tumours showed a decreased expression of RPL13A and an increased expression of proteins related to the acute inflammatory response.
Splicing process is commonly deregulated in cancer, resulting in non-functional end products. 39 The results of the present study indicate that the P3/mesenchymal/stem-like subtype of CRC is characterized by an overall decrease in the expression of spliceosomal proteins. Notably, studies on differential splicing events among tumours support transcriptome instability as a molecular characteristic of CRC. 40,41 Furthermore, a strong inverse correlation was  46 Significantly, the mesenchymal/stem-like subtype defined by gene expression signatures is also characterized by a marked overexpression of genes involved in complement-related signalling. 16,47,48 Our results also support that novel immu-  51 Notably, OPN appears to play a main role in the mechanisms deployed by tumours to evade immune recognition by participating in the crosstalk between cancer cells and the host microenvironment. Furthermore, the human OPN transcript is subject to alternative splicing, and the expression patterns of splicing factors dictate the major OPN splicing isoform in a specific pathological condition. 15 In this regard, the altered expression of immunodetected OPN protein bands in P3 tumours may be related to the altered expression of spliceosome proteins in these tumours. Therefore, the results of the present study support the hypothesis that an altered OPN expression in P3/mesenchymal/stem-like subtype CRC could promote invasion and metastasis, being responsible for the poor prognosis and low survival in these patients.
On the other hand, serpins play a key role in the maintenance of cellular homeostasis. They are known to be irreversible suicide inhibitors of proteases, but they can also participate in critical proteolytic pathways such as blood coagulation (SERPINA1, SERPINA5, SERPINA8, SERPINA10), tissue remodelling (SERPINA1, SERPINA3), angiogenesis (SERPINAC1), inflammation, apoptosis and tumour metastasis (SERPINA1, SERPINA3, SERPINA4, SERPINAC1). 52,53 High levels of SERPINA1 are associated with inflammatory bowel disease and CRC progression. 54 These studies reinforce the notion that SERPINA1 is associated with tumour invasion and could be a useful protein marker for CRC diagnosis. In addition, this protein is related to tumour aggressiveness, local spread and capacity to produce metastases. 55 The P3 subtype was further characterized by a marked decrease in the expression of the histone deacetylase 2 protein (HDAC2), compared to the P1 and P2 subtypes. HDACs play an important role in epigenetic regulation of transcription by removing the acetyl group from histones and promoting chromatin compaction. 56 Significantly, recent research indicates that HDAC inhibitors are capable of inducing EMT in colon carcinoma cells. 57 Furthermore, HDAC inhibitors exert immune suppressive effects. 58 Although HDACs repress gene transcription by deacetylating lysine residues of histone proteins, they also remove acetyl groups from nonhistone proteins and modulate their activity. 59 The down-regulated levels of HDAC2 expression in the P3 CRC subtype may therefore contribute to the immunosuppressive mechanisms deployed by these tumours.
In summary, SWATH technology allows different molecular subtypes of CRC to be distinguished. Significantly, differential protein expression has allowed the identification of a subgroup of tumours similar to the mesenchymal/stem-like subtype defined by gene expression signatures. Our results show that these tumours are characterized by alterations in the expression of proteins involved in processes and signalling pathways that are key determinants in the crosstalk between cancer cells and the tumour microenvironment, modulating immune evasion and the metastasis process. This proteomic analysis hence suggests new therapeutic targets for the treatment of this particularly aggressive type of CRC.

ACKNOWLEDGEMENTS
We gratefully acknowledge the financial support of Instituto de Salud Carlos III through the project PI16/01508 granted to AR-A. (Plan Estatal de I+D+I 2013-2016 and co-funded by ISCIII-Subdirección General de Evaluación y Fomento de la Investigación and European Regional Development Fund/European Social Fund, 'Investing in your future'). We are also grateful to the Merck-Health Foundation