Data‐independent acquisition and quantification of extracellular matrix from human lung in chronic inflammation‐associated carcinomas

Abstract Early events associated with chronic inflammation and cancer involve significant remodeling of the extracellular matrix (ECM), which greatly affects its composition and functional properties. Using lung squamous cell carcinoma (LSCC), a chronic inflammation‐associated cancer (CIAC), we optimized a robust proteomic pipeline to discover potential biomarker signatures and protein changes specifically in the stroma. We combined ECM enrichment from fresh human tissues, data‐independent acquisition (DIA) strategies, and stringent statistical processing to analyze “Tumor” and matched adjacent histologically normal (“Matched Normal”) tissues from patients with LSCC. Overall, 1802 protein groups were quantified with at least two unique peptides, and 56% of those proteins were annotated as “extracellular.” Confirming dramatic ECM remodeling during CIAC progression, 529 proteins were significantly altered in the “Tumor” compared to “Matched Normal” tissues. The signature was typified by a coordinated loss of basement membrane proteins and small leucine‐rich proteins. The dramatic increase in the stromal levels of SERPINH1/heat shock protein 47, that was discovered using our ECM proteomic pipeline, was validated by immunohistochemistry (IHC) of “Tumor” and “Matched Normal” tissues, obtained from an independent cohort of LSCC patients. This integrated workflow provided novel insights into ECM remodeling during CIAC progression, and identified potential biomarker signatures and future therapeutic targets.


INTRODUCTION
Chronic inflammation-associated cancers (CIACs) account for one in four cancers worldwide and are responsible for more than 2 million deaths annually [1,2]. Chronic inflammation can be caused by diverse biological, chemical and physical factors [3]. Cigarette smoke is one significant risk factor that can promote lung cancers [3][4][5], including lung squamous cell carcinoma (LSCC) [6]. LSCC, a subtype of non-small cell lung cancer (NSCLC), accounts for about a third of all lung cancers. LSCC arises in the epithelial cells lining the bronchi, and it progresses through squamous metaplasia and dysplasia [7]. Interestingly, at the site of chronic injury, inflammation can favor cell plasticity and lead to a remodeling of the tissue microenvironment by altering stromal and extracellular matrix (ECM) homeostasis, which in turn can promote a malignant fate through poorly understood molecular mechanisms [1]. Additionally, dynamic ECM remodeling can alter not only ECM composition and stiffness, but also initiate a cascade of biochemical and biophysical cues that in turn affect cell signaling. Ultimately, the ECM plays a key role in promoting tumor proliferation, invasion and metastasis [8][9][10], representing a crucial and promising intervention target for therapies: Could "repair" of the ECM be a therapeutic intervention?
To enable in-depth ECM proteome characterization, Naba et al. pioneered the integration of proteomic and bioinformatic datasets to generate a database of ECM and ECM-associated proteins, referred to as the matrisome [11,12]. The core matrisome includes collagens, ECM glycoproteins and proteoglycans, whereas the matrisome-associated proteins comprise ECM-affiliated proteins, ECM regulators, and secreted factors [13].
Advances in MS-based proteomics offer great opportunities to investigate ECM proteome remodeling in cancers and to identify novel protein biomarkers and therapeutic targets. In recent studies aimed at uncovering ECM remodeling in various human cancers, data-dependent acquisition (DDA) label-free quantification approaches were employed to investigate glioblastoma and medulloblastoma [14], as well as gastric antrum adenocarcinoma [15]. DDA-tandem mass tag (TMT)-based quantification workflows were applied to analyze ECM from human pancreatic ductal adenocarcinoma [16] and to investigate metastasis in various mouse models of triple-negative mammary carcinoma [17,18]. Finally, Naba et al. used isobaric tags for relative and absolute quantitation (iTRAQ)-based DDA quantification to analyze pancreatic islet ECM from a mouse model of insulinoma [19]. However, the semi-stochastic sampling and selection of precursor ions for MS/MS in DDA mode, in which the most abundant ions are selected for fragmentation during any given scan cycle, often lead to missing values and reproducibility challenges. For isobaric stable isotope labeling strategies, such as iTRAQ [20] and TMT [21], the labeled samples are typically pooled before DDA-MS acquisitions, implying that all samples are preferably collected and processed simultaneously, which is challenging for actively ongoing human studies, such as with prospec-

Statement of Significance of the Study
The extracellular matrix (ECM) is a complex scaffolding network composed of glycoproteins, proteoglycans and collagens, which binds soluble factors and, most importantly, significantly impacts cell fate and function. Alterations of ECM homeostasis create a microenvironment promoting tumor formation and progression, therefore deciphering molecular details of aberrant ECM remodeling is essential.
Here, we present a multi-laboratory and refined proteomic workflow, featuring i) the prospective collection of tumor and matched histologically normal tissues from patients with lung squamous cell carcinoma (LSCC), ii) the enrichment for ECM proteins, and iii) subsequent label-free data-  [22][23][24].
Alternatively, label-free data-independent acquisition (DIA) strategies performed on high-resolution, accurate-mass instruments represent a powerful tool to quantify ECM proteins across disease stages in prospective clinical cohorts, such as the one studied here. DIA relies on the systematic acquisition of MS/MS spectra for all detectable peptides contained in wide m/z isolation windows [25,26]. Generated DIA MS/MS spectra are then interrogated using dedicated data processing strategies [27,28] that typically rely on tissue-specific spectral libraries [29,30], pan-species spectral libraries [31,32] or library-free workflows, such as DIA-Umpire [33] or DIA-NN [34]. In this study, we are using an algorithm called directDIA, which is embedded within the Spectronaut software (Biognosys). In recent years, considerable efforts have been made to improve software algorithms [34][35][36], enabling in-depth mining of DIA data. DIA provides comprehensive and deep profiling of the proteome with highly reproducible and accurate quantification [37][38][39]. Numerous cancer and pre-clinical studies have been performed using DIA approaches [40,41]. Krasny et al. reported the first application of the DIA/SWATH methodology to profile mouse liver and mouse lung matrisomes, and benchmarked the performance of DIA/SWATH versus DDA [42].
The authors reported that DIA/SWATH achieved 54% more matrisomal protein identification and improved reproducibility performances compared to DDA-based analysis.
In this study, we present an efficient and robust multi-site workflow combining prospective collection of fresh human LSCC "Tumor" and matched adjacent histologically normal ("Matched Normal") tissue specimens from 10 cancer patients, ECM enrichment at UCSF, label-free comprehensive DIA quantification and stringent statistical processing at the Buck Institute, and finally candidate verification by immunofluorescence-based immunohistochemistry (IHC) at UCSF (Figure 1). This workflow was applied to decipher ECM proteome remodeling in LSCC, thus allowing us to gain deeper mechanistic insights into how altered ECM can promote tumorigenesis, and to identify potential ECM targets whose modulation may restore ECM homeostasis and a microenvironment less permissive for malignancy.
[...] Bridge to Life Ltd., Northbrook, IL) to UCSF where they were processed for ECM proteomic analysis as described below. Information about each tissue specimen and patient, referred to as L01, L02, L03, ..., and L10, is provided in Table S1.

Samples for immunohistochemistry validation
Formalin-fixed paraffin-embedded (FFPE) "Tumor" and "Matched Normal" lung specimens used for IHC were provided by CHTN or by the Department of Pathology at UCSF under human subject protocol 10-01532 (UCSF). Information about each tissue specimen is provided in Table S1.
[...] and an anti-actin antibody for cytoskeleton (Sigma, #A5441) (Figure S1). The remaining ECM preparation was stored at -80 °C for further quantitative proteomic analysis.

Solubilization of ECM proteins
The extracted ECM pellets were solubilized by agitation for 10 [...] dodecyl sulfate (LDS) sample buffer (Life Technologies, Carlsbad, CA), followed by sonication for 10 min, and finally heating at 85 °C for 1 h with agitation.
Figure 1 caption (fragment): [...] proteins were solubilized, in-gel digested with Lys-C and trypsin, and extracted proteolytic peptides were de-glycosylated with PNGase F. All resulting samples were analyzed in duplicate on a nanoLC-TripleTOF 6600 system (QqTOF) operated in data-independent acquisition (DIA) mode, and data were processed with Spectronaut (Biognosys). Finally, candidates were validated on independent cohorts by immunofluorescence-based immunohistochemistry by the Tlsty team.

Mass spectrometric analysis
LC-MS/MS analyses were performed on an Eksigent Ultra Plus nano-LC 2D HPLC system (Dublin, CA) combined with a cHiPLC system directly connected to an orthogonal quadrupole time-of-flight (Q-TOF) SCIEX TripleTOF 6600 mass spectrometer (SCIEX, Redwood City, CA). MS2 spectra were collected in "high-sensitivity" mode. The collision energy (CE) for each segment was based on the z = 2+ precursor ion centered within the window with a CE spread of 10 or 15 eV.

DIA data processing with Spectronaut
All DIA data were processed in Spectronaut version 14.10.201222.47784 (Biognosys) using a pan-human library that provides quantitative DIA assays for 10,316 human proteins [31].
Data extraction parameters were selected as dynamic, and non-linear iRT calibration with precision iRT was selected. Identification was performed using a 1% precursor and protein q-value, and iRT profiling was selected. Quantification was based on the MS/MS peak area of the 3-6 best fragment ions per precursor ion; peptide abundances were obtained by summing precursor abundances, and protein abundances by summing peptide abundances. Interference correction was selected and local normalization was applied. Differential protein abundance analysis was performed using a paired t-test, and p-values were corrected for multiple testing, specifically applying group-wise testing corrections using the Storey method [44]. For differential analysis, very stringent criteria were applied: protein groups with at least two unique peptides, q-value ≤ 0.001, and absolute Log2(fold-change) ≥ 0.58 were considered to be significantly altered (Table S3).
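The three significance criteria stated above can be illustrated with a short sketch. This is only an illustration of the thresholds, not the Spectronaut implementation; the function names and the unique-peptide counts in the usage example are hypothetical, while the ratios and q-values are those reported for SERPINH1 and SERPINB6 in this study.

```python
import math

# Thresholds taken from the text; everything else here is a hypothetical layout.
MIN_UNIQUE_PEPTIDES = 2
MAX_Q_VALUE = 0.001
MIN_ABS_LOG2_FC = 0.58  # 2**0.58 is roughly a 1.5-fold change


def is_significant(unique_peptides, q_value, fold_change):
    """Return True if a protein group passes all three criteria."""
    log2_fc = math.log2(fold_change)
    return (
        unique_peptides >= MIN_UNIQUE_PEPTIDES
        and q_value <= MAX_Q_VALUE
        and abs(log2_fc) >= MIN_ABS_LOG2_FC
    )


def classify(unique_peptides, q_value, fold_change):
    """Label a protein group as 'up', 'down', or 'unchanged' (Tumor vs. Matched Normal)."""
    if not is_significant(unique_peptides, q_value, fold_change):
        return "unchanged"
    return "up" if fold_change > 1 else "down"


# Ratios and q-values as reported in the text; peptide counts are assumed.
serpinh1 = classify(unique_peptides=5, q_value=7.84e-15, fold_change=2.35)
serpinb6 = classify(unique_peptides=5, q_value=2.24e-4, fold_change=0.59)
```

Note that the Log2 threshold of 0.58 corresponds to a ratio of about 1.5 for up-regulation and about 0.67 for down-regulation, so a ratio of 0.59 passes the fold-change criterion.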

Bioinformatic analysis
The Pearson coefficients of correlation were determined between the different replicates using the cor() function of the stats package in R (version 4.0.2; RStudio, version 1.3.1093) and the abundances of all 1802 quantifiable protein groups as input. Violin plots were generated using the ggplot2 package [45]. Partial least squares-discriminant analysis (PLS-DA) of the proteomics data was performed using the package mixOmics [46] in R. An over-representation analysis was performed using ConsensusPathDB-human (Release 35, 05.06.2021) [47,48] to determine which gene ontology (GO) terms were significantly enriched. The 327 significantly up-regulated and 202 significantly down-regulated protein groups were used as inputs, and all 1802 quantified protein groups were used as a customized background proteome (Table S3). GO terms identified from the over-representation analysis were subjected to the following filters: q-value <0.01 and term level ≥4. Dot plots were generated using the ggplot2 package [45] in R.
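The replicate-to-replicate correlation step can be sketched as follows. The study used cor() in R; the Python below is an equivalent illustration only, and the function names, run labels, and toy abundance values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long abundance vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(runs):
    """Compare every acquisition with every other one within a condition.

    runs maps a run label to its protein abundances (same protein order in all runs).
    """
    return {
        (a, b): pearson(runs[a], runs[b])
        for a, b in combinations(sorted(runs), 2)
    }


def mean_correlation(runs):
    """Average of all pairwise coefficients within one condition."""
    values = list(pairwise_correlations(runs).values())
    return sum(values) / len(values)


# Toy data only: three runs, four protein groups each.
toy_runs = {"N1": [1, 2, 3, 4], "N2": [2, 4, 6, 8], "N3": [4, 3, 2, 1]}
toy_pairs = pairwise_correlations(toy_runs)
```

The distribution of the values in such a dictionary is what a violin plot per condition would display, and the mean corresponds to the average coefficient reported for each group.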

Efficient proteomic workflow for human lung extracellular matrix
To decipher changes that occur in the ECM of human chronic inflammation-associated lung squamous cell carcinoma, a refined multi-site experimental workflow was implemented, combining: i) fresh human tissue collection immediately after resection and pathol- [...] Table S1. Fresh and never-frozen tissue specimens stored in cold UW solution were sent to UCSF for further ECM enrichment by sequential fractionation based on solubility. The quality of the ECM protein enrichment was assessed using Western blotting assays by examining the abundance of representative proteins for specific cellular compartments/fractions: collagen I for the ECM fraction, β1 integrin for the membrane fraction, heterogeneous nuclear ribonucleoprotein [...] (Table S2). More specifically, each scan cycle was composed of one full range MS scan (m/z 400-1250) and 64 MS/MS scans with isolation windows ranging between 5.9 m/z and 90.9 m/z, with smaller windows in highly populated m/z regions and wider windows in less populated m/z regions [25,37,38]. As a result, DIA MS/MS spectrum complexity is reduced and analyte specificity is increased. Collected DIA data were analyzed using a pan-human spectral library [31]. Although this publicly available pan-human library was [...] [50].
Figure 2 caption (fragment): [...] (C) Average protein abundance relative to the total protein abundance of the matrisomal (colored) and non-matrisomal (grey) protein groups. Abundance is based on the MS/MS peak area of the 3-6 best fragment ions per precursor ion. Protein abundances were obtained by summing peptide/precursor abundances as described in the Methods section. (D) Violin plots of the Pearson coefficients of correlation between the "Matched Normal" or "Tumor" replicates. The Pearson correlation compares all MS acquisitions within one condition to each other (one by one). The filled diamonds represent the average value of the coefficients: 0.76 for the "Matched Normal" group and 0.61 for the "Tumor" group. High heterogeneity of the "Tumor" ECM enrichments across cancer patients (right plot) contrasts with a more homogeneous profile for "Matched Normal" ECM enrichments (left plot).
[...] with at least two unique peptides at 1% false discovery rate (FDR) (Table S3A). The median protein abundance (based on peak area) spans 4.95 orders of magnitude over the entire dataset (Figure S2B).
One powerful and unique aspect of this overall project is the prospective recruitment of patients, resulting in a continuous tissue collection, and subsequent proteomic analysis of tissue specimens.
Using a label-free DIA-MS strategy is a major advantage, as it offers the flexibility required to prepare and acquire the tissue samples independently and without any sample pooling (as is necessary for isobaric labeling strategies). To account for technical variability, an efficient normalization method based on an RT-dependent local regression model [49] was applied (Figure S2C-D). Briefly, assuming that the systematic bias is not linearly related to peptide abundances, the locally weighted scatterplot smoothing (LOWESS) algorithm is applied to perform a linear least squares regression on localized subsets of peptides (this algorithm is implemented in Spectronaut). [...] (Table S3A).
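To make the idea of a local regression normalization more concrete, the sketch below fits a locally weighted linear regression of the per-peptide bias (run minus reference, on a log2 scale) against retention time and subtracts the fitted trend. It is a simplified illustration under stated assumptions, not the Spectronaut implementation: the function names, the tricube weighting, the `frac` parameter, the use of a reference profile, and the absence of robustness iterations are all choices made here for illustration.

```python
def lowess_fit(x, y, frac=0.5):
    """Locally weighted linear regression (tricube weights, no robustness iterations).

    For each x0, a weighted least squares line is fitted to the nearest
    frac * n points and evaluated at x0.
    """
    n = len(x)
    k = max(2, int(round(frac * n)))
    fitted = []
    for x0 in x:
        # Bandwidth: distance to the k-th nearest point (x0 itself included).
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def normalize_run(rt, log2_intensity, log2_reference, frac=0.5):
    """Remove the RT-dependent bias of one run relative to a reference profile."""
    bias = [a - b for a, b in zip(log2_intensity, log2_reference)]
    trend = lowess_fit(rt, bias, frac)
    return [a - t for a, t in zip(log2_intensity, trend)]


# Toy example: a run whose bias drifts linearly with retention time.
toy_rt = [float(i) for i in range(10)]
toy_reference = [20.0, 21.5, 19.0, 22.0, 20.5, 18.0, 23.0, 21.0, 19.5, 22.5]
toy_run = [r + 0.1 * t + 1.0 for r, t in zip(toy_reference, toy_rt)]
toy_normalized = normalize_run(toy_rt, toy_run, toy_reference)
```

In the toy example the drift is exactly linear, so the local fits recover it and the normalized values coincide with the reference; with real data the local fit follows smooth, non-linear drifts instead.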
Strikingly, although matrisomal proteins represented 9% of all quantified protein groups in the dataset, their peak area-based abundance accounted for 51% of the total protein abundance in the "Matched Normal" group and 22% in the "Tumor" group (Figure 2C). Indeed, over 50% of the quantified matrisomal protein groups (84/162) were in the most abundant quartile of protein groups, with, for instance, collagen alpha-1(VI) chain, collagen alpha-2(VI) chain, collagen alpha-3(VI) chain, fibronectin, and vitronectin among the 10 most abundant protein groups (Table S2B). This highlights the efficient enrichment for ECM proteins achieved using this workflow. Figure 2D displays violin plots of the Pearson coefficients of correlation between the different replicates of each of the "Tumor" and "Matched Normal" groups. The increased variability in the "Tumor" group compared to the "Matched Normal" group indicates that ECM enrichments from tumors are biologically highly heterogeneous across cancer patients, while the ECM from the matched histologically normal lung tissues appeared much more homogeneous across individuals.

Human lung squamous cell carcinoma features ECM remodeling
A closer look at the quantitative DIA-MS results showed that the "Tumor" and "Matched Normal" groups were clearly distinct and could be separated using a supervised clustering analysis by partial least squares-discriminant analysis (PLS-DA) (Figure 3A). Notably, distinct clustering was observed for specimens collected from both male and female patients (Figure S3). To explore the remodeling of ECM associated with LSCC, very stringent significance thresholds, specifically q-value ≤0.001 and absolute Log2(fold-change) ≥0.58, were applied.
The differential analysis of all 1802 protein groups resulted in 529 significantly changing proteins comparing "Tumor" to "Matched Normal" samples. Specifically, this analysis revealed 327 significantly up-regulated protein groups and 202 significantly down-regulated protein groups (in "Tumor" vs. "Matched Normal") as shown in Figure 3B and Table S3B. Among the significantly changing proteins, 49 protein groups are well-known components of the core matrisome: 12 collagens, 29 ECM glycoproteins, and 8 proteoglycans, whereas 17 protein groups are matrisome-associated proteins: 4 ECM-affiliated proteins, 10 ECM regulators, and 3 secreted factors [50] (Figure 3C; Figure S4; Table S3B).
Interestingly, ECM proteins down-regulated in "Tumor" vs. [...]
Of these significantly up-regulated protein groups, several protein candidates could potentially be highly relevant in the context of cancer and disease progression. For example, tenascin-C, which showed a 4.16-fold increase in "Tumor" vs. "Matched Normal" with q-value = 2.57e-69 (Figure 3D), is a glycoprotein and member of the tenascin family. Tenascin-C is barely expressed in adult tissues, except in specific niches, such as at inflammation sites and in the stroma of solid tumors, where it is highly abundant [51]. In NSCLC, tenascin-C may participate in tumor immune evasion, progression, and recurrence via a mechanism involving the inhibition of tumor-infiltrating lymphocyte proliferation and interferon-γ secretion [52].
Additionally, annexin A1, up-regulated by a factor of 4.06 in "Tumor" vs. "Matched Normal" with q-value = 5.27e-7 (Figure 3D), is a member of the Ca2+-regulated phospholipid-binding protein superfamily, involved in various cellular processes, such as inflammation, proliferation regulation, apoptosis, and tumorigenesis [53]. Notably, annexin A1 appears to be a prognostic factor for longer overall survival in LSCC by suppressing metastasis, but not cancer cell proliferation [54].
Tumor ECM is known to be mechanically stiffer and to exhibit higher tension than the ECM of healthy tissues. Collagens, whose organization relies on sophisticated crosslinking networks, largely contribute to this phenomenon [8], suggesting that COL1A1, here up-regulated by a factor of 3.25 in "Tumor" (q-value = 8.81e-6) (Figure 3D), could participate in ECM stiffening. Moreover, COL1A1 is associated with hypoxia in NSCLC [55]. This protein was also reported to correlate with late LSCC progression, and it appears to be a potential biomarker of metastasis to lymph nodes [56], poor prognosis and chemoresistance [57] in LSCC.
Periostin was also significantly up-regulated (3.86-fold) in "Tumor" vs. "Matched Normal" with q-value = 2.99e-64 (Figure 3D). Periostin is a key player in ECM structure and organization, particularly for collagen fibrillogenesis, and it interacts with other proteins, such as integrins, fibronectin, and tenascin [58,59]. This protein is primarily expressed by cancer-associated fibroblasts (CAFs), located in the stromal microenvironment, and has been implicated in LSCC progression, tumor cell proliferation and migration [58,60]. Specifically, Ratajczak-Wielgomas et al. reported that periostin in LSCC cancer cells could modulate the expression of the proteinase MMP-2, which may further regulate tumor cell invasion, and that periostin expression correlates with the incidence of lymph node metastases [61]. Moreover, the authors showed a putative interaction between cancer cells and stromal CAFs, which could promote cell invasion. Finally, periostin is associated with poor prognosis and tumor grade [58,60], and correlates with the incidence of lymph node metastasis [60,61].
To determine the biological processes altered in LSCC, an over-representation analysis was performed with the ConsensusPathDB database [47,48]. The gene ontology (GO) analysis revealed that processes related to lipid storage and localization, lipopolysaccharide-mediated signaling pathway, protein-containing complex remodeling, ECM organization, and endoderm development were down-regulated (Figure 3E) in the "Tumor" vs. "Matched Normal" comparison. In contrast, processes related to glucose-6-phosphate metabolism, glycolysis, ATP generation, chaperone cofactor-dependent protein folding, and ribosome assembly were up-regulated (Figure 3F). The alterations of these biological processes clearly revealed aberrant ECM remodeling as well as alterations of metabolism in tumor cells. More specifically, [...] (Table S3) sugar metabolism is reprogrammed in cancer cells as characterized by an exacerbated glucose uptake and a strong increase in lactate production, a phenomenon known as "the Warburg Effect" [62], which may further impact the tumor microenvironment by favoring cell invasion and immunotolerance [63]. Down-regulation of lipid-related processes could be linked to alterations in the plasma membrane organization and/or of the lipid metabolism [64]. Proteins associated with such biological processes include apolipoprotein A-I (APOA1), a component of high-density lipoproteins involved in the transport of cholesterol, required for tumor cell viability, which in turn might promote tumor progression [65]. Another example is caveolin-1 (CAV1), which can undergo autophagic degradation in CAFs to protect adjacent epithelial tumor cells against apoptosis.
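The logic of an over-representation analysis against a customized background can be illustrated with a one-sided hypergeometric test, a common choice for this kind of enrichment question. Whether this matches the exact procedure of ConsensusPathDB is an assumption, and the function name and the term counts in the example are hypothetical; only the background size (1802) and the number of up-regulated protein groups (327) come from this study.

```python
from math import comb


def hypergeom_enrichment_p(background_size, term_in_background, query_size, term_in_query):
    """One-sided p-value: probability of seeing at least term_in_query annotated
    proteins when query_size proteins are drawn from the background at random."""
    total = comb(background_size, query_size)
    upper = min(term_in_background, query_size)
    tail = sum(
        comb(term_in_background, k)
        * comb(background_size - term_in_background, query_size - k)
        for k in range(term_in_query, upper + 1)
    )
    return tail / total


# Background and query sizes from the study; the GO term counts are made up.
p_example = hypergeom_enrichment_p(
    background_size=1802,      # all quantified protein groups
    term_in_background=40,     # hypothetical: background proteins annotated to a GO term
    query_size=327,            # significantly up-regulated protein groups
    term_in_query=20,          # hypothetical: up-regulated proteins annotated to the term
)
```

Using the 1802 quantified protein groups as background, rather than the whole proteome, means a term is only called enriched if it is over-represented relative to what the ECM-enriched workflow could detect in the first place. The resulting p-values would still need multiple-testing correction before applying a q-value filter.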

Changes in basement membrane proteins, small leucine-rich proteins, serpins, desmosomal proteins and keratins
To decipher how ECM is remodeled in LSCC, the dataset was investigated by focusing on proteins involved in ECM structure and organization, as well as on specific protein families. Figure 4 and Figure S5 display heatmaps of the "Tumor" vs. "Matched Normal" significant fold-changes for the ECM abundance of basement membrane proteins, small leucine-rich proteins (SLRPs), serpins, desmosomal proteins, and keratins measured for each patient. Interestingly, very robust and strong signatures were observed for each of the patients.
SLRPs represent a subgroup of proteoglycans, which is divided into four classes based on gene and protein homology: ASPN, BGN, and DCN belong to class I, LUM and PRELP to class II, and OGN to class III [69]. SLRPs are involved in various processes including ECM assembly regulation, collagen fibrillogenesis, sequestration of growth factors, cell-matrix interactions, and cell behaviors by interacting with plasma membrane receptors, such as toll-like receptors, tyrosine kinase receptors, and other matrisomal factors [70]. For example, decorin acts as a tumor suppressor by limiting tumor growth, angiogenesis, tumor cell mitophagy, and regulating the immune and inflammatory response [71]. While decorin and biglycan are the most closely related SLRPs, biglycan shows opposite activities by promoting inflammation, angiogenesis, tumor cell proliferation, migration, and metastasis, although tumor suppressive effects of this SLRP were also reported [71,72]. Lumican binds to collagens to prevent degradation by proteinases, such as matrix metalloproteinases (MMPs), and is observed to have both pro- and anti-tumoral properties by regulating cell proliferation and invasion [72,73]. Understanding the effects and interplay of these different SLRPs, which show significantly lower abundance in LSCC "Tumor" vs. "Matched Normal", may thus be of high relevance.
Strikingly, the overall loss of the core matrisome and matrisomeassociated proteins in the "Tumor" vs. "Matched Normal" stroma, as illustrated for the basement membrane proteins and SLRPs, explains the >2-fold decrease of the matrisomal protein abundance relative to the total protein abundance in the "Tumor" group compared to the "Matched Normal" group ( Figure 2C).
In our study, the observed alterations in ECM protein abundance for serine protease inhibitors (serpins) in LSCC differed between family members: serpin family B member 6 (SERPINB6) was significantly down-regulated in "Tumor" vs. "Matched Normal" (ratio = 0.59 with q-value = 2.24e-4), whereas both serpin family B member 5/maspin (SERPINB5) and SERPINH1 (also referred to as heat shock protein 47, HSP47) were significantly up-regulated in "Tumor" vs. "Matched Normal" (SERPINB5: ratio = 1.91 with q-value = 8.20e-9; SERPINH1: ratio = 2.35 with q-value = 7.84e-15) (Figure 4; Figure S5). SERPINB6 interacts with cathepsin G in monocytes and granulocytes to inhibit this inflammation-related protein and interacts with other trypsin-like proteases as well [74]. SERPINB5 has a tumor suppressive activity [74]. In LSCC, this protein may be associated with cancer development by regulating the p53 signaling pathway [75]. SERPINH1 is an endoplasmic reticulum protein with chaperone activity which ensures proper folding and ultimately conformation of type I procollagen trimer [76]. SERPINH1 is dysregulated in a large number of cancers and might play a role in tumor immunity [77]. For instance, in breast cancer, Hsp47/SERPINH1 is a key player in cancer progression by promoting the secretion and deposition of ECM proteins, for example, collagens and fibronectin [78], as well as in metastasis by regulating the cancer cell-platelet interaction via a collagen-dependent mechanism [79]. The observed global up-regulation of keratins in this study highlights the strong keratinization process observed during LSCC, as also previously reported in laser micro-dissected tumor cells [80]. Interestingly, keratinization might be associated with smoking, a risk factor of lung CIAC, and keratinization correlates with poor clinical outcome in LSCC [81].
In addition, a coordinated, significant gain of desmosomal proteins was observed in "Tumor" vs. "Matched Normal", with the up-regulation of desmoglein-2 (DSG2), junction plakoglobin (JUP), desmoplakin (DSP) and plakophilin-3 (PKP3) (Figure 4; Table S3). Furthermore, plakophilin-1 and 2 (PKP1, PKP2) and desmocollin-2 (DSC2) were also up-regulated, however with slightly lower fold-change or just above [...] DSG3 being associated with poor prognosis [82]. In addition, PKP1 overexpression contributes to cell proliferation and survival in LSCC by positively regulating MYC translation [83]. Our study, presented here, highlights the relevance of the desmosomal protein assembly as part of LSCC.
Figure 5 caption (fragment): [...] histologically normal lung tissues from two of these cases and four additional histologically normal lung tissue specimens adjacent to lung cancers were probed for SERPINH1 level by immunofluorescence-based immunohistochemistry (IHC). (B) Left: SERPINH1 level was quantified based on the percentage of area with positive (FITC; green) staining in five independent images per specimen (pixels with positive staining above baseline threshold/total number of pixels per image). An example of pseudo-colorized positive area (pink) is shown for a matched set of specimens. Right: Plot corresponding to averaged values of positive staining of five images for each of 12 human specimens (six "Matched Normal" and six "Tumor"). Statistical analysis was carried out as described in the Methods section. Magnification: 20×.
Altogether, the significant and robust changes in the ECM of tumor tissues revealed a conserved signature in LSCC characterized by the concomitant loss of basement membrane proteins and SLRPs and the increase in SERPINH1, as well as other significant changes relative to keratin and desmosome protein family members.

SERPINH1 ECM levels are dramatically increased in LSCC
SERPINH1/Hsp47 has been described as a collagen-specific molecular chaperone that is essential for procollagen folding and function, and subsequently for collagen network formation. In order to validate the highly significant up-regulation of SERPINH1 that was observed in LSCC "Tumor" vs. "Matched Normal" by the mass spectrometric ECM analysis (2.35-fold increase; q-value of 7.84e-15), we employed an orthogonal method relying on IHC. Immunofluorescence-based IHC was conducted on an independent cohort of patients with LSCC from CHTN and UCSF. Samples for IHC were tumor tissues from six cancer patients with LSCC, two matched normal lung tissues from two of the cancer patient cases, and four additional histologically normal lung tissue specimens adjacent to lung cancers (Table S1). Figure 5A displays representative IHC images, demonstrating the strong up-regulation of SERPINH1 in cancer stroma in LSCC tissues, while it was barely detected in "Matched Normal" tissues. Quantification was based on the percentage of area with positive FITC staining in five independent images for each specimen (Figure 5B). The quantitative analysis of stained tissue sections confirmed the dramatic up-regulation of SERPINH1 in LSCC "Tumor" tissues compared to "Matched Normal" tissues. A mean of 0.39% SERPINH1-positive area was obtained in the six "Matched Normal" specimens; SERPINH1 increased 10.36-fold (p-value = 0.0116) to 4.05% SERPINH1-positive area in the six "Tumor" specimens. While SERPINH1 was previously reported as a factor influencing tumor immunity and metastasis [77][78][79], further investigations will be performed to determine the specific biological significance of SERPINH1 in LSCC, and more broadly in multiple other CIAC tumor types. SERPINH1/Hsp47 was one of our first biological targets that we validated in an independent human lung cancer cohort.
The proteomics pipeline provided several other highly promising candidate proteins, which will be followed up in future studies with additional experiments to validate their role in cancer.
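The area-based IHC quantification described above (pixels with positive staining above a baseline threshold divided by the total number of pixels, averaged over the images of a specimen) can be sketched as follows. The function names, the threshold value, and the toy image are hypothetical; the actual image analysis software and threshold used in the study are not specified here.

```python
def percent_positive_area(image, threshold):
    """Percentage of pixels whose intensity exceeds the baseline threshold.

    image is a 2D list of pixel intensities for one channel (e.g., FITC).
    """
    total = sum(len(row) for row in image)
    positive = sum(1 for row in image for px in row if px > threshold)
    return 100.0 * positive / total


def specimen_score(images, threshold):
    """Average the positive-area percentage over all images of one specimen."""
    scores = [percent_positive_area(img, threshold) for img in images]
    return sum(scores) / len(scores)


def fold_change(tumor_mean, normal_mean):
    """Ratio of mean positive area in 'Tumor' vs. 'Matched Normal' specimens."""
    return tumor_mean / normal_mean


# Toy 2 x 2 image with an assumed threshold of 40 intensity units.
toy_image = [[0, 10], [200, 50]]
toy_percent = percent_positive_area(toy_image, threshold=40)

# Group means as reported in the text (rounded to two decimals).
reported_ratio = fold_change(4.05, 0.39)
```

With the rounded group means of 4.05% and 0.39%, the ratio is about 10.4, in line with the reported 10.36-fold increase, which was presumably computed from unrounded values.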

CONCLUDING REMARKS
In summary, our work presents an efficient, robust, and multi-laboratory proteomic workflow to gain in-depth insights into ECM remodeling in LSCC by prospectively collecting fresh tumor and patient-matched histologically normal tissue adjacent to tumor from patients, enriching for insoluble ECM components, performing a refined ECM protein solubilization, and applying comprehensive label-free DIA quantification and stringent statistical filtering. The unbiased DIA strategy offers the possibility to capture highly confident and robust protein changes, overcoming the biological individual-to-individual variability, while providing the flexibility required for prospective studies and sample collection scheduling. It is worth noting that this ECM proteomic workflow could be applied to any type of cancer or, more globally, to other diseases of interest. Although this study was conducted on cohorts with a limited number of patients, namely 10 patients for the discovery ECM proteomic analysis, it suggests that potential protein candidates can be further assessed and validated by orthogonal methods, such as Western blotting and IHC assays, on independent cohorts, thus generalizing the observations. In this study, the application of immunofluorescence-based IHC confirmed in a second cohort the dramatic increase in Hsp47/SERPINH1 abundance identified by MS in the tumor stromal microenvironment of a first cohort of LSCC patients. In addition, the application of a label-free global DIA strategy for the discovery step is an asset for the easy and efficient development of targeted parallel reaction monitoring (PRM) assays on similar MS platforms as a validation step and for further translation into true clinical cohort measurements.
Moreover, as small amounts of material are needed for the presented workflow, MS-based proteomics can be further integrated with additional -omics technologies, such as (epi)genomics, transcriptomics, and CODEX, performed on the same tissue specimens in order to achieve a multifaceted tissue assessment. The combination of this compelling approach with regular discussions between surgeons, pathologists, cancer biologists and -omics scientists represents a cornerstone to formulate new hypotheses, and thus to gain deeper mechanistic insights into the continuum of disease processes and identify novel and promising stromal-targeted therapies.