The collagen landscape in cancer: profiling collagens in tumors and in circulation reveals novel markers of cancer‐associated fibroblast subtypes

Cancer‐associated fibroblasts (CAFs) deposit and remodel collagens in the tumor stroma, impacting cancer progression and efficacy of interventions. CAFs are the focus of new therapeutics with the aim of normalizing the tumor microenvironment. To do this, a better understanding of CAF heterogeneity and collagen composition in cancer is needed. In this study, we sought to profile the expression of collagens at multiple levels with the goal of identifying cancer biomarkers. We investigated the collagen expression pattern in various cell types and CAF subtypes in a publicly available single‐cell RNA sequencing (RNA‐seq) dataset of pancreatic ductal adenocarcinoma (PDAC). Next, we investigated the collagen expression profile in tumor samples across cancer types from The Cancer Genome Atlas (TCGA) database and evaluated if specific patterns of collagen expression were associated with prognosis. Finally, we profiled circulating collagen peptides using a panel of immunoassays to measure collagen fragments in the serum of cancer patients. We found that pancreatic stellate cells and fibroblasts were the primary producers of collagens in the pancreas. COL1A1, COL3A1, COL5A1, and COL6A1 were expressed in all CAF subtypes, whereas COL8A1, COL10A1, COL11A1, and COL12A1 were specific to myofibroblast CAFs (myCAF) and COL14A1 was specific to inflammatory CAFs (iCAF). In the TCGA database, the myCAF collagens COL10A1 and COL11A1 were elevated across solid tumor types, and multiple associations between high expression and worse survival were found. Finally, circulating collagen biomarkers were elevated in the serum of patients with cancer relative to healthy controls, with COL11A1 (myCAF) having the best diagnostic accuracy of the markers measured. In conclusion, CAFs express a noncanonical collagen profile with specific collagen subtypes associated with iCAFs and myCAFs in PDAC. These collagens are deregulated at the cellular, tumor, and systemic levels across different solid tumors and associate with survival. These findings could lead to new discoveries such as novel biomarkers and therapeutic targets. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.


Introduction
The tumor microenvironment (TME) has a key role in cancer development and therapeutic response [1]. The TME comprises cancer cells, a diverse group of immune cells, and stromal cells including fibroblasts. Cancer-associated fibroblasts (CAFs) are activated fibroblasts that produce and alter the extracellular matrix (ECM) and affect cancer progression and therapeutic response. Therefore, excessive ECM and CAFs are the focus of new therapeutics [2]. However, indiscriminate depletion of ECM or CAFs has proven to not provide clinical benefit [3]; instead, normalizing the ECM and CAF activity seems promising [4]. To successfully normalize CAFs and the ECM they produce, we need a better understanding of CAF heterogeneity and accompanying ECM composition. An expanding list of CAF subtypes is emerging, with notable examples being myofibroblast CAFs (myCAFs), inflammatory CAFs (iCAFs) [5], and antigen-presenting CAFs (apCAFs) [6]. Furthermore, in the liver and pancreas, so-called stellate cells are thought to be activated to produce tumor-promoting ECM and adopt a myofibroblast phenotype [4]. Recent advances in single-cell RNA-sequencing (RNA-seq) technologies allow for the exploration of this CAF heterogeneity at the single-cell level [6,7], yet our understanding of CAFs, and in particular the specific ECM they produce, remains lacking. This gap in knowledge is important to address because it can lead to novel biomarkers and therapies.
The ECM is the noncellular part of tissues and can roughly be categorized into two compartments: the basement membrane, which separates the epithelial or endothelial layer of cells from the stroma, and the interstitial matrix, which provides structural support and maintains tissue architecture [8]. The ECM comprises a wide variety of macromolecules with diverse functions, including collagens, elastin, laminin, and a variety of proteoglycans. Of all the ECM components, the collagens are the most abundant [8]. The cancer-promoting activity of CAFs is thought to stem from their excessive production of collagens. However, studies on the role of type I collagen have proven contradictory since type I collagen exhibits both cancer-promoting activity [9,10] and cancer-restraining activity [11,12]. The processing of type I collagen also seems to be important in this context: type I collagen is cancer-promoting if degraded and internalized [13,14] or if it is secreted as a homotrimer by cancer cells [15]. Historically, most of the research into the role of collagen in cancer has focused on type I collagen. Although the role of type I collagen is important to address, the collagen family of proteins goes beyond the well-known and abundant collagens. This narrow focus can in part be explained by the lack of good analytical tools to measure other collagens. As one of the first research groups in the world, we have published research developing tools to detect and explore the role of other collagens in cancer as well as their potential as biomarkers [16-18]. The collagen family has 28 protein members and can be categorized into subfamilies, including fibrillar, fibril-associated, membrane-associated, basement membrane, networking collagens, and more (supplementary material, Table S1) [19]. The collagenous ECM is extensively remodeled in cancer by deregulated expression patterns leading to abnormal collagen formation, altered turnover, and degradation of collagens, as well as extensive post-translational modification [20]. This deregulation can impact cancer progression and therapy response [21], in part because the collagenous ECM is a reservoir for protumor cytokines such as transforming growth factor beta (TGFβ), which is released as a result of ECM remodeling [8]. A byproduct of this process is the release of detectable collagen fragments into circulation. These highly specific fragments can be assessed as cancer biomarkers prognostic for overall survival (OS) and predictive of therapy response [22-25]. Despite these observations, most of the collagen family members remain unexplored as markers of cancer and CAFs. Therefore, by taking a more complete view of the collagen family in cancer we may discover novel biomarkers and novel targets of therapy directed against the collagens and fibroblast subtypes.
In this study, we sought to profile the collagen family at multiple levels with the goal of identifying cancer biomarkers. Because pancreatic cancer is known to be one of the most collagen-rich cancers, we started our investigation by analyzing collagen gene expression in a publicly available single-cell RNA-seq dataset of pancreatic ductal adenocarcinoma (PDAC) patients [26]. Here, we identified subtypes of CAFs and their collagen profile, which we confirmed in a public non-small cell lung cancer (NSCLC) dataset [27]. To investigate whether these CAF-associated collagens were associated with patient outcome in PDAC and other cancers, we profiled the collagen gene expression in tumor samples from The Cancer Genome Atlas (TCGA) project [28] and evaluated the association with survival outcomes. Using data from the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium (CPTAC) project [29], we also confirmed that collagen gene expression generally correlates with protein levels in PDAC tumors. Lastly, we investigated whether specific collagens and collagen fragments were also released to the circulation as a function of cancer by measuring a panel of collagen peptides by ELISAs in serum samples from patients with various solid tumor types, including PDAC. These assays collectively target 15 collagen biomarkers, each corresponding to a processing event for a collagen subtype. Overall, we found that collagens are deregulated at the cell, tumor, and systemic (serum) levels. Further, specific collagen subtypes were associated with CAF subtypes and were prognostic for survival in a range of solid tumors. Lastly, fragments of these collagens were released to the circulation where they could be quantified, demonstrating potential as cancer biomarkers reflecting CAF activity.

Single-cell RNA-Seq data
We analyzed single-cell RNA-Seq data from Peng and colleagues [26] that can be found in the Genome Sequence Archive (https://ngdc.cncb.ac.cn/gsa/) under the accession CRA001160. As outlined in the original paper by Peng and colleagues, samples were collected from 24 patients with PDAC prior to treatment at the time of surgery and from 11 non-PDAC patients, 3 of whom were patients with bile duct or duodenal tumors and 8 of whom were patients with benign pancreatic tumors, e.g. pancreatic cysts [26]. We downloaded gene and barcode matrices and used the Seurat V4 [30] R-package for the analysis. We only included cells annotated with a cell type, as defined by Peng and colleagues [26], which excluded cells with under 200 genes per cell, genes with under 3 cells per gene, and cells with over 10% mitochondrial genes. Using collagen genes as features, we used the 'FindAllMarkers' function of Seurat to find collagens associated with cell types. We used the 'FindMarkers' function to test differential expression of collagens between cell types of PDAC and the non-PDAC pancreas samples. This function ranks genes based on non-parametric Wilcoxon rank sum tests between the two cell types. For the clustering of fibroblasts, we included only cells annotated as 'Fibroblast cell' or 'Stellate cell.' Using the 2,000 most variable genes, we found neighbors and clusters using the first 15 principal components at a resolution of 0.25 and visualized the clustering using a graph-based Uniform Manifold Approximation and Projection (UMAP) plot. To label the clusters, we used the markers defined by Wang and colleagues [7], including markers for myCAF, iCAF, apCAF, metabolic CAF (meCAF), and stellate-like CAFs. For the clusters where the marker signature was inconclusive, we performed Gene Set Enrichment Analysis (GSEA) to identify pathways enriched in the cluster. We used the 'FindAllMarkers' and 'FindMarkers' functions to test differential expression of collagens in the identified CAF subtypes. We analyzed NSCLC single-cell RNA-seq data from Prazanowska and Lim [27], which includes data from seven separate datasets comprising a total of 185 NSCLC tumor samples. This integrated dataset excluded cells with under 200 or over 3,000 genes per cell and cells with over 20% mitochondrial genes. We used the 'CAF' subset of the already processed version of the integrated dataset and mapped the cells to the PDAC reference dataset described above using the 'FindTransferAnchors' and 'MapQuery' functions of Seurat, which use an anchor-based methodology for transferring cell annotations.
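The cell and gene inclusion thresholds described above can be illustrated with a minimal sketch. It is written in plain Python rather than with the Seurat R-package used in the study, and it is not the authors' code. The function name, the toy barcodes, and the use of an 'MT-' prefix to recognize mitochondrial genes are assumptions made for illustration, and the order in which the filters are applied is simplified.

```python
def filter_cells_and_genes(counts, min_genes=200, min_cells=3, max_mito_fraction=0.10):
    """Apply simple quality filters to a count table.

    counts: dict mapping cell barcode -> dict mapping gene symbol -> count.
    Mitochondrial genes are assumed (hypothetically) to start with "MT-".
    """
    # Number of cells in which each gene is detected (count > 0).
    cells_per_gene = {}
    for profile in counts.values():
        for gene, n in profile.items():
            if n > 0:
                cells_per_gene[gene] = cells_per_gene.get(gene, 0) + 1
    kept_genes = {g for g, k in cells_per_gene.items() if k >= min_cells}

    kept_cells = {}
    for barcode, profile in counts.items():
        detected = sum(1 for n in profile.values() if n > 0)
        total = sum(profile.values())
        mito = sum(n for g, n in profile.items() if g.startswith("MT-"))
        if detected < min_genes or total == 0:
            continue  # too few detected genes
        if mito / total > max_mito_fraction:
            continue  # too large a share of mitochondrial counts
        kept_cells[barcode] = {g: n for g, n in profile.items() if g in kept_genes and n > 0}
    return kept_cells


# Toy example with lowered thresholds so the effect is visible on four cells.
toy = {
    "c1": {"COL1A1": 5, "COL3A1": 2, "MT-CO1": 0},
    "c2": {"COL1A1": 1, "COL3A1": 1, "MT-CO1": 8},
    "c3": {"COL1A1": 3},
    "c4": {"COL1A1": 2, "COL3A1": 4, "ACTA2": 1, "MT-CO1": 0},
}
kept = filter_cells_and_genes(toy, min_genes=2, min_cells=2, max_mito_fraction=0.10)
print(sorted(kept))
```

With the thresholds stated in the text, the defaults (200 genes per cell, 3 cells per gene, 10% mitochondrial counts) would be used instead of the lowered toy values.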

Bulk RNA-Seq data
We analyzed publicly available RNA-sequencing data from TCGA (https://portal.gdc.cancer.gov/) [28] and Genotype-Tissue Expression (GTEx) (https://gtexportal.org/home/) [31] projects. We extracted data from the University of California Santa Cruz (UCSC) Xena Toil hub [32] (https://toil.xenahubs.net/) using the UCSCXenaTools R-package [33] (accessed 5 March 2022). We used data from the 'TCGA TARGET GTEx' cohort and downloaded the 'TcgaTargetGtex_gene_expected_count,' 'TCGA_GTEX_category,' 'TCGA_survival_data,' and 'TcgaTargetGTEX_phenotype' datasets. Our sample subset included 'Normal Tissue' and 'Primary Tumor' samples from the GTEx and TCGA projects, respectively. The sample subset is summarized in supplementary material, Table S2, and included 1,884 samples from normal tissue and 5,281 samples from primary tumors from 10 different tissues. The log-transformed expected counts were back-transformed to counts. Sample outliers, lowly expressed genes, and non-protein-coding genes were filtered out. Count data were upper-quartile-normalized. The final DGEList object contained 7,165 samples and 16,055 genes. The Limma [34] workflow, including voom transformation and linear modeling, was used to assess differential expression of all genes between normal and tumor samples. The resulting log2 fold-change and Benjamini-Hochberg adjusted p values for the collagen genes were summarized into a heatmap.
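The back-transformation and normalization steps can be sketched as follows. This is an illustrative plain-Python sketch, not the R workflow used in the study. It assumes that the expected counts are stored as log2(count + 1), and it uses a simplified upper-quartile scaling (each sample is scaled so that its 75th percentile of non-zero counts matches the mean across samples), which is not the exact procedure of any particular R package. All function names are hypothetical.

```python
import statistics


def back_transform(log_counts, pseudocount=1.0):
    """Invert an assumed log2(count + pseudocount) transformation."""
    return [max(2.0 ** v - pseudocount, 0.0) for v in log_counts]


def upper_quartile(values):
    """75th percentile of the non-zero values of one sample."""
    nonzero = sorted(v for v in values if v > 0)
    # quantiles(n=4) returns the three quartile cut points; index 2 is Q3.
    return statistics.quantiles(nonzero, n=4, method="inclusive")[2]


def upper_quartile_normalize(samples):
    """Scale each sample so that all samples share the same upper quartile.

    samples: dict mapping sample name -> list of counts (same gene order).
    """
    uq = {name: upper_quartile(counts) for name, counts in samples.items()}
    target = statistics.mean(uq.values())
    return {
        name: [c * target / uq[name] for c in counts]
        for name, counts in samples.items()
    }


# Toy example: sample "b" was sequenced twice as deeply as sample "a".
raw = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10]}
normalized = upper_quartile_normalize(raw)
print(normalized["a"])
print(normalized["b"])
```

After scaling, the two toy samples become directly comparable because the difference in sequencing depth has been removed.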
The proportion or abundance of CAFs in a tumor sample was estimated using two separate tools, namely EPIC [35] and Microenvironment Cell Populations-counter (MCP-counter) [36], and collagen gene expression was correlated with CAF abundance using Pearson correlations. We also analyzed proteomics data from the CPTAC Pancreatic Ductal Adenocarcinoma Discovery Study, which includes processed data on both protein and mRNA abundance. The dataset included 140 tumor samples from patients with PDAC. We accessed the dataset from the LinkedOmics [37] website (https://www.linkedomics.org/data_download/CPTAC-PDAC/) and used the RSEM upper-quartile-normalized, gene-level mRNA data [Illumina HiSeq platform (San Diego, CA, USA)] and the gene-level, median-normalized intensity protein data. We correlated gene and protein levels using Pearson correlations and summarized them in a heatmap.
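As an illustration of the correlation step, the sketch below computes a Pearson correlation coefficient per gene between paired mRNA and protein measurements. It is plain Python, not the code used in the study, and the gene values shown are invented toy numbers rather than data from the CPTAC cohort.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / math.sqrt(sxx * syy)


# Toy paired measurements per gene: (mRNA levels, protein levels) across samples.
paired = {
    "COL11A1": ([1.0, 2.0, 3.0, 4.0], [0.9, 2.1, 2.9, 4.2]),
    "COL22A1": ([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 2.5, 1.5]),
}
for gene, (mrna, protein) in paired.items():
    print(gene, round(pearson(mrna, protein), 2))
```

The same function could be applied to collagen expression versus an estimated CAF abundance score, one tumor type at a time.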
For the survival analysis, the cancers with few mortality events (<50 OS events) were excluded from the analysis (supplementary material, Table S3). The voom-transformed expression of the collagens was transformed to Z-scores and used in multivariate Cox proportional-hazards regressions adjusted for age, cancer stage, and the CAF abundance estimated by the EPIC algorithm. Regression was performed independently for each collagen in each of the cancers. Collagen gene expression, age, and CAF abundance were incorporated into the model as continuous variables. Cancer stage was included as a categorical variable. The resulting hazard ratios (HRs) and Benjamini-Hochberg adjusted p values for the collagens were summarized in a heatmap. Each contrast, i.e. tumor versus normal in each tissue, was tested separately, meaning that p values were adjusted for multiple testing only across genes within a contrast.
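Two of the numerical steps in this paragraph, the Z-score transformation and the Benjamini-Hochberg adjustment applied separately within each contrast or cancer type, can be sketched in plain Python. This is an illustration only, not the study's code. The function names and the p values are hypothetical, and the use of the sample standard deviation for the Z-score is an assumption.

```python
import statistics


def z_scores(values):
    """Center and scale values to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values per gene, grouped by cancer type. The adjustment
# is run within each group, so it only accounts for the number of genes
# tested and not for the number of cancer types.
raw_p = {
    "pancreas": [0.01, 0.04, 0.03, 0.005],
    "kidney": [0.20, 0.001, 0.05, 0.60],
}
adjusted_p = {cancer: benjamini_hochberg(p) for cancer, p in raw_p.items()}
print(adjusted_p["pancreas"])
print(z_scores([1.0, 2.0, 3.0]))
```

Adjusting within each group keeps the correction independent of how many tissues are analyzed, at the cost of not controlling the false discovery rate across the whole heatmap.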

Serum collagen biomarkers
Collagen fragments were measured by immunoassays that incorporate monoclonal antibodies raised against specific epitopes of various collagens. The biomarkers are summarized in supplementary material, Table S4, with a link to the original reference describing the technical validation of the assay. Assays were run blinded in a College of American Pathologists-certified laboratory and according to the manufacturer's instructions (Nordic Bioscience, Herlev, Denmark). All analytes were quantified in double determinations and only accepted if the coefficient of variation was <15%. The intra- and interassay variations were <10% and <15%, respectively, for all assays. Biomarker levels were log-transformed, and diagnostic accuracy was tested by receiver operating characteristic (ROC) curve analysis, including the area under the curve (AUC).
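The acceptance criterion for double determinations and the AUC calculation can be sketched as follows. This is a plain-Python illustration with invented values, not the software used in the study. The AUC is computed here as the proportion of patient-control pairs in which the patient has the higher level (ties count as one half), which assumes that higher levels indicate cancer; for markers that are lower in patients, the resulting value falls below 0.5.

```python
import statistics


def duplicate_cv_percent(first, second):
    """Coefficient of variation (%) of a double determination."""
    mean = (first + second) / 2
    return statistics.stdev([first, second]) / mean * 100


def roc_auc(cases, controls):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    wins = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Invented duplicate readings: accept only if the CV is below 15%.
print(duplicate_cv_percent(9.0, 11.0) < 15)

# Invented biomarker levels. Because the AUC depends only on ranks, it is
# the same whether raw or log-transformed (positive) levels are used.
patients = [3.0, 4.0, 5.0]
healthy = [1.0, 2.0, 3.0]
print(round(roc_auc(patients, healthy), 3))
```

An AUC of 0.5 corresponds to no discrimination between the groups, and values near 1 (or near 0 for markers that decrease in cancer) indicate strong separation.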

Serum samples from patients with cancer and controls
The cohort included serum samples from 220 patients with cancer and 33 healthy controls. Of the patients with cancer, 11 groups of 20 each of bladder, breast, colorectal, head and neck, kidney, lung, melanoma, ovarian, pancreatic, prostate, and stomach cancer were included. Serum samples from patients with cancer were obtained from Proteogenex (Los Angeles, CA, USA) and the healthy controls from BioIVT (Westbury, NY, USA). Samples were stored at −80 °C prior to analysis. A summary of the cohort characteristics can be found in supplementary material, Table S5. According to the vendors, sample collection was approved by an Institutional Review Board or independent ethics committee and patients gave their informed consent: Russian Oncological Research Centre n.a. Blokhin RAMS (PG-ONC 2003/1) and Western Institutional Review Board, Inc. (WIRB® Protocol #20161665). All investigations were carried out according to the Declaration of Helsinki.

Stellate cells and fibroblasts are the primary producers of collagens
The first aim of this study was to investigate and profile collagen expression in different cell types of pancreatic tumors. We evaluated collagen gene expression in a single-cell RNA-seq dataset of 24 PDAC samples and 11 non-PDAC samples [26]. We used the cell types defined in the original dataset, which included 41,986 cells from PDAC and 15,544 cells from non-PDAC samples. The cell types included acinar cells, endocrine cells, endothelial cells, fibroblasts, stellate cells, macrophages, T cells, B cells, and two different ductal cell types (Figure 1A). Type 2 ductal cells were only present in the PDAC samples.
Stellate cells and fibroblasts were the primary producers of collagens, expressing a wide variety, but with the fibrillar and beaded filament collagens such as COL1A1, COL3A1, COL5A1, and COL6A1 having the highest (most abundant) expression (Figure 1B and supplementary material, Table S6). Endothelial cells produced collagens typically associated with the basement membrane such as COL4A1, COL4A2, and COL15A1. Interestingly, unlike the other basement membrane-associated collagens, COL18A1 was expressed in a wide variety of cell types with the highest levels in ductal and stellate cells (supplementary material, Figure S1). COL18A1 expression was greater in PDAC stellate cells compared to non-PDAC pancreas stellate cells (supplementary material, Table S7). Conversely, expression seemed to decrease in T and B cells, suggesting some connection between COL18A1 and tumor immunity. Other cell types generally had lower expression of collagens compared to the fibroblasts, with a few exceptions: COL17A1 expression was restricted to ductal cell type 2 and COL27A1 expression was mostly restricted to ductal cell type 1 (Figure 1B and supplementary material, Table S6). Of note, some collagens were detected in a very small proportion of cells, and some were not detected in any cells, including COL2A1, COL7A1, COL19A1, COL20A1, and COL22A1. For reasons unknown to us, COL26A1 was entirely excluded from the dataset and could therefore not be evaluated.

Cancer-associated fibroblasts and stellate cells have the largest alteration of their collagen expression profile as compared to cells from non-PDAC pancreas
When comparing PDAC samples to the non-PDAC pancreas samples, overall, the collagen expression was higher in most cell types (Figure 1B and supplementary material, Table S7). The biggest fold-change increase in expression between PDAC and non-PDAC pancreas samples was seen in fibroblast and stellate cells, where collagens such as COL1A1 and COL3A1 were increased severalfold (Figure 1B and supplementary material, Table S7). Beyond these abundant major collagens, stellate and fibroblast cells also had an upregulation of two distinct sets of quantitatively minor collagens when comparing PDAC to non-PDAC pancreas samples: cancer-associated fibroblasts upregulated COL8A1, COL8A2, COL10A1, COL11A1, COL12A1, and COL16A1, whereas cancer-associated stellate cells upregulated COL12A1, COL14A1, COL15A1, and COL18A1 (Figure 1B and supplementary material, Table S7).

Cancer-associated fibroblast subtypes differ in their collagen expression profile
Next, we sought to profile collagen expression in CAF subtypes. Using only PDAC samples, we performed clustering on the fibroblast and stellate cells to identify CAF subtypes (supplementary material, Figure S2A,B). A clear separation of fibroblast and stellate cells was obtained. Moreover, we observed larger patient-to-patient variation in the overall gene expression profile of fibroblasts compared to the variation seen in stellate cells, indicating heterogeneity in fibroblast gene-expression profiles relative to a more uniform expression profile in stellate cells (supplementary material, Figure S2C). To label each cluster, we looked at the expression of known CAF subtype markers as defined by Wang and colleagues [7]. Within the 11 clusters identified (supplementary material, Figure S2A), Clusters 3, 4, and 6 had high expression of POSTN, COL10A1, MMP11, and SDC1, indicative of myCAFs (supplementary material, Figure S3). Cluster 1 had high expression of markers of iCAFs, including APOD, C7, PTGDS, and EGR1 (supplementary material, Figure S4). Similarly, Cluster 8 also had high C7 and EGR1 expression, suggesting an iCAF signature in this cluster also. Indeed, Cluster 8 was high in other markers associated with iCAFs including IL6, LIF, HAS1, and CCL2 (supplementary material, Figure S5). Cluster 1 was not positive for IL6, suggesting two separate populations of iCAFs: one IL6 positive and another IL6 negative. Clusters 7 and 9 had high expression of the antigen-presenting genes CD74 and HLA-DRA, suggesting these are apCAFs (supplementary material, Figure S6). Interestingly, one cluster of the apCAFs had higher expression of antigen-presenting markers, suggesting one strongly and one weakly presenting subpopulation of apCAFs (supplementary material, Figure S6). The two stellate-like CAFs identified by Wang and colleagues [7] corresponded to Cluster 0 and Cluster 2. Cluster 0 expressed RGS5, CD36, and PDGFRB, corresponding to the stellate-like CAF C1 (supplementary material, Figure S7), and Cluster 2 had high RHOB, MT1M, TAGLN, and SOD3 expression, corresponding to stellate-like CAF C2 (supplementary material, Figure S8). For Cluster 10, GSEA revealed it was associated with cell cycle genes (not shown) but was also high in stellate-like CAF C1 markers (supplementary material, Figure S7), suggesting it comprises stellate-like CAFs undergoing cell division. Cluster 10 was therefore included in the stellate-like CAF cluster. Wang and colleagues also defined meCAFs characterized by PLA2G2A, CRABP2, LDHB, and PGK1 expression. Cluster 5 expressed high levels of PGK1 but did not otherwise fit the signature (supplementary material, Figure S9A,B). However, GSEA of the Cluster 5 gene set revealed enrichment of translation-related genes, glycolysis, and MYC pathways (supplementary material, Figure S9C,D), similar to what Wang and colleagues described for their meCAFs, suggesting Cluster 5 may be meCAFs after all. In summary: Clusters 3, 4, and 6 were labeled myCAF; Clusters 1 and 8 iCAF; Clusters 7 and 9 apCAF; Clusters 0, 2, and 10 stellate-like CAF; and Cluster 5 meCAF.
To evaluate the robustness of the CAF subtypes and the collagen profile therein, we analyzed a separate dataset from NSCLC. Using the PDAC dataset as a reference, we mapped the CAF clusters onto the CAFs from the lung cancer dataset and found that the clusters also seemed to be present in NSCLC (supplementary material, Figure S10A). For the myCAFs, iCAFs, apCAFs, and meCAFs, the median prediction scores were all above 80%, suggesting a substantial overlap in the gene signatures between the CAFs from the two cancers (supplementary material, Figure S10B). For the stellate-like CAFs the median prediction was lower, around 60%, suggesting that the stellate-like CAF signature was not as evident in the lung cancer dataset. Using this reference-mapped lung cancer dataset, we confirmed the CAF-specific expression of collagens, including the expression of COL8A1, COL10A1, COL11A1, and COL12A1 in myCAFs as well as COL14A1 in iCAFs (supplementary material, Figure S10C and Table S10).

Collagen gene expression profiles are tumor specific and associated with survival
After profiling collagen gene expression in the different CAF populations of PDAC and NSCLC, we sought to expand on this by summarizing the collagen gene expression of TCGA and GTEx samples. Our sample subset included 'primary tumor' samples from patients with various types of cancers and 'normal tissue' samples from control individuals. We performed differential expression analysis between tumor and normal samples for each tissue and summarized the resulting log2 fold-change values and p values in a heatmap (Figure 2A). We only included the collagens that were associated with subtypes of CAFs based on the previous analysis: COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, COL5A1, COL5A2, COL6A1, COL6A2, COL6A3, COL8A1, COL8A2, COL10A1, COL11A1, COL12A1, COL14A1, COL16A1, and COL18A1.
In this dataset, the gene expression of all investigated collagens was upregulated in pancreatic tumors compared to healthy tissue, confirming our findings above; in contrast, almost all collagens were downregulated in prostate tumors compared to healthy tissue (Figure 2A). The rest of the tumors seemed to arrange themselves in a spectrum of collagen upregulation to collagen downregulation. Most tumors had an upregulation of COL10A1 and COL11A1 (myCAF), and most had a downregulation of COL14A1 and COL16A1 (iCAF). Overall, the collagens arranged themselves in a spectrum from mostly fibrillar collagens (COL11A1, COL1A1, COL3A1, COL1A2, COL5A2, COL5A1) that were upregulated in many cancers to more basement membrane and fibril-associated collagens (COL16A1, COL14A1, COL18A1, COL4A2, COL4A1), which were downregulated. The collagens that differed between myCAFs and iCAFs, COL8A1, COL10A1, COL11A1, COL12A1, and COL14A1, were also upregulated in pancreatic cancer in this dataset, suggesting that both myCAF and iCAF collagens are upregulated in pancreatic cancer. In this set of CAF collagens, only COL10A1 and COL11A1 seemed consistently upregulated across many cancers. This may indicate that COL10A1 and COL11A1 are pancancer myCAF markers, and the others (COL8A1, COL12A1, and COL14A1) are more specific to myCAFs in pancreatic cancer. To confirm that the tissue-based collagen gene expression in TCGA was associated with CAFs, we correlated collagen gene expression with CAF abundance estimated using the EPIC and MCP-counter tools. These showed a strong correlation between CAF abundance and collagens across tumor types for the CAF collagens we have described so far (supplementary material, Figure S11), similar to the findings from the collagen gene expression in the single-cell RNA-seq dataset of PDAC samples [26].
To investigate whether these differences in gene expression also resulted in different protein levels, we evaluated whether mRNA and protein levels correlate in the CPTAC-PDAC cohort, where both data types are available. In general, we found a strong correlation between mRNA levels and protein levels in PDAC tumor samples, with notable exceptions, including COL2A1, COL6A5, COL18A1, COL22A1, and COL28A1, where there seems to be a clear disconnect between the two measures (supplementary material, Figure S12). Overall, this suggests that the differences in collagen gene expression likely also result in differences in protein levels.
Next, we wanted to determine whether the gene expression of collagens had prognostic value. In some cancers, namely those of the pancreas, breast, skin, kidney, colon, bladder, ovary, and prostate, collagen expression as a whole was generally higher in later stages of cancers compared to earlier stages (Figure 2B). For stomach tumors, the trend was unclear, and for lung tumors the trend seemed the opposite, with lower levels in later stages of cancer. Although limited by sample sizes in each stage group, it seemed prudent to adjust for stage of disease for further analysis. We used multivariate Cox proportional hazard models, adjusted for age, cancer stage, and CAF abundance, to evaluate the association between collagen expression and OS in each cancer type and summarized the resulting HRs as a heatmap (Figure 2C). Overall, high expression of most of the collagens was associated with poor OS, and the associations were significant for cancers such as those of the kidney, stomach, and pancreas. Interestingly, for pancreatic cancer, the myCAF collagens (COL8A1, COL10A1, COL11A1, COL12A1) were associated with poor OS, whereas the iCAF collagen COL14A1 was not significantly associated with survival. We also evaluated whether any of these collagens had prognostic value using progression-free survival (PFS), disease-free survival (DFS), and disease-specific survival (DSS) as survival measures. In pancreatic cancer, several collagens remained significantly associated with PFS, DFS, and DSS, although most of the CAF-specific collagens described above were not significantly associated with PFS and DFS but remained associated with DSS (supplementary material, Figure S13). Across all survival measures, pancreatic cancer remained the cancer where collagen expression seemed to have the most prognostic value. Kidney cancer ranked second, possibly followed by breast cancer. Overall, this suggests that many of the collagens associated with OS are also predictive of progression and cancer-specific mortality and associated with disease recurrence of pancreatic cancer.

The collagen turnover profile is uniquely altered in patients with various solid tumor types and can be quantified by serum-based biomarkers
After establishing a relevance of distinct collagens for CAF subtypes and types of cancer, we next sought to profile the presence of collagen fragments in the circulation of patients with various types of cancer. To do this, we used a panel of 15 biomarkers, measured by immunoassays to quantify specific epitopes of specific collagens that reflect collagen formation or degradation (supplementary material, Table S4). We measured these collagen fragments in serum from patients with bladder, breast, colorectal, head and neck, kidney, lung, melanoma, ovarian, pancreatic, prostate, and stomach cancer, with 20 patients in each group (supplementary material, Table S5). We compared serum levels of the collagen epitopes with levels in serum from 33 age-matched healthy controls. The distribution of serum levels of each epitope is shown in Figure 3A.
Of the markers reflecting collagen formation, i.e. assays measuring the release of propeptides as part of collagen maturation, COL11A1 (P), a collagen linked to myCAFs based on the gene-expression analysis, had the best diagnostic accuracy (AUC value) across all cancers tested (Figure 3B). COL3A1 (P) and COL5A2 (P) had lower, but still good, AUC values in several cancer types, including ovarian, colorectal, bladder, lung, and pancreatic cancer. The last two collagen formation markers, COL6A3 (P) and COL1A1 (P), fared worse, with only notable AUC values in ovarian, colorectal, and bladder cancer for COL6A3 (P) and only ovarian cancer for COL1A1 (P).
Of the markers of collagen degradation (MMP-, granzyme B-, or cathepsin K-mediated degradation), COL6A1 (M), COL1A1 (M), COL3A1 (M), COL3A1 (T), and COL4A2 (G) had good AUC values across a variety of cancers. COL4A1 (M) had both higher and lower levels in some cancers as compared to healthy controls, and COL5A1 (M) had no notable AUC values in any cancers. Interestingly, cathepsin K-degraded COL10A1 (another myCAF collagen) had lower levels compared to controls in most cancers, except breast cancer. Markers that could not be classified as either formation or degradation markers included COL8A1 (N), which fared well with excellent AUC values across many cancers and is again a collagen identified as myCAF-associated based on the single-cell analysis; and COL4A1 (I), which had elevated levels in ovarian, colorectal, bladder, and lung cancer, but had lower levels in kidney, head and neck, and prostate cancers.
Three biomarkers target epitopes on the myCAF collagens COL8A1, COL10A1, and COL11A1. The C-terminus of COL8A1, in the form of COL8A1 (N) (supplementary material, Table S4), and formation of COL11A1, in the form of COL11A1 (P), were highly elevated, suggesting that increased formation of COL11A1 and increased processing of COL8A1 take place across cancers. In contrast, cathepsin degradation of type X collagen (COL10A1 [C]) was downregulated. This suggests that different epitopes of myCAF collagens can provide different information on the cancer and CAF activity: some peptides can be used to reflect specific and increased collagen formation, e.g. COL11A1 (P), whereas other peptides could potentially be used to reflect (a lack of) specific degradation, e.g. COL10A1 (C), and may therefore imply high myCAF activity in the TME. Unfortunately, there was no validated serum biomarker measuring COL14A1 for a potential iCAF signature.

Discussion
In this study, we comprehensively profiled the collagen family in cancer at multiple levels: at the single-cell level in PDAC and NSCLC, at the tumor level in tumor samples from a range of solid tumors, and, lastly, at the systemic level, where we quantified collagen epitopes in serum from patients with solid tumors. At the single-cell level in PDAC, we found that the overall collagen gene expression was generally higher in PDAC compared to benign pancreas tumor samples. Specific collagens were associated with specific cell types, such as fibroblasts, and with fibroblast subtypes, such as myCAFs and iCAFs (Figure 4). Some of these collagens were also upregulated in tumor samples and across several solid tumors, and their expression had prognostic value independent of cancer stage and CAF abundance in several cancers. Lastly, we found elevated levels of many collagen epitopes (fragments) in the serum of cancer patients and determined that these epitopes could effectively discriminate between serum from cancer patients and healthy individuals.

We found that the primary source of collagens was the fibroblasts of the pancreas. In PDAC, both stellate cells and fibroblasts produce substantial amounts of collagen. Under normal physiological conditions, pancreatic stellate cells are thought to be quiescent but become hyperactive in PDAC. This activation results in excessive production of collagen, which can promote tumor growth and invasion but can also impair access of chemotherapeutics and immune cells to the tumor [38]. In stellate cells, we saw a dramatic increase in collagen production when comparing PDAC with non-PDAC pancreas samples, supporting the idea that stellate cells contribute to excessive collagen production in PDAC. Collagen was also significantly elevated in CAFs of PDAC relative to non-PDAC fibroblasts, but, in contrast to the resident stellate cells of non-PDAC samples, collagen production was already high in the resident fibroblasts of non-PDAC. This shows a larger relative deregulation of collagen production in stellate cells than in fibroblasts. In absolute numbers, however, we saw greater increases in collagen production in fibroblasts compared to stellate cells in PDAC samples, suggesting that activated CAF-like stellate cells are not the primary source of excessive collagen compared to PDAC CAFs, but they may nonetheless tip the balance in favor of tumor growth.
In addition to the production of interstitial fibrillar collagens, we observed a different expression pattern for the basement membrane-associated collagens. COL4A1 and COL4A2 had higher levels in endothelial and stellate cells in comparisons of PDAC to benign pancreas tissue, whereas COL15A1 only had higher levels in endothelial cells. In contrast, COL18A1 was expressed in a wide variety of cells, and levels were higher in stellate cells and lower in T and B cells.
Recently, it was discovered that CAFs use contractile forces to expand preexisting gaps in the basement membrane and make it physically weaker, so that cancer cells can invade through the gaps. It was proposed that CAFs primarily compromise basement membrane integrity through contractile activity independent of protease degradation [39]. An increase in the expression of basement membrane collagens could be a response to this weakening of the membrane.
In the single-cell PDAC dataset, we identified several collagens associated with myCAFs, including both well-known and abundant collagens such as COL1A1 and COL3A1, but also less abundant collagens highly specific for myCAFs, including COL8A1, COL10A1, COL11A1, and COL12A1. For pancreatic tumor samples of the TCGA dataset, elevated expression of these myCAF collagens was also evident, and elevated expression of COL10A1 and COL11A1 was consistent across many cancers. In serum, elevated levels of a COL11A1 formation marker and lower levels of a COL10A1 degradation marker were also evident across cancers. All this suggests that the turnover of these myCAF collagens is deregulated across cancers, and further study of their role in cancer biology and their potential as cancer biomarkers and therapeutic targets is warranted. COL11A1 is a known marker of CAFs [40][41][42][43], but COL10A1 is a novel PDAC CAF marker. COL10A1 has been associated with CAFs in breast cancer [44]. Indeed, one study found that a specific CAF subtype in breast cancer, associated with an immunosuppressive TME, was unique in its increased expression of specific collagens, including type VIII, X, and XI collagens [45]. COL10A1 has also been shown to be upregulated in bladder cancer and associated with poor survival [46]. In PDAC, LRRC15+ CAFs, which predominantly accumulate in late stages, also express increased levels of type VIII, X, and XI collagens [47]. In circulation, type X collagen is elevated in the plasma of patients with breast cancer [48] or NSCLC [49]. These observations are in agreement with our own quantification of type X collagen and together justify further study of this collagen in CAFs.
In addition to COL10A1 and COL11A1, we identified collagen subtypes that could be used as markers for PDAC CAFs: COL8A1 and COL12A1 for myCAFs and COL14A1 for iCAFs. COL8A1 and COL8A2 together constitute type VIII collagen, a basement membrane-associated collagen that forms networking structures similar to the networks of type IV collagen [50]. The literature supports a role for type VIII collagen in CAFs and in cancer. Recently, Yan et al described how CAF-derived COL8A1 induced PDAC progression by interacting with ITGB1 and discoidin domain receptors [51]. COL8A1 was also identified as part of a CAF signature in gastric cancer [52], and it was upregulated in head and neck CAFs, where it exerted its effects by activating discoidin domain receptors [53]. The discoidin domain receptors are involved in cancer metastasis and are activated by type VIII collagen, among other ligands [54]. Beyond CAFs, COL8A1 expression is higher in breast cancer and is associated with poor OS [55,56]. COL8A1 is also upregulated in colon and gastric cancers, where its expression correlates with progression and poor survival [57][58][59]. Similarly, COL8A2 is upregulated and associated with poor survival in glioblastoma [60]. Type VIII collagen also has a matrikine in its C-terminal NC1 domain, called Vastatin, that reportedly has anti-angiogenic properties [61,62], yet is elevated in the serum of colorectal cancer patients [63]. These findings point to the need for more research into the role of type VIII collagen in CAFs.
Col12a1 in myCAFs and Col14a1 in iCAFs have been described in mouse pancreatic CAFs [6]. CAF-derived COL12A1 was recently described as an important regulator of type I collagen organization in the progression of breast cancer, and high expression levels were predictive of worse survival independent of traditional CAF signatures [64]. In lung cancer, type XII collagen has also been described and associated with a specific CAF subtype (FAP+ αSMA+) linked to T-cell exclusion [65]. In other tissues, namely skin and cornea, COL12A1 and COL14A1 regulate the organization of the collagenous matrix that stores TGFβ [66,67]. These collagens are themselves regulated by TGFβ [68,69] and may therefore be part of a TGFβ signaling feedback mechanism that drives the CAF phenotype and the accumulation of the dense and stiff collagen matrix.
The role of myCAFs is currently being debated. While myCAFs have been described as tumor constraining, helping to contain tumors and prevent invasion [12], it is becoming evident that this constraining effect also prevents therapeutics and immune cells from reaching the tumor [70,71]. Further, remodeling of collagen in the TME can lead to an increased invasive capacity of cancer cells, as it has been suggested that collagen fibers organized into parallel and linear superstructures can increase the migratory capacity of cancer cells [72]. Therefore, although the activity of myCAFs may be tumor restraining at a certain stage of cancer development ('in situ'), the accumulation of collagen surrounding the tumor may eventually turn protumorigenic by preventing efficient therapeutic intervention and allowing uncontested growth. This suggests that myCAFs may be a good target when trying to normalize excessive collagen production in the TME.
Collagens are among the proteins most heavily subject to post-translational modifications (PTMs), and therefore there may be a disconnect between gene-expression analysis and the protein/peptide product quantified by an immunoassay. To determine whether these differences in gene expression also result in different protein levels, we evaluated the correlation of mRNA and protein levels in the CPTAC-PDAC cohort, where both data types are available. In general, we found a strong correlation between mRNA levels and protein levels in PDAC tumor samples. This suggests that the differences in collagen gene expression likely also result in differences in protein levels. In addition to protein levels, several studies point to the relevance of understanding collagen processing (e.g. PTMs, degradation, lack of propeptide release). Tian et al [73] demonstrated that CAFs were the major producers of collagens, which is in agreement with the findings discussed here. Complicating matters further, the authors showed greater levels of the C-terminal propeptides of COL1A1, COL1A2, COL3A1, and COL5A2 in the ECM of PDAC compared to normal pancreas [73]. Of the epitopes included in the current study, only COL5A2 (P) measured the same C-terminal propeptide as the Tian et al [73] study, and it had highly elevated levels in pancreatic cancer serum compared to healthy controls. This seems contradictory in that Tian et al [73] described a retention of the C-terminal propeptide in the PDAC ECM, whereas we found higher levels in circulation. This incongruence may be a consequence of altered processing activity between the stromal and cancer cell regions of pancreatic tumors. This should be explored further and may reveal further insight into the processing of collagens in cancer. It also highlights the importance of collagen processing events, rather than simply total collagen amounts, in cancer and underlines the need for selective biomarkers that measure these processes to stratify patients more effectively. Su et al [74] demonstrated that cleaved type I collagen activated DDR1 signaling to promote PDAC tumor growth, whereas intact type I collagen triggered DDR1 degradation to restrain PDAC tumor growth. In support of this, we saw increased levels of the collagen degradation marker COL1A1 (M) in the serum of patients with pancreatic cancer. We also saw elevated levels of many other collagen degradation markers, including COL6A1 (M) and COL3A1 (M). Fragments of these collagens should also be explored for their potential regulatory and signaling properties, similar to what is seen for COL1A1.
There is a need for minimally invasive biomarkers predictive of treatment benefit because a significant portion of patients involved in clinical cancer trials do not respond to treatment. A potential source is collagen peptides released as a consequence of the altered collagen turnover in tumors. By profiling the processing of collagens noninvasively, we can profile patients to find those who are most likely to respond to treatment. This will be particularly important when antifibrotic drugs enter clinical trials, where these biomarkers may be predictive or be used to monitor efficacy. Similarly, other cancer treatments, including chemotherapy and radiotherapy, have been shown to cause further matrix remodeling in the tumor, and noninvasive collagen biomarkers may be utilized to track response or resistance [75,76]. In order for such biomarkers to be effective, both the processing of collagens and different collagen subtypes should be measured. This is also relevant for cancer immunotherapy. A crucial modulator of tumor immunity and immunotherapy response is tumor fibrosis [77]. An effective immunotherapeutic response depends on T-cell infiltration into the tumor, which may be prevented by excessive collagen deposition [65,78-80]. In the immunotherapy context, too, how collagens are formed and processed plays an important role, as demonstrated in mouse models of pancreatic cancer: only collagen type I homotrimers, as compared to normal heterotrimers, inhibit T-cell infiltration and immunotherapy [15].
There are several limitations to this study. We only made informal comparisons between the different collagen profile datasets, so the differences and commonalities can only be stated definitively once the different technologies are employed in the same sample material. Such a study could integrate the information of the different technologies and make a more informative comparison. The single-cell data also only included samples from the pancreas and lung, so the collagen profile of other cancer types awaits exploration. Further, the sampling of tumor tissues brings with it sampling heterogeneity and a risk of sample impurities. Further still, our quantification in serum could not pinpoint the source of the circulating collagens, so it cannot be excluded that the fragments we measured originated from other tissues. Lastly, it is likely not enough to study single alpha chains of collagens at either the gene or protein level. Including information on the likely composition of a collagenous triple helix and the quality of the fibers (e.g. linearization of collagens) may provide additional information. For example, Chen et al recently showed that pancreatic cancer cells produce an abnormal type I collagen homotrimer (α1/α1/α1) with protumorigenic properties [15]. As another example, type III collagen has been shown to have dual functions by both sustaining tumor dormancy and promoting tumor progression, depending on the linearization of the collagen fibers [72]. Future studies are needed to explore further how the collagen landscape and CAF subtypes change in cancer.

Conclusion
In this study, we looked at collagen gene expression in PDAC at the single-cell level and discovered that overall collagen gene expression was higher in PDAC than in benign pancreas samples. Specific collagens were found to be associated with specific cell types, such as fibroblasts, as well as fibroblast subtypes, such as myCAFs and iCAFs. COL8A1, COL10A1, COL11A1, and COL12A1 expression was specific to myCAFs. COL10A1 and COL11A1 were also upregulated in tumor samples and across a variety of solid tumors, and their expression was associated with prognosis. Finally, we discovered elevated levels of many collagen peptides, including peptides originating from COL8A1 and COL11A1, in cancer patients' serum, and showed that these epitopes can distinguish between cancer and healthy serum. In all, we have shown that collagens are deregulated at the single-cell, tumor, and systemic levels. In addition, we have shown that minor collagens and their processing in cancer can be measured noninvasively and have biomarker potential. These markers may be useful in future studies of CAF biology and CAF targeting and may be instrumental to developing biomarkers of collagen turnover products, around which strategies can be developed to normalize, rather than deplete, the fibroblast niche to a tumor-neutral or even tumor-restraining state.

Table S1. Summary of collagen family of proteins and their corresponding genes and subfamily
Table S2. Number of samples included from GTEx (normal tissue) and TCGA (primary tumor) projects in each tissue
Table S3. Number of events for each cancer type. Cancer types with fewer than 50 events were excluded from the survival analysis
Table S4. Descriptions of ELISAs: their epitope and a broad description of their biological meaning
Table S5. Demographic of cohort used in serum collagen turnover profile
Table S6.

Figure 1. Profiling collagens in cells of PDAC tumors reveals collagens specific to CAF subtypes. (A) Clustering of cells (both PDAC and non-PDAC cells) with cell types labeled as defined in Peng et al [26] and visualized using a UMAP plot. (B) A dot plot of collagen expression in a variety of cell types from PDAC (red) and non-PDAC pancreas samples (blue). The size of each dot represents the percentage of cells where the gene was detected in the selected population of cells. The strength of the color of the dot represents the average expression in the form of Z-scores. (C) Clustering of PDAC fibroblasts and stellate cells in a UMAP plot. Clusters were combined and labeled according to known CAF markers as defined in Wang et al [7]. (D) Dot plot of collagen expression in the different PDAC CAF subtypes identified.
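The two quantities encoded in the dot plots of Figure 1 (dot size and color strength) can be sketched for a single gene as follows. This is only an illustration under stated assumptions: a cell is counted as expressing the gene when its value is above zero, and the Z-score is taken across the per-group mean expression values using the sample standard deviation. The function name `dotplot_summary` and the expression values are hypothetical, and the scaling used in the original analysis may differ.

```python
from statistics import mean, stdev
from typing import Dict, List


def dotplot_summary(expr_by_group: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Per-group percentage of expressing cells and Z-scored mean expression."""
    # Dot size: percentage of cells in which the gene was detected (> 0).
    pct = {g: 100.0 * sum(v > 0 for v in vals) / len(vals)
           for g, vals in expr_by_group.items()}
    # Dot color: mean expression per group, Z-scored across groups (assumption).
    avg = {g: mean(vals) for g, vals in expr_by_group.items()}
    group_means = list(avg.values())
    mu = mean(group_means)
    sd = stdev(group_means) if len(group_means) > 1 else 0.0
    return {
        g: {"pct_expressing": pct[g],
            "z_score": (avg[g] - mu) / sd if sd > 0 else 0.0}
        for g in expr_by_group
    }


# Made-up expression values of one gene in three CAF subtypes (illustration only).
summary = dotplot_summary({
    "myCAF": [0.0, 2.0, 4.0, 6.0],
    "iCAF": [0.0, 0.0, 1.0, 3.0],
    "apCAF": [0.0, 0.0, 0.0, 8.0],
})
for group, stats in summary.items():
    print(group, stats)
```

Reporting both quantities separates a gene expressed at moderate levels in most cells of a subtype from one expressed at very high levels in only a few cells, which a mean alone would not distinguish.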

Figure 2. Collagens are deregulated across solid tumor types and are associated with prognosis. (A) Heatmap of log2 fold-change between collagen gene expression of tumor and normal samples in each tissue. ns indicates a p value above 0.05 after adjusting for multiple testing of the 15,732 genes using the Benjamini-Hochberg method. (B) Collagen expression in different cancer stages, plotted as points corresponding to the mean Z-score for each cancer stage. Error bars correspond to the standard error. (C) Heatmap of HRs from multivariate Cox regression (adjusted for age, cancer stage, and CAF abundance, as estimated by the EPIC algorithm) of the gene expression of each collagen in each tumor tissue. Asterisks (*) indicate a p value below 0.05 when adjusted for the multiple testing of genes using the Benjamini-Hochberg method.
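The Benjamini-Hochberg adjustment referred to in this caption can be sketched in a few lines. The version below is a generic textbook implementation, not the code used in the study; the function name `benjamini_hochberg` and the example p values are assumptions chosen to show how a raw p value below 0.05 can become non-significant ("ns") after adjustment.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, scaling by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p values for five genes (illustration only).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    label = "ns" if q > 0.05 else "*"
    print(f"raw={p:.3f} adjusted={q:.5f} {label}")
```

In this toy example the third and fourth genes have raw p values below 0.05 but adjusted values of about 0.051, so they would be marked ns. With the 15,732 genes tested in panel A, the same scaling by the number of tests divided by the rank applies, only with a much larger number of tests.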

Figure 3. Elevated levels of circulating collagen epitopes allow for discriminating between cancer and healthy serum. (A) Distribution of collagen serum markers across cancers and healthy controls. (B) Heatmap of area under ROC curve (AUC) values, a metric for diagnostic performance across a range of biomarker cutoffs. Markers are sorted from highest to lowest mean area under the ROC curve values.

Figure 4. Venn diagram summarizing how collagen subtypes are associated with CAF subtypes.

J Thorlacius-Ussing et al. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland. www.pathsoc.org. J Pathol 2024; 262: 22-36. www.thejournalofpathology.com

Figure S4. Expression of representative markers of iCAFs
Figure S5. Expression of other known iCAF markers
Figure S6. Expression of representative markers of apCAFs involved in MHC class II antigen presentation
Figure S7. Expression of markers representative of stellate-like CAFs C1
Figure S8. Expression of markers representative of stellate-like CAFs C2
Figure S9. Expression of representative meCAF markers
Figure S10. Overlap of CAF subtypes and collagen profiles in PDAC and NSCLC cells
Figure S11. CAF abundance was estimated with EPIC-algorithm (left) and MCP-counter method (right)
Figure S12. Correlation between mRNA levels and protein levels in PDAC tumors
Figure S13. Heatmaps summarizing survival analysis using Cox proportional hazards model adjusting for age, cancer stage, and CAF abundance as estimated by EPIC algorithm

Table S7. Table of collagen genes significantly associated with specific cell type in pancreas. Table of differentially expressed collagens when comparing PDAC and noncancer pancreas cell populations

Table S8. Collagen genes associated with CAF subtypes

Table S9. Differential expression of collagen genes when comparing myCAFs and iCAFs

Table S10. Collagen genes associated with CAF subtypes in lung cancer