Association of the collagen signature with pathological complete response in rectal cancer patients

Abstract Collagen in the tumor microenvironment is recognized as a potential biomarker for predicting treatment response. This study investigated whether collagen features are associated with pathological complete response (pCR) in locally advanced rectal cancer (LARC) patients receiving neoadjuvant chemoradiotherapy (nCRT), and aimed to develop and validate a model for individualized prediction of pCR. The prediction model was developed in a primary cohort of 353 consecutive patients. In total, 142 collagen features were extracted from multiphoton images of pretreatment biopsies, and least absolute shrinkage and selection operator (Lasso) regression was applied for feature selection and construction of the collagen signature. A nomogram was developed using multivariable analysis, and its performance was assessed with respect to discrimination, calibration, and clinical utility. An independent cohort of 163 consecutive patients was used to validate the model. The collagen signature, comprising four collagen features, was significantly associated with pCR in both the primary and validation cohorts (p < 0.001). Predictors in the individualized prediction nomogram included the collagen signature and clinicopathological predictors. The nomogram showed good discrimination, with an area under the ROC curve (AUC) of 0.891 in the primary cohort, and good calibration. In the validation cohort, the nomogram still showed good discrimination (AUC = 0.908) and good calibration. Decision curve analysis demonstrated that the nomogram was clinically useful. In conclusion, the collagen signature in the tumor microenvironment of pretreatment biopsies is significantly associated with pCR. The nomogram based on the collagen signature and clinicopathological predictors could be used for individualized prediction of pCR in LARC patients before nCRT.


| INTRODUCTION
Currently, colorectal cancer is among the cancers with the highest incidence and mortality, and locally advanced rectal cancer (LARC) accounts for approximately 70% of rectal cancers. 1 To improve the rates of R0 resection and sphincter-preserving surgery, neoadjuvant chemoradiotherapy (nCRT) followed by total mesorectal excision (TME) is the standard treatment for LARC patients. 2 Approximately 20%-25% of patients achieve pathological complete response (pCR) after nCRT, and these patients have a better prognosis than patients with non-pCR. 3 Given this proportion of patients with pCR, and the surgery-related deaths and postoperative functional complications associated with TME (especially abdominoperineal resection), some researchers have sought alternatives to TME. Habr-Gama et al. 4 found no difference in prognosis between patients with clinical complete response managed with a "wait and see" policy and patients with pCR who underwent TME. This finding was subsequently supported by a series of studies, 5,6 so the "wait and see" policy can be considered an alternative to TME. There is therefore a significant clinical need for a reliable biomarker that accurately predicts pCR and identifies LARC patients who may safely adopt the "wait and see" policy after nCRT.
The tumor microenvironment consists of various cellular components and the noncellular extracellular matrix (ECM), and interactions between the ECM and tumor cells play a critical role in tumor progression, metastasis, and therapeutic efficacy. 7,8 Collagen is the dominant component of the ECM, and its structure is increasingly recognized as a robust prognostic biomarker in multiple tumor types, such as prostate cancer and gastric cancer. 9,10 Furthermore, previous studies found that collagen structure in biopsy specimens is associated with response to neoadjuvant therapy in rectal cancer and breast cancer. 11,12 Nevertheless, the relationship between collagen structure in pretreatment biopsies and pCR has not been examined. We therefore hypothesized that collagen structure in the tumor microenvironment of the biopsy is associated with pCR in LARC patients.
Multiphoton imaging (MPI) is a fast, label-free, high-resolution imaging technology that combines two nonlinear optical signals to reveal collagen structure and cell morphology at subcellular resolution: the second harmonic generation (SHG) signal generated by collagen and the two-photon excited fluorescence (TPEF) signal from cells. 13 Moreover, because of the intrinsic optical properties of collagen, MPI has become a useful tool for visualizing collagen structure in the tumor microenvironment. 14,15 In addition, high-throughput, fully quantitative collagen structural features can be extracted from high-resolution multiphoton images by automated image analysis, 10,14,15 which aids in interpreting the relationship between collagen structure and pCR.
Here, we examined the correlation between collagen structure in the tumor microenvironment of the biopsy and pCR, and then developed and validated a nomogram for accurate individualized prediction of pCR in patients with LARC before nCRT.
The clinicopathologic characteristics collected from three medical records were as follows: age, sex, body mass index (BMI), differentiation status, pretreatment carcinoembryonic antigen (CEA) level, pretreatment carbohydrate antigen 199 (CA-199) level, distance from anal verge, pretreatment T stage, pretreatment N stage, and tumor dimension.
In this study, pretreatment biopsies were reviewed for tumor budding using a ×4 objective (×40 magnification), and positive cases were confirmed at ×10 (×100 magnification).
Tumor budding was defined as a single cancer cell or a cluster of fewer than five detached tumor cells in the stroma of the biopsy specimen. 16 Accordingly, any budding seen at ×4 and confirmed at ×10 was deemed positive.

| Treatment and definition of pCR
All patients underwent preoperative radiotherapy at a total dose of 50.4 Gy in 28 fractions. Concurrent preoperative chemotherapy was delivered, and treatment was administered according to National Comprehensive Cancer Network (NCCN) guidelines. 17 TME was performed by senior attending surgeons 6-8 weeks after completion of nCRT. Adjuvant chemotherapy started within 6 weeks after surgery, using the same regimen as the preoperative chemotherapy.
Treatment response was evaluated on the surgical resection specimens by two gastrointestinal pathologists who were blinded to clinical outcomes. pCR was defined according to the tumor regression grade system. 18

| Image acquisition and collagen feature extraction
MPI and collagen structural feature extraction were performed as follows. A ×20 objective lens was used to image the entire biopsy tissue and capture its collagen structural features. 12,19 The multiphoton images were then compared with the corresponding H&E images for histological evaluation. Collagen features were extracted using MATLAB 2016b (MathWorks). 20 In total, 142 collagen features were extracted, comprising eight morphological features and 134 texture features (Table S1). Further details on the imaging system and feature extraction are provided in the Supplementary Information.
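The sketch below illustrates what a morphological collagen feature can look like. It computes fiber straightness (one of the features discussed in the Results) under a commonly used definition: the straight-line distance between a fiber's two ends divided by its traced path length. The definition, the polyline input format, and the function names are assumptions for illustration only. The features actually used are defined in Table S1, and the authors performed extraction in MATLAB.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def fiber_straightness(points: Sequence[Point]) -> float:
    """Straightness of one traced fiber: chord length / path length (1.0 = perfectly straight).

    Assumption: the fiber is an ordered polyline of (x, y) pixel coordinates
    produced by some fiber-tracing step; this is not the authors' MATLAB code.
    """
    if len(points) < 2:
        raise ValueError("a fiber needs at least two points")
    # Path length: sum of segment lengths along the traced fiber
    path = sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    if path == 0:
        raise ValueError("degenerate fiber with zero length")
    # Chord length: straight-line distance between the two fiber ends
    chord = math.dist(points[0], points[-1])
    return chord / path


def mean_straightness(fibers: Sequence[Sequence[Point]]) -> float:
    """Image-level feature: average straightness over all traced fibers (hypothetical aggregation)."""
    return sum(fiber_straightness(f) for f in fibers) / len(fibers)


straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
bent = [(0, 0), (1, 1), (2, 0)]
print(fiber_straightness(straight), fiber_straightness(bent))
```

A curled or wavy fiber has a long path relative to its end-to-end distance, so its straightness falls below 1.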

| Collagen feature selection and collagen signature construction
Least absolute shrinkage and selection operator (Lasso) regression performs variable selection and regularization while fitting a generalized linear model. It can select the most predictive markers from high-dimensional data and limit redundancy among correlated markers to avoid overfitting. 21 Therefore, Lasso regression was used to select collagen features and construct the collagen signature.
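The following minimal sketch shows how an L1 penalty both fits a logistic model and drops uninformative features. It fits L1-penalized logistic regression by proximal gradient descent (ISTA) in plain Python. This is an illustrative stand-in, not the authors' implementation. The penalty `lam`, the fixed step size, and the toy data are hypothetical. In practice, dedicated packages (for example, R's glmnet) fit this model and choose the penalty by cross-validation.

```python
import math
from typing import List, Sequence, Tuple


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|w|: shrinks z toward zero and sets |z| <= t exactly to zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float,
    step: float = 0.1,
    n_iter: int = 5000,
) -> Tuple[List[float], float]:
    """L1-penalized logistic regression fitted by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept b is not penalized.
    Assumes standardized features and a fixed step size (both illustrative choices).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            r = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_b += r / n
            for j in range(p):
                grad_w[j] += r * xi[j] / n
        b -= step * grad_b
        # Gradient step on w followed by soft-thresholding (the L1 proximal step)
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Toy data: feature 0 carries signal, feature 1 is balanced noise.
X, y = [], []
for x0 in (-1.0, 1.0):
    for x1 in (-1.0, 1.0):
        for k in range(4):
            X.append([x0, x1])
            y.append((1 if k < 3 else 0) if x0 > 0 else (1 if k == 3 else 0))

w, b = lasso_logistic(X, y, lam=0.05)
print(w, b)  # the noise feature's weight is shrunk exactly to 0.0
```

Soft-thresholding is what sets weights exactly to zero. Larger `lam` values remove more features, which is how a 142-feature panel can shrink to a few.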

| Development and validation of the individualized prediction model
Univariable and multivariable logistic regression analyses were used to evaluate the clinicopathological candidate predictors and the collagen signature in the primary cohort. An individualized prediction model for pCR was then developed based on the results of the multivariable analysis and presented as a nomogram. 22,23 The discrimination and calibration of the nomogram were assessed using the ROC curve and the calibration curve with the Hosmer-Lemeshow test, respectively. In addition, the variance inflation factor was calculated to evaluate multicollinearity in the multivariable prediction model.
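The discrimination and calibration measures named above can be sketched in plain Python. AUC is computed in its Mann-Whitney form. The Hosmer-Lemeshow statistic uses groups of nearly equal size ranked by predicted risk, with a p-value from a chi-square distribution with (groups - 2) degrees of freedom. The grouping rule, the even-df restriction of the closed-form p-value, and the function names are assumptions. Statistical packages differ in how they form groups and handle ties.

```python
import math
from typing import Sequence, Tuple


def roc_auc(y: Sequence[int], p: Sequence[float]) -> float:
    """AUC in its Mann-Whitney form: probability that a random pCR case gets a higher
    predicted probability than a random non-pCR case (ties count one half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (x / 2.0) / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-x / 2.0) * total


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float], groups: int = 10) -> Tuple[float, float]:
    """Hosmer-Lemeshow statistic over risk-ranked groups of near-equal size; df = groups - 2.

    The p-value uses the even-df closed form, so `groups` must be even here.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(p)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        mean_p = expected / len(idx)
        stat += (observed - expected) ** 2 / (len(idx) * mean_p * (1.0 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A large Hosmer-Lemeshow p-value means the observed and predicted pCR counts do not differ significantly across risk groups. This is consistent with good calibration.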

| Clinical utility of the prediction model
Decision curve analysis (DCA) and clinical impact curves (CICs) were used to assess the clinical usefulness of the nomogram. 24 In addition, patients were divided into high- and low-probability pCR groups using the cut-off determined by the Youden index in the primary cohort. The sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) of the prediction model were then assessed in the primary cohort, the validation cohort, and all patients.
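Decision curve analysis (DCA) rests on the net benefit at a threshold probability pt: true positives per patient minus false positives per patient, weighted by the odds pt/(1 - pt). Here a "positive" is a patient predicted to achieve pCR. A clinical impact curve (CIC) rescales the number classified as positive, and the true positives among them, to a population of 1000. The minimal sketch below assumes these standard definitions. The function names and toy data are hypothetical, and this is not the authors' R code.

```python
from typing import Dict, Sequence


def net_benefit(y: Sequence[int], p: Sequence[float], pt: float) -> float:
    """Net benefit of classifying patients with predicted probability >= pt as positive:
    TP/n - (FP/n) * pt / (1 - pt)."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 0)
    return tp / n - (fp / n) * pt / (1.0 - pt)


def net_benefit_treat_all(y: Sequence[int], pt: float) -> float:
    """'Treat-all' reference curve; the 'treat-none' curve is 0 at every threshold."""
    prevalence = sum(y) / len(y)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


def clinical_impact(y: Sequence[int], p: Sequence[float], pt: float,
                    population: int = 1000) -> Dict[str, float]:
    """One CIC point: number classified positive and true positives, scaled to `population`."""
    n = len(y)
    high = sum(1 for pi in p if pi >= pt)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 1)
    return {"high_risk": population * high / n, "true_positive": population * tp / n}


y_demo = [1, 1, 0, 0, 0]
p_demo = [0.9, 0.6, 0.7, 0.2, 0.1]
for pt in (0.3, 0.5, 0.7):
    print(pt, net_benefit(y_demo, p_demo, pt), net_benefit_treat_all(y_demo, pt),
          clinical_impact(y_demo, p_demo, pt))
```

Sweeping pt over a grid and plotting these values gives the decision and clinical impact curves. A model is useful wherever its curve lies above both reference strategies.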

| Incremental value of the collagen signature to traditional model
To estimate the incremental value of the collagen signature over the clinicopathological predictors, a clinicopathologic characteristic-based model (i.e., the traditional model) was developed without the collagen signature for comparison with the nomogram. The improvement provided by the collagen signature was evaluated using the area under the ROC curve (AUC), the net reclassification improvement (NRI), and the integrated discrimination improvement (IDI).
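The sketch below makes the IDI and NRI comparisons concrete. It computes both from predicted probabilities of an old model (here, the traditional model) and a new model (the nomogram). The text does not state whether category-based or category-free NRI was used. As an assumption, the sketch implements the category-free (continuous) NRI. Function names and data are hypothetical.

```python
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def idi(y: Sequence[int], p_old: Sequence[float], p_new: Sequence[float]) -> float:
    """Integrated discrimination improvement: gain in mean predicted risk among events
    minus the gain among non-events."""
    ev_new = _mean([pn for yi, pn in zip(y, p_new) if yi == 1])
    ev_old = _mean([po for yi, po in zip(y, p_old) if yi == 1])
    ne_new = _mean([pn for yi, pn in zip(y, p_new) if yi == 0])
    ne_old = _mean([po for yi, po in zip(y, p_old) if yi == 0])
    return (ev_new - ev_old) - (ne_new - ne_old)


def continuous_nri(y: Sequence[int], p_old: Sequence[float], p_new: Sequence[float]) -> float:
    """Category-free NRI: (P(up|event) - P(down|event)) + (P(down|non-event) - P(up|non-event))."""
    up_e = down_e = n_e = up_ne = down_ne = n_ne = 0
    for yi, po, pn in zip(y, p_old, p_new):
        if yi == 1:
            n_e += 1
            up_e += 1 if pn > po else 0
            down_e += 1 if pn < po else 0
        else:
            n_ne += 1
            up_ne += 1 if pn > po else 0
            down_ne += 1 if pn < po else 0
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne


y_demo = [1, 1, 0, 0]
old = [0.6, 0.4, 0.5, 0.3]
new = [0.8, 0.5, 0.4, 0.3]
print(idi(y_demo, old, new), continuous_nri(y_demo, old, new))
```

Positive values of both indices mean the new model moves pCR cases toward higher predicted risk and non-pCR cases toward lower risk.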

| Follow-up and association of the prediction model with prognosis
Patients were followed up after radical surgery. The associations of the nomogram-predicted high- and low-probability pCR groups with disease-free survival (DFS) and overall survival (OS) were analyzed.

| Statistical analysis
All statistical analyses were performed using SPSS 24.0 and R statistical software (version 4.0.3). Categorical variables were compared using the chi-square test or Fisher's exact test. Univariable and multivariable logistic regression analyses were used to identify independent predictors and estimate odds ratios (ORs) with 95% confidence intervals (CIs). Survival curves were estimated by the Kaplan-Meier method and compared using the log-rank test. A Cox proportional hazards model was used to estimate hazard ratios (HRs) and 95% CIs for DFS and OS. All statistical tests were two-sided, and p < 0.05 was considered statistically significant.
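The Kaplan-Meier method used for the survival curves can be sketched in a few lines. At each distinct event time, S(t) is multiplied by (1 - events / number at risk), and censored patients leave the risk set after that time. A 3-year DFS or OS is read off the step function at t = 36 months. This is an illustrative sketch with hypothetical names and toy data, not the SPSS or R implementation.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of S(t) at each distinct event time.

    events[i] is 1 for an observed event (recurrence or death) and 0 for censoring;
    patients censored at time t are still counted as at risk at t.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Step-function lookup: S(t) at the last event time <= t (1.0 before the first event)."""
    s = 1.0
    for ti, si in curve:
        if ti > t:
            break
        s = si
    return s


times = [6, 12, 12, 20, 30, 36]  # months
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve, survival_at(curve, 24))
```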

| Patient characteristics
According to the inclusion and exclusion criteria, 516 patients were included in this study (353 in the primary cohort and 163 in the validation cohort) (Figure S1). The detailed baseline characteristics of the primary and validation cohorts are listed in Table 1. Univariable analysis revealed that differentiation status, pretreatment CEA level, pretreatment CA199 level, pretreatment T stage, and tumor dimension differed significantly between the pCR and non-pCR groups in both the primary and validation cohorts (p < 0.05).
The pCR rates in the primary (21.5%, 76/353) and validation (22.7%, 37/163) cohorts did not differ significantly (p = 0.819). Baseline characteristics were also similar between the two cohorts (Table S2), supporting their use as primary and validation cohorts.

| Collagen feature selection and collagen signature construction
The flowchart of this research is presented in Figure 1. In total, 142 collagen features were reduced to four potential features by Lasso regression in the primary cohort (Figure S2). These collagen features were combined in the collagen signature calculation formula:
Tumor budding was identified in the pretreatment biopsy in 70 of the 353 patients (19.8%). Comparing the four collagen features between patients with and without tumor budding showed statistically significant differences in collagen straightness, collagen crosslink density, and collagen orientation (Table S3).
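A Lasso-derived signature is typically a linear score: an intercept plus the sum of each selected feature value multiplied by its nonzero Lasso coefficient. The coefficients are not given in this section, so the feature names and values below are hypothetical placeholders, not the published collagen signature. The sketch only shows how such a score would be computed for one patient.

```python
from typing import Mapping


def signature_score(features: Mapping[str, float], coefficients: Mapping[str, float],
                    intercept: float = 0.0) -> float:
    """Linear signature score: intercept + sum(coefficient * feature value).

    `coefficients` would hold the nonzero Lasso weights; the names and values used
    below are hypothetical placeholders, not the published collagen signature.
    """
    missing = set(coefficients) - set(features)
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    return intercept + sum(coef * features[name] for name, coef in coefficients.items())


# HYPOTHETICAL coefficients and (standardized) feature values, for illustration only.
example_coefficients = {"feature_1": 0.9, "feature_2": -0.4, "feature_3": 0.3, "feature_4": -0.7}
patient = {"feature_1": 0.2, "feature_2": 1.1, "feature_3": -0.5, "feature_4": 0.4}
print(signature_score(patient, example_coefficients, intercept=-1.0))
```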
Representative H&E images, SHG/TPEF images, and binary images of the pCR and non-pCR patients are presented in Figure … identified as independent predictors of pCR by multivariable analysis (Table 2). A prediction model integrating these five predictors was constructed and presented as a nomogram (Figure 4A). Among these independent predictors, the … (Table S6). The variance inflation factors of all five predictors were less than 5, indicating no multicollinearity among the predictors (Figure S5).
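The variance inflation factor check can be sketched as follows. Each predictor is regressed on all the others by ordinary least squares with an intercept, and VIF = 1 / (1 - R^2). Values below 5, as reported above, are a common rule of thumb for acceptable collinearity. The sketch is pure Python with a small Gaussian-elimination solver. It assumes predictors are coded numerically (for example, categorical predictors as 0/1). The function names are hypothetical.

```python
from typing import List, Sequence


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system (perfect collinearity)")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def r_squared(y: Sequence[float], Z: Sequence[Sequence[float]]) -> float:
    """R^2 of an OLS fit of y on the columns of Z plus an intercept (normal equations)."""
    rows = [[1.0] + list(z) for z in Z]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(XtX, Xty)
    fitted = [sum(bi * ri for bi, ri in zip(beta, r)) for r in rows]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(X: Sequence[Sequence[float]]) -> List[float]:
    """VIF for each column of X (rows = patients, columns = predictors)."""
    p = len(X[0])
    out = []
    for j in range(p):
        target = [row[j] for row in X]
        others = [[row[k] for k in range(p) if k != j] for row in X]
        out.append(1.0 / (1.0 - r_squared(target, others)))
    return out


print(vif([[1, 2], [2, 1], [3, 4], [4, 3]]))  # correlation 0.6, so each VIF is 1/(1 - 0.36)
```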

| Development of the individualized prediction model
We further investigated the relationship between these four clinicopathological predictors and the four collagen features (Tables S7-S10).

| Evaluation and validation of the performance of the prediction model
The

| Clinical utility of the prediction model
DCA indicated that using the nomogram to predict pCR provided greater net benefit than either the "treat-all scheme" or the "treat-none scheme" in the primary cohort, validation cohort, and all patients (Figure 5A). Building on these DCAs, CICs were plotted for a simulated population of 1000 LARC cases. These curves show more intuitively how many patients the nomogram would classify as likely to achieve pCR, and how many of them would be true pCR cases. The results indicated strong predictive ability of the nomogram, with a probability threshold of approximately 0.4 being optimal for identifying patients who would achieve pCR after nCRT (Figure 5B).
In addition, the cut-off value determined by the maximum Youden index in the primary cohort was 0.251. Using this cut-off, patients were separated into high-probability and low-probability pCR groups. The nomogram had satisfactory sensitivity, specificity, accuracy, PPV, and NPV (Table 3).
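The cut-off step can be sketched as follows. The Youden index J = sensitivity + specificity - 1 is computed at each candidate threshold in the primary cohort, and the threshold with the largest J is kept. That same threshold would then be applied unchanged to the validation cohort. Function names and data are hypothetical. This is an illustration, not the authors' code.

```python
import math
from typing import Dict, Sequence, Tuple


def classification_metrics(y: Sequence[int], p: Sequence[float], cutoff: float) -> Dict[str, float]:
    """Metrics when a predicted probability >= cutoff is classified as high-probability pCR."""
    tp = fn = fp = tn = 0
    for yi, pi in zip(y, p):
        if pi >= cutoff:
            if yi == 1:
                tp += 1
            else:
                fp += 1
        elif yi == 1:
            fn += 1
        else:
            tn += 1

    def ratio(a: int, b: int) -> float:
        return a / b if b else math.nan  # undefined when the denominator is empty

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": (tp + tn) / len(y),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def youden_cutoff(y: Sequence[int], p: Sequence[float]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1 over observed probabilities."""
    best_cut, best_j = None, -math.inf
    for c in sorted(set(p)):
        m = classification_metrics(y, p, c)
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut, best_j


y_primary = [0, 0, 1, 1]
p_primary = [0.1, 0.4, 0.35, 0.8]
cut, j = youden_cutoff(y_primary, p_primary)
print(cut, j, classification_metrics(y_primary, p_primary, cut))
```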

| Incremental value of the collagen signature to traditional model
After excluding the collagen signature, a traditional model was constructed based on pretreatment CEA level, pretreatment CA199 level, differentiation status, pretreatment T stage, and tumor dimension (Table S11). … (Table 4; Figure 5C).
Moreover, all NRI and IDI values between the nomogram and the traditional model were >0 with p < 0.05, indicating that the nomogram performed better than the traditional model (Table 4).
DCA also showed that the nomogram had a higher net benefit than the traditional model for predicting the probability of pCR ( Figure 5A). In addition, the nomogram had higher sensitivity, specificity, accuracy, PPV, and NPV than the traditional model (Table 3).

| Follow-up and association of the prediction model with prognosis
The median (IQR) DFS and OS were 44.5 months (28-57 months) and 48 months (36-58 months), respectively. DFS was significantly better in patients with a high probability of pCR than in those with a low probability of pCR (3-year DFS: 91.2% vs. 70.6%; log-rank p < 0.001; Figure 6A). Likewise, OS was better in patients with a high probability of pCR than in those with a low probability of pCR (3-year OS: 94.1% vs. 81.3%; log-rank p < 0.001; Figure 6B). The collagen signature and other predictors with the corresponding survival status are shown in Figure 7. … for OS (Table 5). This result showed that the nomogram-predicted probability of pCR was significantly associated with prognosis after adjustment for other variables.

| DISCUSSION
In this study, we developed and validated a prediction model based on the collagen signature and presented it as an easy-to-use nomogram. With its satisfactory performance, the nomogram is intended to help surgeons predict the individual probability of pCR and to serve as an effective tool for clinical decision-making.
Collagen is the main component of the ECM; it provides structural and mechanical support for cells and tissues and regulates a variety of cell functions. 25 Growing evidence shows that changes in collagen structure in the tumor microenvironment can substantially influence the growth, invasion, metastasis, and survival of tumor cells and even affect therapeutic sensitivity. [26][27][28] Collagen, with its considerable potential for clinical application, is therefore currently a focus of individualized medicine research. [29][30][31] However, the relationship between pCR and collagen structure in the tumor microenvironment of pretreatment biopsies is unclear. With the development of interdisciplinary approaches, MPI can be used to visualize collagen in the tumor microenvironment accurately, selectively, and without labeling. [32][33][34] In addition, high-throughput collagen feature information can be extracted automatically from multiphoton images for subsequent data analysis to support decision-making. 14,15 On this basis, pretreatment biopsies were imaged by MPI in this study, and 142 collagen features were extracted for subsequent analysis.
Figure: Collagen score for each patient in the primary cohort and comparison of the collagen signature between patients with pCR and non-pCR in the primary cohort. Red represents pCR, and blue represents non-pCR. pCR, pathological complete response.
In recent studies, multimarker analyses that combine single markers into marker panels have gained acceptance and can improve prediction performance. 35 Lasso regression is a useful algorithm for selecting the most predictive parameters from high-dimensional data while avoiding overfitting. 36,37 In this study, we … collagen signature with a low probability of pCR after nCRT. Another feature used to construct the collagen signature was a Gabor wavelet transform feature. The Gabor wavelet transform is a multiscale image analysis method that decomposes image data into different frequency components. 42 Texture features extracted from the wavelet-decomposed image can further capture the spatial heterogeneity of collagen at multiple scales. 43 … migration. 44 Therefore, these results suggest that tumor cells are prone to migrate and form tumor budding in a tumor microenvironment with high collagen straightness, collagen crosslink density, and collagen orientation.
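A small Gabor filter bank illustrates the Gabor wavelet feature discussed above. The sketch builds the real part of a 2-D Gabor kernel: a Gaussian envelope multiplied by a cosine carrier at orientation theta and a given wavelength. It then uses the mean squared filter response as a texture-energy feature. Varying the wavelength and theta gives the multiscale, multi-orientation decomposition. The kernel parameters and the energy feature are illustrative assumptions, not the settings used for the Gabor feature in the collagen signature.

```python
import math
from typing import List, Sequence


def gabor_kernel(size: int, sigma: float, theta: float, wavelength: float,
                 gamma: float = 0.5, psi: float = 0.0) -> List[List[float]]:
    """Real part of a 2-D Gabor filter: Gaussian envelope times an oriented cosine carrier."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier oscillates along direction theta
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_energy(image: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]]) -> float:
    """Mean squared response of a 'valid' 2-D correlation: a simple texture-energy feature."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    total, count = 0.0, 0
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            r = sum(kernel[a][b] * image[i + a][j + b] for a in range(k) for b in range(k))
            total += r * r
            count += 1
    return total / count


# Demo: vertical stripes (intensity varies along x) with a 6-pixel period.
stripes = [[math.cos(2 * math.pi * x / 6.0) for x in range(30)] for _ in range(30)]
for theta in (0.0, math.pi / 2):
    kern = gabor_kernel(15, sigma=3.0, theta=theta, wavelength=6.0)
    print(round(theta, 2), filter_energy(stripes, kern))
```

The filter aligned with the stripe direction responds strongly, and the orthogonal one barely responds. This orientation selectivity is what lets Gabor texture features summarize how collagen is arranged at each scale.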
The epithelial-mesenchymal transition (EMT) is the process by which epithelial cells lose adhesion to neighboring cells and become migratory and invasive, and it is closely tied to cancer progression. 45 Collagen in the ECM is critical for EMT. 46 The four collagen features may reflect a tumor microenvironment that promotes EMT. High collagen straightness, orientation, and crosslink density, together with a low Gabor feature value, may indicate increased matrix stiffness. 44 Increased matrix stiffness could raise interstitial pressure, deform tumor and stromal cells, and initiate EMT. 46 Moreover, increased matrix stiffness could also drive EMT through a TWIST1-G3BP2 mechanotransduction pathway. 47 Increased collagen crosslink density can also promote EMT by weakening cell-cell adhesion. 48
FIGURE 5 (A) Decision curves: the y-axis represents net benefit and the x-axis the threshold probability; the red line represents the collagen nomogram, the cyan line the traditional model, the yellow line the "treat-all scheme," and the black line the "treat-none scheme." Using the nomogram to predict pCR added more benefit than the traditional model, the "treat-all scheme," and the "treat-none scheme." (B) Clinical impact curves for the nomogram in 1000 patients: the red line shows the number of LARC patients classified as pCR at each threshold probability, and the black line shows how many of them would be true positives. The closer the two curves, the higher the probability that the nomogram correctly identifies pCR patients among the estimated pCR cases. The threshold value represents the value above which the rate of misdiagnosis would be lowest, providing an optimal benefit ratio for the patient. (C) ROC curves for the nomogram (red) and the traditional model (cyan). AUC, area under the curve; LARC, locally advanced rectal cancer; pCR, pathological complete response; ROC, receiver operating characteristic.
… and radiomics also requires accurate tumor segmentation. 53 Because the proposed collagen signature is based on pretreatment biopsy specimens, errors introduced by manual delineation are avoided. Furthermore, the collagen signature provided additional prognostic information and may help researchers understand the interactions between tumor cells and their structural microenvironment. The collagen signature is therefore clinically relevant and worth further investigation.

TABLE 3 The performance of the nomogram and the traditional model in predicting pCR in the primary cohort, validation cohort, and all patients
… pCR. Therefore, we used pretreatment biopsy tissue rather than posttreatment resected samples. We did not evaluate the discrepancy between the biopsy specimen and the resected sample in this study. Notably, the resected specimen has been exposed to radiotherapy, which may cause excessive collagen deposition and structural disorganization through myosin IIA expression and oxidative stress. 56,57 In addition, regressed tumor tissue is replaced by interstitial fibrosis. 58 In short, compared with pretreatment biopsy tissue, the collagen structure of the resected specimen may be altered by radiotherapy-induced deposition and disorganization.
In conclusion, we found that the collagen signature in the tumor microenvironment of pretreatment biopsy samples was significantly associated with pCR. We developed and validated a nomogram based on the collagen signature for accurate individualized prediction of pCR in patients with LARC before nCRT.

DISCLOSURE
The authors have no conflict of interest.