Association of collagen deep learning classifier with prognosis and chemotherapy benefits in stage II‐III colon cancer

Abstract The current tumor‐node‐metastasis staging system does not provide sufficient prognostic prediction or adjuvant chemotherapy benefit information for stage II‐III colon cancer (CC) patients. Collagen in the tumor microenvironment affects the biological behaviors and chemotherapy response of cancer cells. Hence, in this study, we proposed a collagen deep learning (collagenDL) classifier based on the 50‐layer residual network model for predicting disease‐free survival (DFS) and overall survival (OS). The collagenDL classifier was significantly associated with DFS and OS (P < 0.001). The collagenDL nomogram, integrating the collagenDL classifier and three clinicopathologic predictors, improved the prediction performance, which showed satisfactory discrimination and calibration. These results were independently validated in the internal and external validation cohorts. In addition, high‐risk stage II and III CC patients with high‐collagenDL classifier, rather than low‐collagenDL classifier, exhibited a favorable response to adjuvant chemotherapy. In conclusion, the collagenDL classifier could predict prognosis and adjuvant chemotherapy benefits in stage II‐III CC patients.


| INTRODUCTION
Colon cancer (CC) is one of the main causes of cancer mortality worldwide. 1 At present, patients with CC are treated based on the tumor-node-metastasis (TNM) staging system, which is widely used in the clinic and has been continuously revised up to the eighth edition. 2 Nevertheless, the TNM staging system cannot adequately distinguish the prognosis of stage II-III CC patients, especially those who receive adjuvant chemotherapy. 3,4 The 5-year overall survival (OS) of these patients ranges from 50% to 90%, which indicates that the TNM system cannot provide sufficient information on prognosis or adjuvant chemotherapy benefit. Therefore, there is an urgent need for an efficient biomarker to complement the current TNM staging system and improve the accuracy of treatment selection and prognosis prediction.
The scaffolding of the tumor microenvironment (TME) is composed of the extracellular matrix (ECM), which could impact the biological behavior of cancer cells. 5,6 Collagen is the main component of the ECM and plays a major role in ECM function. 7,8 Some studies have shown that collagen in the TME is a valuable marker for evaluating the prognostic outcomes of patients with gastrointestinal tumors. 9-11 In addition, the stiffness of tumor tissue could increase because of abnormal collagen cross-linking and deposition, which leads to chemotherapy resistance. 6,8,12 Thus, we hypothesize that collagen in the TME is an effective biomarker to provide prognosis and chemotherapy benefit information in stage II-III CC patients.
Multiphoton imaging, based on nonlinear optics and femtosecond lasers, can acquire the collagen structure and cell morphology of biological samples. The second harmonic generation (SHG) signal is produced from collagen fiber, and the two-photon excitation fluorescence is produced by cells. 13,14 Due to its imaging principle and physical origin, multiphoton imaging has high selectivity and specificity for collagen. Therefore, multiphoton imaging can accurately capture information about the conformational alterations of collagen in the TME.
Currently, several machine-learning algorithms have been applied to explore disease information in medical images. Among them, deep learning (DL) is one of the most effective approaches used to perform medical image recognition and classification. 15-19 Moreover, DL has promising potential to predict prognosis by extracting prognosis-associated information from medical images. 20,21 Hence, this study aimed to construct a collagen-deep-learning (collagen DL) prognostic classifier based on multiphoton imaging and DL for effectively predicting the survival of patients with stage II-III CC and explore whether the collagen DL classifier could identify patients with high-risk stage II and III CC who might benefit from adjuvant chemotherapy.

| Patient characteristics
The clinicopathologic characteristics of the patients in the training, internal validation, and external validation cohorts are shown in

| Collagen DL classifier and prognosis
The workflow of this study is shown in Figure 1. All patients were divided into high- and low-collagen DL classifier subgroups according to the optimal cut-off recurrence probability value (0.436) in the training cohort (Figure S1). The relationships between clinicopathological characteristics and the collagen DL classifier are shown in Table S1.
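The dichotomization described above can be illustrated with a minimal sketch. Only the cut-off value (0.436) comes from the text; the function names are hypothetical, and assigning a probability exactly equal to the cut-off to the high subgroup is an assumption, since the text does not state how ties are handled.

```python
from typing import List

# Optimal cut-off recurrence probability reported for the training cohort.
CUTOFF = 0.436


def classify(prob: float, cutoff: float = CUTOFF) -> str:
    """Assign one patient to the 'high' or 'low' collagen DL subgroup.

    Assumption: a probability equal to the cut-off is labeled 'high'.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("recurrence probability must lie in [0, 1]")
    return "high" if prob >= cutoff else "low"


def classify_cohort(probs: List[float], cutoff: float = CUTOFF) -> List[str]:
    """Apply the same fixed cut-off to every patient in a cohort."""
    return [classify(p, cutoff) for p in probs]
```

Because the cut-off is fixed in the training cohort, the same function would be applied unchanged to the internal and external validation cohorts.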
Representative images of the high- and low-collagen DL classifier subgroups are shown in Figure 2a. In the training cohort, the 5-year DFS and OS rates were 90.6% and 93.2% for patients with a low collagen DL classifier and 59.0% and 68.6% for patients with a high collagen DL classifier, respectively (P < 0.001). Then, the collagen DL classifier was applied to the internal and external validation cohorts, and the results demonstrated that the collagen DL classifier could also significantly distinguish patients with different prognoses in these cohorts (Figure 2b,c).
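Five-year survival rates of this kind are typically read off a Kaplan-Meier curve. The following is a minimal sketch of the product-limit estimator, not the code used in the study (which relied on R and SPSS); the function names and the use of months as the time unit are assumptions for illustration.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the event (e.g., recurrence or death) was observed
    and 0 if the patient was censored at times[i].
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def survival_at(curve: List[Tuple[float, float]], horizon: float) -> float:
    """Read the step function at a given horizon (e.g., 60 months)."""
    s = 1.0
    for t, value in curve:
        if t > horizon:
            break
        s = value
    return s
```

Calling `survival_at(kaplan_meier(times, events), 60)` separately for the high and low subgroups would give the two 5-year rates being compared.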
In the training cohort, the collagen DL classifier yielded Harrell's concordance indexes (C-indexes) of 0.699 for DFS and 0.678 for OS. Good discrimination was also verified in the internal and external validation cohorts, with C-indexes of 0.692 and 0.697 for DFS and 0.660 and 0.692 for OS, respectively. In the three cohorts, the 5-year time-dependent receiver operating characteristic (ROC) curves also confirmed that the collagen DL classifier has good discrimination in terms of predicting prognosis (Figure S2).
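For readers unfamiliar with the metric, Harrell's C-index is the fraction of comparable patient pairs in which the patient with the higher predicted risk experiences the event earlier. The sketch below is a plain O(n²) illustration with a hypothetical function name; counting tied risk scores as half-concordant follows the usual convention and matters for a dichotomous classifier, where many pairs share the same score.

```python
from typing import List


def harrell_c_index(times: List[float], events: List[int],
                    risk_scores: List[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which places the reported values of roughly 0.66 to 0.70 in context.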
Stratified analyses of patients with stage II and III CC indicated that the collagen DL classifier was a valuable biomarker to identify stage II-III patients with different prognoses in the three cohorts (Figure 3). When each clinicopathological characteristic was used for stratified analysis, the collagen DL classifier could still distinguish patients with different DFS and OS rates (Figures S3-S8).

| Individualized collagen DL nomogram construction and performance assessment
Univariate Cox regression analysis demonstrated that venous emboli and/or lymphatic invasion and/or perineural invasion (VELIPI), T stage, N stage, and the collagen DL classifier were candidate predictors of prognosis in the training cohort (Table S2).

| Assessment of the incremental value of the collagen DL classifier in predicting DFS and OS
Two clinicopathological models based on three clinicopathological predictors in the training cohort were used to predict DFS and OS (Table S3). Compared with the clinicopathological model, TNM stage, or the collagen DL classifier alone, the collagen DL nomogram displayed better discrimination, with higher C-indexes in the three cohorts (Table S4). In addition, this result was confirmed by 5-year time-dependent ROC curves (Figure S9 and Table S5). The corresponding net reclassification improvement (NRI) and integrated discrimination improvement (IDI) showed that the collagen DL nomogram had a significantly increased classification accuracy for survival outcomes compared with the clinicopathological model (Tables S6 and S7, and

| The collagen DL classifier and adjuvant chemotherapy
We further analyzed the relationship between the collagen DL classifier and the chemotherapy benefit in high-risk stage II and III CC patients. The results confirmed that the collagen DL classifier was significantly associated with prognosis regardless of whether patients received chemotherapy (Figure S11). However, the collagen DL classifier might have a more powerful relationship with the prognosis of patients who have not received chemotherapy. Hence, a subset analysis was performed based on the collagen DL classifier. Examination of the interaction between the collagen DL classifier and adjuvant chemotherapy suggested that patients with high collagen DL classifier derived more benefits from adjuvant chemotherapy than patients with low collagen DL classifier in the CC patients with high-risk II and III stages (Table S8). The corresponding Kaplan-Meier survival curves revealed
Figure 1. The workflow of this study. (a) Study design of this study. (b) Flow chart of the collagen DL classifier. A representative ROI with a field of view of 512 × 512 μm was chosen in the HE staining image, and the corresponding multiphoton image was obtained. Then, the probability value of recurrence from these multiphoton images is output through Res-net 50. Finally, the collagen DL classifier was constructed according to the optimal cut-off probability value. HE, hematoxylin and eosin; MPI, multiphoton imaging; Res-net 50, 50-layer residual network; ROI, region of interest.
powerful collagen feature for estimating 5-year OS. 10 Furthermore, the collagen structure is related to chemotherapy resistance. Collagen crosslinking promotes an increase in tissue hardness, which changes the growth and integrity of blood vessels. 26,27 Excessive deposition and abnormal remodeling of collagen can increase interstitial pressure, which affects the effect of drug delivery. 6,26,27 Taken together, these pieces of evidence demonstrate that collagen in the TME is a potentially valuable biomarker for estimating prog-
Multiphoton imaging has been used for real-time in vivo imaging and optical biopsy due to its label-free advantages and stability.
Multiphoton imaging can visualize the morphology of cells and the structure of tissues at the subcellular level, which is comparable to traditional hematoxylin and eosin (HE) staining. 13,42 Importantly, due to the endogenous physical properties of collagen, multiphoton imaging can specifically image the collagen structure. 43,44 The structural information of collagen analyzed from multiphoton images can be used for the diagnosis and prognosis assessment of several diseases. 9,11,45-47 Therefore, multiphoton imaging is a strong tool to assess the association between the collagen structure in the TME and the prognosis of patients with stage II-III CC.

| CONCLUSIONS
The collagen DL classifier can effectively classify CC patients with stage II-III disease and increase the predictive value of the TNM staging system. Furthermore, the collagen DL classifier could be a helpful predictive tool to identify patients who are more likely to benefit from adjuvant chemotherapy. The collagen DL nomograms might facilitate the personalized postoperative surveillance and management of stage II-III CC patients.

| Study design and patients
Ethics approval was obtained from the institutional review boards of the two academic medical centers: Nanfang Hospital and the Sixth Affiliated Hospital, Sun Yat-sen University. The requirement for informed consent was waived for this study. The study was conducted following the guidelines of the Declaration of Helsinki.
The data of patients with stage II-III CC who underwent radical surgery at either of the two participating centers were reviewed in this retrospective study. The inclusion criteria were as follows: (1) patients ≥18 years; (2)
The primary objective of the study was to construct an effective prediction model to estimate the DFS and OS of stage II-III CC patients.

| Region of interest selection and multiphoton imaging
Formalin-fixed, paraffin-embedded samples were sliced into 5-μm-thick serial sections for HE staining. Two gastrointestinal pathologists with more than 10 years of experience who were unaware of the prognostic information used a microscope to reassess the invasive area of the tumor on the HE image. When the two pathologists had different opinions, the final decision was made by the director of the Pathology Department.
Finally, three regions of interest (ROIs) of 512 μm × 512 μm per section in the invasive region were randomly chosen, and the corresponding regions on the other serial section were used for multiphoton imaging.
Image acquisition for multiphoton imaging was performed with a 100× original magnification objective on another unstained serial section and then compared with HE staining for histologic assessment. 13,53 More information about the multiphoton imaging system is shown in the Supplementary Methods.

| Collagen deep learning classifier construction
Res-net is a representative deep convolutional neural network (CNN) that is widely applied in the field of target classification. 50
The discriminative ability and calibration of the collagen DL nomograms were assessed via the C-index, 5-year time-dependent ROC curve, and calibration curve. DCA was applied to determine the clinical application value of the collagen DL nomograms. 58
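Decision curve analysis (DCA) compares models by their net benefit across threshold probabilities. The sketch below shows the standard net-benefit formula, NB = TP/n − (FP/n) × pt/(1 − pt), for a binary outcome at a fixed horizon. Ignoring censoring is a simplifying assumption made here for illustration, and the function names are hypothetical; this is not the procedure used in the study.

```python
from typing import List


def net_benefit(probs: List[float], outcomes: List[int], threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold.

    outcomes[i] is 1 if the event occurred by the horizon, else 0.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(probs)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes: List[int], threshold: float) -> float:
    """Reference strategy in which every patient is treated."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

A decision curve plots `net_benefit` against a range of thresholds, alongside the treat-all line and the treat-none line (net benefit of zero); a model is clinically useful where its curve lies above both.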

| Assessment of the incremental value of the collagen DL classifier in individualized DFS and OS estimations
The incremental value of the collagen DL classifier for the clinicopathological model, which was based on clinicopathological predictors, was evaluated with respect to discrimination, calibration, and clinical application value. In addition, the performance of the collagen DL nomogram and the clinicopathological model were compared by the NRI and IDI. 59,60
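As an illustration of these two metrics, the sketch below computes the IDI and the category-free (continuous) NRI for a binary outcome at a fixed horizon. Survival-specific versions account for censoring, which this simplified sketch ignores; the function names are hypothetical and the code is not taken from the study.

```python
from typing import List


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def idi(p_old: List[float], p_new: List[float], outcomes: List[int]) -> float:
    """Integrated discrimination improvement of the new model over the old.

    IDI = (gain in mean predicted risk among events)
          - (gain in mean predicted risk among non-events).
    """
    ev_old = [p for p, y in zip(p_old, outcomes) if y == 1]
    ev_new = [p for p, y in zip(p_new, outcomes) if y == 1]
    ne_old = [p for p, y in zip(p_old, outcomes) if y == 0]
    ne_new = [p for p, y in zip(p_new, outcomes) if y == 0]
    return (_mean(ev_new) - _mean(ev_old)) - (_mean(ne_new) - _mean(ne_old))


def continuous_nri(p_old: List[float], p_new: List[float], outcomes: List[int]) -> float:
    """Category-free NRI: net upward moves in events plus net downward
    moves in non-events, each expressed as a proportion."""
    pairs_ev = [(o, n) for o, n, y in zip(p_old, p_new, outcomes) if y == 1]
    pairs_ne = [(o, n) for o, n, y in zip(p_old, p_new, outcomes) if y == 0]
    up_ev = sum(1 for o, n in pairs_ev if n > o) / len(pairs_ev)
    down_ev = sum(1 for o, n in pairs_ev if n < o) / len(pairs_ev)
    up_ne = sum(1 for o, n in pairs_ne if n > o) / len(pairs_ne)
    down_ne = sum(1 for o, n in pairs_ne if n < o) / len(pairs_ne)
    return (up_ev - down_ev) + (down_ne - up_ne)
```

Positive values of either metric indicate that the new model (here, the nomogram) moves predicted risks in the right direction relative to the reference model.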

| Statistical analysis
The Res-net 50 model was implemented with the open-source software Python (version 3.9.0) and TensorFlow (version 2.6.0-GPU), and statistical analysis was conducted in R software (version 3.6.0) and SPSS software (version 22.0). The chi-square test or Fisher's exact test was used to assess differences between two groups of categorical variables. Univariate and multivariable Cox analyses were used to select the predictors and calculate the HR with 95% CI. The difference between Kaplan-Meier curves was assessed using the log-rank test.
All tests were two-tailed, and a P value <0.050 was determined to be statistically significant.
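The log-rank comparison of two Kaplan-Meier curves can be sketched as follows. This is a minimal standard-library illustration of the two-group log-rank statistic, not the R or SPSS routines used in the study, and the function name is hypothetical. The P value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x / 2)).

```python
import math
from typing import List, Tuple


def logrank_test(times_a: List[float], events_a: List[int],
                 times_b: List[float], events_b: List[int]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)
        n_b = sum(1 for u, _, g in pooled if u >= t and g == 1)
        d_a = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 1)
        n = n_a + n_b
        d = d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        raise ValueError("variance is zero; the test is undefined")
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square, 1 df
    return chi2, p_value
```

Under the significance rule stated above, a returned P value below 0.050 would be read as a statistically significant difference between the two curves.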