Histological phenotypic subtypes predict recurrence risk and response to adjuvant chemotherapy in patients with stage III colorectal cancer

Abstract
Histological ‘phenotypic subtypes’ that classify patients into four groups (immune, canonical, latent and stromal) have previously been demonstrated to stratify survival in a stage I–III colorectal cancer (CRC) pilot cohort. However, clinical utility has not yet been validated. Therefore, this study assessed prognostic value of these subtypes in additional patient cohorts along with associations with risk of recurrence and response to chemotherapy. Two independent stage I–III CRC patient cohorts (internal and external cohort) were utilised to investigate phenotypic subtypes. The primary endpoint was disease‐free survival (DFS) and the secondary endpoint was recurrence risk (RR). Stage II–III patients, from the SCOT adjuvant chemotherapy trial, were utilised to further validate prognostic value and for exploratory analysis assessing associations with adjuvant chemotherapy. In an 893‐patient internal cohort, phenotypic subtype independently associated with DFS (p = 0.025) and this was attenuated in stage III patients (p = 0.020). Phenotypic subtype also independently associated with RR (p < 0.001) in these patients. In a 146‐patient external cohort, phenotypic subtype independently stratified patients by DFS (p = 0.028), validating their prognostic value. In 1343 SCOT trial patients, the effect of treatment type significantly depended on phenotypic subtype (p interaction = 0.011). Phenotypic subtype independently associated with DFS in stage III patients receiving FOLFOX (p = 0.028). Furthermore, the immune subtype significantly associated with better response to FOLFOX compared to CAPOX adjuvant chemotherapy in stage III patients (p = 0.013). In conclusion, histological phenotypic subtypes are an effective prognostic classification in patients with stage III CRC that associates with risk of recurrence and response to FOLFOX adjuvant chemotherapy.


Introduction
There has been an enormous effort to develop a prognostic classification of colorectal cancer (CRC) based mainly on transcriptomic analysis of tumours [1,2]. We recently proposed a histological phenotypic subtyping method based on phenotypic characteristics of the tumour microenvironment (immune and stromal infiltrate) and tumour proliferation [3], extrapolated from the consensus molecular subtypes (CMS), that could translate more readily to the clinic [1]. This method classified patients into four prognostic groups (immune, canonical, latent and stromal) in a discovery cohort of 237 patients with stage I-III CRC [3], but requires validation.
Establishing distinct patient groups will allow development of personalised treatment approaches for CRC, as evidenced by the use of mismatch repair deficiency for response to immunotherapy [4]. There is currently a lack of biomarkers to predict response to adjuvant chemotherapy, particularly important for CRC and 5-FU-based adjuvant chemotherapy, as the SCOT trial recently demonstrated that patients receiving CAPOX (capecitabine and oxaliplatin) have similar survival with 3 versus 6 months' duration, whereas patients receiving FOLFOX (bolus and infused fluorouracil with oxaliplatin) require 6 months' duration [5]. This highlights the importance of identifying which patients need to receive the longer, more invasive FOLFOX regimen compared to the potentially shorter, less invasive CAPOX regimen. Histological phenotypic subtyping could provide such a tool.
Hence, the primary aim of this study was to validate the prognostic value of the histological phenotypic subtypes in distinct stage I-III CRC internal and external cohorts. The secondary aim was to assess associations with risk of recurrence. Finally, the exploratory aim was to investigate adjuvant chemotherapy in a subset of stage II-III CRC patients from the SCOT trial to assess associations with treatment type or duration.

Study cohorts
In the internal cohort there were 1030 patients who had undergone a potentially curative resection for stage I-III CRC between 1997 and 2007 at the Glasgow Royal Infirmary, Western Infirmary or Stobhill Hospitals (Glasgow, UK, GSH/18/ON/007; Ethics No. 16/WS/0207). In the external validation cohort, there were 166 patients who had undergone a potentially curative resection for stage I-III CRC between 1992 and 2016 at St Vincent's University Hospital (Dublin, Ireland) or the Academic Medical Center (Amsterdam, The Netherlands, AMC-AJCCII-90; Ethics No. W12/011/12.17.0020). In the TransSCOT cohort there was tissue available from 3000 patients, derived from the SCOT adjuvant chemotherapy trial (ISRCTN no. 59757862; 6088 patients) who had undergone potentially curative resection for high-risk stage II or stage III CRC between 2008 and 2013 within the UK.
All patients were followed up for at least 3 years. Patients who died within 30 days of surgery, had no tissue/survival data available or for whom staining was missing were excluded from the analysis, providing 893 eligible patients within the internal cohort, 146 eligible patients within the external validation cohort and 1343 eligible patients within the TransSCOT cohort (Figure 1). The study complied with the Declaration of Helsinki and individual ethics committee guidelines. It was not appropriate to involve patients or the public in the design, conduct, reporting or dissemination of this research.

Patient cohort clinicopathological characteristics
Tumours were staged using the fifth edition (UK) or seventh edition (Amsterdam and Dublin) of the AJCC/UICC-TNM staging system. In the internal cohort, the presence of venous invasion was assessed using elastica staining. Tumour differentiation was graded as well/moderate or poor in accordance with Royal College of Pathologists guidelines [6] and additional data were taken from pathology reports. Mismatch repair status was available for the internal and external validation cohorts. Following surgery, patients with stage III or high-risk stage II disease and without significant co-morbid disease precluding adjuvant treatment were considered for 5-fluorouracil-based chemotherapy. For the TransSCOT cohort, all patients were treated with FOLFOX (bolus and infused fluorouracil with oxaliplatin) or CAPOX (capecitabine and oxaliplatin) adjuvant chemotherapy randomised to 3 or 6 months' duration. Date and site of recurrence and cause of death were cross-checked using electronic case records.

Assessment of phenotypic subtypes
Phenotypic subtype measures were assessed on full sections taken from the deepest point of invasion. For the internal, Amsterdam and TransSCOT cohorts, slides were assessed by a single observer (AKR or JHP) with 10% co-scored by the other observer; intraclass correlation coefficient was >0.7 for all markers. The classification was externally validated by an independent observer (SA) at St Vincent's Hospital, Dublin, with 10% co-scored by a second observer (AKR); intraclass correlation coefficient was >0.7 for all markers. Local inflammatory cell infiltrate was visually assessed using the Klintrup-Makinen (KM) grade [7]. Briefly, the KM grade was assessed on an H&E section at the invasive margin; a score of 0-1 (no increase or mild/patchy increase in inflammatory cells) was graded as weak and a score of 2-3 (prominent inflammatory reaction forming a band at the invasive margin, or florid cup-like infiltrate at the invasive edge with destruction of cancer cell islands) was graded as strong. Stromal invasion was visually assessed using tumour stroma percentage (TSP) on an H&E section with a cut-off value of 50% for low (≤50% stroma present) and high (>50% stroma present) [8]. Proliferation rate was assessed using Ki67 proliferation index with automated hotspot cell counts within a single pre-determined field of view at ×400 magnification utilising the SlidePath digital image hub (Leica, UK). Proliferation rate cut-offs were assessed by receiver operating characteristic (ROC) analysis in the internal cohort and 30% was deemed to be the optimal cut-off. Proliferation was graded as low (≤30%) or high (>30%).
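The cut-offs described above can be summarised in a short sketch. The function name, argument names and returned labels are hypothetical and chosen only for illustration; the thresholds themselves (KM score 2-3 as strong, TSP >50% as high, Ki67 >30% as high) are those stated in the text.

```python
def dichotomise_markers(km_score: int, tsp_percent: float, ki67_percent: float) -> dict:
    """Apply the stated cut-offs to the three raw marker scores.

    km_score     : Klintrup-Makinen score, 0 to 3
    tsp_percent  : tumour stroma percentage, 0 to 100
    ki67_percent : Ki67 proliferation index, 0 to 100
    """
    if km_score not in (0, 1, 2, 3):
        raise ValueError("KM score must be 0, 1, 2 or 3")
    for name, value in (("TSP", tsp_percent), ("Ki67", ki67_percent)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be a percentage between 0 and 100")
    return {
        # KM 0-1 is graded weak, KM 2-3 is graded strong
        "km": "strong" if km_score >= 2 else "weak",
        # TSP <=50% is low, >50% is high
        "tsp": "high" if tsp_percent > 50 else "low",
        # Ki67 <=30% is low, >30% is high
        "ki67": "high" if ki67_percent > 30 else "low",
    }
```

Note that values exactly at a cut-off (50% stroma, 30% Ki67) fall into the low category, matching the "≤" in the definitions above.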

Study endpoints
The primary endpoint was disease-free survival (DFS; measured from date of surgery/randomisation to date of recurrence at any location or death from any cause). The secondary endpoint was recurrence risk (RR; measured from date of surgery/randomisation to date of recurrence at any location or death from CRC). The exploratory endpoint was associations with adjuvant chemotherapy type and duration.
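To make the difference between the two time-to-event endpoints concrete, the following is a minimal sketch of how DFS and RR could be derived from patient dates. The record layout and field names are hypothetical. The text does not state how non-CRC deaths were handled for RR, so censoring at the date of a non-CRC death is an assumption made here for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Patient:
    start: date                       # date of surgery or randomisation
    last_follow_up: date
    recurrence: Optional[date] = None
    death: Optional[date] = None
    death_from_crc: bool = False


def _time_to_first(start, event_dates, censor_date) -> Tuple[int, bool]:
    """Return (days from start, event observed) for the earliest event."""
    observed = [d for d in event_dates if d is not None]
    if observed:
        return (min(observed) - start).days, True
    return (censor_date - start).days, False


def dfs(p: Patient) -> Tuple[int, bool]:
    # DFS event: recurrence at any location or death from any cause
    return _time_to_first(p.start, [p.recurrence, p.death], p.last_follow_up)


def rr(p: Patient) -> Tuple[int, bool]:
    # RR event: recurrence at any location or death from CRC
    crc_death = p.death if p.death_from_crc else None
    # Assumption: a non-CRC death censors the patient at the date of death
    non_crc_death = p.death is not None and not p.death_from_crc
    censor_date = p.death if non_crc_death else p.last_follow_up
    return _time_to_first(p.start, [p.recurrence, crc_death], censor_date)
```

Under this sketch, a patient who dies of a non-cancer cause without recurrence counts as a DFS event but is censored for RR, which is why the two endpoints can give different results in cohorts with many non-cancer deaths.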

Statistical analysis
The prospectively powered outcome analysis compared the immune (37%) and stromal (19%) subtypes. By using a two-sided α = 0.05 analysis and assuming a hazard ratio (HR) of 2.0 and an immune subtype prevalence of 37%, a sample size of >225 patients gave >90% power to detect a survival difference between the immune and stromal subtypes. SPSS (version 25; IBM, New York, NY, USA) was used for statistical analysis. Kaplan-Meier and log-rank analysis compared DFS or RR (adjusted for T-stage, N-stage and treatment duration when assessing chemotherapy interactions). The log-rank for overall trend was reported for all DFS and RR survival analysis. HRs and CIs were calculated from univariate Cox regression survival analysis. Multivariable Cox regression survival analysis using a backward conditional elimination model and a statistical significance threshold of P value less than 0.1 was performed to identify independent prognostic biomarkers. A Cox proportional hazards interaction model was performed to assess interactions between phenotypic subtype and treatment type/duration. The study conformed to the REMARK guidelines [9] and statistical significance was set at P value less than 0.05. All statistical tests were two-sided.
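The method used for the power calculation is not stated. As a rough illustration only, Schoenfeld's approximation gives the number of events needed to detect a given hazard ratio in a two-group log-rank comparison. The sketch below applies it to the stated inputs (two-sided α = 0.05, 90% power, HR 2.0), assuming the comparison is restricted to immune and stromal patients in the stated 37%:19% ratio. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hazard_ratio: float, alpha: float, power: float,
                      share_group_a: float) -> float:
    """Approximate number of events needed for a two-group log-rank test.

    share_group_a is the proportion of the compared patients in group A;
    the remainder are in group B.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    share_group_b = 1 - share_group_a
    return (z_alpha + z_beta) ** 2 / (
        share_group_a * share_group_b * math.log(hazard_ratio) ** 2
    )


# Immune (37%) versus stromal (19%): shares within the two compared groups
immune_share = 0.37 / (0.37 + 0.19)
events = schoenfeld_events(hazard_ratio=2.0, alpha=0.05, power=0.90,
                           share_group_a=immune_share)
print(math.ceil(events))  # about 98 events
```

This yields a required number of events, not patients. Converting it into a cohort size depends on the expected event rate and follow-up, which are not given in the text, so the figure should not be read as a reconstruction of the calculation behind the >225-patient threshold.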

Results
The internal cohort (n = 893) was utilised to validate the prognostic utility of the phenotypic subtypes in stage I-III CRC patients (for cohort characteristics, see supplementary material, Table S1). No adjuvant therapy data were available. Median follow-up was 11.3 years (range 6.2-16.0 years) with 256 cancer deaths, 287 non-cancer deaths and a 33% recurrence rate. In total, 305 (34%) patients had an immune subtype, 248 (28%) a canonical subtype, 186 (21%) a latent subtype and 154 (17%) a stromal subtype. The immune, canonical and latent subtypes contained older patients with earlier stage cancer, whereas the stromal subtype contained younger patients with later stage cancer (see supplementary material, Figure S1).
To ensure the hierarchy of the phenotypic subtype classification developed in the pilot study was robust, the three markers utilised were entered into multivariate analysis (see supplementary material, Table S2, n = 881), with KM grade (p = 0.002) and TSP (p = 0.073) but not Ki67 (p = 0.971) demonstrated to be independently prognostic for DFS. In the resulting hierarchy, KM grade was considered first as it was independently prognostic for DFS, then TSP as it was trending towards being independently prognostic for DFS, and finally Ki67 as it showed dependence on the other two markers. Hence, phenotypic subtype was defined as follows for all cohorts (Table 1): immune: high KM grade, any TSP and any Ki67; stromal: low KM grade, high TSP and any Ki67; canonical: low KM grade, low TSP and high Ki67; and latent: low for all three markers.
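The hierarchical definition above amounts to a three-step decision rule, sketched below. The function and argument names are hypothetical; the order of the checks (KM grade, then TSP, then Ki67) and the resulting labels follow the definition in the text.

```python
def phenotypic_subtype(km_high: bool, tsp_high: bool, ki67_high: bool) -> str:
    """Assign a phenotypic subtype from three dichotomised markers.

    Markers are checked in the stated hierarchy: KM grade first,
    then tumour stroma percentage, then Ki67 proliferation index.
    """
    if km_high:
        return "immune"      # high KM grade, any TSP, any Ki67
    if tsp_high:
        return "stromal"     # low KM grade, high TSP, any Ki67
    if ki67_high:
        return "canonical"   # low KM grade, low TSP, high Ki67
    return "latent"          # low for all three markers


# Every combination of the three markers maps to exactly one subtype
for km in (True, False):
    for tsp in (True, False):
        for ki67 in (True, False):
            print(km, tsp, ki67, phenotypic_subtype(km, tsp, ki67))
```

Because the rule is exhaustive over all eight marker combinations, any tumour with all three markers scored receives a subtype.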
To address the primary endpoint, associations with DFS were assessed (Figure 2A-C). Phenotypic subtype was shown to significantly stratify DFS (HR 1.15 95% CI 1.07-1.24, p = 0.002; Figure 2A), with the immune subtype having the best outcome and the stromal subtype the worst outcome. To investigate the phenotypic subtypes in important disease stages, patients were stratified into stage II or stage III disease. Phenotypic subtype did not stratify DFS in stage II CRC (HR 1.05 95% CI 0.93-1.18; Figure 2B); however, it did significantly associate with DFS in stage III CRC, with the immune subtype having significantly improved survival (HR 1.17 95% CI 1.04-1.31, p = 0.004; Figure 2C). Next, to address the secondary endpoint, associations with RR were assessed (Figure 2D-F). Phenotypic subtype significantly associated with RR, with the immune subtype having a low risk and the stromal subtype having the highest risk (HR 1.40 95% CI 1.27-1.56, p < 0.001; Figure 2D). When assessing important disease stages, phenotypic subtype associated with RR in both stage II (HR 1.44 95% CI 1.20-1.72, p < 0.001; Figure 2E) and stage III (HR 1.19 95% CI 1.04-1.36, p = 0.004; Figure 2F) CRC. In stage II disease, only patients with a stromal subtype had a high risk of recurrence whereas, in stage III disease, canonical, latent and stromal patients were all at risk of recurrence.

Figure 3A), with the stromal subtype having the worst outcome. Under multivariate analysis with TNM-stage and mismatch repair status (see supplementary material, Table S3), only phenotypic subtype (p = 0.028) was independently prognostic for DFS.
To validate the stage III findings in an up-to-date cohort, a subset of patients from the SCOT trial, the TransSCOT cohort, was utilised as it contained a high proportion of stage III patients (83%) with differing adjuvant chemotherapy regimens and durations (for cohort characteristics; see supplementary material, Table S1). All patients received FOLFOX or CAPOX adjuvant chemotherapy for at least 3 months. Median follow-up was 3.0 years (range 0-7.0 years) with 339 DFS events. In total, 208 (15%) patients had an immune subtype, 547 (41%) a canonical subtype, 197 (15%) a latent subtype and 391 (29%) a stromal subtype. To ensure this was representative of the full SCOT trial cohort, patient characteristics were compared (see supplementary material, Table S1). Both cohorts had a similar proportion of males, stage, and DFS events; the only difference was that the TransSCOT cohort had fewer rectal cancers. Therefore, the TransSCOT cohort is a reasonable representation of the overall trial population.
To address the exploratory endpoint, phenotypic subtype was interrogated for associations with adjuvant chemotherapy type and duration. Multivariate Cox proportional hazards analysis to assess interactions between phenotypic subtypes and chemotherapy type or duration was performed (see supplementary material, Table S4). An interaction between phenotypic subtypes and chemotherapy type (CAPOX versus FOLFOX; p interaction = 0.011) was observed but not with duration (3 months versus 6 months; p interaction = 0.809). As the effect of chemotherapy type depends on phenotypic subtype, associations with DFS were assessed in patients stratified for chemotherapy type (Figure 4; adjusted for T-stage, N-stage and treatment duration). In patients receiving FOLFOX adjuvant chemotherapy, phenotypic subtype significantly stratified DFS (HR 1.40 95% CI 1.16-1.68, p < 0.001; Figure 4A) with the immune subtype having the best outcome. No difference in DFS was noted in patients receiving CAPOX adjuvant chemotherapy (HR 0.98 95% CI 0.87-1.11, p = 0.745; Figure 4B). Furthermore, when differences in DFS were assessed between treatment types stratified by phenotypic subtype, the immune subtype showed a significant difference in DFS between the two regimens (HR 1.67 95% CI 1.09-2.58, p = 0.019; Figure 4C) with patients receiving FOLFOX adjuvant chemotherapy having improved outcomes. No differences in DFS between the two regimens were seen within any other phenotypic subtype (Figure 4D-F). As phenotypic subtype specifically stratified stage III patients within the full cohort, Cox proportional hazards interaction analysis was repeated within stage II and stage III patients (see supplementary material, Table S4). No interactions with phenotypic subtype and treatment duration were seen for either stage. However, phenotypic subtype did interact with treatment type in stage III disease (CAPOX versus FOLFOX; p interaction = 0.031).
Therefore, patients were stratified into stage II and III disease (see supplementary material, Figure S2; adjusted for treatment duration). For patients receiving FOLFOX, DFS only associated with phenotypic subtype in stage III disease (HR 1.45 95% CI 1.20-1.76, p < 0.001; see supplementary material, Figure S2B). Furthermore, for patients with an immune subtype (see supplementary material, Figure S2C,D), treatment type only associated with DFS in stage III disease (HR 1.82 95% CI 1.13-2.91, p = 0.013; see supplementary material, Figure S2D).
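For readers less familiar with interaction terms, the interaction analysis reported above can be written as a Cox model with a product term. The notation is an assumption introduced here for illustration: S codes phenotypic subtype (whether it was entered as an ordinal or categorical variable is not stated), T = 1 for FOLFOX and T = 0 for CAPOX, and Z collects the adjustment covariates (for example T-stage, N-stage and treatment duration).

```latex
% Cox proportional hazards model with a subtype-by-treatment interaction
h(t \mid S, T, \mathbf{Z}) = h_0(t)\,
  \exp\!\left(\beta_S S + \beta_T T + \beta_{ST}\,(S \times T)
  + \boldsymbol{\gamma}^{\top}\mathbf{Z}\right)

% Hazard ratio for FOLFOX versus CAPOX within subtype level s
\mathrm{HR}_{T \mid S = s} = \exp\!\left(\beta_T + \beta_{ST}\, s\right)
```

Under this sketch, the reported p interaction corresponds to a test of the null hypothesis that β_ST = 0. A non-zero β_ST means the FOLFOX versus CAPOX hazard ratio changes with subtype, which is what motivates the subtype-stratified comparisons.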
Phenotypic subtype was then taken forward into multivariate analysis with clinical and therapeutic factors for DFS (Table 3). When assessing any treatment, phenotypic subtype was not independently prognostic in the full cohort (p = 0.374) or when restricted to only stage III patients (p = 0.262) compared to T-stage and N-stage. Phenotypic subtype was also not independently prognostic in any patient groups receiving CAPOX. However, for patients receiving FOLFOX, T-stage (p = 0.018), N-stage (p < 0.001) and

Discussion
The results of the present study validate that, in the general population, histological phenotypic subtypes are an effective independent prognostic classification for patients with stage III CRC. Furthermore, phenotypic subtype can independently stratify RR across stage I-III CRC, with the stromal subtype having the highest risk of recurrence. The exploratory adjuvant chemotherapy analysis suggests that phenotypic subtype is an independent prognostic classification for stage III patients receiving FOLFOX. Interestingly, stage III patients with an immune subtype appear to respond better to FOLFOX compared to CAPOX adjuvant chemotherapy, although this exploratory analysis requires validation in an independent cohort.

AK Roseweir et al
Phenotypic subtype was independently associated with DFS in stage I-III CRC patients from the internal and external validation cohorts. This was attenuated in stage III patients but lost in stage II patients, suggesting this classification may have more utility in later stage disease, and may aid clinicians when assessing adjuvant therapy. Interestingly, the immune and stromal subtype showed similar prognosis at all stages, whereas prognosis was significantly worse in stage III tumours for the canonical and latent subtypes. This agrees with previous studies reporting that TSP associates with poor prognosis in CRC and other cancers independent of stage [8,10-12]. Similarly, high intra-tumoural lymphocytic infiltrate has previously been reported to associate with improved survival in CRC patients independent of stage [13-17]. However, survival effects for Ki67 proliferation rate vary in the literature, suggesting that prognostic effects are dependent on other clinical factors, as demonstrated by a lack of independence in multivariate analysis [18,19]. Nevertheless, this change in the prognosis of the canonical and latent subtypes from stage II (good prognosis) to stage III (poor prognosis) disease is significant for patients, as the stage II data would suggest minimal intervention. However, if the patient recurs and progresses to stage III disease it may be too late for intervention. Therefore, should these patients be put forward for adjuvant therapy for stage II disease due to their potential future prognosis if their disease recurs and progresses?
To address this further, the current study assessed RR for the phenotypic subtypes. Phenotypic subtype independently associated with risk of recurrence across all patients with stage I-III CRC, with the stromal subtype having the highest risk of recurrence and the immune subtype the lowest risk. This could be expected as TSP is known to associate with epithelial-to-mesenchymal transition (EMT) and invasion, which prepares the tumour for metastasis [8]. Therefore, any residual cells in stromal subtype tumours are likely to have undergone EMT and be primed for recurrence. However, similar to DFS, the canonical and latent subtypes had a low risk in stage II but a high risk in stage III disease. As both subtypes have a similar change, this suggests the effect is not dependent on proliferation rate, but potentially the low levels of immune infiltrate and stroma within the two groups. This is in line with previous findings where a low inflammatory infiltrate in stage III disease led to an unfavourable prognosis but no difference was seen between low and high immune infiltrate in stage II disease [20,21]. Similar changes are not seen for low stroma [12], suggesting this difference may be influenced by the low immune infiltrate. This effect can be further seen in our stage II external validation cohort, where the immune, canonical and latent subtypes have a good prognosis whereas the stromal subtype has a poor prognosis similar to the internal cohort.
One difference between the results of the internal and external validation cohorts is that phenotypic subtype is an independent prognostic factor for the stage II external cohort but not for stage II patients in the internal cohort. This difference may be due to the ages of the cohorts, with the external cohort containing more recent patients with up-to-date adjuvant therapies. Furthermore, due to constraints on data available for the external cohort, only TNM-stage and mismatch repair status are included in the analysis compared to multiple clinical factors for the internal cohort. Therefore, if more clinical factors were available for the external cohort this independence may be lost. Nevertheless, the external cohort validates the prognostic effect seen in the internal cohort and the translation of the method to an independent laboratory.
To assess if modern adjuvant chemotherapy regimens affect the prognostic value of the phenotypic subtypes, they were assessed in a subset of patients from the SCOT trial. In the TransSCOT cohort, phenotypic subtype could stratify DFS; however, this was not independent of other clinical factors, suggesting that the adjuvant chemotherapy regimens are diminishing the prognostic effect of the subtypes. When stratified for stage, high-risk stage II patients had improved DFS across all subtypes, suggesting that these patients respond well to the treatment independent of subtype. In contrast, in stage III patients, who had poorer DFS, a significant difference in response to chemotherapy was observed, but again this was not independent of other clinical factors.
To decipher if this loss of independent prognostic power was due to a specific treatment type or duration, interactions were investigated. A significant interaction was observed with chemotherapy type (CAPOX versus FOLFOX) but not duration (3 versus 6 months), suggesting it may be a specific type of adjuvant chemotherapy that is affecting the prognostic value of the subtypes. No survival difference was seen for patients receiving CAPOX, suggesting that all subtypes had a similar response to this adjuvant therapy. However, a significant independent difference in DFS was noted for patients receiving FOLFOX, with immune patients doing significantly better compared to other subtypes, similar to stage III patients in the internal cohort. Furthermore, when FOLFOX patients were stratified for stage, only stage III patients had an independent association with DFS, consistent with the other cohorts.
This makes sense as the internal cohort patients were likely to have been treated with 5-FU chemotherapy as used in this regimen and the external cohort was treated with a mixture of 5-FU and FOLFOX-based regimens. However, CAPOX utilises an oral version, capecitabine, that is metabolised to 5-FU, so potentially this metabolic interaction may be interacting with components utilised to define the phenotypic subtypes.
To assess if the prognostic difference between the two treatment types was due to a specific subtype, patients were stratified into the four subtypes. Patients with an immune subtype did significantly better on FOLFOX compared to CAPOX, especially in stage III patients, whereas patients with a stromal subtype trended towards better survival on CAPOX. Validation of these observations is required, but they suggest that it is the adaptive immune cells that interact differently with the two chemotherapy regimens. When immune cells are high and infiltrating the tumour as seen in the immune subtype, FOLFOX is favourable; however, when excluded, as seen in the stromal subtype, CAPOX is favourable, agreeing with previous literature reporting that patients with high tumour-infiltrating and circulating lymphocytes had significantly improved survival when receiving FOLFOX [22-24]. One hypothesis may be that, for CAPOX, the high levels of immune cells hamper the final stage of metabolism of capecitabine, inhibiting its cytotoxic effect. If this observation is validated, the immune subtype is a promising marker to identify patients more likely to benefit from FOLFOX rather than CAPOX adjuvant chemotherapy.
This is the first histological CRC subtyping method to have independent prognostic power in stage III patients as well as associations with risk of recurrence and adjuvant chemotherapy. The method also utilises sections routinely prepared within clinical pathology laboratories, making it easy to translate to clinical practice. As well as being readily translated to clinical practice and being cost effective, phenotypic subtype can also classify all patient tumours, unlike the CMS classification where 20% of tumours are unclassifiable [1]. Phenotypic subtype assesses cell types within the tumour and microenvironment separately whereas the CMS classification looks at all cell types together.
In contrast, the cancer-cell intrinsic subtypes only utilise tumour-cell specific transcriptomic analysis, which the author suggests alleviates any issues with tumour heterogeneity [2]; however, this ignores the tumour microenvironment known to be vital for both anti-tumour and pro-tumour mechanisms. The phenotypic subtypes take all of this into account in a simple, readily translated and cost-effective way utilising routine clinical methods, unlike the above transcriptomic-based classifiers, which have failed to be translated into clinical practice due to issues with robustness/reproducibility, turnaround time, and high associated costs.
In conclusion, in the general population, the histological phenotypic subtype classification has independent prognostic power for patients with stage III CRC. Furthermore, phenotypic subtype can predict risk of recurrence for these patients, with the immune subtype having a significantly diminished risk compared to the other three subtypes. In an adjuvant chemotherapy trial population, phenotypic subtype may independently predict response to FOLFOX adjuvant chemotherapy within stage III patients, with the immune subtype having a better response to this treatment when compared to CAPOX. Going forward, the utility of these subtypes will be distinguishing the optimal treatments for each group, including established therapeutics or the development of novel interventions. Another important area of research is to develop a digital pathology approach to distinguish the phenotypic subtypes utilising deep learning to allow a quick automated analysis of the patient tissue, either by automating and integrating the current manual approaches or developing surrogate markers utilising neural classification networks. Upon further validation, the histological phenotypic subtype classification could be a useful aid in the clinic for CRC prognosis, particularly for stage III disease, identifying patients with a risk of recurrence and patients who could benefit from FOLFOX adjuvant chemotherapy.

Figure S1. Proportion of common clinical characteristics between each phenotypic subtype
Figure S2. Response to adjuvant chemotherapy in stage II versus stage III patients from the TransSCOT cohort
Table S1. Patient characteristics for cohorts
Table S2. Multivariate analysis for components of the phenotypic subtype classification for DFS and recurrence risk
Table S3. Multivariate analysis of phenotypic subtypes, clinicopathological factors and disease-free survival in the external validation cohort
Table S4. Multivariate interaction analysis of phenotypic subtypes, chemotherapy type and chemotherapy duration in the TransSCOT adjuvant chemotherapy cohort
