A digital score of tumour-associated stroma infiltrating lymphocytes predicts survival in head and neck squamous cell carcinoma

The infiltration of T-lymphocytes in the stroma and tumour is an indication of an effective immune response against the tumour, resulting in better survival. In this study, our aim is to explore the prognostic significance of tumour-associated stroma infiltrating lymphocytes (TASILs) in head and neck squamous cell carcinoma (HNSCC) through an AI based automated method. A deep learning based automated method was employed to segment tumour, stroma and lymphocytes in digitally scanned whole slide images of HNSCC tissue slides. The spatial patterns of lymphocytes and tumour-associated stroma were digitally quantified to compute the TASIL-score. Finally, prognostic significance of the TASIL-score for disease-specific and disease-free survival was investigated with the Cox proportional hazard analysis. Three different cohorts of Haematoxylin&Eosin (H&E) stained tissue slides of HNSCC cases (n=537 in total) were studied, including publicly available TCGA head and neck cancer cases. The TASIL-score carries prognostic significance (p=0.002) for disease-specific survival of HNSCC patients. The TASIL-score also shows a better separation between low- and high-risk patients as compared to the manual TIL scoring by pathologists for both disease-specific and disease-free survival. A positive correlation of TASIL-score with molecular estimates of CD8+ T cells was also found, which is in line with existing findings. To the best of our knowledge, this is the first study to automate the quantification of TASIL from routine H&E slides of head and neck cancer. Our TASIL-score based findings are aligned with the clinical knowledge with the added advantages of objectivity, reproducibility and strong prognostic value. A comprehensive evaluation on large multicentric cohorts is required before the proposed digital score can be adopted in clinical practice.


Introduction
There are around 900,000 annual new cases of head and neck cancers worldwide and 450,000 annual deaths [1]. Head and neck squamous cell carcinoma (HNSCC) accounts for approximately 90% of head and neck cancers and is the sixth leading cancer by incidence worldwide [2]. HNSCC predominantly develops in the epithelial lining of the oral cavity, sinonasal tract, pharynx, and larynx [3]. Major risk factors include tobacco smoking, tobacco chewing, alcohol consumption and human papillomavirus infection [4], [5]. The prognosis of HNSCC remains poor with a 28-67% chance of survival at five years [6] highlighting the need for novel biomarkers and objective quantitative analysis of any potential prognostic markers to stratify patients into appropriate risk groups and identify those who may benefit from aggressive treatment from those that can be put under surveillance [7].
Tumour infiltrating lymphocytes (TILs) have been shown to be of prognostic significance for HNSCC [8], [9, p. 3]. TILs are not routinely quantified in diagnostic practice although some methods for manual TIL scoring on Haematoxylin and Eosin (H&E) stained tissue sections have been reported in the literature. However, this quantification process is subjective and prone to inter-and intra-observer variability. Recently, the International Immuno-Oncology Biomarker Working Group has developed guidelines for TIL assessment in breast cancer [10] to standardise and obtain a more reproducible and objective TIL score. However, there are no such guidelines for TIL assessment in HNSCC. Therefore, a computer-based automated method may help to eliminate the subjectivity through objective TIL quantification and assist with prognostic stratification.
The emerging area of computational pathology has seen a surge of interest in recent years with a variety of algorithms proposed in the literature for detection of lymph node metastasis in breast cancer [11], tumour detection [12] and segmentation [13]- [15], cancer grading [16], [17] and prognostics [18], [19], to list only a few. There have been some studies on automated quantification of lymphocytic infiltration in whole slide images (WSIs) of H&E stained histology tissues slides of different cancers. For instance, Saltz et al. [20] quantified the spatial patterns of lymphocytes in WSIs, independent of tumour location, to investigate their prognostic significance in fourteen different cancer types. Maley et al. [21] reported immune-cell-cancer colocalisation as a prognostic factor for breast cancer. Nawaz et al. [22] used hotspot analysis to identify the statistically significant regions of cancer and immune cells. Shaban et al. [19] quantified the abundance of TILs for disease-free survival TASIL-score predicts survival in head and neck squamous cell carcinoma 4 (DFS) analysis of oral squamous cell carcinoma (OSCC) patients. Most existing automated quantification methods either only consider lymphocytes or lymphocytic infiltration in tumour regions. However, some clinical studies [23], [24] have reported the prognostic significance of lymphocyte infiltration in tumour-associated stroma (TAS) in HNSCC.
We propose a novel deep learning based objective quantification method of lymphocytic infiltration in TAS (see Figure 1). The proposed method calculates the percentage of TAS colocalised with lymphocytes that we term as the TASIL-score. We evaluate the prognostic significance on three independent patient cohorts (n=537 cases in total). The proposed TASIL-score shows prognostic significance (p=0.002) for disease-specific survival (DSS) of HNSCC patients. The TASIL-score is also a prognostic indicator for DSS and DFS in two OSCC and oropharyngeal squamous cell carcinoma (OPSCC) cohorts. We have also compared the predictive ability of TASIL-score based survival model with existing quantification methods through a concordance index measure where the TASIL-score achieved the highest concordance compared to its counterparts. The main highlights of this work are as follows: • A novel digital score of tumour-associated stroma infiltrating lymphocytes (TASIL-score) carries prognostic significance for DSS of oral, oropharyngeal, and head and neck SCC patients.
• The TASIL-score is a strong prognostic indicator for DSS and DFS of OSCC and OPSCC.
• The TASIL-score shows a better separation between low-and high-risk patients compared to manual TIL scoring by pathologists for DSS and DFS.
• The TASIL-score also shows a positive correlation with molecular estimates of CD8+ T cells, which is in agreement with existing findings [23].
The TCGA-HN cohort (C1) is a publicly available dataset which comprises 450 diagnostic H&E WSI of squamous cell carcinoma (SCC) cases from different sites of the head and neck (H&N). However, many of these slides suffer from preparation and scanning artefacts. After excluding cases with poor quality of slides, our final TCGA-HN cohort consists of 342 cases with one WSI per case. The H&E WSIs from these cases have mostly been scanned at 40 with some scanned at 20. The SKM cohort (C2) comprises of 100 OSCC cases collected from the Shaukat Khanum Memorial Hospital and Research Centre (SKMCH&RC) Lahore, Pakistan [19]. The PredicTR1 cohort (C3) contains 95 OPSCC cases collected from six different centres across the United Kingdom [26]. The representative H&E tissue sections for both SKM and PredicTR1 cohort were all scanned at 40 magnification.

Survival Information
The DSS information was available and obtained for most of the selected cases in all three cohorts.
The DSS time is calculated from the date of diagnosis to the date of death or the date of the last followup in case of censored data. The DFS information was only available for C2 and C3 cohorts. The DFS is censored at the date of first recurrence or death, whichever occurred first, or the date of the last contact for the patients alive without recurrent disease. A detailed description of patient characteristics and available clinical and pathological parameters can be found in the supplementary materials document.

Quantification of tumour-associated stroma infiltrating lymphocytes
We propose an objective and automated score for the quantification of tumour-associated stroma infiltrating lymphocytes, namely the TASIL-score, which quantifies lymphocytes in the vicinity of TAS using the spatial co-occurrence statistics of both TAS and lymphocytes in a WSI. The TASIL-score is 7 computed in two steps: segmentation of WSIs into clinically significant tissue types and calculation of the spatial co-occurrence statistics.
In the first step, we divided a given WSI into small sub-images (tiles or patches) and employed a deep learning based patch classification method to segment the WSI into four different types of regions: tumour, TAS, lymphocytes, and all other tissue regions as non-regions of interest (Non-ROIs). In the second step, spatial co-occurrence statistics are calculated using the co-occurrence analysis of different types of patches in a WSI. A patch adjacent to another patch in any direction was considered as an instance of co-occurrence. First, six different patch co-occurrence patterns are defined based on three clinically significant tissue types, as shown in Figure S1. Then the TASIL-score is calculated as follows: where represents the number of times tumour associated stroma and lymphocytes patches cooccur in a WSI. Similarly, and denote the number of co-occurrences of TAS patches with other TAS patches and tumour patches, respectively. The TASIL-score ranges from 0 to 1, where 0 represents no infiltration and 1 represents high degree of lymphocytic infiltration in the tumour associated stroma.

Statistical Methods and Data Analysis
Survival analysis was performed with DSS and DFS data. The Kaplan-Meier estimator was used and the log-rank test performed to stratify patients into low-and high-risk groups (log-rank test based pvalues were calculated to assess the significance of the various features including the proposed TASILscore). The Cox proportional hazard regression model was used for univariate and multivariate analyses and 95% confidence intervals were computed. Spearman rank-order correlation coefficient was used for correlation analyses between TASIL-score and molecular estimates, and between TASIL-score and the manual TIL score assigned by a pathologist. The concordance statistics were used to compare the different automated quantification scores for DSS analysis.

Results
Automated segmentation of Whole Slide Images for quantification of TASIL-score A deep learning based segmentation algorithm was trained and evaluated on WSIs from C1 and C2 cohorts, where 10 WSIs were used for training and 2 for validation from each cohort. More than 179K patches (141K for training and 38K for validation) were annotated by an expert pathologist (SAK). The segmentation method achieved an average accuracy of 0.85 and macro F1-score of 0.83. Quantitative and visual results for WSI segmentation are presented in Table S5 and Figure Figure S3).
Higher TASIL-score is associated with better disease-specific survival of HNSCC

Patients
We investigated the prognostic significance of the deep learning based TASIL-score for DSS of HNSCC patients. First, we considered C1 as a discovery cohort and both C2 and C3 as a joint validation cohort.
Patients in the validation cohort were divided into two groups based on the TASIL-score using an optimal threshold value obtained from the analysis of the discovery cohort. We found that the patient  Figure 2. These curves show a clear separation between low-and high-risk patient groups when stratified using the TASIL-score. TASIL-score is a prognostic indicator for the SCC of both oral and oropharynx sites The C2 and C3 cohorts were curated from only the oral cavity and oropharynx, respectively. Therefore, we investigated the prognostic significance of TASIL-score for patients with SCC of a specific site. First, the OSCC cohort (C2) was considered as a discovery cohort and the OPSCC cohort (C3) was considered as a validation cohort. Supporting our aforementioned findings, the TASIL-score remains prognostically significant (p=0.000159, HR = 0.20, 95% CI 0.08-0.49) for OPSCC patient stratification into low-and high-risk groups for DSS. Second, we swapped the two cohorts considering the OPSCC cohort (C3) as the discovery cohort and the OSCC cohort (C2) as a validation cohort. We found that the TASIL-score based OSCC patient stratification again proved prognostically significant (p=0.000935, HR = 0.08, 95% CI 0.01-0.65). Repeating the experiments to evaluate the prognostic significance of the TASIL-score for DFS follows the same pattern where the TASIL-score stratifies patients in prognostically significant low-and high-risk groups. Patient stratification into low-and high-risk groups is presented in Figure 3 through

TASIL-score is independent of clinical and pathological variables
We investigated the prognostic significance of the TASIL-score through multivariate analysis in the presence of clinical and pathological variables. The C2 cohort was considered for multivariate analysis as it has more clinical and pathological parameters along with both DSS and DFS information (Table   S2)  the total number of patients in stage IVb and IVC are 9 and 1, respectively, which is quite small compared to the total number of patients ( Figure 4).  TASIL-score shows a better separation between low-and high-risk patients compared to the manual TIL score The pathologist score for tumour/stroma infiltrating lymphocytes is usually a categorical score with low, moderate, and high infiltration categories. Only low and high categories show prognostic significance in the C3 cohort for DSS and DFS, as shown in Table S3 . We merge the moderate category in two different ways to split the patients into low-and high-risk groups, see Figure 5. The Kaplan-Meier curves in Figure 5 show that the TASIL-score stratifies patients into low-and high-risk groups with better prognostic significance compared to the manual TIL score in both DSS and DFS analyses.

Comparison with existing digital scores
In recent years, researchers have proposed several automated quantification methods for colocalisation in different types of cancers to develop a digital prognostic biomarker [19], [21], [27], [28]. The use of computerised methods addresses the issue of subjectivity and produces objective and reproducible quantification scores. Geessink et al. [28] presented tumour to stroma ratio (TS-Ratio) for rectal adenocarcinoma and showed that a higher TS-Ratio had prognostic association with poor patient survival. Maley et al. [21] used an ecological measure for quantification of immune-cancer cell colocalisation (IC-Colocalisation) in breast cancer. They found that the higher colocalisation of immune and tumour cells was associated with better patient survival.
Similarly, Shaban et al. [19] proposed a tumour infiltrating lymphocytes abundance (TILAb) score for OSCC which showed prognostic significance for DFS in both univariate and multivariate analysis. We Disease-specific survival Disease-free survival

TASIL-score is correlated with molecular estimates of CD8+ T cells
We further investigated the correlation of the proposed TASIL-score with molecular estimates of immune cell fractions in the TCGA-HN cohort (C1). Throsson et al. [29] have estimated the fraction of 22 immune cell types in the histology slides of patients in the TCGA cohort using gene expression data through CIBERSORT. We used those estimates for the correlation analysis with our TASIL-score. The immune subtypes were grouped based on nine different immune cell types: dendritic, mast,

neutrophils, eosinophils, monocytes, macrophages, natural killer cells, T cells and B cells. Our main
finding is that the TASIL-score shows a moderate but highly significant positive correlation with T cell estimates and negative correlation with macrophage estimates (Figure 7).
Moreover, CD8+ T cell fraction shows the highest positive correlation among all immune subtypes (Table 2), which may indicate that the lymphocytes in the vicinity of the TAS are mainly CD8+ T cells [30]. An explanation for the value of correlation between TASIL-score and cell fractions not being very high is that TASIL-score and the molecular estimates are computed on formalin-fixed paraffinembedded (FFPE) and fresh frozen tissue sections, respectively. Both tissue sections belong to tissue blocks from the same patient, but their exact spatial relation is unknown. However, a good correlation of TASIL-score with CD8+ T cell indirectly validates the significance of the proposed score.

Figure 7: Spearman correlation between TASIL-score and molecular estimates of Macrophages (left) and T Cells (right)
fractions.

Discussion
We have proposed a deep learning based objective measure, namely the TASIL-score, for quantification of tumour-associated stroma infiltrating lymphocytes in digitised images of HNSCC tissue slides. We found that higher value of the TASIL-score was associated with better DSS of HNSCC patients ( Figure 2) and with both DSS and DFS of OSCC and OPSCC patients (Figure 3). The TASIL-score was independent of clinicopathological parameters for DSS and DFS of OSCC patients (Table 1). It also showed better separation between low-and high-risk OPSCC patients compared to manual TIL score assessed by an expert pathologist ( Figure 5). We compared the TASIL-score with the existing automated quantification methods for stroma and lymphocytic quantification with respect to the tumour. The TASIL-score achieves high concordance score compared to its counterparts ( Figure 6) and also showed a moderate but highly significant correlation with molecular estimates of CD8+ T cells ( Table 2).
Most of the existing automated quantification methods were developed for the quantification of lymphocytes or stroma in relation to the tumour such as stroma to tumour ratio in breast and ovarian cancer [27], [28], lymphocyte and tumour colocalisation in breast cancer [21], and abundance of tumour infiltrating lymphocyte in OSCC [19]. However, automated quantification of stromal TILs has not been fully explored yet. To the best of our knowledge, the proposed TASIL-score is the first automated quantitative score of lymphocytic infiltration in the TAS of HNSCC.
Salgado et al. [10] found that stromal TILs are a superior and more reproducible prognostic parameter compared to intratumoral TILs in breast cancer. Xu et al. [24] also reported that stromal TILs were of clinical relevance as a high stromal TIL score was associated with better patient prognosis for HNSCC.
Furthermore, stromal TILs have been shown to be an independent risk factor for DSS and DFS of HNSCC patients. Our automated TASIL-score has shown similar prognostic significance pattern, see Figure 2 and Figure 3.
In clinical practice, both H&E and immunohistochemistry (IHC) staining are used for manual TIL scoring. The TASIL-score based results are in agreement with previous findings based on H&E and IHC based TIL quantification [23], [31], [32]. In Figure 5, we have shown that both manual H&E based pathologist score and TASIL-score carry prognostic significance for DSS and DFS of OPSCC. However, TASIL-score shows better separation between Kaplan-Meier curves of low-and high-risk patients of OPSCC. Ruiter et al. [31] conducted a meta-analysis of IHC based studies to investigate the prognostic value of T cells in HNSCC. They found a favourable prognostic role of CD3+ and CD8+ T cell infiltration in HNSCC patients. Balermpas et al. [23] found that high CD8 expression in tumour stroma is a prognosticator for HNSCC. The proposed TASIL-score also shows a positive and highly significant correlation with genomic estimates of CD8+ T cells in HNSCC patients in the TCGA-HNSCC cohort.
T lymphocyte infiltration in the stroma and tumour indicates an effective immune challenge to the tumour and is related to better outcome and treatment response. The proposed TASIL-score based findings are aligned with the clinical knowledge with the added advantages of objectivity, reproducibility, and strong prognostic value. Therefore, the automated, objective, and quantitative TASIL-score has the potential to provide valuable insights into tumour behaviour and prognosis in an efficient and consistent manner. Although we validated our method on three different cohorts (n=537 cases in total), a comprehensive evaluation on large multicentric cohorts is required before the proposed digital score can be adopted in clinical practice.

Patient Characteristics
In TCGA-HN cohort (C1), most of the patients were diagnosed between 2007 to 2013, and the average age of the patients is 61.09 year with a standard deviation of 11.82. There are more male patients (n=252) compared to female patients (n=90). The distribution of TNM-stage of the cases is somewhat skewed toward higher stages, with 52% cases from stage IVa patients. The ratio of alive and deceased patients is also imbalanced, with only 25% deceased cases, and the average DSS of all patients is 28.69 months with a standard deviation of 24 months. Detailed statistics of the cohort are presented in

Pathologist Annotations
We used 24 cases, one WSI per case, for training and evaluation of the coarse segmentation method.
Half of the cases are taken from the C1 cohort, and the remaining half were selected from C2 cohort.

Whole Slide Image Segmentation
The architecture our patch based segmentation is presented in Figure S3 which consists of multiple convolution, pooling, and feature concatenation layers. All the convolution layers preceded by a batch norm and ReLU based activation layers apart from the first one where these two layers are used after the convolution layer. The main building block of the proposed network is the dense block which consists of multiple pair of convolution layers where the depth of a block depends on the number of iterations selected for the block. In each iteration, the pair of convolution layers converts the input feature-map into 32 channels feature-map and concatenate it with the input feature-map. The last convolution layer followed by softmax layer takes the output feature-maps of all the dense blocks through skip connections after spatial average pooling, if required, and outputs the probability maps for the given number of classes.
We compared our segmentation method with existing patch-based segmentation methods using average accuracy and macro F1-score based metrics. Our proposed segmentation method achieved a superior performance compared to standard patch-based segmentation methods (see Table S5).
Visual results of our segmentation method are presented in Figure S4, which shows good segmentation of some of the main constituents of the tumour microenvironment. 1