Multiscale deep learning framework captures systemic immune features in lymph nodes predictive of triple negative breast cancer outcome in large‐scale studies

Abstract The suggestion that the systemic immune response in lymph nodes (LNs) conveys prognostic value for triple‐negative breast cancer (TNBC) patients has not previously been investigated in large cohorts. We used a deep learning (DL) framework to quantify morphological features in haematoxylin and eosin‐stained LNs on digitised whole slide images. From 345 breast cancer patients, 5,228 axillary LNs, cancer‐free and involved, were assessed. Generalisable multiscale DL frameworks were developed to capture and quantify germinal centres (GCs) and sinuses. Cox regression proportional hazard models tested the association between smuLymphNet‐captured GC and sinus quantifications and distant metastasis‐free survival (DMFS). smuLymphNet achieved a Dice coefficient of 0.86 and 0.74 for capturing GCs and sinuses, respectively, and was comparable to an interpathologist Dice coefficient of 0.66 (GC) and 0.60 (sinus). smuLymphNet‐captured sinuses were increased in LNs harbouring GCs (p < 0.001). smuLymphNet‐captured GCs retained clinical relevance: LN‐positive TNBC patients whose cancer‐free LNs had on average ≥2 GCs had longer DMFS (hazard ratio [HR] = 0.28, p = 0.02), and the prognostic value of GCs extended to LN‐negative TNBC patients (HR = 0.14, p = 0.002). Enlarged smuLymphNet‐captured sinuses in involved LNs were associated with superior DMFS in LN‐positive TNBC patients in a cohort from Guy's Hospital (multivariate HR = 0.39, p = 0.039) and with distant recurrence‐free survival in 95 LN‐positive TNBC patients of the Dutch‐N4plus trial (HR = 0.44, p = 0.024). Heuristic scoring of subcapsular sinuses in LNs of LN‐positive Tianjin TNBC patients (n = 85) cross‐validated the association of enlarged sinuses with longer DMFS (involved LNs: HR = 0.33, p = 0.029 and cancer‐free LNs: HR = 0.21, p = 0.01). Morphological LN features reflective of cancer‐associated responses are robustly quantifiable by smuLymphNet.
Our findings further strengthen the value of assessment of LN properties beyond the detection of metastatic deposits for prognostication of TNBC patients. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.


Introduction
Solid tumours engage the lymphatic system, whereby the draining lymph nodes (LNs) are often the first site of dissemination outside the primary tumour [1]. For the treatment of invasive breast cancer, the tumour-draining LNs, including the sentinel lymph node (SLN), are routinely excised, and details of the presence and size of LN metastases provide the basis for pathological staging. However, extensive dissection of all draining LNs does not reduce the mortality of breast cancer patients [2]. In recent years, surgical management of the axilla has become less radical, with fewer complete axillary clearances for those with small-volume metastasis, for example micrometastasis (≤2 mm). As demonstrated in the International Breast Cancer Study Group (IBCSG) trial 23-01, omitting axillary dissection had no adverse effect on survival compared to axillary dissection in early breast cancer patients with ≥1 micrometastasis [3].
Above and beyond the detection of metastasis, LNs serve as immunological hubs between the tumour and the patient's systemic immunity and provide an opportunity to study systemic host defence mechanisms, both pro- and anti-tumoural, and their role in likely disease trajectories [1]. Such responses in the LNs include changes in fibroblastic architecture, the abundance of macrophages in sinuses, and hyperplasia of lymphoid follicles [4]. These morphological changes can be captured by computational approaches, such as deep learning (DL)-based algorithms. A series of competitive international challenges, e.g. CAMELYON16 and CAMELYON17, have demonstrated the utility of exploring digitised whole-slide images (WSIs) of LNs; to date, however, the focus has been only on cancer cell detection [5,6]. By manually assessing ~3,000 H&E-stained LNs and patient-matched primary tumours on glass slides [7], we and others have demonstrated a higher risk of developing distant metastases, in particular in TNBC patients with low levels of tumour-infiltrating lymphocytes (TILs), when the cancer-free LNs lacked germinal centre (GC) formation [7-10]. GCs are highly organised structures with an inner B-cell follicle and an outer T-cell zone that generate long-lived memory B cells and plasma cells. Since these highly proliferative GC cells show some morphological similarities to cancer cells (e.g. cell size), lymphoid follicle detection using convolutional neural networks (CNNs) has been proposed to exclude these areas for tumour detection in LNs [11]. However, DL-based algorithms that capture the formation of GCs and other morphometric immune responses in LNs have so far not been utilised, in particular to determine whether their patterns hold clinically relevant information.
CNNs have superior performance in imaging tasks to alternative models, mainly due to their efficient parameter-sharing between kernels and local connectivity properties [11,12]. Based on the standard CNN framework, fully convolutional networks (FCNs) replace fully connected layers at the end of the network with convolutional layers, enabling pixel-level classification and a low-dimensional reconstruction of the input [12]. In biomedical applications, where datasets are often sparse relative to other computer vision tasks, the U-Net architecture has proven to be an effective network for segmentation [13,14]. However, whilst the standard U-Net architecture is trained on single-scale images, histopathologists analyse glass slides at multiple magnifications and integrate information from multiple fields of view when making clinical diagnostic and prognostic decisions. Thus, the single-scale feature encoding of images used by most convolutional models is not commensurate with a pathologist's visual assessment of multiple fields. In light of this, recent methods that integrate a multiscale feature extractor into the U-Net architecture, mimicking a histopathologist's assessment, have been developed and have demonstrated superior segmentation performance on a range of medical image data modalities [15].
For this study, we built a supervised multiscale U-Net-based DL framework named smuLymphNet to capture and quantify GCs and sinuses within LNs on digitised H&E-stained WSIs of 5,228 cancer-free and involved LN sections from both LN-negative and LN-positive breast cancer patients, enriched for TNBC cases. We benchmarked smuLymphNet performance relative to manual LN assessments of four pathologists. By applying smuLymphNet to a retrospective breast cancer cohort and a clinical trial dataset, both with extensive longitudinal outcome data, we have revealed associations between smuLymphNet-predicted immune features in LNs and the risk of subsequently developing distant metastasis in TNBC patients, which could be applied to tailor clinical therapy and to expand response assessment aspects in future clinical trials for these high-risk patients.

Study design
The smuLymphNet DL-based framework (Figure 2A) to capture and quantify morphological perturbations in axillary LNs from breast cancer patients consists of (1) digitising the diagnostic LN glass slides (details of data collection are provided in supplementary methods); (2) a LN-detection algorithm to determine the boundaries of each LN section on the WSI using an Otsu-based thresholding [19] and contouring algorithm; (3) a LN metastasis classifier to determine involved or cancer-free LNs [20]; (4) a supervised DL-pipeline for the segmentation of GCs and sinuses (Figure 2B); and (5) the quantification of the number, size, and shape of the predicted features.
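Step (2) of the pipeline can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the toy grey-level image, the function names `otsu_threshold` and `tissue_regions`, and the use of 4-connected component labelling as a stand-in for the contouring step are all assumptions made for illustration.

```python
from collections import deque


def otsu_threshold(pixels, levels=256):
    """Return the grey level that maximises the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of grey levels at or below the candidate threshold
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def tissue_regions(image, threshold):
    """Bounding boxes (row0, col0, row1, col1) of 4-connected dark regions.

    Assumption: tissue is darker than the slide background, so pixels at or
    below the threshold are treated as tissue.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] > threshold:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0, r1, c1 = min(r0, y), min(c0, x), max(r1, y), max(c1, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] <= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


# Hypothetical 6 x 8 thumbnail: bright background with two dark "LN sections".
BACKGROUND = 240
image = [[BACKGROUND] * 8 for _ in range(6)]
for r, c in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    image[r][c] = 60
for r, c in [(3, 5), (3, 6), (4, 5), (4, 6)]:
    image[r][c] = 80

threshold = otsu_threshold([p for row in image for p in row])
boxes = tissue_regions(image, threshold)
print(threshold, boxes)
```

In practice such a step would usually run on a low-resolution thumbnail of the WSI, with each bounding box mapped back to full-resolution coordinates before segmentation.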

DL framework
For the segmentation of GCs and sinuses in LNs, we tested three FCNs based on a U-Net architecture with symmetrical encoder-decoder paths: (1) a standard U-Net architecture with five convolution blocks and skip connections (referred to as U-Net), (2) a U-Net model with an attention mechanism that upweights salient features during training (referred to as AttenU-Net) [21], and (3) a multiscale U-Net approach that assimilates semantic information from different scales during training using atrous convolutions [22] (referred to as MS U-Net) (see Supplementary materials and methods and supplementary material, Figure S1). A single LN section was selected from 114 H&E-stained WSIs from Guy's Hospital (London, UK) and manually annotated for GCs and sinuses by a pathologist (FL). Each annotated LN was divided into a series of overlapping equally sized tiles at magnifications of ×2.5, ×5, and ×10. Models were trained on tiles from 100 LN sections, with tiles from nine LN sections used for validation and tiles from five LN sections used as a holdout test set to evaluate model performance. Details for tile preprocessing and the model training procedures are provided in Supplementary materials and methods.
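The tiling step and the effect of atrous (dilated) convolutions on spatial context can be sketched as follows. The tile size of 256 pixels, the stride of 128 pixels, and the function names are hypothetical, since the actual values are given only in the supplementary methods. The kernel-extent formula is the standard one for dilated convolutions.

```python
def tile_origins(length, tile, stride):
    """Start offsets of tiles along one axis; the last tile is clamped to the edge."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # make sure the far edge is covered
    return starts


def tile_grid(width, height, tile=256, stride=128):
    """Top-left (x, y) corners of overlapping, equally sized tiles."""
    return [(x, y)
            for y in tile_origins(height, tile, stride)
            for x in tile_origins(width, tile, stride)]


def atrous_kernel_extent(kernel, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return kernel + (kernel - 1) * (dilation - 1)


# Hypothetical LN section of 600 x 300 pixels at one magnification level.
grid = tile_grid(600, 300)
print(len(grid), grid[:4])

# A 3 x 3 kernel sees a wider neighbourhood as the dilation rate grows,
# without adding parameters.
print([atrous_kernel_extent(3, d) for d in (1, 2, 4)])
```

Because the stride is smaller than the tile size, neighbouring tiles share pixels, so structures that straddle a tile border appear whole in at least one tile.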

Morphological feature quantification
For each LN, we captured the (1) number, (2) mean area, and (3) mean circularity of GCs (details of calculations are in Supplementary materials and methods). To estimate the overall area of the sinuses within a LN, the smuLymphNet-captured sinuses were summed and normalised by overall LN size, due to a statistically significant correlation between the LN size and overall sinus area (supplementary material, Figure S2).
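The per-LN summary described above could be computed along the following lines. This is a sketch under stated assumptions: the exact calculations are in the supplementary methods, so the common circularity definition 4πA/P², the polygon representation of each GC, and the plain ratio used to normalise sinus area by LN area are illustrative choices, and all function names are hypothetical.

```python
import math


def polygon_area_perimeter(points):
    """Shoelace area and perimeter of a closed polygon given as (x, y) vertices."""
    area2 = 0.0
    perimeter = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        area2 += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(area2) / 2.0, perimeter


def circularity(points):
    """4*pi*A / P^2: equals 1 for a circle and falls as the outline gets irregular."""
    area, perimeter = polygon_area_perimeter(points)
    return 4.0 * math.pi * area / perimeter ** 2


def summarise_gcs(gc_polygons):
    """Number, mean area and mean circularity of the GCs in one LN."""
    if not gc_polygons:
        return {"count": 0, "mean_area": 0.0, "mean_circularity": 0.0}
    areas = [polygon_area_perimeter(p)[0] for p in gc_polygons]
    circs = [circularity(p) for p in gc_polygons]
    return {
        "count": len(gc_polygons),
        "mean_area": sum(areas) / len(areas),
        "mean_circularity": sum(circs) / len(circs),
    }


def normalised_sinus_area(sinus_areas, ln_area):
    """Total sinus area divided by LN area (assumed form of the normalisation)."""
    return sum(sinus_areas) / ln_area


# Two hypothetical GC outlines in mm: a square and an elongated rectangle.
square = [(0.0, 0.0), (0.2, 0.0), (0.2, 0.2), (0.0, 0.2)]
rectangle = [(0.0, 0.0), (0.4, 0.0), (0.4, 0.1), (0.0, 0.1)]
summary = summarise_gcs([square, rectangle])
print(summary)
print(normalised_sinus_area([0.5, 0.25, 0.25], ln_area=8.0))
```

Both example outlines have the same area, but the elongated one scores a lower circularity, which is the property used to separate regular from irregular GCs.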

Interpathologist concordance
A single LN on 24 WSIs was randomly selected for manual annotation by four pathologists (FL, ASh, SR, PG) using QuPath version 0.3.0 [23]. The ground-truth binary masks of the 24 pathologist-annotated LNs were compared for every pair of pathologists using the Dice coefficient [24].
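The pairwise comparison can be sketched as below, using the standard Dice definition 2|A∩B| / (|A| + |B|) on binary masks. The annotator labels "A" to "C", the tiny masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration.

```python
from itertools import combinations


def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks given as nested lists."""
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    if len(a) != len(b):
        raise ValueError("masks must have the same shape")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


def pairwise_dice(masks_by_annotator):
    """Dice coefficient for every unordered pair of annotators."""
    return {
        (p, q): dice(masks_by_annotator[p], masks_by_annotator[q])
        for p, q in combinations(sorted(masks_by_annotator), 2)
    }


# Hypothetical 1 x 4 masks from three annotators for the same LN.
masks = {
    "A": [[1, 1, 0, 0]],
    "B": [[1, 0, 0, 0]],
    "C": [[1, 1, 1, 0]],
}
scores = pairwise_dice(masks)
print(scores)
```

With four annotators this yields six pairs per LN, which can then be averaged per structure across the annotated LNs.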

SCS quantification
A heuristic method was implemented to calculate the width of the SCS. Four points were selected based on a reference axis along the LN, chosen as the longest diameter across the LN section. Two points were [...] where w is the diameter at each of the selected points on the SCS.

CONSORT diagram outlining data selection process. A retrospective dataset of 1,800 WSIs from 177 breast cancer patients from Guy's Hospital (London, UK) was retrieved. All WSIs were visually inspected, and WSIs with poor quality or pen marks were removed. Based on the LN section level classification obtained from the patient's histology report and after a visual assessment of all sections pertaining to a single LN, a single LN section level for each LN was selected. WSIs from three LN-positive breast cancer patients were removed as the histological subtypes for these carcinomas were missing. The final dataset was split into three groups based on histological breast cancer subtype, namely triple negative (TNBC), HER2-positive/ER-negative, and ER-positive (HER2-positive and HER2-negative) breast cancer patients.

DL framework captures prognostic morphological lymph node features

Statistical analyses
Standard summary statistics were calculated to establish associations between morphometric immune features and patients' prognosis. In the Guy's and Tianjin cohorts, the primary endpoint was distant metastasis-free survival (DMFS), defined as the time to first invasive recurrence or second primary tumour or death from any cause. The endpoint in the Dutch-N4Plus series was distant recurrence-free survival (DRFS). Given the large number of LN sections per patient in the Guy's cohort, morphometric features were averaged across all assessed LNs for outcome analyses. In the Dutch-N4Plus series, the total number of GCs across all assessed LNs and the LN with the maximum sinus area were used to assess outcome, as a much lower number of LNs was available per patient. We performed an iterative process to determine the optimal cut-off points by a minimal p value approach [25]. Kaplan-Meier methods were used to compare survival curves across groups. Cox proportional hazards regression models were fitted to estimate the hazard ratios according to clinicopathological and histologically assessed features across all endpoints in univariate and multivariate analyses. The statistical significance of features was assessed using the log-likelihood ratio test across all cohorts, whereby a two-sided p < 0.05 was considered significant. We used the statistical language R (version 4.1.1) to calculate the statistics [26].
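The minimal p value approach to choosing a cut-off can be sketched as a scan over candidate thresholds, scoring each split with a two-group log-rank test. The analyses in this study were run in R, so this Python version is illustrative only; the toy data, the minimum group size of three, and the function names are assumptions.

```python
import math


def logrank_p(times, events, groups):
    """Two-group log-rank test p value (chi-square with 1 degree of freedom)."""
    obs_minus_exp = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        at_risk_1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        deaths_1 = sum(1 for ti, e, g in zip(times, events, groups)
                       if ti == t and e and g)
        frac = at_risk_1 / at_risk
        obs_minus_exp += deaths_1 - deaths * frac
        if at_risk > 1:
            variance += (deaths * frac * (1 - frac)
                         * (at_risk - deaths) / (at_risk - 1))
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def minimal_p_cutoff(values, times, events, min_group=3):
    """Return (cut-off, p) for the split 'value >= cut-off' with the smallest p."""
    best = None
    for cut in sorted(set(values)):
        groups = [v >= cut for v in values]
        n_high = sum(groups)
        if n_high < min_group or len(values) - n_high < min_group:
            continue  # skip splits that leave a very small group
        p = logrank_p(times, events, groups)
        if best is None or p < best[1]:
            best = (cut, p)
    return best


# Hypothetical cohort: mean GC count, follow-up in months, event indicator.
gc_count = [0, 0, 1, 1, 2, 3, 4, 5]
months = [5, 8, 12, 10, 40, 45, 50, 60]
event = [1, 1, 1, 1, 0, 1, 0, 0]

cut, p = minimal_p_cutoff(gc_count, months, event)
print(cut, round(p, 4))
```

A cut-off selected this way is tuned to the cohort it was derived from, which is why applying a predefined cut-off to an independent cohort is the more informative check.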

A multiscale embedded DL framework to capture immune responses in digitised LN WSIs
The DL framework smuLymphNet is illustrated schematically in Figure 2A. Amongst the three FCN models tested (Figure 2B, supplementary material, Figure S1), the MS U-Net model performed the best at an input tile magnification of ×10 when features were learned at combined magnifications of ×10, ×5, and ×2.5, with Dice coefficients of 0.86 (standard error [SE] = 0.04) and 0.74 (SE = 0.05) for GC and sinus segmentation, respectively. The highest Dice coefficients for the single-scale U-Net and AttenU-Net models were 0.69 (SE = 0.17) and 0.62 (SE = 0.06) for GC and sinus segmentation, respectively, at an input tile magnification of ×2.5 (Figure 2C). Given that we have previously demonstrated the prognostic utility of GC number [7], we calculated an F1 score of predicted GC count and showed that the MS U-Net model achieved 91%. To test the model's ability to generalise across various staining and acquisition protocols, we applied smuLymphNet to five LN WSIs obtained from two other hospitals (Barts, Tianjin). We observed that the Dice coefficients of GC segmentation were 0.78 (SE = 0.02) and 0.64 (SE = 0.09) for the Barts and Tianjin scanned LNs, respectively. For the smuLymphNet-captured sinuses, the Dice coefficients decreased to an average of 0.55 (SE = 0.04) and 0.64 (SE = 0.05) for the Barts and Tianjin LNs, respectively (Figure 2D). Next, smuLymphNet was applied to digitised WSIs of LN sections from sentinel LN biopsies from six breast cancer patients diagnosed at Guy's Hospital, demonstrating its capability to capture GCs and sinuses on SLNs, now more commonly assessed in the current standard of care management of invasive breast cancer patients (supplementary material, Figure S3).

Interpathologist concordance assessment of GCs and sinuses
Pathologist annotations provide the gold standard to which DL models are compared. To contextualise smuLymphNet's performance, we determined the interpathologist variability in assessing GCs and sinuses.

[...] was found (range, 0-92, 0-39 and 0-87, Wilcoxon rank sum test, p < 0.001, Figure 4). Involved LNs displayed GCs with larger mean areas of 0.065 mm² compared to 0.056 mm² in cancer-free LNs in LN-positive patients (Wilcoxon rank sum test, p < 0.01, Figure 4), but this did not differ significantly from the mean area in cancer-free LNs of LN-negative patients. GC area was highly correlated with mean GC numbers (Pearson's r = 0.83, p < 0.001, data not shown). The GC circularity was highest in cancer-free LNs of LN-negative TNBC, followed by cancer-free and then involved LNs of LN-positive TNBC (Wilcoxon rank sum test, p < 0.001, Figure 4). Overall, involved LNs had, on average, more GCs, with larger surface areas and more irregularities than cancer-free LNs.
Having previously shown that manual assessment of GCs in LNs carries prognostic value in LN-positive TNBC [7], we asked whether smuLymphNet-quantified GCs held prognostic value for these high-risk patients. In the Guy's cohort, LN-positive TNBC patients had [...] Table 1 and supplementary material, Figure S5C). A similar association was observed in LN-negative TNBC patients (cancer-free LNs HR = 0.14, 95% CI: 0.03-0.63, p = 0.002; Table 1 and supplementary material, Figure S5C). In multivariate models, when adjusted for age at diagnosis, histological grade, and number of involved LNs, the binary cut-off for GCs in cancer-free LNs remained [...] Figure S5C). When assessing GC circularity, regular GC formation in involved LNs of LN-positive TNBC patients was associated with a superior prognosis (univariate HR = 0.34, 95% CI: 0.13-0.85, p = 0.03, and adjusted for age at diagnosis, histological grade, and number of involved LNs HR = 0.26, 95% CI: 0.08-0.86, p = 0.03; Table 1). Taken together, patients with LNs harbouring fewer GCs, and as such smaller areas and of irregular shapes, had a higher risk of developing distant metastases.
To reduce the risk of biased performance estimation [27] of our smuLymphNet methodology, we next evaluated its performance in an external cohort of 174 involved and cancer-free LNs from 95 LN-positive TNBC patients of the Dutch-N4plus trial [28]. In this trial, breast cancer patients with at least four involved LNs but without distant metastases at diagnosis had been randomised to conventional 5-fluorouracil-epirubicin-cyclophosphamide (FEC) chemotherapy or to the same therapy in which the last course was replaced by high-dose cyclophosphamide-thiotepa-carboplatin (CTC) chemotherapy with autologous stem cell support. Only 18/95 (19%) TNBC patients of the Dutch-N4plus trial had >2 GCs across all of their available LNs. These patients had a superior DRFS; however, owing to the limited cohort size, this did not reach statistical significance (HR = 0.52, 95% CI: 0.24-1.13, p = 0.097; supplementary material, Table S3A and Figure S5D).
Increased smuLymphNet-captured sinus areas are present in LNs of patients with longer time to distant recurrence
Utilising the smuLymphNet framework, we assessed the sinus areas in cancer-free and involved LNs. Amongst all assessed LNs of TNBC patients, the normalised sinus area was on average 0.14 mm² (range, 0-0.41 mm²) (Figure 5A); however, this increased significantly when LNs displayed any GC formation (Figure 5B, Wilcoxon rank sum test, false discovery rate-adjusted p < 0.001). In HER2-positive/ER-negative and ER-positive breast cancer patients, normalised sinus areas were more variable, which may be inflated due to the small cohort sizes (supplementary material, Figure S6A,B). Nevertheless, LNs with GCs displayed larger normalised sinus areas (supplementary material, Figure S6C).
Next, we tested whether the normalised sinus area across all assessed LNs was associated with prognosis in TNBC patients. As shown in Figure 5C, in the Guy's cohort, LN-positive TNBC patients with involved LNs with normalised sinus area >0.13 mm² had a better DMFS in univariate analyses (HR = 0.32; 95% CI: 0.15-0.67, p = 0.002; Table 2A; the optimal cut-off curves are shown in supplementary material, Figure S7). In multivariate models, when adjusted for mean GC count, age at diagnosis, histological grade, and number of involved LNs, this binary cut-off for sinus area in involved LNs remained statistically associated with DMFS (HR = 0.391; 95% CI: 0.16-0.95; p = 0.039, Table 2A). In the Dutch-N4plus trial external validation cohort, TNBC patients with LNs displaying smuLymphNet-quantified sinus area greater than 0.13 mm² had an overall superior DRFS in univariate analyses (HR = 0.44; 95% CI: 0.22-0.90, p = 0.024; supplementary material, Table S3B, Figure 5D), and sinus area added prognostic value to stromal TILs in multivariate analyses (HR = 0.50, 95% CI: 0.25-1.02, covariate p = 0.056 and likelihood p = 0.043; supplementary material, Table S3B). Of note, the predefined cut-off of 0.13 mm² sinus area derived originally from LNs of the Guy's cohort was used for sinus area assessment in the LNs of the Dutch-N4plus trial.

Pathologists' assessment validates prognostic value of sinus area
To orthogonally validate the prognostic value of smuLymphNet-captured sinuses, the width of the SCS, as a surrogate of the sinus area, was manually assessed by a pathologist in an independent set of LN-positive TNBC patients from the previously examined Tianjin cohort [7]. The SCS, located beneath the LN capsule, reflects the overall conduit activity of the LN. A width heuristic was calculated using four positions of the LN to establish an average SCS width for each LN (Figure 5E). The statistics of the manual assessment of SCS are shown in supplementary material, Table S2, and detailed in the methods. An increased SCS width of ≥20 μm across all the assessed LNs was associated with a superior prognosis (Table 2B, Figure 5F). In multivariate models, when adjusted for total GC count, patient age, pathological tumour size (pT), LN stage (pN), the presence of lymphovascular invasion and of tertiary lymphoid structure, and stromal tumour-infiltrating lymphocytes, which had been shown to be associated with DMFS in this cohort [7], the binary cut-off for SCS widths remained statistically associated with DMFS in involved LNs (HR = 0.33, 95% CI: 0.13-0.89, p = 0.029, Table 2B) and cancer-free LNs (HR = 0.21, 95% CI: 0.06-0.69, p = 0.01, Table 2B).
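The width heuristic and its dichotomisation can be sketched as follows. The text states only that four positions were used to obtain an average SCS width per LN and that a ≥20 μm cut-off was applied across the assessed LNs. The plain arithmetic mean, the patient-level averaging across LNs, and the function names are therefore assumptions.

```python
from statistics import mean

SCS_CUTOFF_UM = 20.0  # cut-off reported in the text (micrometres)


def mean_scs_width(widths_um):
    """Average SCS width of one LN from four measured positions (assumed plain mean)."""
    if len(widths_um) != 4:
        raise ValueError("expected four width measurements per LN")
    return mean(widths_um)


def patient_has_wide_scs(per_ln_widths, cutoff=SCS_CUTOFF_UM):
    """True if the mean SCS width across a patient's LNs reaches the cut-off.

    Assumption: LN-level means are averaged to a patient-level value.
    """
    return mean(mean_scs_width(w) for w in per_ln_widths) >= cutoff


# Hypothetical measurements in micrometres for two patients.
patient_a = [[10, 20, 30, 40], [10, 10, 10, 10]]
patient_b = [[25, 30, 20, 25], [30, 20, 20, 30]]
print(patient_has_wide_scs(patient_a), patient_has_wide_scs(patient_b))
```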

Discussion
In this retrospective study, we developed a fully supervised DL framework, smuLymphNet, demonstrating that a multiscale U-Net architecture could robustly capture morphological immune structures from digitised images of routine H&E-stained slides from both axillary and sentinel LNs with high accuracy comparable to interpathologist assessments. In alignment with our published studies [7,8], our smuLymphNet framework recapitulated the finding of the prognostic value of the assessment of GC formation in LN-positive TNBC and has now extended this association to LN-negative TNBC patients. We revealed, for the first time, the prognostic significance of the morphological assessment of intranodal lymphatic sinuses in involved LNs in two independent TNBC cohorts, both by our DL-based methodology and by manual assessment. Lastly, we demonstrated that these morphological features in LNs [...]

G Verghese, M Li, F Liu et al

During cancer-induced immune responses, LNs enlarge and remodel, featuring GC formation and growth of lymphatic sinuses (lymphangiogenesis), even before metastatic deposits are detected [7,29]. In our study, both increased total sinus surface area and SCS width in LNs were associated with a better prognosis. A layer of CD169+ macrophages lines the SCS and is strategically positioned at the lymph-tissue interface to capture pathogens as they enter the LN [30]. This impedes the systemic dissemination of pathogens [31] and allows the presentation of intact antigens to B cells that reside directly underneath the SCS macrophage layer for the initiation of humoral immune responses and, in turn, to initiate GC formation. A high density of CD169+ macrophages in the LN sinus has been shown to be predictive of better clinical prognosis in some tumours [32]. During tumour progression, SCS macrophages become depleted and dissociate from the SCS [33]. Consequently, the width of the SCS narrows, in alignment with our observation that the SCS width was narrower in involved compared to cancer-free LNs. Exploring the lymphatic sinus further with computational pathology approaches could provide additional clinically relevant diagnostic information and complement micro-CT-guided lymphangiography in breast and other cancers.
Segmentation of substructure-specific morphological properties, such as sinuses, from H&E-stained images is challenging for artificial intelligence (AI)-based methods. The multiscale CNNs are trained by integrating surrounding context, morphology, and cellular information originating from different magnification levels in the learned feature representation of the network [34]. We showed that the multiscale architecture [...]

Another key challenge in computational pathology, which has slowed the adoption of this new paradigm in a clinical context, is the curation of large-scale datasets that capture the inherent technical variability of WSIs from multiple institutes. This is particularly challenging for supervised methods, where obtaining the detailed annotated data needed for the development and validation of neural networks is often not feasible [25]. As such, methods that use weak supervision to obviate the laborious task of obtaining detailed pathologists' annotations may provide a more efficient way to train these models on large-scale datasets. To evaluate our model's performance fairly, we included the subjectivity of four pathologists' manual assessments and observed moderate interobserver agreement in recognising these polymorphic substructures, illustrating the difficulty of defining the ground truth for such tasks. Potentially, methods for generating crowd-sourced noisy labels at scale and more sophisticated machine learning techniques to leverage them may provide new opportunities [35].
Since the detection of LN metastasis is critical for the diagnosis and staging of many solid tumours, pathology modernisation programmes in hospitals have started to evaluate AI-based software tools with the aim of supporting pathologists and speeding up microscopic examination. In the era of immunotherapy as a treatment choice for TNBC [36], the LN can potentially be used as an observation window for the patient's systemic immune responses of prognostic value [7,8]. Currently, computational assessment of TIL counts is proposed as a prognostic biomarker for TNBC [37], either by itself or integrated into nomograms with established prognostic variables [38], through a public Grand Challenge organised by the International Immuno-Oncology Biomarker Working Group (www.tilsinbreastcancer.org). Although we have demonstrated the generalisation of our models and have orthogonally validated our observations in an external cohort, further large-scale validation of these findings, ideally in clinical trials in neoadjuvant and adjuvant settings, potentially facilitated at scale by federated or swarm-learning-based approaches [6,39,40], would be invaluable.

Conclusions
Multiscale DL approaches are well suited to capture and quantify cancer-associated alterations in axillary LNs on digitised WSIs. By assessing LNs above and beyond the presence and size of cancer cell deposits, our end-to-end smuLymphNet framework provides a tool to advance TNBC patient stratification, management, and prognostication and has the potential to benefit clinical practice.

[...] responsible for review and editing. AS was responsible for conceptualization, methodology, software, validation, formal analysis, investigation, data curation, review and editing, visualisation, and funding acquisition. AG was responsible for conceptualization, methodology, software, validation, formal analysis, investigation, data curation, original draft preparation, review and editing, visualisation, and funding acquisition.

SUPPLEMENTARY MATERIAL ONLINE
Supplementary materials and methods
Figure S1. smuLymphNet pipeline and network architecture for FCN models
Figure S2. Correlation between sinus area and total LN area
Table S1. Clinical characteristics of patients from Dutch-N4Plus TNBC cohort (n = 95)
Table S2. Clinical characteristics and immune features in LNs of patients from Tianjin TNBC cohort
Table S3. Univariate and multivariate Cox proportional hazard analyses in Dutch-N4Plus TNBC cohort