Lack of reproducibility of histopathological features in MYC-rearranged large B cell lymphoma using digital whole slide images: a study from the Lunenburg lymphoma biomarker consortium

Aims: Subclassification of large B cell lymphoma (LBCL) is challenging due to the overlap in histopathological, immunophenotypical and genetic data. In particular, the criteria to separate diffuse large B cell lymphoma (DLBCL) and high-grade B cell lymphoma (HGBL) are difficult to apply in practice. The Lunenburg Lymphoma Biomarker Consortium previously reported a cohort of over 5000 LBCL that included fluorescence in-situ hybridisation (FISH) data. This cohort contained 209 cases with MYC rearrangement that were available for a validation study, by a panel of eight expert haematopathologists, of how various histopathological features are used.
Methods and results: Digital whole slide images of haematoxylin and eosin-stained sections allowed the pathologists to visually score cases independently as well as participate in virtual joint review conferences. Standardised consensus guidelines were formulated for scoring histopathological features and included overall architecture/growth pattern, presence or absence of a starry-sky pattern, cell size, nuclear pleomorphism, nucleolar prominence and a range of cytological characteristics. Despite the use of consensus guidelines, the results show a high degree of discordance among the eight expert pathologists. Approximately 50% of the cases lacked a majority score, and this discordance spanned all six histopathological features. Moreover, none of the histological variables aided in prediction of MYC single versus double/triple-hit or immunoglobulin-partner FISH-based designations or clinical outcome measures.
Conclusions: Our findings indicate that there are no specific conventional morphological parameters that help to subclassify MYC-rearranged LBCL or select cases for FISH analysis, and that incorporation of FISH data is essential for accurate classification and prognostication.
Keywords: diffuse large B cell lymphoma, digital whole slide image, FISH, high-grade B cell lymphoma, MYC rearrangement

Introduction
The precise subclassification of large B cell lymphoma (LBCL) is challenging due to the overlap in histopathological, immunophenotypical and genetic data. Diagnostic criteria are carefully formulated in the revised 4th and 5th editions of the World Health Organisation (WHO) classification, as well as the International Consensus Classification (ICC), based on investigations in the field. 1-20 LBCL are designated as high-grade B cell lymphoma (HGBL) based on morphological criteria. If a MYC rearrangement (MYC-R) is accompanied by a BCL2 and/or BCL6 rearrangement, a diagnosis of HGBL with MYC and BCL2/BCL6 rearrangements [also known as double-hit (DH) or triple-hit (TH) lymphoma] is made. 1 This nomenclature has been further refined in the 5th edition of the WHO and the ICC classifications. 2-4 In the absence of MYC-R or when MYC-R is present as a single-hit (SH), a diagnosis of either DLBCL-not otherwise specified (NOS) or HGBL-NOS is made. The criteria to distinguish DLBCL-NOS from HGBL-NOS, however, are difficult to apply in practice.
The Lunenburg Lymphoma Biomarker Consortium (LLBC) previously reported a large series of 5117 LBCL. 21 Fluorescence in-situ hybridisation (FISH) data were available for this cohort and included a substantial number of cases (264; 11%) with MYC rearrangement (MYC-R). An adverse prognostic impact was manifest only in those with DH/TH and an immunoglobulin (IG) partner, but not in those with MYC-R single-hit (SH). 21 In the clinical setting, different morphologies are often assumed to have prognostic implications. For example, HGBL morphology is assumed to do worse than DLBCL morphology, and such patients may even receive different treatment. However, current literature does not provide clear guidance to separate DLBCL and HGBL based on conventional cytomorphological parameters, and leaves too much room for arbitrary interpretation. Data on the reproducibility of morphological criteria are lacking, and therefore the value of assumed clinical correlations cannot be assessed objectively.
Leveraging the large number of MYC-R LBCL in the LLBC cohort, we undertook a validation study, by a panel of eight expert haematopathologists, of MYC-R LBCL using digitised whole slide images (WSI). The primary goal of this study was to assess whether predefined histopathological criteria could aid in subtype-specific classification of MYC-R LBCL. A secondary goal was to assess whether any histopathological parameter identified by the expert panel would correlate with clinical outcome.

WSI
Haematoxylin and eosin (H&E)-stained slides of MYC-R cases in the LLBC cohort were identified. Of the original 264 MYC-R cases, 209 H&E slides were available for the current study and were digitised at ×40 using the NanoZoomer 210 (Hamamatsu, 161 WSI) or the ScanscopeXT (Aperio, 55 WSI) system (Figure S1). WSI were coded using unique LLBC study numbers only. Clinicopathological characteristics of the LLBC cohort were previously reported and the same cohort was used in the current study. 21 Ethics committee/institutional review board approvals were obtained from all participating institutions.
Scoring of WSI was conducted independently without prior knowledge of clinical features, international prognostic index (IPI), immunophenotype or FISH data. There was a wash-out period of 6 months between rounds of scoring. During initial review, pathologists were requested to score a pilot set of WSI using histological features (architecture/growth pattern, starry-sky appearance, cell size, nuclear pleomorphism, prominence of nucleoli, blastic/blastoid morphology and cytological features), according to existing guidelines employed in routine clinical practice. 1 Given the poor concordance, consensus guidelines were jointly formulated (Table 1). Using these guidelines, the pathologists independently scored the study cohort of 209 cases, and the scores were then correlated with FISH and clinical variables.

Statistical analysis
Categorical variables were described with numbers and percentages at the patient level for patient and disease characteristics, and at the scoring level for the histological features. T-distributed stochastic neighbour embedding (t-SNE) 22 was performed per pathologist to describe the 209 slides and identify possible clustering. As a preliminary step, interpathologist agreement was estimated by kappa values for each feature and scoring round. 23 For the primary goal, MYC status was dichotomised as SH versus DH/TH. To study the impact of DH-BCL6-R reclassification, the SH category was successively defined as SH alone then combined with DH-BCL6-R. The same analyses were performed for both definitions and later by combining MYC and IG status to compare DH-TH-IG individuals and others (SH and DH-TH-no-IG). Mixed-effects binomial logistic models were fitted on each histopathological feature, including the MYC status and the IPI score in two categories (0-1-2 versus 3-4-5) as fixed-effect variables, and individuals as a random variable, to account for the correlation between scores from the same slide. 24 To further explore the association between MYC status and histopathological features, data were visualised using t-SNE and then random forests 25 were fitted per pathologist. Variable importance was reported for each clinical and histopathological feature. In addition, for each pathologist, clustering was performed using k-means, 26 k-modes 27 and hierarchical clustering. 28 The results were compared to the MYC variable partition using the adjusted Rand index (ARI), 29 a measure of similarity between two data clusterings that considers all sample pairs and counts the pairs that are assigned to the same or different clusters in the predicted and actual clusters. Similar analyses were performed for the secondary outcomes. The analyses were performed using R and Python software. 30,31
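The pair-counting definition of the ARI given above can be illustrated with a minimal sketch. This is not the authors' code: the function name `adjusted_rand_index` and the toy labels are assumptions made for illustration, and the study itself may have used a library implementation.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same samples.

    Hypothetical helper for illustration only (not from the paper).
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same samples")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as trivially identical

    total_pairs = comb(n, 2)
    # Pairs placed together in BOTH partitions (contingency-table cells).
    together_both = sum(
        comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values()
    )
    # Pairs placed together in each partition separately.
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (together_both - expected) / (maximum - expected)


# Toy example: FISH-based status versus cluster labels for four slides.
fish_status = ["SH", "SH", "DH/TH", "DH/TH"]
clusters = [0, 1, 0, 1]
print(adjusted_rand_index(fish_status, clusters))
```

An ARI of 1 means the two partitions are identical, while values near 0 indicate agreement no better than chance; negative values are possible when agreement is below the chance expectation.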

Results
The case cohort consisted of 209 MYC-R patients, of whom 173 (82%) yielded successful additional FISH results (Figure S1). The remaining cases were not informative by FISH analysis. The IPI scores were consistent across these categories (Table S1). Eight pathologists scored 209 cases for a total of 1672 scores (Table S2).
The kappa coefficient showed wide variability among pathologists' scores for the six histopathological features and ranged from 0.09 for nuclear pleomorphism to 0.48 for starry sky. Inclusion of missing categories did not change the agreement evaluation (range = 0.10-0.47). None of the six histopathological features showed a degree of concordance that made it a reliable feature for recognising SH versus DH/TH with or without BCL6-R (Figure 1 and Figure S2). Although this study was not designed to address concordance, a comparison of the scoring rounds with and without the consensus scoring guidelines showed only marginal improvement in interobserver variability. None of the features could identify which MYC-R cases harboured additional rearrangements in BCL2 and/or BCL6 for the designation of HGBL DH/TH. In view of the high interobserver variation, correlations were made with the scores of each pathologist separately rather than with a consensus score for each histological parameter evaluated.
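For readers unfamiliar with multi-rater agreement statistics, the following is a minimal sketch of Fleiss' kappa for a single feature. It rests on assumptions: the text does not state which kappa variant was used, the function name `fleiss_kappa` is hypothetical, and the sketch requires every rater to have scored every slide (no missing scores).

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of slides, each a list of one label per rater.

    Illustrative sketch only; assumes complete scores from the same
    number of raters on every slide.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every slide needs the same number (>= 2) of scores")

    counts = [Counter(row) for row in ratings]

    # Observed agreement: proportion of agreeing rater pairs per slide.
    per_item = [
        (sum(c * c for c in cnt.values()) - n_raters)
        / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_chance = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    if p_chance == 1:
        return 1.0  # every score falls in a single category
    return (p_observed - p_chance) / (1 - p_chance)


# Toy example: eight raters scoring starry-sky pattern on three slides.
starry_sky = [
    ["present"] * 8,
    ["absent"] * 8,
    ["present"] * 4 + ["absent"] * 4,
]
print(round(fleiss_kappa(starry_sky), 3))
```

A kappa of 1 indicates complete agreement and 0 indicates agreement at the level expected by chance, which gives context to the reported range of 0.09 to 0.48.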
Neither t-SNE visualisation nor more advanced clustering models on MYC-R-SH versus DH/TH revealed any clusters or structures in the scored slides for any pathologist (Figures S3 and S4). The ARI scores comparing the MYC variable partitions with the clustering model outcomes ranged from −0.0049 to 0.0194 for SH versus DH/TH and from −0.0119 to 0.0145 for SH/DH-BCL6-R versus DH-BCL2-R (Table S2). These results revealed a high degree of discordance between the pathologists' classification and the clustering analysis approaches. The presence of an IG-partner in MYC-R cases was evaluated and yielded similarly discordant results (Figure S4).
© 2023 The Authors. Histopathology published by John Wiley & Sons Ltd., Histopathology, 82, 1105-1111.
Binomial logistic regression showed a high degree of variability between individual (slide) scores. Indeed, the individual variability had a larger effect than that of different variables, particularly for the starry sky and nucleoli variables (Table S2). The second analysis set, combining SH with DH-BCL6-R versus DH-BCL2-R with TH, and the third set combining DH-TH-IG versus others (SH and DH-TH-no-IG), again showed similar results for all variables evaluated (Figure S5).

Discussion
The current study is based on a clinicopathologically well-annotated LLBC patient cohort which allowed a comprehensive assessment of histopathological variables to be performed. To avoid any bias during the scoring process, clinical characteristics, immunophenotype and FISH results were not provided to the pathologists until after all scoring was completed. Digital WSI afforded independent as well as joint review and enabled the formulation of standardised consensus guidelines for scoring six histopathological variables, which were considered the most relevant for the subclassification of LBCL.
Our results show that, despite standardisation of scoring criteria, there was a high degree of variability in evaluating histopathological features. Approximately 50% of the cases had no majority score, and this discordance spanned all six histopathological features; it was most pronounced in the categories of architecture, starry sky and cytological features. We found no associations that allowed subtype-specific classification, which indicates that histopathological criteria alone cannot be reliably used to select cases for FISH, predict FISH status or choose alternative treatments. Subclassification therefore must be based on objective/measurable criteria such as molecular alterations including FISH, select disease-defining genomic alterations (e.g. 11q aberrations) or specific mutational or gene expression profiles. As the conventional histopathological evaluation in this validation study showed no associations with subtypes or clinical outcome, it will be important to establish whether artificial intelligence-based methods, such as those reported by Vrabac 32 and Swiderska-Chadaj, 33 can offer transformative insights in the future. In cases where FISH information cannot be obtained for technical reasons, or in resource-constrained settings where FISH may be unavailable, histological and immunophenotypical data may allow only a family (class) level diagnosis of LBCL.
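As an illustration of what a missing majority score means in practice, the sketch below counts slides on which no category was chosen by more than half of the raters. The threshold (at least five of eight pathologists), the treatment of missing scores as `None`, and the function names are assumptions made for this example; the text does not give the exact definition used.

```python
from collections import Counter


def majority_score(scores, n_raters=8):
    """Return the category chosen by more than half of the raters, else None.

    Hypothetical helper: missing scores are passed as None and never
    count towards a majority.
    """
    given = [s for s in scores if s is not None]
    if not given:
        return None
    label, count = Counter(given).most_common(1)[0]
    return label if count > n_raters / 2 else None


def fraction_without_majority(score_table, n_raters=8):
    """Fraction of slides for which no category reaches a majority."""
    missing = sum(
        1 for scores in score_table if majority_score(scores, n_raters) is None
    )
    return missing / len(score_table)


# Toy example: cell-size scores from eight raters on four slides.
cell_size = [
    ["large"] * 6 + ["medium"] * 2,            # majority: large
    ["large"] * 4 + ["medium"] * 4,            # tie, no majority
    ["medium"] * 5 + ["small"] * 3,            # majority: medium
    ["large"] * 3 + ["medium"] * 3 + [None] * 2,  # no majority
]
print(fraction_without_majority(cell_size))
```

Under this assumed definition, a 4-4 split or a three-way split among eight raters yields no majority score for that slide and feature.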
In conclusion, our findings show that conventional histopathological features alone do not reliably predict LBCL categories, and that FISH/genetic data are essential for accurate subclassification.

Conflicts of interest
The authors declare no conflicts of interest associated with this work.

Data availability statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Supporting Information
Additional Supporting Information may be found in the online version of this article:
Table S1. IPI score (in 2 categories, <3 and ≥3) distribution in the different MYC status categories.
Table S2. Unsupervised clustering results using the k-means model.
Figure S1. Flowchart of the 209 large B cell lymphomas in the study cohort.
Figure S2. Representative examples of concordant and discordant histological features.
Figure S3. Representation of MYC-SH versus MYC-DH and MYC-TH groups using (A) unsupervised clustering using t-SNE.
Figure S4. Representation of MYC-SH and MYC-DH-BCL6-R versus MYC-DH-BCL2-R and MYC-TH groups using (A) unsupervised clustering using t-SNE. Individual samples are color-coded as follows: red refers to MYC-SH and MYC-DH-BCL6-R, blue refers to MYC-DH-BCL2-R and MYC-TH, and green refers to the missing classes on the original data; (B) Random forest feature importance.
Figure S5. Representation of DH-TH-IG versus SH and DH-TH-no-IG groups using (A) unsupervised clustering using t-SNE. Individual samples are color-coded as follows: red refers to DH-TH-IG, blue refers to SH and DH-TH-no-IG, and green refers to the missing classes on the original data; (B) Random forest feature importance.