Primary Central Nervous System Lymphoma: Clinical Evaluation of Automated Segmentation on Multiparametric MRI Using Deep Learning

Precise volumetric assessment of brain tumors is relevant for treatment planning and monitoring. However, manual segmentations are time‐consuming and impeded by intra‐ and interrater variabilities.

Primary central nervous system lymphoma (PCNSL) accounts for 2% of primary intracranial tumors, with an increasing incidence over the past decades. 1 PCNSL represents an aggressive type of extranodal lymphoma without coexisting systemic disease at diagnosis, separating it from CNS involvement occurring in systemic lymphoma (secondary CNS lymphoma). 2-4 Magnetic resonance imaging (MRI) using gadolinium-based contrast agents represents the most sensitive imaging modality in the diagnosis of PCNSL. 5 In general, PCNSL are of supratentorial location (75-85%) and appear as solitary (60-70%) or multiple (30-40%) homogeneous contrast-enhancing brain masses in immunocompetent patients, frequently exhibiting restricted diffusion. 6,7 Usually located in the periventricular white matter and adjacent to subarachnoid and ependymal surfaces, PCNSL lesions are often surrounded by edema. 6,7 However, atypical PCNSL, as in immunocompromised patients, may appear differently, with central necrosis, ring-like enhancement, and hemorrhage. 8,9
Precise assessment of size is clinically relevant for therapy planning, prognosis, and monitoring in many brain tumors, such as glioblastoma, meningioma, or PCNSL. 5,10-12 Although recent studies indicated that conventional 2D measurements are less reliable than volumetric assessment, manual brain tumor segmentation is time-consuming and suffers from relatively high intra- and interrater variabilities. 13-16 In contrast, automated segmentation of PCNSL might facilitate precise and objective assessment of tumor burden and response in longitudinal imaging, which is often difficult to determine given its multifaceted appearance consisting of multiple scattered lesions. 5-7
Previous studies have already demonstrated the feasibility of deep-learning models (DLMs) for completely automatic brain tumor segmentation, eg, for meningioma and glioblastoma, yielding excellent reproducibility. 10,15,16 Recent technical innovations have further improved DLM-based tumor detection and segmentation, implying readiness for clinical application. 10,15-17 In previous work, applying a DLM that was initially trained on gliomas and validated on glioblastomas 17 to meningiomas proved feasible and provided good segmentation performance for this tumor. 18 Given these results, we postulated that the same DLM might also provide accurate segmentations for a different tumor entity, namely PCNSL.
The purpose of this study was to evaluate the efficacy of a DLM initially trained on gliomas for fully automated detection and segmentation of PCNSL on multiparametric MRI including heterogeneous imaging data from multiple scanner vendors and several study centers.

Materials and Methods
The local Institutional Review Board approved this retrospective, single-center study (reference number: 19-1208) and waived the requirement for written informed consent for the patient cohort.

Patient Population
MRI scans of patients treated for PCNSL at our tertiary-care university hospital between November 2010 and July 2019 were reviewed using our institutional image archiving system and clinical information system.
Eighty-one scans of 53 patients, regardless of initial or follow-up imaging, were identified applying the following inclusion criteria: 1) a complete MRI dataset, consisting of T1-/T2-weighted, T1-weighted contrast-enhanced (T1CE), and fluid-attenuated inversion recovery (FLAIR) sequences, 2) histologically proven PCNSL, and 3) a clearly identifiable presence of tumor, defined as a minimum volumetric threshold of 0.25 cm³ for tumor core. Exclusion criteria were: a) severe artifacts (seven scans of five different patients), b) extensive leukoencephalopathy (Fazekas III, three datasets of three different patients), and c) second intracranial tumor (meningioma: two scans of two different patients) as determined by a board-certified neuroradiologist (J.B.) with 13 years of experience in neuro-oncologic imaging. This senior neuroradiologist also assessed PCNSLs regarding their imaging characteristics (multifocal tumor spread, supra-/infratentorial location, necrosis, and hemorrhage) and potential tumor recurrence. Applying the aforementioned exclusion criteria, 69 examinations from 43 patients (mean age at initial imaging: 62.6 ± 13.3 years, 23 female) were included, consisting of 36 scans at primary diagnosis and 33 follow-up scans. Twenty-two patients received primary imaging only, 14 patients both primary and follow-up, and seven patients follow-up imaging only.
Included datasets (unenhanced T1- and T2-weighted, T1CE, and FLAIR images) were anonymized and exported to IntelliSpace Discovery (ISD, v. 3.0, Philips Healthcare, Best, The Netherlands). The following data were collected from the patients' medical charts: procedure (stereotactic biopsy or surgery) to obtain material for histopathological diagnosis of PCNSL, available additional imaging and bone marrow biopsy to rule out systemic lymphatic disease, infection with human immunodeficiency virus (HIV), and treatment received.

Imaging
MRI was performed at 1.0T (n = 3), 1.5T (n = 30), and 3.0T (n = 36), with examinations performed either at our institution (n = 54) or at referring centers (n = 15). All scans were conducted for clinical purposes and included unenhanced T1- and T2-weighted, T1CE, and FLAIR images. Table 1 provides detailed data on the scanners used and Table 2 lists the imaging parameters for each field strength. The standardized imaging protocol at our institution included automatic intravenous administration of gadolinium-based contrast agent (Dotarem, 0.5 mmol/mL, 1 mL = 279.3 mg gadoteric acid = 78.6 mg gadolinium; Guerbet, Villepinte, France) at a dose of 0.1 mmol/kg body weight. In referring institutions, contrast medium application was not standardized.

Reference Standard
A radiologist and a neurosurgeon (L.P., L.G.), each with 4 years of experience in neuro-oncologic imaging, together assessed PCNSLs on unenhanced T1- and T2-weighted, T1CE, and FLAIR images on ISD. To establish the reference standard, both readers performed segmentations on ISD simultaneously (using consensus reading).
Segmentation was conducted interactively using a 2-step, semiautomatic approach. First, a rough segmentation of tumor tissue was obtained by 3D voxelwise regional thresholding. Then the initial segmentation was further edited manually using a combination of 2D editing tools. Contrast-enhancing tumor and iso- to hyperintense tumor, including surrounding hyperintense edema, were segmented separately in T1CE and FLAIR. The tumor core was defined as the contrast-enhancing tumor volume in T1CE. Total tumor volume (TTV) was defined as the union of tumor volumes in T1CE and FLAIR, including solid contrast-enhancing tumor parts, necrotic tumor parts, and surrounding FLAIR-hyperintense tumor as well as edema.
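The definition of TTV as a union of two segmentations can be illustrated with a minimal sketch. It assumes, purely for illustration, that each segmentation is available as a set of (x, y, z) voxel indices on a common grid and that each voxel measures 1 mm³; the function names and the toy masks are hypothetical and are not part of the ISD workflow described above.

```python
# Illustrative sketch only: masks are sets of (x, y, z) voxel indices on a shared grid.
# The 1 mm^3 voxel volume is an assumption made for this example.

def total_tumor_mask(t1ce_mask, flair_mask):
    """Return the union of the T1CE and FLAIR segmentations (total tumor mask)."""
    return set(t1ce_mask) | set(flair_mask)

def volume_cm3(mask, voxel_volume_mm3=1.0):
    """Convert a voxel count to a volume in cm^3 (1 cm^3 = 1000 mm^3)."""
    return len(mask) * voxel_volume_mm3 / 1000.0

# Toy masks: one voxel is shared between the two segmentations.
core = {(10, 10, 5), (11, 10, 5)}
flair = {(11, 10, 5), (12, 10, 5), (13, 10, 5)}

ttv_mask = total_tumor_mask(core, flair)
print(len(ttv_mask), volume_cm3(ttv_mask))
```

Because the union counts overlapping voxels only once, TTV is never smaller than the tumor core volume and never larger than the sum of the two individual volumes.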

Deep-Learning Model
In this study, a 3D convolutional neural network (CNN) based on DeepMedic (Biomedical Image Analysis Group, Department of Computing, Imperial College London, UK), trained on an independent dataset of 220 glioma cases available through the Brain Tumor Image Segmentation (BRATS) 2015 challenge, was used. 19 As defined in the BRATS benchmark, the DLM performs voxelwise classifications of four tumor components (edema, contrast-enhancing tumor, necrosis, and nonenhancing tumor). 19 The network consists of a deep 3D CNN architecture with two identical pathways that operate at different image resolutions, which helps in capturing short- and long-range characteristics of tumor appearances. 3D image-segments (patches), centered at the same image location, provide inputs to the two pathways. However, for the second pathway, the image was downsampled to a third of its original size. The model comprises 11 layers with kernels of size 3 × 3 × 3. The model also includes residual connections for layers 4, 6, 8, and 10, whereas layers 9 and 10 are fully connected.
During training on the glioma data, 17 on-the-fly image augmentation was employed by flipping the images along their axes. The training batch size was set to 15; batch normalization was applied and a parametric rectified linear unit was used as the activation function. The Dice similarity coefficient (DSC) was used as the loss function and the number of training epochs was set to 35. Each epoch consisted of 20 subepochs, for each of which around 1000 3D image-segments of size 25 × 25 × 25 were randomly extracted from the training data. To ensure class balance, the extracted patches were split 50/50 between background and tumor.
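The class-balanced extraction of image-segments can be sketched as follows. This is a simplified illustration, not the DeepMedic implementation: it assumes that candidate patch centers are already available as two lists of voxel coordinates (tumor and background), and the function name, the seed, and the toy coordinates are hypothetical.

```python
import random

def sample_patch_centers(tumor_voxels, background_voxels, n_patches, seed=0):
    """Draw patch centers so that half are centered on tumor and half on background.

    Sampling is with replacement; each center is returned together with its class label.
    """
    rng = random.Random(seed)
    n_tumor = n_patches // 2
    n_background = n_patches - n_tumor
    centers = [(rng.choice(tumor_voxels), "tumor") for _ in range(n_tumor)]
    centers += [(rng.choice(background_voxels), "background") for _ in range(n_background)]
    rng.shuffle(centers)  # avoid presenting all tumor patches first
    return centers

# Toy candidate centers (hypothetical coordinates).
tumor = [(40, 52, 30), (41, 52, 30), (40, 53, 31)]
background = [(5, 5, 5), (100, 80, 60), (70, 20, 15), (90, 90, 40)]

centers = sample_patch_centers(tumor, background, n_patches=1000)
n_tumor_centers = sum(1 for _, label in centers if label == "tumor")
print(n_tumor_centers, len(centers) - n_tumor_centers)
```

Balancing at the level of patch centers counteracts the fact that tumor voxels make up only a small fraction of a brain volume, so a uniformly random sampler would return almost exclusively background patches.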
During inference on the PCNSL data, the size of the 3D image-segments was set to 45 × 45 × 45 to increase the kernel receptive field and reduce the inference time. Before the multiparametric scans of the PCNSLs (unenhanced T1- and T2-weighted, T1CE, and FLAIR images) were passed to the CNN, automatic preprocessing was applied. The image preprocessing pipeline included 1) skull stripping employing a brain mask, 2) coregistration of unenhanced T1- and T2-weighted sequences as well as FLAIR to the reference space defined by T1CE, 3) bias field correction of all four sequences applying a proprietary method (Philips Healthcare) for brain MRI, 20 4) resampling to isotropic resolution of 1 × 1 × 1 mm³, and 5) normalization to zero mean and a standard deviation of one. 17
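The final preprocessing step, normalization to zero mean and unit standard deviation, can be sketched as follows. The sketch assumes that the statistics are computed only over voxels inside the brain mask and that voxels outside the mask are set to zero; the source does not specify either detail, and the function name and the flattened toy image are hypothetical.

```python
from statistics import fmean, pstdev

def zscore_normalize(intensities, brain_mask):
    """Normalize intensities to zero mean and unit SD using only in-mask voxels.

    `intensities` and `brain_mask` are flat sequences of equal length.
    Assumption for this sketch: voxels outside the brain mask are set to 0.0.
    """
    brain_values = [v for v, inside in zip(intensities, brain_mask) if inside]
    mean = fmean(brain_values)
    sd = pstdev(brain_values)
    if sd == 0:
        raise ValueError("constant intensities inside the brain mask")
    return [(v - mean) / sd if inside else 0.0
            for v, inside in zip(intensities, brain_mask)]

# Toy flattened image: first and last voxel lie outside the brain mask.
image = [0.0, 10.0, 20.0, 30.0, 5.0]
mask = [False, True, True, True, False]

normalized = zscore_normalize(image, mask)
print(normalized)
```

Applied separately to each of the four sequences, this kind of normalization places intensities from different scanners and field strengths on a comparable scale before they reach the network.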

Statistical Analysis
Statistical analysis was performed using JMP Software (release 14, SAS Institute, Cary, NC), with P < 0.05 considered statistically significant. Volumes of tumor components (TTV and tumor core) are given as mean ± standard deviation. Successful detection of PCNSL was defined as any spatial overlap between the DLM segmentation and the manual segmentation of the tumor core (at least one voxel, DSC >0). Detection sensitivity was calculated applying the following formula:
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
with TP being true positive and FN false negative. To evaluate the segmentation (S) performance of the DLM compared to manual segmentations, the DSC, which provides a voxelwise measure of spatial overlap of segmented tissue, was calculated for each tumor component 17,18,21 :
$$\text{DSC} = \frac{2\,|S_{RS} \cap S_{DLM}|}{|S_{RS}| + |S_{DLM}|}$$
where S_RS is the segmentation provided by human readers (reference standard) and S_DLM is the segmentation provided by the DLM.
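The Dice similarity coefficient, the detection criterion (DSC >0), and the detection sensitivity can be computed as in the following sketch. It represents each segmentation as a set of voxel indices, which is an assumption made for illustration; the function names and toy masks are hypothetical, and returning 1.0 for two empty masks is a convention chosen here, not something stated in the study.

```python
def dice(reference, prediction):
    """Dice similarity coefficient between two voxel sets."""
    reference, prediction = set(reference), set(prediction)
    if not reference and not prediction:
        return 1.0  # convention chosen for this sketch: two empty masks agree
    overlap = len(reference & prediction)
    return 2 * overlap / (len(reference) + len(prediction))

def is_detected(reference_core, predicted_core):
    """A tumor counts as detected if the masks share at least one voxel (DSC > 0)."""
    return dice(reference_core, predicted_core) > 0

def sensitivity(true_positives, false_negatives):
    """Detection sensitivity: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)

# Toy masks sharing two of their three voxels each.
reference = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
prediction = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}

print(dice(reference, prediction), is_detected(reference, prediction))
```

A DSC of 1 indicates identical masks and a DSC of 0 indicates no shared voxel, which is why the detection criterion reduces to a simple overlap test.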
Resulting DSCs are reported as median with 10th/90th percentiles. Comparison of DSCs obtained in initial and follow-up scans was performed with the Wilcoxon rank-sum test. Corresponding 3D rendering images of automated and manual tumor components were created using ISD. To evaluate volumetric agreement between manual and automated segmentations for TTV and tumor core, Pearson's correlation coefficient (r) was calculated and Bland-Altman analysis was performed.
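Volumetric agreement between manual and automated segmentations can be sketched as below. This is a generic illustration and not the JMP analysis used in the study: it assumes the conventional Bland-Altman limits of agreement at the mean difference ± 1.96 standard deviations, and the function names and all volumes are hypothetical values invented for the example.

```python
from math import sqrt
from statistics import fmean, stdev

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def bland_altman(manual, automated):
    """Return (bias, lower limit, upper limit) of agreement.

    Bias is the mean of (automated - manual); limits are bias +/- 1.96 SD,
    which is the conventional choice and an assumption of this sketch.
    """
    diffs = [a - m for m, a in zip(manual, automated)]
    bias = fmean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical volumes in cm^3, invented for illustration only.
manual_volumes = [10.0, 20.0, 30.0, 40.0]
automated_volumes = [11.0, 19.0, 32.0, 41.0]

r = pearson_r(manual_volumes, automated_volumes)
bias, lower, upper = bland_altman(manual_volumes, automated_volumes)
print(r, bias, lower, upper)
```

The two measures are complementary: correlation captures whether the volumes rise and fall together, whereas the Bland-Altman bias and limits reveal systematic over- or underestimation that a high correlation can hide.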

Characteristics of Patients and PCNSL
The diagnosis of PCNSL was confirmed by stereotactic biopsy in 40 cases and by surgical specimen in three. Systemic lymphatic disease was ruled out by additional imaging and bone marrow biopsy in all cases. All patients were HIV-negative, with the majority (95.3%) receiving chemotherapy. For six patients, a recurrence of the tumor was noted during follow-up imaging.
The majority (94.2%) of the PCNSLs showed supratentorial location and multifocal tumor spread (69.6%). Table 3 provides detailed information regarding patients' treatment. Table 4 lists the localization of PCNSLs and additional imaging characteristics. Mean TTV and tumor core volume, based on manual segmentation, were 77.16 ± 62.40 cm³ and 11.67 ± 13.88 cm³, respectively.

Evaluation of the DLM
DETECTION. The DLM detected 66 of 69 PCNSL correctly, corresponding to a sensitivity of 95.7%. The missed PCNSLs were small (average core volume of 1.1 ± 0.55 cm³) and were located directly adjacent to the ventricles. Figure 1 shows an example of a missed PCNSL.
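As a quick check of the reported figure, taking the 66 detected tumors as true positives and the 3 missed tumors as false negatives gives:

```latex
\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{66}{66 + 3} = \frac{66}{69} \approx 0.957 = 95.7\%
```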

Discussion
In this study we evaluated the performance of a DLM initially trained on gliomas for fully automated detection and segmentation of PCNSL on clinical routine MRI. Although PCNSL inherently shows a heterogeneous and complex tumor structure on MRI, 6-8 the DLM we used detected and segmented PCNSL with high segmentation performance independent of scan timing (initial vs. follow-up).
Apart from atypical forms, glioblastoma and PCNSL often present with distinct differences in MRI features enabling potential differentiation. 22-24 PCNSL usually consists of several scattered, homogeneously enhancing tumor parts, whereas glioblastoma mostly presents as a tumor with a ring-like zone of contrast enhancement surrounding necrosis and intralesional hemorrhage. 22-25 However, PCNSL and glioblastoma show similarities in their features, such as contrast-enhancing tumor, potential central necrosis, FLAIR-hyperintense tumor parts, and surrounding edema. 22-25 Therefore, we hypothesized that a DLM that had already performed well in detection and segmentation of glioblastoma 17,19 might also be applied to PCNSL. The potential to apply the DLM trained for glioblastoma on a different tumor type (meningiomas) has previously been demonstrated. 18 While Laukamp et al 26 showed that dedicated retraining on meningioma cases allowed for a significant improvement of accuracy, this retraining may not be necessary for PCNSL since glioblastoma tends to be more similar in appearance on MRI to PCNSL than meningioma. In this study the DLM provided high segmentation performance, with the results being comparable to research focusing on automated glioblastoma and glioma segmentation. 17,19,27,28 In this context, Perkuhn et al reported DSCs between 0.62 and 0.86 for the differing tumor components of glioblastoma while using the same DLM as in the present study. 17,28 Given the comparable results of the study by Perkuhn et al 17 and the present study, a DLM trained on gliomas and achieving accurate segmentation of glioblastomas can be applied to PCNSL, despite their often different appearance and overall complex tumor structure, without noticeably decreasing segmentation performance.
Additionally, the DSCs we observed are comparable to those reported by Laukamp et al, in which the same DLM we used was applied to meningiomas without any training (DSC between 0.78 and 0.81). 18 Further, segmentation performance of the DLM in the present study is comparable to interrater variabilities of human readers for glioblastoma, which were reported to be as high as 20-30%, 15,16 which corresponds to a DSC between 0.70 and 0.80.
Automated detection of cerebral masses, being possible with the proposed DLM of the present study, has high clinical relevance, as it may enable preselection of lesions and/or serve as a control mechanism for the radiologist and treating physician. 10,29,30 Following stereotactic biopsy, the procedure of choice for histopathological diagnosis, standard treatment of PCNSL consists of different approaches, including chemotherapy based on high-dose methotrexate with or without radiotherapy, high-dose chemotherapy combined with autologous stem-cell therapy, or immunotherapy using rituximab. 9,31,32 However, an optimal standard of treatment has not been established for PCNSL. The ability to precisely assess tumor response is necessary to study the efficacy of various PCNSL therapy regimes, and the DLM we studied here may be useful for longitudinal assessment of tumor response in oncological follow-up imaging.
2D diameter measurement methods underestimate slow and multifocal tumor growth or shrinkage, the latter being typical for PCNSL, with possible multiple scattered small enhancing tumor parts in periventricular localization. 11,22,23,33 In contrast, volumetric assessment of tumor size proved to be of higher sensitivity to detect tumor progression or regression. 13,14 Further, it can be used for planning of stereotactic biopsy and evaluation of tumor features based on radiomics. 10,11,33-35 Additionally, reliable automated segmentation would lead to increased availability of volumetric data by omitting time-consuming manual volumetric segmentations. Automated tumor segmentation also improves objectivity and offers far higher reproducibility of tumor margins, thereby improving comparability of measurements in oncological follow-up imaging. 10,15,33
Further, we evaluated whether the segmentation performance of the DLM would differ on initial scans compared to follow-up imaging of treated PCNSL, which may include smaller and potentially hemorrhagic tumors, as well as postoperative/postinterventional changes. Remarkably, the DLM performed independently of scan timing. Additionally, by including datasets from referring institutions acquired on different scanners, we demonstrated the robustness of the DLM on multivendor, heterogeneous data with differing protocols regarding contrast media application and field strength, therefore indicating its applicability in clinical routine.
The combined approach of radiomics and machine learning has shown potential to differentiate between PCNSL and glioblastomas, which is clinically especially important for atypical forms of PCNSL due to their similarity to glioblastomas and vice versa. 22,36-38 However, manual segmentation for radiomics analysis and/or deep-learning-based characterization of brain tumors suffers from inter- and intrarater variabilities as well as lower reproducibility, both being potentially highly problematic for these kinds of analyses. 15,16 Feature extraction from automated segmentations of brain tumors, eg, PCNSL and glioblastoma, might overcome this challenge by omitting inter- and intrarater variabilities, hence leading to increased robustness of tumor characterization.

Limitations
Given the retrospective study design, it remains unclear whether segmentation performance was sufficient for clinical needs, and further investigation focusing on dedicated clinical tasks and parameters is needed. We used a DLM initially trained on gliomas for segmentation of PCNSL. This needs to be considered a potential drawback of the study design, as a newly designed or dedicatedly trained DLM might achieve superior results. Given that 22% of included scans were acquired at referring institutions with unknown protocols for image acquisition and contrast media application, the DLM should be investigated in a multicenter setting to evaluate its true performance on different scanners and protocols.

Conclusion
Deep learning offered detection and segmentation of PCNSL on routine clinical MRI comparable to manual segmentation, despite the complex and multifaceted appearance of this tumor type and the applied DLM having initially been trained on gliomas. Performance of the DLM was good for both scans at initial examination and scans after therapy, implying potential for this method to be used in follow-up assessment of PCNSL.