Fully Automated MR Detection and Segmentation of Brain Metastases in Non-small Cell Lung Cancer Using Deep Learning

Background: Non-small cell lung cancer (NSCLC) is the most common tumor entity spreading to the brain and up to 50% of patients develop brain metastases (BMs). Detection of BMs on MRI is challenging with an inherent risk of missed diagnosis. Purpose: To train and evaluate a deep learning model (DLM) for fully automated detection and 3D segmentation of BMs in NSCLC on clinical routine MRI. Study Type: Retrospective. Population: Ninety-eight NSCLC patients with 315 BMs on pretreatment MRI, divided into training (66 patients, 248 BMs) and independent test (17 patients, 67 BMs) and control (15 patients, 0 BMs) cohorts. Field Strength/Sequence: T1-/T2-weighted, T1-weighted contrast-enhanced (T1CE; gradient-echo and spin-echo sequences), and FLAIR at 1.0, 1.5, and 3.0 T from various vendors and study centers. Assessment: A 3D convolutional neural network (DeepMedic) was trained on the training cohort using 5-fold cross-validation and evaluated on the independent test and control sets. Three-dimensional voxel-wise manual segmentations of BMs by a neurosurgeon and a radiologist on T1CE served as the reference standard. Statistical Tests: Sensitivity (recall) and false positive (FP) findings per scan, dice similarity coefficient

segmentations achieved a median DSC of 0.72 and a good volumetric correlation (r = 0.95). In the control set, 1.8 FPs/scan were observed. Data Conclusion: Deep learning provided a high detection sensitivity and good segmentation performance for BMs in NSCLC on heterogeneous scanner data while yielding a low number of FP findings. Level of Evidence: 3 Technical Efficacy Stage: 2 J. MAGN. RESON. IMAGING 2021;54:1608-1622.
3][4] Therefore, current guidelines recommend MRI of the head to screen for BMs in advanced NSCLC. 5 Due to the rising number of MRI scans, the workload of radiologists has increased steadily, bearing the risk of missed findings with serious consequences, particularly for oncological patients in a screening setting. 6 Given their inconsistent appearance with numerous scattered lesions of varying contrast enhancement, 7 accurate detection of BMs in NSCLC is challenging and the satisfaction of search effect can lead to an oversight of small additional lesions. 8,9 However, the exact assessment of the number of BMs is relevant because it guides the most suitable type of treatment, together with the patient's characteristics. 10 Automated detection of BMs could provide a preselection of lesions and facilitate the assessment of the extent of disease for radiologists and treating physicians.
Thus, the purpose of this study was to train a DLM for fully automated detection and segmentation of BMs in NSCLC and validate its performance on independent test and control sets using heterogeneous scanner data from multiple vendors and study centers.

Materials and Methods
The institutional review board approved this monocentric study (reference number: 19-1208). Given the retrospective design, the requirement for written informed consent was waived.

Patient Population
The authors reviewed the institutional image archiving system at a tertiary care university hospital between January 2012 and March 2020 for MRI examinations of the head of NSCLC patients receiving dedicated oncological treatment for BMs. Patient demographics and treatment data were retrieved from the patients' medical charts.
Ninety patients were identified applying the following inclusion criteria: (I) MRI scans at primary diagnosis of BMs; (II) BM-specific therapy after diagnosis, eg, stereotactic radiosurgery or resection; and (III) a complete multiparametric MRI dataset, consisting of unenhanced T1-/T2-weighted, T1-weighted contrast-enhanced (T1CE), and fluid-attenuated inversion recovery (FLAIR) sequences. Seven patients were excluded due to the following criteria: (I) severe MRI artifacts reducing image quality (n = 3); (II) insufficient application of intravenous (i.v.) contrast agent (n = 3); (III) incomplete coverage of the brain in one or more aforementioned sequences (n = 1). Using an 80-20% training-test split, the included 83 patients (53 females, mean age of 57.4 ± 11.0 years at the time of MRI) were randomly allocated into a training set with 66 patients and a test set with 17 patients.
To evaluate the performance of the trained DLM in terms of FP findings in patients without BMs, a control set (n = 15 patients, 10 females, mean age of 61.2 ± 13.2 years at the time of MRI) was established by randomly selecting MRI scans of patients treated for NSCLC at the same tertiary care university hospital. These patients received head MRI to screen for BMs between January 2012 and March 2020 (T1-/T2-weighted, T1CE, and FLAIR sequences) but did not present with BMs. During random selection, datasets meeting the aforementioned exclusion criteria were not chosen.
There were no overlapping data between the three sets; the test and control sets were only used for independent evaluation of the DLM after training. After anonymization, MRI datasets (T1-/T2-weighted, T1CE, and FLAIR sequences) were exported to IntelliSpace Discovery (ISD, v3.0.6, Philips Healthcare, Best, The Netherlands).
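The random patient-level allocation described above can be illustrated with a minimal sketch. The patient identifiers, the fixed seed, and the function name `split_patients` are hypothetical and only serve the illustration; the study does not report how the random allocation was implemented.

```python
import random

def split_patients(patient_ids, train_fraction=0.8, seed=0):
    """Randomly split patient IDs into disjoint training and test lists."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed only to make this sketch reproducible
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]

# Hypothetical anonymized IDs standing in for the 83 included patients.
patients = [f"patient_{i:03d}" for i in range(83)]
train_ids, test_ids = split_patients(patients)
print(len(train_ids), len(test_ids))  # 66 17
```

Splitting at the patient level, rather than at the lesion level, keeps all BMs of one patient in the same set, which is what prevents overlap between training and test data.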

Imaging
MRI was performed at 1.0 (n = 6), 1.5 (n = 38), and 3.0 T (n = 54) using scanners from different vendors. All scans were conducted for clinical purposes only and consisted of T1-/T2-weighted, T1CE, and FLAIR sequences. Sixty-five scans were acquired at our institution following a standardized protocol using i.v. administration of gadolinium (Dotarem, Guerbet GmbH, Roissy, France; 0.5 mmol/mL, 1 mL = 279.3 mg gadoteric acid = 78.6 mg gadolinium) at a dose of 0.1 mmol/kg body weight. The remaining 33 patients received MRI scans at referring institutions without standardized application of contrast agent.
Table 1 provides detailed data on MRI scanners in training, test, and control sets with Tables 2-4 listing the imaging parameters for each field strength of training, test, and control sets (including names of T1-/T2-weighted and T1CE sequences).

Reference Standard
To establish the reference standard (RS), a neurosurgeon (S.T.J.) with 3 years and a radiologist (L.P.) with 4 years of experience in neuro-oncological MRI identified BMs on the aforementioned sequences of BM-positive and BM-negative cases in consensus. To this end, they reviewed the original radiology report and performed a 2-fold assessment of the included MRI scans as well as prior/follow-up imaging. Additionally, the readers assessed the localization of BMs (parenchymal, pachymeningeal, supratentorial, and infratentorial) in consensus. In case of uncertainties, a neuroradiologist (J.B.) with 14 years of experience in neuro-oncological imaging was consulted.
By assessing T1-/T2-weighted, T1CE, and FLAIR sequences on ISD, manual segmentation of BMs was performed by the aforementioned neurosurgeon and radiologist on T1CE in consensus (including CE and non-CE tumor parts). The initial segmentation was provided by the neurosurgeon and then presented to the radiologist to define the final segmentation of BMs. Segmentation of BMs was conducted interactively using a two-step semiautomatic approach. By 3D voxel-wise regional thresholding, a rough segmentation of tumor tissue was obtained, which was further edited manually using 2D editing tools if required.

Deep Learning Model
MODEL ARCHITECTURE. In this study, a 3D CNN based on DeepMedic (Biomedical Image Analysis Group, Department of Computing, Imperial College London, London, UK) was employed.
The network comprises a deep 3D CNN architecture with two identical pathways. Three-dimensional image patches serve as an input to the two pathways. For the first pathway, original isotropic patches are employed, whereas for the second pathway, the patches are down-sampled to a third of their original size. This approach enables the increased depiction of contextual information. The deep CNN comprises 11 layers with kernels of size 3³ (3 × 3 × 3). The model contains residual connections for layers 4, 6, and 8 with each layer followed by batch normalization and a parametric rectified linear unit as the activation function. In contrast, layers 9 and 10 are fully connected. The last prediction layer has a kernel size of 1³ and employs sigmoid as the activation function. 22
PREPROCESSING. Before the multiparametric scans (T1-/T2-weighted, T1CE, and FLAIR) of the BMs were passed to the DLM, automatic preprocessing of data was performed. The preprocessing pipeline included: (I) skull stripping using a brain mask, (II) coregistration of T1-/T2-weighted and FLAIR sequences to T1CE, (III) bias field correction of all four sequences employing a proprietary method (Philips Healthcare) for MRI of the head, 33 (IV) resampling to an isotropic resolution of 1 mm × 1 mm × 1 mm, and (V) intensity normalization to zero-mean with a standard deviation of one. 21
TRAINING. The training of the DLM was performed on the dedicated training set (n = 66 patients) using a 5-fold cross-validation approach without overlapping data, which resulted in five trained models. For training, multichannel 3D image patches with a size of 25³ were passed to the CNN. These image patches were extracted by employing a distribution of 50% between background and BMs to ensure class balance. In order to increase the number of training samples, image augmentation was used by randomly flipping the image patches along their axes. To compare the spatial overlap, the dice similarity coefficient (DSC) was used as the loss function and the root mean square propagation as the optimizer. An adaptive learning rate schedule was employed in which the initial learning rate was divided equally every time the accuracy did not improve for more than three epochs. Training batch size was set to 10 with the number of training epochs being fixed at 35.
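The class-balanced patch extraction and flip augmentation described for training can be sketched as follows. This is a simplified illustration in plain Python on nested lists, not the DeepMedic implementation; the function names, the flip probability of 0.5 per axis, and the toy label volume are assumptions.

```python
import random

def sample_patch_centers(label_volume, n_patches, rng):
    """Draw patch centers so that half lie on lesion voxels and half on background.

    label_volume is a nested list indexed [z][y][x] with 1 = BM, 0 = background.
    Assumes the volume contains at least one voxel of each class.
    """
    lesion, background = [], []
    for z, plane in enumerate(label_volume):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                (lesion if value else background).append((z, y, x))
    n_lesion = n_patches // 2
    centers = [rng.choice(lesion) for _ in range(n_lesion)]
    centers += [rng.choice(background) for _ in range(n_patches - n_lesion)]
    return centers

def random_flip(patch, rng):
    """Flip a 3D patch along each axis with probability 0.5 (assumed value)."""
    if rng.random() < 0.5:
        patch = patch[::-1]                                        # flip along z
    if rng.random() < 0.5:
        patch = [plane[::-1] for plane in patch]                   # flip along y
    if rng.random() < 0.5:
        patch = [[row[::-1] for row in plane] for plane in patch]  # flip along x
    return patch

# Toy 4 x 4 x 4 label volume with two lesion voxels (hypothetical data).
rng = random.Random(0)
labels = [[[0] * 4 for _ in range(4)] for _ in range(4)]
labels[1][2][3] = 1
labels[2][2][3] = 1
centers = sample_patch_centers(labels, n_patches=10, rng=rng)
n_on_lesion = sum(labels[z][y][x] for z, y, x in centers)
print(n_on_lesion)  # 5
```

In an actual training loop, the same flips would have to be applied to every image channel and to the label patch so that images and annotations stay aligned.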

EVALUATION.
The evaluation of the DLM was performed on the independent test set (n = 17 patients) by applying the five individual models from the 5-fold cross-validation training approach. To this end, 3D image patches of size 45³ were extracted to reduce the time spent during inference on the test data. To reduce the detection of FP findings, the segmentation results from each of the five DLMs were fused using a majority voting scheme. 34 To assess for FPs in BM negative cases, the fused DLM was additionally applied on the control set (n = 15 patients).
Automatically detected lesions <0.003 cm³ during inference on training, test, and control sets were, by default, regarded as image noise and discarded. This threshold was based on the resolution of T1CE sequences (in which a volume of 0.003 cm³ is about two voxels). Given the limitation of scan resolution, readers cannot accurately segment lesions smaller than this volume.
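A majority voting fusion of several binary segmentations can be sketched as below. The voxel-wise formulation with a strict majority (at least three of five models) is an assumption made for illustration; the text only states that a majority voting scheme was used. Masks are represented as flat lists of 0/1 values, and the example masks are hypothetical.

```python
def majority_vote(masks):
    """Fuse binary masks (one flat list of 0/1 per model) voxel by voxel.

    A voxel is labeled as lesion only if more than half of the models agree.
    """
    n_models = len(masks)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*masks)]

# Hypothetical predictions of five models for six voxels.
model_masks = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
]
fused = majority_vote(model_masks)
print(fused)  # [1, 1, 0, 0, 1, 0]
```

Voxels flagged by only one or two models (the third and fourth voxel in the example) are suppressed, which is the mechanism by which such a fusion can lower the number of FP findings.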
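The volume-based filtering of small detections can be illustrated with a connected-component sketch. The 6-connectivity, the voxel volume of 1 mm³ (matching the resampled resolution), and all function names are assumptions; the study does not specify how individual lesions were delineated in the predicted mask.

```python
from collections import deque

# Face-adjacent neighbors (6-connectivity, assumed for this sketch).
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def connected_components(voxels):
    """Group a set of (z, y, x) lesion voxels into connected components."""
    remaining = set(voxels)
    components = []
    while remaining:
        seed = remaining.pop()
        component, queue = {seed}, deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBORS:
                neighbor = (z + dz, y + dy, x + dx)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

def drop_small_lesions(voxels, voxel_volume_mm3=1.0, min_volume_cm3=0.003):
    """Keep only components with a volume of at least min_volume_cm3."""
    min_volume_mm3 = min_volume_cm3 * 1000.0  # 1 cm^3 = 1000 mm^3
    return [c for c in connected_components(voxels)
            if len(c) * voxel_volume_mm3 >= min_volume_mm3]

# Hypothetical prediction: one isolated voxel and one 4-voxel lesion.
predicted = {(0, 0, 0), (5, 5, 5), (5, 5, 6), (5, 6, 5), (6, 5, 5)}
kept = drop_small_lesions(predicted)
print(len(kept), len(kept[0]))  # 1 4
```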

TIME REQUIRED.
The average time required including image preprocessing to run a complete pipeline on a dataset was about 8 minutes: (I) bias field correction <1 second, (II) co-registration and skull stripping <7 minutes, (III) image standardization <1 minute, and (IV) inference on the imaging data 10 seconds (using a Tesla-P100 GPU card (Nvidia, Santa Clara, USA)).

Statistical Analysis
Statistical analysis was performed using JMP Software (release 14, SAS Institute, Cary, NC). Tumor volumes are presented as mean ± standard deviation and as the median with 10/90 percentiles. The DSC is reported as the median with 10/90 percentiles. The Wilcoxon rank-sum test was used to evaluate for statistical difference between the volumes of the BMs in training/test sets and of detected/missed lesions. The Wilcoxon signed-rank test was employed to assess for statistical difference of the volumes of detected BMs by the DLM and the reference standard. Statistical significance was set to P < 0.05. To evaluate the detection performance of the DLM, sensitivity (recall), precision (positive predictive value), and the F1-score were computed. Automated detection of a BM was achieved if the DLM obtained a spatial overlap of at least two voxels with the RS. Additionally, subanalysis of the detection performance of the fused DLM was performed in terms of MRI specifics (examination location, field strength) and localization of BMs (parenchymal, pachymeningeal, supratentorial, and infratentorial).
To investigate the segmentation performance of the DLM, automatically obtained segmentations were compared with the manual annotations by computing the overlap measure between the segmentations using the DSC. To compare quantitative volumetric measurements of both manual and automatic segmentations, the Pearson's correlation coefficient (r) was calculated and Bland-Altman analysis was performed.
Furthermore, an additional subanalysis was performed to evaluate the detection and segmentation performance of the fused DLM for lesions >0.15 cm³.
A subanalysis of the detection performance of the fused DLM in terms of MRI specifics and localization of BMs is provided in Table 7.
SEGMENTATION PERFORMANCE. For the detected BMs, the fused DLM achieved a median DSC of 0.72 (10/90 percentiles: 0.20-0.92) compared with manual segmentations. When comparing the volumetric assessment for detected BMs by automated (0.81 ± 1.96 cm³; median: 0.12 cm³, 10/90 percentiles: 0.01-1.70 cm³) and manual (1.11 ± 2.61 cm³; 0.15 cm³, 0.02-3.2 cm³; P < 0.05) segmentations, a correlation coefficient of 0.95 was observed. Figure 4 displays the volumetric correlation between manual and automated segmentations using Pearson's correlation on a lesion level. Further, Bland-Altman analysis indicated a good agreement between manual and automated segmentations with a mean difference of 0.31 cm³ between the DLM and the RS per lesion (Figure 4).
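The detection criterion and the derived metrics can be written down compactly. The following is a minimal sketch in which lesions are sets of voxel coordinates; the function names and the lesion counts in the example are hypothetical, whereas the FP rate example uses the 27 FP findings in 15 control scans reported in this study.

```python
def is_detected(reference_lesion, predicted_voxels, min_overlap=2):
    """A reference lesion counts as detected if it shares at least
    min_overlap voxels with the predicted segmentation."""
    return len(reference_lesion & predicted_voxels) >= min_overlap

def detection_metrics(tp, fp, fn):
    """Return sensitivity (recall), precision (PPV), and F1-score."""
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, precision, f1

# Hypothetical lesion and prediction sharing exactly two voxels.
lesion = {(1, 1, 1), (1, 1, 2), (1, 2, 1)}
prediction = {(1, 1, 1), (1, 1, 2), (9, 9, 9)}
print(is_detected(lesion, prediction))  # True

# Hypothetical counts: 9 detected BMs, 3 FP findings, 1 missed BM.
sensitivity, precision, f1 = detection_metrics(tp=9, fp=3, fn=1)
print(round(sensitivity, 3), round(precision, 3), round(f1, 3))  # 0.9 0.75 0.818

# FP findings per scan, using the control-set figures of this study.
fp_per_scan = 27 / 15
print(round(fp_per_scan, 1))  # 1.8
```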
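For reference, the DSC between two segmentations A and B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it on sets of voxel coordinates and adds a plain implementation of Pearson's r for paired volumes. The example segmentations and volumes are hypothetical, and the convention of returning 1.0 for two empty masks is an assumption.

```python
import math

def dice(a, b):
    """DSC = 2|A ∩ B| / (|A| + |B|) for two sets of voxel coordinates."""
    if not a and not b:
        return 1.0  # assumed convention for two empty masks
    return 2 * len(a & b) / (len(a) + len(b))

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired measurements."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical manual and automated segmentations of one lesion.
manual = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
automated = {(0, 0, 1), (0, 1, 1), (0, 1, 2)}
dsc = dice(manual, automated)
print(round(dsc, 2))  # 0.57

# Hypothetical, perfectly proportional lesion volumes (cm^3).
r = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```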
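The quantities shown in a Bland-Altman plot (mean difference and 95% limits of agreement) can be computed as in the following sketch. The sign convention (reference standard minus DLM) and the example volumes are assumptions for illustration only and are not taken from the study data.

```python
import statistics

def bland_altman(reference, automated):
    """Return the mean difference (bias) and the 95% limits of agreement
    for paired measurements, computed as reference minus automated."""
    diffs = [r - a for r, a in zip(reference, automated)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical per-lesion volumes in cm^3.
rs_volumes = [1.0, 2.0, 3.0]
dlm_volumes = [0.8, 1.9, 2.7]
bias, lower, upper = bland_altman(rs_volumes, dlm_volumes)
print(round(bias, 2))  # 0.2
```

A positive bias under this convention means that the automated volumes are, on average, smaller than the manual ones.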
Figure 5 shows a box plot comparing the DSC as well as the number of FP and false negative findings for the five DLMs using the 5-fold cross-validation and the combined DLM employing the majority-voting scheme.

Evaluation of the DLM on the Independent Control Set
On the control set, the fused DLM showed 27 FP findings, which translate to 1.8 FPs per scan. Illustrative examples of FP lesions by the DLM are given in Fig. 6. FP findings were, as in the test set, mostly related to other contrast-enhancing intracranial structures, eg, the choroid plexus and blood vessels as well as to artifacts at the skull base.

Discussion
In this study, we trained a DLM for automated detection and segmentation of BMs from NSCLC and validated its performance on independent test and control sets. Despite the small size and inconsistent appearance of the BMs, the trained DLM achieved a high detection sensitivity and a good spatial overlap with manual segmentations on heterogeneous scanner data while obtaining a low number of FPs/scan in BM positive and negative cases. Following classical machine learning, different DL-based techniques were recently proposed for fully automated detection of BMs, which included CNNs (DeepMedic, 24,32 U-Net, 25,28 GoogLeNet, 26,29 CropNet, 27 AlexNet, 29 and Faster region-based 31 ) and single-shot detector algorithms. 30[27][29][30][31] Nevertheless, BMs comprise the most frequent malignant brain tumor. 35 Regarding their appearance on MRI, BMs substantially differ between the individual primary tumors and within the same tumor entity (especially in NSCLC 7 ). When applying a recently proposed DeepMedic DLM, which was trained on BMs in malignant melanoma (MM 32 ), to the BMs of NSCLC of training and test sets of the present study, a noticeably decreased detection and segmentation performance was observed. More details are provided in the Supplementary Information. In the current study, in which the DeepMedic architecture received a dedicated training on BMs from NSCLC rather than from MM, the sensitivity and median DSC were higher at a similar FP rate. These differences are most likely due to the distinct appearance of BMs in MM (often showing hyperintense signal on unenhanced T1-weighted sequences and a strong homogeneous contrast enhancement) compared with NSCLC (often necrotic and of various enhancement patterns). 7,36 Consequently, BMs from different primary tumors should rather be considered as individual entities and therefore, in order to obtain a sufficient detection and segmentation performance, dedicated DLMs are required (eg, for NSCLC or MM).
In the present study, the DLM provided a high detection sensitivity (85%) on an independent test set, which was similar (eg, Bousabarah et al: 77-82%, 25 Zhou et al: 81%, 30 Grøvik et al: 83%, 26 Dikici et al: 90%, 27 and Charron et al: 93% 24 ) or superior (eg, Deike-Hofmann et al, who investigated the detection of BMs in MM: 73% 28 ) to previously reported results. Of note, the average size of the BMs (0.96 cm³) was comparable to (eg, Bousabarah et al: 1.3-1.9 cm³ 25 ) or smaller than in recent studies (eg, Charron et al: 2.4 cm³ 24 ), which most likely explains the slightly higher sensitivity of the latter study. Furthermore, the lack of a dedicated test set in the studies by Dikici et al 27 and Grøvik et al 26 needs to be considered when analyzing the reported sensitivities. Interestingly, the DLM of the present study showed a higher detection sensitivity on MRI datasets acquired at our institution, which might in part be explained by the standardized imaging protocol for in-house examinations and in turn a potentially lower image quality at referring sites. Furthermore, the DLM obtained a higher sensitivity for parenchymal and supratentorial BMs compared with pachymeningeal and infratentorial localizations where automated detection is impaired by the high signal intensity of the dura mater and the presence of artifacts at the skull base, respectively.
The high number of FP findings that accompanies a sufficient detection of lesions represents a major drawback in automated segmentation of BMs and has been reported to be as high as 6 (Zhou et al 30 ), 8 (Charron et al, 24 Grøvik et al 26 ), 9 (Dikici et al 27 ), 5-13 (Deike-Hofmann et al 28 ), or 20 (Zhang et al 31 ) per scan. In contrast, 1.5 FPs/scan in the test set were observed in the present study by combining the results from the five trained DLMs using the majority-voting scheme. Further, when setting a threshold for a lesion size of 0.15 cm³ (which translates to a diameter of 6.6 mm), the sensitivity improved to 100% while FP findings were reduced to 0.1 per scan. Zhou et al reported a comparable sensitivity of 98% for BMs larger than 6 mm, whereas the number of FPs was higher (3-4 per scan). 30 Additionally, the DLM showed a low number of FPs/scan in the control set without BMs, therefore emphasizing the model's utility for general diagnostic usage as a screening tool. Of note, the FP findings were associated with obvious contrast-enhancing structures that are generally straightforward to identify by an experienced radiologist.
During the past decades, the incidence of NSCLC has steadily increased. 37 At the same time, the survival rates have improved due to advances in anticancer treatments, 13 resulting in increased numbers of primary and follow-up imaging for assessment of BMs. The evaluation of these scans is tiresome and bears an inherent risk of missed diagnosis, in particular for subtle lesions. 6,8,9 In this context, the proposed DLM can assist physicians through automated detection of BMs, reducing workload and counteracting human errors that result from fatigue and the satisfaction of search effect. 89][30] In contrast, the DLM of the present study provides 3D voxel-wise segmentations of BMs with a performance (DSC of 0.72) that is in line with the current literature (0.67-0.79 25,26 ) and comparable to DL-based segmentation of larger malignant tumors, eg, glioblastoma (0.62-0.86 21 ) or primary central nervous system lymphoma (0.73-0.76 20 ). In this context, automated segmentation of BMs might be useful for lesion contouring in stereotactic radiosurgery, leading to a reduction of time effort as well as of intra- and inter-rater variabilities. 17,18 Further, the automated segmentations may facilitate the accurate assessment of tumor burden by providing volumetric data of BMs, which are superior to conventional linear measurements of lesions since these are not entirely spherical in shape. 38 Thereby, automated segmentations have the potential to facilitate treatment decisions and to improve the outcome of the patient.
Automated detection and segmentation of BMs in NSCLC pose the following challenges: (I) multifocality, (II) heterogeneity of BMs in terms of size and appearance due to the underlying mutation of the primary tumor, its stage, previously administered treatments, and (III) inhomogeneous imaging data. 7,15,19,26 In this context, the majority of recent studies investigating deep learning-based detection of BMs used homogeneous imaging data, mostly a standardized protocol consisting of a distinct 3D T1CE sequence at a single institution for planning of radiosurgery, 25,29-31 which limits their generalizability and questions the usefulness in clinical routine. In contrast, the DLM applied in the present study provided a high detection sensitivity on heterogeneous "real-life" imaging data acquired on scanners from different vendors, generations, and study centers with resulting divergent scan parameters and unstandardized application of contrast media.
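The conversion of the 0.15 cm³ threshold into a diameter of 6.6 mm follows from the volume of a sphere, V = (π/6)·d³, under the simplifying assumption of a spherical lesion. A minimal check (the function name is chosen for this illustration):

```python
import math

def equivalent_sphere_diameter_mm(volume_cm3):
    """Diameter (mm) of a sphere with the given volume (cm^3): d = (6V/pi)^(1/3)."""
    volume_mm3 = volume_cm3 * 1000.0  # 1 cm^3 = 1000 mm^3
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)

diameter = equivalent_sphere_diameter_mm(0.15)
print(round(diameter, 1))  # 6.6
```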
The findings of this study indicate that the application of the DeepMedic network, which was originally developed for other neuro-oncological entities (gliomas), 21,22 may prove to be successful on BMs, but preceding training for the individual primary tumor is mandatory.
In the past, the combined approach of radiomics and machine learning based on manual segmentations of BMs in NSCLC has shown potential for the evaluation of the epidermal growth factor receptor status of the primary tumor, 39 which may provide a clinical benefit since the application of targeted therapy in these patients has shown promising results. 13 However, the manual segmentation of tumor tissue for radiomics analysis in aforementioned studies suffers from a low reproducibility as well as inter- and intra-rater variabilities. 15,16 In this context, feature extraction from automated segmentations of BMs might lead to an increased robustness of tumor characterization and should be investigated further.

Limitations
Given its retrospective design, an evaluation of whether detection and segmentation performances are sufficient for clinical needs was not possible. Additionally, this study neither investigated the potential long-term effects of detection by human vs. human and artificial intelligence nor whether the DLM results influence the treatment of patients. However, a short processing time in combination with a high sensitivity and a low number of FP detections may facilitate clinical implementation and the conduct of prospective studies assessing the direct influence on patient care. It should also be noted that while this study included imaging data from referring institutions, it does not amount to a true multicenter approach. Besides, after excluding 8% of scans for the abovementioned criteria (eg, insufficient administration of contrast agent), the true number of patients not suitable for the application of the DLM may be underestimated. Additionally, the small number of patients and BMs in the test set limits the generalizability of the presented results, including the performed subanalysis of BM detection. Further, in this study, only MRI data at primary diagnosis of the BMs were included with unknown performance of the DLM after therapy. However, the DeepMedic architecture has already shown its potential for application in longitudinal tumor imaging in primary central nervous system lymphoma. 20 Finally, compared with other deep learning based approaches to automatically detect BMs, 25,30 the DeepMedic network requires multiparametric MRI datasets (FLAIR, T1-/T2-weighted, and T1CE). Nevertheless, these sequences are part of clinical MRI tumor assessment across most institutions.

Conclusion
Despite their small size, the proposed DLM detects BMs in NSCLC with high detection sensitivity at a low false positive rate while obtaining a good segmentation performance.
support, Philips Healthcare. The remaining authors disclose no relevant relationships.

FIGURE 1: Examples of true positive findings of the DLM in a 59-year-old female patient with NSCLC and multiple (n = 5) supra- and infratentorial brain metastases. Despite the central necrosis of the lesions, the DLM (yellow) accurately detects the metastases (arrowheads) of the right temporal (a, volume based on manual segmentations: 0.90 cm³) and left parietal (b, volume: 1.16 cm³) lobes and provides segmentations comparable to the manual annotations (red; DSCs of 0.91 and 0.84).

FIGURE 2: Examples of true positive findings of the DLM in a 55-year-old female patient with NSCLC and multiple (n = 10) supra- and infratentorial brain metastases. Despite the small lesion size, the DLM (yellow) detects and segments the metastases (arrowheads) of the left temporal (a, volume: 0.18 cm³) and left cerebral (b, volume: 0.02 cm³) lobes as well as of the vermis of the cerebellum (c, volume: 0.02 cm³) accurately (DSCs of 0.81, 0.81, and 0.84). Red = manual segmentations.

FIGURE 3: Examples of false negative (a-e) and false positive (f-h) findings of the DLM (arrowheads) in the test set. The DLM missed brain metastases of a small volume (0.01-0.07 cm³) and of low contrast enhancement. False positive lesions (yellow) were mostly associated with variations in brain tissue contrast (f) and the choroid plexus (g and h).

Table 7. Detection Performance of the Fused DLM in the Test Set Stratified by MRI Specifics and Localization of Brain Metastases (BMs)

FIGURE 5: Performance of the five different deep learning models (DLMs) resulting from the 5-fold cross-validation training and the combined DLM employing the majority-voting scheme on the independent test cohort. Purple circles indicate the number of false positives (FP) and blue circles represent the number of false negative (FN) findings. DSC: dice similarity coefficient; CV: cross-validation; MV: majority voting.

FIGURE 4: Volumetric correlation between manual and automated segmentations of brain metastases using Pearson correlation (r) on a lesion level (a). The shaded area displays the 95% confidence interval of the fitted line. y = regression equation. Volumetric agreement between manual and automated segmentations for brain metastases using Bland-Altman analysis on a lesion level (b). The middle solid line displays the mean difference between the segmented volumes, whereas the dotted lines represent the 95% limits of agreement. RS: reference standard; DLM: deep learning model.

FIGURE 6: Examples of false positive findings of the DLM (arrowheads) in the control set (a-d). False positive lesions (yellow) were mostly associated with the choroid plexus (a and b), skull base artifacts (c), and vascular structures (d, right transverse sinus).

Table 1. MRI Scanner Models (Manufactured by Philips Healthcare, Best, The Netherlands, GE Healthcare, Chicago, IL, USA, and Siemens Healthineers, Erlangen, Germany), Field Strength, Examination Location, and 3D/2D Acquisition of T1-Weighted Contrast-Enhanced (CE) and FLAIR (Fluid-Attenuated Inversion Recovery) Sequences in Training, Test, and Control Sets

Table 2. Continued
SD: standard deviation; CE: contrast-enhanced; FLAIR: fluid-attenuated inversion recovery; IR: inversion recovery.
*The repetition time (TR) has been extracted as listed in the respective Digital Imaging and Communications in Medicine (DICOM) tag. Since MRI datasets from different vendors have been included in this work (with different definitions of the TR), there is a broad range of TRs among examinations.

Table 3. MRI Scan Parameters and Sequence Names for Each Field Strength in the Test Set

Table 4. MRI Scan Parameters and Sequence Names for Each Field Strength in the Control Set

Results

Patient Characteristics
Three hundred fifteen BMs were segmented on ISD and comprised the RS. Thirty-one patients presented with a single BM. The highest number of BMs per patient was n = 29 in the training and n = 15 in the test set.
3, 10/90 percentiles: 0.01-0.14 cm³), whereas the detected BMs presented a volume of 0.96 ± 2.4 cm³ (0.15 cm³, 0.02-3.20 cm³) (P < 0.05). The individual DLMs showed 3.3-5.7 FPs per scan. By employing the majority voting scheme, the fusion of the five DLMs reduced the FP findings to 1.5 per scan. Consequently, fusion of the DLMs increased the precision (68.7%) and the F1-score (0.76) compared with the individual DLMs (39.0-50.8% and 0.55-0.64, respectively). Illustrative examples of detected and missed BMs as well as FP lesions by the DLM are given in Figs. 1-3.

Table 5. Patient Demographics As Well As Treatment, Metastatic Pattern, Number, and Volume of Brain Metastases (BMs) in Training and Test Sets in Absolute and Relative Values
SD: standard deviation; WBRT: whole brain radiotherapy.

Table 6. Detection and Segmentation Performance of the Five Individual Deep Learning Models After 5-Fold Cross-Validation and Their Fusion Applying the Majority Voting (MV) Scheme (in Bold) on the Independent Test Cohort