MRI Follow‐up of Astrocytoma: Automated Coregistration and Color‐Coding of FLAIR Sequences Improves Diagnostic Accuracy With Comparable Reading Time

MRI follow‐up is widely used for longitudinal assessment of astrocytoma, yet reading can be tedious and error‐prone, in particular when changes are subtle.

Astrocytoma is the most common histologic subtype of glioma 1 and can be subdivided into circumscribed or diffuse astrocytic tumors on the one hand and low- or high-grade astrocytoma on the other. According to the 2007 WHO classification, low-grade astrocytomas predominantly arise in younger adults and show a rather slow growth rate, 2 while high-grade astrocytomas primarily occur in middle-aged individuals and usually grow more rapidly. 3 Subsequently, it was demonstrated that isocitrate dehydrogenase (IDH) mutational status is the most important determinant of prognosis. 4,5 Magnetic resonance imaging (MRI) is the modality of choice for imaging of brain tumors 6 and for follow-up of patients in the pre- and posttherapeutic stage. 7 For the latter, contrast-enhanced T1-weighted images are key, as higher tumor grades are strongly associated with gadolinium contrast enhancement. 8 However, contrast enhancement may also be absent in high-grade glioma, ie, following antiangiogenic treatment, and is uncommon in low-grade tumors. 8,9 Therefore, fluid attenuated inversion recovery (FLAIR) sequences are integral for MRI follow-up of astrocytoma: according to the Response Assessment in Neuro-Oncology (RANO) criteria, a significant increase in FLAIR hyperintensity warrants determination of disease progression for high-grade glioma. For low-grade glioma, FLAIR is the key sequence to determine disease state, as these tumors rarely show contrast enhancement. 10,11
Assessing repetitive MRI examinations, however, is time-consuming, and comparability may be hampered by discrepant acquisition protocols or the use of heterogeneous MRI scanners. Thus, techniques for subtraction and automated coregistration of such repetitive follow-ups have been developed and reported to be useful, particularly for follow-up of multiple sclerosis (MS) with high lesion burden. 12,13 It was reported that with the use of these tools, diagnostic accuracy could be increased, while the required reading time was reduced. 14 However, for follow-up imaging of heteromorphic brain tumors such as astrocytoma, for which longitudinal evaluation of disease is paramount, these tools have not yet been systematically investigated. Our hypothesis was that an automated coregistration approach could yield advantages for astrocytoma follow-up comparable to those previously shown for MS patients. 12,13 Hence, the study purpose was to compare diagnostic accuracy, required reading time, and diagnostic certainty between longitudinal assessment of FLAIR sequences with and without automated coregistration in patients with low- or high-grade astrocytoma.

Patient Acquisition
The Institutional Review Board approved this study and waived informed consent. Eligible patients were retrospectively identified with a combined database query to the radiological information and picture archiving and communication systems (RIS/PACS).
Included were patients with supratentorial astrocytoma, confirmed by an unequivocal neuropathologic report as low- or high-grade astrocytoma, with either stable disease or progressive disease according to consensus reading and ground truth correlation. Moreover, these patients needed to have undergone two cerebral MRI scans between 01/01/2010 and 10/30/2018 accessible through PACS, either performed in our institution or provided from referring institutions. Patients who were under 18 years, who did not have a FLAIR sequence available in one of the scans, or whose follow-up MRI had been performed less than 12 weeks after irradiation/surgery were excluded, the latter to minimize the risk of including cases with pseudoprogression. 10 A detailed patient acquisition workflow is shown in Fig. 1.

Image Acquisition
The provided MRI data were obtained from different vendors, scanner types/generations, and included different field strengths (1.0T/1.5T/3.0T). Table 1 provides an overview of the detailed scan parameters. In total, 94 datasets were included, of which 87 were acquired in our institution and 7 in referring institutions.

Ground Truth Annotation
Ground truth (ie, disease progression with increasing FLAIR signal hyperintensity or stable disease with constant FLAIR signal hyperintensity) was determined in consensus by two radiologists, one of whom was a senior, board-certified neuroradiologist (S.L., 3 years of experience, J.B., 11 years of experience). The annotation was primarily carried out in PACS and substantiated by information from the automated coregistration (AC) software. To ensure consistency between this imaging-based diagnosis and actual disease status, all clinical and additional imaging data were accessible. If a case was determined as progression, this progression had to be confirmed by either a further, disease-related increase in FLAIR hyperintensity in subsequent follow-up MRI or neuropathologic confirmation of disease progression after subsequent biopsy/resection.

Subgroup Analysis
To facilitate subgroup analysis of marginal and unequivocal progression of nonenhancing tumor burden, FLAIR signal hyperintensity was measured bidirectionally and the product of these bidirectional measurements was calculated as suggested for evaluation according to the RANO criteria. Marginal and unequivocal disease progressions were defined as the lower and upper quartiles, respectively, of the proportional change of the respective products between follow-ups. Additionally, disease state according to the RANO criteria was recorded: for low-grade astrocytomas, the diagnosis was based on the calculated change of the product of bidirectional measurements of FLAIR hyperintensity between the two examinations, 15 whereas for high-grade astrocytomas disease state was determined visually. 10 Figure 2 shows examples of stable and progressive disease as determined by the RANO criteria.
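The measurement described above can be illustrated with a minimal sketch. It assumes two perpendicular diameters per examination; the function names, the example diameters, and the quartile interpolation method are illustrative assumptions and are not taken from the study. The 25% cutoff is the RANO threshold for low-grade glioma cited in the Discussion.

```python
import statistics

# RANO cutoff for low-grade glioma as cited in the Discussion (>= 25% increase).
RANO_LGG_THRESHOLD = 0.25


def bidirectional_product(longest_mm, perpendicular_mm):
    """Product of the two perpendicular diameters of the FLAIR hyperintensity."""
    return longest_mm * perpendicular_mm


def proportional_change(baseline_product, followup_product):
    """Relative change of the bidirectional product between two examinations."""
    if baseline_product <= 0:
        raise ValueError("baseline product must be positive")
    return (followup_product - baseline_product) / baseline_product


def quartile_cutoffs(changes):
    """Lower and upper quartile of the observed proportional changes.

    The interpolation method ("inclusive") is an assumption; the study does
    not state how the quartiles were computed.
    """
    q1, _median, q3 = statistics.quantiles(changes, n=4, method="inclusive")
    return q1, q3


# Hypothetical example: 40 x 30 mm at baseline, 44 x 33 mm at follow-up.
change = proportional_change(
    bidirectional_product(40, 30), bidirectional_product(44, 33)
)
meets_rano_lgg = change >= RANO_LGG_THRESHOLD
```

In this hypothetical case the product grows by 21%, which stays below the 25% cutoff even though both diameters increased; this is the kind of marginal change the subgroup analysis is meant to isolate.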

Subjective Assessment
Three radiologists with different levels of expertise in neuroimaging (3, 6, and 13 years of experience) who were blinded with regard to patient characteristics, examination date, tumor grade, and any other clinical data independently assessed the included 47 MRI study pairs in a two-step procedure: First, the images were evaluated using the conventional reading (CR) approach. Second, after a time interval of 6 weeks that was set to reduce recall bias, the images were assessed again in the AC session using a different, randomized order. In both sessions the subjective assessors determined disease status (progressive disease / stable disease) and indicated diagnostic certainty for each decision. For both reading sessions, a timekeeper recorded the time in seconds needed for the individual cases (from the point at which the images were completely loaded until both decisions had been reported). Additionally, loading times encountered for the AC software and the PACS were determined.

Conventional and Automated Coregistration Reading
For the CR approach, the local PACS software was used, which is the standard tool for follow-up MRI assessment in clinical routine at our institution (Impax EE R20, XVII SU1, Agfa Healthcare, Mortsel, Belgium). The axial FLAIR sequences for the different timepoints were arranged side-by-side in advance. The raters were free to manually link the images at corresponding slices and to adjust window settings, as in clinical practice. AC reading was carried out with a CE-certified and FDA-approved AC tool (LoBI, Philips Healthcare, Best, Netherlands) which is integrated in the vendor's image viewer (Intellispace Portal 11, Philips Healthcare). The AC tool comprises a multistep image processing pipeline to correct for differences between the two datasets to be compared: First, it executes a bias field correction, after which the bias field-corrected images undergo a rigid coregistration. Last, an intensity scaling is carried out to correct for possible differences in signal intensity that would impede depiction of actual, disease-related changes in signal hyper- or hypointensity. These changes are then color-coded so that a progression of FLAIR hyperintensity is indicated as red, while a regression is marked blue on the corresponding overlay map (Fig. 3).
After selection of the follow-up study pairs, the color-coded overlay map is presented to the radiologist in a side-by-side setup along with the two original gray-scaled images so that each possible change in signal that is depicted on the overlay image can be correlated with the original images on the same anatomical level.
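The last two steps of such a pipeline (intensity scaling and color-coding of signal changes) can be sketched as follows. This is a toy illustration only, not the vendor's algorithm: it assumes the two slices are already bias field-corrected and rigidly coregistered, it uses a simple global mean-ratio scaling, and the function names and the `threshold` parameter are hypothetical.

```python
def scale_intensity(baseline, followup):
    """Rescale the follow-up slice so its mean intensity matches the baseline.

    Global mean matching is an assumed, deliberately simple scaling rule.
    """
    n = sum(len(row) for row in baseline)
    mean_base = sum(sum(row) for row in baseline) / n
    mean_follow = sum(sum(row) for row in followup) / n
    factor = mean_base / mean_follow
    return [[value * factor for value in row] for row in followup]


def color_code(baseline, followup, threshold):
    """Label each pixel 'red' (signal increase), 'blue' (decrease), or None.

    `threshold` is a hypothetical noise cutoff in baseline intensity units.
    """
    scaled = scale_intensity(baseline, followup)
    overlay = []
    for base_row, follow_row in zip(baseline, scaled):
        row = []
        for base, follow in zip(base_row, follow_row):
            diff = follow - base
            if diff > threshold:
                row.append("red")      # progression of hyperintensity
            elif diff < -threshold:
                row.append("blue")     # regression of hyperintensity
            else:
                row.append(None)       # no relevant change
        overlay.append(row)
    return overlay


# Hypothetical 2 x 3 slices; the follow-up was acquired at half the gain.
baseline = [[100, 100, 100], [100, 100, 100]]
followup = [[50, 50, 50], [50, 80, 20]]
overlay = color_code(baseline, followup, threshold=30)
```

In this example the global scaling removes the gain difference, so only the two pixels with a genuine relative change are marked on the overlay.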

Statistical Analysis
Statistical evaluations were performed with JMP software (v. 14, SAS Institute, Cary, NC) and Microsoft Excel (Microsoft, Redmond, WA). The Wilcoxon test was used to test for differences in reading time and diagnostic certainty. The McNemar mid-p test was used to test for differences in diagnostic accuracy, sensitivity, and specificity. 16 Statistical significance was defined as P < 0.05. Agreement among the three subjective readers was evaluated using the intraclass correlation coefficient (ICC) and interpreted as suggested earlier. 17
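For readers unfamiliar with the McNemar mid-p test, the following minimal sketch shows a commonly used formulation for paired binary ratings. It is not the JMP or Excel procedure used in the study; the function name and the example counts are hypothetical. Here `b` and `c` denote the discordant pairs (cases rated correctly with only one of the two reading approaches).

```python
from math import comb


def mcnemar_mid_p(b, c):
    """Two-sided McNemar mid-p value from the two discordant-pair counts.

    Under the null hypothesis, the smaller count follows Binomial(b + c, 0.5).
    The mid-p value is twice the lower tail probability minus the point
    probability of the observed count.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)

    def pmf(x):
        return comb(n, x) / 2 ** n

    lower_tail = sum(pmf(x) for x in range(k + 1))
    return min(1.0, 2 * lower_tail - pmf(k))


# Hypothetical counts: 9 cases correct only with AC, 1 correct only with CR.
p_value = mcnemar_mid_p(1, 9)
```

Only the discordant pairs enter the test; cases that both approaches rated correctly or both rated incorrectly carry no information about a difference between them.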

Patients
In total, 41 patients were included, of whom 21 were men and 20 were women. Thirty-five patients contributed one study pair each, whereas six contributed two study pairs each, resulting in 47 study pairs that were finally included. The mean age was 44.5 ± 15.5 years. Of the 41 patients included, 25 were diagnosed with high-grade astrocytoma and 16 with low-grade astrocytoma. In 24 of the 47 study pairs, a progressive FLAIR signal hyperintensity was diagnosed in CR, while in 23 follow-ups the FLAIR signal hyperintensity was determined to be stable. In four cases of low-grade astrocytoma that were determined as "stable" according to the RANO criteria, a progressive FLAIR signal hyperintensity was found in CR that was associated with a true tumor progression following ground truth correlation. Table 2 gives an overview of underlying diseases.

Subjective Assessment
Overall sensitivity and diagnostic accuracy for progression of FLAIR signal hyperintensity were significantly higher in the reading supported by AC than in the CR approach (sensitivity/accuracy: 0.86/0.84 vs. 0.75/0.73; P < 0.05; Fig. 4). Specificity was markedly higher for AC (0.83) than for CR (0.71) as well, yet without statistical significance (P = 0.12). Sensitivity for marginal progressive diseases significantly increased from 0.58 in CR to 0.78 in the AC reading (P < 0.05). In contrast, sensitivity for unequivocal progressive diseases was on a similarly high level in both reading approaches (0.92 vs. 0.94; see also Fig. 5). Diagnostic accuracy found in the AC reading for follow-up pairs with a slice thickness difference greater than 2 mm was 0.79. Overall interobserver agreement regarding detection of disease progression was poor (ICC = 0.32) in the conventional reading and markedly improved in the AC approach, where it was good (ICC = 0.72).
Diagnostic certainty for all cases was higher in AC reading (5 (4-5)) than in CR (4 (3-5); P < 0.05). For the subgroup of unequivocal progressive cases, overall certainty was very high, so that no significant difference between both reading approaches could be observed (both 5 (5-5), P = 1.0). In the subgroup of marginally progressive cases, there was a tendency towards improved certainty scores in the AC reading (AC: 4 (3.25-5) vs. CR: 4 (3-5), see Fig. 6) which, however, was not statistically significant (P = 0.7). Diagnostic accuracy at the AC reading found in study pairs comprising scans with a slice thickness difference greater than 2 mm (n = 13) as well as different scanners/field strengths (n = 11) was on a comparable level as overall diagnostic accuracy found for all cases (0.79 and 0.85, respectively, vs. 0.84 in all study pairs; see also Table 3).
Averaged over all cases, reading time for the conventional approach was 33.5 ± 18.9 sec, which was significantly longer than the average of 21.6 ± 16.8 sec required in the AC reading (P < 0.001; Fig. 7). For cases with marginal disease progression, reading time required for conventional assessment was highest with 38.5 ± 16.7 sec, which was reduced to 26.2 ± 18.8 sec in the reading supported by the AC software (P < 0.05). However, these time savings were neutralized by the longer average loading time of 16

Discussion
In the era of precision oncology, it is an ongoing challenge for diagnostic radiologists to provide a high level of accuracy with the limited time available due to steadily increasing exam numbers. 18 Among others, MRI follow-up of tumors of the central nervous system has become increasingly important to provide reliable endpoints for both existing and future therapies. 19,20 Regarding longitudinal assessment of patients with astrocytoma, a frequent brain tumor of high clinical significance, 1 assessment of FLAIR signal hyperintensity is important for follow-up of low- and high-grade tumors, since progression of nonenhancing tumor tissue is known to occur earlier than clinical progression. 10,21 Changes in nonenhancing tumor burden as depicted in FLAIR or T2-weighted images can be subtle, making precise assessment of the corresponding follow-up MRI exams difficult and time-consuming. Hence, in this study we aimed to investigate whether automated, color-coded coregistration (AC) of FLAIR sequences could improve diagnostic accuracy in the assessment of astrocytoma and to determine its influence on reading time and diagnostic certainty. AC of astrocytoma follow-up study pairs allowed readers to improve their sensitivity and diagnostic accuracy for disease progression as indicated by increasing FLAIR signal hyperintensity compared to the conventional reading. Notably, interreader agreement increased from a rather poor level for the conventional reading approach to a good one in the AC reading, which supports the improved diagnostic performance and would be beneficial in clinical application. This study comprised examinations from different MRI scanners and with different slice thicknesses, which can result in partial volume effects that may hamper image postprocessing methods such as the one investigated in this study.
However, we found that both for study pairs with a slice thickness difference greater than 2 mm (n = 13) and with different field strengths (1.5T/3T, n = 11), the diagnostic accuracy in the AC reading was on a similar level as when averaged over all examinations (0.79 and 0.85, respectively, vs. 0.84 for all study pairs). This indicates that coregistration and resampling still work acceptably in these cases, which would facilitate clinical application, in which comparing scans from different scanners is a commonly encountered scenario. Yet the influence of differences in scan parameters on coregistration quality should be investigated in subsequent studies with a larger cohort to obtain more robust data in this regard.
Automated coregistration tools for follow-up MRI have been investigated before, yet the main focus of previous studies was set on the longitudinal assessment of multiple sclerosis (MS) lesions. 12-14,22 In contrast, studies on the application of such tools in the field of neuro-oncology are lacking. The initial data we present on the longitudinal assessment of astrocytoma using AC software indicate advantages comparable to those revealed by the corresponding studies on MS follow-up, in which an increased accuracy for disease progression was reported. However, in contrast to a previous study on MS follow-up by Galletto Pregliasco et al, in which the reading time was reported to be significantly reduced, we found a comparable reading time between the AC and conventional reading approach. 12 While this might be due to software-related differences, Zopfs et al likewise reported a significant reduction of reading time in MS follow-up applying the same AC software as was used in our study. 14 Hence, it seems warranted to conclude that the potential time savings provided by AC are greater for numerous lesions, as is prevalent in MS follow-up, than for longitudinal assessment of single, heteromorphic tumor structures, as investigated in our study.
Figure 4: Overall sensitivity, specificity, and diagnostic accuracy for progression of FLAIR hyperintensity attained by the three readers in the assessment of supratentorial astrocytoma were significantly higher in the automated coregistration reading (orange bars) than in the conventional reading approach (blue bars). Asterisks indicate statistical significance.
According to the RANO criteria, for low-grade glioma a proportional increase in FLAIR signal hyperintensity has to be ≥25% to diagnose disease progression, while any significant increase in FLAIR signal hyperintensity warrants diagnosis of progression in high-grade glioma. These criteria are primarily meant to be applied in clinical trials and to prevent unwarranted exclusion or therapy discontinuations of patients with moderate FLAIR progressions, which might be due to prior radiation or antiangiogenic therapy. 9,23,24 However, in clinical routine, subtle changes in FLAIR signal hyperintensity may indicate disease progression and lead to changes of therapy or surveillance. Accordingly, in our study four cases of low-grade astrocytoma that were determined as "stable" according to the RANO criteria showed an increase in FLAIR signal hyperintensity that was associated with a true tumor progression. We found that sensitivity for these marginal disease progressions was significantly higher when using the AC software. Conversely, in the case of unequivocal disease progression, there was, as expected, no difference between the two reading approaches, yet a significantly higher overall assessment time for the AC reading. This shows that the particular strength of such AC tools lies in the depiction of rather subtle cases, while unequivocal cases can be detected in the PACS with similar accuracy in a shorter overall assessment time (Fig. 7). However, averaged over all cases, there was no significant difference in overall reading time between both approaches.
Figure 6: Pertaining to all cases, certainty was significantly higher in the automated coregistration reading than in the conventional reading approach, whereas for unequivocal and marginal disease progressions, no significant difference could be found, although there was a tendency towards higher certainty scores in the AC reading for the latter.

Limitations
Apart from its retrospective character, this study has limitations that need to be addressed. First, the sample size included in this study was relatively small, as rigid exclusion criteria were applied to warrant inclusion of true tumor progressions exclusively. While we did not find differences in scan parameters to have a negative impact on AC quality in our cohort, larger-scale studies in this regard should be encouraged. Second, despite the latency period of 6 weeks that was kept between the CR and AC reading, a certain recognition bias cannot entirely be excluded. Third, while FLAIR evaluation plays an important role in assessing brain tumors, our study did not investigate multiparametric image assessment in clinical routine; hence, the advantages shown for the follow-up assessment of FLAIR hyperintensity may not be applicable to other sequences of interest, such as contrast-enhanced image sequences. Fourth, our study only comprised adult patients, which limits the generalizability of our results to infants and children. Last, the technique we investigated does not address the issue of differentiation between true tumor progression and therapy-related changes in FLAIR signal associated with pseudoprogressions, for which more advanced image interpretation techniques such as radiomics may be advantageous. 25 However, clinical integration and validation of such methods can be considered more difficult than for the technique we investigated, which clearly improved visualization of subtle changes in FLAIR hyperintensity and may therefore be beneficial for detecting such changes in clinical routine.

Conclusion
This study showed that automated, color-coded coregistration facilitated longitudinal assessment of FLAIR sequences in astrocytoma patients, leading to a significantly higher sensitivity and diagnostic accuracy for progression of nonenhancing tumor with comparable reading time. Hence, especially in the light of markedly improved detection of marginal disease progressions, clinical implementation of such tools seems advantageous for longitudinal assessment of nonenhancing tumor burden of astrocytoma patients.
Figure 7: Time required to examine cases in the conventional reading (CR) as compared to the image reading with automated coregistration (AC) including the time to load (blue striped) and coregister (orange striped) MRI study pairs. Higher loading time encountered with the AC tool neutralized significant savings in reading time observed for all cases and marginal disease progressions, leading to comparable overall assessment times. For unequivocal disease progressions, overall assessment time was significantly higher when using the AC software, as the savings in reading time provided by the AC software were minimal.