Improvements in Between-Vendor MRI Harmonization of Renal T2 Mapping Using Stimulated Echo Compensation

Background: T2 mapping is valuable for evaluating pathophysiology in kidney disease. However, T2 relaxation time measurements may vary across MR scanners and vendors, requiring additional correction. Purpose: To harmonize renal T2 measurements between MR vendor platforms, using an extended-phase-graph (EPG)-based fitting method ("StimFit") to correct for stimulated echoes and reduce between-vendor variations. Study Type: Prospective. Subjects: Eight healthy "travelling" volunteers (37.5% female, 32 ± 6 years) imaged on four MRI systems from three vendors at four sites; 10 healthy volunteers (50% female, 32 ± 8 years) scanned multiple times on a given MR scanner for repeatability evaluation. The ISMRM/NIST system phantom was scanned to evaluate T2 accuracy. Field Strength/Sequence: 3T, multiecho spin-echo sequence. Assessment: T2 images were fit using conventional monoexponential fitting and "StimFit." Mean absolute percentage error (MAPE) of phantom measurements relative to reference T2 values. Average cortex and medulla T2 values compared between MR vendors, with masks obtained from T2-weighted images and T1 maps. Full-width-at-half-maximum (FWHM) of T2 distributions to evaluate local homogeneity of measurements. Statistical Tests: Coefficient of variation (CV), linear mixed-effects model, analysis of variance, Student's t-tests, Bland-Altman plots; P < 0.05 considered statistically significant.

MRI T2 mapping is sensitive to edematous changes and ischemia. 1 In the kidney, it has shown potential in the evaluation of autosomal dominant polycystic kidney disease, 2 renal cell carcinomas, 3 ischemia-reperfusion injury, 4,5 and renal transplants. 6,7 Mapping absolute T2 values can potentially enable a more objective study of disease-related changes over time than T2-weighted MRI. Although T2 is an inherent tissue property, quantitative assessment of tissue T2 relaxation time depends on various factors, including pulse-sequence type, radiofrequency (RF) pulse profile, acquisition parameters, MRI hardware capabilities, and subject-specific influences of coil loading and transmit/receive gain settings. 8 The accuracy and reproducibility of T2 measurements should therefore be investigated, particularly when combining data across MR vendors and platforms. 9
Quantitative T2 maps can be acquired using various pulse sequences, including multiecho spin echo (MESE), gradient and spin echo, 10 T2-prepared single-shot balanced steady-state free precession, 11 and driven equilibrium single-pulse observation of T2 sequences. 12 The MESE pulse sequence is widely used because it is commercially available from all MR vendors. 13 However, B1 field inhomogeneities, imperfect slice-selection pulse profiles, and transmit calibration errors cause the refocusing pulses to deviate from the nominal 180° flip angle. 8 The resulting stimulated and indirect echoes cause T2 values to be overestimated, 14 particularly in body imaging at 3T. This bias can vary between scanners with different hardware, RF pulse shapes, and protocol implementations, 15 which is problematic for multicenter clinical trials.
The effects of indirect echoes on multivendor and multicenter performance, and whether such biases can be corrected, remain unexplored.
The United Kingdom Renal Imaging Network-MRI acquisition and processing standardization (UKRIN-MAPS) project 21,22 was set up to develop harmonized renal MRI protocols across MR vendors, in line with the recent consensus guidelines on patient preparation, hardware, acquisition parameters (for T2 mapping: >5 echo times and a maximum echo time >120 msec at 3T), and data analysis. 13,23 A preliminary investigation found a large cross-vendor variation in renal T2 when using a monoexponential fit, despite a standardized MESE sequence with harmonized parameters across MR vendors. 15
This study aims to evaluate the consistency of renal T2 measurements obtained across 3T MR platforms from different vendors (GE, Philips, and Siemens) using an EPG-based fitting method.

MRI Data Acquisition
Experiments were performed at 3T on four MRI systems from three vendors (Discovery MR750, General Electric [GE] Healthcare, Waukesha, WI, USA; Ingenia, Philips Healthcare, Best, Netherlands; and two Siemens systems, Prisma and Skyra Fit, Siemens Healthcare, Erlangen, Germany) at four imaging sites. All scanners were equipped with a dual-channel transmit system, except for the GE Discovery MR750, which used a single-channel system.
A respiratory-triggered MESE sequence was harmonized across vendors as part of the UKRIN-MAPS renal MRI protocol. 21 Key parameters included a minimum repetition time (TR) of 3 sec; echo times (TE) of 12.9-129.0 msec in 12.9 msec steps; a nominal refocusing flip angle of 180°; field of view (FOV) = 38.4 cm; acquisition matrix = 128 × 128; five slices with thickness/gap = 4.5/1.0 mm; parallel imaging factor = 3; and acquisition time = 43 breaths. The T2 mapping acquisition took approximately 3 minutes, depending on breathing rate. The GE product MESE sequence was customized to allow controllable echo spacing. Detailed parameters for the UKRIN-MAPS and National Institute of Standards and Technology (NIST) reference protocols are shown in Table 1.
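As a quick consistency check, the echo train described above can be generated and tested against the 3T consensus criteria quoted in the introduction (>5 echo times, maximum TE >120 msec). This is a minimal sketch: the echo count of 10 is derived from the stated 12.9-129.0 msec range rather than quoted directly, and the helper name `meets_consensus_3t` is hypothetical.

```python
import numpy as np

ESP_MS = 12.9      # echo spacing (msec), from the UKRIN-MAPS protocol text
TE_MAX_MS = 129.0  # last echo time (msec), from the protocol text

# Derived, not quoted: 129.0 / 12.9 = 10 equally spaced echoes
n_echoes = int(round(TE_MAX_MS / ESP_MS))
te_ms = ESP_MS * np.arange(1, n_echoes + 1)


def meets_consensus_3t(te) -> bool:
    """Hypothetical check of the 3T consensus criteria quoted in the text:
    more than 5 echo times and a maximum echo time above 120 msec."""
    te = np.asarray(te, dtype=float)
    return bool(te.size > 5 and te.max() > 120.0)


print(n_echoes, te_ms.round(1))
print(meets_consensus_3t(te_ms))
```

Truncating the train shows how easily the criteria can be missed: the first nine echoes alone end at 116.1 msec and would fail the maximum-TE requirement.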

Phantom Experiments
The International Society for Magnetic Resonance in Medicine/National Institute of Standards and Technology (ISMRM/NIST) system phantom 29 was used to evaluate the accuracy of T2 measurements against the T2-array reference values provided by the manufacturer. The phantom was scanned three times on each scanner using the harmonized UKRIN-MAPS MESE protocol and the NIST reference protocol. Reference T2 values were temperature-corrected based on the recorded temperature using a linear regression model. 30 To evaluate the accuracy of T2 measurements, the mean absolute percentage error (MAPE) was calculated by comparing the mean T2 measurements over all pixels, T2(x,y), in the spheres and repeats against the reference values, T2ref:

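$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{\overline{T_{2,i}(x,y)} - T_{2,i}^{\mathrm{ref}}}{T_{2,i}^{\mathrm{ref}}}\right| \times 100\%$$

where N is the number of spheres and $\overline{T_{2,i}(x,y)}$ is the mean T2 over all pixels of sphere i and all repeats.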
The cumulative MAPE was calculated for the spheres with reference T2 values in the physiologically relevant range (45-286 msec).
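The MAPE definition and the in-range restriction can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the per-sphere means (over pixels and repeats) are assumed to have been computed already, the 45-286 msec window follows the physiologically relevant range given for Fig. 2, and the example values are invented for illustration, not study data.

```python
import numpy as np


def mape(measured_means, reference) -> float:
    """MAPE (%) of per-sphere mean T2 values against reference T2 values.

    measured_means[i] is assumed to be the mean T2 over all pixels of
    sphere i (and over repeats), computed beforehand from the T2 maps.
    """
    measured_means = np.asarray(measured_means, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(measured_means - reference) / reference) * 100.0)


def cumulative_mape(measured_means, reference, lo=45.0, hi=286.0) -> float:
    """MAPE restricted to spheres whose reference T2 lies within [lo, hi] msec."""
    measured_means = np.asarray(measured_means, dtype=float)
    reference = np.asarray(reference, dtype=float)
    in_range = (reference >= lo) & (reference <= hi)
    return mape(measured_means[in_range], reference[in_range])


# Hypothetical sphere values (msec), for illustration only
ref = np.array([20.0, 50.0, 100.0, 200.0, 400.0])
meas = np.array([30.0, 55.0, 110.0, 180.0, 500.0])
print(cumulative_mape(meas, ref))  # ≈ 10.0: each in-range sphere is off by 10%
```

Because each sphere's error is divided by its own reference value, short-T2 spheres are not drowned out by the large absolute errors of long-T2 spheres.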

In Vivo Experiments
Participants fasted for 2 hours before their scan session to limit dietary and hydration variability. As shown in Fig. 1, the in vivo experiments consisted of two studies: 1) a "Travelling Kidney study," in which volunteers travelled to and were scanned at different imaging sites to assess intervendor variation; and 2) a "Repeatability study," in which volunteers were scanned multiple times at a single site.
The "Travelling Kidney study" was performed on eight healthy volunteers (five males/three females; age 32 ± 6 years, mean ± SD), each of whom was scanned on all three vendors. For Siemens, the participants were scanned on either a Skyra Fit (five participants) or a Prisma scanner (three participants) at two different imaging sites.
In the "Repeatability study," 10 healthy volunteers (five males/five females; age 32 ± 8 years, mean ± SD; five from the "Travelling Kidney" study group) were repeatedly scanned on a given scanner over a period of 2-6 months. Four participants were scanned two times on the Philips scanner, two participants were scanned four times on the GE scanner, and two participants were scanned four times on the Siemens scanners at two sites.
In both studies, the harmonized UKRIN-MAPS protocol was used, including the MESE T2 mapping acquisitions, B1 mapping, T2-weighted imaging, and MOLLI T1 mapping, the results of which are presented here. Whole-kidney masks were automatically segmented from the T2-weighted images using a convolutional neural network 28 (https://github.com/alexdaniel654/Renal_Segmentor). An operator (HL) with 10 years of experience in MRI manually segmented the cortex and medulla on the T1 MOLLI maps using an interactive graphical interface developed in MATLAB (R2019a, MathWorks Inc., Natick, MA).
The whole-kidney, cortex, and medulla masks were applied to the T2 maps, with minor manual adjustments to correct for motion between acquisitions. This allowed evaluation of the mean T2 values and of the local homogeneity of the voxel T2 distribution, quantified by its full-width-at-half-maximum (FWHM).
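The text does not state how the FWHM was computed, so the following is only a minimal sketch of one common approach: histogram the T2 values inside a mask, then linearly interpolate the half-maximum crossings on either side of the peak. The function name `masked_t2_stats`, the default of 100 histogram bins, and the synthetic Gaussian test data are all assumptions made for illustration.

```python
import numpy as np


def masked_t2_stats(t2_map, mask, bins=100):
    """Mean and FWHM (msec) of the T2 values inside a binary ROI mask.

    The FWHM is taken from a histogram of the masked voxels, with linear
    interpolation of the half-maximum crossings on either side of the peak.
    The histogram binning is an assumption, not a study parameter.
    """
    values = np.asarray(t2_map, dtype=float)[np.asarray(mask, dtype=bool)]
    values = values[np.isfinite(values)]
    counts, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    peak = int(np.argmax(counts))
    half = counts[peak] / 2.0

    # Walk left from the peak to the last bin still at or above half maximum.
    i = peak
    while i > 0 and counts[i - 1] >= half:
        i -= 1
    left = centers[i] if i == 0 else np.interp(
        half, [counts[i - 1], counts[i]], [centers[i - 1], centers[i]])

    # Walk right from the peak in the same way.
    j = peak
    while j < len(counts) - 1 and counts[j + 1] >= half:
        j += 1
    right = centers[j] if j == len(counts) - 1 else np.interp(
        half, [counts[j + 1], counts[j]], [centers[j + 1], centers[j]])

    return float(values.mean()), float(right - left)


# Hypothetical example: synthetic Gaussian T2 values (mean 100 msec, SD 5 msec)
rng = np.random.default_rng(0)
t2 = rng.normal(100.0, 5.0, size=(400, 500))
roi = np.ones_like(t2, dtype=bool)
roi[:, :10] = False   # exclude a strip from the ROI...
t2[:, :10] = 1e4      # ...filled with outliers that must not affect the result
mean_t2, fwhm_t2 = masked_t2_stats(t2, roi)
print(mean_t2, fwhm_t2)  # FWHM should be close to 2*sqrt(2*ln 2)*5, about 11.8 msec
```

The walk-out-from-the-peak approach assumes a unimodal distribution; a narrower FWHM then indicates a more homogeneous T2 map within the region.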

Fitting with Stimulated Echo Compensation
The "StimFit Toolbox," 8,27 based on the EPG algorithm, was used to model stimulated-echo compensation in the phantom and in vivo datasets. The EPG algorithm provides a system of equations that simulates the response to RF pulses with arbitrary flip angles, including T1 and T2 relaxation effects. 14 Vendor-specific RF pulse shapes and the nominal spatial widths of the excitation and refocusing pulses were input to "StimFit" to calculate the flip angle distributions across the slice profile, so that the effect of imperfect RF slice profiles could be accounted for. Magnetization evolving in alternate coherence pathways was assumed to experience negligible T1 relaxation. 8 Furthermore, T2 and B1 were estimated by a nonlinear least-squares algorithm whose objective function is the aggregate decay curve integrated over the slice profile. 27 Because the spin-echo signal is symmetric for refocusing angles around 180°, "StimFit" did not allow an estimated relative B1 above unity (refocusing angle >180°), such that 0 ≤ B1 ≤ 1. For comparison, the vendor-specific acquired B1 maps were converted to the range [0, 1] as the converted B1, $cB_1 = 1 - |\mathrm{FA}_{\mathrm{nominal}} - \mathrm{FA}_{\mathrm{actual}}| / \mathrm{FA}_{\mathrm{nominal}}$.

Table 1 footnotes: (a) The GE NIST reference T2-mapping protocol was modified to a MESE sequence matching the Siemens timings, because the NIST-recommended protocol of three repeats of a 2D spin-echo sequence was found to be inaccurate and required a long scan duration (41.5 minutes). (b) The Philips NIST reference uses a 2D spin-echo T2-mapping protocol with a composite broadband refocusing pulse rather than the sinc-shaped slice-selective refocusing pulse used in the UKRIN-MAPS MESE protocol.
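To make the EPG idea in this section concrete, the sketch below simulates a CPMG echo train with a single, uniform refocusing flip angle and fits it with a log-linear monoexponential model, showing how flip angles below 180° inflate the apparent T2. It is a minimal illustration under stated assumptions, not the StimFit Toolbox: it ignores the slice profile and vendor-specific pulse shapes, all function names are hypothetical, T1 defaults to infinity (mirroring the negligible-T1 assumption for alternate pathways), and it follows a common EPG state-matrix convention with [F+, F-, Z] rows. The last helper implements the cB1 conversion defined above.

```python
import numpy as np


def rf_matrix(alpha, phase=0.0):
    """EPG mixing matrix for an RF pulse (flip angle and phase in rad),
    acting on the state rows [F+_k, F-_k, Z_k]."""
    c2, s2, s = np.cos(alpha / 2) ** 2, np.sin(alpha / 2) ** 2, np.sin(alpha)
    e = np.exp(1j * phase)
    return np.array([
        [c2, e**2 * s2, -1j * e * s],
        [np.conj(e) ** 2 * s2, c2, 1j * np.conj(e) * s],
        [-0.5j * np.conj(e) * s, 0.5j * e * s, np.cos(alpha)],
    ])


def relax(P, t, t1, t2):
    """T2 decay of F states and T1 decay/recovery of Z states over t (msec)."""
    e1, e2 = np.exp(-t / t1), np.exp(-t / t2)
    P = P.copy()
    P[:2] *= e2
    P[2] *= e1
    P[2, 0] += 1.0 - e1  # recovery toward M0 = 1 (zero when t1 is infinite)
    return P


def dephase(P):
    """Advance all F states by one dephasing order (one crusher gradient lobe)."""
    P = P.copy()
    P[0, 1:] = P[0, :-1].copy()   # F+_k -> F+_(k+1)
    P[1, :-1] = P[1, 1:].copy()   # F-_k -> F-_(k-1)
    P[1, -1] = 0.0
    P[0, 0] = np.conj(P[1, 0])    # F+_0 and F-_0 describe the same state
    return P


def cpmg_echoes(t2, esp, n_echoes, refoc_deg, t1=np.inf):
    """Echo amplitudes of a CPMG train with one uniform refocusing angle."""
    P = np.zeros((3, 2 * n_echoes + 2), dtype=complex)
    P[2, 0] = 1.0                                  # equilibrium Mz = M0 = 1
    P = rf_matrix(np.pi / 2, np.pi / 2) @ P        # 90° excitation: F+_0 = F-_0 = 1
    refoc = rf_matrix(np.deg2rad(refoc_deg), 0.0)  # refocusing 90° out of phase (CPMG)
    echoes = np.empty(n_echoes)
    for n in range(n_echoes):
        P = dephase(relax(P, esp / 2, t1, t2))
        P = refoc @ P
        P = dephase(relax(P, esp / 2, t1, t2))
        echoes[n] = P[0, 0].real                   # echo at TE = (n + 1) * esp
    return echoes


def monoexp_t2(te, signal):
    """Log-linear monoexponential fit; returns the apparent T2 (msec)."""
    slope, _ = np.polyfit(te, np.log(signal), 1)
    return -1.0 / slope


def converted_b1(fa_actual, fa_nominal=180.0):
    """cB1 = 1 - |FA_nominal - FA_actual| / FA_nominal (folds B1 > 1 onto [0, 1])."""
    return 1.0 - np.abs(fa_nominal - fa_actual) / fa_nominal


esp, n = 12.9, 10
te = esp * np.arange(1, n + 1)
for fa in (180, 150, 120):
    s = cpmg_echoes(t2=60.0, esp=esp, n_echoes=n, refoc_deg=fa)
    print(fa, round(monoexp_t2(te, s), 1))  # exceeds 60 msec when fa < 180
```

With `t1=np.inf`, refocusing angles of 150° and 210° produce identical echo trains in this sketch, which illustrates the symmetry that motivates restricting the estimated relative B1 to 0 ≤ B1 ≤ 1 and folding measured B1 maps with cB1.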

Statistical Analysis
Statistical analysis was performed in R (version 4.2.2; https://www.r-project.org/) with the packages "lme4" and "lmerTest." For the repeated phantom measurements, a random-intercept linear mixed-effects (LME) model was used to account for the data hierarchy. The data were entered as proportions of the temperature-corrected reference T2 values. A categorical variable describing the fit method was modeled as the fixed effect, and the intercept for each specimen as the random effect. P-values for the fixed effect were calculated using Satterthwaite's degrees of freedom method.
For in vivo measurements, a one-way analysis of variance (ANOVA) was performed to test for significant variations in T2 measurements between vendors, and intervendor and intravendor coefficients of variation (CVs) were calculated. Paired Student's t-tests were performed to compare mean T2, FWHM, and CVs between the monoexponential fit and "StimFit." Unpaired Student's t-tests were used to compare intervendor CVs (eight volunteers, each scanned on three MR vendors) with intravendor CVs (10 volunteers, each scanned multiple times on one MR vendor). Bland-Altman plots were generated to assess the agreement between each pair of vendors. P-values <0.05 were considered statistically significant for all analyses.
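The CV and Bland-Altman quantities above reduce to a few array operations. This is a minimal sketch, assuming CVs are computed per subject with the sample SD (ddof=1) and then compared or averaged, and that the Bland-Altman limits are bias ± 1.96 SD of the paired differences; the paper does not spell out these details, and the function names and example numbers are hypothetical.

```python
import numpy as np


def per_subject_cv(t2):
    """CV (%) per subject: rows are subjects, columns are vendors (intervendor CV)
    or repeated sessions on one scanner (intravendor CV). Uses the sample SD."""
    t2 = np.asarray(t2, dtype=float)
    return t2.std(axis=1, ddof=1) / t2.mean(axis=1) * 100.0


def bland_altman(a, b):
    """Bias and 95% limits of agreement (bias ± 1.96 SD of paired differences)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias, sd = diff.mean(), diff.std(ddof=1)
    return float(bias), float(bias - 1.96 * sd), float(bias + 1.96 * sd)


# Hypothetical cortex T2 values (msec): 3 subjects x 3 vendors, illustration only
t2_inter = np.array([[90.0, 100.0, 110.0],
                     [95.0, 95.0, 95.0],
                     [88.0, 92.0, 96.0]])
cv = per_subject_cv(t2_inter)
print(cv, cv.mean())
print(bland_altman(t2_inter[:, 0], t2_inter[:, 1]))  # vendor 1 vs vendor 2
```

In Python, the per-subject CVs could then be compared between fitting methods with a paired test (`scipy.stats.ttest_rel`) and between the intervendor and intravendor groups with an unpaired test (`scipy.stats.ttest_ind`), and vendors could be compared with `scipy.stats.f_oneway`; the study itself performed these analyses in R.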

Phantom Experiments
Figure 2 and Table 2 show the T2 measurements of the ISMRM/NIST system phantom across the different MR vendors and sites, together with the reference values, using the monoexponential fit and "StimFit" for both the UKRIN-MAPS and NIST reference protocols.
Compared with the monoexponential fit, "StimFit" reduced the MAPE of the UKRIN-MAPS MESE T2 measurements across the four sites (three vendors) from 4.9%, 9.1%, 24.4%, and 18.1% to 3.3%, 3.0%, 6.6%, and 4.1%, respectively. For the NIST reference protocol, "StimFit" reduced the MAPE for the GE and the two Siemens scanners from 4.5%, 10.8%, and 20.2% (exponential fit) to 3.0%, 2.9%, and 5.5%, respectively, whereas for the Philips NIST protocol, which uses composite pulses, the MAPE was 2.1% for the exponential fit compared with 4.9% for "StimFit."
Significant differences between T2 measurements and reference T2 values were found for all measurements using the monoexponential fit. In contrast, no significant difference was found between "StimFit" T2 measurements and reference T2 values (Siemens Skyra Fit: NIST protocol P = 0.34, UKRIN protocol P = 0.48; Siemens Prisma: NIST protocol P = 0.08, UKRIN protocol P = 0.73; GE: NIST protocol P = 0.1, UKRIN protocol P = 0.82), except for Philips, which showed a significant difference for both the NIST protocol (composite pulses) and the UKRIN protocol (a small but consistent bias).
The correction of measurement bias by "StimFit" can also be observed in the histograms in Fig. 2b, which show the distribution of normalized T2 measurements from all voxels across the different spheres in the T2 array for the exponential fit and "StimFit."

Example in Vivo Images
Figure 3 shows example monoexponential and "StimFit" in vivo T2 maps, together with the cB1 maps estimated by "StimFit" and the cB1 maps computed from the separately acquired B1 mapping sequences. All displayed maps are from the same healthy volunteer scanned on the three vendors. In regions where the flip angle was close to the nominal value, the T2 maps agreed well between fitting methods and between vendors. However, flip angle deviations due to nonideal B1 (cB1 values differing from 1) caused overestimation with the monoexponential fit. This can be seen in the right kidney for the Philips data, in the upper left kidney for the GE data, and in both kidneys for the Siemens data. For Siemens in particular, a widespread low B1 caused a global overestimation of T2 in all subjects. These overestimations were largely corrected by "StimFit," resulting in much more consistent T2 values between kidneys and between vendors. The cB1 maps estimated by "StimFit" also showed a similar pattern of features to the measured B1 maps, although their absolute values were not directly comparable because different RF pulses were used in the T2 mapping and B1 mapping sequences.
Figure 5 shows scatterplots of the T2 measurements in the left and right whole kidneys for the different vendors. Differences between left and right kidneys, caused by B1 inhomogeneity across the two kidneys, can be observed in the monoexponential fit results. "StimFit" both improved the local T2 homogeneity and reduced T2 variation across vendors.
The T2 measurements in the cortex and medulla from eight healthy adult volunteers are summarized in Table 3 and Fig. 6a. The results from the two Siemens scanners (Skyra Fit and Prisma) were combined because of their similar performance in the ISMRM/NIST phantom measurements and their similar B1 fields (average measured B1 as a percentage of the nominal flip angle: 82.3% vs. 82.7%). T2 measurements were significantly higher with the monoexponential fit than with "StimFit" for all vendors, particularly for Siemens, which had a widespread low B1. With the monoexponential fit, significant differences between vendors were found in both cortex and medulla, whereas no significant difference between vendors was observed with "StimFit" (P = 0.86 and P = 0.92). With the monoexponential fit, Siemens showed significantly higher T2 measurements than the other two vendors, whereas "StimFit" results were consistent across vendors. The intervendor CVs were significantly reduced from 8.0% (cortex) and 7.1% (medulla) with the exponential fit to 2.6% and 2.8%, respectively, with "StimFit."

Repeatability Study: Intervendor and Intravendor Evaluation
Table 4 summarizes the measures of repeatability for mean T2 values in the cortex and medulla. Overall, "StimFit" reduced the intravendor CVs for most vendors compared with the monoexponential fit.

Assessing the T 2 Distribution in the Kidney
Figure 6c shows the FWHM of the T2 distribution measured from the cortex, medulla, and whole kidney. The FWHM was significantly lower for "StimFit" than for the exponential fit in the cortex (StimFit: 6.9 ± 2.5, Exp. fit: 7.6 ± 2.8) and in the whole kidney (StimFit: 5.9 ± 2.5, Exp. fit: 6.3 ± 2.4).

Discussion
In this study, we demonstrated a large variance in renal T2 mapping across MR vendors with a monoexponential fit, despite using a harmonized MESE protocol in the same group of volunteers. By employing an EPG-based method ("StimFit"), the intervendor CVs were reduced to the same level as the intravendor CVs (<3%), and no significant difference was found in the "StimFit" T2 measurements between vendors.
It is worth noting that the monoexponential fit remains the default option on the vendor platforms evaluated in this study, and correction methods for T2 mapping have not yet been recommended by the current consensus statements. 13,23 When using the monoexponential fit, the measured T2 values of cortex and medulla differed by up to 32 and 28 msec, respectively, between vendors. This variation appears comparable to pathological T2 changes reported in previous studies, such as 132 ± 22 msec vs. 97 ± 12 msec for high-grade vs. low-grade renal cell carcinomas, 3 and an increase from 77 ± 7 msec to 90 ± 6 msec after ischemia-reperfusion injury in rabbits. 5 These findings suggest that the variability among vendors when using a monoexponential fit may substantially impair the utility of T2 mapping as a disease biomarker in multisite studies.
The inaccuracy and variance of measurements were mainly attributed to an imperfect B1 field, as revealed by both the separate B1 mapping acquisitions and the cB1 maps estimated directly by "StimFit." The B1 field problems observed in this study included local B1 inhomogeneities on the GE and Philips scanners and an overall B1 miscalibration on the Siemens scanners. "StimFit" corrected these problems, resulting in accurate and homogeneous measurements. The improvement in the local homogeneity of T2 measurements was also demonstrated by a significant reduction in the FWHM for the cortex and whole kidney. Therefore, to address B1 field problems, we recommend using EPG-based methods with B1 correction instead of monoexponential fitting in multicenter studies. Additionally, acquiring a separate map of the transmit B1 field to confirm the B1 behavior is advisable. Stimulated echoes can be suppressed by composite rectangular pulses with optimized gradient crushers, which are less sensitive to changes in B1. In this study, composite refocusing pulses were employed in the Philips NIST reference protocol, which resulted in accurate T2 measurements in the phantom (MAPE = 3.9% by exponential fit).
However, composite pulses are not suitable for EPG models such as "StimFit," because the large crushers used with them destroy stimulated echoes; 31 this explains the lack of improvement when applying "StimFit" to the Philips NIST protocol. Furthermore, compared with apodized sinc pulses, composite pulses considerably increase the specific absorption rate. Main field (B0) inhomogeneity effects were not addressed by the EPG model in "StimFit," which corrects only the T2 inaccuracy due to transmit field (B1+) heterogeneity; B0 effects are neglected in its Bloch simulation model. However, T2 values have previously been shown to be robust to B0 inhomogeneities, as well as to variations in T1 relaxation time and magnetization transfer. 18,20

Limitations
A limitation of "StimFit" is that it requires the excitation and refocusing pulse waveforms to be known; these are vendor-specific and may not be accessible for all scanners. Future studies should investigate whether a simpler and more general method can be effective for harmonization across vendors. Another limitation of this study is its small sample size, which includes only healthy subjects and no patients with relevant diseases. Future research will include patient groups, including the planned 400 chronic kidney disease (CKD) patients in the AFiRM study (Application of Functional Renal MRI to improve assessment of CKD, https://www.uhdb.nhs.uk/afirm-study/), to extend the investigation. In addition, we focused mainly on intervendor variations; possible variation between scanners from the same vendor was not fully investigated. This study included different scanners from only one vendor (Siemens), at two different sites. The two scanners showed similar performance in T2 measurements and B1 homogeneity, and their results were therefore combined in the statistical analysis of the in vivo data. More detailed evaluations are needed to investigate whether interscanner variations originate from differences between MR vendors or from other configuration factors, such as MR models, MR system versions, and transmit system types.

Conclusion
Variations in quantitative T2 measurements in the kidney were observed across scanners and vendors despite a harmonized MESE protocol, owing to variability in the B1 field. An EPG-based fitting method ("StimFit") reduces the B1-associated errors and the intervendor variation of measured renal T2 values.

FIGURE 1: MR vendor, model, site information, and corresponding numbers of data sets (repeats) collected for healthy adult volunteers (vols) in the "Travelling Kidney" study and repeatability study.

FIGURE 2: T2 measurements computed using the exponential fit (Exp.) and "StimFit" from the ISMRM/NIST system phantom for different MR systems. Both the UKRIN-MAPS and NIST reference protocols were evaluated. (a) Average T2 measurements (in msec) of each sphere against reference values and (c) the same comparison in the physiologically relevant range (45-286 msec, also indicated by the red rectangular boxes in a and c). The black boxes at the top show the MAPE. Apart from the Philips NIST protocol using composite pulses, "StimFit" reduced the MAPE of all measurements. (b) Histograms of T2 measurements from all voxels within the different spheres, normalized by the corresponding NIST reference values. The red line in the center represents the baseline. (d) Example source image, T2 map (in msec), and B1 map of the phantom. (e) Example fitting curves of the exponential fit and "StimFit," showing the stimulated echo in the signal.

FIGURE 3: Example T2 maps (in msec) processed by the monoexponential fit (Exp. Fit) and StimFit from the same volunteer on the three MR vendors. Converted B1 maps (cB1) estimated by StimFit and acquired cB1 maps are provided; these show similar patterns and normalized intensities. Nonideal B1 and the corresponding overestimation by the monoexponential fit can be seen in the upper left kidney for the GE data (red arrows), in the right kidney for the Philips dataset, and in both kidneys for the Siemens dataset. These issues are corrected using "StimFit."

FIGURE 4: Bland-Altman plots showing the agreement of T2 measurements between the different vendors for the monoexponential fit and "StimFit" methods. "StimFit" substantially reduced the variance across vendors. Each point corresponds to the measurement of one kidney.

FIGURE 5: Scatterplots showing T2 values in the left and right whole kidney obtained using the monoexponential fit (a) and "StimFit" (b). Different shapes correspond to different subjects. "StimFit" can be seen to improve the local T2 homogeneity and reduce T2 variation across vendors, including reducing the overestimation in the left kidney for Philips and the global overestimation for Siemens.

FIGURE 6: Boxplots comparing in vivo T2 measurements using the monoexponential fit and StimFit: (a) Mean T2 values for the different vendors. Significant differences in T2 measurements were found between vendors for the exponential fit, but not for StimFit (ANOVA). (b) Comparison of intervendor CVs from travelling-volunteer scans and intravendor CVs from repeatability scans. Intervendor CVs were significantly higher than intravendor CVs for the exponential fit, but not for StimFit (unpaired t-test). (c) The FWHM of the T2 distribution measured from the cortex, medulla, and whole kidney. The FWHM of T2 measurements was significantly lower for StimFit than for the exponential fit in the cortex and whole kidney (paired t-test).
This study was a cross-site study with MRI data collected at four imaging sites (Sir Peter Mansfield Imaging Centre, University of Nottingham; Department of Radiology, Addenbrooke's Hospital, Cambridge University Hospitals NHS Foundation Trust; Developmental Imaging and Biophysics Section, Great Ormond Street Institute of Child Health, University College London; Centre for Cardiovascular Science, University of Edinburgh, Edinburgh, UK) with participants scanned under healthy volunteer ethics approval from the local research ethics boards.All participants provided written informed consent.

TABLE 1.
Key Parameters of the NIST and UKRIN-MAPS T2 Mapping Protocols
TE = echo time; TR = repetition time; NIST = National Institute of Standards and Technology; UKRIN-MAPS = UK renal imaging network-MRI acquisition and processing standardization.

TABLE 2.
T2 Measurements from the ISMRM/NIST Phantom
The MAPE and the LME model were used to estimate the consistency of measurements with the reference values. NIST = National Institute of Standards and Technology; UKRIN-MAPS = UK renal imaging network-MRI acquisition and processing standardization.
a Compared with the exponential fit, "StimFit" reduced the MAPE of all measurements except for the Philips NIST protocol, which uses composite pulses.
Journal of Magnetic Resonance Imaging 15222586, 0, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/jmri.29282 by University College London UCL Library Services, Wiley Online Library on [26/02/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License

TABLE 3.
Travelling Kidney Study: Mean T2 Values in the Cortex and Medulla Obtained with the Monoexponential Fit and "StimFit," Averaged Over Eight Subjects for Different Vendors (msec, mean ± SD)
The coefficient of variation (CV) was calculated for T2 values across the different vendors. Difference = T2 (Exp.) - T2 (StimFit); the differences were significant for all measurements (paired t-test, P < 0.001). ANOVA = analysis of variance; Exp. fit = exponential fit; cB1 = converted B1 = $1 - |\mathrm{FA}_{\mathrm{nominal}} - \mathrm{FA}_{\mathrm{actual}}| / \mathrm{FA}_{\mathrm{nominal}}$.