Determining the longitudinal accuracy and reproducibility of T1 and T2 in a 3T MRI scanner

Abstract
Purpose: To determine baseline accuracy and reproducibility of T1 and T2 relaxation times over 12 months on a dedicated radiotherapy MRI scanner.
Methods: An International Society for Magnetic Resonance in Medicine/National Institute of Standards and Technology (ISMRM/NIST) System Phantom was scanned monthly on a 3T MRI scanner for 1 year. T1 was measured using inversion recovery (T1-IR) and variable flip angle (T1-VFA) sequences, and T2 was measured using a multi-echo spin echo (T2-SE) sequence. For each vial in the phantom, accuracy errors (%bias) were determined as the relative differences between measured T1 and T2 times and reference values. Reproducibility was measured by the coefficient of variation (CV) of T1 and T2 measurements across monthly scans. Accuracy and reproducibility were mainly assessed on vials with relaxation times expected to be in physiological ranges at 3T.
Results: A strong linear correlation between measured and reference relaxation times was found for all sequences tested (R² > 0.997). Baseline bias (and CV [%]) for T1-IR, T1-VFA and T2-SE sequences were +2.0% (2.1), +6.5% (4.2), and +8.5% (1.9), respectively.
Conclusions: The accuracy and reproducibility of T1 and T2 on the scanner were considered sufficient for the sequences tested. No longitudinal trends in variation were observed, suggesting that less frequent measurements are sufficient once baselines have been established.


INTRODUCTION
Quantitative magnetic resonance imaging (qMRI) uses MR methods that allow physiological changes to be measured in physical units. Longitudinal (T1) and transverse (T2) relaxation times are examples of physical properties that can be measured using qMRI. The T1 of a tissue is generally measured in parallel with dynamic contrast-enhanced MRI sequences to [...] determining the qMRI methods' technical performance on a specific scanner.

The International Society for Magnetic Resonance in Medicine/National Institute of Standards and Technology (ISMRM/NIST) System Phantom is commercially available for the execution of qMRI QA programs. It can be used to measure both clinical scanner properties (e.g., SNR and geometric distortions) and a wide range of human-relevant T1 and T2 relaxation times. 8-11 In past longitudinal studies, this phantom was scanned repeatedly (up to 100 days) to monitor changes in T1 and T2 over time using Magnetic Resonance Fingerprinting. 12,13 Variability in T1 derived from traditional spin echo (SE) methods has also been assessed using the phantom: single-center results found that the accuracy and reproducibility of T1 varied pre- and post-scanner upgrade. 14 Further, a multi-site study found these properties to be dependent on sample T1 relaxation time, magnetic field strength, and sequence choice. 1 Previous longitudinal studies assessing T1 and T2 accuracy and reproducibility acquired measurements infrequently and/or over short time periods, 1,14,15 or assessed relaxation times relevant to specific anatomy at different magnetic field strengths. 16

This study aimed to determine scanner baseline accuracy and reproducibility for a wide range of T1 and T2 times by longitudinally monitoring these parameters on a 3T MRI scanner. This was achieved by imaging the System Phantom at monthly intervals over the course of 1 year using standardized sequences. Quantifying the changes in parameters such as T1 and T2 over time is essential for advancing the clinical use of qMRI (e.g., in treatment response monitoring). 7

TABLE 1 Acquisition parameters utilized for the three sequences tested: T1-inversion recovery (T1-IR), T1-variable flip angle (T1-VFA) and multi-echo spin echo (T2-SE). Note: FA = flip angle, TE = echo time, TR = repetition time, TI = time of inversion, FOV = field of view, FE/PE/SE = frequency/phase/slice encoding, respectively.

Data acquisition
An ISMRM/NIST System Phantom (serial number 130-0111; CaliberMRI, Colorado, United States) was imaged monthly for 1 year (at least 2 weeks apart) using a 3T MRI scanner (MAGNETOM Skyra, Siemens Healthineers, Erlangen, Germany). Imaging was completed using a 20-channel Head/Neck coil. T1-weighted inversion recovery (T1-IR) and variable flip angle (T1-VFA) sequences were utilized for T1-mapping, while a multi-echo spin echo (T2-SE) sequence was used for T2-mapping. Sequence and parameter selections (outlined in Table 1) were based on recommended protocols in the phantom manual. 17 All phantom setups, image acquisitions, and analyses were completed by one user (a physicist with 3 years of MRI experience).

ISMRM/NIST system phantom
The phantom has a spherical geometry with a 200 mm inner diameter (ID) shell (Figure 1). The T1 and T2 sequences imaged the T1- and T2-arrays embedded within the phantom, respectively. Each array contains 14 spherical (15 mm ID) vials filled with high purity water doped with varying concentrations of either NiCl2 (T1) or MnCl2 (T2). Nuclear magnetic resonance (NMR) IR methods have been used by NIST in the past to characterize these solutions under 3T and 20 °C conditions (reference values provided in Table 2). 8,9 Note that vials 1 and 5 from the T2-array were not included in the analysis at the recommendation of the manufacturer.

The phantom's arrays covered a large range of relaxation times, including those found in the human body at 3T: T1 = 121 ms to 1884 ms and T2 = 30 ms to 79 ms. 8,11,18,19 The physiological range of relaxation times was of particular interest, and thus results were separated into two categories: the full vial range and a subset, the human vial range.

Temperature of the surrounding deionized water in the phantom was measured both before and after each scanning session using an NIST-traceable thermometer (supplied with the phantom).

TABLE 2 T1 and T2 values both measured by National Institute of Standards and Technology (NIST) and experimentally derived using T1-inversion recovery (IR), T1-VFA or T2-SE methods. Note that the NIST values were obtained from the phantom manual, 8

FIGURE 1 The system phantom on top of a 3D-printed holder, fitted to the head/neck coil. 3D-orthogonal markings were drawn to assist with external laser alignment.

Image analysis
All image processing was completed using a consistent software platform and analysis method. ImageJ (National Institutes of Health, Maryland, USA) was initially used to manually identify the center pixel location of each array vial on the image with the shortest time of inversion (TI = 35 ms) in the T1-IR dataset. These locations, along with all datasets, were imported into a Python script developed in-house. The script automatically positioned a 10-pixel (∼9.8 mm) diameter circular region of interest (ROI) at the center of each vial on the central slice of the respective T1 or T2 array. The average signal for each ROI was calculated and fitted to the corresponding signal equation for the respective pulse sequence (see Equations S1-S3 and the fitting parameters provided in Table S1).
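The ROI step described above can be illustrated with a minimal sketch. This is not the in-house script used in the study: the function name `circular_roi_mean`, the nested-list image representation, and the synthetic 32 × 32 slice are assumptions made purely for illustration.

```python
def circular_roi_mean(image, center_row, center_col, diameter_px=10):
    """Mean of all pixels whose centres lie within diameter_px / 2 of the ROI centre.

    `image` is a 2D list of rows (hypothetical representation of one slice).
    """
    radius_sq = (diameter_px / 2.0) ** 2
    values = []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if (r - center_row) ** 2 + (c - center_col) ** 2 <= radius_sq:
                values.append(value)
    if not values:
        raise ValueError("ROI does not overlap the image")
    return sum(values) / len(values)


# Synthetic slice (assumption): zero background with one bright "vial"
# of radius 7 pixels centred at row 16, column 16.
size = 32
image = [
    [100.0 if (r - 16) ** 2 + (c - 16) ** 2 <= 7 ** 2 else 0.0 for c in range(size)]
    for r in range(size)
]

# The 10-pixel ROI sits fully inside the vial, so only vial signal is averaged.
roi_mean = circular_roi_mean(image, 16, 16)
print(f"Mean ROI signal: {roi_mean:.1f}")
```

Keeping the ROI diameter smaller than the vial diameter, as in the study (∼9.8 mm ROI in a 15 mm vial), leaves a margin so that small centring errors do not pull edge or background pixels into the average.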
To assess the accuracy error of the measured T1 and T2 times, the %bias was calculated for each monthly acquisition and each vial by comparison with the NIST-measured reference vial value:

$$\%\mathrm{bias} = \frac{T_{\mathrm{measured}} - T_{\mathrm{reference}}}{T_{\mathrm{reference}}} \times 100 \qquad (1)$$

To assess the reproducibility over the range of T1 and T2 times in the phantom, a coefficient of variation (%CV) was calculated using the individual vials' mean (μ) and standard deviation (SD) (σ), calculated over the 12 monthly repetitions:

$$\%\mathrm{CV} = \frac{\sigma}{\mu} \times 100$$

FIGURE 2 Coefficient of variation (CV) calculated for each vial and each sequence from all monthly acquisitions. Note that vial 14 corresponded to the shortest vial reference T1 and T2 times, while vial 1 had the longest reference time.
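The %bias and %CV definitions above can be sketched in a few lines of Python. The monthly values and reference time below are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption, as the text does not state which SD estimator was applied.

```python
from statistics import mean, stdev


def percent_bias(measured_ms, reference_ms):
    """Relative difference between a measured and a reference relaxation time, in %."""
    return 100.0 * (measured_ms - reference_ms) / reference_ms


def percent_cv(values_ms):
    """Coefficient of variation in %, using the sample SD (assumption: n - 1 denominator)."""
    return 100.0 * stdev(values_ms) / mean(values_ms)


# Hypothetical monthly T1 measurements (ms) for one vial, and a hypothetical reference.
monthly_t1_ms = [1020.0, 1000.0, 980.0, 1010.0, 990.0, 1000.0]
reference_t1_ms = 980.0

biases = [percent_bias(t1, reference_t1_ms) for t1 in monthly_t1_ms]
cv = percent_cv(monthly_t1_ms)

print("Monthly %bias:", [round(b, 2) for b in biases])
print(f"%CV over all months: {cv:.2f}")
```

A positive %bias corresponds to an overestimation of the reference time, which matches the sign convention of the biases reported in this study.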

RESULTS
The phantom's T1 and T2 arrays were imaged monthly using the respective T1-IR/T1-VFA and T2-SE imaging sequences described in Table 1. The average interval between imaging sessions was 4 weeks. The mean T1 and T2 values derived for each vial, and their SDs, are listed in Table 2 and were calculated using all months' measurements. Table 2 also includes the NIST reference vial values, as reported by the manufacturer. 8,9 Figures S1-S3 display examples of the model fits used to calculate these parameters.

A wide range of accuracy errors in T1 and T2 measurements existed over the full vial range. This variability is visualized in Figure S5. T1-IR had the smallest bias when all vials were included (median = +3.6%), compared to the T1-VFA (+5.0%) and T2-SE (+5.8%) sequences. In terms of reproducibility, the CVs over the full vial range (Figure 2) for the same sequences were 2.5%, 4.3%, and 2.2%, respectively. The largest CVs (Figure 2) and accuracy deviations occurred in the vials with the shortest reference times. For example, the T1-IR bias in vials 13 and 14 was, respectively, +21% and +71%, and −14% and −35% for [...]

The human vial range omitted results from the vials with the shortest reference times. Consequently, the median bias fluctuated between approximately −20% and +20% (Figure 3), and IQRs and CVs were reduced compared to the full vial range (Table 3).
There was a strong linear correlation between NIST reference times and all measured relaxation times. The coefficient of determination, R², was calculated by plotting the reference times against those measured (Figure 4). R² for T1-IR, T1-VFA and T2-SE was found to be 0.999, 0.999, and 0.998, respectively.
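As an illustration of the coefficient of determination reported above, the sketch below computes R² as the squared Pearson correlation between reference and measured times, which equals the R² of a simple linear regression. The text does not specify the exact software routine used, so this formulation and all values are assumptions.

```python
def r_squared(reference, measured):
    """R^2 of a simple linear regression of measured on reference values
    (equal to the squared Pearson correlation coefficient)."""
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    syy = sum((y - mean_y) ** 2 for y in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, measured))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical reference times (ms) spanning a wide range.
reference_ms = [21.0, 100.0, 500.0, 1000.0, 1900.0]

# Case 1: every measurement overestimates the reference by exactly 5%.
measured_biased_ms = [1.05 * t for t in reference_ms]
# Case 2: measurements scattered around the reference.
measured_noisy_ms = [30.0, 98.0, 510.0, 990.0, 1910.0]

r2_biased = r_squared(reference_ms, measured_biased_ms)
r2_noisy = r_squared(reference_ms, measured_noisy_ms)

print(f"R^2 with a uniform +5% bias: {r2_biased:.6f}")
print(f"R^2 with scatter:            {r2_noisy:.6f}")
```

The first case shows why R² is reported alongside %bias rather than instead of it: a proportional bias leaves the linear correlation essentially perfect, so a high R² alone does not indicate accurate relaxation times.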
Monthly changes in T2 measurements over the 12 months can be seen in Figure 5, along with temperature fluctuations. Each month/vial is presented with its respective error, generated from the SD of the fit, which was calculated as the square root of the corresponding diagonal element of the parameter covariance matrix. Similar results for T1-IR and T1-VFA can be seen in Figures S8 and S9. On average, the initial and final temperatures recorded each month were 20.1 °C ± 1.5 °C and 20.8 °C ± 1.0 °C, respectively. The change in temperature over individual imaging sessions was generally less than ±0.5 °C. Correlation coefficients (ρ) were calculated between recorded temperature and measured T1 for IR and VFA sequences (ρ = 0.003 and ρ = −0.001, respectively), and for T2 (ρ = 0.007). Similar calculations showed no clear systematic variation of the T1 or T2 measurements over time (|ρ| < 0.001).

FIGURE 3 Bland-Altman plots for (a) T1-IR, (b) T1-VFA, and (c) T2-SE. The % difference can be observed between measured and reference T1 and T2 times for vials within the human range. Median biases (and lower to upper quartiles) are displayed and include: +2.0% (−1.6 to +4.5), +6.5% (+0.7 to +10.8) and +8.5% (+2 to +12.7) for T1-IR, T1-VFA and T2-SE, respectively.

FIGURE 4 A strong linear correlation was found between (full vial range) reference and measured T1 and T2 times. This was true for (a) T1-IR, (b) T1-VFA and (c) T2-SE sequences. Note that all axes employ a logarithmic scale.

DISCUSSION
According to the Quantitative Imaging Biomarkers Alliance (QIBA), the accuracy and precision of a quantitative parameter determine its reliability to diagnose disease or monitor a tissue response. 7 This study was designed to assess the reliability of T1 and T2 relaxation time parameters derived using a 3T MRI scanner. This study expands knowledge in the field of qMRI by longitudinally monitoring samples with a wide range of T1 and T2 relaxation times at monthly measurement intervals. When all vials were included, accuracy and reproducibility results were comparable to previous studies completed using the same phantom type and similar sequences. 1,12,14

Note that specific vials were removed from the T2 analysis at the recommendation of the manufacturer, who believes that mixing or labelling errors probably occurred during the manufacturing of vial 5. The issues with vial 1 most likely derived from the storage of the vial's MnCl2 solution prior to manufacturing the phantom: it was stored in glass, and it is suspected that the Mn plated onto the glass storage bottle. 20 Vial 1's solution has a low concentration of Mn, and reducing this further would result in an anomalously long T2. These issues have since been resolved by the manufacturer; however, this highlights the need for monitoring qMRI systems and phantoms.

FIGURE 5 Monthly fluctuations observed in T2-SE measurements, with overlaid average temperature readings. Error bars were generated from the standard deviation of each vial (calculated from the parameter fit).
Due to limited scanner time availability, imaging could not be completed on the same day of each month. Instead, a time constraint of at least 2 weeks between imaging sessions was implemented, achieving a 4-week average spacing. Temperature variations between 18 and 22 °C had no observed effect on the measured T1 and T2 times. This was expected for the NiCl2 solutions in the T1-array, with known minimal fluctuations within these temperature ranges. 11 There was a 1.6%/°C linear dependence expected for the MnCl2 T2-array solutions. 11 However, due to the small temperature fluctuations recorded in this study (averaged within 1 °C of the NIST reference conditions), no significant relationship was observed (ρ = 0.007).
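To give a sense of scale for the temperature dependence quoted above, the following sketch converts a temperature offset from the 20 °C reference condition into an expected magnitude of T2 change, assuming the 1.6%/°C dependence is strictly linear over this range. The direction of the change is not stated in the text, so only the magnitude is computed; the function and constant names are illustrative.

```python
T2_TEMP_COEFF_PERCENT_PER_C = 1.6  # linear dependence quoted for the MnCl2 solutions
REFERENCE_TEMP_C = 20.0            # NIST reference condition


def expected_t2_shift_magnitude_percent(temp_c):
    """Magnitude of the expected T2 change (%) for a given phantom temperature,
    assuming a strictly linear dependence around the reference temperature."""
    return T2_TEMP_COEFF_PERCENT_PER_C * abs(temp_c - REFERENCE_TEMP_C)


# Range limits and mean session temperatures reported in this study.
for temp_c in (18.0, 20.1, 20.8, 22.0):
    shift = expected_t2_shift_magnitude_percent(temp_c)
    print(f"{temp_c:4.1f} degC -> expected |T2 change| of about {shift:.2f}%")
```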
Reproducibility was improved for T1 and T2 measurements in the human vial range of the phantom compared to the full range. The CV of T1 and T2 in this range was less than 5% for all sequences tested. Further, the Bland-Altman plots in Figure 3 showed that the bias of these parameters ranged between approximately −20% and +20%, with an average parameter overestimation. The average biases (+2.0%, +6.5% and +8.5% for T1-IR, T1-VFA and T2-SE, respectively) were far smaller in magnitude (<20%) than those likely to cause erroneous outcomes if used in applications like tissue discrimination (e.g., benign vs. malignant). 2,19 T1-IR, a gold standard T1-mapping method, had greater accuracy and reproducibility than T1-VFA, in agreement with the literature. 1,19 This was expected, as VFA methods are known to overestimate T1 and to have increased sensitivity to B1-inhomogeneity effects compared to IR, and often require a correction technique. 1 Clinically, T1-VFA with 2-3 flip angles is preferred over IR methods due to shorter acquisition times. 7,10 This study aimed to follow QIBA guidelines by utilizing a common imaging protocol that was open source and could allow for prospective multi-site investigations. 7,17 Note that no B1-corrections were implemented in this study, as there is currently no commonly used correction technique available. 21 A future study would utilize department-specific patient imaging protocols for T1-mapping and compare scanner baseline %bias and reproducibility.
There was a strong linear correlation between the reference and measured vial relaxation times (R² > 0.997). It can be seen in Figure 4 and Figure S5 that the largest deviations in %bias and reproducibility occurred for vials with the shortest relaxation times. This can be partially explained by the acquisition parameters utilized. For example, in the T1-array, vials 13 and 14 had reference times of approximately 30 ms and 21 ms, respectively, both shorter than the first TI (35 ms) used in the T1-IR pulse sequence. Similarly, for the T2-array, vials 13 and 14 had reference times of 7 ms and 5 ms, both less than the shortest TE (10 ms) used in the T2-SE sequence. Measuring short T1 and T2 times is often limited by a scanner's gradient hardware and the available sequence acquisition parameters. 12 However, these sequences were utilized as they are commonly available and designed to capture the wide range of relaxation times in the phantom.
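The sampling limitation described above can be made concrete with idealized signal models. The sketch below assumes a perfect inversion with full recovery between repetitions for T1-IR, and pure mono-exponential decay for T2-SE; these are textbook simplifications, not the study's Equations S1 and S3, and the function names are illustrative.

```python
import math


def ir_null_time_ms(t1_ms):
    """TI at which an ideal inversion recovery signal crosses zero: T1 * ln(2)."""
    return t1_ms * math.log(2.0)


def ir_recovered_fraction(ti_ms, t1_ms):
    """Mz / M0 at a given TI for an ideal inversion with TR >> T1 (assumption)."""
    return 1.0 - 2.0 * math.exp(-ti_ms / t1_ms)


def se_remaining_fraction(te_ms, t2_ms):
    """Fraction of transverse signal left at a given TE for mono-exponential decay."""
    return math.exp(-te_ms / t2_ms)


FIRST_TI_MS = 35.0  # shortest TI used in the T1-IR sequence
FIRST_TE_MS = 10.0  # shortest TE used in the T2-SE sequence

for t1_ms in (30.0, 21.0):  # approximate reference T1 of vials 13 and 14
    print(
        f"T1 = {t1_ms:.0f} ms: null point at {ir_null_time_ms(t1_ms):.1f} ms, "
        f"Mz/M0 at first TI = {ir_recovered_fraction(FIRST_TI_MS, t1_ms):.2f}"
    )

for t2_ms in (7.0, 5.0):  # reference T2 of vials 13 and 14
    print(
        f"T2 = {t2_ms:.0f} ms: fraction of signal left at first TE = "
        f"{se_remaining_fraction(FIRST_TE_MS, t2_ms):.2f}"
    )
```

Under these assumptions, both short-T1 vials have already passed their null point before the first TI, so the early part of the recovery curve is never sampled, and the short-T2 vials have lost most of their signal by the first echo.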
For the T2-array, signals for shorter T2 vials often approached the noise floor (Figure S3). Also, the mono-exponential fitting applied to the T2-SE signal, replicating methods used in the majority of clinical and preclinical studies, is known to be susceptible to inaccuracies generated by B1-inhomogeneities. 13,19,22 These can lead to imperfect refocusing flip angles, especially for the first echo (Figure S4), and can contribute noise. 5,22,23 For these reasons, the first TE and signal were discarded from the fit, and a noise factor was introduced in the model fitting procedure (see Equation S3).
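The fitting strategy described above (discarding the first echo and including a noise term) can be sketched as follows. Equation S3 is not reproduced in this text, so the additive constant offset used here is an assumption about the form of the noise factor, and the echo spacing, number of echoes, and signal values are hypothetical. The fit is a simple grid search over T2 in which the amplitude and offset are solved by linear least squares at each candidate T2; it is not the fitting routine used in the study.

```python
import math


def fit_t2_with_offset(te_ms, signal, t2_grid_ms):
    """Fit S(TE) = A * exp(-TE / T2) + C by grid search over T2.

    For each candidate T2, A and C follow from ordinary linear least squares;
    the candidate with the smallest residual sum of squares is returned.
    """
    n = len(te_ms)
    mean_s = sum(signal) / n
    best = None
    for t2 in t2_grid_ms:
        decay = [math.exp(-te / t2) for te in te_ms]
        mean_d = sum(decay) / n
        sdd = sum((d - mean_d) ** 2 for d in decay)
        sds = sum((d - mean_d) * (s - mean_s) for d, s in zip(decay, signal))
        amplitude = sds / sdd
        offset = mean_s - amplitude * mean_d
        rss = sum((s - amplitude * d - offset) ** 2 for d, s in zip(decay, signal))
        if best is None or rss < best[0]:
            best = (rss, t2, amplitude, offset)
    return best[1], best[2], best[3]


# Hypothetical multi-echo acquisition: 32 echoes spaced 10 ms apart.
echo_times_ms = [10.0 * k for k in range(1, 33)]
true_t2_ms, true_amplitude, true_offset = 50.0, 1000.0, 20.0
signals = [true_amplitude * math.exp(-te / true_t2_ms) + true_offset for te in echo_times_ms]

# Discard the first echo before fitting, as described in the text.
t2_grid_ms = [5.0 + 0.5 * k for k in range(391)]  # 5 ms to 200 ms in 0.5 ms steps
t2_fit, amplitude_fit, offset_fit = fit_t2_with_offset(
    echo_times_ms[1:], signals[1:], t2_grid_ms
)
print(f"Fitted T2 = {t2_fit:.1f} ms, A = {amplitude_fit:.1f}, offset = {offset_fit:.1f}")
```

Without the offset term, signal that plateaus at the noise floor would be interpreted as slow decay, which would tend to lengthen the fitted T2 for the short-T2 vials.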
During post-processing of the T1-VFA magnitude images, the average signal from ROIs in vials with shorter reference times (vials 9-14) was observed to be saturated. This was especially the case for larger flip angles (20°-30°). Thus, only magnitude images for FAs of 2°, 5°, and 10° were used to calculate T1 for the saturated vials, similar to Keenan et al. (Figures S2, S6, and S7). 14

Figures S8 and S9 and Figure 5 show no trends in variability for T1 or T2 accuracy measurements over the course of the 12-month study. According to the literature, major system upgrades can cause large changes in T1 measurements. 14,24,25 During this study, two hardware replacements of the Transmit-Box (containing RF transmitters) occurred, between months 7 and 8 and between months 10 and 11. Although no correlation between system upgrades and relaxation times was found, in T1-VFA measurements the percentage of pixels with signal saturation reduced in months 8 (−3.4%) and 11 (−2.3%) (Figure S6). However, this was not significantly different when compared to other months, and hence the cause of the reduction was not determined.
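For the saturated vials, T1 was calculated from only the 2°, 5°, and 10° images. A minimal sketch of such a three-angle estimate is given below, assuming the standard spoiled gradient echo signal model and its well-known linearization, S/sin(α) = E1 · S/tan(α) + M0(1 − E1) with E1 = exp(−TR/T1). The study's own fitting used Equation S2, which is not reproduced here, so this linearized approach, the TR value, and the synthetic signals are all assumptions.

```python
import math


def spgr_signal(m0, t1_ms, tr_ms, flip_deg):
    """Ideal spoiled gradient echo signal (no B1 error, TE effects ignored)."""
    e1 = math.exp(-tr_ms / t1_ms)
    alpha = math.radians(flip_deg)
    return m0 * math.sin(alpha) * (1.0 - e1) / (1.0 - e1 * math.cos(alpha))


def vfa_t1_ms(signals, flip_angles_deg, tr_ms):
    """Estimate T1 from variable flip angle data via the linearized model.

    Regresses S / sin(alpha) on S / tan(alpha); the slope equals exp(-TR / T1).
    """
    xs = [s / math.tan(math.radians(a)) for s, a in zip(signals, flip_angles_deg)]
    ys = [s / math.sin(math.radians(a)) for s, a in zip(signals, flip_angles_deg)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    if not 0.0 < slope < 1.0:
        raise ValueError("slope outside (0, 1); T1 cannot be estimated")
    return -tr_ms / math.log(slope)


# Hypothetical acquisition: TR = 10 ms, true T1 = 300 ms, noise-free signals.
tr_ms = 10.0
low_flip_angles_deg = [2.0, 5.0, 10.0]  # the angles retained for saturated vials
signals = [spgr_signal(1000.0, 300.0, tr_ms, a) for a in low_flip_angles_deg]

t1_estimate_ms = vfa_t1_ms(signals, low_flip_angles_deg, tr_ms)
print(f"Estimated T1 from three flip angles: {t1_estimate_ms:.1f} ms")
```

With noise-free synthetic data the three-angle estimate recovers the input T1; in practice, B1 inhomogeneity changes the effective flip angles, which is one reason VFA estimates can deviate from IR values.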
With the high repeatability of the accuracy measurements observed, similar to Ihalainen et al., it is predicted that future measurements using this scanner would yield similar results. 15 Consequently, the QA frequency recommended to the department was annual testing, plus testing around the time of any major scanner upgrades. A future investigation would conduct similar measurements at daily intervals over 1 month to determine whether any fluctuations occur between the monthly measurements. These longitudinal and frequent assessments of fluctuations in qMRI scanner technical performance are especially important in the case of treatment response monitoring. 7

CONCLUSION
In conclusion, our study found high accuracy and long-term reproducibility in physiologically relevant T1 and T2 times on a radiotherapy-dedicated 3T MRI scanner. Baseline bias (and CV [%]) for T1-IR, T1-VFA and T2-SE sequences were +2.0% (2.1), +6.5% (4.2), and +8.5% (1.9), respectively. Vials with shorter relaxation times had increased measurement instability; however, no systematic variations in accuracy over time were observed. For this scanner, it was recommended that, once baselines are established, qMRI QA measurements need only be taken annually and before and after any major scanner upgrades. Further investigations are required to determine deviations in T1 and T2 when using department-specific sequences and to find the cause of the signal saturation fluctuations in T1-VFA acquisitions for shorter reference time vials.

ACKNOWLEDGEMENTS
Access to the 3T MRI scanner at Liverpool Hospital was enabled by staff at the Cancer Therapy Centre in collaboration with Ingham Institute for Applied Medical Research (Physics). We would also like to thank Dr. Sirisha Tadimalla for her guidance in performing the model fitting for T 1 and T 2 measurements. This work was supported by the South Western Sydney Local Health District (SWSLHD) Top-Up Scholarship (Madeline Carr, 2021) and the SWSLHD Early Researchers Program Grants (Michael Jameson, 2017; Amy Walker, 2017).

DISCLAIMER
Certain commercial equipment, instruments, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.

CONFLICT OF INTEREST
The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.

AUTHOR CONTRIBUTIONS
Madeline Carr performed the measurements, analyzed the data, and wrote the manuscript with support from all the authors. Kathryn Keenan, Robba Rai, Peter Metcalfe, Amy Walker, and Lois Holloway were involved in experiment planning and supervising the work.