A Temporal Instability Measure for fMRI Quality Assurance

There exist several fMRI quality assurance measures to assess scanner stability. Because they have practical and/or theoretical limitations, a different and more practical measure for instability would be desirable.

stability of the MRI scanner (i.e. the data it produces) is important. 20Although the FBIRN protocol includes tests for assessing the temporal stability of the signal (i.e. the radius of decorrelation [RDC], and temporal signal-to-noise ratio [tSNR], due to their implementation and the underlying assumptions they rest upon, they cannot capture all types and causes of instability 21,22 and see Table 1).
The aim of the present work was to assess the utility of a new statistical measure, based on the eigenratio of a correlation coefficient matrix, for monitoring the temporal stability of an MRI scanner.To ensure general applicability, a large fMRI QA dataset was used that included different manufacturers (Philips, Siemens, GE) and several head coils (8ch, 20ch, 32ch, and 64ch).

Background
Intuitively, for the ability to detect the in vivo haemodynamic response during an fMRI experiment, one expects to record data that, in the absence of a "brain activation," have a sufficiently high tSNR that is also constant (i.e.stationary) in time-both upon repeated experiments and while considering partial data of a single experiment.This inherently means that both the mean and the standard deviation of the measured signal are stationary.However, deviations in tSNR do occur and are caused by either the mean signal (eg due to head movement causing the voxel content to vary), the standard deviation (eg due to components heating up or the varying heart rate) or both changing over time.Conventionally, the term, instability, is used to describe a deviation in only the mean signal and in a way that affects all voxels similarly-for example, adding a constant value (i.e.scale) the signal of all voxels at certain (not all) time-points.In other words, subtraction and/or division of consecutive image volumes would be expected to be a constant if instability manifested as constant addition/multiplication.In this sense (i.e. that the signal in neighboring voxels co-vary in time) statistical independence is reduced among the voxels.This is the kind of instability that, for example, the RDC aims to capture. 21As one Sensitive to sources of instability that do not alter the mean signal in VOI samples more data points-or more specifically, when one increases the number of voxels, N, in a square region of interest (ROI)-the RDC is taken as the asymptotic value, ffiffiffiffi ffi N p , after which the tSNR does not improve further.In the left column of Table 1, assumptions and some of the relevant recommended procedural steps for calculating RDC are summarized.Several limitations become apparent-some of which are briefly mentioned here.First, although fMRI is an inherently 3D method, RDC utilizes a 2D ROI and even that only in a single slice.As Fig. 1a illustrates, there is a clear spatial dependence on the temporal correlation of the voxel time series, where immediately adjacent voxels in a given slice are more correlated than immediately adjacent voxels in different slices.Second, the RDC protocol checks data from only in the center of the phantom, but instability can have an unpredictable spatial component that may be more emphasized near the edges of the phantom (Video S1).Finally, although the small ROI in the center of the phantom provides a relatively uniform signal in birdcage coils, this no longer holds in more modern multichannel receiver coils (Fig. 1b).Therefore, a complementary measure would be desirable that includes a 3D volume (possibly the entire phantom) in estimating stability of the scanner.
The Proposed Temporal Instability Measure Suppose an fMRI time-series of a gel phantom with T = 200 time-points and N ≈ 56,000 voxels, covering the entirety of the phantom-but only the phantom, not the background of where T is the number of time points in the time-series and from this a correlation matrix was calculated for all possible pairs of voxels.This is the conventional approach where the correlation coefficient is calculated for the entire time-series.As such, time represents the sample and space (i.e. the different voxels) the variable.Note that the correlation within the slices is higher than between slices and adjacent voxels have the highest correlation (i.e.voxels 1 step away from the diagonal line or voxels five steps away, i.e. one step away in the other dimension after the reshape step in Matlab).(b) Demonstration of the spatial non-uniformity of the voxel signal from the phantom of experiment #19 with the 32ch coil.The red box in the center illustrates the usual placement of the ROI used for QA measures.Its magnification with adjusted windowing demonstrates that the voxel intensity is far from uniform inside the ROI.(c) Signal profile of voxel intensity from a large VOI (≈ 56,000 voxels) from the first time-point of experiment #8 with the 8ch coil reshaped into a 1D-vector.On the y-axis are arbitrary intensity units.The expectation is that in an ideal dataset this undulating curve is reproduced with high fidelity at all time-points.
the image volume.Conventionally, we think of individual voxels and consider the temporal evolution (i.e.time-series) of their signal amplitude, because that conveniently relates to the temporal evolution of the voxel-wise hemodynamic response.Pairwise assessing the temporal correlation of the voxels would require a 56,000 Â 56,000 matrix where each entry records the correlation coefficient between two of the 56,000 voxels and evaluated across the 200 time-points in the time-series-similar to Fig. 1a that was calculated for a much smaller 5 Â 5 Â 5 volume of interest (VOI).An ideal but realistic dataset (i.e. one with constant signal with additional small and uncorrelated noise) would have ones on the diagonal and close to zeros in all the off-diagonal positions of the correlation coefficient matrix.
Conversely, we can think of all the voxels that contain the object of interest at any given time as our "signal evolution" (Fig. 1c) and testing the pairwise correlation of the different time points in the time-series.In this case, one would obtain a 200 Â 200 matrix, where each entry provides the correlation coefficient between two time-points and evaluated across all the 56,000 voxels.Now, the same ideal but realistic data set would result in ones on the diagonal and close to one in all of the off-diagonal entries of the matrix (cf.Fig. 2a).
To formalize this idea mathematically, consider the data to be a matrix X ℝ N ÂT , where each column x !i , i 1, …, T f g contains the N voxels (eg N ≈ 56,000) and represents one of the T discrete time-points (eg T = 200 with repetition time [TR] = 3 seconds) in the echo-planar-imaging (EPI) timeseries, 23 Let b X be the related matrix where each column, x !i , is centered by its own mean and divided by its own standard deviation.Entries of the T Â T matrix b X t b X , where the superscript t represents the matrix transpose, contain the correlation coefficient between two time-points in the time-series.An eigendecomposition of this correlation coefficient matrix provides the eigenvectors as the columns of P and the eigenvalues as the diagonal elements of D, that is, The ratio of the largest eigenvalue to the sum of all the eigenvalues (i.e., the eigenratio), max is a new measure of how well the signal of any voxel (regardless of its mean intensity) is reproduced across the different time-points. 24To cast it in a more useful form, 1-eigenratio is proposed as the temporal instability measure (TIM), which increases in value as the level of instability increases.If all entries of the correlation coefficient matrix were close to 1 (i.e., an ideal dataset), the eigendecomposition would provide λ 1 ≈ 1 and λ i ≥ 2 ≈ 0 and hence TIM would be ≈0.Conversely, if the correlation coefficient matrix was approximately the identity matrix, that is, the covariance of the signal between two different time-points was very small, one would have λ i ≈ 1 for all i 1, …, T f g.Hence, the eigenratio would be ≈ 1 = T and TIM would be nearly 1. Generically speaking, TIM >0 indicates some sort of instability in the time-series, be it thermal or systematic and regardless of whether it is intermittent or present throughout the experiment.
In the context of principal component analysis (PCA), the goal is a dimensionality reduction-in the present case describing the original data with T = 200 time-points with a linear combination of less than T components. 25For a fMRI dataset where the different EPI image volumes are nearly identical there would be a single eigenvalue ≈1 and all other eigenvalues ≈0 (i.e.TIM ≈ 0), which means that the entire experiment of 200 time-points could be adequately described with a single EPI volume.
Note that the eigenvalue decomposition of the correlation coefficient matrix in Eq. 2 is related to the singular value where U ℝ N ÂN and V ℝ T ÂT are unitary matrices and the diagonal entries of Σ ℝ N ÂT are the singular values. 26y plugging Eq. 4 into Eq. 2 gives where the fact that U is unitary (i.e.U t U ¼ 1) was used to simplify the expression.Thus, the squared singular values of Σ 2 in Eq. 5 are the eigenvalues of the correlation coefficient matrix, D, in Eq. 2.
Ratio of eigenvalues is also commonly used in statistical physics when calculating the correlation length of translationally invariant spin-spin correlation functions using transfer matrices. 27or a high-quality dataset that is expected from a modern 3 T MRI scanner, the eigenratio of the correlation coefficient matrix will be close to 1 (i.e.TIM will be close to zero).For easy practical implementation and useful comparisons, these values are provided in parts-per-million (ppm).
Although values closer to zero for TIM indicate a more stable measurement, because background noise is unavoidable, it would be unreasonable to expect that TIM could reach zero.Instead in a time-series measurement that is corrupted by random noise with a small amplitude but otherwise stable, one expects TIM to be low but strictly above zero.

Reference Temporal Instability Measure (TIM 0 )
To experimentally simulate data for a reference bound the signal of the ≈ 56,000 voxels was taken from the first EPI volume of each dataset and used to synthesize a realistic time-series by adding Gaussian noise to each of 200 copies of this signal.The standard deviation of the Gaussian was estimated from the noise slice (i.e. the 41st slice) of the same EPI volume.The correlation coefficient matrix and its eigendecomposition were calculated identically to that of the actual data but to emphasize that it is a simulated reference value TIM 0 was used to denote 1 minus its eigenratio.Finally, the difference between the temporal instability measure TIM and the reference bound TIM 0 is denoted by TIM Î for each dataset.The ^symbol is used to explicitly indicate that this value for the instability is an estimate, whose non-zero numerical value is only an indication that an instability, above that expected from stationary random noise alone, exists.However, its magnitude is only an approximate indicator of the magnitude of the instability because a linear disentanglement of purely systematic noise and purely random noise is not generally feasible.

Data Acquisition
This study involves a spherical gel phantom that was prepared in accordance with the FBIRN protocol and by following the recipe in Friedman and Glover. 19The data were collected in 20 sessions, 10 in the morning and 10 in the afternoon.In each session, three fMRI runs were collected on a 3 T Achieva System (Philips Healthcare, Best, Netherlands) scanner with both an 8ch and a 32ch coil from the same vendor, a total of 120 fMRI runs, 60 per coil.Each run included 200 time-points (in addition to five dummy scans) and utilized a 2D EPI sequence with 3-mm isotropic voxels, 41 slices, echo-time (TE)/repetition time (TR) = 30/3025 msec and flip angle = 90 . 11Forty of the 41 slices covered the phantom, while the 41st slice was positioned outside of the phantom, with an offset of 10 cm from the isocentre, to collect background noise.The coil order in each session was pseudo randomized.In addition to the 120 EPI time-series collected on the Philips scanner, 29 similar datasets were borrowed from two other sites (Cornell University and University College London).[30]

Data Analysis
Data analysis was performed with custom-made Matlab (version 2021a; The MathWorks, Inc., Natick, MA, USA) scripts and SPM12. 31First, each of the 120 + 29 time-series were rigid-body aligned to its first EPI volume and subsequently the voxel-wise signal evolution was detrended using second-degree polynomial regression.Next, based on the eigendecomposition of the correlation coefficient matrix, TIM, TIM 0 and TIM Î were calculated in a large, and automatically detected VOI covering almost the entire phantom and containing ≈ 56,000 voxels, in each of the 120 + 22 EPI data sets and ≈ 150,000 voxels in each of the seven Multiband datasets.
For comparison, the tSNR was calculated in a cubic VOI with a side length of 11 voxels approximately at the center of the phantom and RDC was calculated in a square ROI with the same side length, that was positioned in the slice that covered approximately the center of the phantom as prescribed by the original publication. 21

Simulation for Testing Sensitivity of QA Measures
To compare the sensitivity of the three different QA measures (tSNR, RDC and TIM) a realistic dataset with a tunable level of instability was simulated by the following procedure in Matlab: 1. Choose a good dataset, that is, a dataset with a low TIM and/or high tSNR.α according to the respective protocol, that is, in a large VOI encompassing almost the whole phantom for TIM, in a small VOI for tSNR and in a small ROI for RDC and plot the values as a function of α.
Since Δ x,y,z, t ð Þis computed through the original data itself, the resulting instability, that is, α*Δ, naturally provides a realistic spatio-temporally varying time-series in which the strength of the instability (i.e.signal variability that remains even after detrending the voxel time-series by a second-order polynomial) is controlled by the parameter α.

Statistical Analysis
All statistical analyses were performed in Matlab using various methods of nonparametric bootstrap resampling algorithms. 32Pvalues <0.05 were considered significant.
ESTIMATING RELIABILITY OF TIM.The reliability of the empirical determined TIM values were assessed by calculating a 95% confidence interval (CI) via resampling the individual centered and normalized reshaped EPI volumes, that is resampling the voxels while preserving the temporal structure and hence the intrinsic instability of the original EPI time-series.Each resampling step provides a new correlation coefficient matrix of which a TIM value is calculated (see Fig. 3).It is important to note that resampling, as described earlier, will jumble the voxels and therefore the resampled MRI image (and thus the undulating curve from Fig. 1c) will look different for each bootstrap re-sample.A total of 1000 bootstrap TIM samples were generated, and their 2.5% and 97.5% quantiles were estimated resulting in a 95% CI for the empirical TIM values.In addition, the confidence interval lengths in % units of their respective TIM values, denoted as L CI , and the resulting mean of L CI across all experiments were calculated.Because there exist 56,000!possible rearrangements of the voxels (i.e.any realistic bootstrap procedure can only resample a small fraction of all possibilities), bootstrap resampling was run twice for all datasets to assess the stability of L CI , where numbers in superscript (1), (2) will indicate the corresponding bootstrap run.

COMPARING 32CH AND 8CH COIL PERFORMANCE OF PHILIPS DATA.
To assess the stability of the EPI time-series of the 32ch and 8ch coils of the locally acquired datasets (excluding the four outliers), nonparametric bootstrap two-sample t-tests with 1000 resamples and test-statistic , where for the QA values (TIM, tSNR, and RDC) were performed. 32SESSING SPATIAL UNIFORMITY OF THE TEMPORAL INSTABILITY.To assess whether a spatially uniform scaling/ addition were present in the 149 data sets, which TIM would not be able to detect, three analyses were performed, based on the following equation: where i 1,…, N f g counts the voxels while k, l 1, …,T f grepresent two of the T EPI image volumes in the time-series.Evaluating the entire sum, across all the N voxels provides a mean scaled difference (also given in % units) for all possible unique pairs of EPI volumes in the time-series and can be displayed as an upper triangular matrix (i.e.l > k) for easy visualization.To quantify the average present spatially uniform addition/scaling across all experiments, a 95% FIGURE 4: TIM, TIM 0 , and TIM Î across all experiments.Top: Proposed instability measure with experimentally determined values TIM (red markers), reference bound TIM 0 (green markers) and the difference of both values, TIM Î (inset blue markers), for the 32ch coil.Bottom: Same measures but for the 8ch coil.The value of TIM 0 is determined by taking the signal from the first EPI volume (≈56,000 voxels) and adding normally distributed noise (with variance level estimated from the noise slice) to each of the 200 copies of this signal.TIM 0 is included to signify the practically attainable lowest TIM value-that is, one where both the background noise level and the signal level are stationary throughout the experiment.The black error bars on the red markers show the 95% confidence intervals computed via bootstrap (cf.Fig. 3). 32Note that the y-axis for all four plots is scaled in ppm (parts per million), and additionally broken for the TIM and TIM Î values of the 8ch coil between 1600 and 10,000 ppm and 1000 ppm and 9000 ppm, respectively.CI of the means of the upper triangular matrix C k,l ð Þ was estimated.

Results
Figure 2a,b provides a visual example of the correlation coefficient matrix for a good and bad dataset, respectively.The blue stripes in Fig. 2b clearly show that a disturbance occurred around the 60th time-point (see also Video S1, TIM = 10,780 ppm).In contrast, Fig. 2a provides the correlation coefficient matrix of a stable QA scan (TIM = 182 ppm).
Figure 4 shows TIM (red dots, mean TIM 8ch = 753.75ppm, mean Tim 32ch = 211.26ppm) including the 2.5% and 97.5% quantiles (vertical black lines, mean L (1) CI = 2.96%, mean L (2) CI = 2.99%), TIM 0 (green dots) and TIM Î (inset blue dots) for all of the 120 Philips QA scans (i.e. 60 for both a 32ch and an 8ch coil), with a higher stability for the 32ch coil (top) compared to the 8ch coil (bottom) (observed two-sample t TIM = 26.3).Although experiment #1 and #10-12 with the 8ch coil were all unstable, for the latter three the increase in TIM was accompanied by an increase in TIM 0 , but for experiment #1 TIM 0 remained close to 0. In Supplementary Video S1, the data from experiment 1 are shown, which correspondingly indicate that the spatially variable signal deviation imparts a different level of instability at the edge compared to the center of the phantom.
A comparison between the RDC and the TIM measure across the 120 Philips datasets is provided in Fig. 5.As outlined in the previous paragraph the 32ch coil performs better than the 8ch coil based on TIM (see also inset in red frame).It turns out that the other two measures are not able to separate the coils as effectively (observed two-sample t tSNR = À0.2,P tSNR = 0.58, observed two-sample t RDC = À6.2).Furthermore, when tSNR decreases from ≈180 to ≈120, RDC increases (i.e.indicating an improvement) from ≈2.2 to ≈3.1, while TIM increases from ≈300 ppm to ≈1800 ppm, that is, RDC fails to identify an apparent instability.
Figure 6 shows the mean scaled signal differences between pairs of EPI image volumes at different time-points (upper triangular matrices on the left) and additionally two representative histograms of the signal differences (middle histogram), that is, the numerator in Eq. 6 and of the scaled signal differences (rightmost histogram), that is, total summand in Eq. 6, for the 32ch (a) and 8ch (b) coils.In this particular comparison, the 8ch coil has a higher mean signal difference (μ = 5) than the 32ch coil (μ = 2.6), but both of them are small in comparison to the standard deviation of their respective histograms (σ 8 = 25, σ 32 = 16).The upper triangular matrices on the left are representative of all experiments, where the 95% CI of means of the upper triangular matrices were [À0.0079%,À0.0019%] and [À0.0355%,À0.0012%] for the 120 and 29 datasets, respectively.
Figure 7 provides the TIM results along with the 2.5% and 97.5% bootstrap quantiles (vertical colored lines, mean L (1) CI = 2.16%, mean L (2) CI = 2.19%) across the 29 additional experiments on the borrowed QA data and thereby demonstrates more general applicability (i.e. for other vendors and at other sites) of the proposed measure.
Figure 8 shows the simulation results for the three QA metrics of interest as a function of the simulated instability severity ranging from 0 to 2. Increasing the severity of the simulated instability from 0 to 2 decreases tSNR from 181 to 174 (decrease by ≈ 4%), increases TIM from 195 to 275 ppm (increase by ≈ 41%) and increases RDC nonsmoothly from 2.53 to 2.6 (increase by 3%), where the latter result provides an example for the inverse fallacy between RDC and tSNR (see Table 1).

Discussion
A new fMRI QA measure, which is based upon the eigenratio of a correlation coefficient matrix, was proposed and tested.In the present implementation, each entry of this matrix is a correlation between the signal of all the voxels at two different time-points in the time-series rather than the conventionally used correlation of different voxels across the entire length of the time-series.The measure has several advantages, which together make it a sensitive and reliable measure of fMRI data quality.
The voxel signal time-series is often modeled as separate independent components of signal, random noise and instability, the latter of which is assumed to affect all voxels similarly at any given time-point.Even if such uniformly scaled instability existed and could be modeled as a constant addition at some time-point, its level was found to be negligible in the 149 datasets included in this study.Moreover, such a model does not cover all scenarios adequately because in practice thermal noise and instability can manifest similarly and act simultaneously and in a dependent manner.For example, RF instability with a specific nonuniform spatial pattern may affect nearby or distant voxels differently.As such, observing the time-series of any given voxel in isolation it would be difficult, if not impossible, to disentangle such instability from a purely thermal noise source.It is also difficult to practically implement such a model.For example, the radius of decorrelation, RDC, relies on all time-points in a time-series to obtain a reference SNR, against which the temporal SNR is compared.First, it includes data with instability into the reference SNR measure and secondly signal variation is not the only source that can affect SNR.Rather, SNR could also change when the noise variance is not stationary throughout the experiment.Similarly, the method by Greve et al proposes to use two different flip angles to separate instability from noise-but this inherently confounds the repetition of the experiment and the instability it tries to separate. 33In other words, it must be assumed that the instability occurs identically in the two acquisitions.
Conversely, the proposed TIM uses a single time-series and in principle all voxels within the gel-phantom can be used so any instability, regardless of whether it is global or local can be detected.In principle, this extends the applicability of TIM to structured phantoms for quality control, such as the International Society of Magnetic Resonance Imaging/ National Institute of Standards and Technology (ISMRM/ NIST) system phantom. 34However, the ISMRM/NIST phantom is best suited for quantitative magnetic resonance imaging (qMRI) to measure T1 relaxation and applying a standard EPI sequence, that is usually used in fMRI, tends to result in signal dropouts and distortions which confound the results obtained by TIM and makes the phantom less suitable for fMRI quality control.Instead, one might rather rely on more specialized fMRI phantoms that were recently developed, such as the FUNSTAR phantom (https://www.goldstandardphantoms.com) or the BrainDancer dynamic phantom, where the latter induces T2* changes to artificially create resting-state brain signals. 35t is possible to integrate TIM into a variety of quality control protocols.A newly developed quality control tool was able to predict the image quality using a binary accept/exclude classifier with 64 input features with 76% accuracy. 36Future studies may incorporate TIM as an additional feature with wide applicability to improve the prediction accuracy of image quality from unseen sites and different vendors.
Note that TIM is a single measure that does not attempt to identify the source of instability (although when TIM 0 also increases for an experiment, it is likely that the background noise level is elevated).Rather, the recommendation is that when the reproducibility of the voxel signals during the experiment is found to be sub-optimal, the cause is investigated separately with specific tests.For example, subtraction of the consecutive images in the time-series can be a telling diagnostic.In the absence of any disturbance, the subtracted images should provide the leftover noise, while gradient instability tends to create variability in the Nyquist ghosts and RF instability can be suspected if the phantom itself is apparent in the subtraction.][39] An important criterion for TIM was that it had to be a widely applicable and practical QA measure that could be implemented at most fMRI sites-even clinical ones and those without more general access to hardware and software which is usually only attainable via special research agreements with the vendor.For example, this affects the estimation of the noise variance which was obtained from magnitude images.Better estimates could be extracted from the proper combination of the complex k-space data from the individual coil channels, but because that would limit the applicability of the method and because similar magnitude data are used for the statistical analysis of in-vivo fMRI data the decision was made to use the magnitude images.

Limitations
First, because the correlation coefficient is invariant to scalar multiplication and constant addition, TIM is not sensitive to instability in the time-series that manifests in a way as to affect each voxel identically.However, scrutiny of the data available here indicates that instability does not manifest in such uniform manner.Another aspect, that may be considered a limitation, concerns TIM 0 .Namely, although voxel intensity from the first EPI volume was taken as the "signal" and from the noise slice noise variance was estimated, in reality even the first time point is a sum of signal and some sort of error (thermal and/or systematic).However, in data with sufficiently high SNR the difference will be small.Furthermore, TIM 0 and TIM Î (=TIM À TIM 0 ) are used for visual comparison only.What is more informative is how the value of TIM changes among the experiments performed on different days or on different scanners (i.e. to identify outliers where something has gone wrong during data acquisition).Finally, the present work does not provide empirical results from in vivo data, because standard quality assurance protocols generally involve phantoms.Applying the proposed instability measure to in-vivo data would be of limited utility for scanner instability assessment due to the additional physiological noise contributions.

Conclusion
The proposed TIM is widely applicable and easy to insert into existing QA protocols that both research and clinical sites can perform and log for long-term performance checks.The processing pipelinewritten in Matlabis available upon request from the authors.

Figure 1 :
Figure 1: Background in support of the proposed temporal instability measure.(a) Correlation coefficient matrix for a 5 Â 5 Â 5 VOI placed in the center of the phantom from experiment #18 with the 32ch coil.The extracted VOI data were first reshaped into a matrix of size T Â 125,where T is the number of time points in the time-series and from this a correlation matrix was calculated for all possible pairs of voxels.This is the conventional approach where the correlation coefficient is calculated for the entire time-series.As such, time represents the sample and space (i.e. the different voxels) the variable.Note that the correlation within the slices is higher than between slices and adjacent voxels have the highest correlation (i.e.voxels 1 step away from the diagonal line or voxels five steps away, i.e. one step away in the other dimension after the reshape step in Matlab).(b) Demonstration of the spatial non-uniformity of the voxel signal from the phantom of experiment #19 with the 32ch coil.The red box in the center illustrates the usual placement of the ROI used for QA measures.Its magnification with adjusted windowing demonstrates that the voxel intensity is far from uniform inside the ROI.(c) Signal profile of voxel intensity from a large VOI (≈ 56,000 voxels) from the first time-point of experiment #8 with the 8ch coil reshaped into a 1D-vector.On the y-axis are arbitrary intensity units.The expectation is that in an ideal dataset this undulating curve is reproduced with high fidelity at all time-points.

FIGURE 2 :
FIGURE 2: Correlation coefficient matrix implemented as in the proposed instability measure.Each entry of this matrix is the correlation coefficient calculated across all voxels in the VOI (see Fig. 1c) for two time-points.As such, space represents the sample and time the variable.(a) Correlation coefficient matrix of a good dataset (experiment #5).(b) For comparison, the correlation coefficient matrix of a bad dataset (experiment #1) is also provided.Note that for the good data set, the windowing is different.

2 .
Denote D r x, y, z, t ð Þand D rd 2 x, y, z,t ð Þas the realigned and second-order detrended version of the original data, respectively.3. Separately compute D rd 3 x,y,z,t ð Þ , which is a third-order detrended version of the data.4. Compute the difference between D rd 2 x,y,z,t ð Þ and D rd 3 x,y,z, t ð Þand denote it as Δ x,y,z, t ð Þ , which represents the inherent instability that is unaccounted for by a second-order polynomial fit. 5. Define a parameter α 0, 2 ½ , controlling the level of the simulated instability, and create simulated data as D sim α x, y,z,t ð Þ¼ D rd 2 x, y,z,t ð Þþα*Δ x, y, z, t ð Þ .6. Compute the TIM, tSNR and RDC for D sim

FIGURE 3 :
FIGURE 3: Scheme of bootstrap resampling.Schematic representation for the generation of one bootstrap sample.The top half of the figure represents the ordered voxels of the original EPI volumes, where one volume contains about 56,000 individual voxels.The voxel signal intensities are indicated by small colored boxes arranged in rows.Resampling with replacement is performed on the individual voxels preserving the temporal structure of the EPI time-series of each voxel, that is, the colored boxes remain in the same horizontal order.One representative resample is shown in the bottom half of the figure.For each resample, a new realization of the voxels is achieved, a new correlation coefficient matrix is calculated and, consequently, a new TIM value is obtained and stored.From all resamples (here 1000) together we determine the 2.5% and 97.5% quantiles to find a 95% confidence interval for the empirical TIM.

FIGURE 5 :
FIGURE 5: TIM and RDC compared to mean tSNR.Top: TIM vs. tSNR scatter plot for both the 8ch and the 32ch coils of the local Philips data.The larger red box is a magnified representation of a subset of the data within the smaller red box to highlight the fact that TIM can separate the coils better than either RDC or tSNR.Note that the y-axis is scaled in ppm.Bottom: RDC vs. tSNR scatter plot for the same two coils.The tSNR was calculated in a 11 Â 11 Â 11 VOI and, as per the original publication, the RDC was calculated in a 11 Â 11 ROI placed in the center of the phantom.

FIGURE 6 :
FIGURE 6: Test for additive, multiplicative signal differences.(a) The mean scaled signal difference matrix (left) and corresponding histograms of voxel-wise additive (middle) and multiplicative (right) signal differences for the 32ch coil.(b) Same kind of plots for 8ch coil.Each entry of the upper triangular matrices is the mean scaled signal difference between any two EPI volumes in the time-series (cf.Eq. 6).The black arrows indicate the actual EPI volumes compared in the histograms, where the red lines show the respective mean and standard deviation of the distribution, and the black line marks zero.Note that the matrix entries and x-axes in the rightmost histograms are given in %.These results indicate that instability that effects each voxel similarly (in either additive or multiplicative fashion) is very small in amplitude, if it at all occurs.

FIGURE 7 :
FIGURE 7: TIM results from Cornell University and University College London.llustrative TIM results from two different sites, two different vendors and several head coils.Counting from the left on the horizontal axis, the first 15 experiments correspond to two different 3 T Siemens scanners (open circle and cross) at UCL using 20ch (red markers), 32ch (green markers) and 64ch (blue markers) head coils.The next seven experiments (green upside-down triangles) correspond to a 3 T GE scanner at Cornell University using a 32ch coil.The last seven experiments (black circles) correspond to a 3 T Siemens scanner at Cornell University using a 32ch coil and Multiband imaging sequence.The error bars for each experiment correspond to the 2.5% and 97.5% bootstrap quantiles.Note that the first 22 and last 7 experiments were scaled in ppm with respect to the left or right y-axis, respectively.The two colorcoded arrows point to the respective y-axis and the black line serves as a visual demarcation.Note that no TIM 0 values have been calculated since not all borrowed QA data included a noise slice.

FIGURE 8 :
FIGURE 8: Simulations for RDC, tSNR and TIM.From top to bottom: RDC, tSNR, and TIM values are given for simulated data as a function of the parameter α, which increases that amplitude of instability linearly.Note that the y-axis for the TIM values is scaled in ppm.The tSNR, RDC, and TIM values were calculated from data extracted from a 11 Â 11 Â 11 VOI, 11 Â 11 ROI or the whole phantom, respectively.

TABLE 1 .
RDC/TIM Assumptions and Limitations Assumes (does not check if) spherical symmetry is validDoes not assume spherical symmetry Not sensitive to sources of instability that leave the mean constant (eg rotation of RF field)