Assessment of measurement precision in single‐voxel spectroscopy at 7 T: Toward minimal detectable changes of metabolite concentrations in the human brain in vivo

Purpose:
To introduce a study design and statistical analysis framework to assess the repeatability, reproducibility, and minimal detectable changes (MDCs) of metabolite concentrations determined by in vivo MRS.

Results:
Compared with the HS pulse, the use of the gradient-modulated GOIA and WURST inversion pulses led to slightly improved repeatability, but overall reproducibility appeared to be limited by differences in positioning, calibration, and other day-to-day variations between sessions.

Conclusion:
A framework is introduced to estimate the precision of metabolite concentrations obtained by MRS in vivo, and the minimal detectable changes for 13 metabolite concentrations measured at 7 T using SPECIAL are obtained.

KEYWORDS
CRLBs, measurement precision, minimal detectable change, MR spectroscopy, reproducibility/repeatability, SPECIAL

| INTRODUCTION
In vivo ¹H MRS allows the non-invasive detection of metabolic changes in various organs. 1 In the brain, such changes are frequently associated with diseases of the central nervous system. In certain diseases, such as cancer, metabolite ratios suffice because they already differ substantially from those of healthy tissue. 2,3 However, in neurodegenerative diseases, such as Alzheimer's 4 or Parkinson's disease, 5 or in psychiatric disorders such as schizophrenia, 6 the metabolite differences between healthy and diseased tissue are more subtle, and it is crucial to obtain absolute metabolite concentrations. 7,8 Ultrahigh field (UHF) strengths offer increased SNR and enhanced spectral dispersion, which improves the distinction between overlapping metabolites. 9 Despite the potential of UHF-MRS to measure more metabolites more reliably, and thus to examine disease-related changes in metabolite concentrations, it has so far remained mostly a research tool focused on comparisons between patient and control cohorts, 10 usually without the option of an individual diagnosis based on MRS. Arguably, the most important reasons for this are that (1) the ranges of metabolite concentrations in healthy controls and patients often overlap substantially, and (2) no experimentally established measurement uncertainties are available for concentrations obtained by MRS. 11 In other experimental areas, SDs and measurement uncertainties are determined by repeating the same measurement several times and, ideally, comparing the result with a ground truth.
Although phantoms with known concentrations can be used for repeated MRS measurements and may serve as ground truth for in vitro measurements, they fail to mimic some technical aspects of in vivo acquisitions, the complex interplay between measured metabolite concentrations, tissue structure, and partial-volume effects, and differences in the microenvironment of biological tissues. Moreover, in a clinical setting it is usually impossible to repeat high-quality in vivo MRS measurements often enough to estimate the SD properly. Hence, Cramér-Rao lower bounds (CRLBs), as suggested by Cavassila et al, 12,13 are commonly used to estimate the reliability of measured metabolite concentrations.
CRLBs increase when the SNR is reduced by T2 relaxation and when spectral signatures become complex through long J-coupling evolution; to minimize them, short TEs (≤ 10 ms) are desirable for single-voxel spectroscopy (SVS) at UHF. 14,15 Short TEs facilitate the quantification of metabolites such as glutamine, glutamate, and γ-aminobutyric acid. 16 Furthermore, the RF pulses used for localization should ideally be insensitive to B1+ and B0 inhomogeneities to ensure proper localization. They should also yield a small chemical shift displacement (CSD), so that signals from different metabolites originate from the same location. Finally, low pulse energy and peak power are desirable to avoid limitations related to the specific absorption rate (SAR) or hardware capabilities.
Short TEs and high SNR in SVS can be achieved with the spin-echo, full-intensity acquired localized (SPECIAL) 17,18 spectroscopy sequence, in which the adiabatic inversion pulse additionally provides a comparably low CSD and low sensitivity to B1+. 19 However, the hyperbolic secant (HS) 20 pulses proposed for the pre-excitation inversion in the original implementation of SPECIAL with a surface coil, and used in several subsequent studies with this sequence, 21-26 require a relatively high peak power to reach adiabaticity compared with other adiabatic pulse types. In UHF applications in deeper brain structures, such as the hippocampus, this may require either increasing the pulse duration, which reduces the bandwidth (BW) and hence increases the CSD, or operating at the limit of the available peak power, which leads to a loss of adiabaticity and hence to incomplete inversion and unwanted signal loss.
More recently, gradient-modulated pulses, such as the gradient offset independent adiabaticity (GOIA) 27 pulse and the wideband, uniform-rate, smooth truncation (WURST) 28 pulse, have been used in MRS studies. 29 Compared with an HS pulse 29,30 of equal duration and total pulse energy, these pulses provide a substantially decreased CSD and a sharper pulse profile while reducing the peak power required for the same inversion BW. Since peak power is often a limiting factor, gradient-modulated pulses are a commonly used alternative. Furthermore, in combination with slight imperfections of positioning and B0 shimming, the decreased CSD and the sharper pulse profile can be expected to have a positive effect on the measurement precision.
Nevertheless, even if all of the above conditions are met and the CRLBs are minimized for a given in vivo measurement, CRLBs still fail to provide information on the precision of metabolite concentrations measured with a given experimental setup. 31 The present work therefore aims to assess the repeatability and the reproducibility of in vivo metabolite concentrations obtained by SPECIAL MRS at 7 T, as well as the impact of different adiabatic inversion pulses within SPECIAL thereon. SPECIAL with a nonadiabatic refocusing pulse was chosen for its ability to yield a very short TE (≤ 10 ms). The results are then used to derive the measurement precision for the concentrations of 13 metabolites in the human brain; unlike CRLBs, this precision accounts not only for the lower bound of the SD of the model fit but also for instrumental and operational influences on the spectral data. Moreover, the minimal detectable change (MDC) 32 of each of these metabolites in vivo is determined for the experimental setup used.

| METHODS

| Inversion pulse implementation
An HS, a GOIA, and a WURST pulse were designed 20,27,28 in MATLAB (The MathWorks, Natick, MA) with identical pulse duration, inversion slice thickness, and pulse energy. Bloch simulations with varying B1+ and B0 were performed. 33 The three pulses were then incorporated into the SPECIAL sequence, resulting in three SPECIAL versions with otherwise identical scan parameters and timings, which are referred to as HS-SPECIAL, GOIA-SPECIAL, and WURST-SPECIAL throughout this paper. The resulting pulse sequence scheme is shown in Figure 1A.
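To make the Bloch-simulation step concrete, the following Python sketch computes the longitudinal magnetization after an HS1 inversion pulse for isochromats at different frequency offsets, ignoring relaxation. The pulse duration, truncation factor β, B1 amplitude, and the 1.2 kHz sweep width are illustrative assumptions (the actual parameters are listed in Table 2 and are not reproduced here), the function names are hypothetical, and GOIA and WURST would additionally require the gradient modulation, which this sketch does not cover. The optional `b1_scale` argument mimics a B1+ deviation.

```python
import numpy as np


def hs1_pulse(duration_s=10e-3, n_steps=2000, b1max_hz=1000.0,
              bw_hz=1200.0, beta=5.3):
    """Sampled HS1 pulse: returns the time step, the B1 amplitude (Hz) and
    the instantaneous frequency sweep (Hz). All defaults are illustrative."""
    tau = np.linspace(-1.0, 1.0, n_steps)          # normalized time
    b1_hz = b1max_hz / np.cosh(beta * tau)          # sech amplitude modulation
    sweep_hz = 0.5 * bw_hz * np.tanh(beta * tau)    # tanh frequency modulation
    return duration_s / n_steps, b1_hz, sweep_hz


def rotate(m, axis, angle):
    """Rotate vector m about the unit vector axis (Rodrigues' formula)."""
    return (m * np.cos(angle)
            + np.cross(axis, m) * np.sin(angle)
            + axis * np.dot(axis, m) * (1.0 - np.cos(angle)))


def final_mz(offset_hz, dt, b1_hz, sweep_hz, b1_scale=1.0):
    """Mz after the pulse for one isochromat (no relaxation).

    Works in the frame rotating at the instantaneous pulse frequency, where
    B1 lies along x and the longitudinal field is offset - sweep."""
    m = np.array([0.0, 0.0, 1.0])
    for b1, sweep in zip(b1_scale * b1_hz, sweep_hz):
        field = np.array([b1, 0.0, offset_hz - sweep])  # effective field (Hz)
        norm = np.linalg.norm(field)
        if norm > 0.0:
            # Precession about the effective field, constant within one step
            m = rotate(m, field / norm, -2.0 * np.pi * norm * dt)
    return m[2]


if __name__ == "__main__":
    dt, b1, sweep = hs1_pulse()
    for offset in (-2000.0, -300.0, 0.0, 300.0, 5000.0):
        print(f"{offset:8.0f} Hz  Mz = {final_mz(offset, dt, b1, sweep):+.3f}")
```

Isochromats inside the swept band end up near Mz = −1 and those far outside return to Mz ≈ +1; scanning `b1_scale` below 1 shows where the adiabatic condition starts to fail.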

| MR protocol and data acquisition
All experiments were performed on a 7T scanner (MAGNETOM 7T, Siemens Healthineers, Erlangen, Germany) using a head coil with a birdcage transmitter and 32 receive channels (NOVA Medical Inc., Wilmington, USA). Phantom measurements were performed to assess the performance of the three SPECIAL variants without biological or physiological noise, as described in Supporting Information Figure S1 and Table S1.

| In vivo experiments
Nine healthy volunteers (aged 39 ± 13 years; 1:7:1 male:female:nonbinary) were scanned after giving written informed consent according to local ethical regulations to assess the impact of the different adiabatic pulses on the variance of in vivo measurements. To this end, an unbalanced nested study design was chosen, as shown in Figure 1C. Each volunteer was scanned in two sessions on two different days approximately one week apart (6 to 8 days). Each session consisted of two measurements (M1 and M2 in session one, M3 and M4 in session two), each comprising SPECIAL acquisitions with the HS, GOIA, and WURST adiabatic inversion pulses. In session one, the volunteer was repositioned between M1 and M2, whereas in session two, M3 and M4 were acquired without repositioning. Because the ethical regulations specify a maximum time of 90 minutes per scan block, the repeatability measurements in session two were split into two scan blocks: two SPECIAL versions (e.g., HS-SPECIAL and GOIA-SPECIAL) were investigated in the first scan block and the third (e.g., WURST-SPECIAL) in the second. Note that the subject was not repositioned between the repeatability measurements of the same SPECIAL version. A schematic overview of the measurements, scan blocks, and sessions is shown in Figure 1C,D. The order of the SPECIAL versions within the scan blocks was cyclically permuted among volunteers so that the performance of the pulses is not biased by the acquisition time point within the protocol, for example, by an increased likelihood of volunteer movement toward the end of each scan block.
With this design, three scenarios can be distinguished: (1) the repeatability 34 (R0), which refers to two consecutive measurements without repositioning the subject; (2) the reproducibility 34 between two measurements performed on the same day, including repositioning and new calibration (R1,M, for minutes in between); and (3) the reproducibility between two measurements approximately one week apart (R1,W, for a week in between). If the reproducibility scenarios could not be assessed individually, the index was extended by a 'c' for combined.
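The nested design can be summarized as a mapping from each scenario to the pair of measurements that is compared, together with a cyclic permutation of the pulse order across volunteers. The Python sketch below encodes the pairs given in the Figure 1 caption (R0: M3 vs M4; R1,M: M1 vs M2; R1,W: M1 vs M3); the direction of the cyclic shift and all helper names are illustrative assumptions, not the authors' scheduling code.

```python
from typing import Dict, List, Tuple

PULSES: List[str] = ["HS", "GOIA", "WURST"]

# Measurement pairs compared in each scenario of the nested design
SCENARIO_PAIRS: Dict[str, Tuple[str, str]] = {
    "R0": ("M3", "M4"),    # repeatability: no repositioning
    "R1,M": ("M1", "M2"),  # reproducibility: repositioning within a session
    "R1,W": ("M1", "M3"),  # reproducibility: sessions about one week apart
}


def pulse_order(volunteer_index: int) -> List[str]:
    """Cyclically permuted pulse order for one volunteer (hypothetical scheme)."""
    k = volunteer_index % len(PULSES)
    return PULSES[k:] + PULSES[:k]


def paired_values(values: Dict[str, float], scenario: str) -> Tuple[float, float]:
    """Return the two measurements of one subject/pulse compared in a scenario."""
    first, second = SCENARIO_PAIRS[scenario]
    return values[first], values[second]


for v in range(3):
    print(f"volunteer {v + 1}: {pulse_order(v)}")
```

With nine volunteers, this scheme places each pulse at each position of the scan block exactly three times, which is what removes the systematic time-point bias described above.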
The protocol within each scan block was as follows. MP2RAGE 35 images (TE = 2.51 ms, TR = 5000 ms, TI = 900 ms, isotropic voxel size = 0.75 mm) were acquired for voxel positioning and tissue segmentation. The volume of interest (VOI) was placed in the posterior cingulate cortex (PCC) midway between both hemispheres and angulated in the sagittal plane so that its lower edge coincided with the virtual line between the corpus callosum and the outer end of the parieto-occipital fissure, as illustrated in Figure 1B. A voxel-based B1+ adjustment was performed by varying the pulse voltage and fitting the resulting amplitudes of the water peak to determine the voltage required for a 90° flip angle. First- and second-order B0 shim settings were optimized for the VOI using a B0 map (3 mm isotropic resolution, TE1 = 6.02 ms, TE2 = 7.04 ms, TR = 620 ms) and a MATLAB-based shim tool. 36,37 Single-voxel spectra were then acquired with the respective SPECIAL versions using the following scan parameters: VOI = (20 mm)³, TE = 9 ms, TR = 6500 ms, number of averages = 64, spectral width = 4 kHz, delta frequency = −2.3 ppm, and vector size = 2048. To localize the VOI along two dimensions, an asymmetric RF excitation pulse 38 (duration 1.28 ms, BW 5.3 kHz) and a Mao refocusing pulse 39 (duration 3.2 ms, BW 1.8 kHz) were used to form a spin echo. An interleaved water-suppression (WS) and 3D outer-volume saturation (OVS) scheme 40 was used. The OVS bands were individually adapted for each volunteer by placing them 5 mm from the VOI and covering the rest of the volunteer's head. After every metabolite spectrum acquisition, a reference measurement with four averages without water suppression was performed with the respective SPECIAL version.
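The voltage calibration can be illustrated as a least-squares fit of the water amplitude to A·|sin(πV/(2V90))|, from which the 90° voltage V90 follows directly. This numpy sketch assumes a purely sinusoidal dependence of the water amplitude on the flip angle and uses a simple grid search; the scanner's actual adjustment routine, the voltage range, and the function name `fit_v90` are assumptions for illustration.

```python
import numpy as np


def fit_v90(voltages, amplitudes, v90_grid=None):
    """Grid-search fit of amplitudes ~ A * |sin(pi * V / (2 * V90))|.

    For each candidate V90, the best scale factor A has a closed-form
    least-squares solution; the candidate with the smallest residual wins."""
    v = np.asarray(voltages, dtype=float)
    y = np.asarray(amplitudes, dtype=float)
    if v90_grid is None:
        v90_grid = np.arange(50.0, 500.5, 0.5)  # assumed search range in volts
    best_v90, best_res = None, np.inf
    for v90 in v90_grid:
        s = np.abs(np.sin(np.pi * v / (2.0 * v90)))  # model shape
        a = np.dot(s, y) / np.dot(s, s)              # optimal scale factor
        res = np.sum((y - a * s) ** 2)
        if res < best_res:
            best_v90, best_res = v90, res
    return best_v90


# Synthetic calibration series: true 90-degree voltage of 200 V plus noise
rng = np.random.default_rng(0)
volts = np.arange(50.0, 351.0, 25.0)
amps = 100.0 * np.abs(np.sin(np.pi * volts / 400.0)) + rng.normal(0.0, 0.5, volts.size)
print(f"estimated V90 = {fit_v90(volts, amps):.1f} V")
```

Keeping the sampled voltages below 2·V90 avoids ambiguities from the periodicity of the sine; a finer grid or a local refinement would be needed if sub-volt precision mattered.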

| Spectral postprocessing
Spectra were post-processed with an in-house MATLAB tool. First, the even and odd transients, which were acquired with a 180° shift of the receiver phase, were summed pairwise to obtain the full localization. Then, weighted and phase-corrected coil-element combination, 41 frequency correction based on the N-acetylaspartate (NAA) peak at approximately 2 ppm, and averaging were performed. Spectral quality was assessed for each SPECIAL version and all subjects, both qualitatively by visual inspection and quantitatively by calculating the linewidth and the SNR of the unsuppressed water peak.
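Two of these steps can be sketched in a few lines of numpy: a weighted, phase-corrected coil combination and a frequency correction that shifts each average so that a reference peak sits at a target frequency. Weighting each coil by the magnitude of its first FID point divided by the noise variance, and locating the peak by the maximum of the magnitude spectrum within a search window, are common choices assumed here for illustration; the in-house MATLAB tool and the method of reference 41 may use different estimators, and the function names are hypothetical.

```python
import numpy as np


def combine_coils(fids, noise_std):
    """Weighted, phase-corrected coil combination (assumed weighting scheme).

    fids: complex array (n_coils, n_points); noise_std: per-coil noise SD.
    Each coil is phased by its first FID point and weighted by |s(0)| / sigma^2."""
    first = fids[:, 0]
    weights = np.abs(first) / np.asarray(noise_std, dtype=float) ** 2
    phase_corr = np.exp(-1j * np.angle(first))
    combined = np.sum((weights * phase_corr)[:, None] * fids, axis=0)
    return combined / np.sum(weights)


def align_to_peak(fids, dwell_s, target_hz, search_hz):
    """Shift each FID so that the largest peak within target_hz +/- search_hz
    moves to target_hz (precision limited to one spectral bin)."""
    n = fids.shape[1]
    t = np.arange(n) * dwell_s
    freqs = np.fft.fftshift(np.fft.fftfreq(n, d=dwell_s))
    window = np.abs(freqs - target_hz) <= search_hz
    aligned = np.empty_like(fids)
    for k, fid in enumerate(fids):
        spectrum = np.abs(np.fft.fftshift(np.fft.fft(fid)))
        peak_hz = freqs[window][np.argmax(spectrum[window])]
        aligned[k] = fid * np.exp(-2j * np.pi * (peak_hz - target_hz) * t)
    return aligned


# Synthetic check with 4 kHz spectral width and 2048 points
n, dwell = 2048, 1.0 / 4000.0
t = np.arange(n) * dwell
res = 1.0 / (n * dwell)                       # spectral resolution in Hz
target = 100 * res                            # hypothetical reference position
averages = np.array([np.exp(2j * np.pi * (target + k * res) * t - t / 0.05)
                     for k in (-3, 0, 2)])    # three drifting averages
aligned = align_to_peak(averages, dwell, target, search_hz=20.0)
mean_fid = aligned.mean(axis=0)               # averaging after alignment
```

Aligning before averaging prevents frequency drift from broadening the lines; sub-bin accuracy would require zero-filling or a model fit of the reference peak.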

| Metabolite quantification
The data sets were quantitatively analyzed using LCModel 42 in the range of 0.2 to 4.2 ppm. A basis set for LCModel fitting containing signatures of alanine, aspartate (Asp), ascorbate, the sum of glycerophosphocholine and phosphocholine (total choline, tCho), the sum of creatine and phosphocreatine (total creatine, tCr), γ-aminobutyric acid (GABA), glucose, glutamine (Gln), glutamate (Glu), glutathione (GSH), myo-inositol (Ins), lactate (Lac), NAA, N-acetylaspartylglutamate (NAAG), phosphoethanolamine (PE), scyllo-inositol, and taurine (Tau) was simulated in Vespa. 43 The macromolecules were modeled by one basis function derived in-house from metabolite-nulled in vivo acquisitions in healthy volunteers using the SPECIAL sequence, as recommended by a recent consensus paper. 44 The water signal was used as an internal standard to calculate concentration values.

| Segmentation and tissue fraction correction
To compare the measured metabolite concentrations within each subject and among all subjects, the MP2RAGE images for every session and every volunteer were segmented into cerebrospinal fluid (CSF), gray matter (GM), and white matter (WM) with SPM12. 45 Then, an in-house Python 46 tool was used to determine the GM, WM, and CSF fractions of the voxel. The LCModel output concentration was corrected to $c^*_{i,j,m}$ to take the volunteer-specific and session-specific CSF fraction as well as relaxation processes 47 into account, where $c_{i,j,m}$ describes the concentration of metabolite m for volunteer i and session j, $T_{1/2,m}$ indicates the metabolite-specific relaxation times, and $f_{CSF,i,j}$ indicates the CSF fraction for volunteer i and session j. The relaxation correction assumes that most of the voxel contains GM; therefore, only the GM relaxation times from the investigated brain region are considered. However, the GM and WM tissue fractions are taken into account for the T2 effects of tissue water, which were used to obtain the attenuation factor for the water scaling performed in LCModel. T2 relaxation times of 45 ms and 37 ms were used for GM and WM, respectively, 48 leading to a water attenuation factor of 0.8111. Voxel-position reproducibility was assessed by calculating the CSF, GM, and WM fractions for all four sessions and by determining the intra-subject coefficient of variation (CV) of the CSF fraction. In addition, an in-house Python tool was used to determine the voxel overlap between two sessions by co-registration of the two MP2RAGE images. A flow chart describing all postprocessing steps, the resulting data, and the resulting analyses can be found in Figure 1E.

FIGURE 1 A, Pulse sequence diagram of the spin-echo, full-intensity acquired localized (SPECIAL) sequence with different adiabatic inversion pulses: hyperbolic secant (HS) 20 (blue), gradient offset independent adiabaticity (GOIA) 19 (orange), and wideband, uniform-rate, smooth truncation (WURST) 28 (green). B, Exemplary voxel position in the posterior cingulate cortex (PCC). The turquoise line indicates the connection between the lower edge of the corpus callosum and the outer edge of the parieto-occipital fissure. C, Unbalanced nested study design performed for every pulse sequence variant. The subject-wise between-session reproducibility (R1,W; M1 and M3), the between-positioning reproducibility (R1,M; M1 and M2), and the repeatability (R0; M3 and M4) were assessed. D, Scan scheme (exemplary for the first three volunteers): On the first day, in the first session (M1), SPECIAL with HS, GOIA, and WURST was measured. After repositioning the volunteer (M2), the sequences were measured in the same order as in M1. On the second day (i.e., one week later), in the first scan block, HS-SPECIAL and GOIA-SPECIAL were measured twice without repositioning (M3 and M4). Then, the SPECIAL sequence using the WURST pulse was measured twice without repositioning. Note that the repeatability measurements were split into two scan blocks due to time restrictions in our ethical regulations. E, Flow chart of the analysis steps. The blue boxes refer to the measured or post-processed data, while the purple boxes indicate the resulting data, which were then used for different statistical analyses (indicated by the orange boxes). The performed processing steps are listed in the green boxes. Abbreviations: BA, Bland-Altman; CRLB, Cramér-Rao lower bound; GM, gray matter; OVS, outer-volume saturation; VOI, volume of interest; WM, white matter; WS, water suppression
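The tissue-water T2 weighting and the CSF-fraction CV described above can be written out explicitly. The Python sketch below assumes that the water attenuation factor is the tissue-fraction-weighted mean of exp(−TE/T2) over GM and WM, using TE = 9 ms and the T2 values of 45 ms and 37 ms, and that the intra-subject CV is the sample SD over the four sessions divided by their mean. Both expressions are assumptions, since the exact formulas used in the paper are not given in this section, and the CSF fractions in the example are hypothetical.

```python
import numpy as np


def water_t2_attenuation(te_ms, f_gm, f_wm, t2_gm_ms=45.0, t2_wm_ms=37.0):
    """Tissue-weighted T2 attenuation of water (assumed form).
    GM and WM fractions are renormalized to the tissue (non-CSF) part."""
    tissue = f_gm + f_wm
    return (f_gm * np.exp(-te_ms / t2_gm_ms)
            + f_wm * np.exp(-te_ms / t2_wm_ms)) / tissue


def intra_subject_cv(values):
    """Coefficient of variation (%) across repeated measurements of one
    subject, using the sample SD (ddof=1)."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()


print(f"pure GM: {water_t2_attenuation(9.0, 1.0, 0.0):.4f}")
print(f"pure WM: {water_t2_attenuation(9.0, 0.0, 1.0):.4f}")
# Hypothetical CSF fractions of one volunteer in the four sessions
print(f"CSF-fraction CV: {intra_subject_cv([0.10, 0.11, 0.09, 0.10]):.1f} %")
```

For pure GM this form gives exp(−9/45) ≈ 0.819 and for pure WM exp(−9/37) ≈ 0.784, so the factor of 0.8111 quoted above lies between these two limits.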

| Statistical analysis
Intra-subject CVs were calculated for each inversion RF pulse type, subject, and metabolite 49 to assess the test-retest reproducibility quantitatively. Statistical differences in the paired subject means between the three SPECIAL variants were determined for metabolite concentrations, CRLBs, and CVs with a non-parametric Wilcoxon signed-rank test. 50 To account for multiple comparisons (Bonferroni correction), the significance level was lowered from p < .05 to p < .001.
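For a sample as small as nine paired subject means, the Wilcoxon signed-rank test can be computed exactly by enumerating all 2^n sign assignments. The following numpy sketch does this, using average ranks for ties and discarding zero differences, and applies a Bonferroni-adjusted threshold α/m. The number of comparisons m and the example concentrations are hypothetical placeholders, because this section states only the resulting level; in practice, a library routine such as `scipy.stats.wilcoxon` would typically be used instead.

```python
import itertools

import numpy as np


def average_ranks(values):
    """Ranks 1..n, with tied values receiving the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(values.size)
    i = 0
    while i < values.size:
        j = i
        while j + 1 < values.size and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0  # mean of ranks i+1..j+1
        i = j + 1
    return ranks


def wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples x and y.
    Zero differences are discarded; returns (W+, p)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    d = d[d != 0.0]
    ranks = average_ranks(np.abs(d))
    w_plus = ranks[d > 0].sum()
    centre = ranks.sum() / 2.0          # expected W+ under the null hypothesis
    sign_patterns = list(itertools.product((False, True), repeat=d.size))
    extreme = 0
    for signs in sign_patterns:
        w = ranks[np.array(signs, dtype=bool)].sum()
        if abs(w - centre) >= abs(w_plus - centre) - 1e-12:
            extreme += 1
    return w_plus, extreme / len(sign_patterns)


def bonferroni_alpha(alpha, n_comparisons):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical paired subject means (mM) for nine volunteers
hs = np.array([7.9, 8.1, 7.6, 8.4, 8.0, 7.7, 8.2, 7.8, 8.3])
goia = hs + np.array([0.30, 0.20, 0.40, 0.10, 0.25, 0.35, 0.15, 0.45, 0.05])
w_plus, p_value = wilcoxon_exact(goia, hs)
print(f"W+ = {w_plus}, exact two-sided p = {p_value:.5f}")
print(f"Bonferroni level for a hypothetical m = 50: {bonferroni_alpha(0.05, 50):.4f}")
```

Exact enumeration is cheap for n = 9 (512 sign patterns) and avoids the normal approximation, which is poor for such small samples.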
Table 1 summarizes the different SDs determined in this work, which are explained in the following.
Bland-Altman plots 51 of the spectral shape were calculated as follows. First, the real parts of the two compared spectra, $x_{i,M_a}(f)$ and $x_{i,M_b}(f)$, were normalized to the intensity of the NAA peak. Then, to generate the y-value $BA_{i,y}$ of one data point for subject i in the Bland-Altman plots, the absolute real spectral intensities of the compared spectra were subtracted at each frequency f, and the result was integrated over the frequency range from $f_{min}$ = 0.8 ppm to $f_{max}$ = 4.2 ppm:

$$BA_{i,y} = \int_{f_{\min}}^{f_{\max}} \left( \left| x_{i,M_a}(f) \right| - \left| x_{i,M_b}(f) \right| \right) \mathrm{d}f \qquad (2)$$

The x-value $BA_{i,x}$ was calculated as the integral of the absolute value of the averaged real parts of the compared spectra over the same frequency range:

$$BA_{i,x} = \int_{f_{\min}}^{f_{\max}} \left| \frac{x_{i,M_a}(f) + x_{i,M_b}(f)}{2} \right| \mathrm{d}f \qquad (3)$$

The SDs $\sigma^{S}_{R_0}$, $\sigma^{S}_{R_{1,Mc}}$, and $\sigma^{S}_{R_{1,Wc}}$ of the points in the Bland-Altman plots give a measure of the precision of the spectral shape within the scenarios R0, R1,Mc, and R1,Wc, respectively. The reproducibility SDs $\sigma^{S}_{R_{1,Mc}}$ and $\sigma^{S}_{R_{1,Wc}}$ could only be derived as a combined effect, however, as the data quality did not permit a robust separation of these contributions.

TABLE 1 Explanation of the symbols for the different SDs, σ, determined throughout this work

| Index | Symbol | Explanation |
|---|---|---|
| Method (upper index) | S | SD of the Bland-Altman plots derived from the analysis of the spectral shape as described by Equations (2) and (3) |
| | BA | SD of the Bland-Altman plots of the corrected metabolite concentrations, $c^*_m$ |
| | REML | SD obtained by REML analysis of the corrected metabolite concentrations, $c^*_m$ |
| Scenario (lower index) | R0 | Repeatability scenario: two measurements performed consecutively without any repositioning or recalibration within one scan block |
| | R1,M | Reproducibility scenario "minutes between measurements": two measurements performed with repositioning and recalibration within one session |
| | R1,W | Reproducibility scenario "week between measurements": two measurements performed in two sessions approximately one week apart |

Note: All SDs, σ, follow the same pattern, with the upper index indicating the method used to determine the respective SD and the lower index describing the scenario for which it was calculated. A 'c' in the scenario index indicates that the variance components of the scenarios are combined rather than considered individually. For the Bland-Altman plots of both the spectral shape and the concentrations, the components cannot be assessed separately, only combined.

The SDs of the metabolite concentrations for each scenario were also determined by a Bland-Altman analysis ($\sigma^{BA}$), in which the difference between the concentrations obtained from two measurements was plotted against their arithmetic mean. Furthermore, to quantify the measurement precision of individual metabolite concentrations obtained with the different SPECIAL variants, variance components of the metabolite concentrations were extracted separately for each metabolite/pulse combination using a restricted maximum likelihood (REML) 52 analysis, carried out in R version 3.6.3 53 with the nlme package. 54 The statistical model used for the variance component extraction was

$$c^*_m = \mu_m + S + P + \varepsilon_{R_{1,W}} + \varepsilon_{R_{1,M}} + \varepsilon_{R_0} \qquad (4)$$

where $c^*_m$ is the relaxation-corrected concentration of metabolite m; $\mu_m$ is the general mean of the concentration of metabolite m for each inversion RF pulse type; S is the subject effect; P is the effect of the particular inversion pulse (HS, GOIA, or WURST); and the three $\varepsilon_{R_x}$ terms are random effects, each with zero mean and variance $(\sigma^{REML}_{R_x})^2$, for the scenarios R0, R1,M, and R1,W. Pulse and subject are treated as fixed effects in this analysis, as subject and pulse effect comparisons are usually the target measure of a study, whereas the variances only need to be taken into account when interpreting the certainty of the results. All variances were assumed constant for each metabolite/pulse combination; the REML fit additionally assumed normality of the random effects and restricted the estimated SDs to non-negative values. Because each variance estimate in a nested design has some impact on the next level 'up', the analysis and interpretation will be restricted to the within-group SDs of the repeatability $\sigma^{REML}_{R_0}$
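Equations (2) and (3) translate into a few lines of numpy. The sketch below assumes that both spectra are already real-valued, NAA-normalized, and sampled on a common ppm axis, integrates with the trapezoidal rule, and takes the sample SD (ddof=1) of the y-values as σ for a scenario, with limits of agreement at mean ± 1.96 SD. The function names, the integration scheme, and the synthetic NAA-like line are illustrative assumptions.

```python
import numpy as np


def _integrate(y, x):
    """Trapezoidal integral of y over an ascending axis x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def spectral_shape_point(ppm, spec_a, spec_b, f_min=0.8, f_max=4.2):
    """One Bland-Altman point (x, y) following Equations (2) and (3).

    spec_a, spec_b: real parts of two NAA-normalized spectra on the same axis."""
    ppm = np.asarray(ppm, dtype=float)
    order = np.argsort(ppm)                        # ensure an ascending axis
    ppm = ppm[order]
    a = np.asarray(spec_a, dtype=float)[order]
    b = np.asarray(spec_b, dtype=float)[order]
    mask = (ppm >= f_min) & (ppm <= f_max)
    y = _integrate(np.abs(a[mask]) - np.abs(b[mask]), ppm[mask])   # Eq. (2)
    x = _integrate(np.abs(0.5 * (a[mask] + b[mask])), ppm[mask])   # Eq. (3)
    return x, y


def limits_of_agreement(y_values):
    """Mean difference, sample SD, and mean +/- 1.96 SD of Bland-Altman y-values."""
    y_values = np.asarray(y_values, dtype=float)
    mean, sd = y_values.mean(), y_values.std(ddof=1)
    return mean, sd, (mean - 1.96 * sd, mean + 1.96 * sd)


ppm = np.linspace(4.5, 0.5, 2001)                  # descending, as in spectra
naa = 1.0 / (1.0 + ((ppm - 2.01) / 0.02) ** 2)     # synthetic NAA-like line
x_val, y_val = spectral_shape_point(ppm, naa, 0.95 * naa)
print(f"x = {x_val:.4f}, y = {y_val:.5f}")
```

Because Equation (2) integrates over the whole 0.8 to 4.2 ppm range, any change in line shape, baseline, or residual water contributes to y regardless of how a quantification model would attribute it.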

| RESULTS

| Inversion pulse implementation
Details of the pulse parameters can be found in Table 2. The magnitudes, phases, and slice-selective gradients of the three pulses are shown in Figure 2A-C. The pulse duration, inversion slice thickness, and total pulse energy were fixed and chosen such that the following conditions were met for all three pulses: (1) the adiabatic condition was fulfilled with a reasonable safety margin without reaching peak-power limitations (shown in Figure 2H-J); and (2) the inversion BW was at least 1.2 kHz.
Bloch simulations (Figure 2D-J) demonstrate that, for fixed pulse duration, FWHM of the slice thickness, and energy, the BW of the gradient-modulated pulses is about 10 times higher than that of the HS pulse. This leads to a substantial reduction in CSD (Figure 2E-G), while the maximum RF amplitude is reduced by 33 % (Figure 2A). Scaled with the transmitter reference voltage calibrated in the in vivo measurements, this resulted in a difference in peak voltage of about 100 V between the HS and the gradient-modulated pulses, whereas the CSD of the gradient-modulated pulses was reduced by 90 % compared with HS (Table 2).
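The link between the roughly tenfold BW increase and the 90 % CSD reduction follows from the standard relation for slice-selective pulses: a frequency offset Δf shifts the slice by Δx = d·Δf/BW, where d is the slice thickness. The Python sketch below evaluates this for an illustrative offset of 2.7 ppm (roughly the water-NAA separation) at a nominal 7 T proton frequency of 42.577 MHz/T × 7 T, a 20 mm slice, and BWs of 1.2 kHz and 12 kHz; these numbers are assumptions chosen for illustration, not the exact pulse parameters in Table 2.

```python
GAMMA_H_MHZ_PER_T = 42.577  # proton gyromagnetic ratio / (2 pi) in MHz/T


def chemical_shift_displacement_mm(delta_ppm, bw_hz, thickness_mm, b0_t=7.0):
    """Slice shift caused by a chemical-shift offset for a pulse of bandwidth bw_hz."""
    larmor_mhz = GAMMA_H_MHZ_PER_T * b0_t
    delta_hz = delta_ppm * larmor_mhz          # ppm times MHz gives Hz
    return thickness_mm * delta_hz / bw_hz


for bw in (1200.0, 12000.0):
    shift = chemical_shift_displacement_mm(2.7, bw, 20.0)
    print(f"BW = {bw:7.0f} Hz -> CSD = {shift:5.2f} mm "
          f"({100.0 * shift / 20.0:4.1f} % of the slice)")
```

Because the displacement scales with 1/BW, a tenfold BW increase reduces the CSD by 1 − 1/10 = 90 %, independent of the chosen chemical-shift offset.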

| In vivo measurements
The voxel overlap was greater than 81 % in all cases, across all volunteers and all six pairwise combinations of the four measurements, as indicated in Table 3 and Figure 3. The intra-subject CV of the CSF fraction was 6.6 ± 4.9 %. Table 2 also lists the spectral quality parameters averaged over all volunteers and all 36 spectra for each SPECIAL version. The linewidth and SNR of the water peak did not differ significantly among the three inversion pulses in SPECIAL.
The spectral quality obtained with all three SPECIAL versions was high, and hardly any differences were discernible by visual inspection (Figure 4A). The Bland-Altman plots of replicate differences against the mean value (Figure 4B) revealed that in the R0 scenario (repeatability), the gradient-modulated pulses GOIA and WURST yielded a smaller dispersion of the spectral-shape differences than the HS pulse. This difference in dispersion vanished, however, in the reproducibility scenarios R1,Mc and R1,Wc: HS-SPECIAL and WURST-SPECIAL were on a par, with GOIA-SPECIAL trailing slightly behind. Across scenarios, the SDs were ordered $\sigma^{S}_{R_0} < \sigma^{S}_{R_{1,Mc}} < \sigma^{S}_{R_{1,Wc}}$.
The Bland-Altman plots of the concentrations are depicted in Supporting Information Figures S2-S15. Similar concentrations and concentration variations were obtained for most of the quantified metabolites with the different SPECIAL versions, as shown in Figure 5A,B. However, GOIA-SPECIAL and WURST-SPECIAL yielded higher concentrations of tCr and Glu than HS-SPECIAL (both p < .001). For HS-SPECIAL, 14 individual concentrations (of 468 values: 13 metabolites × 4 sessions × 9 volunteers) had to be discarded because they could not be quantified by LCModel, compared with only five and four for GOIA-SPECIAL and WURST-SPECIAL, respectively. The CRLBs of Asp, NAA, and tCr were significantly higher for HS-SPECIAL than for the variants using GOIA and WURST pulses, as shown in Figure 5B. Neither concentrations nor CRLBs differed significantly between GOIA-SPECIAL and WURST-SPECIAL. HS-SPECIAL exhibited the highest averaged intra-subject CV for most metabolites, except for GABA and Lac, as displayed in Figure 5C. Correlation plots for the repeatability scenario for Glu, NAA, tCho, and tCr can be found in Supporting Information Figure S16.

| Precision evaluation
The pulse-wise $\sigma^{REML}$ values of every metabolite for both the R0 and the R1,Wc scenarios are depicted in Figure 6A. Neither the SDs obtained from the repeatability measurements nor those from the reproducibility measurements showed a consistent trend favoring one of the investigated pulses. The individual results of the REML analysis can be found in Supporting Information Table S2. The MDCs determined for every metabolite and the given setup are listed in Table 4 and depicted in Figure 6B.
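The MDC is commonly derived from a within-subject SD σ as MDC95 = 1.96·√2·σ, i.e., the smallest difference between two measurements that exceeds measurement noise with 95 % confidence. Whether Table 4 uses exactly this definition is not stated in this section, so the following Python sketch is an assumption-labeled illustration. It also includes a simple moment estimate of σ from paired replicates, σ = √(Σd²/(2n)), which is a simpler stand-in for, not a reproduction of, the REML analysis; the example concentrations are hypothetical.

```python
import math
from typing import Sequence


def within_subject_sd(first: Sequence[float], second: Sequence[float]) -> float:
    """Within-subject SD from paired replicates: sqrt(sum(d^2) / (2 n))."""
    diffs = [a - b for a, b in zip(first, second)]
    return math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))


def mdc95(sd_within: float) -> float:
    """Minimal detectable change at 95 % confidence: 1.96 * sqrt(2) * SD."""
    return 1.96 * math.sqrt(2.0) * sd_within


# Hypothetical tCr concentrations (mM) of five subjects in two repeated scans
m3 = [8.0, 7.6, 8.3, 7.9, 8.1]
m4 = [8.2, 7.5, 8.1, 8.0, 8.1]
sd = within_subject_sd(m3, m4)
print(f"SD_within = {sd:.3f} mM, MDC95 = {mdc95(sd):.3f} mM")
```

The factor √2 arises because a detected change is the difference of two measurements, each carrying the within-subject variance; the 1.96 then sets the 95 % level.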
The correlations between CRLBs and $\sigma^{REML}$, between $\sigma^{BA}$ and $\sigma^{REML}$, and between CRLBs and $\sigma^{BA}$, averaged over all pulses, are shown in Figure 6C-E. The lowest coefficient of determination, R² = 0.809, was found for the R1,Wc correlation between CRLBs and $\sigma^{BA}$, whereas the highest, R² = 0.99, was found for the R0 correlation between $\sigma^{BA}$

| DISCUSSION
In this work, the measurement precision of in vivo metabolite concentrations obtained with the given setup was estimated for repeatability and reproducibility by REML and Bland-Altman analyses. These results were then compared with the commonly used CRLBs for the SPECIAL sequence at 7 T. CRLBs were shown to capture only part of the measurement uncertainty, whereas the full measurement precision can be obtained by repeated measurements and statistical modeling. Furthermore, the impact of three adiabatic inversion pulses within the SPECIAL sequence, namely the conventionally applied HS pulse and the two gradient-modulated pulses GOIA and WURST, on repeatability and reproducibility was assessed. The gradient-modulated pulses require a substantially lower peak RF amplitude to fulfill the adiabatic condition 19,29 than an HS pulse of the same duration, FWHM of the inversion profile, and total pulse energy. 28 This advantage of gradient-modulated pulses can be exploited specifically in applications in which the peak RF power or the specific absorption rate is a limiting factor. Nevertheless, the higher sensitivity to ΔB0 and gradient imperfections, 57 as well as the gradient strength and slew rate limitations of the system used, need to be considered when planning an application study with gradient-modulated pulses to ensure that the nominal voxel size is achieved.

FIGURE 2 Pulse scaling results (A-C) and results of the Bloch simulations (D-J). The pulse duration and pulse energy were fixed. A, RF amplitude, B, phase, and C, gradients of the three adiabatic inversion pulses. D, The inversion profiles created by the HS (blue), GOIA (orange), and WURST (green) pulses have the same FWHM. The edges of the HS pulse profile are less defined than those of GOIA and WURST, and the HS profile shows small oscillations that are absent for GOIA and WURST. E-G, Chemical shift displacement (CSD), i.e., the position shift of the voxel, for the three pulses. The largest CSD is observed for the HS pulse. The color bar indicates the percentage of the inverted area. H-J, Inversion efficiency as a function of B1+ normalized to the nominal B1+, which corresponds to the maximum amplitude of the pulse. Hence, a value of 1 on the y-axis corresponds to the pulses as applied in this study. The red line indicates 100 % inversion efficiency, i.e., the adiabatic condition is met for the whole voxel region. For all pulses, B1,max was chosen such that approximately half of this amplitude would be sufficient to fulfill the adiabatic condition on resonance. This safety margin ensures that the adiabatic condition is met even in regions of high B1+ inhomogeneity and that the performance does not decrease substantially under off-resonance conditions.
The measured in vivo concentrations of the quantified metabolites are well in line with literature values from the same region 49 and are very similar for all three SPECIAL variants for most metabolites. However, the concentrations of Glu and tCr obtained with both gradient-modulated SPECIAL versions were significantly higher than those obtained with HS-SPECIAL, and CRLBs were significantly lower for Asp, NAA, and tCr when measured with GOIA-SPECIAL or WURST-SPECIAL than with HS-SPECIAL. In phantom measurements, however, concentrations, CRLBs, and CVs were approximately the same for all three SPECIAL versions. The differences in the in vivo measurements were tentatively attributed to the substantially reduced CSD and sharper profile of the gradient-modulated pulses compared with the HS pulse, in combination with spatial variations of tissue distributions in the brain. The effect on the other metabolites is likely of a similar nature but cannot be identified unambiguously because several signals overlap in the respective frequency ranges. The smaller intra-subject CVs of metabolite concentrations, as well as the smaller number of metabolite concentrations that could not be quantified by LCModel, indicate that the reduced CSD and sharper pulse profile of the inversion in SPECIAL with either a GOIA or a WURST pulse improve the robustness of the LCModel quantification compared with an HS pulse, especially for low-concentration metabolites. A similar improvement in fit robustness may be expected if the CSD of the excitation and refocusing pulses were also reduced.
Note that because the small sample cannot be assumed to follow a normal distribution, 58 statistical significance was assessed with a non-parametric test, namely the Wilcoxon signed-rank test. This approach provides more conservative estimates than its parametric counterpart, the paired t-test.
FIGURE 4 A, Exemplary spectra of M1 acquired with HS-SPECIAL (blue), GOIA-SPECIAL (orange), and WURST-SPECIAL (green). B, Bland-Altman plots for HS (blue), GOIA (orange), and WURST (green) for the scenarios R 0 (top), R 1,Mc (middle), and R 1,Wc (bottom). Each point in the Bland-Altman plots is generated using Equations (2) and (3). Note that for R 1,W, the first measurements of the respective sessions were used. The red line indicates the arithmetic mean, whereas the gray lines indicate the mean ± 1.96 SD. The plots show the difference between replicate observations within a session without repositioning (R 0), within a session with repositioning (R 1,Mc), or between the first observations in sessions one week apart (R 1,Wc), plotted against the mean of the two observations.

FIGURE 5 Metabolite quantification parameters for all three pulses. A, Absolute metabolite concentrations c_m* averaged over all volunteers and measurements. B, Absolute CRLBs averaged over all volunteers and measurements. C, Intra-subject coefficients of variation (CVs) averaged across all subjects. The error bars indicate ±SD of the obtained concentrations, CRLBs, and CVs across all subjects. It should be noted that SDs of CVs with low degrees of freedom are not expected to be normally distributed; the error bars serve only to indicate the approximate width of the distribution. Note also that the CVs were first calculated for each subject with respect to the four sessions and then averaged over all volunteers. For each metabolite and each of the three pulses, 36 data sets (9 subjects × 4 sessions) were considered, unless specific data sets could not be quantified by LCModel. The numbers above the mean metabolite concentrations indicate the number of measurements that were discarded because they could not be quantified. The asterisks indicate a significant difference between HS and GOIA or between HS and WURST. Abbreviations: Asp, aspartate; GABA, γ-aminobutyric acid; Gln, glutamine; Glu, glutamate; GSH, glutathione; Ins, myo-inositol; Lac, lactate; NAA, N-acetylaspartate; NAAG, N-acetylaspartylglutamate; PE, phosphoethanolamine; Tau, taurine; tCho, total choline; tCr, total creatine.

The assessment of reproducibility in MRS has recently received increased attention, for example regarding the comparability of different scanners and sites, or test-retest reproducibility evaluated with CVs. 49,59-61 However, the unbalanced nested study design used in this study extends this concept and, for the first time, allows the estimation of realistic SDs of in vivo metabolite concentrations through Bland-Altman and REML analyses. The metabolite concentrations obtained after measurement, post-processing, and absolute quantification are certainly the clinically relevant results, and their precision is therefore important for clinical assessments. Nevertheless, the Bland-Altman analysis of the spectral shape was introduced here as a complementary measure of the reproducibility of in vivo MRS. Because all spectral differences contribute to the final data point in the Bland-Altman plot, and because this point is not influenced by inaccuracies that might be inherent to the quantification model used, this analysis provides additional information on reproducibility that is independent of the quantification pipeline. Although the visually assessed spectral quality, the water linewidth, and the water SNR were similar for all three SPECIAL variants (indicating similar performance of all investigated pulses), the Bland-Altman analysis of the spectral shape reveals subtle differences among the SPECIAL variants in vivo. Thus, it was demonstrated that the use of the GOIA and WURST pulses for adiabatic inversion in SPECIAL did result in higher repeatability of the spectral shape. However, inaccuracies in VOI positioning, potential differences in calibration, and other effects appear to outweigh this benefit. The observed increase of S R1,Wc compared with S R1,Mc might have several reasons: (1) some volunteers were scanned by different operators in the first and second sessions, which might have influenced the combined reproducibility of voxel positioning; (2) day-to-day variation of scanner performance; and (3) intra-subject physiological changes between the two sessions. Studies usually aim to minimize the first two effects. Variance contributions caused by differences in voxel positioning, in particular, might be mitigated by automated voxel positioning routines, as described by Dou et al. 62 In this study, manual voxel positioning with reference to anatomical landmarks was chosen to reflect the workflow in many clinical studies. 63 Although operators within this study carefully aimed to position the voxel as reproducibly as possible, a mean voxel overlap of only 85.2% was achieved. This is lower than has been demonstrated to be feasible with automated voxel positioning routines 62 and is expected to negatively affect the precision obtained in this study. Nevertheless, this work outlines a framework that allows quantifying the influence of measures taken to improve measurement precision, such as the automated voxel positioning routines mentioned above. Effects caused by actual physiological changes, on the other hand, may represent the answer to the clinical research question at hand, such as in longitudinal studies. Furthermore, as MRS progresses toward broader clinical use, 'normal' ranges of these physiological variations, and deviations from them, need to be determined if MRS is to serve as a diagnostic tool on a single-subject basis.
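The Bland-Altman quantities used throughout this discussion can be sketched for paired concentration values as follows. The sketch assumes the conventional definitions: the difference and mean of each pair, the bias as the mean difference, and limits of agreement at bias ± 1.96 SD of the differences. These may differ in detail from Equations (2) and (3), and the function name is hypothetical. The last returned value converts the SD of the differences into a single-measurement SD by dividing by √2, which holds only if both measurements share the same independent error.

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman(first, second, z=1.96):
    """Return per-pair means, differences, bias, limits of agreement, single-measurement SD."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2 for a, b in zip(first, second)]
    bias = mean(diffs)
    sd_diff = stdev(diffs)  # sample SD with an n - 1 denominator
    limits = (bias - z * sd_diff, bias + z * sd_diff)
    # Var(x1 - x2) = 2 * Var(x) for independent, equal errors, hence the sqrt(2).
    return means, diffs, bias, limits, sd_diff / sqrt(2)
```

Each (mean, difference) pair corresponds to one point in a plot such as Figure 4B; the bias and the two limits correspond to the red and gray lines.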
It should be noted that poor B0 shimming would broaden the metabolite linewidths, which would hamper correct quantification because the overlap of adjacent peaks would increase. This would result in larger CRLBs, and likely in larger SDs of the calculated metabolite concentrations. However, if poor B0 shimming is reproducible, the result of the Bland-Altman analysis of the spectral shape would not be expected to change substantially.
FIGURE 6 A, Pulse-wise relative SDs (= CV) averaged over the subjects, obtained by the restricted maximum likelihood estimation (REML) analysis, for repeatability (R 0, upper plot) and the combined reproducibility scenario (R 1,Wc, lower plot) of all quantified metabolites. No pulse substantially outperformed another; hence, data of all three pulses were subsequently pooled to strengthen the statistical analysis with regard to REML and minimal detectable changes (MDCs). B, Mean concentrations (purple horizontal bars), ± REML R1,Wc (black vertical bars), and ± MDCs (indicated by the gray box) of metabolites. Correlation plots between relative CRLBs and REML (C), BA and REML (D), and relative CRLBs and BA (E), averaged over all three pulses and all volunteers. The R 0 scenario is denoted in black, while the R 1,Wc scenario is denoted in purple. Each point represents one metabolite: 1, Asp; 2, GABA; 3, Gln; 4, Glu; 5, GSH; 6, Ins; 7, Lac; 8, NAA; 9, NAAG; 10, tCho; 11, tCr; 12, Tau; 13, PE.

In the Bland-Altman analysis of the repeatability and reproducibility of the calculated metabolite concentrations, it is worth noting that, while BA R0 < BA R1,Wc, the effect is not as pronounced as in the analysis of the spectral shape, and that no consistent trend can be observed between BA R1,Mc and BA R1,Wc. This indicates that inaccuracies in the fitting model used and differences in fit quality 'mask' the differences between the two investigated reproducibility scenarios. A similar effect can be observed in the results of the REML analysis. The REML fit prevents negative variance estimates, as a negative variance would not make physical sense. This constraint, in combination with the modest degrees of freedom and relatively large within-group effects, causes several variance contributions of both reproducibility scenarios to be estimated as nominally zero, or very close to zero. Note that, for a given metabolite, REML R1,M or REML R1,W may be estimated as zero, but never both at the same time. This does not mean that the group means of the two scenarios concerned (R 0 and R 1,Mc if REML R1,M = 0, or R 1,Mc and R 1,Wc if REML R1,W = 0) are identical; it only reflects that they are closer together than expected from the within-group variance. Hence, for most of the metabolites, the two examined reproducibility scenarios and the effects of their respective variance contributions on the measurement precision cannot be clearly disentangled, probably due to the small sample size of only 9 volunteers.
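Why variance contributions can be estimated as exactly zero is easiest to see with a simpler estimator than REML. The sketch below uses the method-of-moments (ANOVA) estimator for a balanced one-way random-effects layout, swapped in here purely for illustration. The groups could be, for example, sessions, and the replicates the repeated acquisitions within them. When the group means scatter less than expected from the within-group variance, the raw between-group estimate becomes negative and is clipped to zero, mirroring the non-negativity constraint of the REML fit. The function name is hypothetical, and the sketch does not handle the unbalanced nested design of this study.

```python
from statistics import mean


def one_way_variance_components(groups):
    """Return (within-group variance, clipped between-group variance, raw between-group estimate)."""
    g = len(groups)
    n = len(groups[0])
    if g < 2 or n < 2 or any(len(grp) != n for grp in groups):
        raise ValueError("this sketch assumes a balanced design with >= 2 groups and replicates")
    group_means = [mean(grp) for grp in groups]
    grand = mean(group_means)
    # Within-group mean square: scatter of replicates around their group mean.
    ss_within = sum((x - m) ** 2 for grp, m in zip(groups, group_means) for x in grp)
    ms_within = ss_within / (g * (n - 1))
    # Between-group mean square: scatter of the group means, scaled by n.
    ms_between = n * sum((m - grand) ** 2 for m in group_means) / (g - 1)
    raw_between = (ms_between - ms_within) / n
    # A negative variance is not physical, so it is truncated at zero.
    return ms_within, max(0.0, raw_between), raw_between
```

For example, two groups [1, 3] and [1, 3] have identical means, so the raw between-group estimate is negative (-1.0) and is reported as 0.0, even though the within-group variance is 2.0.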
Although the results of the REML analysis exhibit a slight tendency toward lower SDs in data obtained with gradient-modulated inversion pulses, these differences are neither statistically significant nor consistent. Hence, to strengthen the investigation of the MDCs for the different metabolites, the SDs for the repeatability (REML R0) and the combined reproducibility (REML R1,Wc) of the three investigated pulses were pooled. The variances derived as REML R1,Wc then allowed the calculation of the MDC for the given setup.
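A minimal sketch of pooling and of the MDC follows. It rests on two assumptions. First, SDs are pooled as a degrees-of-freedom-weighted root mean square of the variances, which simplifies the approach used here of pooling the underlying data in one REML fit. Second, it uses the commonly used definition MDC95 = 1.96 · √2 · SD, which may differ in detail from Equations (5). The function names are hypothetical.

```python
from math import sqrt


def pooled_sd(sds, dofs):
    """Degrees-of-freedom-weighted pooling of SDs (root of the weighted mean variance)."""
    if len(sds) != len(dofs) or sum(dofs) <= 0:
        raise ValueError("need matching SDs and positive total degrees of freedom")
    weighted_var = sum(d * s ** 2 for s, d in zip(sds, dofs))
    return sqrt(weighted_var / sum(dofs))


def mdc(sd, z=1.96):
    """Minimal detectable change: z * sqrt(2) * SD, since a change involves two measurements."""
    return z * sqrt(2.0) * sd
```

Applied to a relative SD, for instance a CV of 5%, this definition gives an MDC of about 13.9% of the mean concentration. A change smaller than that would not be distinguishable from measurement variance at the 95% level.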
As the REML analysis uses a multi-parameter fit model with multiple contributions to the total variance, it properly accounts for the unbalanced nested study design and weights incomplete data sets instead of discarding their information completely. Therefore, the SDs obtained by REML analysis are considered a more reliable precision estimate than the SDs obtained by Bland-Altman analysis. Nevertheless, the Bland-Altman analysis provides a valuable consistency check, showing similar trends and aiding the interpretation of the rather complex REML results. Furthermore, the differences between the two estimates are expected to decrease for larger sample sizes. Comparison of these results with the CRLBs reveals that CRLBs account for only a fraction of the measurement variance. This is not surprising, however, as CRLBs are the "lowest possible standard deviations of all unbiased model parameter estimates obtained from the data," 12 which are limited to the variance contributions resulting from the noise level, the overlap of different peaks, and the metabolite fitting model 64; they do not reflect the SD of data that would be obtained by repeated measurements. 31 Hence, the framework presented here is not intended to replace the use of CRLBs (repeated measurements to obtain SDs of metabolite concentrations will remain impossible in most clinical settings) but provides complementary information for a better understanding of the precision of metabolite concentrations obtained by MRS. At the same time, strong correlations are found among all three measures across the different metabolites. This finding supports approaches that use CRLBs as weights in statistical analysis, as suggested by Miller et al. 65 Although the obtained SDs and MDCs are rather large, the correlation between the two repeatability acquisitions across subjects (e.g., Supporting Information Figure S16) demonstrates the general capability of the 1H MRS method used to quantify concentration differences between individual subjects.
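The per-subject CV averaging described in the Figure 5 caption, and the correlation between precision measures across metabolites, can be sketched as follows. The inputs are assumed to be plain lists: one list of repeated concentrations per subject, and one value per metabolite for each measure. The function names are hypothetical. Python 3.10 and later also provide `statistics.correlation` for the Pearson coefficient.

```python
from math import sqrt
from statistics import mean, stdev


def mean_intra_subject_cv(per_subject_values):
    """CV computed per subject over that subject's repeated values, then averaged over subjects."""
    cvs = [stdev(values) / mean(values) for values in per_subject_values]
    return mean(cvs)


def pearson_r(x, y):
    """Pearson correlation coefficient, e.g. relative CRLB vs. CV across metabolites."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Averaging CVs per subject, rather than computing one CV over all data, keeps inter-subject concentration differences out of the precision estimate.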
TABLE 4 Minimal detectable change (MDC) for each metabolite, derived by the REML analysis and Equations (5)

The chosen SPECIAL localization technique requires an add-subtract scheme to obtain full spatial localization. While this allows very short TEs and at the same time retains the maximum attainable echo amplitude at a given TE, it renders the obtained spectra more susceptible to motion artifacts 66 than, for example, sLASER, which achieves full localization for each transient. This may reduce the precision that can be achieved compared with single-shot localization techniques, especially in patient cohorts rather than healthy volunteers. Furthermore, there is an ongoing debate in the MRS community about the trade-off between the reduced CSD achievable with adiabatic refocusing and a minimal TE, and about which of the two is preferable. On the one hand, some experts argue that the reduced CSD of sSPECIAL 66 is expected to increase the reliability of the obtained metabolite concentrations, despite the longer TE. 67 On the other hand, short TEs have been shown to be especially beneficial for the reliable determination of J-coupled metabolites. 14 While the current study did not set out to determine which of these influences has the larger effect on precision, it provides a framework that will allow future studies to compare the precision of metabolite concentrations obtained with different CSDs, TEs, and numbers of transients required for full localization. The validity of the values derived in this study is certainly limited to the specific setup and methodology used here. The numbers will likely differ for other brain regions, MR scanners, MRS sequences, sequence parameters, B0 and B1+ calibrations, post-processing pipelines, and fitting models.
Nonetheless, this work presents a generally applicable framework to distinguish different contributions to the total measurement variance and to systematically investigate the efficacy of specific measures aiming to reduce individual variance contributions. Finally, this provides the groundwork for a broader implementation of MRS in clinical applications, as inter-subject differences in cohort comparisons or intra-subject differences in longitudinal studies can only be reliably distinguished from statistical fluctuations if they are larger than the MDC.
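The motion sensitivity of the SPECIAL add-subtract scheme discussed above can be illustrated with a deliberately simplified two-transient model. The model assumes the inversion flips the sign of the in-slab signal and leaves the out-of-slab signal unchanged, apart from a motion-induced phase shift Δφ between the two transients. Amplitudes, names, and values are hypothetical, and the real sequence involves more than this. Without motion, the out-of-slab signal cancels exactly. With motion, a residual of magnitude 2|S_out|·|sin(Δφ/2)| remains, whereas single-shot techniques such as sLASER avoid this subtraction step.

```python
import cmath
import math


def add_subtract_residual(s_in, s_out, phase_shift_rad):
    """Combine two transients (inversion off/on) and return (combined, residual magnitude)."""
    transient_off = s_in + s_out
    # Inversion flips the in-slab signal; motion adds a phase shift to the out-of-slab part.
    transient_on = -s_in + s_out * cmath.exp(1j * phase_shift_rad)
    combined = transient_off - transient_on
    wanted = 2 * s_in  # ideal result: doubled in-slab signal, no outer-volume contribution
    return combined, abs(combined - wanted)


def predicted_residual(s_out, phase_shift_rad):
    """Closed form of the same residual: 2 * |s_out| * |sin(dphi / 2)|."""
    return 2 * abs(s_out) * abs(math.sin(phase_shift_rad / 2))
```

Because the out-of-slab signal can be much larger than the in-slab signal, even small phase shifts can leave a residual that is comparable to weak metabolite peaks in this model.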

| CONCLUSIONS
This work presents a methodology to estimate the measurement precision of in vivo metabolite concentrations obtained by MRS, and consequently the MDCs of 13 metabolite concentrations in vivo for the setup used. Furthermore, it introduces a study design and statistical framework for disentangling different components of the measurement precision, which are easily transferable to other setups and sequence parameters. This allows the efficacy of measures undertaken to improve measurement precision to be investigated systematically, as demonstrated for the use case of three different inversion pulses.

SUPPORTING INFORMATION
Additional supporting information may be found in the online version of the article at the publisher's website.

FIGURE S1
Comparison of (A) obtained metabolite concentrations, (B) CRLBs, and (C) CVs for all metabolites in the phantom using the SPECIAL sequence with either HS (blue), GOIA (orange), or WURST (green) as adiabatic inversion pulses. The error bars indicate the SD.

FIGURE S2-S15
Bland-Altman plots for the concentration differences over the three pulses for the R 0, R 1,M, and R 1,W scenarios (upper, middle, and lower row, respectively). The blue points indicate the SPECIAL sequence measured with the HS pulse, the orange ones GOIA, and the green ones WURST. Note that for the R 1,W scenario, the first measurements of both days were used.

FIGURE S16
Correlation plots for the R 0 scenario over all volunteers and RF inversion pulses for (A) tCr, (B) Glu, (C) tCho, and (D) NAA. The red line indicates the fitted curve with the R² value and the slope, while the gray lines indicate the ±SD of the fitted curve.

TABLE S1
Water linewidth, necessary transmitter voltage, and water SNR for the three pulses.