ProFit‐1D—A 1D fitting software and open‐source validation data sets

Accurate and precise MRS fitting is crucial for metabolite concentration quantification of 1H‐MRS spectra. Previous literature has shown that LCModel, a spectral fitting software, has certain limitations in performing advanced spectral fitting. Herein, we propose an open‐source spectral fitting algorithm with adaptive spectral baseline determination and more complex cost functions.

Testing the fit accuracy directly with in vivo spectra is impossible, as the actual concentrations of the metabolites at the time of acquisition, and hence the "ground truth," are not known. Therefore, the software's accuracy was tested using simulated spectra mimicking in vivo conditions and respective data quality, with known concentrations. To ensure accurate fitting of in vivo spectra, all possible perturbations were simulated and systematically tested.
The underlying spline baseline contributions originating from experimental imperfections have been shown to affect spectral quantification. 26,27 In this work, the necessary spline smoothness is estimated with the previously published method by Wilson. 24 This method was then systematically evaluated on simulated spectra with baseline contributions resembling those of data acquired in vivo to ensure accurate metabolite concentration estimates.
Fitting precision, on the other hand, was evaluated using the described simulated spectra as well as spectra acquired in vivo. Using different subsets of averages of acquired data, test-retest spectra were created to assess the in vivo precision. In this manuscript, we provide the entire data set that should allow for systematic evaluations of other fitting software.
Complementary information can help improve fitting accuracy; therefore, different cost functions that combine the frequency-domain and the time-domain information of the spectra were investigated. Because quantification of in vivo 1H-MRS spectra can be highly impacted by noise, different spectral filtering options in the preprocessing for fitting were also investigated.

| Magnetic resonance spectroscopy model
The MRS spectra can be described as a linear combination of the spectral patterns from contributing metabolites. If the chemical shifts and coupling constants of all spins related to a number of K metabolites 28,29 are known, a basis set k can be simulated for each metabolite. These basis sets are simulated with sequence parameters used in the actual MRS acquisition, such as pulse timings, pulse shapes, the acquisition bandwidth, and the number of acquired spectral points. The resulting simulated basis set k will reflect the complex spectral pattern of a spin system of a metabolite, particularly with matching J-evolution at longer TEs.

KEYWORDS
MR spectroscopy, quantification, spectral fitting, spline baselines
Generally, a spectrum $\hat{y}$ can be defined as the FID in the time domain as given in Equation (1), where each metabolite $k$ contributes with a concentration $c_k$ to the full spectrum and is characterized by its specific Lorentzian line-shape parameter $\nu_{e,k}$, which depends on the metabolite transverse relaxation time $T_{2,k}$ as $\nu_{e,k} = 1/(\pi T_{2,k})$. Moreover, the spins experience microscopic and macroscopic magnetic susceptibility effects, which are generally described by a globally applied Gaussian line-broadening factor $\nu_g$. This factor $\nu_g$ can also partially compensate for minor differences between . The resonance frequency of each metabolite might be shifted by a factor $\omega_k = \omega_{local,k} + \omega_{global}$, where $\omega_{local,k}$ originates from the metabolite's environment, from factors such as pH (hydrogen ions), other ions, or slight temperature differences, whereas the entire spectrum might be shifted by a factor $\omega_{global}$ due to subject motion or to some postprocessing step applied to the spectra after the acquisition. The MRS sequence timings lead to the zeroth-order and first-order phase ($\phi_0$, $\phi_1$) of the spectrum. Finally, the terms t and ppm correspond to the acquisition time and ppm vectors; $O_{ref}$ is the acquisition frequency; and ⊗ is convolution in the time domain (applied as a pointwise multiplication in the frequency domain). Bold notations ($c$, $\omega$, $\omega_{local}$, and $\nu_e$) are used to represent a vector over all metabolites.
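The signal model described above can be illustrated with a short Python sketch. It is not the authors' Equation (1): the function name `model_fid`, the exponent conventions for the Lorentzian and Gaussian damping, the units, and the omission of the first-order phase are all assumptions made to keep the example small. It only shows how concentration-weighted basis FIDs, metabolite-specific damping and shifts, and global terms combine.

```python
import cmath
import math

def model_fid(basis, conc, nu_e, omega, nu_g, phi0, dwell):
    """Sum concentration-weighted basis FIDs with damping, shift, and phase.

    basis: list of K basis FIDs (lists of complex points).
    conc, nu_e, omega: length-K lists (per-metabolite values).
    Assumed units: nu_e and nu_g in Hz, omega in rad/s, phi0 in rad, dwell in s.
    """
    n_points = len(basis[0])
    out = []
    for j in range(n_points):
        t = j * dwell
        gauss = math.exp(-(nu_g * t) ** 2)           # global Gaussian damping (assumed form)
        total = 0j
        for b, c, ne, w in zip(basis, conc, nu_e, omega):
            lorentz = math.exp(-math.pi * ne * t)    # metabolite-specific Lorentzian damping
            shift = cmath.exp(1j * w * t)            # metabolite-specific frequency shift
            total += c * b[j] * lorentz * shift
        out.append(cmath.exp(1j * phi0) * gauss * total)  # zeroth-order phase only
    return out

# Two hypothetical single-resonance "metabolites" sampled with a 1 ms dwell time.
dwell = 1e-3
basis = [[cmath.exp(2j * math.pi * f * i * dwell) for i in range(8)]
         for f in (50.0, 120.0)]
fid = model_fid(basis, conc=[2.0, 1.0], nu_e=[3.0, 5.0], omega=[0.0, 0.0],
                nu_g=4.0, phi0=0.0, dwell=dwell)
```

At t = 0 all damping terms equal one, so the first point is simply the sum of the concentrations.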
In vivo spectra with short TEs have significant contributions from macromolecules (MMs), which are broad peaks attributed primarily to amino acids in peptides and proteins, underlying the sharper metabolite peaks. To achieve accurate spectral fitting of the metabolite resonances, an acquired or fully simulated MM spectral model should be added to the spectral basis set. 27,30,31,32,33 Because the MM spectrum used here was acquired in vivo, it automatically includes both the Lorentzian and Gaussian broadening factors; therefore, for the MM basis set vector, the $\nu_{e,k}$ and $\nu_g$ parameters should be set to zero.
The MRS spectra originate from metabolites diffusing in water. Because water is generally not of interest in MRS, it is suppressed either through the acquisition sequence or during postprocessing. However, residual water signals might still be present in the spectra. Paired with possible outer-volume lipid contributions, these residual contaminations are characterized using a spline baseline underlying the metabolite spectrum in the frequency domain (baseline).
Finally, experimentally acquired spectra, and especially spectra from in vivo measurements, contain noise. Therefore, the ideal time-domain spectrum $\hat{y}$ is described under in vivo conditions as $y$; it includes the aforementioned spectral basis sets of all metabolites of interest and MM considering actual sequence parameters, as well as a spline baseline model to model experimental imperfections and noise. In the frequency domain ($\hat{Y} := \mathrm{fft}(\hat{y})$) it is described as follows:
$$Y = \hat{Y} + \mathrm{noise} + \mathrm{baseline} \qquad (2)$$
This spectral model is used in the ProFit-1D spectral fitting algorithm to model in vivo spectra precisely. In the algorithm, the noise, as well as any feature of the spectrum that the comprehensive spectral model does not reflect, is represented by the fit residual R (see Section 2.2.1).
Additionally, the spectral model was also used to simulate test data for the accuracy test of the fit algorithm. In these simulations, the macromolecular signal, a baseline distortion, and noise are added to the linear combination of the simulated spectral pattern of all metabolites to mimic in vivo spectral data quality (see Section 3.1).

| Cost function
A spectral-fitting algorithm aims to adjust all parametric and nonparametric parts of the spectral model to minimize a penalty function. The penalty function used here is defined in Equation (5A); its components are described first in the following equations.
Let the fit residual R be defined by Equation (3), where Y is the measured spectrum and $\hat{Y}$ is the fitted spectrum. Furthermore, to account for the baseline from Equation (2), a vector of tensor splines B, scaled by the corresponding spline coefficients a, is introduced. For the ProFit-1D algorithm, three different options for the cost function were defined to investigate whether complementary information about the residual and the basis set used improves spectral fitting.
In the first case, the cost function $R_1$ is defined as the spectral fit residual in the frequency domain corresponding to a predefined frequency range of interest (FOI), typically 0.6 to 4.1 ppm (Equation 4A).
In the second case, the cost function $R_2$ is extended by concatenating the frequency-domain residual with the fit residual in the time domain (Equation 4B). Only the first points of the FID, before the signal decays into the noise level, are included. The truncation point used (truncPoint) is defined in Supporting Information Appendix A.
In the third case, in addition to these components, a weighted spectral residual is concatenated into the cost function $R_3$ (Equation 4C). The weights for the spectral residual were introduced to minimize the fit residual only where the metabolite peaks contribute significantly and to avoid scaling based on peak tails or minor contributions (less than 25% of the metabolite maximum absorption peak) (see Figure 1B). Note that each metabolite significantly affects the spectrum Y through the real part of its basis set k at only a few chemical shifts. These basis sets k are line-broadened by the starting values of $\nu_g$ and $\nu_{e,k}$ for the given iteration (except for iteration 0 and 1; these values are the fitted values from the previous iteration). The weights are calculated for each iteration, summing only the active (fitted) metabolites in that fitting iteration (Figure 1C).
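The weighting idea behind the third cost function can be sketched as follows. This is an illustrative approximation, not Equation (4C): the function name `residual_weights`, the binary envelope, and the toy spectra are assumptions. Each active metabolite contributes only where the real part of its line-broadened basis spectrum reaches 25% of its own maximum, and the contributions of the active metabolites are summed.

```python
def residual_weights(real_spectra, active, threshold=0.25):
    """Sum per-metabolite envelopes over the active metabolites.

    real_spectra: dict mapping a metabolite name to the real part of its
    (already line-broadened) basis spectrum, all on the same frequency grid.
    A point counts for a metabolite only if it reaches `threshold` times that
    metabolite's maximum (binary envelope: an assumption of this sketch).
    """
    n_points = len(next(iter(real_spectra.values())))
    weights = [0.0] * n_points
    for name in active:
        spectrum = real_spectra[name]
        peak = max(spectrum)
        if peak <= 0.0:
            continue  # no absorption peak to weight
        for i, value in enumerate(spectrum):
            if value >= threshold * peak:
                weights[i] += 1.0
    return weights

# Toy spectra on a six-point grid; Glu is present but not active in this iteration.
spectra = {
    "NAA": [0.0, 0.1, 1.0, 0.3, 0.0, 0.0],
    "tCr": [0.0, 0.0, 0.4, 0.8, 0.1, 0.0],
    "Glu": [0.5, 0.0, 0.0, 0.0, 0.0, 0.0],
}
w = residual_weights(spectra, active=["NAA", "tCr"])
```

Points where several active metabolites overlap receive higher weights, while peak tails and inactive metabolites contribute nothing.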
Using any of these three cost functions ($R_x$), the minimization problem to find the metabolite concentrations and other spectral parameters is expressed in Equation (5A). To avoid overfitting through the spline baseline, a regularization that ensures smoothness is used, where λ is the regularization parameter and D is the second-order difference operator.
FIGURE 1 A, The ProFit-1D algorithm is depicted with the successive optimization steps: first the preprocessing steps affecting the spectrum to be fit, then the fitting iterations optimizing the linear combination of the basis sets. Depending on the fitting stage, more metabolites are added, and additional local degrees of freedom are allowed while keeping the previously optimized global parameters. For the cost function $R_3$, a weighting of the spectral residual was introduced. B, Sample metabolite spectra (blue); the metabolite weighting envelopes are displayed in red and are calculated according to Equation (4C). C, The resulting residual weights are calculated from the sum of the active metabolites at the given fit iteration. High weight values indicate that multiple metabolites of interest have overlapping peaks at that ppm shift. The sample spectrum (gray) is shown just as a reference. Glu, glutamate; GSH, glutathione; mI, myo-inositol; MM, macromolecule; NAA, N-acetylaspartate; tCho, total choline; tCr, total creatine
All of the minimizations are performed on both the real and the imaginary part of the data.
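The smoothness regularization with the second-order difference operator D can be illustrated with a few lines of Python. The squared-norm form λ‖Da‖² and the function names are assumptions of this sketch; the exact expression used in the penalty function is not reproduced here.

```python
def second_difference(coeffs):
    """Apply the second-order difference operator D to spline coefficients."""
    return [coeffs[i] - 2.0 * coeffs[i + 1] + coeffs[i + 2]
            for i in range(len(coeffs) - 2)]

def smoothness_penalty(coeffs, lam):
    """Return lam * ||D a||^2 (squared-norm form is an assumption of this sketch)."""
    return lam * sum(d * d for d in second_difference(coeffs))

straight = [0.0, 1.0, 2.0, 3.0, 4.0]   # linear trend: zero curvature, no penalty
wiggly = [0.0, 1.0, 0.0, 1.0, 0.0]     # oscillating coefficients: penalized
penalty_straight = smoothness_penalty(straight, lam=0.5)
penalty_wiggly = smoothness_penalty(wiggly, lam=0.5)
```

A larger λ makes oscillating coefficients more expensive and therefore yields a stiffer baseline.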
ProFit-1D is based on ProFit-2D-v2 20 and was written in MATLAB R2019a (The MathWorks, Natick, MA). Multiple iteration steps are used to minimize the cost function $R_x$ (see Section 2.2.3). For this, the lsqnonlin optimization function with the trust-region-reflective algorithm was applied, with starting values and bounds as described in Section 2.2.3. Within the lsqnonlin optimization, the lsqlin function is additionally used to determine the metabolite concentrations c and spline coefficients a. This combination of linear and nonlinear optimizations was adapted from the previous ProFit-2D versions. 20,21
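The nesting of a linear solve inside a nonlinear search can be shown on a toy problem. The sketch below is not the ProFit-1D implementation and does not call lsqnonlin or lsqlin: a plain grid search stands in for the nonlinear optimizer, a closed-form amplitude stands in for the linear solve, and all names are hypothetical.

```python
import math

def best_amplitude(y, shape):
    """Closed-form linear least-squares amplitude for y ~ c * shape."""
    return sum(a * b for a, b in zip(y, shape)) / sum(b * b for b in shape)

def fit_decay(t, y, rates):
    """Outer search over the nonlinear rate; inner linear solve for the amplitude."""
    best = None
    for r in rates:
        shape = [math.exp(-r * tj) for tj in t]
        c = best_amplitude(y, shape)                        # inner linear step
        sse = sum((yj - c * sj) ** 2 for yj, sj in zip(y, shape))
        if best is None or sse < best[0]:
            best = (sse, r, c)
    return best[1], best[2]

t = [0.1 * j for j in range(20)]
y = [3.0 * math.exp(-2.0 * tj) for tj in t]                 # noiseless toy data
rate, amp = fit_decay(t, y, rates=[0.5 * k for k in range(1, 9)])
```

Because the amplitude is solved exactly for every candidate rate, the outer search only has to explore the nonlinear parameter, which is the motivation for separating concentrations and spline coefficients from the remaining parameters.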

| Spline baseline estimation
The degree of spline baseline stiffness has been shown to influence metabolite concentrations substantially. 26,27 To find the optimal regularization parameter for the spline baseline stiffness, Wilson 24 proposed the concept of the modified Akaike's information criterion (mAIC). For this, the matrix H is defined using the matrix of the tensor spline vectors (B) and the second-order difference operator D (Equation 6A). Afterward, the effective dimension (ED) is calculated from the trace of H (Equation 6B). Using the ED, the number of data points n, and the arbitrary numeric parameter m, the mAIC is defined in Equation (6C). Finally, the optimal spline baseline flexibility is found by choosing the minimum mAIC value over a series of possible regularization parameters, and therefore also ED values (between 2.0 and 35). See details in the calculate_optimal_spline_baseline_flexibility function. The corresponding spline baseline components of matrix B are defined in the createSplineBasis function.
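The selection step can be sketched in Python as follows. The mAIC expression used here, log of the residual variance plus 2·m·ED/n, is an assumed form chosen for illustration and is not taken from the text above; the candidate values and function names are likewise hypothetical. The sketch only demonstrates how a larger m shifts the minimum toward a stiffer baseline (smaller ED).

```python
import math

def maic(residual_variance, ed, n_points, m):
    """Assumed form: log residual variance plus an m-scaled complexity penalty."""
    return math.log(residual_variance) + 2.0 * m * ed / n_points

def pick_flexibility(candidates, n_points, m):
    """Return the (ED, residual variance) pair with the smallest mAIC."""
    return min(candidates, key=lambda c: maic(c[1], c[0], n_points, m))

# Hypothetical (ED, residual variance) pairs from fits with different stiffness.
candidates = [(2.0, 4.0), (8.0, 1.5), (20.0, 1.2), (35.0, 1.15)]
chosen_m5 = pick_flexibility(candidates, n_points=1000, m=5.0)
chosen_m17 = pick_flexibility(candidates, n_points=1000, m=17.0)
```

With these toy numbers, m = 5 selects ED = 20, whereas m = 17 selects ED = 8, because the heavier penalty outweighs the small gain in residual variance.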
Although Wilson 24 set m to 5 to avoid overfitting the 3 T spectra, the optimal m value for 9.4 T determined in this study was 17. We assume that this user-set parameter m depends on the field strength.

| Iterations
The optimal solution to the minimization problem of Equation (5A) is found through multiple iterations. The ProFit-1D spectral fitting software provides a flexible setting of the fitting iterations and constraints for the individual fitting steps through a single file.
In all iterations, $T_{2,k}$ relaxation times previously determined in the literature are used to determine the initial values of the $\nu_{e,k}$ line-shape parameter. Because this study was performed using spectra from the human brain acquired at 9.4 T, $T_{2,k}$ relaxation times from Murali-Manohar 34 were taken. Furthermore, because the measured MM spectrum already includes both a $\nu_g$ and a $\nu_{e,k}$ broadening, these are treated as separate fixed global parameters for the MM spectrum and are both set to 0.
To avoid overfitting and local-minima solutions that do not correspond to the physiological reality, the first iterations aim to determine global parameters, whereas later iterations permit higher degrees of freedom for individual metabolite parameters. For the investigated data sets, the following settings with 1+4 iterations led to the best results among the investigated solutions (see algorithmic steps depicted in Figure 1A and fitting bounds and starting values in Supporting Information Table S1):
Iteration 0: A well-phased and frequency-corrected spectrum is crucial for fitting. Hence, a minimization is performed using Equation (5A), considering only the main metabolite singlets: the N-acetylaspartate (NAA) acetyl moiety (NAA[CH3]); total choline (tCho, which includes glycerophosphoryl choline and phosphoryl choline) together with phosphoethanolamine, named tCho+; the total creatine moieties tCr(CH3) and tCr(CH2); and the MM spectrum, while keeping all parameters global and setting a zero baseline. The resulting $\phi_0$, $\phi_1$, and $\omega_{global}$ values from iteration 0 are applied to correct the spectrum before the actual spectral fitting steps. These values are also stored for the final visualization and quantitative reporting of the parameters.
All further iterations adjust the basis sets to the spectrum, and the solution of each iteration is used as a starting value for the following iteration.
Iteration 1: In addition to the main metabolites used in iteration 0, the following metabolites were set as active: the aspartyl moiety of NAA (NAA[CH2]), glutamate (Glu), and myo-inositol (mI). These are used to determine the global values of $\phi_0$, $\phi_1$, $\omega_{global}$, and $\nu_g$.
Iteration 2: All metabolites are used to refine the $\phi_0$, $\phi_1$, $\omega_{global}$, and $\nu_g$ parameters.
Iteration 3:
3.1: The spline baseline flexibility is determined as described in Section 2.2.2 and Equation (6C) for each spectrum. In this spline estimation-fitting procedure, all parameters are fixed to the fit values from iteration 2, while only the metabolite concentrations c and the spline baseline coefficients a are left free.
3.2: Using the determined spline baseline flexibility, the spectrum is refitted with a spline baseline, and the fitted values of $\phi_0$ and $\phi_1$ are fine-tuned using less-strict bounds for these parameters while keeping the parameters $\nu_g$, $\omega$, and $\nu_e$ more constrained (see Supporting Information Table S1).
Iteration 4: The final values of all parameters are determined in this step. The most significant changes include the freedom of individual metabolites to have independent $\nu_{e,k}$ and $\omega_k$, while allowing $\nu_g$ to be adjusted to the new parameters. The global $\phi_0$ and $\phi_1$ are more constrained in this step.
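The iteration scheme described above can be summarized as a small data structure. The layout below is purely illustrative and is not the format of the ProFit-1D settings file; the key names are hypothetical, and the assumption that the spline baseline stays enabled from iteration 3 onward (and is absent before) is an interpretation of the text.

```python
MAIN = ["NAA(CH3)", "tCho+", "tCr(CH3)", "tCr(CH2)", "MM"]
EXTRA = ["NAA(CH2)", "Glu", "mI"]

# Each entry lists the active basis functions, whether a spline baseline is
# fitted (assumed), and the parameters the step is described as determining.
SCHEDULE = [
    {"step": "0", "active": MAIN, "baseline": False,
     "determines": ["phi0", "phi1", "omega_global"]},
    {"step": "1", "active": MAIN + EXTRA, "baseline": False,
     "determines": ["phi0", "phi1", "omega_global", "nu_g"]},
    {"step": "2", "active": "all", "baseline": False,
     "determines": ["phi0", "phi1", "omega_global", "nu_g"]},
    {"step": "3.1", "active": "all", "baseline": True,
     "determines": ["spline_flexibility", "c", "a"]},
    {"step": "3.2", "active": "all", "baseline": True,
     "determines": ["phi0", "phi1"]},
    {"step": "4", "active": "all", "baseline": True,
     "determines": ["nu_e_k", "omega_k", "nu_g"]},
]

def steps_with_baseline(schedule):
    """Return the names of the steps in which a spline baseline is fitted."""
    return [entry["step"] for entry in schedule if entry["baseline"]]
```

Keeping the schedule as data makes the progression from global to metabolite-specific parameters explicit and easy to modify.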
The fitting software is freely available at https://gitlab.tuebingen.mpg.de/AG_Henning/ProFit-1D. See also the brief installation instructions in Supporting Information Appendix B.

| METHODS
The ProFit-1D fitting software underwent a comprehensive validation process. To evaluate the accuracy of the ProFit-1D fitting software, spectra mimicking in vivo data quality and different perturbations characteristic of in vivo data were simulated (see Section 3.1). Spectral fitting of these simulated data also gave insight into the precision of the software. Although the fitting accuracy cannot be tested using in vivo data, the precision of the fitting algorithm was evaluated with test-retest human brain spectra created through postprocessing (see Section 3.2). The results of ProFit-1D were also compared with those of LCModel V6.3-1L. 15,16,17 The LCModel baseline was made less flexible by setting the parameter dkntmn = 0.25 (default value is 0.15), constraining the knot spacing. 27
A spectral basis set was simulated for a semi-LASER sequence 35 in Vespa 22 using real pulse shapes 36 and included the following metabolites and combined metabolites, as described in Murali-Manohar et al 34 : N-acetylaspartylglutamate (NAAG), γ-aminobutyric acid, aspartate, Glu, glutamine, glutathione, glycine, mI, scyllo-inositol, lactate, taurine, and tCho+. The tCr(CH3) and tCr(CH2) moieties, as well as the NAA(CH3) and NAA(CH2) moieties, were treated as separate basis functions. Chemical shifts and coupling constants were taken from Govindaraju et al 28,29 and, for γ-aminobutyric acid, from Near et al. 37 Finally, a measured MM spectrum 34 was included in the basis set. This basis set was used both for creating simulated spectra and as input for the ProFit-1D and LCModel fitting software.
The SNR of the spectra (SNR NAA ) was defined for the NAA(CH 3 ) singlet using the real part of the spectrum and dividing the peak amplitude by the SD of the noise between −4.0 and −1.0 ppm.
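The SNR definition above translates directly into code. In this sketch, the function name `snr_naa`, the peak search window around 2.0 ppm, and the use of the sample standard deviation are assumptions; only the noise region (−4.0 to −1.0 ppm) and the use of the real part come from the text.

```python
import statistics

def snr_naa(ppm, real_spectrum, peak_window=(1.9, 2.1), noise_window=(-4.0, -1.0)):
    """Peak amplitude in the peak window divided by the SD of the noise region."""
    peak = max(v for p, v in zip(ppm, real_spectrum)
               if peak_window[0] <= p <= peak_window[1])
    noise = [v for p, v in zip(ppm, real_spectrum)
             if noise_window[0] <= p <= noise_window[1]]
    return peak / statistics.stdev(noise)   # sample SD (assumed convention)

# Toy real-part spectrum on a coarse ppm grid with a single peak at 2.0 ppm.
ppm = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
spectrum = [1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 50.0, 0.0]
snr = snr_naa(ppm, spectrum)
```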
All of the simulated and preprocessed in vivo data, as well as the basis sets, are publicly available at https://edmond.mpdl.mpg.de/imeji/collection/9jH7zRA_Kx_CZ6O3.

| Simulated spectra
Simulated test data were created using Equations (1) and (2) and scan parameters of in vivo spectra acquired at 9. ) and T 2,k relaxation times were taken from Murali-Manohar et al, 34 and the metabolite concentrations were then relaxation-adjusted according to . Respective relative-intensity weightings were applied to the simulated spectral pattern of all metabolites mentioned previously before their linear combination to yield the test spectra for the accuracy test.
To test the fit accuracy, default values were first chosen for each of the parameters $\phi_0$, $\phi_1$, $\nu_g$, $\omega$ ($\omega_{global}$ and $\omega_{local}$), $\nu_e$, noise, and c. Then one parameter at a time was varied while all others were kept at their default values. In this way, six different sets of 15 simulated spectra each were generated, with linear or random variations between the minimum and maximum values for $\phi_0$, $\phi_1$, $\nu_g$, $\omega_{global}$, $\omega_{local}$, or $\nu_e$ (Table 1). The average SNR NAA was about 110. For the c simulations, 100 test cases were generated, varying all concentrations simultaneously. For noise, 25 test cases were generated, and 16 different baseline simulations were created: Fourteen baselines were extracted from previous LCModel fits, and two additional baselines were created by simulating a Gaussian lipid peak at 1.3 ppm, both in phase ($\phi_0 = 0^\circ$) and out of phase ($\phi_0 = 90^\circ$). The series of simulated spectra are shown in Figure 2 and Supporting Information Figure S1.
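The one-parameter-at-a-time design described above can be sketched as follows. The function name, the default values, and the ranges are hypothetical placeholders (the actual values are those of Table 1), and only the linear variation is shown.

```python
def one_at_a_time(defaults, ranges, n_steps):
    """Vary one parameter linearly between its bounds; keep the rest at defaults."""
    sets = {}
    for name, (low, high) in ranges.items():
        series = []
        for i in range(n_steps):
            params = dict(defaults)                          # copy of the defaults
            params[name] = low + (high - low) * i / (n_steps - 1)
            series.append(params)
        sets[name] = series
    return sets

defaults = {"phi0": 0.0, "phi1": 0.0, "nu_g": 5.0}       # hypothetical default values
ranges = {"phi0": (-40.0, 40.0), "nu_g": (2.0, 16.0)}    # hypothetical min/max values
series = one_at_a_time(defaults, ranges, n_steps=15)
```

Each series contains 15 parameter sets that differ from the defaults in exactly one entry, so any change in fit accuracy can be attributed to that parameter.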

| In vivo spectra
To evaluate the precision of the fitting software, previously acquired metabolite-cycled semi-LASER spectra were used (see acquisition parameters in Section 3.1). This data set was previously published in upfield 34 and downfield 38 T2 studies. Voxels were positioned in the occipital lobe of the human brain at 9.4 T (Siemens Magnetom 9.4T whole-body MRI scanner; Erlangen, Germany), and MRS data were measured in 11 healthy volunteers (27.8 ± 1.9 years, 3 females). The study was approved by the local ethics board, and written informed consent was given by all the subjects before the examination. Data were preprocessed as described in Murali-Manohar et al. 34 For each subject, test-retest subset spectra were created from all available spectral averages (number of excitations = 96) while maintaining the appropriate metabolite cycling and 16-step phase cycling. Two spectra with 32 averages each (summing the excitations 1:32 and 33:64) and two spectra with 64 averages each (summing the excitations 1:64 and 33:96) were created per subject. These spectra were then stored in the LCModel .RAW file format.
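The subset construction can be sketched in Python. The function names and the toy data are assumptions, and the sketch only sums the stated excitation ranges; it does not handle the metabolite-cycling or phase-cycling bookkeeping mentioned above.

```python
def sum_excitations(excitations, first, last):
    """Sum excitations first..last (1-based, inclusive) point by point."""
    block = excitations[first - 1:last]
    return [sum(points) for points in zip(*block)]

def make_retest_sets(excitations):
    """Build the two test-retest pairs (32 and 64 averages) from 96 excitations."""
    return {
        "32_avg": (sum_excitations(excitations, 1, 32),
                   sum_excitations(excitations, 33, 64)),
        # The two 64-average subsets share the excitations 33:64.
        "64_avg": (sum_excitations(excitations, 1, 64),
                   sum_excitations(excitations, 33, 96)),
    }

# 96 hypothetical excitations with 4 complex points each; the value encodes the index.
excitations = [[complex(i, 0)] * 4 for i in range(1, 97)]
subsets = make_retest_sets(excitations)
```

Note that the 32-average pair is built from disjoint excitations, whereas the 64-average pair overlaps in the middle third of the acquisition.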

| ProFit-1D preprocessing
Before the actual ProFit-1D fitting iterations, the MRS signal was enhanced through the spectral filtering described in Supporting Information Appendix A. Afterward, the spectrum was frequency-aligned to the main metabolite singlets: NAA(CH 3 ), tCr(CH 3 ), tCr(CH 2 ), and tCho trimethyl moiety (tCho[CH 3 ] 3 ). Frequency alignment was performed on the magnitude spectrum to avoid possible phasing errors.

| Evaluation of fit results
As the first step of the accuracy evaluation of ProFit-1D, the dependence of the fit accuracy for the fitted parameters $\phi_0$, $\phi_1$, $\nu_g$, $\omega$, $\nu_e$, and c on variations of the input parameters $\phi_0$, $\phi_1$, $\nu_g$, $\omega_{global}$, $\omega_{local}$, $\nu_e$, noise, and baseline was investigated using simulated spectra. For the global parameters $\phi_0$, $\phi_1$, and $\nu_g$, the fitting errors were calculated by comparing the fitted value against the ground truth used during simulation of the spectra (Equation 7A), where param stands for any of the fitting parameters. For the parameter vectors c, $\nu_e$, and $\omega$, the fitting errors were calculated across all metabolites as the mean absolute values (abs) (Equation 7B).
As a further step of the accuracy and precision evaluation, the influence of parameter variations in the simulated input spectra on the accuracy of the concentration estimates was investigated, and the respective results from ProFit-1D were compared against LCModel results. Both the LCModel-derived and ProFit-1D-derived metabolite concentrations were normalized to the simulated tCr(CH3) concentration. The concentration differences in percent ($c^{\%}_k$) induced by parameter variations of $\phi_0$, $\phi_1$, $\nu_g$, $\omega_{global}$, $\omega_{local}$, $\nu_e$, noise, and baseline were determined for each metabolite and both fitting software packages (Equation 7C). The value of X k
FIGURE 2 Series of simulated spectra are shown for the variations in $\phi_0$, $\phi_1$, $\omega_{local}$, $\omega_{global}$, c, and baseline. For each simulation series, all parameters were kept at the default values as described in Table 1, while varying only the parameter mentioned in the title. For the parameter in question, the simulated variation included values between the minimum and maximum values given in Table 1. Simulations for variations in $\nu_g$, $\nu_e$, and noise are shown in Supporting Information Figure S1
As the final step of the accuracy evaluation and part of the precision evaluation, the concentrations c of the simulated spectra were varied, and the respective correlation plots between input and measured concentrations were created to investigate how accurately a range of low to high metabolite concentrations in the spectra can be determined.
Bland-Altman plots 39 were used for the precision analysis of the metabolite concentrations fitted for in vivo data. For this purpose, we define the concentration of metabolite k and subject i for the fit (fits1) and refit (fits2) as $c^{fits1}_{i,k}$ and $c^{fits2}_{i,k}$, respectively. Here, fits1 denotes the first set of subspectra with 64 or 32 averages, whereas fits2 denotes the second set. Metabolite concentrations were normalized to the water reference. The Bland-Altman plots were calculated for changes of metabolite concentration in percent, $c^{\%}_{i,k}$ (Equation 8A). Reproducibility coefficients 39 ($RPC^{\%}_k$) for the in vivo subspectra are also reported (Equation 8B). For comparison of the two fitting software packages and to investigate the influence of the cost functions ($R_x$) in ProFit-1D (see Section 2.2.1), the averaged reproducibility coefficient was calculated for each software and cost function (Equation 8C). This metric was evaluated for all metabolites ($RPC^{\%}_{all}$) or only the main metabolites ($RPC^{\%}_{main}$).
All simulated and in vivo data used in the accuracy and precision analyses were fitted with the same fit settings for both ProFit-1D and LCModel (Figure 1A and Supporting Information Appendix C), except when the different cost functions for ProFit-1D were compared (see Section 3.5).
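A common way to compute the test-retest quantities described above is sketched below. The normalization of the difference by the mean of fit and refit, the sample standard deviation, and the factor 1.96 follow the usual Bland-Altman convention and are assumptions here, since the exact expressions are not reproduced in the text; the concentrations are hypothetical.

```python
import statistics

def percent_differences(fit, refit):
    """Difference between fit and refit relative to their mean, in percent."""
    return [100.0 * (a - b) / ((a + b) / 2.0) for a, b in zip(fit, refit)]

def reproducibility_coefficient(diffs):
    """1.96 times the sample SD of the differences (assumed convention)."""
    return 1.96 * statistics.stdev(diffs)

# Hypothetical concentrations of one metabolite in three subjects.
fit = [10.0, 8.0, 12.0]
refit = [10.0, 12.0, 8.0]
diffs = percent_differences(fit, refit)
rpc = reproducibility_coefficient(diffs)
```

A smaller coefficient indicates that repeated fits of the same subject agree more closely.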

| Comparison of cost functions
The different cost functions (see Section 2.2.1) were compared against each other. Equations (7C) and (8C) were used for both simulated and in vivo results. For the in vivo results, the fit-quality numbers (FQN) are reported. The FQN is defined "as the ratio of the variance in the fit residual (in the fitted frequency or time range) divided by the variance for pure spectral noise" 40,41 (see Equation 9). Because several of the in vivo data sets in the current study have small lipid contaminations, the FQN was calculated for the following ppm ranges of the residual ($FOI_{FQN}$): $FOI_{FQN}$ = 0.6:4.1 ppm and $FOI_{FQN}$ = 1.95:4.1 ppm.
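The quoted FQN definition is a simple variance ratio, sketched below. The function name and the use of the sample variance are assumptions; restricting the residual to a given ppm range before the call is left to the caller.

```python
import statistics

def fit_quality_number(residual, noise):
    """Variance of the fit residual divided by the variance of pure noise."""
    return statistics.variance(residual) / statistics.variance(noise)

# Toy example: the residual has twice the amplitude of the noise.
noise = [1.0, -1.0, 1.0, -1.0]
residual = [2.0, -2.0, 2.0, -2.0]
fqn = fit_quality_number(residual, noise)
```

An FQN close to 1 means the residual is indistinguishable from noise, whereas larger values indicate remaining structure in the residual.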
The $R_3$ cost function was used for the ProFit-1D comparisons with LCModel as well as for all figures shown, as this cost function provided the best results for ProFit-1D (see Section 4).

| RESULTS
The results of the baseline simulations are shown in Figure 3 and Supporting Information Figure S2. Figure 4 shows the accuracy of each of the fitted parameters $\phi_0$, $\phi_1$, $\nu_g$, $\omega$, $\nu_e$, and c as a function of the input variance of the parameters $\phi_0$, $\phi_1$, $\nu_g$, $\omega_{global}$, $\omega_{local}$, $\nu_e$, noise, and baseline, computed according to Equations (7A,B). Noise and baseline variations influence the fit accuracy the most, but the local frequency shifts $\omega_{local}$ and the $\nu_e$ line broadening also have a negative impact on the accuracy of some concentration estimates.
The accuracy of metabolite concentration estimates influenced by each of these parameter variations is summarized in Figure 5, Supporting Information Figure S3, and Supporting Information Table S3. These results are shown for both ProFit-1D and LCModel. While the concentration estimates of ProFit-1D are slightly more accurate (smaller deviation of the mean measured value from the ground truth), the LCModel results are more precise (smaller SDs). For example, in the Gaussian line-broadening ($\nu_g$) simulations, the mean concentration inaccuracy of ProFit-1D was 7.4%, whereas it was 14.0% for LCModel; at the same time, LCModel was more precise, with an SD of just 5.4% compared with 9.0% for ProFit-1D.
The correlation plots created for the concentration variation simulations c in Figure 6 and Supporting Information Figure S4 show that, overall, both ProFit-1D and LCModel determine the true variance of metabolite concentrations well. Supporting Information Figures S5 and S6 further show, with the example of Glu and glutamine, that metabolite concentrations were varied independently, and that increases or decreases of the Glu concentration affect the fitted glutamine concentrations to a lesser extent than other, more inherent fitting inaccuracies. Metabolites corresponding to the most prominent spectral singlets are particularly well fitted, whereas metabolite concentrations derived from less prominent multiplets related to metabolites like aspartate, γ-aminobutyric acid, glutathione, glycine, NAAG, taurine, and scyllo-inositol show slightly lower accuracy and precision.
Spectral fits with ProFit-1D and LCModel for representative in vivo spectra are shown in Figure 7. The corresponding Bland-Altman plots for the metabolite concentration test-retest results according to Equation (8A) show the reproducibility of ProFit-1D and LCModel in Figure 8 and Supporting Information Figure S7. The reproducibility coefficients $RPC^{\%}_k$ and the summaries $RPC^{\%}_{main}$ and $RPC^{\%}_{all}$ are presented in Supporting Information Table S4.
A comparison of the cost functions $R_x$ is given in Table 2. There is a minor improvement of accuracy and precision for simulated and in vivo data between $R_1$ and $R_2$. There is, however, an improvement in all of these metrics when using the $R_3$ cost function. Additionally, for the in vivo data, the FQN values for both $FOI_{FQN}$ ranges are reported; the use of the cost function $R_3$ led to the smallest FQN for the lipid-free area, whereas it led to the highest FQN for the whole $FOI_{FQN}$.
The best accuracy and precision for both simulated and in vivo data were achieved using R 3 ; therefore, the figures and results from all previous sections display these results.

| DISCUSSION
This work presents a newly developed spectral fitting software, ProFit-1D, and systematically evaluates the accuracy and precision of its fit results against LCModel, the MRS-community gold standard for spectral fitting. The ProFit-1D software and the data sets are available for free.
As part of the accuracy test of the new fitting algorithm, a systematic evaluation for all possible perturbations of 1H-MRS spectra was performed. Analyzing ProFit-1D for particular disturbances that mimic in vivo conditions, such as phase distortions, frequency shifts, baseline distortions, line broadening, and noise, made it possible to determine whether the fitting software underperformed for a particular type of disturbance. The results displayed in Figures 4 and 5 show that the newly developed ProFit-1D fitting algorithm does not have systematic errors in determining the fitting parameters in general and the metabolite concentrations more specifically.
FIGURE 4 ProFit-1D parameter evaluation of the parameter differences of $\phi_0$, $\phi_1$, $\nu_g$, $\omega$, $\nu_e$, and c for the eight simulation setups with changes in $\phi_0$, $\phi_1$, $\nu_g$, $\omega_{global}$, $\omega_{local}$, $\nu_e$, noise, and baseline. The values of $\Delta\phi_0$, $\Delta\phi_1$, and $\Delta\nu_g$ are the differences between the fitted and the simulated values of $\phi_0$, $\phi_1$, and $\nu_g$, whereas $|\Delta\omega|$, $|\Delta\nu_e|$, and $|\Delta c|$ are the mean absolute differences of the respective input parameters. These results are also summarized numerically in Supporting Information Table S2
During the development of ProFit-1D, it was observed that large phase and frequency-shift distortions lead to fit uncertainty. Therefore, similarly to other spectral fitting packages, 15,24,25 preprocessing steps for spectral fitting were introduced. Herein, the spectrum read from the .RAW file was frequency-aligned and phase-corrected, considering the multiple singlets in the magnitude spectra and using the main metabolites for an initial $\phi_0$ and $\phi_1$ correction. However, the 1-3 outliers produced by the $\omega_{local,k}$ simulations occur for simulated $\omega_{local,k}$ values of up to 9 or 15 Hz, which mimic rather unusual scenarios: The effects of temperature on the metabolite moieties, as measured by Wermter, 42 can be up to $7 \cdot 10^{-4}$ ppm/K or 0.28 Hz/K at 9.4 T; the effects of pH or other ions are more significant, although only a highly pH-sensitive metabolite such as homocarnosine has two resonances that shift by 7.7 and 20 Hz for a 0.1 pH change. 38,43
Overall, ProFit-1D had the highest uncertainty for fitting noise and baseline simulations. Previous publications using LCModel have also demonstrated that the fitted spline baseline affects the fitted metabolite concentrations significantly. 26,27,44 For this reason, Wilson 24 proposed the AB-fit algorithm with automatic determination of the optimal spline smoothness through the modified Akaike's information criterion. This method was also implemented in the ProFit-1D software and evaluated on spline baselines extracted from previous LCModel fits. Comparing the simulated baselines with the fitted spline baselines, both LCModel and ProFit-1D appear to model the simulated baselines well (Figure 3). The fitted splines did not pick up minor variations of the input spline baseline; however, as seen from the residual and the fitted concentrations, these appear to have little impact on the estimated concentrations. On the other hand, some major baseline distortions such as major lipid peaks or significant water residues led to some structured noise for both fitting software packages (Figure 3C and Supporting Information Figure S2G). Overall, the spline smoothness estimation through the mAIC appears to work robustly, even though for one spectrum out of the 15 the mAIC criterion estimated a stiffer than necessary spline baseline (Supporting Information Figure S2L).
FIGURE 5 Concentration changes in percent $c^{\%}_k$ for each metabolite for simulation setups $\phi_0$, $\phi_1$, $\omega_{global}$, and $\omega_{local}$, comparing the results of ProFit-1D and LCModel. The results are also summarized in Supporting Information Table S3. Horizontal lines inside the boxes indicate median values (50% quartile), whereas the bottom and top box boundaries illustrate the 25% and 75% quartiles, respectively. Plus signs (+) show outliers. The results for simulation setups $\nu_g$, $\nu_e$, noise, and baseline are shown in Supporting Information Figure S3. Asp, aspartate; GABA, γ-aminobutyric acid; Gln, glutamine; Gly, glycine; Lac, lactate; NAAG, N-acetylaspartylglutamate; sI, scyllo-inositol; Tau, taurine; tCr(CH2), total creatine moiety
FIGURE 6 Correlation plots between the fitted and simulated metabolite concentrations for both ProFit-1D and LCModel fits. These are the 100 c simulations described in Table 1, with the simulated concentrations spread out linearly from the literature mean concentration values up to 3.5 times the SD in both directions. The identity line is shown in black. The plots show the following metabolites: Asp, tCr(CH2), GABA, Gln, Glu, Gly, GSH, Lac, NAA(CH2), NAAG, tCho+, and the MM spectrum. The correlation plots for the other metabolites are shown in Supporting Information Figure S4
Correlation plots (Figure 6) show that, for most metabolites, ProFit-1D appears to be slightly more accurate than, or as accurate as, LCModel in determining the true underlying metabolite concentration variance. Because the line-shape setup enforces $\nu_e = 0$ and $\nu_g \approx 0$ for the MM spectrum, the ProFit-1D concentrations of the MM and lactate, for example, are estimated more accurately. On the other hand, the enforced small frequency shift of NAAG in LCModel (CHSDSH[2] = "NAAG" and ALSDSH[2] = 0.002) led to more accurate NAAG concentrations (Figure 6 and Supporting Information Figure S4). Although ProFit-1D performed better regarding the overall accuracy of concentration estimates, LCModel performed better regarding the precision of concentration estimates for noisy and strongly baseline-distorted data.
For the in vivo data set, the ProFit-1D-fitted spectra match the measured spectra well, and the fit residuals show minimal noise structure, similar to LCModel fits (Figure 7). However, the mAIC curves tended to be slightly different in the reproducibility tests. The achieved in vivo reproducibility of ProFit-1D was slightly worse than that of the LCModel fits. Although the main metabolites are comparably reproducible ($\mathrm{RPC}^{\%}_{\mathrm{main}} = 17\%$ for LCModel vs $\mathrm{RPC}^{\%}_{\mathrm{main}} = 19\%$ for ProFit-1D), a higher discrepancy in precision is seen [...] that combined the time domain and frequency domain. Interestingly, a further improvement in performance was observed following the addition of weights. The best-performing cost function $R_3$ proved useful both for optimizing the fit parameters more accurately early on, when fewer metabolites were included, and in later iterations. The weights induce the cost function to minimize more accurately at the most prominent parts of a metabolite's spectrum or where multiple metabolites are present. Most likely, the weights help avoid overfitting due to lipid contaminations in spectral ranges of lower interest for the in vivo data. The optimization with $R_3$ appeared to be less influenced by these, as can be deduced from the smaller $\mathrm{FQN}_{1.95:4.1\,\mathrm{ppm}}$ versus the higher $\mathrm{FQN}_{0.6:4.1\,\mathrm{ppm}}$, compared with $R_1$ and $R_2$.
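How a cost function can draw on both domains, and where weights enter, can be sketched as follows. This is only an illustration under stated assumptions: the exact definitions of the cost functions compared in this work are not reproduced here, and `combined_cost`, the mixing factor `alpha`, the per-bin `weights`, and the toy signal are hypothetical.

```python
import cmath

def unitary_dft(x):
    """Naive O(N^2) DFT scaled by 1/sqrt(N), so signal energy is preserved."""
    n = len(x)
    scale = n ** -0.5
    return [scale * sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t in range(n))
            for k in range(n)]

def combined_cost(fid_meas, fid_model, weights=None, alpha=0.5):
    """Mix a time-domain and a weighted frequency-domain squared residual.

    weights: one non-negative factor per frequency bin (assumed form);
    alpha:   share of the frequency-domain term (assumed form)."""
    n = len(fid_meas)
    res_t = [a - b for a, b in zip(fid_meas, fid_model)]
    res_f = unitary_dft(res_t)
    if weights is None:
        weights = [1.0] * n
    cost_t = sum(abs(r) ** 2 for r in res_t)
    cost_f = sum(w * abs(r) ** 2 for w, r in zip(weights, res_f))
    return (1.0 - alpha) * cost_t + alpha * cost_f

# Toy case: the measured FID is a single resonance in bin 3, the model is empty.
n = 16
fid_meas = [cmath.exp(2j * cmath.pi * 3 * t / n) for t in range(n)]
fid_model = [0j] * n

flat = combined_cost(fid_meas, fid_model)             # uniform weights
emphasis = [2.0 if k == 3 else 1.0 for k in range(n)]
weighted = combined_cost(fid_meas, fid_model, weights=emphasis)
```

In this toy, uniform weights over the full spectrum make the two terms equal (Parseval's theorem), so the frequency-domain term adds nothing new. The two domains only carry different information once the frequency residual is weighted or restricted to a spectral range, which is why the sketch exposes `weights` explicitly.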
The goal of the current software development of ProFit-1D was to keep the accuracy of the fitted results as high as possible while also maintaining high precision. Hence, precision was deliberately not increased artificially at the expense of accuracy, for example through enforced stiffer baselines or tighter bounds.
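The distinction between the two quantities can be made concrete with a small sketch. It assumes that the percent difference of a test-retest pair is taken relative to the pair mean and that the sample standard deviation is used; both are assumptions for illustration, as are the function names and the toy numbers. Only the factor 1.96 applied to the standard deviation of the percent differences follows the reproducibility bounds described for the Bland-Altman plots.

```python
import statistics

def percent_diff(a, b):
    """Percent difference of a test-retest pair, relative to the pair mean (assumed)."""
    return 100.0 * (a - b) / ((a + b) / 2.0)

def rpc_percent(test, retest):
    """Precision: 1.96 times the SD of the test-retest percent differences."""
    diffs = [percent_diff(a, b) for a, b in zip(test, retest)]
    return 1.96 * statistics.stdev(diffs)

def mean_percent_error(fitted, truth):
    """Accuracy: mean signed percent deviation from the known concentrations."""
    return statistics.mean(100.0 * (f - t) / t for f, t in zip(fitted, truth))

# Toy numbers: a fit that is perfectly repeatable but biased by +10%.
truth = [10, 20, 5]
fit_a = [11, 22, 5.5]
fit_b = [11, 22, 5.5]

precision = rpc_percent(fit_a, fit_b)        # 0: no test-retest scatter
accuracy = mean_percent_error(fit_a, truth)  # about +10%: systematic bias
```

A fit constrained so tightly that it returns nearly the same value every time would look excellent on the first measure while still being wrong on the second, which is the trade-off described above.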
The current software version has the limitation of being optimized for semiLASER 9.4T human brain data and does not include non-Voigt lineshapes. However, in the future, the ProFit-1D algorithm could be extended by nonparametric lineshape modeling, and it could also be tested on other data sets, particularly for more clinically relevant field strengths and other sequences. The extension to different data sets will be more straightforward, as all fitting iterations and boundaries are defined through a single MATLAB file. Other fitting packages also use the concept of progressive bounds 15,24,25 or complementary information 23 ; hence, we expect the iteration setup presented here to be applicable, with minor modifications, to other MRS spectral fitting software.
Integrating fitting software into a fully integrated MRS analysis pipeline is highly desirable. This would provide more user-friendly software and make MRS data analysis more widely applicable clinically. Although ProFit-1D is not yet more precise than LCModel (which has been under development for more than 30 years), it comes close to its performance, and because ProFit-1D is open-source, integration into other software packages will be possible. Finally, open-source code should allow for more straightforward modifications to fit non-proton spectra 45,46 or downfield spectra. 38

CONCLUSIONS
In this study, the new fitting algorithm ProFit-1D is presented and systematically evaluated for its accuracy and precision using both simulated and in vivo data. Through systematic evaluation, by simulating variations in all spectral parameters that influence in vivo spectra, a high accuracy of the ProFit-1D fitting algorithm could be achieved. Furthermore, an adaptive spline baseline stiffness regularization was demonstrated for simulations of in vivo-like spline baselines as well as in vivo spectra. Additionally, a new type of cost function was introduced, which combines complementary spectral information. This cost function improved accuracy and precision in comparison with the traditional approach.
Finally, the accuracy and precision of the developed ProFit-1D algorithm and the LCModel software were compared. ProFit-1D was slightly more accurate but somewhat less precise than LCModel on the data sets used for evaluation.

SUPPORTING INFORMATION
Additional Supporting Information may be found online in the Supporting Information section.
FIGURE S1 From the series of simulated spectra for the variations in $\nu_g$, $\nu_e$, and noise, sample spectra are shown. For each simulation series, all parameters were kept at the default values as described in Table 1, while varying only the parameter mentioned in the title. The sample plots depict the simulations for the minimum, median, and maximum values given in Table 1
FIGURE S2 Fit results for the baseline simulation are shown. The blue line shows the input spectrum, whereas the black line indicates the input baseline. The fitted baselines (dashed lines) and the resulting residual (continuous lines at the bottom) are shown in red for LCModel and purple for ProFit-1D. The dotted lines in between show the difference between the simulated and fitted spline baselines. The plot offsets used for display purposes are indicated on the right of each subplot. Fitted baselines agree well with the simulated ones, except for subfigure L. Inlays show the mAIC curves for the ProFit-1D fitting. These spectra show extracted baselines from previous LCModel fits
FIGURE S3 Concentration changes in percent $c^{\%}_{k}$ for each metabolite for simulation setups $\nu_g$, $\nu_e$, noise, and baseline, comparing the results of ProFit-1D and LCModel. The results are also summarized in Supporting Information Table S3. Horizontal lines inside the boxes indicate median values (50th percentile), whereas the bottom and top box boundaries illustrate the 25th and 75th percentiles, respectively. Plus signs (+) show outliers. The results for simulation setups $\varphi_0$, $\varphi_1$, $\omega_{\mathrm{global}}$, and $\omega_{\mathrm{local}}$ are shown in Figure 5
FIGURE S4 Correlation plots between the fitted and simulated metabolite concentrations are shown for both ProFit-1D and LCModel fits. The identity line is shown in black. The plots show the following metabolites: total creatine (Cr)(CH$_3$), myo-inositol (mI), N-acetylaspartate (NAA)(CH$_3$), scyllo-inositol (sI), and taurine (Tau)
FIGURE S5 Scatter plot showing the simulated (black circles) concentrations of glutamate (Glu) ($c_{\mathrm{Glu}}$) and glutamine (Gln) ($c_{\mathrm{Gln}}$) with respect to each other. The depicted concentrations are a subset of the $c$ simulations (see Figure 6) to allow better visualization; the full set of 100 simulations is shown in Supporting Information Figure S6. The simulated scatter points are evenly spread in the concentration range of Glu (4 to 9 mmol/kg) and Gln (0.1 to 3.6 mmol/kg), as expected from in vivo studies. The even distribution demonstrates that independent concentration variations of these metabolites were tested. Similarly, all other metabolites had independent variations. Connected by lines to the simulated metabolite concentrations (black circles), the concentrations fitted by LCModel (orange rhomboids) and ProFit-1D (purple crosses) are shown. The differences from the simulated values show the accuracy of the fitting software, with higher accuracy observable for the metabolite concentrations of both Glu and Gln in ProFit-1D compared with LCModel. Note that there are systematic concentration errors, such as in $c_{\mathrm{Glu}}$ for both fitting packages, partially impacted by $c_{\mathrm{Gln}}$ (a metabolite with a similar spectral appearance). The bulk of the fit accuracy errors appear to have a different origin than the interaction between these two metabolites
FIGURE S6 Scatter plot showing the simulated (black circles) concentrations of Glu ($c_{\mathrm{Glu}}$) and Gln ($c_{\mathrm{Gln}}$) with respect to each other. The depicted 100 concentrations are from the $c$ simulations (see Figure 6). Connected by lines to the simulated metabolite concentrations (black circles), the concentrations fitted by LCModel (orange rhomboids) and ProFit-1D (purple crosses) are shown. A more elaborate explanation with a subset to allow better visualization is shown in Supporting Information Figure S5
FIGURE S7 Bland-Altman plots shown for 9 of the 17 fitted metabolites. Odd rows show the concentrations fitted with ProFit-1D, whereas the even rows show the results for LCModel. The individual scatter points represent the fitted and refitted concentrations for the 11 volunteers. One subspectrum with 32 averages is compared against the other subspectrum with 32 averages (blue circles). Similarly, a comparison was made for the two 64-average subspectra per subject (red circles). The labels also report the calculated reproducibility coefficients ($\mathrm{RPC}^{\%}_{k}$). The continuous horizontal line and number indicate the mean $c^{\%}_{i,k}$ error, whereas the dotted horizontal lines represent the reproducibility bounds, meaning the $\pm 1.96 \cdot \mathrm{std}_i\, c^{\%}_{i,k}$ values
FIGURE S8 The top-left subfigure shows the method of detection of the truncation point. First the noise baseline is found, then the truncation point is determined according to Equations (S1A) and (S1B). The different spectral filtering options are depicted in the bottom-left subfigure, showing $F_{\mathrm{trunc}}$ (FID truncation), $F_{\mathrm{sine-bell}}$ (the sine-bell filter), and $F_{\mathrm{matched}}$ (the matched filter). Applying these to the FID of a sample in vivo spectrum reduces the noise while keeping the spectral pattern, as shown in the right subfigure and the zoomed-in inlay
TABLE S1 Bounds and starting values for the individual parameters during the different fitting iterations. Parameters may be adjusted globally to all metabolites or independently for each metabolite. In iteration 2, where new metabolites were added, the starting values for these metabolites were taken as the mean from the metabolites from the previous fit. Iter., iteration; indep., independently
TABLE S2 ProFit-1D parameter evaluation of the parameter differences of $\varphi_0$, $\varphi_1$, $\nu_g$, $\omega$, $\nu_e$, and $c$ for the eight simulation setups with changes in $\varphi_0$, $\varphi_1$, $\nu_g$, $\omega_{\mathrm{global}}$, $\omega_{\mathrm{local}}$, $\nu_e$, noise, and baseline. The mean and the SD between the simulations for the fitting parameter are displayed for each simulation setup. The values of $\Delta\varphi_0$, $\Delta\varphi_1$, and $\Delta\nu_g$ are the differences between the fitted and the simulated values of $\varphi_0$, $\varphi_1$, and $\nu_g$, whereas $|\Delta\omega|$, $|\Delta\nu_e|$, and $|\Delta c|$ are the mean absolute differences of the respective input parameters. The corresponding scatter plots to this table are shown in Figure 4
TABLE S3 Concentration changes in percent $c^{\%}_{k}$ for each metabolite for simulation setups $\varphi_0$, $\varphi_1$, $\omega_{\mathrm{global}}$, $\omega_{\mathrm{local}}$, $\nu_g$, $\nu_e$, noise, and baseline, comparing the results of ProFit-1D and