QSM reconstruction challenge 2.0: A realistic in silico head phantom for MRI data simulation and evaluation of susceptibility mapping procedures

Purpose To create a realistic in silico head phantom for the second QSM reconstruction challenge and for future evaluations of processing algorithms for QSM. Methods We created a digital whole‐head tissue property phantom by segmenting and postprocessing high‐resolution (0.64 mm isotropic), multiparametric MRI data acquired at 7 T from a healthy volunteer. We simulated the steady‐state magnetization at 7 T using a Bloch simulator and mimicked a Cartesian sampling scheme through Fourier‐based processing. Computer code for generating the phantom and performing the MR simulation was designed to facilitate flexible modifications of the phantom in the future, such as the inclusion of pathologies as well as the simulation of a wide range of acquisition protocols. Specifically, the following parameters and effects were implemented: TR and TE, voxel size, background fields, and RF phase biases. Diffusion‐weighted imaging phantom data are provided, allowing future investigations of tissue‐microstructure effects in phase and QSM algorithms. Results The brain part of the phantom featured realistic morphology with spatial variations in relaxation and susceptibility values similar to the in vivo setting. We demonstrated some of the phantom’s properties, including the possibility of generating phase data with nonlinear evolution over TE due to partial‐volume effects or complex distributions of frequency shifts within the voxel. Conclusion The presented phantom and computer programs are publicly available and may serve as a ground truth in future assessments of the faithfulness of quantitative susceptibility reconstruction algorithms.


| INTRODUCTION
Quantitative susceptibility mapping has proven to be a valuable tool for assessing iron concentrations in the deep gray matter, 1-3 estimating vessel oxygenation and geometry, 4,5 differentiating blood and calcium products, 6,7 and studying demyelinating lesions in the white matter. [8][9][10][11] However, several recent methodical investigations have suggested that study outcomes may depend on the particular processing algorithms chosen for QSM. [12][13][14] Quantitative susceptibility mapping typically involves the following steps: coil combination, 12 phase unwrapping, 14 multi-echo combination, 12 background field removal, 14 and, finally, the estimation of susceptibility maps. 13,[15][16][17] Processing artifacts and inaccuracies at any of these five processing stages can propagate into the computed susceptibility maps.
The first QSM reconstruction challenge (RC1) in 2016 18 aimed to provide initial insights on the accuracy of various proposed algorithms for estimating susceptibility from background-corrected frequency maps (ie, the last processing step of QSM). One of the key conclusions of RC1 was that the choice of the algorithm and the used parameter settings can have a substantial, nonnegligible effect on the appearance and accuracy of computed susceptibility maps. However, following completion of the challenge, it was also recognized that the particular gold-standard (reference) susceptibility maps used for evaluating the challenge submissions limited the interpretability of the challenge outcomes. The reference maps were generated from multiple acquisitions in which the subject had rotated the head toward 12 different orientations. From these data, two reference maps were created: one calculated with the susceptibility tensor imaging 19 technique and one by calculation of susceptibility through multiple orientation sampling (COSMOS). 20 Meanwhile, only one of the 12 field maps was provided to the challenge participants. The rationale of this approach was that RC1 would yield the most objective and meaningful results if algorithms were evaluated using real-world in vivo data. However, at the completion of RC1, it was observed 21 that a nonnegligible discrepancy existed between the provided frequency map and the frequency map obtained when the field perturbation was forward-simulated based on the provided reference susceptibility maps. It was speculated that a part of the discrepancies was related to unaccounted microstructure effects on in vivo brain phase images. 22 Current single-orientation QSM algorithms assume that frequency contrast is caused entirely by variations in bulk magnetic susceptibility, and all other contrast mechanisms are neglected. Consequently, the discrepancy between the provided field map and the goldstandard susceptibility reference rendered it challenging or even impossible to achieve a reconstruction from the field map that was close to the reference used. It turned out that the best-performing RC1 submissions (ie, those with the smallest error metrics) were overregularized and had a nonnatural appearance.
The goal of the second reconstruction challenge (RC2) in 2019 was to address the identified limitations of RC1 and provide more meaningful insights on the current state-ofthe-art in QSM algorithms, to identify their strengths and limitations in different scenarios and inform and coordinate future methodological research efforts. During the planning phase for RC2, the challenge committee concluded that the systematic evaluation of the accuracy and robustness of QSM methods should focus on synthetic (in silico) phantoms with realistic forward simulations rather than on real-world data. The challenge was designed with two stages: stage 1 mimicked the clinical setting in which the ground truth was unknown to participants; in stage 2 the ground truth was made available, and thus allowed for systematic parameter optimizations to obtain the best possible quality metrics that can be obtained with each reconstruction algorithm. The results of RC2 were reported in a separate manuscript. 23 In this paper, we present the modular framework designed to generate the realistic digital head phantoms for RC2. Methodological researchers may use the RC2 phantom in their studies to evaluate existing and future QSM algorithms and Results: The brain part of the phantom featured realistic morphology with spatial variations in relaxation and susceptibility values similar to the in vivo setting. We demonstrated some of the phantom's properties, including the possibility of generating phase data with nonlinear evolution over TE due to partial-volume effects or complex distributions of frequency shifts within the voxel. Conclusion: The presented phantom and computer programs are publicly available and may serve as a ground truth in future assessments of the faithfulness of quantitative susceptibility reconstruction algorithms.

K E Y W O R D S
in silico head phantom, MRI simulations, MRI simulations, quantitative susceptibility mapping compare their results with RC2 submission. The comparison of their metrics with those of RC2 submissions will facilitate the objective evaluation of methodological improvements and algorithm performance across labs. As more advanced physical models are incorporated into the QSM algorithms, researchers may extend the phantom according to their needs. Code and data are freely available and have been designed to facilitate adding new features to the phantom, such as calcifications and hemorrhages or microstructure effects. The software package may also be used to optimize acquisition protocols, and prepare and test complete QSM reconstruction pipelines. In combination with other software, the package will allow us to evaluate the effect of image distortions or blurring on QSM.

| Limitations of previous evaluation strategies
In the literature, most QSM algorithms were evaluated based on their visual appearance, 5,24,25 based on the RMS error (RMSE) of reconstructions of simple digital piece-wise constant phantoms consisting of geometrical shapes [25][26][27][28][29][30] or simplistic head phantoms. 22,31 Evaluation of the susceptibility quantification accuracy and precision typically relied on phantoms made of agar or aqueous solutions with varying concentrations of contrast agents such as gadolinium 20,26,[32][33][34][35] or iron oxide particles. 5,28,[36][37][38] Such measurements have been of great importance in establishing that QSM linearly maps the magnetic susceptibility property and that measurements across different platforms can be compared. In vivo, QSM accuracy has often been evaluated 27,31 by using previously published iron concentrations in the deep gray-matter nuclei 39 as a surrogate gold standard. This approach suffers from the large variability in iron concentrations across subjects.
A major limitation of most previously used digital phantoms and liquid or gel phantoms was that they had piece-wise constant susceptibility distributions, which are particularly easy to invert for methods with total variation regularization ("inverse crime"). To address this limitation, validation has also been performed by injecting gadolinium into tissue samples 20 or using air bubbles or glass beads. 40 In the first case experiments, however, the ground truth is again not available as the agents diffuse within the tissue. Therefore, it has to be reverted to visual inspection. In vivo, as an alternative to visual inspection, maps have been compared with a COSMOS reconstruction of the same subject, [28][29][30]41,42 due to their reduced level of streaking artifacts, similar as in RC1. However, using a COSMOS solution as gold standard implicitly assumes that the measured phase satisfies the COSMOS field model. Specifically, COSMOS assumes that (1) susceptibility is isotropic throughout the brain; (2) the dipole model with the sphere of Lorentz approximation can be used throughout the brain 22 ; and (3) microstructure-related frequency effects 43 and chemical exchange 44,45 do not exist. Because these assumptions are simplistic, COSMOS does not generate an appropriate ground-truth susceptibility map for single-orientation QSM, and therefore cannot be considered a good gold-standard method.

| Design considerations for RC2
This RC2 committee's decision to use a digital phantom resulted from the realization that a true gold-standard technique for in vivo QSM did not exist. Without a gold-standard technique, it was not possible to obtain a meaningful in vivo reference susceptibility map through measurements. The committee had also discussed the design of real-world test objects (phantoms) that are consistent with the QSM phase model. 33,34 Based on the committee members' experience with phantom design and a literature research of previously used phantoms, it was concluded that the inclusion of sufficiently complex morphology and fine-scale susceptibility features would be prohibitively challenging. It was unanimously concluded that it would be most reasonable to focus the committee's efforts on a digital phantom that could be adapted and extended to the community's evolving needs in the future. For real-world data, available references usually represent only approximations of the ground truth (gold standard). On the contrary, in silico phantoms provide a genuine ground truth. In silico phantoms also allow for a controlled investigation of the effect of deviations from the underlying QSM model on the reconstruction performance. In addition to the ability to model different biophysical phase contributions, digital models also allow a controlled inclusion of measurement-related phase errors. For example, field measurements close to the brain surface are affected by nuisances, such as signal dropout and the nonlinearity of the phase evolution due to the nonnegligible higher-order spatial terms inside the pixel 46 that make the measured field deviate from the actual voxel-average field. Similar limitations are present when developing background field-removal methods. Despite this known limitation, only a few methods have the possibility of explicitly accounting for field-map uncertainty, 27,32 while remaining methods address this problem by increasing the brain mask erosion. 24,26,47 As a first step toward a future systematic evaluation of all of these experimental aspects influencing QSM reconstruction quality, the RC2 in silico phantoms enforced consistency of the provided frequency map with the physical model used by current QSM algorithms. Moreover, the RC2 phantoms were designed to feature a realistic brain morphology and naturally varying susceptibility distribution within anatomical regions.

| Data acquisition
We acquired MRI data from a human volunteer (female, 38 years old), who gave informed consent, and the experiment was approved by the local medical ethical committees (Amsterdam University Medical Center and Radboud University Medical Center). We used a 7T scanner to obtain relaxation-rate maps and a 3T scanner to obtain DTI data and bone-air tissue interfaces. To generate the brain phantom, we acquired inherently co-registered quantitative maps of R 1 , 48 R * 2 , χ, and M 0 maps using the MP2RAGEME 49 sequence on a 7T (Philips Achieva; Amsterdam, Netherlands) scanner. The main sequence parameters were TR/TI 1 /TI 2 = 6.72/0.67/3.86 seconds. The first and second TIs were acquired with TE 1 = 3 ms and TE 1/2/3/4 = 3/11.5/20/28.5 ms and flip angles α 1 /α 2 = 7°/6°, respectively. The acquisition was performed sagittally with FOV = 205 × 205 × 164 mm 3 and matrix size = 320 × 320 × 256, resulting in an isotropic resolution of 0.64 mm and a total acquisition time of 16:30 minutes.
To add microstructure effects to the phantom, we acquired DTI data using two simultaneous multislice EPIbased data sets with opposed phase-encoding directions. The main sequence parameters were TR/TE = 3520/74 ms, simultaneous multislice factor = 3, in-plane acceleration = 2, matrix size = 140 × 140 × 93, and FOV = 210 × 210 × 139.5 mm 3 , resulting in 1.5-mm isotropic image resolution. The diffusion-weighted parameters were b = 0/1250/2500 s/mm 2 and 12/90/90 directions, respectively, resulting in a total acquisition time of 12:10 minutes. Diffusion data were processed using FSL software (https://fsl.fmrib.ox.ac.uk/fsl/ fslwi ki/); eddy_correct and top up were used to undistort the DWI data. Data were coregistered to the 7T anatomical space, and the FMRIB Diffusion Toolbox was used to extract tensor information (eg, fractional anisotropy, main eigenvector orientation). Figure 1 shows a pictorial representation of the pipeline used for the segmentation. The 7T T 1 maps, derived from the MP2RAGE data set, were segmented into 28 tissue classes, including left and right splitting, using the cbstools atlasbased pipeline (https://www.cbs.mpg.de/insti tute/softw are/ cbs-tools 51 Classes were then reclustered into 16 tissue clusters: CSF (initially split into four classes); gray matter (initially split into eight classes, left and right cortical, cerebellar, amygdala, and hippocampus); caudate; putamen; thalamus; white matter (encephalus, cerebellum, and brain stem); and large blood vessels. Deep gray-matter structures not clearly distinguishable on T 1 maps (red nucleus, substantia nigra, globus pallidus, and dentate nucleus) were manually segmented using the active contours function implemented in ITK-Snap (version 3.6) 52 on R * 2 and χ maps. A calcification present in the subject's interhemispheric fissure was identified and segmented using the M 0 map. An initial vein-andartery mask was computed based on Frangi-filtered R * 2 maps as described in Chan et al. 53 The region outside the brain was segmented into bone, air, and tissue using a model-based segmentation approach with deformable surface meshes, 54 using the PETRA sequence as input. The resulting segmentations of nasal cavities and auditory canals were refined manually using ITK-Snap for the computation of realistic background fields.

| Tissue segmentation
After the combination of the individual tissue brain masks into a piece-wise constant whole-head phantom, we used R * 2 and R 1 maps to correct the label boundaries in various brain regions using customized thresholds (Supporting Information Table S1). The initial tissue segmentations were based on one single quantitative parameter (R 1 for tissues compartments, R * 2 for veins), and because smooth surfaces had been enforced in some regions, this resulted in segmentation mismatches that benefited from this second processing iteration.

| Susceptibility map
The susceptibility map was simulated by assigning tissuetypical susceptibility values taken from literature, 55,56 tissue (Table 1), to the various tissue segments. We modulated the susceptibility values in each region using the image intensities on R 1 and R * 2 maps according to the following equation: where R * 2tissue and R 1tissue are the mean apparent transverse and longitudinal relaxation rates of that given tissue segment class. There were three main motivations for using such an expression to compute our ground-truth susceptibility map: • Modulation-avoided susceptibility values were constant throughout anatomical regions (piece-wise constant). (1) Absence of modulation would be both unrealistic and advantageous to algorithms with gradient-based regularization terms; • From a practical perspective, R 1 and R * 2 were the only two "bias field" free maps available at high resolution that could be used to create an anatomically valid intensity modulation; and • Both transverse and longitudinal relaxation rates, like magnetic susceptibility, are known to have a linear dependence on the concentration of paramagnetic and diamagnetic perturbers when dealing with simple liquid solutions. The main difference to susceptibility is that relaxation rates are agnostic to the sign of the magnetic perturber; particularly in brain tissues, both R 1 and R * 2 have been shown to have a linear dependence on the concentrations of iron and myelin. 57 Although it is reasonable to assume that the susceptibility map (assuming any other tissue properties constant) could be F I G U R E 1 The process used to obtain the head segmentation: R 1 map (obtained from MP2RAGEME) was used to create an atlas-based segmentation using CBS tools; R * 2 were processed with a Frangi filter for vein segmentation; a semimanual approach using ITK snap was used for segmentation of the deep gray-matter nuclei based on the R * 2 and a susceptibility map computed using HEIDI; and the M 0 map was used to segment the calcification. Finally, PETRA data were used to obtain air, bone, and tissue masks using a CT-based deep-learning algorithm followed by manual ITK snap. Then, the various tissue segmentations were fine-tuned using denoised R 1 and R * 2 maps, using manually defined thresholds. The various masks were combined to generate a whole-head segmentation with 16 different tissue types given by a linear combination of these two maps, the main aim was to introduce a realistic texture. To obtain proportionality parameters, a tissue and b tissue , resulting in realistic susceptibility variations, Equation 1 was inverted for each brain tissue class using as (r) tissue , the HEIDI susceptibility map calculated from the original data. The coefficients tissue , a tissue , and b tissue used for the two phantoms in the QSM challenge 2.0 are shown in Table 1 and Figure 2. Note that having different proportionality parameters for each tissue (in addition to a different mean value per tissue type) results in a susceptibility map that cannot be derived simply from the magnitude signal variations. Because the measured relaxation rates of blood (both in arteries and veins) are prone to errors (due to inflow effects on R 1 and flow effects on R * 2 maps), T A B L E 1 Parameters used to create the two magnetic-susceptibility head models released in the QSM challenge The values in the three columns correspond to the parameters described in Equation 1, the assumed mean magnetic susceptibility of the tissue, and the R * 2 (a tissue ) and R 1 (a tissue ) modulation terms, respectively. Model 1 values were chosen from literature and doing the fitting described in Equation 1. Model 2 mean susceptibility values were ad hoc modification of those found in literature, while the R 1 and R * 2 modulation were changed by plus and minus 10%, respectively.

F I G U R E 2
View of three slices in sagittal, coronal, and axial directions of the two digital phantoms created for the QSM challenge 2.0. Top and Bottom maps were obtained using the parameters described in Table 1 for models 1 and 2, respectively the proportionality values were relatively small for the blood pool, rendering these compartments piece-wise constant. Bone, calcification, and other non-brain-tissue compartments were made piece-wise constant (the lack of a susceptibility map outside the brain prevented the derivations of a tissue and b tissue ). Close to air-tissue boundaries, where strong field gradients are present, tissue R * 2 values are overestimated (see R * 2 maps in Figure 1 over the ear canals). 58 A low-pass-filtered version of the gradient of the acquired field map was used to differentiate regions where the R * 2 values could be trusted from those where they were unreliable. In the latter regions, we forced a tissue = 0 (ignore R * 2 contribution when generating the susceptibility phantom). To avoid discontinuities between high (> 0.08 ppm/mm) and low (< 0.3 ppm/mm) field gradient regions, a smooth transition was created by mixing the two combinations (from full Equation 1 and a tissue = 0, respectively). Please refer to the provided code for more details on the implementation.
To avoid unrealistically sharp edges of magnetic susceptibility at the interfaces between tissue regions, we have introduced partial-voluming in those transitions. Note that transitions between brain tissues tend not to be sharp (for example, gray-matter layers on the white-matter side are highly myelinated, 59 as are the outer parts of the thalamus), whereas between tissues and blood, CSF, air, bone and muscle, the interfaces will be sharp. The probability of a voxel being a given tissue, P tissue , was computed by smoothing each binary brain tissue mask using a 3D Gaussian kernel with a FWHM of 1.2 voxels. This smoothing was not applied to veins or non-brain-tissue masks. The probability was computed as P tissue (r) = S tissue (r) ∕ ∑ 16 tissue = 1 S tissue (r), where S tissue is either the smoothed or unsmoothed mask of a given compartment, depending on it being brain tissue or non-brain tissue. The susceptibility phantom was then given by The importance of moving from a piece-wise constant (where a tissue and b tissue are set to zero) to a contrast-modulated constant (where P tissue is simply a binary mask) or the probabilistic formalism of the susceptibility distribution can be appreciated in Figure 3. Yellow arrows highlight the transition between gray and white matter that becomes smoother, and green arrows highlight smoothing out small segmentation errors within the thalamus. In contrast, Figure 3 shows that most vessel structures have remained sharp, with only some minor reduction in susceptibility value. The R * 2 maps tend to enlarge venous structures due to blooming artifacts. By not smoothing the blood compartment mask when computing the final susceptibility map, this effect was not further extended. However, because neighboring tissues have been smoothed into the blood compartment, the blood partial volume in blood vessels is reduced, resulting in a lower susceptibility the smaller the vessel is, mimicking a realistic scenario.

F I G U R E 3
Intermediate stages of the of the creation of the in silico susceptibility head phantom: traditional piece-wise constant approach (A), modulated model as described in Equation 1 with the values presented in Table 1 (B), and finally adding the probabilistic modulation described in Equation 2 and masking regions of error bound R * 2 (C). Green and yellow arrows highlight transitions between tissue types that improved using the probabilistic approach applied to the tissue compartments. Note that the probabilistic smoothing was only applied to the brain tissues; as a result, veins retain shape in the susceptibility map only with a reduced magnetic susceptibility. Blue arrows highlight regions where the large field gradient masking approach was able to avoid abnormally large susceptibility values.

| Data simulation
Spoiled gradient-recalled-echo data can be simulated using the steady-state equation: where 0 is an initial phase distribution originated from the transceiver phase, and Δ is the frequency shift directly attributed to magnetic susceptibility. To simulate S for given TR, TE, and flip angles (α), we used the M 0 , R 1 , and R * 2 maps derived from the MP2RAGEME 49,60 sequence, where 0 (r) is the TE = 0 phase (a manually selected 3D second-order polynomial was used, ensuring 2 phase variation inside the brain region), and the frequency shift, Δω(r), was calculated according to where D(r) is the magnetic field dipole along the z-direction with Lorentzian correction. The convolution was performed in k-space using the formulation proposed in Marques and Bowtell 61 and Salomir et al. 62 To avoid aliasing artifacts associated with the discrete Fourier transform (circular convolution), which would appear as unrealistic background fields, we padded the phantom with zeros along each dimension (factor of 2) before evaluating Equation 4. Such a formulation of the signal equation explicitly neglects chemical shift (associated with spins from, for example, fat) and any chemical exchange effects on the frequency. Furthermore, because no gradient waveform was defined explicitly, image distortion effects were not simulated, and the impact of blood flow on phase data was not accounted for.
A digital phantom enables simulating the MR signal with and without background fields (fields generated by tissues and other sources located outside the brain). The latter effectively mimics "perfect" background-field correction. Using the whole-head susceptibility phantom allowed the creation of realistic background fields. To create a phantom without background fields, referred to as "local field" hereafter, all voxels outside the brain were set to zero, and the susceptibility distribution within the brain was demeaned as follows: Although not pursued for the QSM challenge purposes, field shimming was simulated by fitting the frequency map with second-order and third-order Legendre polynomials.

| Simulation of different acquisition protocols
For the demonstration of the acquisition protocol simulation with the code described in the Supporting Information, we chose two example protocols designed for different applications: • (P1): Optimal for the observation of cortical gray/whitematter contrast in both the magnitude and phase data. In this case, the longest TE was chosen to be close to that of the T * 2 of cortical gray matter (33 ms). 63 ; and • (P2): Optimal for the quantification of deep gray-matter susceptibility. In this case, the longest TE should be at least that of the of the region with the highest iron concentration, which was the globus pallidus in the generated phantom (14 ms).
For the sake of simplicity, both protocols had the same echo spacing of 8 ms as the original volunteer data set. The TR of the acquisition was chosen as short as possible, assuming a readout acquisition window of 8 ms (P1: TR = 16 ms; P2: TR = 40 ms). We neglected dead times associated with phaseencoding preparation, flow compensation, rewinding, RF excitation, saturation, and crushers. The flip angle was chosen at the Ernst angle for the globus pallidus (T 1 = 1100 ms; α = 8) and such that T 1 -weighted contrast on magnitude images was maximized between white matter (T 1 = 1100 ms) and cortical gray matter (T 1 = 1900 ms; α = 23) in protocols P1 and P2, respectively. This resulted in the following protocols: • P1: TE 1 /TE 5 = 4/36 ms; and • P2: TE 1 /TE 2 = 4/12 ms.
We mimicked k-space sampling by cropping the Fourier spectrum of the original 0.65-mm resolution data to an effective spatial resolution of 1 mm isotropic. We applied the same approach to down-sample the ground-truth susceptibility map. In the case of the susceptibility maps, the sharp edges between structures as well as the orders of magnitude-larger susceptibility differences between air/bone and tissue resulted in severe Gibbs ringing artifacts, which were removed using subvoxel shifts. 64 This step was repeated in all three spatial directions. Further processing consisted of spatial unwrapping of echo differences using SEGUE, 65 combination of resulting field maps using the optimum weights 12,41 and, when necessary, removal of background fields using the Laplacian boundary value method. 26

| Quantitative susceptibility mapping reconstruction optimization
To demonstrate the applicability of the current framework for the QSM challenge or for QSM reconstruction optimization purposes, we performed a simulation with only local fields (see Equation 5). The protocol used was that of the QSM challenge (TR = 50 ms; TE 1/2/3/4 = 4/12/20/28 ms; α = 15 23 ). The QSM reconstructions using truncated k-space division, 66 closed form L2, 67 fast algorithm for nonlinear susceptibility inversion, FANSI 68 and iLSQR 69 as implemented in the SEPIA toolbox, 70 were performed with varying regularization parameters. The reconstructions were evaluated using the reconstructions metrics created for the challenge (Table 2). For a more detailed description, see Supporting Information Section 2.

| Adding microstructural effects to the obtained contrast
Microstructural effects are known to affect the observed phase. One of the driving factors of the microstructural effects is white-matter fiber orientation. The provided data and code include a simple first-order approximation of these microstructure effects, which is TE-independent. Wharton et al 22 demonstrated that the typical impact at 7 T for a protocol with TEs of 7 ms and 13 ms was given by where FA norm is the fractional anisotropy divided by 0.59 (the average anisotropy observed in a human optic nerve), and θ is the angle between the white-matter fiber and the static magnetic field. Both of these quantities can be derived from the acquired diffusion data. Such a correction to the frequency shift was applied only within the segmented white-matter mask. Figure 4 shows phantom 1 with the two different sequence parameters created by the proposed simulation toolbox (see Supporting Information Section 1 or data sharing collection for code). The top row shows an example slice of the simulated data from the P1 protocol aimed at computing QSM in cortical gray and white matter. In this case, the longer TE matched the T * 2 of that gray matter, resulting in both significant signal decay in deep gray matter and a large number of phase wraps close to tissue/air boundaries. The flip angle used (23°) was set to increase T 1 contrast in gray versus whitematter boundaries at short TEs, as can be clearly appreciated on the top-left figure. Such information can be used to inform the QSM algorithm regarding expected morphological features. The bottom row shows the images associated with the P2 protocol, aimed at measuring the susceptibility values in deep gray-matter regions. The TE range is smaller, so that the magnitude signal in the globus pallidus has not yet disappeared at the last TE, and the tissue contrast in the magnitude data is considerably weaker because of the smaller flip angle (8°). From the data shown in Figure 4, it can be expected that P1 will benefit more from a morphological informed

Metric name Description nRMSE
Whole-brain RMSE after demeaning (ie, the subtraction of the mean within the mask) rmse_detrend_Tissue Normalized RMSE relative to ground truth (after demeaning and detrending) in gray/white-matter mask. Detrending was performed by compensating the estimated the slope of a linear fit of the reconstructed QSM voxel values against those of the ground truth in the tissue region of interest. The reconstruction was then divided by this factor to ensure that proportionality errors (measured by other metrics such as DeviationFromLinearSlope) do not affect the RMSE calculation rmse_detrend_blood RMSE relative to ground truth (after detrending) using a one-pixel dilated vein mask rmse_detrend_DGM RMSE relative to ground truth (after detrending) in a deep gray-matter mask (substantia nigra and subthalamic nucleus, red nucleus, dentate nucleus, putamen, globus pallidus, and caudate)

DeviationFromLinearSlope
Absolute difference between the slope of the average value of the six deep gray-matter regions versus the prescribed mean value and 1.0

CalcStreak
Estimation of the streaking artifact in a region of interest surrounding the calcification by means of the SD of the difference map between reconstruction and the ground truth. The region of interest was a hollow rectangular prism, with its inner boundary being two voxels away from the edge of the calcification and the outer boundary six voxels away from the inner boundary

CalcMoment
Volumetric susceptibility moment of the reconstructed calcification, compared with the ground truth (computed in the high-resolution phantom to be −49.8 ppm). This metric has been suggested to be more robust in regions of punctuated large susceptibility sources, where there is no signal in the region of the perturber 40 Note: All of the RMSE metrics were multiplied by 100.
reconstruction than P2, and that the use of nonlinear fit 71 for the calculation of the field map will also be particularly relevant, as noise will dominate the later echoes in P1. Figure 5 compares the field map obtained with the P1 simulation from the whole-head phantom to a field map obtained directly from the susceptibility brain phantom using Equation 5 (ground-truth field map). When carrying out the signal simulation with the whole-head phantom, both the base resolution and the 1-mm resolution field maps are dominated by the background components arising from the air/ bone/tissue interfaces, as can be seen in the transverse slice above the sphenoid sinus (first column, indicated by the black arrow). The differences with the ground-truth field map (second column) are to a large extent explained by the quadratic field used to represent the procedure of shimming. Once Laplacian boundary value was applied to the total field to obtain the tissue-specific field contributions (third column), the original resolution field map (top row) showed localized, smoothly varying differences relative to the ground truth, which have been described previously. 14 Both the base resolution and 1-mm resolution at first appear to have very similar properties, yet the normalized RMSE (nRMSE) from the down-sampled data set (bottom row) demonstrated additional deviations from the ground truth (nRMSE was 40% higher than that of the high-resolution data set). These discrepancies are caused both by incomplete background-field removal (see gray arrows) and errors around veins. In such regions, the field map measured from the gradient-echo data naturally deviates from the mean field in that pixel, because of the reduced signal in veins (once partial volume is introduced by the reduced resolution, the field estimation is biased toward the tissue compartments).
To disentangle the effects of background field correction and MRI signal simulation on the deviations observed relative to the ground truth, we repeated the signal simulation with the local fields. In that case, the slowly varying smooth deviations disappeared and the high-resolution phantom did not demonstrate substantial deviations relative to the ground truth. The differences in the field computed at base resolution without background fields (top right panel) is at the numerical precision level, yet the nRMSE is still not negligible (nRMSE = 27) because of the errors present in the calcification region without signal and its immediate surrounding where spatial unwrapping fails.
To further investigate the sources of errors discussed previously, Figure 6 evaluates the phase evolution in three voxels: two in the surrounding of the calcification and one in the white matter. Figure 6C shows that, in the case of a homogeneous tissue region, there is a perfect match between lowresolution and high-resolution phase evolutions as well as the fitted frequency (based on the five-echo simulation) and the ground-truth frequency (computed from the susceptibility map). Figure 6B shows a region closer to the calcification; like in the case for the high-resolution data (light gray lines), in the case of the low-resolution data (dark gray) there is a larger error both with respect to fitted frequency (dashed line) and ground-truth frequency evolution. It is also clear that the phase evolution in the low-resolution data is no longer linear F I G U R E 4 Transverse slices of the simulated data of model 1 using protocol 1 (top row) and protocol 2 (bottom row). First two columns show magnitude and phase images, respectively, at the first echo time (where the different T 1 -weighting is clearly visible) and the last two columns show the magnitude and phase images associated with the last echo time of the respective protocol ( Figure 6B), as predicted from theory, due to partial-volume effects and the varying intravoxel frequency gradients. 46 In a pixel in the vicinity of the calcification, the errors are further enhanced for the high-resolution data, where unwrapping errors can introduce errors on the fitted frequency (the data are still fitted accordingly, but do not correspond to the groundtruth field). Figure 6E shows the mean squared difference map associated with the frequency fit on the low-resolution data; the errors are are predominantly found in regions of rapidly changing magnetic fields, around the calcification and close to tissue/air/bone interfaces and surrounding vessels. Figure 7 shows the reconstructions with minimum nRMSE for the four algorithms tested. It appears that the direct methods (truncated k-space division and closed-form L2) still have some broad streaking artifacts in regions surrounding both the calcification and deep gray-matter regions, whereas in the iterative methods these were reduced. It is interesting to note that the total variation regularized nature of the FANSI clearly contributed to a better reconstruction of the superior cerebellar vein when compared with the iLSQR (see the Supporting Information for the performance on remaining metrics and the QSM 2.0 challenge report, where these artifacts were addressed by more thoroughly optimized algorithms). 23 Figure 8 shows an example of the originally acquired and the simulated magnitude and phase data, respectively. The magnitude data (first column) shows similar high-resolution features at the matched TE, despite the acquisition protocols not being identical (the simulations only support multi-echo gradient-echo acquisitions rather than MP2RAGEME, as used for data acquisition). It can be visually observed that the simulated data suffer from reduced bias field inhomogeneity; this is a result of a bias field correction applied to the computed M 0 map obtained from the MP2RAGEME. This choice was justified by two factors. First, the magnitude bias field observed after SENSE 72 reconstruction does not reflect the local SNR but a mix of the volume and surface F I G U R E 5 Transverse slices through derived field maps associated with protocol 1 data computed at the base resolution (top row) and after down-sampling to 1 mm (bottom row); black arrows highlight large background fields induced by air-tissue interfaces. The first and fifth columns show the field extracted from the complex signal when the whole-head and brain-only models were used, respectively, to compute the frequency shift. The third column shows the local field computed after background-field removal. The second, fourth, and sixth columns show the differences relative to the corresponding ground-truth field distributions. Gray arrows highlight the incomplete background-field removal that is exacerbated following down-sampling. The ground-truth field maps were computed using the forward dipole formulation (Equation 4) on the whole head (second column) and brain tissues alone (fourth and sixth columns) susceptibility models. It is clear that the normalized RMSE error (nRMSE), once the down-sampling is performed, is dominated by partial-volume effects in and around veins (noise pattern on the bottom of both the fourth and sixth columns) and imperfect background-field removal | 537

| Comparison of simulations including microstructure to real data
coil sensitivities as well as the transmit coil inhomogeneities. Second, we wanted to separate the physical simulation from the interaction with the hardware. In the nonbackground field-phase data, the background field associated with nasal sinus and ear canals is weaker on the simulated data than on the measured data (as can be appreciated by the larger number of phase wraps on the latter). This can be attributed to two features: the imperfect bone/air segmentation or an underestimation of the susceptibility differences between tissue and bone or air. The latter is also supported by observing in the third column (after background field removal) that the field surrounding the calcification is smaller in the simulations than that what is observed in the experimental data.
When comparing gray/white-matter contrast on both the phase data (second column) and tissue frequency (third column), the simulation appears to visually approximate the acquired data better when the microstructural correction term (middle row) is added to the simulated data, as expressed in Equation 6. These data can now be used to test how different reconstruction pipelines are biased due to microstructural effects. Note that, because our susceptibility phantom is based on reported values of susceptibility rather than this particular subject susceptibility values, no quantitative evaluation of the similarity can be performed.

| DISCUSSION
In this paper and the accompanying shared data set and code (described in greater detail in the Supporting Information), we have presented and disseminated a realistic human brain phantom that can be used by both the QSM and the MRI communities to simulate multi-echo gradient-echo data and evaluate QSM pipelines in a controlled manner.
The data and code provided allow users to: • Create new susceptibility phantoms with different levels of spatial modulation for each compartment; this can be performed by simply changing the values presented in Table 1 that control both the mean value and spatial modulation present within each tissue; • Create realistic gradient-echo multi-echo data at 7 T and, to some extent, at other fields (it should be noted that, unlike susceptibility, relaxation times are field-dependent and their field dependence is tissue-dependent); • Assess the effect of protocol changes (as well as nuisance factors such as RF phase, B 0 shimming, and noise) on the quality of the obtained QSM maps; and • Assess the effect of changing some of the background field removal and QSM algorithm-specific options while having a ground truth to test it against. This digital phantom will be important for QSM users and researchers deciding on acquisition protocols. Protocol considerations such as effect of the number of TEs and their range, as well as the degree of T 1 weighting and resolution on the ability to accurately measure QSM in a given brain region, can be quickly tested in this framework. This would allow us to extend the analysis done by Karsa et al 73 on the effects of FOV and anisotropic voxels sizes, or the analysis by Biondetti et al on the effects of Laplacian-based single echo versus multiple-echo techniques. 74 Although the simulation framework is useful for optimizing protocols, the simulation in its current form is a static one. Consequently, flow artifacts (which build up when a large number of echoes are used), respiration-related B 0 fluctuations, 75 and spatial distortions associated with the readout bandwidth are not considered. The latter two would be relatively straightforward to implement. Spatial distortions associated with different readouts can be obtained in a computationally efficient manner by using our provided phantom data in, for example, freely available software packages such as JEMRIS (http://www.jemris. org/). 76 Wave controlled aliasing in parallel imaging 77 and 3D EPI 78,79 have been used for QSM, but not much research has been done to quantify the effect of their blurring or distortions on the performance of either the background field removal or the susceptibility maps. Respiration artifacts would simply require a library of respiration fields over time, as acquired with field cameras. 80 Flow artifacts would be more complex to simulate, because the current vessel segmentation does not distinguish arteries and veins, and it would be difficult to have a local flow velocity and pulsatility estimation.
The dipole model used in QSM assumes a sphere of Lorentz approximation, 61 which does not hold true, particularly in white matter. The phantom released for RC2 purposes explicitly circumvented this limitation by ensuring perfect consistency with the QSM model in Equation 4 (ie, no microstructural effects were present). With the released phantom data set, we include diffusion data (both raw data on the 1.5mm space and its derivatives co-registered to the phantom space after distortion correction). These data can be used to compute the frequency perturbation, as shown in Figure 8, or the hollow cylinder model can be used to explicitly introduce TE-varying perturbation, similarly to what has recently been done in the context of myelin water imaging, 22,[81][82][83][84] and study the bias introduced by these effects on the reconstructed QSM maps. A critical challenge for more advanced modeling is the resolution of the diffusion acquisition, despite using state-of-the-art hardware and MR sequences. Here, we have simply interpolated our 1.5-mm DWI to the anatomical space and expect this to be sufficient to develop and validate QSM methods that account for microstructural effects in white matter. 85 Quantitative susceptibility mapping is gaining interest in the context of neurological disorders such as multiple sclerosis, Parkinson's disease, and Alzheimer's disease, and other clinical applications such as hemorrhages or tumor imaging with iron oxide nanoparticles. 86 For the latter applications, relaxation and susceptibility values in the form of, for example, lesions in strategic locations can be added to the current phantom. Simulated data might also be relevant in the case of group studies of diseases, in which differences in the deep gray-matter nuclei were found 87,88 and the optimum QSM reconstruction parameters might improve the limits to detect those changes. Such a question can be addressed with the digital phantom by changing the parameters of Equation 1 for a F I G U R E 7 Examples of the optimum nRMSE reconstructions in three orthogonal planes obtained from the 1-mm data set based on SIM2 (see Figure 2) with peak SNR = 100 used for the QSM 2.0 challenge. The sagittal and coronal slices were chosen to cross the calcification region, to highlight remaining streaking artifacts. The four different rows correspond to the four different reconstruction pipelines tested. Abbreviations: FANSI, fast algorithm for nonlinear susceptibility inversion; iLSQR, improved least squares algorithm; TKD, truncated k-space division | 539 MARQUES Et Al.
given set of structures and find the QSM pipeline that better quantifies those changes.

| CONCLUSIONS
The presented realistic and modular phantom aims to enable researchers to optimize reconstruction as well as acquisition parameters. As such, the phantom served as a ground truth for the QSM RC2. Its modular design allows us to add microstructure effects a posteriori, 22 as well as include new nuisances such as hemorrhages or fine vessels with realistic relaxation and susceptibility properties. We foresee that this brain model will be an important tool for the evaluation of various processes associated with QSM processing and interpretation.
F I G U R E 8 Effect of adding a microstructure correction term to the simulated field map. The first and second columns show the simulated magnitude and phase data using the whole-brain susceptibility model at TE = 20 ms, whereas the third column shows the computed tissuefrequency map (after brain masking and background-field removal). The first row shows the simulated data using protocol 1 without the addition of the microstructure effects. The second row shows the effect of adding the microstructural effects, as described in Equation 6 (first column), both on the third TE and the computed tissue-frequency map. The third row shows, for visual comparison purposes, the experimental data acquired at 7 T