Quantitative neuroimaging measures of myelin in the healthy brain and in multiple sclerosis

Abstract Quantitative magnetic resonance imaging (MRI) techniques have been developed as imaging biomarkers, aiming to improve the specificity of MRI to underlying pathology compared to conventional weighted MRI. For assessing the integrity of white matter (WM), myelin, in particular, several techniques have been proposed and investigated individually. However, comparisons between these methods are lacking. In this study, we compared four established myelin‐sensitive MRI techniques in 56 patients with relapsing–remitting multiple sclerosis (MS) and 38 healthy controls. We used T2‐relaxation with combined GRadient And Spin Echoes (GRASE) to measure myelin water fraction (MWF‐G), multi‐component driven equilibrium single pulse observation of T1 and T2 (mcDESPOT) to measure MWF‐D, magnetization‐transfer imaging to measure magnetization‐transfer ratio (MTR), and T1 relaxation to measure quantitative T1 (qT1). Using voxelwise Spearman correlations, we tested the correspondence of methods throughout the brain. All four methods showed associations that varied across tissue types; the highest correlations were found between MWF‐D and qT1 (median ρ across tissue classes 0.8) and MWF‐G and MWF‐D (median ρ = 0.59). In eight WM tracts, all measures showed differences (p < 0.05) between MS normal‐appearing WM and healthy control WM, with qT1 showing the highest number of different regions (8), followed by MWF‐D and MTR (6), and MWF‐G (n = 4). Comparing the methods in terms of their statistical sensitivity to MS lesions in WM, MWF‐D demonstrated the best accuracy (p < 0.05, after multiple comparison correction). To aid future power analysis, we provide the average and standard deviation volumes of the four techniques, estimated from the healthy control sample.


| INTRODUCTION
Quantitative magnetic resonance imaging (MRI) techniques that are used as markers of white matter (WM) and myelin content include magnetic resonance relaxation-based and magnetization transferbased methods. Both approaches provide WM contrast related to myelin, but are predicated on different assumptions and models, and may not directly correspond to each other (Alexander et al., 2011;De Santis, Drakesmith, Bells, Assaf, & Jones, 2014). Diseases of WM, and multiple sclerosis (MS) in particular have motivated the search for quantitative and specific MRI measures of myelin content as measures of treatment response and remission (Filippi et al., 2012). Although conventional clinical MRI protocols for monitoring MS include qualitative T 1 and T 2 weighted images, these lack specificity to the underlying pathology and are difficult to compare across imaging sites and scanners (Barkhof, Calabresi, Miller, & Reingold, 2009). Quantitative MRI can supplement standard clinical imaging by providing detection of changes within normal appearing tissue, as well as biological characterization of these alterations (MacKay and Laule, 2016).
Multicomponent relaxation-based techniques model tissue water compartments by observing the difference in relaxation times. For example, water trapped between myelin bilayers and the water inside or outside of axons have different T 1 and T 2 relaxation times, and by measuring the relative signal contribution from each component, a measure such as the myelin water fraction (MWF) can be obtained (Deoni, Rutt, Arun, Pierpaoli, & Jones, 2008;MacKay et al., 2006). The MWF is positively correlated with myelin content, and therefore lower MWF is thought to reflect a decrease in myelin content (Laule et al., 2008). Quantification of the macromolecular content as a proxy for myelin can be performed using the magnetization transfer ratio (MTR; Filippi et al., 1995;Schmierer, Scaravilli, Altmann, Barker, & Miller, 2004), the more advanced quantitative magnetization transfer (qMT; Schmierer et al., 2007), or the macromolecular tissue volume (MTV; Mezer et al., 2013). A decrease in MTR is commonly interpreted as a decrease of myelin. T 1 -relaxation time (commonly referred to as quantitative T 1 , qT 1 ) is also found in the literature as a quantitative biomarker for pathology (Margaret Cheng, Stikov, Ghugre, & Wright, 2012), with increased qT 1 associated with myelin loss.
Many of these techniques have been investigated in MS (Harrison et al., 2013;Kitzler et al., 2012;Kolind et al., 2012;Ontaneda, Thompson, Fox, & Cohen, 2016). For multi-echo techniques, Laule et al. (2006Laule et al. ( , 2008 demonstrated strong associations across tissue types between histological staining of myelin and the T 2 -relaxation-based MWF obtained with a multi-echo spin echo sequence. The acquisition can be achieved in clinically feasible times using a combined GRadient And Spin Echo (GRASE) sequence ; the MWF measurement obtained using a GRASE is here termed MWF-G. An alternative, and more recently, multicomponent relaxation technique is mcDESPOT, from which the fraction of signal attributed to myelin water (here called MWF-D), similar to MWF-G, can be obtained. To date, there is no human ex vivo histology validation of mcDESPOT.
However, Hurley et al. (2010) used the shaking pup preclinical model of dysmyelination and showed that MWF-D was sensitive, at the very least, to an absence of myelin. Wood et al. (2016) performed whole-brain mcDESPOT in a cuprizone mouse model and found a decrease in MWF-D, related to the expected demyelination. MTR has been validated against postmortem tissue using histopathology, showing good sensitivity to myelin and demyelination (Schmierer et al., 2004). However, since MTR measures the interaction between macromolecular protons and bulk water, it is also related to the total water content of tissue and therefore, in cases of WM injury, edema, and inflammation can obscure the myelin-related signal (Gareau, Rutt, Karlik, & Mitchell, 2000;Vavasour, Laule, Li, Traboulsee, & MacKay, 2011).
In addition, some postmortem studies have found an even larger association between MTR and axonal count (Mottershead et al., 2003;Van Waesberghe et al., 1999). Single component measures of qT 1 have also been shown to correlate strongly with histological staining for myelin (Schmierer et al., 2007). However, interpretations of changes in T 1 are confounded by the known relationship between 1/T 1 and 1/water content (Fatouros, Marmarou, Kraft, Inao, & Schwarz, 1991;Gelman, Ewing, Gorell, Spickler, & Solomon, 2001;Kamman, Go, Brouwer, & Berendsen, 1988;Rooney et al., 2007). Thus, it is difficult to deduce whether changes in T 1 are due to demyelination and/or changes in total water content.
The common feature of most of these validation studies is that they have shown that the quantitative image contrast is similar between myelin-sensitive techniques, though they do not correspond only to the amount of myelin present. Even histological stains, such as luxol fast blue, provide a contrast; the optical density of these contrasts does not in itself provide a quantification of tissue content (Stüber et al., 2014). In general, strong correlations across tissue types (in the brain typically white and gray matter) indicate that two methods have similar tissue contrast, an important precondition.
However, the level of consistency between quantities from the different methods is unclear.
In this study, we investigated the cross-subject similarity between four prominent quantitative MRI techniques that are associated with WM content: Multi-echo T 2 relaxation, mcDESPOT, MTR, and qT 1 .  Table 1). This subset of 24 was used when comparing imaging methods in the MS group only.

This study was approved by the University of British Columbia Clinical
Research Ethics Board and all subjects provided written informed consent before participating in the study.

| Quantitative image processing
The signal decay curve obtained by the GRASE T 2 relaxation sequence was modeled by multiple exponential components and the T 2 distribution was estimated using nonnegative least squares with the extended phase graph algorithm as well as spatial regularization (   n/a n/a n/a n/a 2,500 n/a n/a  to the total area under the distribution. From the mcDESPOT data, the MWF-D and qT 1 maps were calculated using the processing methods outlined in Deoni and Kolind (2015); briefly, the SPGR and IR-SPGR scans were used for DESPOT1 with High-speed Incorporation of RF Field Inhomogeneities (DESPOT1-HIFI) analysis (Deoni, 2007), resulting in maps of the global T1 and the B1 field. The bSSFP data (acquired with two phase-cycling schemes) and the global T1 and B1 maps were then used to calculate global T2 and B0 field maps using DESPOT2 with full modeling (DESPOT2-FM) analysis (Deoni, 2009). Finally, the B0 and B1 maps combined with the SPGR and bSSFP (with both phase-cycling schemes) were used to calculate the MWF using stochastic region contraction (Deoni & Kolind, 2015).

| Registration to template space
To provide a representative common space for the datasets collected here, a study-specific template was constructed, from the high flip angle (18 )  Finally, using the combined registrations, the MWF-G, MWF-D, qT 1 , and MTR quantitative maps were registered and resampled into 1 mm isotropic MNI space in a single linear interpolation step. In addition, for the patient sample, the lesion masks (which are described below) were registered and resampled into MNI space using nearest neighbor interpolation. The registration pipeline is outlined in Figure 1.

| Delineation of lesion and regions of interest
Lesions were identified on the T 2 w, PDw, and FLAIR for each subject by an experienced radiologist, placing one or more seed points where a lesion was identified. Segmentation was then performed automatically using the provided seed points according to the method described by Tam, Traboulsee, Riddehough, Sheikhzadeh, and Li (2011) and McAusland et al. (2010). In addition to the semi-automatically defined lesions masks, additional masks of the boundary of each lesion (2 mm dilation of the semi-automatically defined lesion) were created and are referred to here as peri-lesional tissue.
Tissue masks were derived from the Harvard-Oxford atlas distributed as part of the FSL package (https://fsl.fmrib.ox.ac.uk/fsl). Three masks were created, one for gray matter, one for WM, and one encompassing subcortical structures (amygdala, hippocampus, basal ganglia, and thalamus; Figure 3). Anatomical regions of interest in WM were defined in standard space using the JHU probabilistic tractography atlas (Hua et al., 2008). These regions were then combined with the group WM tissue mask to ensure only WM was included. Eight major WM bundles were investigated: anterior thalamic radiation (ATR), cingulum,

| Summary and correlation maps
A group average quantitative map and a standard deviation (SD) map were created for each modality from the healthy control group.
Spearman correlation maps between the different quantitative values at every voxel were calculated for the whole brain using 3dTcorrelate, part of the AFNI package (Cox, 1996). Correlation maps and histograms (per tissue class) of the correlation coefficients were produced.

| Receiver operator characteristic maps
For all quantitative data sets available in the patient population, individual-patient voxelwise difference maps were created by Z-normalizing the patient's parameter maps to the mean and SD of the healthy control sample. In the 24 patients for whom all four quantitative sequences were successfully acquired (see Table 1 for characteristics), receiver operator characteristic (ROC) curves were constructed for each individual dataset, testing the accuracy of each MRI method for lesional WM detection (Fawcett, 2006). These curves were generated by using a range of Z-score cutoffs between the minimum and maximum Z-score and comparing whether each voxel was inside or outside of the Z-score cutoff and whether the same voxel was included or not included within the semi-automated lesion masks. For every threshold, the proportion of voxels outside of the Z-score cutoff (considered to be lesional from the advanced MR technique) that were also inside the semi-automated lesion mask is equal to the true positive rate. The percentage of voxels above the Z-score cutoff but outside of the semi-automated lesion mask is equal to the false positive rate. To compare the sensitivity/specificity of the four methods to manually defined lesions, the area under the curve (AUC) of ROCs from each dataset was calculated for each subject and compared pairwise across all methods using a nonparametric Friedman test. Post hoc inter-method comparisons were tested using Wilcoxon tests (Demšar, 2006), corrected for multiple comparisons using the Bonferroni-Holm step-down method.

| RESULTS
Sample mean and SD maps of the 38 healthy control datasets are illustrated in Figure 2. In all four methods, the SD values are lower in WM than in gray matter, reflecting the fact that these are healthy adult controls (so no large deviations in WM content are expected). All analyses comparing methods or including the patient group were

| Correlations between techniques
To assess the similarity between methods at capturing individual differences, voxelwise Spearman rank correlations between parameters are demonstrated in Figure 3  Correlations across tissue classes (gray matter, WM, and subcortical gray structures) are reported in Table 3. These are reported separately for the healthy control population (n = 38) and the subset of patients for whom all four modalities were available (n = 24). The correlation coefficients (ρ) are summarized in Table 3. The most consistent correlations across all tissue classes were found between MWF-D and qT 1 (median ρ across tissue classes 0.80, mean ρ = 0.77) and MWF-G and MWF-D (median ρ = 0.59, mean ρ = 0.59). In lesional tissue, for the 24 patients for whom all quantitative modalities were available, qMRI estimates covaried between all pairs of methods.

| MS and healthy control comparisons
Violin plots were created for each tissue class (gray matter, WM, and subcortical structures) in the patient and control groups (Figure 4a), indicating the spread of values for each region. In these analyses, all available data in the patient group was used (n = 24 for MTR, and n = 56 for the other methods). For all methods, significant differences    Table 1 for subject demographics).

| AUC analysis
To quantify the relative sensitivity and specificity of the four quantitative MR techniques for detecting lesions, we calculated ROC curves for each individual patient. True positives were defined using the lesion masks, identified semi-automatically with radiologist supervi- to both qT 1 (p = 0.014) and MTR (p = 0.024). There was no significant difference between qT 1 and MTR (p = 0.92). See Figure 6 for a plot of the ROC curves for each modality averaged over subjects. In this plot, each false discovery rate is fixed at the average (Fawcett, 2006).

sion. A nonparametric Friedman test indicated that the AUCs for
Importantly, the range of AUC values for MTR, qT 1 , and MWF-G were only slightly (albeit significantly and consistently) different (see Figure 6 for average ROCs at fixed false positive rate thresholds).

| Sample quantitative imaging parameter summary maps and brain coverage
In place of estimating statistical power for a limited amount of study designs/effect sizes, we include group average mean and SD maps of each parameter for the healthy control group (Figure 2) and provided as files in nifti format on neurovault (https://neurovault.org/ collections/4709/) and at https://www.msmri.com/. These summary statistics were calculated in MNI space.

| DISCUSSION
In measure is specific to myelin, it should also show close correspondence to T 1 in healthy tissue (Stüber et al., 2014).
However, in the MS brain specifically, the combination of increased water content, demyelination, and tissue iron deposition may disrupt this relationship. In the extreme case of completely demyelinated tissue, the inverse relationship between myelin and T 1 would break down as the MWF maps will approach zero. Correlation coefficients between MWF-D and MWF-G indicated a moderate correlation in most tissue types. In fact, after MWF-D and qT1, MWF-D and MWF-G were the pairs of measures significantly correlated across the most tissue classes.
We observed weak, nonsignificant correlations in most tissue classes between MWF-D and MTR. Similar to all other inter-method comparisons, the correlation was significant in lesional tissue, indicating that although the methods may be measuring different quantities in healthy tissue, they are similarly sensitive in MS lesions.
There is a more established history in quantifying the similarity between MWF-G, MTR, and qT 1 . Previous results have demonstrated a lack of correlation between MWF-G and MTR in nonlesional WM (Vavasour et al., 2011). In an experimental autoimmune encephalomyelitis (EAE) guinea pig model, Gareau et al. (2000) suggested MTR was sensitive to inflammatory-related changes to the structure of WM whereas MWF-G was related to myelin content itself. They detected no relationship between MTR and MWF-G. Here, we detected a positive relationship on average in healthy control WM and in lesional tissue in MS patients. In the MS patient group, NAWM did have a positive though nonsignificant relationship indicating here in patients (n = 24), and maybe the Gareau investigation (n = 24 as well), there may simply have been insufficient samples to detect weak relationships. Several in vivo studies suggest that MTR is most strongly influenced by water content (Fox et al., 2005;Giacomini et al., 2009;Vavasour et al., 2011;Van Waesberghe et al., 1997). In the present study, we find weak correlations between MTR and qT 1 (which is strongly linked to water content) in WM, but similar to previous studies, we find a strong correlation in lesions, most likely due to an increase in water content.  In addition to characteristic focal lesions, there are diffuse abnormalities in the MS brain and notably: differences in brain volume (Stefano et al., 2010); increased edema (Filippi et al., 2012) and axonal swelling (Fisher et al., 2007); and diffuse changes in WM, as observed in the present study (Figures 4 and 5). We observed moderate global differences in the myelin-related measures between controls and our MS cohort (Figure 4), similarly to previous studies (Kitzler et al., 2012;Faizy et al., 2016;Kolind et al., 2012;Oh, Han, Lee, Nelson, & Pelletier, 2007;Cercignani, Bozzali, Iannucci, Comi, & Filippi, 2001;Vavasour et al., 1998). When looking at smaller WM tracts, all MR measurements in the present study showed differences between subjects with MS and healthy controls in at least some regions ( Figure 5).
Comparing MS to controls, qT1 showed the highest number of significantly different regions (8), followed by MWF-D and MTR (6), and then MWF-G (4). These alterations in normal-appearing tissue may reflect diffuse alterations in the WM. Multi-component relaxation especially may be sensitive to subtle lesions that may be missed by radiological inspection (Laule et al., 2004). Additionally, cortical lesions may have downstream functional or structural effects on connected gray and WM tissue (Siffrin, Vogt, Radbruch, Nitsch, & Zipp, 2010).
These nonlesional pathological features are likely represented in the performance curves of the modalities for lesion detection by apparently false positives ( Figure 6). Cortical lesions are notoriously difficult to detect with both conventional and quantitative MRI (Calabrese & Castellaro, 2017), although with specific sequences, high field imaging, and ideally retrospective histopathological knowledge of lesion location, myelin content of cortical lesions can be investigated by mapping the lesions detected with these advanced techniques onto myelinsensitive MRI (Jonkman et al., 2016)] maps; unfortunately, none of these methods were available for this study.
One crucial aspect of individual differences relevant here is the effect of age on all of the quantities. Changes in WM tissue content occur throughout the lifespan (Callaghan et al., 2014) and, if investigating single patients, this change needs to be taken into account. In this study, we used an average and SD taken from a sample of healthy individuals with a wide age-range, from 20 to 52 years. This is not ideal. MS can occur at any point throughout the lifespan and therefore detecting lesions or abnormalities needs to be performed in the context of the maturation of the brain at the time of the scan. What is still missing is a measure of each quantity appropriate to age, akin to a growth curve, though this has been developed in younger cohorts (Dean III et al., 2014;Sadeghi et al., 2013).
All quantitative MR measurements showed differences between lesions and NAWM. Compared to radiologist-detected lesions, MWF-D had the best performance in detecting these lesions (Figure 6), followed by qT1 and MTR. Good performance by qT1, in particular, is predictable since the presence of lesions on PD/T 2 -weighted images is due to their increase in water content, and qT 1 is greatly influenced by water con- ROC curves are used here as an indicator of relative sensitivity and specificity of the MRI methods, but their values can be misleading when the true positive/true negative ratio is very skewed, as is the case here. Lesion volumes were, on average, about 0.5% of the total WM volume mask used in this analysis. With these proportions, a 1% false positive rate could have twice as many false positive voxels as there are lesional voxels, even with perfect sensitivity (100 true positive rates). To improve on this, many studies have combined multiple image modalities, both qualitative and quantitative, to produce more accurate lesion segmentations (Brosch et al., 2016;Lladó et al., 2012;Mah, Jager, Kennard, Husain, & Nachev, 2014). Notably, combined analysis of MWF-G and MTR over time has shown promise in distinguishing tissue injury and remyelination (McCreary et al., 2009).
ROC curves also miss another important factor in profiling tissue biomarkers in MS. Changes in both myelin and water content are known to occur (Laule et al., 2004), reflecting different aspects of the pathological processes. To accurately separate these features, any given technique should preferably be sensitive to one or the other, to distinguish which process is taking place. From this point of view, it is apparent here that qT 1 is very sensitive to tissue changes (e.g., larger z scores than any other method in Figures 4 and 5) but not specific to which change is occurring in the tissue, for example, change in water content or myelin, reflected in the lower area under the curve in Figure 6 compared to MWF-D. With the advantages and disadvantages of each technique, it is also necessary to consider the relative performance of each technique with regards to the acquisition time.
The voxel volume for all sequences was~5 mm 3 , with isotropic voxels and whole-head coverage for mcDESPOT, whereas GRASE and MTR data were collected with 5 mm thick slices and limited coverage. With respect to time, GRASE has a disadvantage compared to mcDESPOT and MTR. GRASE is fundamentally limited by the need to acquire a sufficient number of data points in the T 2 -decay, here 32 points going out to 320 ms, and a sufficiently long TR, in this study TR = 1,000 ms, to avoid T 1 -weighting confounds, although recent developments may allow faster GRASE acquisition (Chen, Majumdar, & Kozlowski, 2014;Zhang et al., 2015). Sensitivity to tissue-specific changes as well as acquisition time should be taken into account when selecting the appropriate quantitative technique to use in a study.