Comparing MRI metrics to quantify white matter microstructural damage in multiple sclerosis

Abstract Quantifying white matter damage in vivo is becoming increasingly important for investigating the effects of neuroprotective and repair strategies in multiple sclerosis (MS). While various approaches are available, the relationship between MRI‐based metrics of white matter microstructure in the disease, that is, to what extent the metrics provide complementary versus redundant information, remains largely unexplored. We obtained four microstructural metrics from 123 MS patients: fractional anisotropy (FA), radial diffusivity (RD), myelin water fraction (MWF), and magnetisation transfer ratio (MTR). Coregistration of maps of these four indices allowed quantification of microstructural damage through voxel‐wise damage scores relative to healthy tissue, as assessed in a group of 27 controls. We considered three white matter tissue‐states, which were expected to vary in microstructural damage: normal appearing white matter (NAWM), T2‐weighted hyperintense lesional tissue without T1‐weighted hypointensity (T2L), and T1‐weighted hypointense lesional tissue with corresponding T2‐weighted hyperintensity (T1L). All MRI indices suggested significant damage in all three tissue‐states, the greatest damage being in T1L. The correlations between indices ranged from r = 0.18 to r = 0.87. MWF was most sensitive when differentiating T2L from NAWM, while MTR was most sensitive when differentiating T1L from NAWM and from T2L. Combining the four metrics into one, through a principal component analysis, did not yield a measure more sensitive to damage than any single measure. Our findings suggest that the metrics are (at least partially) correlated with each other, but sensitive to the different aspects of pathology. Leveraging these differences could be beneficial in clinical trials testing the effects of therapeutic interventions.

In MS, white matter microstructure can be compromised by both demyelination and axonal loss. While axonal loss can result directly from inflammation within lesions, it can also occur as a consequence of demyelination outside lesions (Bitsch, Schuchardt, Bunkowski, Kuhlmann, & Bru, 2000; Trapp et al., 1998; Trapp & Stys, 2009). Therefore, promoting repair through remyelination and providing neuroprotection represent the two major goals of novel therapeutic interventions in MS. To improve the characterisation of damage and quantification of repair, non-invasive MRI-based methods to accurately measure microstructural damage are crucial. A variety of MRI-based approaches that probe various physical properties of the tissue have been shown to be sensitive to demyelination and axonal loss in MS (Filippi, Preziosa, & Rocca, 2017; Mallik, Samson, Wheeler-Kingshott, & Miller, 2014).
Membranes and myelin sheaths of the axons collectively hinder diffusion, leading to anisotropic diffusion which, in diffusion tensor imaging (DTI), is represented by a tensor. Parameters extracted from DTI, such as fractional anisotropy (FA), radial diffusivity (RD), and axial diffusivity (AD), correlate with histological measures of myelination as well as with axonal density (Chang et al., 2017; Klawiter et al., 2011; Moll et al., 2011; Mottershead et al., 2003; Schmierer et al., 2007, 2008). Even though these metrics are sensitive to changes in myelination, diffusion MRI itself is an indirect approach to measure demyelination. Water trapped in the myelin sheaths has a very short T2 relaxation time (Mackay et al., 1994) and does not contribute to the signal measured in diffusion-weighted sequences because of their comparatively long echo time. Also, myelin is not a prerequisite for diffusion anisotropy (Beaulieu, 2002).
More direct approaches to assessing myelination are myelin water imaging and magnetisation transfer (MT) imaging. Myelin water imaging exploits the signal derived from water that is trapped within myelin sheaths and that has a short T2 relaxation time (between 10 and 55 ms) compared to water in other tissue compartments (Mackay et al., 1994). By making use of a multi-echo (Mackay et al., 1994) or multi-flip-angle strategy, the relative contribution to the signal from this T2 species can be calculated and is referred to as myelin water fraction (MWF). By contrast, MT imaging makes use of the MT that happens between macromolecules, such as those found in myelin, and surrounding tissue water, when the macromolecular protons are subjected to an off-resonance radio-frequency (RF) pulse that selectively saturates the macromolecular pool of protons. Both MWF (Laule, Leung, Traboulsee, Paty, & Mackay, 2006; Schmierer, Scaravilli, Altmann, Barker, & Miller, 2004) and magnetisation transfer ratio (MTR) (Gareau, Rutt, Karlik, & Mitchell, 2000; Moll et al., 2011; Mottershead et al., 2003; Schmierer et al., 2008) correlate with the histological markers of myelin.
Although these MRI methods are considered valid approaches for assessing microstructural integrity, the relationship between their metrics, that is, to what extent the metrics provide complementary versus redundant information, remains largely unexplored. In healthy volunteers, significant correlations between diffusion-based metrics and MWF have been reported (De Santis, Drakesmith, Bells, Assaf, & Jones, 2014), but these depend on the brain region investigated and the health of the tissue (Bells, Morris, & Vidarsson, 2007; Drabycz, Kolind, Whittall, & Mackay, 2008). Kolind et al. (2008) found low correlations between diffusion-based metrics and MWF in MS lesions. Also, correlations between MWF and MT-derived metrics are low or inconsistent between various types of tissue (Underhill, Yuan, & Yarnykh, 2009;Vavasour, Laule, Li, Traboulsee, & MacKay, 2011).
One challenge when comparing DTI-based metrics with other metrics is that the diffusion tensor is heavily influenced by the underlying fibre architecture (De Santis et al., 2014;Pierpaoli, Chiro, Basser, & Trace, 1996). For example, in two voxels with identical axonal density and myelin content, DTI-derived metrics may disagree, if one voxel lies in a region with one predominant fibre orientation, while the other voxel lies in a region containing several crossing fibre populations. In this study, we considered the potential spatial heterogeneity of the relationship between white matter MRI metrics and pathology. We z-transformed the microstructural parameters derived from MS patients to the average parameter of healthy controls in the same location. The resulting voxel-wise damage scores represent the scaled versions of the original microstructural parameters, which are independent of the underlying fibre architecture. We aimed to explore the relationship between four microstructural metrics (FA, RD, MTR, and MWF) that are frequently used in clinical studies of MS patients.
Microstructural damage estimates from the four metrics were considered for three MS tissue-states: (a) normal appearing white matter (NAWM), with the least expected damage; (b) lesional tissue that only appears hyperintense on a T2-weighted scan; and (c) lesional tissue that also shows T1 hypointensity and is expected to have the strongest underlying microstructural damage (Moll et al., 2011;Sahraian, Radue, Haller, & Kappos, 2010;Van Waesberghe et al., 1999;Van Walderveen et al., 1998). Additionally, we investigated whether exploiting the covariance between the metrics (Mangeat, Govindarajan, Mainero, & Cohen-Adad, 2015) to generate a composite metric could lead to higher sensitivity to tissue damage than any of the individual metrics.

| Participants
We collected demographic, clinical, and MRI data from self-reported right-handed MS patients, who fulfilled the following eligibility criteria: between 18 and 60 years of age, no relapse or change in treatment for at least three months before study entry, and no other neurological or psychiatric conditions. Patients with relapsing or progressive (Lublin et al., 2014) MS (Polman et al., 2011) were recruited through the Helen Durham Centre for Neuroinflammation at the University Hospital of Wales. We used the Expanded Disability Status Scale (EDSS, Kurtzke (1983)) score and measures from the MS functional composite (MSFC, Cutter et al. (1999)) to characterise disease severity. A subsequent follow-up, four weeks later, ensured that patients had been in a period of clinical stability during the experiment. We also recruited healthy controls to undergo the same MRI protocol.
The study was approved by the NHS South-West Ethics and the Cardiff and Vale University Health Board R&D committees. All participants provided written informed consent.

| MRI acquisition
MRI data were acquired on a 3T General Electric HDx MRI system (GE Medical Systems, Milwaukee, WI) using an eight-channel receive-only head RF coil (GE Medical Devices, Milwaukee, WI). The following MRI sequences were acquired: a T2/proton-density (PD)-weighted sequence and a fluid-attenuated inversion recovery (FLAIR) sequence for identification and segmentation of T2-hyperintense MS lesions; a high-resolution T1-weighted sequence for identification of T1-hypointense MS lesions and for registration; a twice-refocused diffusion-weighted sequence (40 uniformly distributed directions, b = 1200 s/mm²), a 3D MT sequence and mcDESPOT sequences (Deoni, Rutt, Arun, et al., 2008) to obtain microstructure-sensitive parameter maps. The acquisition parameters of all scan sequences are reported in Table 1.

2.3 | T2-hyperintense lesion marking
MS lesions were segmented semi-automatically using the Jim software package (v.6, Xinapse) on the T2-weighted image, also consulting the FLAIR and the PD-weighted images. This was done by two independent operators (IL and ES), in order to assess inter-operator reliability of the lesion maps. We assessed the reliability in two ways. Firstly, we calculated an intraclass correlation coefficient (ICC; Shrout and Fleiss (1979)) for the lesion volumes resulting from the two operators' segmentations. Secondly, to quantify the localisation agreement of the two operators' maps for each patient, we derived the Dice coefficient as a similarity index (Dice, 1945). This was calculated for each patient as twice the number of voxels marked by both operators, divided by the sum of the numbers of voxels marked by each operator (Zijdenbos, Member, Dawant, Margolin, & Palmer, 1994).
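The Dice similarity index described above can be illustrated with a minimal sketch. It assumes the two operators' lesion maps are available as binary NumPy arrays of equal shape; the function name `dice_coefficient` and the toy masks are hypothetical and are not taken from the study's software.

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice similarity index between two binary lesion masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return float("nan")  # undefined when neither operator marked a voxel
    # twice the overlap, divided by the sum of the two mask sizes
    return 2.0 * np.logical_and(a, b).sum() / total

# Toy 1D "masks" standing in for the two operators' lesion maps
operator_1 = np.array([1, 1, 1, 0, 0, 0], dtype=bool)
operator_2 = np.array([0, 1, 1, 1, 0, 0], dtype=bool)
dice = dice_coefficient(operator_1, operator_2)
```

A value of 1 would indicate identical segmentations and 0 no overlap at all.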

| Tissue-state segmentation
We segmented three white matter tissue-states in each patient: (i) normal appearing white matter (NAWM); (ii) lesional tissue that appears as T2-hyperintensity only (T2L); (iii) lesional tissue that shows T2-hyperintensity as well as T1-hypointensity (T1L). Average microstructural parameters were calculated for each patient within these three tissue-states.
We restricted our regions of interest (ROIs; i.e., NAWM, T2L and T1L) to white matter that is more likely to show MS damage in order to minimise differences in MRI metrics simply due to spatial bias, that is, to systematic white matter differences related to spatial location (subcortical vs. periventricular) rather than disease pathology. To do this, we first created a lesion probability map from the lesion maps of all patients. We included all patients with available lesion maps, even if they were subsequently excluded from the analysis, in order for the lesion probability map to be as representative as possible of the typical location of white matter lesions. This was achieved by affinely registering the T2-weighted images and lesion maps to the high-resolution T1-weighted images, using FSL FLIRT with 6 degrees of freedom and the correlation ratio as cost function (Jenkinson, Bannister, Brady, & Smith, 2002). Then, the brain-extracted (FSL BET, Smith) T1-weighted images were non-linearly normalised to MNI space (Avants, Epstein, Grossman, & Gee, 2008), and the obtained warp was applied to the lesion maps. From the lesion maps in MNI space, we computed a probability map, which was then thresholded at 5% to identify white matter that is susceptible to lesions (see Figure 1). The resulting mask was then registered to each patient's individual space and used to constrain the ROIs for the three segmented tissue-states, which were constructed as explained below.
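As a rough illustration of the probability-map step, the following sketch averages binary lesion maps that are assumed to be already aligned in a common space and thresholds the result at 5%. The function name, the array layout (patients along the first axis) and the toy data are assumptions made for this example only.

```python
import numpy as np

def lesion_susceptibility_mask(lesion_maps, threshold=0.05):
    """lesion_maps: array (n_patients, ...) of binary maps in a common space."""
    maps = np.asarray(lesion_maps, dtype=float)
    probability = maps.mean(axis=0)  # fraction of patients lesioned per voxel
    return probability, probability >= threshold

# Toy example: 40 "patients", 4 voxels each
toy_maps = np.zeros((40, 4))
toy_maps[:10, 0] = 1  # voxel 0 lesioned in 25% of patients
toy_maps[:4, 1] = 1   # voxel 1 lesioned in 10% of patients
toy_maps[:1, 2] = 1   # voxel 2 lesioned in 2.5% of patients
probability, susceptible = lesion_susceptibility_mask(toy_maps)
```

In this toy case only the first two voxels pass the 5% threshold and would be retained as lesion-susceptible white matter.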

| NAWM ROI
We segmented the lesion-filled T1-weighted images using FSL FAST (Y. Zhang, Brady, & Smith, 2001) and thresholded the resulting white matter probability maps at 80% to obtain a conservative white matter mask. To create an individual map of NAWM, all voxels marked as lesions in the semi-automatically segmented lesion map produced by one of the operators, and all voxels within 5 mm of these voxels, were removed from the individual white matter mask. This generated a NAWM ROI that excluded white matter in close proximity to lesions, which often shows diffuse damage (Fazekas et al., 1999; Seewann et al., 2009). Additionally, the NAWM ROI was restricted to lesion-susceptible regions, as described above.
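The NAWM definition can be sketched as follows. This is a brute-force illustration on small arrays, assuming a white matter probability map, a binary lesion map and a known voxel size; a Euclidean distance transform would be the more usual choice for full-size images. The function name and the inclusive 80% threshold are assumptions.

```python
import numpy as np

def nawm_mask(wm_prob, lesion_mask, voxel_size_mm, wm_threshold=0.8, margin_mm=5.0):
    """White matter voxels lying more than margin_mm away from any lesion voxel."""
    wm = np.asarray(wm_prob) >= wm_threshold
    lesion = np.asarray(lesion_mask, dtype=bool)
    if not lesion.any():
        return wm
    wm_idx = np.argwhere(wm)
    lesion_idx = np.argwhere(lesion)
    scale = np.asarray(voxel_size_mm, dtype=float)
    # distance (mm) from every white matter voxel to its nearest lesion voxel
    diff = (wm_idx[:, None, :] - lesion_idx[None, :, :]) * scale
    min_dist = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    nawm = np.zeros_like(wm)
    keep = wm_idx[min_dist > margin_mm]
    nawm[tuple(keep.T)] = True
    return nawm

# Toy volume: a line of 12 voxels (1 mm isotropic) with one lesion voxel
wm_prob = np.full((12, 1, 1), 0.9)
wm_prob[11] = 0.5                      # last voxel is not confidently white matter
lesion = np.zeros((12, 1, 1), dtype=bool)
lesion[2] = True
nawm = nawm_mask(wm_prob, lesion, voxel_size_mm=(1.0, 1.0, 1.0))
nawm_indices = np.flatnonzero(nawm.ravel()).tolist()
```

Here voxels 0 to 7 are removed because they are lesional or within 5 mm of the lesion, and voxel 11 fails the white matter threshold.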

| ROIs for T2L and T1L
We classified each T2-hyperintense voxel into either T2L or T1L, T1L being T2-hyperintense with corresponding T1-hypointensity, and T2L being T2-hyperintense without corresponding T1-hypointensity. The classification of T2-lesion voxels into T2L and T1L was based on the intensity on the T1-weighted image. For each patient, we first corrected the T1-weighted image for potential bias fields, using FAST (Y. Zhang et al., 2001). Then, we generated a distribution of image intensity values for all voxels in the NAWM mask. We classified each voxel in the T2-lesion mask as T1L if its T1 signal intensity lay at least 1.5 interquartile ranges (IQR) below the lower quartile of the NAWM distribution. Visual inspection confirmed that these voxels were visibly hypointense on the T1-weighted image. All T2-hyperintense voxels that were not classified as T1L were classified as T2-hyperintense only (T2L; Figure 2). The categories T2L and T1L are therefore mutually exclusive within a voxel. A single lesion could consist of T2L as well as T1L voxels. The T1L and T2L ROIs were restricted to lesion-susceptible regions, as described above. Figure 2 shows the segmented tissue map for one representative MS patient.
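The voxel-wise classification rule can be sketched as below, assuming a bias-corrected T1-weighted image and boolean NAWM and T2-lesion masks held as NumPy arrays. The function name and the toy intensities are hypothetical, and the quartiles use NumPy's default linear interpolation, which may differ slightly from the authors' implementation.

```python
import numpy as np

def classify_lesion_voxels(t1_image, nawm_mask, lesion_mask):
    """Split T2-lesion voxels into T1L and T2L using a NAWM-derived cut-off."""
    t1 = np.asarray(t1_image, dtype=float)
    q1, q3 = np.percentile(t1[nawm_mask], [25, 75])
    cutoff = q1 - 1.5 * (q3 - q1)        # 1.5 IQR below the lower quartile
    t1l = lesion_mask & (t1 <= cutoff)   # T1-hypointense lesional voxels
    t2l = lesion_mask & ~t1l             # remaining lesional voxels
    return cutoff, t1l, t2l

# Toy data: five NAWM voxels followed by two lesional voxels
t1 = np.array([100.0, 102.0, 104.0, 106.0, 108.0, 80.0, 99.0])
nawm = np.array([True, True, True, True, True, False, False])
lesion = np.array([False, False, False, False, False, True, True])
cutoff, t1l, t2l = classify_lesion_voxels(t1, nawm, lesion)
```

With these toy values the cut-off is 96, so the voxel with intensity 80 is labelled T1L and the voxel with intensity 99 is labelled T2L.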
To make comparisons between the three tissue-states possible, in this analysis we only included participants that had tissue in all three

| MTR maps
The MTR was calculated voxel by voxel with the equation $\mathrm{MTR} = [(S_0 - S_{\mathrm{MT}})/S_0] \times 100$, whereby $S_0$ represents the signal without the off-resonance pulse and $S_{\mathrm{MT}}$ represents the signal with the off-resonance pulse. The MTR images in native space were skull-stripped using FSL BET and non-linearly registered to the respective skull-stripped T1-weighted images using Elastix (S. Klein et al., 2010).
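A minimal sketch of the voxel-wise MTR calculation is given below, assuming the two MT-sequence volumes (without and with the off-resonance pulse) are already aligned NumPy arrays. Setting background voxels with zero signal to 0 is an assumption of this example, not something stated in the text.

```python
import numpy as np

def mtr_map(s0, s_mt):
    """MTR (in percent units) from signals without (s0) and with (s_mt) the MT pulse."""
    s0 = np.asarray(s0, dtype=float)
    s_mt = np.asarray(s_mt, dtype=float)
    mtr = np.zeros_like(s0)
    valid = s0 > 0  # avoid division by zero in background voxels
    mtr[valid] = (s0[valid] - s_mt[valid]) / s0[valid] * 100.0
    return mtr

# Toy signals for three voxels (the last one is background)
mtr = mtr_map([1000.0, 800.0, 0.0], [600.0, 400.0, 0.0])
```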

| mcDESPOT and MWF maps
Spoiled gradient recalled-echo (SPGR) and balanced steady-state free precession (bSSFP) data sets for each participant were co-registered to the first volume in the sequence in order to correct for inter-scan motion using an affine (12 degrees of freedom, mutual information) transformation. SPGR and inversion recovery spoiled gradient (IR-SPGR) images were used for driven equilibrium single pulse observation of T1 (DESPOT1) processing (Deoni, 2007), resulting in quantitative T1 maps. Furthermore, T2 maps were calculated with the two phase-cycled bSSFP data using the driven equilibrium single pulse observation of T2 (DESPOT2) algorithm (Deoni, Ward, Peters, & Rutt, 2004).

FIGURE 1 Lesion probability map. The probability map of all white matter lesions detected in all 135 scanned MS patients is shown here. The map shows voxels that were lesioned in at least 5% of the patients (the colour bar ranges from 5% to 50%). The map was used to restrict our analyses to white matter regions that are susceptible to lesions. This was done by thresholding the map at 5% and registering the resulting mask to each patient's native space.

FIGURE 2 Overview of the image processing pipeline. The main analysis steps are outlined, on the left for the tissue-state segmentation, on the right for the quantification of microstructural damage. Tissue segmentation in patients: Lesion segmentation was performed by two independent operators in order to assess reliability of the lesion segmentation. T1-weighted images were lesion-filled and FAST-segmented in order to obtain a white matter mask. To restrict white matter to lesion-susceptible regions, a lesion probability map from all patients' individual lesion maps was created and registered to each patient's native space, creating a restricted white matter mask. NAWM was defined as restricted white matter, at least 5 mm away from lesions in order to avoid tissue damage around lesions. White matter lesions were further segmented into T1L and T2L, based on the intensity in the (bias field corrected) T1-weighted image. The distribution of the intensities on the T1-weighted image in NAWM voxels is shown in green, while the distribution of the intensities on the T1-weighted image in lesional voxels is shown in red and in blue. From the distribution of NAWM voxels, a cut-off was calculated (1.5 IQR below the lower quartile, shown as black line) that was applied to all voxels in the lesion map. Lesional voxels with an intensity below the cut-off were classified as T1L (red distribution), the rest as T2L (blue distribution). Microstructural damage quantification: For each patient, we derived a parameter map for each of FA, RD, MTR, and MWF. We scaled these maps to the distribution (mean and SD) of healthy controls through z-standardisation, yielding maps of FA(z), RD(z), MTR(z), and MWF(z), respectively. From these, global estimates of damage were obtained from the three segmented tissue-states. Additionally, voxel-wise values were considered within each patient's white matter mask in order to (a) look at within-patient voxel-wise correlations, (b) combine the four measures through a principal component analysis, and (c) assess sensitivity of each measure to lesional tissue, using a receiver operating characteristic (ROC) analysis. WM: white matter; NAWM: normal appearing white matter; T2L: T2-hyperintense only lesional tissue; T1L: T2-hyperintense lesional tissue that appears also T1-hypointense. FA: fractional anisotropy; RD: radial diffusivity; MTR: magnetisation transfer ratio; MWF: myelin water fraction.
The data from the DESPOT1 and DESPOT2 sequences were combined according to the three-pool multicomponent DESPOT pipeline (Deoni, Rutt, Arun, et al., 2008) that yields whole brain voxel-wise estimates of the myelin water fraction (MWF), while accounting for CSF partial volume contamination (Deoni, Matthews, & Kolind, 2013).
MWF maps were non-linearly registered to the brain-extracted T1-weighted image using Elastix (S. Klein et al., 2010).

| Maps of microstructural damage
To scale the microstructural measures to the healthy control distribution in a voxel-wise manner, we co-registered all participants' maps.
The brain-extracted lesion-filled T1-weighted images were first non-linearly normalised to MNI space, using ANTs SyN (Avants et al., 2008). We then applied the obtained warp to the four microstructural parameter maps, which had previously been co-registered to the T1-weighted image, as described above. For each metric, voxel-wise mean and SD maps were computed from the data sets of the healthy controls. Then, for each patient and each metric, a map was calculated showing the z-score from the healthy control distribution for each voxel. For example, a z-score of −1 in a patient's voxel indicated that the measure was 1 SD lower than the mean of the metric in healthy controls in the same location. For all further analyses, the damage scores (scaled metrics) are used and reported as FA(z), RD(z), AD(z), MTR(z) and MWF(z), respectively.
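The voxel-wise scaling to the healthy control distribution can be sketched as follows, assuming all maps have already been warped to the same space and loaded as NumPy arrays. The function name is hypothetical, and the use of the sample SD (`ddof=1`) and of NaN for voxels without control variance are assumptions of this sketch.

```python
import numpy as np

def damage_score_map(patient_map, control_maps):
    """Voxel-wise z-score of a patient map relative to healthy controls."""
    controls = np.asarray(control_maps, dtype=float)  # (n_controls, ...)
    patient = np.asarray(patient_map, dtype=float)
    mean = controls.mean(axis=0)
    sd = controls.std(axis=0, ddof=1)                 # sample SD (assumption)
    z = np.full(mean.shape, np.nan)
    valid = sd > 0                                    # skip voxels with no variance
    z[valid] = (patient[valid] - mean[valid]) / sd[valid]
    return z

# Toy example: three controls, two voxels
controls = np.array([[48.0, 10.0], [50.0, 10.0], [52.0, 10.0]])
patient = np.array([48.0, 9.0])
z = damage_score_map(patient, controls)
```

In the first toy voxel the control mean is 50 and the SD is 2, so a patient value of 48 gives a damage score of −1, matching the interpretation given in the text.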

| Global measures of damage for between-patient correlations
To explore the agreement between the measures when estimating microstructural damage, we extracted a global damage measure for each patient and each tissue-state. We calculated the median z-score from each patient's three ROIs and from each of the four metrics. We then compared the damage scores across tissue-states using paired t-tests. Then, for each tissue-state, we computed Pearson correlation coefficients between the estimates of damage obtained from each of the four metrics.
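The global damage measure and the between-metric correlations can be illustrated with a short sketch. It assumes z-score maps and ROI masks are NumPy arrays and that per-patient global scores for one tissue-state are collected in a dictionary keyed by metric name; these names and the toy values are hypothetical. The paired t-tests are not shown.

```python
import numpy as np

def global_damage(z_map, roi_mask):
    """Median z-score within one tissue-state ROI (NaNs ignored)."""
    return float(np.nanmedian(np.asarray(z_map, dtype=float)[roi_mask]))

def between_patient_correlations(scores):
    """Pearson correlation matrix between metrics; scores maps metric -> per-patient values."""
    names = list(scores)
    r = np.corrcoef(np.vstack([scores[name] for name in names]))
    return names, r

# Toy z-map and ROI for one patient
toy_z = np.array([-1.0, -2.0, -3.0, 5.0])
toy_roi = np.array([True, True, True, False])
median_z = global_damage(toy_z, toy_roi)

# Toy global scores of four patients in one tissue-state
toy_scores = {"FA(z)": [-1.0, -2.0, -3.0, -4.0], "RD(z)": [1.0, 2.0, 3.0, 4.0]}
names, r = between_patient_correlations(toy_scores)
```

In the toy data FA(z) and RD(z) are perfectly anti-correlated, reflecting the opposite direction in which the two metrics change with damage.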

| Within-patient voxel-wise correlations
As between-patient correlations are affected by the variability of the measures in the particular sample investigated, we also calculated voxel-wise correlations between the metrics. This was done considering all voxels within the white matter maps of each patient. To report these correlations, we calculated the mean and SD of the correlation coefficient (applying z-transformation) across the group. Additionally, we set up multiple linear regressions, each of which used a different metric as the dependent variable. We calculated the spatial variance explained in that metric by the other three metrics. This was done using in-house software written in MATLAB (v. R2015, Mathworks). A full set of metrics could not be collected from all participants due to specific absorption rate (SAR)-constraints of the mcDESPOT sequences or to logistical reasons.
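The two within-patient analyses can be sketched as follows. The first helper summarises per-patient correlation coefficients after a Fisher z-transformation, which is an assumption about what "z-transformation" refers to here; the second computes the variance in one metric explained by the others using ordinary least squares. Function names and toy data are hypothetical, and this is not the authors' MATLAB code.

```python
import numpy as np

def fisher_mean_sd(r_values):
    """Mean and SD of correlation coefficients in Fisher-z units."""
    z = np.arctanh(np.asarray(r_values, dtype=float))
    return z.mean(), z.std(ddof=1)

def variance_explained(target, predictors):
    """R^2 of a least-squares fit of target on the predictors plus an intercept."""
    y = np.asarray(target, dtype=float)
    columns = [np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1.0 - residuals.var() / y.var()

# Toy voxel values: the target is an exact linear combination of two predictors
x1 = np.arange(10, dtype=float)
x2 = x1 ** 2
y = 2.0 * x1 - 0.5 * x2 + 1.0
r2 = variance_explained(y, [x1, x2])

mean_z, sd_z = fisher_mean_sd([0.5, 0.5, 0.5])
mean_r = np.tanh(mean_z)  # back-transform the mean to a correlation coefficient
```

For real data, each patient would contribute one correlation coefficient per metric pair and one R² per dependent metric, computed over the voxels of that patient's white matter mask.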

| Principal component analysis
From the 123 complete patient datasets, 105 patients met our criterion of having at least 0.4 cm³ volume in each of the three tissue-states. Table 2 shows the demographic and clinical characteristics of the patients and healthy controls included in the analysis. Patients were older than healthy controls, had smaller whole brain and grey matter volumes and performed significantly worse on tasks measuring dexterity, walking ability and cognition. None of the patients experienced a relapse or worsening of their symptoms in the four weeks after the scan.

| Lesion segmentation
The inter-operator reliability for extracted lesion volume was ICC

| Microstructural damage in MS lesions
Maps of mean and SD of the parameters in healthy controls, which were used to calculate the damage scores in patients, are shown in

TABLE 2 Demographic and clinical characteristics of the participants. Unless otherwise indicated, descriptive statistics provided are means and SDs. For statistical comparison between the two groups, chi-square tests were computed for categorical variables, Kruskal-Wallis tests for skewed variables (9-HPT, T25-FW), and unpaired t-tests for the rest. P values for group differences are provided. RRMS: relapsing-remitting multiple sclerosis; IQR: inter-quartile range; EDSS: expanded disability status scale; PMS: progressive MS (primary or secondary); 9-HPT: 9 hole peg test (score averaged across two trials); T25-FW: timed 25 ft walk (score averaged across two trials); PASAT: paced auditory serial addition test (3 s version); NBV: normalised brain volume; NGMV: normalised grey matter volume. Normalised brain and grey matter volume were calculated using SIENAX

MWF(z) were significantly lower in T2L than in NAWM, while RD(z) was significantly higher. Differences between T1L and T2L were also significant for all metrics. All statistical tests are reported in Table 3.

| Accounting for age and education differences between patients and controls
The healthy controls were younger (on average, 6 years) and had a higher level of education (on average, 4 years) than the patients, possibly confounding the global estimates of damage, which were based on z-scores calculated from the healthy controls. To address this, we performed additional analyses as follows, aiming to estimate the effect that the mean age and educational difference between patients and controls may have on the microstructural metrics. First, voxel-wise regression coefficients were computed (using the AFNI [Cox, 1996] function 3dTcorr1D) between the (z-standardised) metrics and age / years of education in the 27 healthy controls (n = 25 for MWF). We extracted the average regression coefficient from the restricted white matter mask in MNI space and then calculated the mean change in each metric that would be expected in a cohort 6 years older. The

| Within-patient voxel-wise correlations
As for the between-patient correlations within tissue-averaged estimates, average (absolute) voxel-wise correlation coefficients lay between 0.38 and 0.74, indicating medium-to-high spatial correlations across the maps of damage (Table 4). Additionally, we computed how

Estimates of global damage. For each metric, a boxplot across the 105 included patients is shown, comparing global damage estimates for each tissue-state. Global damage measures were computed as the median z-score within each tissue-state for each patient. All differences between tissue-states are significant for all damage scores as presented in Table 3. NAWM: normal appearing white matter; T2L: T2-hyperintense only lesional tissue; T1L: T2-hyperintense lesional tissue that appears also T1-hypointense; FA(z): z-score for fractional anisotropy; RD(z): z-score for radial diffusivity; MTR(z): z-score for magnetisation transfer ratio; MWF(z): z-score for myelin water fraction.

with regard to its sensitivity to the three tissue-states, as described below.

and between MWF(z) and the PCA score (p[uncorrected] = 0.09). The metric that could best differentiate NAWM from T2L was MWF(z), followed by the PCA score, MTR(z) and RD(z) and lastly FA(z) (Figure 7).
For the classification problem T1L versus NAWM, all pairwise comparisons were significant, apart from the comparison between MTR(z) and PCA score (p[uncorrected] = 0.20). The metrics that could best differentiate NAWM from T1L lesions were MTR(z) and the PCA score, followed by MWF(z), RD(z) and FA(z).
For the classification problem T1L versus T2L, all pairwise comparisons were significant. The metric that could best differentiate T2L from T1L lesions was MTR(z), followed by PCA score, MWF(z), RD(z) and FA(z).
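The per-patient area under the ROC curve used in these comparisons can be sketched with the rank-based formulation below, in which the AUC equals the probability that a randomly chosen voxel from the "positive" tissue-state has a higher score than a randomly chosen voxel from the "negative" one. The function name and toy values are hypothetical; negating the z-scores for metrics where lower values indicate more damage is an assumption about how direction is handled.

```python
import numpy as np

def auc(scores_positive, scores_negative):
    """AUC as P(positive > negative), counting ties as one half."""
    pos = np.asarray(scores_positive, dtype=float)[:, None]
    neg = np.asarray(scores_negative, dtype=float)[None, :]
    return float(((pos > neg) + 0.5 * (pos == neg)).mean())

# Toy MTR(z) values for one patient; lower z means more damage,
# so the scores are negated before computing the AUC
mtr_z_t1l = np.array([-4.0, -3.0, -2.0])
mtr_z_nawm = np.array([-1.0, 0.0, -2.5])
auc_t1l_vs_nawm = auc(-mtr_z_t1l, -mtr_z_nawm)
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation of the two tissue-states. The pairwise matrix grows with the product of the two voxel counts, so a rank-sum implementation would be preferable for large ROIs.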

| DISCUSSION
Our results demonstrate that all considered metrics of white matter microstructure (FA, RD, MTR, and MWF) are sensitive to the presence and severity of MS damage. We found medium to high correlations between these metrics, with none of the metrics sharing more than 65% of its variance with the other metrics. We also showed that combining the four metrics, by capturing their covariance through a PCA, did not yield a metric more sensitive to lesional tissue than any of the individual metrics. These results suggest that, while there is some agreement between the measures of microstructural damage, they may provide complementary information on the severity of damage. Therefore, optimised clinical trials testing preventative, neuroprotective or repair interventions would benefit from the acquisition and consideration of all of these metrics to quantify the different aspects of white matter damage.

| All four metrics indicate microstructural damage
Across the patient cohort, our four MRI-based metrics confirmed significant microstructural damage in NAWM. Lesional tissue with only T2-hyperintensity (T2L) showed greater damage than NAWM and the estimated microstructural damage was greatest in lesional tissue with additional T1-hypointensity (T1L), which is consistent with histological studies (Moll et al., 2011; Sahraian et al., 2010; Van Waesberghe et al., 1999; Van Walderveen et al., 1998).

TABLE 5 Area under the curve (AUC). For each metric and each classification problem, the mean ± SD AUC across patients are provided. The AUC quantifies the performance of each metric when classifying T2L versus NAWM, T1L versus NAWM, and T1L versus T2L, respectively. NAWM: normal appearing white matter; T2L: T2-hyperintense only lesional tissue; T1L: T2-hyperintense lesional tissue that appears also T1-hypointense; FA(z): z-score for fractional anisotropy; RD(z): z-score for radial diffusivity; MTR(z): z-score for magnetisation transfer ratio; MWF(z): z-score for myelin water fraction; PCA score: score derived from the first principal component of our PCA analysis.

FIGURE 7 ROC analysis. For each patient, a ROC curve was computed for each classification problem: T2L versus NAWM (left plot), T1L versus NAWM (middle plot) and T1L versus T2L (right plot). The patient-averaged ROC curve (average true positive rate depending on the set false positive rate) is plotted for each metric. To compare the performance of the metrics statistically, we considered each patient's area under the curve and performed pairwise comparisons between the metrics (as described in the text). NAWM: normal appearing white matter; T2L: T2-hyperintense only lesional tissue; T1L: T2-hyperintense lesional tissue that appears also T1-hypointense; FA(z): z-score for fractional anisotropy; RD(z): z-score for radial diffusivity; MTR(z): z-score for magnetisation transfer ratio; MWF(z): z-score for myelin water fraction; PCA score: score derived from the first principal component of our PCA analysis.

Differences in microstructural metrics between lesional tissue with and without T1-hypointensity are not always found with MRI. For example, Vavasour, Li, Traboulsee, Moore, and Mackay (2007) reported differences between T1-hypointense and T1-isointense lesions only for MTR, but not for MWF. The sensitivity of MWF to different types of lesions might be influenced by the acquisition method chosen (Faizy, Thaler, Kumar, & Sedlacik, 2016), as well as by the method adopted to classify lesional tissue. In this study, instead of classifying entire lesions, we objectively classified each lesional voxel into T2L versus T1L. This maximised the contrast between the investigated tissue-states, which was also reflected in the high classification accuracy achieved in our ROC analysis, with the average area under the curve exceeding 0.8 for all measures, apart from FA (Table 5).

| Estimates of microstructural damage correlate between the MRI-based measures
The global estimates of damage correlated significantly between the measures, with the lowest correlations in NAWM, in which the tissue damage is also the lowest. Averaging damage across lesional tissue may provide more meaningful measures than averaging across potentially more heterogeneous NAWM, explaining the higher between-measure correlations found in lesional tissue. The voxel-wise correlations that were calculated separately within each patient's white matter support the results from the between-participant correlations of global measures of damage, also yielding medium-to-high correlations. This indicates that the microstructural damage maps for each patient show consistency across the metrics used.
If the covariance between the metrics were sufficiently high, this would allow the accurate generation of synthetic maps of one metric by just measuring the other metrics, potentially saving scanning time (Callaghan, Helms, Lutti, Mohammadi, & Weiskopf, 2015). However, our multiple linear regression analysis suggests that one metric cannot simply be replaced by, or computed from, the other metrics, as on average only around 50-60% of the variance is shared between the metrics.
Overall, we showed that while spatial variance is shared between the metrics and some of the between-patient correlations of global damage scores are high, this is not uniformly true for all metrics (e.g., the weakest between-patient correlations were found between MTR and the other measures). This could be partly due to noise in the individual metrics and partly to the fact that each metric is sensitive to a different biological aspect of damage, which may show some independence and thus potentially reflect distinct pathological mechanisms.

| The metrics may provide complementary information
If noise were the reason for the observed discrepancies between the metrics, then combining them into one measure could potentially provide a metric more sensitive to microstructural damage. A similar concept was implemented by Mangeat et al. (2015) with myelin-sensitive contrasts in the cortex at 7 T: making use of the covariance structure of the metrics MT and T2*, maps of the first component score from a principal component analysis matched histology-based cytoarchitectonic maps better than either of the two original maps.
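The idea of a composite score based on the first principal component can be sketched as follows, assuming a voxels-by-metrics array of damage scores for one patient. Standardising each column before the decomposition, and the function name itself, are assumptions of this example rather than details taken from the study.

```python
import numpy as np

def first_pc_score(voxel_by_metric):
    """Scores, loadings and explained variance of the first principal component."""
    X = np.asarray(voxel_by_metric, dtype=float)
    Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardise each metric
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s[0] ** 2 / np.sum(s ** 2)             # share of total variance
    return Xc @ vt[0], vt[0], explained

# Toy data: four voxels, two perfectly correlated "metrics"
toy = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
scores, loadings, explained = first_pc_score(toy)
```

Because the two toy columns are perfectly correlated, the first component captures all of the variance. The sign of a principal component is arbitrary, so the direction of the composite score would need to be fixed (for example, so that lower values indicate more damage) before comparing it with the individual metrics.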

| MTR may be particularly sensitive to extracellular water
The particular sensitivity of MTR to T1L that we found supports the findings by Vavasour et al. (2007). MTR is sensitive to white matter microstructure because it quantifies the MT that occurs at the surface of myelin sheaths. However, Vavasour et al. (2007) suggest that in the case of T1-hypointense lesions, MTR changes might also reflect, to a large extent, the increase in extracellular water that follows tissue destruction. Another explanation could be that MTR can be affected by T1-effects (Helms, Dathe, & Dechent, 2010). T1-hypointense lesional tissue is, by definition, identified by its intensity on T1-weighted images, which could contribute to the particular sensitivity of MTR to that type of lesion. The particular sensitivity of MTR to water and T1-effects may be the reason why it showed comparatively low between-patient correlations with the other metrics.

4.5 | MWF may be particularly sensitive to subtle microstructural damage in MS
MWF was significantly better than the other metrics at differentiating T2L from NAWM. MWF reflects the signal fraction coming from myelin water compared to the combined signal fraction of myelin and extracellular water. Therefore, MWF is affected by a change in the total myelin content in a voxel, but also by changes in extracellular water. T1L in particular is characterised by an increase in extracellular water that can follow severe tissue destruction. Nevertheless, our measure of MWF did less well than MTR at differentiating between T1L and T2L.
In the case of very severe tissue destruction, it is possible that some of the additional water is assigned to the CSF compartment rather than to the extracellular water compartment of the applied three-compartment model. If this is the case, then the microstructural damage might be underestimated when looking at the apparent MWF. Additional analyses show that the estimated CSF water fraction is indeed significantly increased in MS lesions, most markedly in T1L (Supporting Information Figure S1). MWF was particularly good at classifying lesions with less severe tissue destruction, that is, T2L. It could be the case that MWF provides a more specific measure than MTR in the case of less severe damage, by potentially being less affected by T1-effects. It is also possible that these two measures differ with regard to their sensitivity to fragmented myelin, which remains to be explored.
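The argument can be written out explicitly. The notation is introduced here for illustration and is an assumption: $S_m$, $S_e$ and $S_{csf}$ denote the signal attributed to the myelin water, extracellular water and CSF compartments, and the CSF compartment is taken to be excluded from the denominator, as the verbal definition of MWF above implies.

```latex
\mathrm{MWF} = \frac{S_m}{S_m + S_e}

% Additional water \Delta > 0 assigned to the extracellular compartment
% lowers the apparent MWF, so the damage is registered:
\mathrm{MWF}_{e} = \frac{S_m}{S_m + S_e + \Delta} < \mathrm{MWF}

% The same additional water assigned to the CSF compartment
% (S_{csf} \to S_{csf} + \Delta) leaves the apparent MWF unchanged:
\mathrm{MWF}_{csf} = \frac{S_m}{S_m + S_e} = \mathrm{MWF}
```

Under these assumptions, any water that the model attributes to CSF does not lower the apparent MWF, which is one way in which the damage in severely destroyed tissue could be underestimated.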

| FA and RD are the least specific metrics
The DTI-based measures FA and RD were the least sensitive at detecting both types of lesions. These measures are very frequently used to measure microstructural integrity of white matter, but lack specificity (Beaulieu, 2014). The diffusion tensor is sensitive to a variety of factors, including myelination and axonal density, extracellular water and fibre architecture (Beaulieu, 2014). Studies in animals suggest that axonal damage mostly affects axial diffusivity (AD), whereas demyelination mostly affects perpendicular (radial) diffusivity (Concha, Gross, Wheatley, & Beaulieu, 2006;Song et al., 2003). However, in MS in particular, demyelination and axonal damage are not two isolated processes, but have a complex relationship (Simons, Misgeld, & Kerschensteiner, 2014). The microstructural damage scores we calculated using FA and RD were highly correlated, which is probably due, at least partly, to the fact that an isolated change in RD affects FA by definition. Still, RD had higher sensitivity to both lesional tissue-states than FA. One explanation is that RD could be a more sensitive measure of myelination. Another explanation could be that diffusivity generally increases in lesional tissue, independent of the direction along which diffusion is measured. If this were the case, then AD, which is independent of RD, should show similar sensitivity to damage as RD. Additional analyses (Supporting Information Figure S2) show that this is not the case in our data: AD performs significantly worse than RD at classifying the tissue-states, although it performs marginally better than FA. This suggests that RD may indeed be the most sensitive of the DTI-based metrics when estimating microstructural damage.
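The dependence of FA on RD follows from the standard definitions of the tensor-derived metrics, which can be sketched in a few lines. The eigenvalues below are illustrative assumptions (in units of 10^-3 mm²/s), not values from this study, and the function name `dti_metrics` is hypothetical.

```python
import numpy as np

def dti_metrics(l1, l2, l3):
    """Standard tensor-derived metrics from eigenvalues with l1 >= l2 >= l3."""
    ad = l1                      # axial diffusivity: largest eigenvalue
    rd = (l2 + l3) / 2.0         # radial diffusivity: mean of the two smaller ones
    md = (l1 + l2 + l3) / 3.0    # mean diffusivity
    fa = np.sqrt(
        1.5 * ((l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2)
        / (l1 ** 2 + l2 ** 2 + l3 ** 2)
    )
    return {"AD": ad, "RD": rd, "FA": fa}

# Illustrative eigenvalues (assumption): radial diffusivity doubles,
# axial diffusivity stays the same.
reference = dti_metrics(1.7, 0.3, 0.3)
altered = dti_metrics(1.7, 0.6, 0.6)

for label, m in (("reference", reference), ("altered", altered)):
    print(f"{label}: AD={m['AD']:.2f} RD={m['RD']:.2f} FA={m['FA']:.2f}")
```

In this example AD is unchanged while FA drops, so an isolated increase in RD is enough to produce correlated changes in the FA-based and RD-based damage scores.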
Of the four investigated metrics, FA was the least sensitive to lesional tissue. FA is predominantly determined by structured axonal membranes (Beaulieu, 2014), but is also sensitive to changes in myelination. However, large changes in myelination lead to comparatively small changes in anisotropy (Song et al., 2002 (2012)) or composite hindered and restricted model of diffusion (CHARMED; Assaf and Basser (2005)), and infer more biologically specific microstructural measures, such as the fibre orientation dispersion. Some of these metrics may have the potential to provide even more specific information about pathological processes (Tomassini et al., 2012). However, because of the complexity of these models, various assumptions and model constraints have to be imposed. While these may be reasonable for healthy tissue, they do not necessarily hold for pathological tissue, which can make the interpretation of these microstructural metrics challenging (Lampinen et al., 2017).
The investigated metrics are likely to be sensitive to different pathological processes. Therefore, the collection of several, complementary metrics could improve detection and quantification of microstructural pathology, with important implications for clinical trials, whose behavioural or pharmacological interventions could interfere with or induce microstructural changes.

| Limitations
The study is not without limitations. We used lesional tissue to assess the sensitivity of MRI-based metrics to microstructural damage, as pathology is most pronounced in lesions (Tomassini & Palace, 2009).
While axonal loss and demyelination are established pathological processes in MS, lesional tissue is also characterised by inflammation, gliosis, axonal swelling and edema (Filippi et al., 2012). The microstructural measures applied in this study are likely to be sensitive to some of these processes. Additionally, lesional tissue can be very heterogeneous between and within patients, with numerous types of MS lesions described histologically (Kuhlmann et al., 2017). As we cannot differentiate between these lesion types using MRI, we classified the lesions broadly into two lesional tissue-states.
In this study, we focused on T2-weighted white matter hyperintense lesional tissue without versus with T1-weighted hypointensity. The majority of T1-hypointense lesions are chronic black holes, that is, lesions with severe tissue destruction (Sahraian et al., 2010;Tomassini & Palace, 2009). However, a minority of them are active lesions, which can be distinguished on a post-contrast T1-weighted scan by their gadolinium enhancement. Ours were research scans, acquired without gadolinium contrast administration, which prevented us from making this distinction.
It should be noted, however, that the classification of enhancing versus non-enhancing lesions is also not straightforward, since a number of methodological factors determine whether lesions enhance or not (Filippi, 2000). Nevertheless, given the low proportion of active lesions among T1-hypointense lesions reported in the literature (Ciccarelli et al., 1999;Koudriavtseva et al., 1997;Zivadinov & Bakshi, 2004) and the inclusion criteria of our study, we expect only a minority of the T1L assessed to be enhancing lesions.
To ensure informativeness of the results, we only included patients with a high number of voxels within each tissue class. Although this could bias the sample towards patients with higher total lesion volume, we showed that this was unrelated to disease severity and duration.
When comparing the sensitivity of the metrics to lesional tissue, the ROC analysis showed AUCs between 0.68 and 0.98, which overall indicated high classification accuracy. This analysis assumes that the T2-lesion masks are the gold standard. However, lesion segmentation is not a trivial process (Carass et al., 2017). As shown in our inter-operator reliability analysis, segmentation is highly operator-dependent. Even if the correlation between lesion volumes calculated by two operators suggests reliable lesion segmentation, some operators can be more conservative than others (Carass et al., 2017).
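As a reminder of what the AUC measures, the sketch below computes it directly as the probability that a randomly chosen lesional voxel has a higher damage score than a randomly chosen NAWM voxel, with ties counted as one half. The scores are synthetic, the convention that higher scores mean more damage is an assumption, and the function name `auc_from_scores` is hypothetical. The labels passed to the function are treated as ground truth, which is exactly the assumption discussed above.

```python
import numpy as np

def auc_from_scores(lesion_scores, nawm_scores):
    """AUC as P(lesion score > NAWM score) + 0.5 * P(tie),
    evaluated over all lesion/NAWM voxel pairs."""
    lesion = np.asarray(lesion_scores, dtype=float)[:, None]
    nawm = np.asarray(nawm_scores, dtype=float)[None, :]
    greater = np.mean(lesion > nawm)
    ties = np.mean(lesion == nawm)
    return greater + 0.5 * ties

# Synthetic damage scores (assumption: higher score = more damage).
rng = np.random.default_rng(3)
nawm = rng.normal(loc=0.5, scale=1.0, size=400)
lesion = rng.normal(loc=2.5, scale=1.0, size=300)

auc = auc_from_scores(lesion, nawm)
print(f"AUC for lesion versus NAWM: {auc:.2f}")
```

Because the pairwise comparison uses memory proportional to the product of the two sample sizes, this form is only suitable for small illustrative samples.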
The patient cohort in this study was on average 6 years older than the controls and had on average 4 fewer years of education. Our analyses show that these differences had an effect on the normalised damage scores in the patients. However, the effect size was small compared to the effect of pathology, apart from the relationship between education and MTR. Here, causality cannot be inferred, as disease severity may be a confound, affecting both microstructure and how long patients can stay in education. Additionally, since all patients' maps were normalised in the same way, a potential bias in the z-scores would not have affected the correlations between the metrics.
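The last point rests on a general property of the Pearson correlation: it is unchanged when each variable is shifted by a constant or rescaled by a positive constant. The sketch below illustrates this with synthetic numbers, under the assumption that a normalisation bias acts as the same offset and scaling on every patient's score for a given metric. The variable names, offsets and scale factors are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]

# Synthetic global damage scores for two metrics (assumption).
rng = np.random.default_rng(2)
n_patients = 123
mwf_scores = rng.normal(loc=-1.0, scale=0.5, size=n_patients)
mtr_scores = 0.6 * mwf_scores + rng.normal(scale=0.4, size=n_patients)

r_original = pearson_r(mwf_scores, mtr_scores)

# Apply a hypothetical bias: a constant offset and a positive rescaling
# that are identical for all patients within each metric.
r_biased = pearson_r(mwf_scores + 0.3, 1.2 * mtr_scores - 0.1)

print(f"r without bias: {r_original:.3f}, r with bias: {r_biased:.3f}")
```

A bias that differed between patients, for example one that depended on lesion location, would not be covered by this argument.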
The high sensitivity of DTI-based metrics to the underlying fibre configuration could help to explain, at least in part, their low sensitivity to lesional tissue. We aimed to address this inherent limitation of DTI by spatially normalising the metrics to healthy control tissue. However, the co-registration of the microstructural images, which is based on T1-weighted structural images, limits the extent to which this analysis can succeed. To minimise potential registration biases due to atrophy in our patient group, we employed the ANTs SyN algorithm, which is considered a robust method for registering atrophic brains (Avants et al., 2008;A. Klein et al., 2009). Additionally, before applying this method to our patient cohort, we compared it to other methods (linear and nonlinear registration provided by FSL's FLIRT and FNIRT) and investigated the effect of atrophy on the resulting MNI normalisation (Supporting Information Figure S3). Our results suggested that ANTs SyN provided the best registration success in our dataset, with differences between patients with higher versus lower atrophy being negligibly small compared to differences between the compared registration algorithms.

| Conclusions
The