Remyelination varies between and within lesions in multiple sclerosis following bexarotene

Abstract
Objective: In multiple sclerosis, chronic demyelination is associated with axonal loss, and ultimately contributes to irreversible progressive disability. Enhancing remyelination may slow, or even reverse, disability. We recently trialled bexarotene versus placebo in 49 people with multiple sclerosis. While the primary MRI outcome was negative, there was converging neurophysiological and MRI evidence of efficacy. Multiple factors influence lesion remyelination. In this study we undertook a systematic exploratory analysis to determine whether treatment response (measured by change in magnetisation transfer ratio) is influenced by location (tissue type and proximity to CSF) or the degree of abnormality (using baseline magnetisation transfer ratio and T1 values).
Methods: We examined treatment effects at the whole lesion level, the lesion component level (core, rim and perilesional tissues) and at the individual lesion voxel level.
Results: At the whole lesion level, significant treatment effects were seen in GM but not WM lesions. Voxel-level analyses detected significant treatment effects in WM lesion voxels with the lowest baseline MTR, and uncovered gradients of treatment effect in both WM and CGM lesional voxels, suggesting that treatment effects were lower near CSF spaces. Finally, larger treatment effects were seen in the outer and surrounding components of GM lesions compared to inner cores.
Interpretation: Remyelination varies markedly within and between lesions. The greater remyelinating effect in GM lesions is congruent with neuropathological observations. For future remyelination trials, whole GM lesion measures require less complex post-processing compared to WM lesions (which require voxel-level analyses) and markedly reduce sample sizes.


Introduction
In multiple sclerosis, most lesions are thought to go through a phase of inflammatory demyelination followed by a variable degree of remyelination. In lesions that do not fully remyelinate, chronic axonal loss occurs due to loss of trophic support, 1 and this contributes to irreversible and progressive disability. 2 A major unmet goal of multiple sclerosis treatment is to enhance endogenous remyelination, so restoring neuronal function and preventing demyelination-associated chronic axonal loss.
Endogenous remyelination in multiple sclerosis lesions is highly variable, and several factors have been implicated. 3 Increasing age is a major determinant of remyelination failure. 4 A substantial proportion of lesions remain chronically active, with evidence of ongoing, if more subtle, inflammation and demyelination at their edges, and such lesions are associated with progressive disability. 5,6 Lesion location also appears to be relevant; histopathological studies have suggested that lesions in brain grey matter (GM) may remyelinate more effectively than those in white matter (WM), although to the best of our knowledge this has yet to be shown in vivo. [7][8][9] Within and between WM lesions there may be significant differences: combined PET-MRI analyses show substantial heterogeneity of demyelination and remyelination in voxels from the same lesion, 10 and lesions close to the ventricles exhibit greater damage compared with those in deeper tissues, [11][12][13][14] although it is unknown if this gradient reflects greater damage as lesions form and evolve, or a lower potential to remyelinate.
In remyelination trials, the most frequently employed imaging method for assessing remyelination is measurement of magnetisation transfer ratio (MTR). 15 In WM and GM, MTR most strongly correlates with myelin density, though in WM there is also a correlation with axonal count and inflammation. [16][17][18][19] Previous work has shown that assessing remyelination treatment effects in individual rather than pooled lesions increases sensitivity, 20 and has suggested that this should be restricted to lesions with lower than average MTR. 21 However, to date, other factors that could influence lesion remyelination have not, to the best of our knowledge, been assessed in vivo.
We have tested the ability of bexarotene, a retinoid X receptor agonist, to promote remyelination in people with relapsing-remitting multiple sclerosis in a phase IIa placebo-controlled trial (CCMR-One 22 ). The novel MRI efficacy outcome measure (change in MTR in whole lesions whose baseline MTR was below the within-patient median) was not met, and bexarotene was poorly tolerated, precluding its further development as a treatment in multiple sclerosis. However, statistically significant treatment effects were observed in whole deep grey matter (DGM) lesions, cortical grey matter (CGM) lesions and brainstem lesions, but not in periventricular, deep white matter, juxtacortical or cerebellar lesions. The interpretation of these MTR changes as representing remyelination is congruent with the observation that bexarotene also reduced the latency of visual evoked potentials (VEPs) in eyes previously affected by optic neuritis. 22 Here we explore lesional factors that influence the capacity for remyelination using the CCMR-One trial MRI dataset, and calculate the sample sizes needed for future trials of other potential remyelinating agents using the most promising and pragmatic lesion metrics.

Participants
The Cambridge Centre for Myelin Repair Trial Number One (CCMR-One, ISRCTN14265371) was a double-blind phase 2a trial in people with relapsing-remitting multiple sclerosis from two UK centres, aged 18-50 years who had been stable on dimethyl fumarate for at least 6 months. Full inclusion criteria, baseline demographics and methodology can be found in the original trial manuscript. 22 Briefly, patients were randomised to receive bexarotene or placebo tablets for 6 months and underwent MRI and VEPs at baseline and 6 months. The trial was approved by London Westminster National Research Ethics Service Committee (15/LO/0108) and all participants gave written informed consent at enrolment.

Processing
We employed three levels of analysis to explore treatment effects: within whole lesions (the simplest to process and therefore the most pragmatic for future remyelination trials); within lesion spatial components (the inner core compared with the outer rim of lesions, and their surrounding normal-appearing tissues) and lesion segments (regions defined by their baseline MTR and T1-hypointensity); and within individual lesion voxels (potentially the most sensitive assessment given the aforementioned heterogeneity within lesions, but potentially the most vulnerable to methodological artefacts such as partial volume effects at lesion borders and subtle registration inaccuracies). At the whole-lesion level, we compared the effect of lesion location (GM compared with WM, and distance from cerebrospinal fluid) and the degree of baseline abnormality (based on MTR and T1-hypointensity). Finally, at the voxel level, we compared treatment effects according to location (again GM compared with WM, and distance from cerebrospinal fluid) and baseline voxel MTR and T1.
© 2022 The Authors. Annals of Clinical and Translational Neurology published by Wiley Periodicals LLC on behalf of American Neurological Association.
The lesion segmentation and tissue segmentation processing are detailed in the original trial manuscript (and in the Data S1).
To examine spatial components, lesions were segmented into an outer rim (by eroding each T2-contoured lesion by one voxel layer) and a central core (the remaining central tissue). Three concentric perilesional cuffs of normal-appearing tissue were created by dilating each T2-contoured lesion by one voxel layer (first cuff), then another voxel layer (second cuff) and a final voxel layer (third cuff). This was done by 3D 1-voxel dilations. Any voxel included in the cuffs of more than one lesion was excluded.
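The erosion and dilation steps described above can be sketched as follows. This is an illustrative reconstruction rather than the trial pipeline: the function name `lesion_components`, the use of `scipy.ndimage`, and the choice of a face-connected (6-neighbour) structuring element are all assumptions, and the exclusion of voxels shared between the cuffs of neighbouring lesions is not shown.

```python
import numpy as np
from scipy import ndimage


def lesion_components(lesion_mask, n_cuffs=3):
    """Split one binary 3D lesion mask into core, rim and perilesional cuffs.

    Assumption: one voxel layer is defined by face connectivity
    (6 neighbours in 3D); the source does not state the connectivity used.
    """
    lesion_mask = np.asarray(lesion_mask, dtype=bool)
    structure = ndimage.generate_binary_structure(lesion_mask.ndim, 1)

    # Core: what remains after eroding the lesion by one voxel layer.
    core = ndimage.binary_erosion(lesion_mask, structure=structure)
    # Rim: the outermost lesion layer removed by that erosion.
    rim = lesion_mask & ~core

    # Cuffs: successive one-voxel dilations, keeping only the newly added shell.
    cuffs = []
    previous = lesion_mask
    for _ in range(n_cuffs):
        grown = ndimage.binary_dilation(previous, structure=structure)
        cuffs.append(grown & ~previous)
        previous = grown
    return core, rim, cuffs


# Toy example: a 3 x 3 x 3 cubic "lesion" in an 11 x 11 x 11 volume.
lesion = np.zeros((11, 11, 11), dtype=bool)
lesion[4:7, 4:7, 4:7] = True
core, rim, cuffs = lesion_components(lesion)
```

In a real dataset each cuff would additionally need to be restricted to normal-appearing tissue and cleaned of voxels that fall in the cuffs of more than one lesion, as the text specifies.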
The MTR maps at month 0 and month 6 were calculated directly in the subject halfway volumetric T1 space as follows: $\mathrm{MTR} = ((\mathrm{MT_{off}} - \mathrm{MT_{on}}) / \mathrm{MT_{off}}) \times 100$. To compute MTR maps in the halfway volumetric T1 space, we concatenated two transformations: the first was the transformation from MTon and MToff to the corresponding T1 volumetric scan at each timepoint, and the second was the transformation to the halfway volumetric T1 space. In that way, all scans from month 0 and month 6 were in halfway volumetric T1 space.
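As a minimal sketch of the MTR calculation, assuming the MT-off and MT-on images have already been resampled into the same space: the function name `compute_mtr` and the handling of background voxels (set to NaN where the MT-off signal is effectively zero) are assumptions for illustration and are not described in the source.

```python
import numpy as np


def compute_mtr(mt_off, mt_on, eps=1e-6):
    """Voxel-wise MTR in percentage units: ((MToff - MTon) / MToff) * 100."""
    mt_off = np.asarray(mt_off, dtype=float)
    mt_on = np.asarray(mt_on, dtype=float)

    # Assumption: voxels with no MT-off signal (e.g. air) are undefined.
    mtr = np.full(mt_off.shape, np.nan)
    valid = mt_off > eps
    mtr[valid] = (mt_off[valid] - mt_on[valid]) / mt_off[valid] * 100.0
    return mtr


# Toy intensities: two tissue voxels and one background voxel.
example_mtr = compute_mtr([100.0, 200.0, 0.0], [60.0, 150.0, 0.0])
```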
Baseline MTR was used for classification in three ways. Firstly, examining MTR treatment effects at the lesion-level (presented within quartiles of baseline whole-lesion MTR (for WM lesions) and sub/supramedian (for GM lesions due to insufficient numbers of CGM or DGM lesions for quartiles)); secondly, comparing MTR treatment effects at the lesion segment-level (where voxels within a lesion were grouped as regions according to their baseline MTR quartile); and thirdly comparing MTR treatment effects at the voxel-level (presented within quartiles of baseline voxel MTR). Tissue-specific MTR quartile (or median) values were used for WM lesions, CGM lesions and DGM lesions.
As with baseline MTR, T1 intensity was used for classification in three ways. Firstly, examining MTR treatment effects at the lesion-level (presented within quartiles of baseline whole-lesion T1 intensity (for WM lesions) and sub/supramedian (for GM lesions due to insufficient numbers of CGM or DGM lesions for quartiles)); secondly, comparing MTR treatment effects at the lesion segment-level (where voxels within a lesion were grouped according to their baseline T1 quartile, comparing the mean MTR treatment effect at this lesion-segment level); and thirdly, comparing MTR treatment effects at the voxel-level (presented within quartiles of baseline voxel T1 intensity). Tissue-specific T1 quartile (or median) values were used for WM lesions, CGM lesions and DGM lesions.
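The quartile-based grouping used for both baseline MTR and baseline T1 can be illustrated with a short sketch. The function name `quartile_labels`, the use of linear-interpolation percentiles, and the convention that a value equal to a cut-point falls in the lower quartile are assumptions; the source states only that tissue-specific quartile (or median) values were used.

```python
import numpy as np


def quartile_labels(values, reference=None):
    """Label each value 1-4 by quartile of a reference distribution.

    `reference` would be the tissue-specific cohort-level values (for
    example, all baseline WM lesion voxel MTRs); if omitted, the values
    themselves define the cut-points.
    """
    values = np.asarray(values, dtype=float)
    reference = values if reference is None else np.asarray(reference, dtype=float)
    cuts = np.percentile(reference, [25, 50, 75])
    # right=True: a value equal to a cut-point is assigned to the lower quartile.
    return np.digitize(values, cuts, right=True) + 1


# Toy example with eight baseline values.
example_labels = quartile_labels(np.arange(1, 9))
```

For the GM analyses, which used sub-median and supra-median groups, the same idea applies with a single cut-point at the 50th percentile.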

Statistical analyses
With the exception of the sample size calculations, all analyses used linear mixed models with patient and, where appropriate, lesion random intercepts. These models regressed lesion MTR (with lesions nested within patients, for whole-lesion analyses), lesion component/segment MTR (with components/segments nested within lesions nested within patients, for component/segment analyses) or individual voxel MTR (with voxels nested within lesions nested within patients, for voxel-level analyses), adjusting for the baseline value of the outcome measure (lesional MTR, component/segment lesional MTR or voxel MTR, respectively) and the four binary minimisation factors prespecified in the trial: age (≤40/>40 years), gender, trial centre/scanner (Cambridge/Edinburgh) and EDSS (≤4.0/>4.0 score). We also included lesion-subgroup interaction terms to (i) estimate lesion-subgroup-specific treatment differences and (ii) test for variation between these differences. The interaction term, which assesses the difference in treatment effects between quartiles or bands, can be significant even when the main treatment effect is not. This is because the interaction tests are relatively highly powered here: there is a strong within-patient component, since patients can have lesions in both of the quartiles being compared, which gives increased precision to the comparison of treatment differences between quartiles. In contrast, estimation of the main treatment difference has to be entirely between-patient, since no patient can have both treated and untreated lesions. Although analyses at the lesion, component/segment or voxel level are more flexible and powerful than patient-level analyses, they are vulnerable to selection bias since patients, not lesions or components/segments or voxels, were randomised (and some patients may not have contributed to all analyses): all analyses are therefore exploratory.
We repeated each analysis in pure WM lesions, CGM lesions and DGM lesions (each using tissue-specific cohort-level quartile (or median) values). In pure WM lesions, we mitigated partial volume effects by excluding voxels from the outermost layer for voxel-level and segment-level analyses. Excluding the outermost layer in CGM and DGM lesions left insufficient numbers (34 CGM lesions and 11 DGM lesions) so was not performed. Low numbers of DGM lesions (n = 16) precluded analysing the effect of distance from the ventricles or T1-hypointensity.
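One way to write down the voxel-level model described in this section is given below. The notation is introduced here for illustration only and is an assumption, as the exact parameterisation is not stated: $y_{ijk}$ is the month-6 MTR of voxel $k$ in lesion $j$ of patient $i$, $x_{ijk}$ its baseline MTR, $\mathbf{z}_i$ the four binary minimisation factors, $T_i$ the treatment indicator (1 for bexarotene, 0 for placebo), and $g_{ijk} \in \{1, \dots, S\}$ the subgroup (quartile or band) of the voxel.

```latex
\begin{aligned}
y_{ijk} ={}& \beta_0 + \beta_1 x_{ijk} + \boldsymbol{\theta}^{\top}\mathbf{z}_i
  + \tau\,T_i
  + \sum_{s=2}^{S} \gamma_s\,\mathbb{1}[g_{ijk}=s]
  + \sum_{s=2}^{S} \delta_s\,T_i\,\mathbb{1}[g_{ijk}=s] \\
 & + u_i + v_{ij} + \varepsilon_{ijk},
\qquad
u_i \sim \mathcal{N}(0,\sigma_u^2),\quad
v_{ij} \sim \mathcal{N}(0,\sigma_v^2),\quad
\varepsilon_{ijk} \sim \mathcal{N}(0,\sigma^2)
\end{aligned}
```

Under this form, the adjusted bexarotene-placebo difference in subgroup $s$ is $\tau + \delta_s$ (with $\delta_1 = 0$), and the interaction test corresponds to the null hypothesis $\delta_2 = \dots = \delta_S = 0$. For whole-lesion analyses the voxel index and the lesion random intercept $v_{ij}$ would be dropped, leaving only the patient random intercept $u_i$.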

Sample size calculations
Complete remyelination could not increase lesional MTR beyond that of the normal-appearing white matter (NAWM). We previously found that the difference between mean lesional MTR and mean NAWM MTR is 5.92 percentage units (pu), 20 though ex-vivo data point to a 50% remyelination ceiling effect, suggesting a maximum treatment effect of 2.96 pu. Necessary sample sizes, for 5% significance and both 80% and 90% power, will therefore be presented to detect six biologically significant treatment effects: 1.3 pu (43% of the maximum), 1.4 pu (47% of the maximum), 1.5 pu (50% of the maximum), 1.6 pu (53% of the maximum), 1.7 pu (57% of the maximum) and 1.8 pu (60% of the maximum).
Sample sizes were calculated for patient-averaged lesional measures, since patients and not lesions are randomised (and clinical measures are patient-level). Calculations were for an ANCOVA analysis, which is a comparison of active versus placebo means using a t-test but adjusted for baseline, and can be estimated by incorporating the baseline-follow-up correlations. This used the placebo group means and standard deviations at follow-up, and the placebo group baseline vs follow-up Pearson correlation coefficient. Calculations were performed in Stata (version 16.1, Stata Corporation, College Station, TX, USA).
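The logic of a baseline-adjusted sample size calculation can be sketched with the common normal-approximation formula, in which the follow-up variance is deflated by a factor of (1 - rho^2), where rho is the baseline to follow-up correlation. This is not the Stata routine used in the study and will not necessarily reproduce the published numbers; the function name `ancova_n_per_arm` and the standard deviation and correlation passed in the example are hypothetical placeholders.

```python
from math import ceil
from statistics import NormalDist


def ancova_n_per_arm(delta, sd, rho, alpha=0.05, power=0.80):
    """Approximate participants per arm for a two-arm ANCOVA.

    delta: treatment difference to detect (pu)
    sd:    follow-up standard deviation of the patient-averaged measure (pu)
    rho:   Pearson correlation between baseline and follow-up
    Uses n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 * (1 - rho^2) / delta^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 * (1 - rho ** 2) / delta ** 2
    return ceil(n)


# Hypothetical inputs (not trial estimates): sd = 2.0 pu, rho = 0.7.
for effect in (1.3, 1.4, 1.5, 1.6, 1.7, 1.8):
    per_arm = ancova_n_per_arm(effect, sd=2.0, rho=0.7)
    print(f"{effect} pu: {per_arm} per arm, {2 * per_arm} in total")
```

The (1 - rho^2) term shows why a measure with a high baseline to follow-up correlation needs fewer participants for the same detectable difference.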

Correlation between voxel intensity between baseline and follow-up
To explore whether the same voxel was being captured at baseline and follow-up we performed Pearson correlation between the baseline and follow-up voxel MTR for lesional voxels in patients receiving placebo.
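A minimal sketch of this check, assuming the baseline and follow-up MTR values of the same lesional voxels are available as paired arrays; the function name `voxel_pearson` and the dropping of voxels with undefined values are assumptions for illustration.

```python
import numpy as np


def voxel_pearson(baseline_mtr, followup_mtr):
    """Pearson correlation between paired baseline and follow-up voxel MTR."""
    baseline = np.asarray(baseline_mtr, dtype=float).ravel()
    followup = np.asarray(followup_mtr, dtype=float).ravel()
    # Assumption: voxels undefined at either timepoint are excluded.
    keep = np.isfinite(baseline) & np.isfinite(followup)
    return float(np.corrcoef(baseline[keep], followup[keep])[0, 1])


# Toy example: follow-up values are a constant shift of baseline values.
r_shift = voxel_pearson([30.0, 35.0, 40.0, 45.0], [31.0, 36.0, 41.0, 46.0])
```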
Analyses were performed in Stata version 16.1 and R version 4.1. Results were considered statistically significant at the p < 0.05 level.

Data Availability Statement
The data that support the findings of this study are available from the corresponding author, upon reasonable request.

Results
All MRI scans from the 49 patients with baseline and follow-up imaging were of sufficient quality for inclusion, generating 3170 T2 hyperintense lesions (1613 lesions containing only WM (pure WM lesions), 106 lesions containing only GM (pure GM lesions) and 1451 mixed GM and WM lesions). Within the 106 pure GM lesions were 85 CGM lesions, 16 DGM and 5 cerebellar lesions. Due to insufficient numbers, cerebellar lesions were removed from all analyses of pure GM lesions (though were included in sensitivity analyses). Results are presented at the whole-lesion level (Table 1), the lesion component/segment-level (Table 2) and the lesional voxel-level (Table 3).

Lesion location
We have previously reported the significant treatment effects seen in whole DGM, CGM and brainstem lesions. 22

Grey matter versus white matter
Significant treatment effects were seen in pure whole GM lesions (adjusted bexarotene-placebo difference 1.08 pu, 95% CI 0.32 to 1.84, p = 0.008) but not in pure whole WM lesions (adjusted bexarotene-placebo difference 0.10 pu, 95% CI -0.38 to 0.68, p = 0.57). This remained so when the 5 pure GM cerebellar lesions were included in the pure GM lesions group (adjusted bexarotene-placebo difference 1.07 pu, 95% CI 0.32 to 1.83, p = 0.007). Significant treatment effects were not seen in individual voxels from pure WM lesions (adjusted bexarotene-placebo difference 0.21 pu, 95% CI -0.33 to 0.75, p = 0.445), but again were seen in voxels from pure GM lesions (adjusted bexarotene-placebo difference 1.45 pu, 95% CI 0.16 to 2.74, p = 0.037).

Proximity to brain surface
Lesions were classified depending on which concentric band they were mainly located in (as an indicator of proximity to the brain surface, see Fig. S1). In pure WM lesions, band number had no detectable influence on treatment effects (interaction term p = 0.709, Table 1). When individual voxels from pure WM lesions were classified by band, no statistically significant treatment effects were seen in any band, but the treatment effects were significantly different between the bands (interaction term p = 0.004), increasing with distance from the CSF from bands 1 to 5 (Table 3). There were also no significant treatment effects seen when whole pure CGM lesions were classified based on their predominant cortical band, and the treatment effects were not significantly different between bands (interaction term p = 0.588). However, individual voxels from these same CGM lesions showed statistically significant treatment effects in both bands, and the treatment effect was significantly greater in the innermost band (furthest from the CSF, interaction term p = 0.0003). Low lesion numbers precluded exploring the relationship between treatment effect and band number in DGM lesions or their voxels.

Baseline features
Baseline MTR (whole-lesion level, MTR-defined segment-level and voxel-level analyses)
The tissue-specific baseline whole-lesion MTR was used to categorise lesions into quartiles (for pure WM lesions) or into submedian and supramedian categories (for CGM and DGM lesions, which had insufficient numbers for quartiles). Within whole pure WM, CGM and DGM lesions, no significant treatment effects were seen in any category of baseline MTR. However, in pure WM lesions, a significant difference was seen between the treatment effects in each quartile of baseline MTR (interaction term p = 0.004), indicating an increasing effect with decreasing baseline MTR (Table 1; Fig. 1). When the same lesions were segmented based on the baseline MTR quartile of each voxel, treatment effects did not reach statistical significance in any segment, and treatment effects were not significantly different between segments within pure WM, CGM or DGM lesions (Table 2).
When individual voxels from the same pure WM lesions were stratified into baseline MTR quartiles, treatment effects increased as the MTR quartile decreased (interaction term p < 0.0001), with a statistically significant treatment effect observed in the lowest quartile (Table 3; Fig. 2). Conversely, in voxels from CGM lesions, treatment effects decreased as the MTR quartile decreased (p = 0.047), with a statistically significant treatment effect observed in the highest quartile of pure CGM lesions. The treatment effect in the small number of voxels from DGM lesions did not significantly vary between quartiles. To ensure that lesion-level, lesion segment-level and lesion voxel-level analyses examined the same tissue (to enable comparison of the methods), the whole-lesion analysis was repeated after excluding the outermost voxel layers in pure WM lesions. No significant treatment effects were seen in any quartile of baseline MTR (quartile 1: adjusted bexarotene-placebo difference (95% CI) 0.32 (-0.32 to 0.96) pu, p = 0.331; quartile 2: 0.16 (-0.47 to 0.79) pu, p = 0.630; quartile 3: 0.34 (-0.29 to 0.96) pu, p = 0.298; quartile 4: -0.13 (-0.76 to 0.51) pu, p = 0.689), with no significant difference between the treatment groups (interaction term p = 0.384).

Baseline T1 (whole-lesion level, T1-defined segment-level and voxel-level analyses)
The tissue-specific baseline whole-lesion T1 was used to categorise lesions into quartiles (for pure WM lesions) or into submedian and supramedian categories (for CGM and DGM lesions, which had insufficient numbers for quartiles). Within whole pure WM, CGM and DGM lesions, no significant treatment effects were seen in any category of baseline T1 (Table 1; Fig. 1), and no significant difference was seen between categories.
When the same lesions were segmented based on the T1 quartile of each voxel, treatment effects in all segments did not reach statistical significance, and treatment effects were not significantly different between segments within pure WM, CGM or DGM lesions (Table 2).
When individual voxels were stratified into quartiles based on their baseline T1 value, no statistically significant MTR treatment effects were seen in any quartile (pure WM, CGM or DGM lesions) and the treatment effects were not significantly different between any quartile (Table 3; Fig. 2).

Baseline size
No statistically significant treatment effect was seen in lesions based on their size in any tissue type (Table 1).

Spatial lesion-components
In pure WM lesions there was no difference in treatment effects between the inner core and outer rim (Table 2). In CGM and in DGM the outer rim and first surrounding normal-appearing GM cuffs showed statistically significant treatment effects while the cores did not.

Correlation between voxel intensity between baseline and follow-up
The correlation coefficients between baseline and follow-up lesional voxel MTR across all lesion subtypes were strong (Pearson correlation coefficients 0.737 to 0.955, see Table S1).

Sample size calculations
Calculated sample sizes using the most promising whole-lesion metrics are presented in Table 4, alongside their respective baseline-adjusted treatment effects observed during the CCMR-One trial (range 0.923-2.000 pu). For comparison, the total sample size necessary to detect the observed baseline-adjusted treatment effect averaged over all of a patient's lesions (0.144 pu) is 882 people (at 80% power) or 1180 people (at 90% power); and to detect the observed baseline-adjusted treatment effect averaged over a patient's submedian lesions (replicating the original trial primary efficacy outcome, 0.196 pu) 580 people (at 80% power) and 776 people (at 90% power) would be required.

Discussion
We have shown, for the first time in humans, that the response of demyelinated brain lesions to a remyelinating therapy varies not only between lesions within a person with multiple sclerosis, but also within individual lesions. Specifically, bexarotene had a significant treatment effect on whole GM lesions but not on whole WM lesions. Statistically significant treatment effects were detectable in WM lesion voxels with the lowest baseline MTR, but were missed when looking at the same tissue using whole-lesion or lesion-segment (regions defined by baseline T1-hypointensity and MTR) approaches. Voxel-level analyses also uncovered gradients of treatment effect in both WM and CGM lesional voxels (effects not visible at the whole-lesion level), suggesting that treatment effects were lowest near brain surfaces. Finally, larger treatment effects were seen in the rims and surrounding regions of GM lesions compared to inner cores. Bexarotene also reduced the latency of prolonged visual evoked potentials in the same cohort, congruent with the conventional interpretation that increases in MTR reflect remyelination in chronic lesions.
While histopathological studies have suggested that the potential for GM remyelination may be greater than WM, 7-9 this has not been shown directly in people with multiple sclerosis before, and studies assessing potential remyelinating treatments have not distinguished GM from WM lesions. 23-25 GM lesions are more extensive than WM lesions, particularly in progressive multiple sclerosis (22.6% of GM and 8.3% of WM in people with secondary progressive multiple sclerosis). 26 However, GM is intrinsically less myelinated than WM, hence previous difficulties detecting GM (compared with WM) lesions both histopathologically and with MRI. Using conventional MRI methods, less than 5% of GM lesions are detected, although with sequences such as double inversion recovery this increases to nearly 10%. In contrast, in WM 70% of lesions are detected. 27 Accordingly, we detected substantially more WM (n = 1613) than GM (n = 106) lesions. Despite this, we still found an overall treatment effect in whole GM but not whole WM lesions. For future remyelination trials, we recommend sequences specifically designed to detect GM lesions.
WM lesions with higher MTR values may already have remyelinated to their maximum capacity, limited by the remaining axonal scaffold and gliosis. This may explain why treatment effects decreased with increasing baseline MTR in WM lesional voxels. Conversely, axonal loss in WM lesions will limit the capacity for remyelination, and lesion T1-hypointensity is thought to reflect this better than a reduced MTR 28,29 (although both T1-hypointensity and MTR correlate with axonal counts and myelin density 16,29 ). However, treatment effects did not vary by voxel T1 intensity in any lesion type, suggesting that MTR rather than T1 intensity (measured with a nonquantitative T1-weighted rather than a quantitative T1 sequence) better assesses remyelination potential. However, this effect is small compared with the differing effects seen in GM versus WM lesions. In CGM lesional voxels the opposite was seen: treatment effects increased with increasing baseline MTR. The cause of this is unclear, highlighting the need to better understand how different cellular substrates affect MTR in WM and GM lesions. For example, while the strongest correlate of MTR in WM lesions and CGM lesions is myelin density, astrocyte density (which may affect remyelination 30 ) and macrophage density contribute significantly less to CGM lesion MTR than they do to WM lesion MTR. 19
WM lesions close to the ventricles have a lower MTR, 12,13 a greater chance of becoming T1-hypointense 11 and longer T1 relaxation times than deep WM lesions (consistent with less remyelination). 14 While proximity to brain surfaces did not influence treatment effects in whole WM or GM lesions, voxel-level analyses revealed significantly smaller treatment effects in WM lesional voxels near the ventricles, which grew with increasing periventricular distance, and significantly smaller treatment effects in the outermost CGM lesion voxels (compared to the innermost voxels). In GM, "surface-in" gradients of demyelination, neuronal and astrocyte loss are seen, and conversely greater glia limitans and microglial activation is seen closer to the surface of the brain, 31,32 though the pathological substrate of the MTR gradient in WM remains unknown. The surface-in gradients of remyelination failure shown here may be driven by the same process that underlies cortical and periventricular gradients in demyelination and neuro-axonal loss (with most evidence invoking a CSF-mediated process secondary to meningeal inflammation 33 ).
A significant number of WM lesions remain chronically active in their periphery rather than their core, 14 where remyelination is known to be more extensive. 34,35 Given that we did not see an effect in whole WM lesions overall, it is perhaps unsurprising that we did not see a major difference in the core compared with the periphery of the same lesions. However, in GM lesions we additionally saw a greater treatment effect in lesion rims compared with their cores, consistent with the histopathological finding of greater remyelination in the periphery of GM lesions. 7
Voxel-level analyses identified statistically significant treatment effects which were missed with whole-lesion or lesion-segment (based on MTR and T1-hypointensity) level analyses. This speaks to the heterogeneity of pathological change within lesions, including that repeated bouts of demyelination and (partial) remyelination may occur anywhere within a single lesion. 36 To identify treatment effects within WM lesions, we recommend voxel-level approaches in voxels with the lowest baseline MTR (though overall treatment effects were greater and considerably easier to measure in GM lesions).
Table 4. Sample size calculations for future remyelination trials using whole-lesion metrics (patient-level). Note 1: these values differ slightly from the adjusted bexarotene-placebo differences presented in Table 1 because the former are adjusted only for baseline value, while the latter are additionally adjusted for the four binary minimisation factors prespecified in the trial: age (≤40/>40 years), gender, trial centre/scanner (Cambridge/Edinburgh) and EDSS (≤4.0/>4.0 score).
We estimated sample sizes for trials based on patient-averaged MTR measures in GM and brainstem lesions. Based on the effect we observed, our study cohort was on the border of being sufficient to detect a treatment effect (80% power), and increasing the effect size from 1.3 pu to 1.8 pu (still well within a biologically plausible range for remyelination) reduces the total cohort size needed from 56 to 30 people. However, the optimal outcome measure requires consideration of multiple factors: the magnitude of the treatment difference a trial is being powered to detect must be both achievable and biologically meaningful (Table 4 representing 43%-60% of the maximum possible treatment effects suggested by MRI and ex-vivo data) 20 ; the outcome measure must be measurable in all participants (for example requiring GM lesions as an inclusion criterion) to avoid some participants not contributing to the outcome; and finally the sample size must be practical. Similarly, it is worth noting that this trial was 6 months long, and we do not know if bexarotene had sufficient time for its maximal effect to be observed (VEPs continued to improve 28 months after stopping bexarotene). 37 After an acute inflammatory episode remyelination continues for at least a year, 38 so sensitivity to treatment effects may be meaningfully enhanced by extending trials by 3-6 months (though bexarotene's safety profile would have likely precluded this). Some other limitations are worth noting. First is the size of the study itself.
It was powered to detect a 1.16 pu patient-averaged treatment effect across all lesions with a sub-median baseline MTR, but we observed a baseline-adjusted effect of 0.196 pu in such submedian lesions and a 1.474 pu effect in pure GM lesions. To determine if the increase in MTR in whole submedian WM lesions we observed is significant at 80% power, we would have needed a total sample of 580 people, but only 44 people to show an effect in pure GM lesions (and even smaller samples if using DGM or brainstem lesions). Second, as already noted, is the specificity of MTR for myelin. As axonal regeneration is essentially not seen in the adult central nervous system, increases in MTR in chronic lesions can reasonably be interpreted as representing remyelination. However, other MRI techniques such as myelin water fraction and quantitative magnetisation transfer measures may increase specificity for myelin. 39,40 Third, FLAIR detects less than 5% of GM lesions found with histopathology, 27 so those detected here may not be generalisable to all GM lesions. Fourth, corroboration in different remyelination trial datasets is needed to examine whether different mechanisms demonstrate such marked changes in GM lesions. Fifth, non-isotropic T2 and FLAIR were used for lesion identification and contouring, increasing partial volume effects at lesion surfaces; however, this would affect the bexarotene and placebo arms equally, so could not account for between-arm treatment effects. We minimised this by excluding the outermost WM lesion voxels yet still found biologically plausible results. Sixth, although GM lesions appear more sensitive, they were not detected in 28% of patients under study and, when present, each patient contributed small numbers of GM lesions (Table 4). If GM lesions are to be used as a primary outcome measure, MRI sequences more sensitive to them, and the presence of GM lesions as an inclusion requirement, will be necessary.
Seventh, sequences to reliably distinguish paramagnetic rim lesions were not collected, precluding subanalysis of within-lesion spatial components in chronic active lesions. Eighth, we cannot be certain that the same tissue was captured within a voxel between the baseline and follow-up scans, though we minimised this by using ~1 mm isotropic T1-weighted and MTR scans in halfway space, and we found high correlation between baseline and follow-up MTR intensity within individual lesional voxels (Supplementary Material), suggesting this was unlikely to have substantially influenced our results. Ninth, some analyses contained a statistically significant interaction term but no statistically significant treatment effect in any quartile or band being examined (e.g. within WM lesional voxels from any band defined by distance to the CSF, Fig. 2). This suggests that the underlying factor (in this case distance to CSF) does influence treatment effects on remyelination, though no significant effect is seen individually within any band or quartile. Finally, we did not adjust for multiple comparisons since we are investigating a number of different hypotheses, and in such contexts correction can be inappropriate. 41,42 We explore these issues further in the Supplementary Materials. Nevertheless, there is a danger of spurious significant results, and p-values close to 0.05 should be interpreted with caution and regarded as hypothesis-generating, for examination in future studies.
In conclusion, this study confirms that different lesions have different capacities for remyelination in response to bexarotene. Of the features assessed, GM lesions showed the greatest potential to remyelinate. Our results also show that in early phase remyelination trials, assessing lesions and voxels with greater potential to demonstrate remyelination, rather than many lesions with a lower collective potential for repair, significantly increases sensitivity.

Supporting Information
Additional supporting information may be found online in the Supporting Information section at the end of the article.
Data S1. Image Processing Pipeline, detailing the steps in lesion segmentation and tissue segmentation.
Figure S1. Axial slice from 3D T1-weighted image illustrating (A) the 10 white matter and deep grey matter (WMDGM) bands (with bands 1 and 10 excluded from analyses to mitigate partial volume effects); and (B) the 2 cortical grey matter (CGM) bands. Lesions are highlighted in green.
Table S1. Pearson correlation of lesional voxel MTR between baseline and follow-up.
A note on correction for multiple comparisons.