Predicting functional impairments with lesion‐derived disconnectome mapping: Validation in stroke patients with motor deficits

Focal structural damage to white matter tracts can result in functional deficits in stroke patients. Traditional voxel‐based lesion‐symptom mapping is commonly used to localize brain structures linked to neurological deficits. Emerging evidence suggests that the impact of structural focal damage may extend beyond immediate lesion sites. In this study, we present a disconnectome mapping approach based on support vector regression (SVR) to identify brain structures and white matter pathways associated with functional deficits in stroke patients. For clinical validation, we utilized imaging data from 340 stroke patients exhibiting motor deficits. A disconnectome map was initially derived from lesions for each patient. Bootstrap sampling was then employed to balance the sample size between a minority group of patients exhibiting right or left motor deficits and those without deficits. Subsequently, SVR analysis was used to identify voxels associated with motor deficits (p < .005). Our disconnectome‐based analysis significantly outperformed alternative lesion‐symptom approaches in identifying major white matter pathways within the corticospinal tracts associated with upper–lower limb motor deficits. Bootstrapping significantly increased the sensitivity (80%–87%) for identifying patients with motor deficits, with minimum lesion sizes of 32 and 235 mm³ for the right and left motor deficits, respectively. Overall, the lesion‐based methods achieved lower sensitivities compared with those based on disconnection maps. The primary contribution of our approach lies in introducing a bootstrapped disconnectome‐based mapping approach to identify lesion‐derived white matter disconnections associated with functional deficits, particularly efficient in handling imbalanced data.

Abbreviations: AALCAT, automated anatomical labelling cortical atlas; AUC, area under the curve; CST, corticospinal tract; DSM, structural disconnection-symptom mapping; FWER, family-wise error rate; LSM, lesion-symptom mapping; MD_L, minority class with left motor deficit; MD_R, minority class with right motor deficit; MLSM, multivariate lesion-symptom mapping; pSDM, probabilistic structural disconnection map; ROC, receiver operating characteristics; SCCAN, sparse canonical correlation analysis; SCCAN-DSM_b, bootstrapped SCCAN-DSM; SVR, support vector regression; SVR-DSM_b, bootstrapped SVR-DSM; VDSM, voxel-based disconnection-symptom mapping; VLSM, voxel-based lesion-symptom mapping.
KEYWORDS: bootstrap sampling, disconnectome mapping, focal lesion, lesion-symptom mapping, motor deficit, stroke, support vector regression

| INTRODUCTION
Stroke is a leading cause of disability and cognitive dysfunction worldwide (Feigin et al., 2017). Lesions resulting from strokes can cause structural damage to white matter fibres, disrupting both structural and functional brain networks and leading to neurological deficits (Broca, 1861; Damasio, 1989). Over the past decades, various approaches have emerged to explore the connection between lesion sites and neurological symptoms (Karnath et al., 2018). One pivotal technique is voxel-based lesion-symptom mapping (VLSM), first introduced by Bates et al. (2003). VLSM serves as the foundational method for lesion mapping across diverse cognitive domains (Mock et al., 2022; Moore & Demeyere, 2022; Weaver et al., 2021). It offers advantages over overlap-subtraction approaches by generating statistical maps that support inferences about brain-behaviour relationships in stroke patients from lesioned voxels and functional scores (Arnoux et al., 2018; Bates et al., 2003; Karnath et al., 2018).
Despite its potential, studies show that VLSM explains only a modest percentage (10%-35%) of the variance in cognitive and motor performance (Arnoux et al., 2018; Ouin et al., 2022; Puy et al., 2018; Salvalaggio et al., 2020). Additionally, the statistical power of VLSM is strongly affected by heterogeneity among patients, including differences in lesion location, lesion volume, neurological symptoms and lesion frequency (Godefroy et al., 1998). To address these limitations, researchers have developed several multivariate lesion-symptom mapping (MLSM) approaches that assess the relationship between lesion sites and symptoms more comprehensively (Smith et al., 2013; Zhang et al., 2014). Among MLSM methods, lesion-symptom mapping using multivariate support vector regression (SVR-LSM) and sparse canonical correlation analysis (SCCAN-LSM) have demonstrated high efficiency in identifying brain structures associated with functional deficits in stroke patients (Pustina et al., 2018; Zhang et al., 2014). SVR-LSM projects the input data into a high-dimensional feature space through a nonlinear transform, enabling inference about the relationship between deficits and lesion sites beyond isolated voxels (Ivanova et al., 2021). SCCAN-LSM seeks linear transformations that maximize the correlation between functional scores and voxel values in a low-dimensional space (Hotelling, 1936; Kuhn & Johnson, 2013; Stevens, 2009). Overall, a main drawback of both univariate and multivariate LSM methods is that they are inefficient at handling small lesions located in strategic areas, which can cause severe cognitive deficits because of their large impact on the structural connectivity between brain regions (Boes et al., 2015; Puy et al., 2018; Siegel et al., 2016; Weaver et al., 2021).
In recent years, structural disconnection-symptom mapping (DSM) has emerged as a valuable approach for examining the impact of focal stroke lesions on large-scale brain networks associated with executive, cognitive and sensory-motor deficits. Brain disconnectome mapping investigates the disruptions of structural white matter pathways induced by lesions (Foulon et al., 2018; Salvalaggio et al., 2020; Siegel et al., 2016). Foulon et al. (2018) introduced a method that estimates voxelwise probability maps of lesion-induced white matter disconnections using diffusion imaging data from healthy subjects. The disconnection patterns are then statistically tested for their association with functional deficits through regression analysis (Darby et al., 2017, 2019; Fasano et al., 2017). Based on this approach, several studies have shown that structural disconnection can account for 16%-58% of the variance in motor, language, spatial attention and spatial memory performance, with higher predictive power for motor deficits (Corbetta et al., 2015; Salvalaggio et al., 2020; Siegel et al., 2016). Compared with lesion-symptom mapping, voxel-based DSM (VDSM) offers distinct advantages: it identifies disrupted white matter pathways and helps elucidate the neural mechanisms underlying behavioural deficits.
DSM studies commonly employ mass-univariate techniques and therefore face a limitation analogous to that of univariate LSM methods: the statistical power of both LSM and DSM depends strongly on the variance of the lesion or disconnection scores at each voxel. In logistic regression analyses, power is optimal when approximately half of the patient population has a lesion in a given voxel (Kimberg et al., 2007). This scenario is rare in stroke cohorts, where typically only a small subset of patients has lesions associated with functional deficits of varying degree (Arnoux et al., 2018; Kimberg et al., 2007). Moreover, while previous lesion-symptom mapping studies have focused predominantly on lesion frequency, less emphasis has been placed on the behavioural effect size (Gläscher et al., 2009; Shahid et al., 2017). This oversight can substantially affect regression outcomes, particularly with imbalanced stroke data in which only a minority subgroup exhibits deficits.
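The dependence of power on lesion frequency can be made concrete with a small illustration that is not taken from the study: for a binary lesion (or dichotomized disconnection) indicator at a voxel, the variance across patients is p(1 − p), where p is the fraction of affected patients. The minimal NumPy sketch below, using a hypothetical cohort, shows that this variance peaks at p = 0.5 and collapses for rarely affected voxels.

```python
import numpy as np


def voxel_indicator_variance(lesion_matrix):
    """Population variance of a 0/1 lesion indicator at each voxel.

    lesion_matrix: (n_patients, n_voxels) array of 0/1 values.
    For a binary variable this equals p * (1 - p), where p is the
    fraction of patients affected at that voxel.
    """
    p = np.asarray(lesion_matrix, dtype=float).mean(axis=0)
    return p * (1.0 - p)


# Hypothetical cohort: 340 patients, three voxels affected in
# 170 (50%), 34 (10%) and 5 (about 1.5%) patients, respectively.
n_patients = 340
lesions = np.zeros((n_patients, 3), dtype=int)
lesions[:170, 0] = 1
lesions[:34, 1] = 1
lesions[:5, 2] = 1

variances = voxel_indicator_variance(lesions)
print(variances)  # about 0.25, 0.09 and 0.0145
```

A voxel affected in only 5 of 340 patients carries roughly 17 times less variance than one affected in half of them, which is why rare but strategic lesions are hard to detect with voxelwise statistics.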
The main objective of this study was to develop a DSM approach, termed SVR-DSM, based on multivariate support vector regression (SVR) (Zhang et al., 2014) to identify brain structures and white matter pathways associated with functional deficits in stroke patients.To demonstrate the proof of concept, we utilized imaging data from stroke patients exhibiting motor deficits, primarily focusing on the well-known anatomy of hemiparesis.In contrast to conventional mass-univariate lesion-symptom mapping, which typically generates statistical t-maps composed of individual lesioned voxels associated with functional (herein motor) deficits, our approach involved lesion-derived structural disconnection maps obtained from the tractography results of healthy subjects.To address the challenge posed by imbalanced datasets in stroke patients with focal structural lesions, where the majority showed no motor deficits, we employed bootstrap bagging to balance the sample size of stroke patients displaying motor deficits against those without deficits.For the multivariate regression analysis, we applied multivariate SVR to improve the outcomes of the regression analysis by modelling complex and non-linear relationships between lesion locations and motor deficits.Finally, we developed a clinical validity tool to assess the effectiveness, specificity, and generalizability of the novel approach in identifying brain structures associated with motor deficits in stroke patients.

| Subject recruitment
Imaging data from 340 stroke patients (207 males and 133 females; mean age 63.9 ± 10.5 years, range 40-81 years) were drawn from the Groupe de Réflexion pour l'Evaluation Cognitive Vasculaire (GRECogVASC) cohort (Godefroy et al., 2012). The study was conducted in accordance with the principles of the Declaration of Helsinki and approved by the regional investigational review board (Comité de Protection de Personnes Nord-Ouest II, NCT01339195). All participants gave written informed consent to participate in the study. The study included patients hospitalized for acute (<30 days) cerebral infarct or haemorrhage with initial positive imaging, a reliable informant, and no previously diagnosed condition affecting cognition other than previous stroke (Barbay et al., 2018). Aphasia, hemineglect and prior stroke were not exclusion criteria (Puy et al., 2018). Further details on the clinical characteristics of the cohort can be found in Barbay et al. (2018). Clinical, neuropsychological and magnetic resonance imaging (MRI) examinations were conducted 6 months post-injury; the demographic characteristics of the stroke cohort are reported in Tables 1 and S1.
Each patient underwent the full GRECogVASC neuropsychological battery, using the French adaptation of the Harmonization Standards battery (Barbay et al., 2018; Puy et al., 2018).
In this study, we used only the limb motor score; facial paresis was not included because it may result from lesions outside the central motor system. The motor score corresponds to 10 minus the sum of the upper limb item (arm weakness, scored from 0 [no deficit] to 4 [no movement]) and the lower limb item (leg weakness, scored from 0 [no deficit] to 4 [no movement]) of the NIHSS, so that it ranges from 10 (no impairment) to 2 (major impairment) (Arnoux et al., 2018).
Of the 340 patients, 64 showed motor impairment (32 with a left motor deficit [MD_L], 29 with a right motor deficit [MD_R] and 3 with bilateral deficits). A motor score of 9 or less was considered indicative of motor impairment. For method development, two minority classes (MD_L [n = 35] and MD_R [n = 32]) and one majority class (patients with no motor deficit, n = 276) were considered; the three patients with bilateral motor deficits were included in both minority classes. The characteristics of the minority and majority classes are presented in Table 2.
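The scoring rule and class assignment described above can be sketched as follows. This is an illustrative sketch, not the authors' code; it assumes that a separate score is computed for each body side from that side's NIHSS arm and leg items, and the function names are hypothetical.

```python
import numpy as np


def motor_score(arm_item, leg_item):
    """Limb motor score = 10 - (NIHSS arm item + NIHSS leg item).

    Each item ranges from 0 (no deficit) to 4 (no movement), so the score
    ranges from 10 (no impairment) to 2 (major impairment).
    """
    arm = np.asarray(arm_item)
    leg = np.asarray(leg_item)
    if np.any((arm < 0) | (arm > 4) | (leg < 0) | (leg > 4)):
        raise ValueError("NIHSS arm/leg items must lie in 0..4")
    return 10 - (arm + leg)


def assign_classes(score_left, score_right, cutoff=9):
    """Boolean masks for the minority classes MD_L, MD_R and the majority class.

    A score <= cutoff on a side marks a deficit on that side; patients with
    bilateral deficits end up in both minority classes.
    """
    md_l = np.asarray(score_left) <= cutoff
    md_r = np.asarray(score_right) <= cutoff
    majority = ~(md_l | md_r)
    return md_l, md_r, majority


# Four hypothetical patients: no deficit, left deficit, right deficit, bilateral.
left = motor_score([0, 1, 0, 2], [0, 0, 0, 1])
right = motor_score([0, 0, 1, 1], [0, 0, 0, 0])
md_l, md_r, majority = assign_classes(left, right)
```

Counting the bilateral patient in both minority classes mirrors how the 3 bilateral cases raise MD_L to 35 and MD_R to 32.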
Using a previously validated procedure (Arnoux et al., 2018), a lesion mask was obtained for each patient by manually segmenting lesions on 3D T1-weighted MRI data using MRIcron (http://www.mccauslandcenter.sc.edu/mricro/mricron/), following the Harmonization Standards and the STandards for ReportIng Vascular changes on nEuroimaging (STRIVE) criteria (Alexander et al., 2010; Hachinski et al., 2006; Wardlaw et al., 2013). FLAIR, T2 and T2*-weighted images were visually inspected to accurately delineate lesions on the 3D T1 images (see Supporting information for more details) (Godefroy et al., 1998). The T1-weighted data were then normalized to the MNI152 template using the standard normalization procedure in SPM12 (Ashburner & Friston, 1999), which combines linear (affine) and non-linear (warping) transformations. The lesion segmentations were then converted to binary lesion masks for further analysis (Figure 1). The volume of each lesion was computed in template space, and the lesioned brain structures were identified using the AALCAT (NiiStat) template (Rorden et al., 2007). As ground truth for performance evaluation, two binary masks were generated for the left and right central motor systems, including the corticospinal tract (CST) (Catani & Thiebaut de Schotten, 2008).
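Computing lesion volume in template space amounts to counting lesion voxels and multiplying by the volume of one voxel. A minimal NumPy sketch, assuming the normalized mask is already loaded as an array and the voxel dimensions (here a hypothetical 2 mm isotropic grid) are known from the image header:

```python
import numpy as np


def lesion_volume_mm3(mask, voxel_size_mm):
    """Volume of a binary lesion mask in template space, in mm^3.

    mask: 3D array, nonzero inside the lesion.
    voxel_size_mm: (dx, dy, dz) voxel dimensions in millimetres.
    """
    n_voxels = np.count_nonzero(mask)
    return n_voxels * float(np.prod(voxel_size_mm))


# Hypothetical 2 mm isotropic MNI-sized grid with a 3 x 3 x 3 voxel lesion.
mask = np.zeros((91, 109, 91), dtype=np.uint8)
mask[40:43, 50:53, 40:43] = 1
volume = lesion_volume_mm3(mask, (2.0, 2.0, 2.0))
print(volume)  # 27 voxels * 8 mm^3 = 216.0
```

Dividing by 1000 converts the result to cm³, the unit used for the median lesion volumes in Table 2.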

| Structural DSM based on bootstrap bagging
The proposed method, SVR-DSM, integrates brain disconnectome mapping with bootstrapped multivariate SVR analysis. As a proof of concept, we used imaging data from stroke patients with upper-lower limb motor deficits, focusing on the well-known anatomy of hemiparesis, including the CST in the central motor system. The processing pipeline comprises three main steps, illustrated in Figure 1.

| Step 1-Structural disconnectome mapping
For each patient, a probabilistic structural disconnection map (pSDM) was derived using the virtual tractography method (Kuceyeski et al., 2013; Wodeyar et al., 2020). First, whole-brain deterministic tractography was performed on high-resolution DWI data from 403 healthy controls (mean age 62.87 ± 13.47 years) in the CamCAN dataset (Shafto et al., 2014; Taylor et al., 2017). The DWI data were preprocessed as detailed in the Supporting information and in Khalilian et al. (2024). Then, for each patient, the white matter streamlines of the healthy subjects passing through the patient's lesion were selected to generate a patient-specific pSDM.
In the pSDM, each voxel is assigned a disconnection probability ranging from 0 to 1, based on the number of healthy subjects showing a disconnection in that voxel (Thiebaut de Schotten et al., 2011, 2015). A higher probability at a voxel indicates a greater likelihood that the corresponding white matter pathway is disconnected. This approach accounts for individual differences in lesion location and extent, providing a comprehensive representation of the white matter disconnection associated with each patient's lesion (see Supporting information for more details).
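Assuming that, for one patient, each healthy subject's lesion-passing streamlines have already been rasterized into a binary voxel map (the tractography and streamline-selection steps are not shown), the pSDM reduces to a voxelwise average across subjects. A hypothetical NumPy sketch:

```python
import numpy as np


def probabilistic_disconnection_map(subject_disconnections):
    """Voxelwise disconnection probability across healthy subjects.

    subject_disconnections: (n_subjects, *volume_shape) boolean array in which
    entry [s, v] is True when streamlines of healthy subject s that pass
    through the patient's lesion also traverse voxel v.
    Returns values in [0, 1]: the fraction of subjects disconnected at each voxel.
    """
    d = np.asarray(subject_disconnections, dtype=bool)
    return d.mean(axis=0)


# Hypothetical example: 4 healthy subjects, 3 voxels.
per_subject = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
])
psdm = probabilistic_disconnection_map(per_subject)
print(psdm)  # fractions 1.0, 0.5 and 0.0
```

With 403 healthy subjects, each voxel value is simply the count of disconnected subjects divided by 403.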

| Step 2-Bootstrap bagging
Given the highly imbalanced dataset, in which most patients with lesions showed no motor deficit, we used bootstrap bagging to balance the sample size of patients with motor deficits (the minority class) against those without deficits (the majority class) (Lee et al., 2020; Nikolaidis et al., 2020). To enhance statistical power and improve the generalization performance of the regression analysis, 100 bags were generated for each minority class, following the recommendations of Ngo et al. (2022) (Figure 1). To balance the samples in each bag, N patients were randomly selected from the majority class (276 patients with no motor deficit) and added to the patients of the minority class, with N equal to the size of that class: 35 for MD_L and 32 for MD_R.
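A minimal sketch of the bag construction, assuming patient indices for each class are available. Each bag keeps every minority patient and adds an equal number of majority patients; following the "with replacement" wording of Figure 1, the majority draw is done with replacement here, which is our assumption about where replacement applies. The function name is hypothetical.

```python
import numpy as np


def make_balanced_bags(minority_idx, majority_idx, n_bags=100, seed=0):
    """Build balanced bags of patient indices.

    Each bag holds all minority-class patients followed by an equal-sized
    random draw (with replacement) from the majority class.
    """
    rng = np.random.default_rng(seed)
    minority_idx = np.asarray(minority_idx)
    majority_idx = np.asarray(majority_idx)
    n_minority = minority_idx.size
    bags = []
    for _ in range(n_bags):
        drawn = rng.choice(majority_idx, size=n_minority, replace=True)
        bags.append(np.concatenate([minority_idx, drawn]))
    return bags


# Hypothetical indexing: 35 MD_L patients (0-34) and 276 patients without deficit (35-310).
md_l_idx = np.arange(35)
no_deficit_idx = np.arange(35, 311)
bags = make_balanced_bags(md_l_idx, no_deficit_idx, n_bags=100)
```

Each resulting bag is balanced 35 versus 35, and the regression in Step 3 would be run once per bag.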

| Step 3-Multivariate SVR analysis
FIGURE 1 Processing pipeline for structural disconnection-symptom mapping using bootstrapped multivariate support vector regression. (a) A binary lesion mask was generated for each patient from T1-weighted magnetic resonance images; (b) a probabilistic structural disconnection map (pSDM) was constructed for each patient using diffusion imaging data from 403 healthy ageing individuals; (c) bootstrap aggregation (bagging) was used to generate bags of lesions sampled with replacement, balancing the numbers of patients with and without deficits; (d) the support vector regression-disconnection-symptom mapping (SVR-DSM) method was used to obtain a β map for each bag through regression analysis and permutation testing; (e) the β maps were compared using a one-sample t-test to generate a statistical map (family-wise error rate [FWER] corrected) across all bags.
Finally, we used multivariate SVR analysis to model complex and possibly non-linear relationships between lesion locations and motor deficits. For DSM using SVR (SVR-DSM), a parametric β map was obtained for each bag through regression analysis and permutation testing for the left and right motor deficits. Permutation testing randomly shuffled the motor scores in each bag to generate a series of pseudo-β maps. At each voxel, this allowed testing the null hypothesis that the lesion-motor score association was no different from that obtained with random data, by counting the permutations whose pseudo-β values were greater than the β value obtained from the non-permuted data. SVR-DSM was applied to each bag with both linear (SVR-DSM_b^LN) and logistic (SVR-DSM_b^LG) regression, using continuous and categorical (deficit/no-deficit) motor scores, respectively, together with dichotomized structural disconnection maps. Before regression, lesion volumes were regressed out of the motor scores and structural disconnection volumes were regressed out of the dichotomized disconnection maps; significance was assessed at p < .005 with 1000 permutations. To dichotomize the pSDMs, a probability threshold of 10% was applied, meaning that a voxel was retained only if the lesion-derived disconnection was observed in at least 10% of the 403 healthy subjects (about 40 subjects). This threshold was chosen to balance sensitivity against the false positive rate (FPR) by removing voxels with weak disconnection probabilities across subjects; lower and higher thresholds resulted in poor specificity and poor sensitivity, respectively, because they overestimated and underestimated tract size for DSM (Wawrzyniak et al., 2022).
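The preprocessing and permutation logic of this step can be sketched as below. This is an illustration under loud assumptions, not the SVR-DSM implementation: the SVR β map is replaced by a simple voxelwise least-squares slope (`voxelwise_beta`, a hypothetical stand-in) so that the example runs with NumPy alone, and scores are oriented so that larger values mean more impairment, making "pseudo-β greater than observed β" the relevant one-sided count. The +1 terms in the p-value are a common convention, not something stated in the text.

```python
import numpy as np


def dichotomize(psdm, threshold=0.10):
    """Binarize disconnection probabilities: 1 where p >= threshold, else 0."""
    return (np.asarray(psdm, dtype=float) >= threshold).astype(float)


def residualize(y, covariate):
    """Remove the linear effect of a covariate (plus intercept) from y.

    y: (n,) scores or (n, v) voxel values; covariate: (n,), e.g. lesion volume.
    """
    y = np.asarray(y, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    design = np.column_stack([np.ones(covariate.size), covariate])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


def voxelwise_beta(X, y):
    """Stand-in for the SVR beta map: per-voxel least-squares slope of y on X."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = (Xc ** 2).sum(axis=0)
    num = Xc.T @ yc
    return np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)


def permutation_pvalues(X, y, n_perm=1000, seed=0):
    """One-sided permutation p-values: share of shuffles whose pseudo-beta
    is at least as large as the observed beta at each voxel."""
    rng = np.random.default_rng(seed)
    observed = voxelwise_beta(X, y)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        pseudo = voxelwise_beta(X, rng.permutation(y))
        exceed += pseudo >= observed
    return (exceed + 1.0) / (n_perm + 1.0)


# Hypothetical bag of 70 patients and 2 voxels; voxel 0 is disconnected
# exactly in the 35 impaired patients, voxel 1 carries random probabilities.
rng = np.random.default_rng(1)
psdm = np.zeros((70, 2))
psdm[:35, 0] = 0.6
psdm[:, 1] = rng.random(70)
X = dichotomize(psdm)
impairment = 3.0 * X[:, 0] + 0.1 * rng.standard_normal(70)
y = residualize(impairment, rng.random(70))  # hypothetical lesion volumes
p_values = permutation_pvalues(X, y, n_perm=1000)
```

The disconnection maps themselves could be residualized the same way (`residualize(X, disconnection_volume)`), as described in the text; the demo keeps them binary for readability.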
We then compared the β maps obtained from all bags using a one-sample t-test (SPM's second-level routine, p < .005) for group analysis across bags. A cluster analysis with family-wise error rate (FWER) correction was subsequently performed to adjust the p values of individual cluster tests to a significance level of p < .005. Numerical calculations were performed using the computational resources of the MATRICS platform at the University of Picardie Jules Verne, Amiens, France.
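The group analysis across bags is, at each voxel, a one-sample t-test of the per-bag β values against zero. The study used SPM's second-level routine; the NumPy sketch below only illustrates the statistic itself. Converting t to p would use a t distribution with n_bags − 1 degrees of freedom (for instance `scipy.stats.t.sf`), and the cluster-level FWER step is not shown.

```python
import numpy as np


def one_sample_t_map(beta_maps):
    """Voxelwise one-sample t statistic of the mean beta across bags.

    beta_maps: (n_bags, n_voxels) array, one beta map per bag.
    Returns t = mean / (sd / sqrt(n_bags)); voxels with zero spread get t = 0.
    """
    b = np.asarray(beta_maps, dtype=float)
    n_bags = b.shape[0]
    se = b.std(axis=0, ddof=1) / np.sqrt(n_bags)
    return np.divide(b.mean(axis=0), se, out=np.zeros(b.shape[1]), where=se > 0)


# Hypothetical 100 bags x 3 voxels: a consistent effect, a null voxel, a constant voxel.
rng = np.random.default_rng(0)
betas = np.column_stack([
    0.5 + 0.1 * rng.standard_normal(100),
    0.1 * rng.standard_normal(100),
    np.zeros(100),
])
t_map = one_sample_t_map(betas)
```

Voxels whose β is consistently non-zero across bags obtain large t values, whereas voxels whose sign fluctuates from bag to bag are suppressed.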

| Conventional univariate and multivariate LSM methods
We compared our method with commonly used lesion-symptom mapping techniques, namely VLSM, SVR-LSM and SCCAN-LSM, applied to the patients' lesion masks. VLSM is a mass-univariate approach: at each voxel, the distribution of functional scores in patients with a lesion is compared with that in patients without a lesion, yielding t statistics (Figure S2) that reflect the degree to which the presence of a lesion in that voxel is associated with functional deficits. In contrast, multivariate LSM approaches, including SVR-LSM and SCCAN-LSM, aim to capture the complex interactions that contribute to the observed functional outcomes. SVR-LSM (Zhang et al., 2014) uses inter-voxel correlations to infer the relationship between lesion sites and deficits through SVR analysis. SCCAN-LSM, in turn, assesses associations between lesion locations and functional deficits by identifying sparse linear transformations that maximize the correlation between functional scores and voxel values in a low-dimensional space (Pustina et al., 2018) (see Supporting information for more details).
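The mass-univariate VLSM comparison can be sketched as a pooled-variance two-sample t-test at every voxel. This is a generic illustration of the technique, not the software used in the study; it skips voxels lesioned in fewer than five patients, matching the minimum-overlap restriction applied in this study, and the function name is hypothetical.

```python
import numpy as np


def vlsm_t_map(lesions, scores, min_lesioned=5):
    """Mass-univariate VLSM: pooled-variance two-sample t test at each voxel.

    lesions: (n_patients, n_voxels) 0/1 array; scores: (n_patients,) motor scores.
    Positive t means patients lesioned at the voxel score lower (worse).
    Voxels lesioned in fewer than `min_lesioned` patients are left as NaN.
    """
    L = np.asarray(lesions, dtype=bool)
    s = np.asarray(scores, dtype=float)
    t = np.full(L.shape[1], np.nan)
    for v in range(L.shape[1]):
        hit, spared = s[L[:, v]], s[~L[:, v]]
        n1, n2 = spared.size, hit.size
        if n2 < min_lesioned or n1 < 2:
            continue
        pooled = ((n1 - 1) * spared.var(ddof=1) + (n2 - 1) * hit.var(ddof=1)) / (n1 + n2 - 2)
        se = np.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
        if se > 0:
            t[v] = (spared.mean() - hit.mean()) / se
    return t


# Hypothetical data: 10 patients, voxel 0 lesioned in the 5 impaired patients,
# voxel 1 lesioned in only 2 patients (below the minimum, so not tested).
scores = np.array([10, 10, 10, 10, 10, 5, 6, 7, 5, 6])
lesions = np.zeros((10, 2), dtype=int)
lesions[5:, 0] = 1
lesions[:2, 1] = 1
t_values = vlsm_t_map(lesions, scores)
```

Because each voxel is tested in isolation, this map cannot credit a voxel for the network-level effects that the disconnection-based methods are designed to capture.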
We first applied VLSM, SVR-LSM and SCCAN-LSM to the lesion masks and motor scores of all 340 patients. We then adapted these approaches for DSM (VDSM, SVR-DSM and SCCAN-DSM) to examine the associations between binarized structural disconnection maps and motor scores. Mirroring SVR-DSM_b^LN and SVR-DSM_b^LG, we also applied the SCCAN-DSM method to each bag, using both linear (SCCAN-DSM_b^LN) and logistic (SCCAN-DSM_b^LG) regression models to assess the relationship between continuous and categorical (deficit/no-deficit) motor scores, respectively, and dichotomized structural disconnection maps.
In the regression analyses, lesion volume was included as a covariate for both LSM and DSM. All analyses were restricted to voxels affected in at least five patients, following Arnoux et al. (2018). A conservative significance threshold of p < .005 (1000 permutations) was applied to limit false positives (Ivanova et al., 2021). For SCCAN-DSM_b^LN and SCCAN-DSM_b^LG, the final correlation maps from the bags were entered into a one-sample t-test (SPM's second-level routine, p < .005) for group analysis across bags. To control for false positives, we performed a cluster analysis on the statistical maps obtained with all LSM and DSM methods, followed by FWER correction of the p values of individual cluster tests (p < .005). Finally, based on the statistical map obtained with each approach, we used the automated anatomical labelling cortical atlas (AALCAT) (www.nitrc.org/projects/niistat/), which comprises 116 grey matter areas and 34 white matter tracts, to identify the brain structures associated with right and left motor impairments.
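Identifying the structures associated with a deficit can be summarized as the percentage of each atlas region covered by the binarized statistical map, as later reported in Tables S2 and S3. The sketch below assumes an integer label volume aligned with the map (the AALCAT file itself is not loaded) and computes the percentage relative to the region's size; that normalization is our reading, not a documented definition.

```python
import numpy as np


def overlap_by_label(sig_map, atlas, labels):
    """Percentage of each atlas region's voxels covered by a binarized map.

    sig_map: boolean or 0/1 volume of significant voxels; atlas: integer label
    volume on the same grid; labels: region labels to report.
    """
    sig = np.asarray(sig_map, dtype=bool)
    atlas = np.asarray(atlas)
    result = {}
    for label in labels:
        region = atlas == label
        n_region = np.count_nonzero(region)
        covered = np.count_nonzero(sig & region)
        result[label] = 100.0 * covered / n_region if n_region else 0.0
    return result


# Hypothetical 1D "volume": labels 1 and 2 are two structures, 0 is background.
atlas = np.array([1, 1, 1, 1, 2, 2, 0])
sig_map = np.array([1, 0, 0, 0, 1, 1, 1])
coverage = overlap_by_label(sig_map, atlas, labels=[1, 2])
```

Significant voxels outside every labelled region (the last voxel here) do not contribute to any structure's percentage.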

| Performance and clinical evaluation
We compared the performance of the bootstrapped SVR-DSM methods with that of the other approaches used in this study at two levels. First, the 3D statistical map generated by each approach was binarized at a significance level of p < .005 after FWER correction. These binarized maps were then compared with the ground-truth binary masks generated for the left and right central motor systems in terms of sensitivity, specificity (1 − FPR) and Dice score (Catani & Thiebaut de Schotten, 2008; Cho et al., 2007; Emos et al., 2022; Godefroy et al., 1998; Jang et al., 2008; Kassubek et al., 2005; Kunimatsu et al., 2007; Lindenberg et al., 2010; Zhu et al., 2010). This comparison specifically evaluated the sensitivity of each approach in identifying brain structures of the primary motor system (Price et al., 2017; Zhang et al., 2015). All three measures range from 0 to 1 (100%): a Dice score of 1 indicates complete overlap between the two binary maps, and a sensitivity or specificity of 1 indicates no false negatives or no false positives, respectively.
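The three voxel-level measures follow directly from the confusion counts between a binarized map and a ground-truth mask. A minimal NumPy sketch, assuming both masks are on the same grid (in practice within a brain mask, since specificity depends on how many background voxels are counted) and that the ground truth is non-empty:

```python
import numpy as np


def overlap_metrics(pred, truth):
    """Sensitivity, specificity, FPR and Dice between two binary maps."""
    p = np.asarray(pred, dtype=bool).ravel()
    t = np.asarray(truth, dtype=bool).ravel()
    tp = np.count_nonzero(p & t)    # significant voxels inside the ground truth
    fp = np.count_nonzero(p & ~t)   # significant voxels outside it
    fn = np.count_nonzero(~p & t)   # ground-truth voxels that were missed
    tn = np.count_nonzero(~p & ~t)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    dice = 2 * tp / (2 * tp + fp + fn)
    return sensitivity, specificity, 1.0 - specificity, dice


# Hypothetical 8-voxel example: 2 hits, 1 false positive, 1 miss, 4 true negatives.
pred = [1, 1, 1, 0, 0, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0, 0, 0]
sens, spec, fpr, dice = overlap_metrics(pred, truth)
```

Here sensitivity and Dice are both 2/3 while specificity is 0.8, showing that the measures reward different aspects of the overlap.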
At the second level of comparison, we used a clinical validation metric based on receiver operating characteristic (ROC) analysis to evaluate how well each approach discriminated patients with and without deficits, in terms of sensitivity (the percentage of patients correctly identified as having a deficit) and FPR. For each patient, we first computed the intersection between the lesion mask and the binarized statistical map for the left or right motor deficit, restricted to voxels within the ground-truth mask, and counted the overlapping voxels. This yielded an overlap vector across all patients, which was compared with a categorical vector indicating the presence or absence of motor deficits to generate the ROC curves, using lesion size (the number of overlapping voxels) as the classification threshold. To assess the overall performance of each method in distinguishing patients with and without motor deficits, we computed the area under the curve (AUC). To identify the minimum lesion size at which each method performed best, we determined the optimal cut-off point on each method's ROC curve, balancing maximum sensitivity against minimum FPR (Zhang et al., 2015).
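The clinical validation can be sketched as three steps: compute each patient's overlap count, sweep a threshold on that count to trace the ROC curve, and pick a cut-off. This NumPy sketch is illustrative; the AUC uses the trapezoidal rule, and the cut-off is chosen with Youden's J (sensitivity minus FPR), which is one common way to formalize the sensitivity/FPR trade-off and is our assumption rather than the authors' stated criterion. Multiplying a voxel-count cut-off by the voxel volume gives the minimum lesion size in mm³.

```python
import numpy as np


def overlap_vector(lesion_masks, sig_map, truth_mask):
    """Per-patient count of lesion voxels inside (significant map AND ground truth)."""
    region = (np.asarray(sig_map, dtype=bool) & np.asarray(truth_mask, dtype=bool)).ravel()
    L = np.asarray(lesion_masks, dtype=bool).reshape(len(lesion_masks), -1)
    return (L & region).sum(axis=1)


def roc_from_overlap(overlap, has_deficit):
    """ROC obtained by classifying 'deficit' when overlap >= threshold."""
    x = np.asarray(overlap, dtype=float)
    y = np.asarray(has_deficit, dtype=bool)
    thresholds = np.unique(x)[::-1]  # from strictest to most lenient
    tpr = np.array([np.mean(x[y] >= th) for th in thresholds])
    fpr = np.array([np.mean(x[~y] >= th) for th in thresholds])
    # Add the (0, 0) point for a threshold above every observed overlap.
    thresholds = np.concatenate([[np.inf], thresholds])
    tpr = np.concatenate([[0.0], tpr])
    fpr = np.concatenate([[0.0], fpr])
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))  # trapezoidal rule
    return thresholds, tpr, fpr, auc


def best_cutoff(thresholds, tpr, fpr):
    """Threshold maximizing Youden's J = sensitivity - FPR."""
    k = int(np.argmax(tpr - fpr))
    return thresholds[k], tpr[k], fpr[k]


# Hypothetical overlaps (in voxels) for 6 patients, 3 of them with a deficit.
overlaps = [0, 5, 10, 40, 50, 60]
deficit = [False, False, False, True, True, True]
th, tpr, fpr, auc = roc_from_overlap(overlaps, deficit)
cutoff, sens_at_cut, fpr_at_cut = best_cutoff(th, tpr, fpr)
```

Ties between patients with and without deficits produce diagonal ROC segments, which the trapezoidal rule credits as half-correct.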

| Lesion spatial distribution
Figure 2 displays the lesion and structural disconnection overlap maps for all 340 stroke patients and for the minority and majority classes. Compared with the median lesion volumes (Table 2) of the majority class (patients without motor deficit) with left (0.74 cm³) and right (0.62 cm³) hemispheric lesions, the median lesion volume was significantly higher (Mann-Whitney U test, p < .0001) for MD_R (1.56 cm³) and MD_L (5.84 cm³). For MD_L, maximum lesion overlap was found in right-hemisphere grey matter structures, including the insula, putamen, pallidum, Heschl's gyrus, rolandic operculum, superior temporal gyrus and inferior frontal operculum. For MD_R, only the left putamen showed maximum lesion overlap. The lesion-affected white matter projections were the CST, internal capsule, uncinate fasciculus, arcuate fasciculus, inferior occipital-frontal fasciculus, cortico-ponto-cerebellum, optic radiations, and the long and anterior segments.
FIGURE 3 Parametric maps (p < .005, 1000 permutations, family-wise error rate [FWER] correction) of mass-univariate voxel-based lesion-symptom mapping (VLSM), multivariate support vector regression (SVR) and multivariate sparse canonical correlation analysis (SCCAN), used to assess associations between the binary lesion masks and motor scores for all 340 patients. In all maps, coloured voxels show significant associations with motor deficits across all patients (p < .005, 1000 permutations, FWER corrected). For VLSM, the statistical map obtained with p < .05 is also shown, as commonly reported in the literature. Maps are overlaid on the MNI152 template in MRIcron. The ground-truth binary masks of the motor system, including the corticospinal tracts, were generated from prior anatomical knowledge. VLSM, voxel-based lesion-symptom mapping; SVR-LSM, multivariate SVR-lesion-symptom mapping; SCCAN-LSM, multivariate SCCAN-lesion-symptom mapping.
The structural disconnection overlap maps clearly showed that structural damage extended beyond the lesion sites into both hemispheres through the corpus callosum. Structural disconnections were more uniformly distributed and overlapped more strongly across patients than lesions did. This greater overlap across structural disconnection maps could boost the statistical power of lesion-symptom mapping. For both minority classes, a larger proportion of disconnected white matter tracts was seen in the bilateral corticospinal pathways, internal capsule, cortico-ponto-cerebellum and corpus callosum.

| Lesion-symptom mapping
Figure 3 illustrates the statistical maps generated by VLSM, SVR-LSM and SCCAN-LSM for lesion-motor deficit mapping using the lesion masks of all 340 patients (p < .005, 1000 permutations). Table S2 presents the percentage of overlap between the binarized statistical maps (clusterwise FWER corrected) obtained by each method and the grey and white matter structures of the AALCAT atlas.
For VLSM, the thresholded parametric map (p < .005) revealed significant associations between the left motor deficit and damage to white matter fibres inside (CST, internal capsule) and outside (uncinate fasciculus, arcuate fasciculus, inferior occipital-frontal fasciculus, cortico-ponto-cerebellum, long and anterior segments) the motor system (Table S2). In contrast, for the right motor deficit, lesioned voxels in the left hemisphere survived correction for multiple comparisons only at p < .05. SVR-LSM yielded larger clusters of lesioned voxels associated with motor deficits within the same brain structures as VLSM. SCCAN-LSM eliminated most lesioned voxels outside the motor system, except in the putamen. However, for both motor deficits it highlighted voxels in the bilateral CSTs, revealing a multicollinearity problem.
FIGURE 4 Parametric maps (p < .005, 1000 permutations, FWER correction) of VDSM, SVR-DSM and SCCAN-DSM, used to assess associations between the binary lesion/structural disconnection masks and motor scores for all 340 patients. In all maps, coloured voxels show significant associations with motor deficits across all patients (p < .005, 1000 permutations, FWER corrected). The ground-truth binary masks of the motor system, including the corticospinal tracts, were generated from prior anatomical knowledge. VDSM, voxel-based structural disconnection-symptom mapping; SVR-DSM, multivariate SVR-structural disconnection-symptom mapping; SCCAN-DSM, multivariate SCCAN-structural disconnection-symptom mapping.

| Structural DSM
Figure 4 shows the statistical maps (p < .005, 1000 permutations) of VDSM, SVR-DSM and SCCAN-DSM used to assess the relationships between the binarized pSDM masks (probability threshold of .1) and motor deficits in all 340 patients. Table S3 shows the percentage of overlap between the binarized statistical maps (clusterwise FWER corrected) obtained by each method and the grey and white matter structures of the AALCAT atlas.
Compared with the LSM methods, the DSM methods yielded higher overlap with the critical regions inside the motor system for the left motor deficit (Table S3). All three methods, however, produced considerably high false positive rates in regions unrelated to the motor system. For the right motor deficit, only SVR-DSM and SCCAN-DSM correctly identified voxels associated with motor deficits, with relatively fewer false positives than VDSM produced for the left motor deficit.

| Bootstrapped multivariate structural DSM
TABLE 3 Sensitivity, specificity and Dice values computed by comparing the ground-truth binary masks with the binarized statistical maps (p < .005, 1000 permutations, FWER correction) of each method.
[...] motor system (CST, internal capsule) for the left motor deficit, in comparison with SCCAN-DSM_b. For the right motor deficit, SVR-DSM_b and SCCAN-DSM_b yielded comparable results. Overall, both methods produced false positive voxels outside the motor system when using linear regression, particularly in prefrontal and frontal areas, characterized by lower average β or correlation values (Figure S1 and Table S3).

| Performance evaluation and clinical validation
Table 3 presents dice, sensitivity and specificity values obtained by comparing the ground-truth binary masks with the binarized statistical maps (p < .005, 1000 permutations, clusterwise FWER corrected) for each method. As shown, VLSM, SCCAN-LSM, VDSM and SCCAN-DSM exhibited lower sensitivities and dice values than SVR-LSM across all 340 patients. Bootstrapping significantly enhanced the sensitivity of DSM with both linear and logistic regression. SVR-DSM (with both linear and logistic regression) showed higher sensitivity and dice scores for the left motor deficit than the other approaches. For the right motor deficit, SVR-DSM_b^LG outperformed SCCAN-DSM_b^LG, whereas with linear regression the sensitivity of SCCAN-DSM_b^LN was slightly higher than that of SVR-DSM_b^LN.
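The dice, sensitivity and specificity values follow standard voxelwise definitions: dice = 2TP/(2TP + FP + FN), sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP), where TP, FP, FN and TN count voxels in the binarized statistical map against the ground-truth mask. The sketch below computes them; restricting the negatives to an optional brain mask is an assumption, and `mask_agreement` is a hypothetical helper.

```python
import numpy as np


def mask_agreement(stat_map, ground_truth, brain_mask=None):
    """Voxelwise dice, sensitivity and specificity of a binarized map against a ground truth.

    Specificity depends on which voxels count as negatives; restricting the comparison to
    an optional brain mask is an assumption of this sketch.
    """
    pred = np.asarray(stat_map, dtype=bool)
    truth = np.asarray(ground_truth, dtype=bool)
    if brain_mask is not None:
        keep = np.asarray(brain_mask, dtype=bool)
        pred, truth = pred[keep], truth[keep]
    tp = np.count_nonzero(pred & truth)
    fp = np.count_nonzero(pred & ~truth)
    fn = np.count_nonzero(~pred & truth)
    tn = np.count_nonzero(~pred & ~truth)
    nan = float("nan")
    return {
        "dice": 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


truth = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
pred = np.array([1, 1, 0, 0, 1, 1, 0, 0, 0, 0])
# TP = 2, FP = 2, FN = 2, TN = 4 -> dice 0.5, sensitivity 0.5, specificity 4/6
print(mask_agreement(pred, truth))
```

Because the ground-truth tracts occupy a small fraction of the brain, specificity stays high even for noisy maps, while dice drops quickly with false positives.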
Figure 6 illustrates the ROC curve for each method; each point on a curve gives the sensitivity and false positive rate for identifying patients with motor deficits at one cut-off value. Table 4 presents the AUC values and the corresponding cut-off points (the minimum lesion size, in mm³, required to detect a motor deficit), highlighting the trade-off between sensitivity (true positive rate) and specificity (true negative rate). The table also reports, for each method, the highest sensitivity and the maximum number of lesioned voxels at which this maximum sensitivity is achieved.
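A minimal sketch of how such an ROC curve and cut-off could be obtained is shown below. It assumes that each patient is summarized by the volume (in mm³) of lesion or disconnection falling inside a method's binarized map, that a patient is classified as having a deficit when this volume reaches the cut-off, and that the optimal cut-off maximizes Youden's J (sensitivity + specificity − 1). None of these choices is confirmed by the text, and the function name is hypothetical.

```python
import numpy as np


def roc_by_min_overlap(overlap_mm3, has_deficit):
    """ROC curve obtained by sweeping a minimum-overlap cut-off (in mm^3).

    A patient is predicted to have a deficit when overlap >= cut-off.
    Returns (fpr, tpr, cutoffs, auc, best_cutoff); best_cutoff maximizes Youden's J.
    """
    x = np.asarray(overlap_mm3, dtype=float)
    y = np.asarray(has_deficit, dtype=bool)
    if y.all() or not y.any():
        raise ValueError("both patients with and without deficits are required")
    # Descending cut-offs; +inf first so the curve starts at (0, 0) and ends at (1, 1).
    cutoffs = np.concatenate(([np.inf], np.unique(x)[::-1]))
    tpr = np.array([np.mean(x[y] >= c) for c in cutoffs])   # sensitivity
    fpr = np.array([np.mean(x[~y] >= c) for c in cutoffs])  # 1 - specificity
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))  # trapezoid rule
    best = int(np.argmax(tpr - fpr))  # Youden's J = sensitivity + specificity - 1
    return fpr, tpr, cutoffs, auc, float(cutoffs[best])


overlap = [10, 20, 30, 40]  # hypothetical overlap volumes (mm^3)
deficit = [False, False, True, True]
_, _, _, auc, cut = roc_by_min_overlap(overlap, deficit)
print(auc, cut)  # 1.0 30.0
```

Using every observed value as a candidate cut-off traces the full empirical curve without binning.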
As demonstrated, SVR-DSM_b^LN and SVR-DSM_b^LG exhibited superior ROC performance, with higher sensitivity and AUC for identifying patients with left/right motor deficits than the other methods. Among these, SVR-DSM_b^LG yielded the best sensitivity-specificity trade-off (80%–87%) at the cut-off point, with a minimum lesion size of 32 and 235 mm³ for the right and left motor deficits, respectively. For right motor deficits, in addition to the SVR-based methods, SCCAN-DSM_b^LN and SCCAN-DSM_b^LG also displayed high AUC (.84 and .86) and sensitivity (79% and 78%) at the cut-off point, with minimum lesion sizes of 59 and 29 mm³, respectively. Overall, the LSM methods achieved lower sensitivities than the DSM methods. Additionally, VLSM and SCCAN-LSM did not yield cut-off points and showed relatively lower AUCs.

| DISCUSSION
In this study, we introduced a bootstrapped multivariate structural DSM method to explore the associations between lesion-derived structural disconnection maps and left/right motor deficits. The bootstrap aggregation (bagging) sampling technique enabled the balancing of data across stroke patients with and without motor deficits. This approach enhanced the robustness of our findings by addressing variability in lesion locations and sizes, thereby minimizing potential biases. In the following sections, we discuss the key findings and implications of our analysis, focusing on the relations between structural disconnections and motor impairments.
Figure 6. Receiver operating characteristic (ROC) curves showing the sensitivity and false positive rate of each method for correctly identifying patients with motor deficiency. No ROC curve could be obtained for VDSM for the right motor deficit.

| Univariate and MLSM
Our study demonstrated that conventional VLSM analysis of lesion masks can effectively identify brain structures within the primary motor system that are significantly associated with motor deficits, provided that the sample size is large enough to ensure sufficient statistical power. This finding aligns with previous studies (Arnoux et al., 2018; Wawrzyniak et al., 2022). However, despite its promise, this method produced numerous false positives (Type I errors, p < .05) outside the motor system, within the uncinate, arcuate and inferior occipital-frontal fasciculi, the cortico-ponto-cerebellum, and the long and anterior segments (Wawrzyniak et al., 2022). Lowering the significance threshold from p < .05 to p < .005 reduced false positives for the left motor deficit, supporting the findings of Ivanova et al. (2021), who demonstrated that univariate LSM methods with conservative FWER thresholds significantly reduce the proportion of false positives.
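For reference, the core of a univariate VLSM analysis with permutation-based FWER control can be sketched as follows. The sketch assumes a pooled-variance two-sample t-test at each voxel, a one-tailed hypothesis that lesioned patients score lower (higher score = better), a minimum of five lesioned and five spared patients per tested voxel, and max-statistic permutation correction. These are common choices, not necessarily the settings used in this study, and the function name is hypothetical.

```python
import numpy as np


def vlsm_max_t(lesions, scores, n_perm=1000, min_lesioned=5, seed=0):
    """Voxelwise two-sample t-tests with max-statistic permutation FWER correction.

    lesions: (n_patients, n_voxels) binary lesion matrix.
    scores:  (n_patients,) behavioural scores, higher = better (assumed convention).
    One-tailed: positive t means lesioned patients score lower than spared ones.
    Untested voxels (too few lesioned or spared patients) are returned as NaN.
    """
    L = np.asarray(lesions, dtype=bool)
    y = np.asarray(scores, dtype=float)
    n = L.shape[0]
    n_les = L.sum(axis=0)
    tested = (n_les >= min_lesioned) & (n - n_les >= min_lesioned)
    t_full = np.full(L.shape[1], np.nan)
    p_full = np.full(L.shape[1], np.nan)
    if not tested.any():
        return t_full, p_full
    X = L[:, tested].astype(float)
    n1 = X.sum(axis=0)  # lesioned patients per voxel
    n0 = n - n1         # spared patients per voxel

    def t_map(v):
        s1, ss1 = v @ X, (v ** 2) @ X  # sums and sums of squares among lesioned
        s0, ss0 = v.sum() - s1, (v ** 2).sum() - ss1
        m1, m0 = s1 / n1, s0 / n0
        pooled = (ss1 - n1 * m1 ** 2 + ss0 - n0 * m0 ** 2) / (n - 2)
        return (m0 - m1) / np.sqrt(pooled * (1 / n1 + 1 / n0))

    rng = np.random.default_rng(seed)
    t_obs = t_map(y)
    # Null distribution of the maximum t across voxels under permuted scores.
    max_null = np.array([t_map(rng.permutation(y)).max() for _ in range(n_perm)])
    p_fwer = (1 + (max_null[None, :] >= t_obs[:, None]).sum(axis=1)) / (n_perm + 1)
    t_full[tested], p_full[tested] = t_obs, p_fwer
    return t_full, p_full
```

Comparing each voxel against the permutation distribution of the map-wide maximum is what makes the correction family-wise, and it also explains why the correction becomes conservative when many voxels are tested.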
Table 4 note: Values are reported at maximum sensitivity (SEN_max) and at the cut-off points representing the optimal trade-off between sensitivity and specificity, together with the minimum lesion size (mLS, in mm³) for each method. Abbreviations: AUC, area under the curve; GT, ground truth; SDM, structural disconnection map; SEN, sensitivity; SPE, specificity; SCCAN, sparse canonical correlation analysis; SVR, support vector regression; VLSM, voxel-based lesion-symptom mapping.
Our results further support previous research (Ivanova et al., 2021) demonstrating that there is no substantial difference between VLSM and multivariate SVR-LSM in their capacity to detect and localize networks, especially when significance levels are appropriately adjusted. Overall, the outcomes of both methods exhibited a clear bias towards regions with higher lesion frequencies (Cho et al., 2007; Wawrzyniak et al., 2022). Our finding is partly consistent with that of Zhang et al. (2014), who reported higher sensitivity and a lower FPR for multivariate SVR-LSM, which reduced false positives compared with VLSM by excluding voxels with weak lesion-symptom relations. In contrast, our results revealed a higher FPR for SVR-LSM than for VLSM, even at p < .005. Although SCCAN-LSM with optimal sparseness had lower sensitivity than both VLSM and SVR-LSM, it successfully identified voxels associated with motor deficits within the main motor system and eliminated false positive voxels. However, it was affected by multicollinearity, highlighting voxels in the bilateral CSTs for both left and right motor deficits (Godefroy et al., 1998; Price et al., 2017; Pustina et al., 2018). Additionally, incorporating lesion volume as a covariate has been shown to significantly enhance the spatial accuracy of both univariate and multivariate LSM methods (DeMarco & Turkeltaub, 2018; Sperber & Karnath, 2017; Zhang et al., 2014).
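One common way to use lesion volume as a covariate is to regress it out of the behavioural scores before mapping. The sketch below does this with ordinary least squares and an intercept. The cited studies also discuss other strategies (for example, adjusting the lesion maps themselves), so this is one illustrative option rather than the procedure used here; the function name is hypothetical.

```python
import numpy as np


def residualize_on_lesion_volume(scores, lesion_volume_mm3):
    """Remove the linear effect of total lesion volume from behavioural scores.

    Ordinary least squares with an intercept; the residuals would then replace the
    raw scores as the outcome in LSM/DSM (one of several possible approaches).
    """
    y = np.asarray(scores, dtype=float)
    vol = np.asarray(lesion_volume_mm3, dtype=float)
    design = np.column_stack([np.ones_like(vol), vol])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


# Hypothetical data: larger lesions tend to go with lower scores.
rng = np.random.default_rng(0)
volume = rng.uniform(100.0, 50000.0, 50)
motor = 60.0 - 0.001 * volume + rng.normal(0.0, 2.0, 50)
adjusted = residualize_on_lesion_volume(motor, volume)
```

By construction the residuals have zero mean and zero correlation with lesion volume, so any remaining voxelwise association cannot be explained by a simple linear volume effect.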

| Univariate and multivariate structural DSM
We further performed structural DSM using VDSM, SVR-DSM and SCCAN-DSM. To generate the probabilistic disconnection maps, it is recommended to perform fibre tracking in a normative cohort of at least 25 healthy subjects, which is necessary to account for over 90% of the variance in the tractograms, as demonstrated by comparison with a cohort of 187 subjects aged 18 to 84 years (Wawrzyniak et al., 2022). We used DWI data from 403 healthy controls included in CamCAN Stage 2 (Shafto et al., 2014; Taylor et al., 2017), within the same age range (40–80 years) as the stroke patients. This enabled a more robust characterization of structural differences across elderly subjects, who often exhibit enlarged ventricles and sulci as well as varying degrees of grey and white matter atrophy (Ridwan et al., 2021).
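The idea behind a probabilistic disconnection map can be illustrated with a simplified model in which each normative subject's tractogram is a list of streamlines, each stored as the flat voxel indices it traverses. For every subject, the voxels visited by streamlines passing through the lesion are marked as disconnected, and the pSDM value of a voxel is the fraction of subjects in whom it is disconnected. Real pipelines operate on tractography files in template space; the data layout and function name below are assumptions.

```python
import numpy as np


def probabilistic_disconnection_map(lesion_voxels, tractograms, n_voxels):
    """Fraction of normative subjects in whom each voxel is disconnected by the lesion.

    lesion_voxels: flat voxel indices of the lesion in template space.
    tractograms:   one entry per healthy subject; each is a list of streamlines, and
                   each streamline is a sequence of flat voxel indices (a simplification).
    """
    lesion = {int(v) for v in lesion_voxels}
    psdm = np.zeros(n_voxels)
    for streamlines in tractograms:
        disconnected = np.zeros(n_voxels, dtype=bool)
        for streamline in streamlines:
            if lesion.intersection(streamline):  # streamline passes through the lesion
                disconnected[list(streamline)] = True
        psdm += disconnected  # each subject counts at most once per voxel
    return psdm / len(tractograms)


# Three toy subjects on a 6-voxel grid; the lesion occupies voxel 2.
subjects = [
    [[0, 1, 2], [3, 4]],  # first streamline crosses the lesion
    [[2, 5]],             # crosses the lesion
    [[4, 5]],             # does not cross the lesion
]
# voxels 0, 1 and 5 -> 1/3; voxel 2 -> 2/3; voxels 3 and 4 -> 0
print(probabilistic_disconnection_map([2], subjects, n_voxels=6))
```

Averaging over a large normative cohort is what turns these per-subject binary maps into stable probabilities, which is why the cohort size discussed above matters.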
In our study, before conducting permutation testing, the disconnectome maps were binarized to mitigate bias resulting from heteroscedasticity in general linear modelling errors.We applied a probability threshold of .1 to reduce the variance of disconnection probabilities, following recommendations in Huang et al. (2006) and Wawrzyniak et al. (2022).Lower thresholds have been associated with poor specificity, false positives and issues such as low spatial accuracy and overestimated tract size (Wawrzyniak et al., 2022).
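Binarizing the pSDM at the .1 threshold, and checking how the size of the resulting mask changes with the threshold, can be sketched as below. Whether the cut-off is inclusive (≥ .1) or strict is an assumption, as are the function names.

```python
import numpy as np


def binarize_psdm(psdm, threshold=0.1):
    """Binary disconnection mask: voxels disconnected in at least `threshold` of subjects."""
    return np.asarray(psdm, dtype=float) >= threshold


def mask_size_by_threshold(psdm, thresholds):
    """Number of voxels retained at each candidate threshold (a simple sensitivity check)."""
    p = np.asarray(psdm, dtype=float)
    return {float(t): int(np.count_nonzero(p >= t)) for t in thresholds}


psdm = np.array([0.0, 0.05, 0.1, 0.4, 0.9])
mask = binarize_psdm(psdm)  # keeps the last three voxels
sizes = mask_size_by_threshold(psdm, [0.05, 0.1, 0.5])
```

Lower thresholds retain more voxels with weak disconnection probability, which is consistent with the reported loss of specificity and overestimated tract size.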
Consistent with Wawrzyniak et al. (2022), our results show higher sensitivity for the structural DSM methods (VDSM, SVR-DSM and SCCAN-DSM) than for the lesion-symptom mapping methods when binarized disconnection maps were used to identify critical regions within the main motor system associated with the left motor deficit. However, this higher sensitivity came at the cost of a substantially elevated FPR. For the right motor deficit, SVR-DSM and SCCAN-DSM outperformed VDSM, showing fewer false positives, whereas VDSM exhibited low sensitivities and dice values. The FPR was also mitigated through permutation-based correction for multiple comparisons. In line with other studies (Arnoux et al., 2018; Karnath et al., 2018; Pustina et al., 2018), we observed that FWER correction remained conservative even at p < .05, potentially removing true positives, especially when the entire unbalanced data set was used for LSM or DSM.

| Bootstrapped multivariate structural DSM
Using the bootstrap bagging technique, our methodology aimed to increase the voxel-level lesion ratio by addressing the imbalance between patients with and without deficits. By aggregating predictions across bags, we mitigated the impact of unbalanced datasets, which are known to substantially reduce the accuracy of classification models (Krawczyk, 2016). Furthermore, our approach improved the lesion frequency within each bag, thereby bolstering the statistical power of the linear and logistic regression analyses. With balanced data, SVR-DSM_b showed superior performance (higher sensitivity and lower FPR) over the other univariate and multivariate LSM and DSM methods. Our approach thus offers a way to improve the results of lesion-symptom mapping, which has been shown to be highly sensitive to lesion frequency and to the variance of behavioural scores at each voxel (Arnoux et al., 2018; Kimberg et al., 2007; Rudrauf et al., 2008). In most LSM studies, data on the deficits of interest are highly unbalanced, because the number of patients with motor, cognitive and other functional deficits is much lower than the number without, with a frequency ranging from 10% to 20% in stroke patients. Such imbalance can lead to inaccurate assessment of the impact of lesions on patient outcomes (Godefroy et al., 2013; Wawrzyniak et al., 2022). In line with the findings of Arnoux et al. (2018), we also found that the method based on logistic regression provided more reliable results than linear regression by removing false positive voxels. Compared with SVR-DSM_b, SCCAN-DSM_b showed lower sensitivity for the left motor deficit, with false positive voxels in frontal areas exhibiting high correlations with motor deficits.
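The bagging scheme can be sketched as follows, under stated assumptions: each bag pairs a bootstrap resample of the minority (deficit) class with an equally sized random draw from the majority class, a linear model is fitted in each bag, and the voxel weight maps are averaged across bags. To keep the example dependency-free, ridge regression in dual form stands in for the SVR used in the study, and the logistic-regression variant is not shown; all names are hypothetical.

```python
import numpy as np


def balanced_bags(has_deficit, n_bags, rng):
    """Yield patient indices for class-balanced bags (assumed scheme): a bootstrap resample
    of the minority class plus an equally sized draw, without replacement, from the majority."""
    y = np.asarray(has_deficit, dtype=bool)
    minority, majority = np.flatnonzero(y), np.flatnonzero(~y)
    k = minority.size
    if k == 0 or majority.size < k:
        raise ValueError("need a non-empty minority class no larger than the majority")
    for _ in range(n_bags):
        boot = rng.choice(minority, size=k, replace=True)
        sub = rng.choice(majority, size=k, replace=False)
        yield np.concatenate([boot, sub])


def ridge_voxel_weights(X, y, lam=1.0):
    """Linear ridge weights in dual form (cheap when voxels >> patients).
    Stand-in for the study's SVR, not the authors' estimator."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(yc.size), yc)
    return Xc.T @ alpha


def bagged_weight_map(X, scores, has_deficit, n_bags=50, lam=1.0, seed=0):
    """Average voxel-weight map over class-balanced bootstrap bags."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    scores = np.asarray(scores, dtype=float)
    maps = [ridge_voxel_weights(X[idx], scores[idx], lam)
            for idx in balanced_bags(has_deficit, n_bags, rng)]
    return np.mean(maps, axis=0)


# Toy data: voxel 0 is the "critical" voxel; roughly 20% of patients have a deficit.
rng = np.random.default_rng(3)
X = (rng.random((200, 6)) < 0.3).astype(float)
X[:, 0] = rng.random(200) < 0.2
scores = 10.0 - 5.0 * X[:, 0] + rng.normal(0.0, 1.0, 200)
weights = bagged_weight_map(X, scores, has_deficit=X[:, 0] == 1)
```

In each bag, half of the patients carry a deficit, so critical voxels are lesioned far more often than in the full, unbalanced sample; averaging across bags then smooths out the variability introduced by any single resample.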

| Performance and clinical evaluation
We evaluated the performance of the methods in terms of dice, sensitivity and FPR by comparing the binarized statistical maps with the ground truth, which included the CST for upper and lower limb motor deficits, excluding facial paresis. Overall, for both left and right motor deficits, SVR-DSM, SVR-DSM_b^LN and SVR-DSM_b^LG demonstrated higher sensitivities. However, SVR-DSM_b^LG exhibited the best performance, with lower FPRs than the other two methods. In contrast, VLSM, VDSM and SCCAN showed low sensitivities and high FPRs.
Our results align with previous findings (Wawrzyniak et al., 2022) of low dice coefficients (3% on average) for VLSM and higher values (21% on average) for the DSM method in stroke patients. These findings indicated an association between CST damage and contralateral hemiparesis. SVR-DSM_b^LG achieved dice coefficients of 50% and 34% for the left and right motor deficits, respectively, compared with 6.8% and .01% for VLSM. Pustina et al. (2018) reported higher dice coefficients for SCCAN with a sparseness value of .045, using data from 131 patients with chronic stroke lesions in the middle cerebral artery territory together with synthetic behavioural scores. Our analysis indicated that the performance of SCCAN was highly sensitive to the sparseness value, which we determined through fourfold cross-validation over values ranging from .01 to .9. Overall, the low dice coefficients are primarily due to high false positive rates and small target areas (Ivanova et al., 2021; Pustina et al., 2018).
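The sparseness search can be illustrated with a fourfold cross-validation loop over a grid of candidate values. Because SCCAN itself is not reproduced here, a deliberately simple stand-in model is used: it keeps the top fraction of voxels by absolute correlation with the score (the "sparseness"), weights them by that correlation and calibrates the result with a straight-line fit. The selection criterion (out-of-fold correlation between predicted and observed scores) and all names are assumptions.

```python
import numpy as np


def corr_with(X, y):
    """Pearson correlation of each column of X with y (0 where a column is constant)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    num = Xc.T @ yc
    return np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)


def fit_sparse(X, y, sparseness):
    """Stand-in sparse model: keep the top `sparseness` fraction of voxels by |r|,
    weight them by r, then fit a 1-D linear calibration to the scores."""
    r = corr_with(X, y)
    k = max(1, int(round(sparseness * X.shape[1])))
    keep = np.argsort(-np.abs(r))[:k]
    w = np.zeros(X.shape[1])
    w[keep] = r[keep]
    slope, intercept = np.polyfit(X @ w, y, 1)
    return w, slope, intercept


def cv_select_sparseness(X, y, grid, n_folds=4, seed=0):
    """Pick the sparseness whose out-of-fold predictions correlate best with y."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    cv_scores = {}
    for s in grid:
        pred = np.empty(len(y))
        for test_idx in folds:
            train = np.setdiff1d(np.arange(len(y)), test_idx)
            w, slope, intercept = fit_sparse(X[train], y[train], s)
            pred[test_idx] = slope * (X[test_idx] @ w) + intercept
        cv_scores[s] = np.corrcoef(pred, y)[0, 1]
    best = max(cv_scores, key=cv_scores.get)
    return best, cv_scores
```

Each candidate value is judged only on patients held out from fitting, so a very dense solution cannot win simply by absorbing noise voxels in the training folds.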
To evaluate the clinical validity of the methods, we used ROC curves to determine the cut-off point that balances sensitivity and specificity (i.e., 1 − FPR). This validation approach based on clinical outcomes, rarely used in prior studies, enhanced the robustness of our work. In our investigation, SVR-DSM_b^LN and SVR-DSM_b^LG exhibited superior sensitivity (with higher AUC) in identifying patients with left/right motor deficits compared with alternative methods. SVR-DSM_b^LG demonstrated the highest sensitivity values (80%–87%), with a minimum lesion size of 32 mm³ for right motor deficits and 235 mm³ for left motor deficits. The large difference in the minimum lesion size required to identify patients with left and right motor deficits was primarily attributed to the higher incidence of lesions in the right hemisphere, resulting in greater lesion overlap across all 340 patients. For right motor deficits, apart from the SVR-based methods, SCCAN-DSM_b^LN and SCCAN-DSM_b^LG also exhibited high AUC values (.84 and .86) and sensitivity (79% and 78%) at the cut-off point. Overall, the LSM methods exhibited lower clinical sensitivities than the DSM methods, consistent with Wawrzyniak et al. (2022). Furthermore, VLSM and SCCAN-LSM failed to yield cut-off points and demonstrated lower AUCs.

| LIMITATIONS
The present study has several limitations that should be addressed in future work. First, we used a probability threshold of 10% to binarize the probabilistic disconnection maps. Although this threshold was chosen based on the trade-off between sensitivity and FPR, other thresholds should be explored to examine their impact on the DSM results. Furthermore, our study relied on lesion modelling using diffusion-weighted imaging data from healthy subjects, a powerful method commonly employed by other research groups (Dulyan et al., 2022; Foulon et al., 2018; Kuceyeski et al., 2013; Salvalaggio et al., 2020; Souter et al., 2022; Thiebaut de Schotten et al., 2020; Ulrichsen et al., 2021; Wawrzyniak et al., 2022; Wodeyar et al., 2020), but one that may not fully capture the complexity of brain damage in stroke patients. To assess the precision of our technique, future research should incorporate DWI data acquired from the stroke patients themselves. Moreover, our study included only patients in the chronic stage of stroke; a longitudinal analysis is warranted to evaluate the association between structural damage and acute-to-chronic deficits over extended periods (Shahid et al., 2017). Additionally, we considered only motor deficits associated with the well-known anatomy of hemiparesis (Byblow et al., 2015; Cho et al., 2007; Godefroy et al., 1998; Kassubek et al., 2005; Lindenberg et al., 2010; Zhu et al., 2010). Subsequent research should examine the technical validity of structural DSM for more complex motor and cognitive abilities involving large networks (Corbetta et al., 2015; Rondina et al., 2016). In addition, our findings clearly demonstrate that DSM-based approaches are more susceptible to collinearity than conventional LSM methods, especially when dealing with highly unbalanced data. To address this challenge, a collinearity analysis should be conducted to identify the source of multicollinearity (Arnoux et al., 2018). Finally, our method was developed using multivariate SVR; exploring alternative regression analyses, such as ridge or Bayesian regression, might further enhance its performance. Further investigation is necessary to address these limitations and their potential impact on the overall validity and reliability of our findings.
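As one example of the collinearity analysis suggested above, variance inflation factors (VIFs) can be computed for a set of regional predictors, such as lesion or disconnection load in the left and right CSTs. VIF_j = 1/(1 − R²_j), where R²_j comes from regressing predictor j on all the others plus an intercept; values above roughly 5 to 10 are commonly read as problematic. The choice of predictors and the function name are illustrative assumptions.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X (patients x predictors): 1 / (1 - R^2) from regressing
    that column on all other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


# Hypothetical regional loads: left CST, an unrelated region, and a near-copy of left CST.
rng = np.random.default_rng(0)
left_cst = rng.normal(size=300)
other = rng.normal(size=300)
near_copy = left_cst + 0.05 * rng.normal(size=300)
vif = variance_inflation_factors(np.column_stack([left_cst, other, near_copy]))
```

Large VIFs flag predictors whose variance is almost entirely explained by others, which is the situation in which voxel weights, like those spread across both CSTs above, become unstable.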

| CONCLUSION
In this study, we proposed a structural DSM approach based on bootstrapped multivariate SVR analysis to investigate brain structures associated with motor deficits in stroke patients. The bootstrapped multivariate SVR-DSM with logistic regression significantly outperformed VLSM, SVR-LSM and SCCAN, and also achieved the best sensitivity-specificity trade-off for identifying patients with motor deficits. Overall, bootstrapped multivariate SVR applied to structural disconnection maps can substantially improve lesion-symptom analysis when dealing with unbalanced data. Crucially, the approach is not limited to motor deficits or stroke-related impairments: its main strength lies in its adaptability to other deficit types, especially where data imbalance is prevalent. In essence, our findings underscore the versatility and efficacy of bootstrapped multivariate SVR in advancing the precision and applicability of lesion-symptom analysis across diverse clinical contexts.

Figure 2. Overlap maps of lesions and structural disconnections for all 340 stroke patients (a), for those in the minority classes for left (b) and right (c) motor deficits, and for those in the majority class (d). The colour bars indicate the number of patients with overlapping lesions or disconnected tracts at each voxel. The maps are superimposed on the MNI 152 template using MRIcron.
Table 4. Clinical validation results, including the sensitivity (SEN) and specificity (SPE) of each method in identifying patients with motor deficiency.