Test–retest reliability and minimal detectable change of corticospinal tract integrity in chronic stroke

Abstract Diffusion tensor imaging (DTI) can be used to index white matter integrity of the corticospinal tract (CST) after stroke; however, the psychometric properties of DTI‐based measures of white matter integrity are unknown. The purpose of this study was to examine test–retest reliability as determined by intraclass correlation coefficients (ICC) and calculate minimal detectable change (MDC) of DTI‐based measures of CST integrity using three different approaches: a Cerebral Peduncle approach, a Probabilistic Tract approach, and a Tract Template approach. Eighteen participants with chronic stroke underwent DTI on the same magnetic resonance imaging scanner 4 days apart. For the Cerebral Peduncle approach, a researcher hand drew masks at the cerebral peduncle. For the Probabilistic Tract approach, tractography was seeded in motor areas of the cortex to the cerebral peduncle. For the Tract Template approach, a standard CST template was transformed into native space. For all approaches, the researcher performing analyses was blind to participant number and day of data collection. All three approaches had good to excellent test–retest reliability for fractional anisotropy (FA; ICCs >0.786). Mean diffusivity, axial diffusivity, and radial diffusivity were less reliable than FA. The ICC values were highest and MDC values were the smallest for the most automated approach (Tract Template), followed by the combined manual/automated approach (Probabilistic Tract) then the manual approach (Cerebral Peduncle). The results of this study may have implications for how DTI‐based measures of CST integrity are used to define impairment, predict outcomes, and interpret change after stroke.


| INTRODUCTION
The corticospinal tract (CST) is an important descending white matter pathway for the control of skilled movement. The CST receives fibers from the primary motor cortex (M1), as well as from sensory and secondary motor areas including somatosensory cortex (S1), dorsal premotor cortex (PMd), ventral premotor cortex (PMv), supplementary motor area (SMA), and presupplementary motor area (preSMA; Archer, Vaillancourt, & Coombes, 2017; Dum & Strick, 1991; Nudo, 2007). The integrity of the CST is often compromised after stroke, resulting in motor impairment that persists into the chronic phase of stroke. Microstructural changes in the CST due to direct lesion or axonal degradation related to remote lesion after stroke can be detected and measured by diffusion-weighted magnetic resonance imaging (MRI; Arfanakis et al., 2002; Thomalla et al., 2004).
Diffusion imaging-based measures of white matter integrity are reproducible and reliable between observers, scanners, and across time in nondisabled populations (Albi et al., 2017; Danielian, Iwata, Thomasson, & Floeter, 2010; Fox et al., 2012; Heiervang, Behrens, Mackay, Robson, & Johansen-Berg, 2006; Kristo et al., 2013; Lin et al., 2013; Wakana et al., 2007). Fewer studies have investigated the reliability of diffusion imaging-based measures of white matter integrity after stroke (Borich, Wadden, & Boyd, 2012; Lin et al., 2013; Snow et al., 2016). In individuals with chronic stroke, diffusion imaging-based measures of CST integrity have been shown to be reliable between and within raters (Borich et al., 2012; Lin et al., 2013). Yet, only a single study has examined test-retest reliability by calculating intraclass correlation coefficients (ICC) in chronic stroke (Snow et al., 2016). The results suggested that test-retest reliability of measures of CST integrity in individuals with chronic stroke is excellent; however, study limitations (relatively small sample size, variance in the number of days between test and retest scans, investigation of a single tractography approach) may limit the application of these findings.
Minimal detectable change (MDC) values represent the smallest amount of change required to exceed the inherent variability of a measure. These values are necessary for determining whether clinical interventions result in change beyond estimated measurement noise (Portney & Watkins, 2008). To the best of our knowledge, no previous study has reported MDC values of DTI-based measures of CST integrity after stroke. Since a gold standard approach for measuring CST integrity after stroke has not been established and ICC and MDC values are unique to the approach utilized, investigation of the psychometric properties of several different approaches for determination of CST integrity is needed. Common approaches for integrity measurement include single region of interest (ROI) masks (manually drawn or from an atlas), full tract templates transformed from standard space to native space, and probabilistic tractography with ROIs (manually drawn or from an atlas) to isolate the tract of interest. While many pathways may contribute to movement (Fling & Seidler, 2012; Rodríguez-Herreros et al., 2015; Stewart et al., 2017), the CST is the only white matter pathway that has been recommended by expert consensus as a biomarker of the motor system after stroke. Therefore, the purpose of this study was to determine the test-retest reliability and to estimate the MDC of DTI-based measures of CST integrity in chronic stroke. We hypothesized that DTI-based measures of CST integrity would have good to excellent test-retest reliability, with more automated approaches (Tract Template and Probabilistic Tract approaches) having higher reliability and smaller MDCs than manual ROI approaches (Cerebral Peduncle approach).

| Participants
Structural images, diffusion-weighted images, and behavioral assessments were obtained from 18 individuals in the chronic phase of stroke 4 days apart as part of a larger study (ClinicalTrials.gov Identifier: NCT02785419) that examined brain activity in response to a brief period of practice after stroke. Practice occurred on 4 consecutive days and consisted of movement of a joystick with the more-impaired hand based on a visual cue for 30-45 min (168-240 movement repetitions per day) on all 4 days. This amount of behavioral practice is well below the intensity hypothesized to produce structural brain changes (Scholz, Klein, Behrens, & Johansen-Berg, 2009). Per protocol of the larger study, individuals were eligible to participate if they were ≥18 years old, in the chronic phase of stroke recovery (>6 months poststroke), right hand dominant (Oldfield, 1971), scored ≥19 on the Montreal Cognitive Assessment (Nasreddine et al., 2005), showed evidence of upper extremity impairment by an upper extremity Fugl-Meyer (UE FM) score < 66 (Fugl-Meyer, Jääskö, Leyman, & Olsson, 1975) and/or at least 15% deficit on the Nine Hole Peg Test (Grice et al., 2003) on the more impaired hand compared to the less impaired hand and demonstrated some movement ability as shown by an UE FM score >30 and/or the ability to move at least one block on the Box and Blocks Test with the affected upper extremity (Mathiowetz, Federman, & Wiemer, 1985). Individuals were excluded if they had any acute medical problems, severe ideomotor apraxia as defined by a score ≤65 on the Test of Upper Limb Apraxia (Vanbellingen et al., 2010), hemispatial neglect with <52 on the BIT Star Cancelation Test (Hartman-Maeir & Katz, 1995), significant arm pain that interfered with movement, contraindications to MRI scanning (e.g., metal implants, claustrophobia), or a history of other, nonstroke related neurological disorder. 
All participants provided written consent, and all aspects of this study were approved by the University of South Carolina Institutional Review Board.

| Image acquisition
All images were acquired on a Siemens Prisma 3 Tesla MRI scanner with a 20-channel head coil at the University of South Carolina's McCausland Center for Brain Imaging. High resolution T1-weighted structural images (TR = 2,250 ms, TE = 4.11 ms, 192 sagittal slices, 1 mm³ isotropic voxels) and T2-weighted structural images (TR = 3,200 ms, TE = 567 ms, 176 slices, 1 mm³ isotropic voxels) were acquired on Day 1. Diffusion-weighted images were collected using an echo-planar imaging sequence (TR = 3,839 ms, TE = 71 ms, 68 slices, 1.8 mm³ isotropic voxels, 56 noncollinear directions, b = 1,000 s/mm²). Whole-brain diffusion-weighted images (DWI) were acquired on Day 1 and again 4 days later (Day 4). Two runs of diffusion images were acquired on each day with reverse encoding directions (anterior to posterior and posterior to anterior); seven b0 volumes were acquired in each run.

| Structural image preprocessing
FSL's Brain Extraction Tool was used to perform brain extraction, with robust brain center estimation and thresholding chosen to retain lesioned tissue and exclude extraneous nonbrain tissue (Smith, 2002).
A trained researcher hand drew stroke lesion masks on the T2 structural image. All lesion masks were checked by a second, experienced researcher. The T2 lesion mask was linearly registered to the structural T1, then binarized to be used as a weighting volume during registration.
The lesion mask volume was deweighted during all linear and nonlinear registration processes (Schulz et al., 2017a). The structural T1 image was linearly then nonlinearly registered into diffusion space using FSL's FLIRT and FNIRT (Jenkinson & Smith, 2001; Smith et al., 2004).

| Data processing
Fractional anisotropy (FA), mean diffusivity (MD), radial diffusivity (RD), and axial diffusivity (AD) were extracted from the diffusion data. Directional diffusivities were determined by the three eigenvalues of the diffusion tensor, where AD is equal to the first eigenvalue (AD = λ₁) and RD is equal to the average of the second and third eigenvalues (RD = (λ₂ + λ₃)/2; Basser, 1995). FA is a ratio value derived from the eigenvalues that represents the directional preference of water diffusion within the structural bounds of tissue.
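As a minimal sketch of the definitions above, the following Python function computes the four scalar measures from the three tensor eigenvalues. AD and RD follow the definitions given in the text. The MD and FA formulas are the standard tensor definitions and are an assumption here, since the text does not spell them out. The function name `dti_scalars` and the example eigenvalues are hypothetical and are not taken from the study's processing pipeline.

```python
import numpy as np


def dti_scalars(eigenvalues):
    """Compute FA, MD, AD, and RD from tensor eigenvalues.

    `eigenvalues` is array-like with shape (..., 3); the last axis holds
    the three eigenvalues of one tensor in any order.
    """
    # Sort so that l1 >= l2 >= l3 along the last axis.
    ev = np.sort(np.asarray(eigenvalues, dtype=float), axis=-1)[..., ::-1]
    l1, l2, l3 = ev[..., 0], ev[..., 1], ev[..., 2]

    ad = l1                      # axial diffusivity: first eigenvalue
    rd = (l2 + l3) / 2.0         # radial diffusivity: mean of second and third
    md = (l1 + l2 + l3) / 3.0    # mean diffusivity (standard definition, assumed)

    # Fractional anisotropy (standard definition, assumed):
    # sqrt(3/2) * ||lambda - MD|| / ||lambda||
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    with np.errstate(divide="ignore", invalid="ignore"):
        fa = np.sqrt(1.5 * num / den)
    fa = np.where(den > 0, fa, 0.0)  # define FA as 0 where all eigenvalues are 0

    return {"FA": fa, "MD": md, "AD": ad, "RD": rd}


if __name__ == "__main__":
    # Hypothetical eigenvalues (mm^2/s) for a single white matter voxel.
    voxel = [1.7e-3, 0.3e-3, 0.2e-3]
    for name, value in dti_scalars(voxel).items():
        print(name, float(value))
```

Because FA is a ratio, it is unitless and bounded between 0 (isotropic diffusion) and 1 (diffusion along a single axis), whereas MD, AD, and RD carry the units of the eigenvalues.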
Three approaches were used to extract FA, MD, AD, and RD from the ipsilesional and contralesional CST in each participant's native space: Cerebral Peduncle, Probabilistic Tract, and Tract Template. In all approaches, the masks used for data extraction were thresholded to voxels with an FA > 0.2. Separate researchers completed the data analysis for each approach; all researchers were blinded to the subject number and day of data collection.

| Cerebral peduncle approach
A single researcher hand drew masks on the three contiguous axial slices that showed the largest cross-sectional area of the cerebral peduncle (Mark et al., 2008; Schaechter, Perdue, & Wang, 2008). The T1 structural image was registered to native diffusion space. Next, the colored FA map was overlaid on the T1 structural image in native diffusion space for ROI mask drawing (Figure 1a). All masks were checked by a second researcher. The stroke lesion did not overlap with the cerebral peduncle ROI in any participant. The peduncle masks were thresholded, where voxels with FA < 0.2 were removed. The thresholded masks were used to extract mean FA, MD, AD, and RD from each participant's native space.
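The thresholding and extraction step can be illustrated with a short sketch, assuming the metric map, the FA map, and the ROI mask have already been loaded as arrays on the same voxel grid in native diffusion space. The function name and the toy arrays are hypothetical. Voxels with FA exactly equal to 0.2 are excluded here, which is an assumption, since the text does not specify how that boundary was handled.

```python
import numpy as np


def mean_in_thresholded_mask(metric_map, fa_map, roi_mask, fa_threshold=0.2):
    """Mean of `metric_map` over ROI voxels whose FA exceeds `fa_threshold`.

    All three inputs must share the same shape (same voxel grid).
    """
    metric = np.asarray(metric_map, dtype=float)
    fa = np.asarray(fa_map, dtype=float)
    roi = np.asarray(roi_mask).astype(bool)
    if not (metric.shape == fa.shape == roi.shape):
        raise ValueError("metric map, FA map, and ROI mask must have the same shape")

    # Keep only ROI voxels above the FA threshold.
    keep = roi & (fa > fa_threshold)
    if not keep.any():
        raise ValueError("no voxels remain in the mask after FA thresholding")
    return float(metric[keep].mean())


if __name__ == "__main__":
    # Toy 2 x 2 "slice": one ROI voxel falls below the FA threshold.
    fa = np.array([[0.10, 0.30], [0.50, 0.60]])
    roi = np.array([[1, 1], [1, 0]])
    print(mean_in_thresholded_mask(fa, fa, roi))
```

The same function applies to MD, AD, or RD by passing the corresponding map as `metric_map` while keeping the FA map for thresholding.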

| Probabilistic tract approach
Standard human motor area templates (HMATs; Mayka, Corcos, Leurgans, & Vaillancourt, 2006) of M1, PMd, PMv, SMA, preSMA, and S1 were registered to the individual participant's diffusion space, with stroke lesions weighted to zero during the registration process using FSL's FLIRT and FNIRT (Jenkinson & Smith, 2001; Smith et al., 2004). The HMATs were used to seed probabilistic tractography, creating an individual descending probable tract from each HMAT in each hemisphere.
Tractography was completed using the PROBTRACKX2 command in FSL (maximum number of steps = 2,000, step length = 0.5 mm, number of samples = 5,000, curvature threshold = 0.2, volume fraction before subsidiary fiber volume threshold = 0.01) with the HMAT as the seed region and waypoints in the posterior limb of the internal capsule (PLIC) and the cerebral peduncle (waycondition "AND"). Three exclusion masks were drawn to limit extraneous fibers that crossed midline, extended into the cerebellum, or were likely part of the alternate motor pathway in the tegmentum pontis. Each participant had six tracts per hemisphere, one for each of the HMATs.
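For illustration, the tracking parameters above can be assembled into a `probtrackx2` call as sketched below. This is not the authors' script: all file names are hypothetical, the three exclusion masks are assumed to have been merged into a single image for `--avoid`, the two waypoint masks are assumed to be listed in a text file, and `--opd` (write the path distribution) is an assumption about how the output tract was saved. The function only builds the argument list and does not run FSL.

```python
import shlex


def build_probtrackx2_args(seed_mask, samples_basename, brain_mask,
                           waypoints_list, exclusion_mask, out_dir):
    """Return a probtrackx2 argument list using the parameters from the text."""
    return [
        "probtrackx2",
        f"--seed={seed_mask}",            # HMAT seed region in diffusion space
        f"--samples={samples_basename}",  # basename of the fiber orientation samples
        f"--mask={brain_mask}",           # brain mask in diffusion space
        "--nsteps=2000",                  # maximum number of steps
        "--steplength=0.5",               # step length in mm
        "--nsamples=5000",                # number of samples
        "--cthr=0.2",                     # curvature threshold
        "--fibthresh=0.01",               # subsidiary fiber volume fraction threshold
        f"--waypoints={waypoints_list}",  # text file listing PLIC and peduncle masks
        "--waycond=AND",                  # streamlines must pass through all waypoints
        f"--avoid={exclusion_mask}",      # merged exclusion mask (assumption)
        f"--dir={out_dir}",
        "--opd",                          # output path distribution (assumption)
    ]


if __name__ == "__main__":
    # Hypothetical file names for one seed region in one hemisphere.
    args = build_probtrackx2_args(
        seed_mask="hmat_M1_left.nii.gz",
        samples_basename="dwi.bedpostX/merged",
        brain_mask="dwi.bedpostX/nodif_brain_mask",
        waypoints_list="waypoints_left.txt",
        exclusion_mask="exclusion_merged.nii.gz",
        out_dir="tracts/M1_left",
    )
    print(shlex.join(args))
```

Running this once per HMAT and hemisphere would yield the six tracts per hemisphere described above.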

| Tract template approach
A standard Sensorimotor Area Tract Template (SMATT; Archer, Vaillancourt, et al., 2017) was transformed from standard MNI space into each participant's native diffusion space using linear (FLIRT) and nonlinear (FNIRT) registrations, with the stroke lesion mask weighted to zero. First, a linear registration from T1 to MNI space was created using the brain-extracted T1 image, the MNI152 1 mm standard brain template, and an inverse binarized T1 stroke lesion mask. Next, the linear transform was used to create the nonlinear warp from T1 to MNI space. The nonlinear warp was then inverted and applied to the SMATT masks to nonlinearly transform the SMATTs from MNI space to native T1 space. Next, a linear transform from T1 to diffusion space was used to create a nonlinear warp using the T1 image, the FA image, and an inverse binarized T1 stroke mask. The T1 to FA nonlinear warp was then applied to the SMATTs in T1 space, resulting in a right and left SMATT in diffusion space. This process was repeated separately for Day 1 and Day 4 data for each participant.
The Tract Templates (Figure 1c), registered to the participant's diffusion space, were thresholded at an FA value of 0.2 and used to extract mean FA, MD, RD, and AD.

| Statistical analysis
Means for each diffusion measure (FA, FA ratio, FA asymmetry, MD, RD, and AD) were calculated for ipsilesional and contralesional CST, for Day 1 and Day 4, for all three approaches. Differences between mean values of contralesional and ipsilesional CST were evaluated using paired t tests. To assess test-retest reliability, intraclass correlation coefficients (ICC) were calculated for two-way mixed effects, single measurement, with absolute agreement. ICC estimates were interpreted based on the following guidelines: <0.5 indicates poor reliability, 0.5-0.75 indicates moderate reliability, 0.75-0.9 indicates good reliability, and >0.9 indicates excellent reliability (Koo & Li, 2016 Figure 2). Fifteen out of the 18 participants had a stroke lesion that overlapped with the Tract Template, suggesting that most participants had some degree of direct damage to the CST. Despite this, we were able to successfully generate a descending CST in the lesioned hemisphere in all participants (Figure S1).
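The reliability statistics named in the statistical analysis can be sketched as follows. The ICC function implements the two-way, single measurement, absolute agreement form from the mean squares of a subjects-by-sessions table. The MDC95 formula used here (1.96 × √2 × SEM, with SEM = SD × √(1 − ICC)) is a commonly used formulation and is an assumption, since the text does not state the formula; taking SD as the standard deviation of all scores pooled across both days is likewise an assumption. Function names, the handling of values exactly on a category boundary, and the toy data are hypothetical.

```python
import numpy as np


def icc_a1(scores):
    """ICC for a two-way model, single measurement, absolute agreement.

    `scores` has shape (n_subjects, k_sessions), e.g. one column per day.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between sessions
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def rate_icc(icc):
    """Label an ICC using the cutoffs quoted in the text (boundaries assumed)."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


def mdc95(scores, icc):
    """MDC95 = 1.96 * sqrt(2) * SEM, with SEM = SD * sqrt(1 - ICC) (assumed)."""
    sd = np.asarray(scores, dtype=float).std(ddof=1)  # pooled SD (assumption)
    sem = sd * np.sqrt(1.0 - icc)
    return float(1.96 * np.sqrt(2.0) * sem)


if __name__ == "__main__":
    # Toy FA values for five hypothetical participants on Day 1 and Day 4.
    fa = np.array([[0.42, 0.43], [0.51, 0.50], [0.47, 0.48],
                   [0.39, 0.40], [0.55, 0.54]])
    icc = icc_a1(fa)
    print(icc, rate_icc(icc), mdc95(fa, icc))
```

Because the absolute agreement form includes the between-session mean square in the denominator, a systematic shift between Day 1 and Day 4 lowers the ICC even when participants keep the same rank order.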

| Test-retest reliability and minimal detectable change of measures of corticospinal tract integrity
Overall, test-retest reliability for DTI-derived measures of CST integrity ranged from moderate to excellent ( Additionally, removing these participants did not change the ratings (i.e., "good" or "excellent") for any of the FA ICC values.
MDC95 values for FA, MD, AD, and RD are presented in Table 2 and Figure 5. CST integrity is reliable and reproducible in nondisabled adults, including older individuals (Albi et al., 2017; Danielian et al., 2010; Fox et al., 2012; Heiervang et al., 2006; Kristo et al., 2013; Lin et al., 2013; Wakana et al., 2007). Our results showed good to excellent reliability of CST integrity in individuals with chronic stroke, similar to previous studies with smaller sample sizes that used different measurement approaches (Borich et al., 2012; Lin et al., 2013; Snow et al., 2016). In a study of individuals with chronic stroke, fiber assignment by continuous tracking (FACT; ICC = 0.59-0.90) had higher inter-rater reliability than FA measured by a cross-sectional manually drawn PLIC ROI (ICC = 0.37-0.71; Borich et al., 2012), suggesting that inter-rater reliability is higher for more automated approaches than for manual approaches. The current study, which used different approaches to quantify CST integrity, found a similar pattern: the more automated Tract Template and Probabilistic Tract approaches had higher test-retest reliability than the manual Cerebral Peduncle approach.
While the degree of automation may influence reliability, the specific tractography approach (deterministic vs. probabilistic) or diffusion data modeling approach (tensor-based vs. constrained spherical deconvolution [CSD]) may not. Our DTI-based Probabilistic Tract approach produced similar test-retest reliability for CST FA (ICC = 0.93-0.97) to a previous study using deterministic fiber tractography on diffusion data modeled using CSD (ICC = 0.89-1.00; Snow et al., 2016). Differences have been noted in some metrics (number of tracked fibers) between deterministic and probabilistic tractography (Bonilha et al., 2015) and other metrics (mean FA, number of tracts, tract volume) between tensor-based versus CSD diffusion modeling (Auriat, Borich, Snow, Wadden, & Boyd, 2015). However, the consistency of the ICC values between this study and the previous study by Snow et al. (2016) suggests that there are not large differences in test-retest reliability between deterministic versus probabilistic tractography or tensor-based versus CSD diffusion modeling for the CST after stroke. Importantly, this finding may be specific to the reliability of CST measurement (a tract that runs in a uniform direction) and is not a reflection of the validity of tractography approaches (deterministic vs. probabilistic) or diffusion modeling approaches (CSD vs. DTI). Overall, the results of the current study suggest that test-retest reliability of measures of integrity of the CST using a probabilistic tractography approach in individuals with chronic stroke is excellent.
In addition to evaluating FA, MD, AD, and RD, we also examined the reliability of the normalized tract integrity values (FA ratio and FA asymmetry) since these values are commonly related to measures of behavior in chronic stroke (Borich et al., 2012; Cassidy, Tran, Quinlan, & Cramer, 2018; Lindenberg et al., 2010; Stewart, Dewanjee, Shariff, & Cramer, 2016; Stewart et al., 2017; Stinear et al., 2007). Our study is the first to report MDC values for CST integrity. MDC reflects the responsiveness of a measure by estimating how much change is required to exceed the inherent noise or variability of the measurement (Portney & Watkins, 2008 Wan et al., 2014; Zheng & Schlaug, 2015 showing the strongest relationships to motor behavior (Archer, Patten, et al., 2017; Schaechter et al., 2009) CST integrity has been suggested as a biomarker of the motor system after stroke (Boyd et al., 2017). The reliability, validity, sensitivity, and specificity of measurements are fundamental to utilization of biomarkers (Milot & Cramer, 2008 found FA to be a reliable measure, supporting its use as a biomarker of the motor system after stroke. In general, our participants presented with variable lesion size, location, and level of motor impairment. The characteristics of the study population should be considered when interpreting and applying the ICCs and MDCs reported here. Stroke severity may impact the relationship between FA and behavior. While we did not have sufficient power to directly compare a mild/moderate impairment group (n = 13) to a severe impairment group (n = 5), previous studies suggest brain-behavior relationships differ based on motor severity (Feldman et al., 2018; Quinlan et al., 2018; Stewart et al., 2017). In addition, our participants were all in the chronic phase of stroke; the reliability of measures of CST integrity may be different in individuals in the acute or subacute phase of stroke recovery.
We took precautions to limit experimenter bias by blinding the researchers performing data analysis to the participant's identification and day of data collection. However, we cannot rule out potential bias from possible identification of the lesioned hemisphere during mask drawing, which would have been especially apparent in individuals with large lesions. In addition, the period of joystick task practice between imaging sessions may have impacted our results. While the amount of practice between scans was below the expected dosage to impact brain structure, the practice could have resulted in larger differences between test and retest measures, as well as lower reliability, if it did impact the structural integrity of the CST.
Finally, all data were extracted from each participant's native space. Therefore, researchers were not able to standardize the location of the ROI masks using MNI coordinates. While this likely introduced variability in the spatial location of masks, extracting diffusion data from native space was performed to limit the effect of the transformation process on data values.

| CONCLUSION
DTI-based measures of CST integrity showed good to excellent test-retest reliability in individuals in the chronic phase of stroke. More automated approaches (Probabilistic Tract and Tract Template approaches) were more reliable than manual approaches (Cerebral Peduncle approach). However, more research is needed to determine other important psychometric properties of white matter integrity measurements like sensitivity, specificity, and validity, especially considering the variable relationship between CST FA and motor impairment depending on the measurement approach. There is a tradeoff when selecting an approach for measuring CST integrity, where no single approach appears to be superior in reliability, correlation with motor impairment, amount of expertise required, and amount of time required. Examination of the psychometric properties of diffusion tensor measurements in larger populations will be important for identifying optimal processes for quantifying white matter integrity.

CONFLICT OF INTEREST
The authors have no known conflicts of interest to disclose.

DATA AVAILABILITY STATEMENT
The data used in this study are not publicly available but are stored by the principal investigator and are available from the corresponding author upon reasonable request.