Replicable brain signatures of emotional bias and memory based on diffusion kurtosis imaging of white matter tracts

Abstract Diffusion MRI (dMRI) is sensitive to anisotropic diffusion within bundles of nerve axons and can be used to make objective measurements of brain networks. Many brain disorders are now recognised as being caused by network dysfunction or are secondarily associated with changes in networks. There is therefore great potential in using dMRI measures that reflect network integrity as a future clinical tool to help manage these conditions. Here, we used dMRI to identify replicable, robust and objective markers that meaningfully reflect cognitive and emotional performance. Using diffusion kurtosis analysis and a battery of cognitive and emotional tests, we demonstrated strong relationships between white matter structure across networks of anatomically and functionally specific brain regions with both emotional bias and emotional memory performance in a large healthy cohort. When the connectivity of these regions was examined using diffusion tractography, the terminations of the identified tracts overlapped precisely with cortical loci relating to these domains, drawn from an independent spatial meta‐analysis of available functional neuroimaging literature. The association with emotional bias was then replicated using an independently acquired healthy cohort drawn from the Human Connectome Project. These results demonstrate that, even in healthy individuals, white matter dMRI structural features underpin important cognitive and emotional functions. Our robust cross‐correlation and replication support the potential of structural brain biomarkers from diffusion kurtosis MRI to characterise early neurological changes and risk in individuals with a reduced threshold for cognitive dysfunction, with further testing required to demonstrate clinical utility.

would help identify at-risk individuals, facilitate the development of preventative therapies, and enable early intervention. Unfortunately, psychometric testing is insensitive to early changes in cognition or emotional dysfunction that may precede any formal diagnosis. As such, there is a great clinical need for objective measures that reflect the brain changes that drive cognitive or emotional dysfunction. Many brain disorders are now recognised as being caused by (or are secondarily associated with) network dysfunction; hence, there is great interest in using diffusion MRI (dMRI) as a tool to measure these network changes. dMRI is sensitive to anisotropic diffusion within bundles of nerve axons (Beaulieu, 2002) and can be used to make objective measurements of brain networks (Greicius, Supekar, Menon, & Dougherty, 2009; Mori, Crain, Chacko, & van Zijl, 1999). Recent work shows that disruption to brain connectivity networks at multiple scales is related to emotional and cognitive (dys)function; for example, with dMRI (Fornito, Zalesky, & Breakspear, 2015; Gong & He, 2015; Petersen & Sporns, 2015) and, in combination with functional connectivity using fMRI, multimodal imaging markers of cognition (Qi et al., 2018; Sui et al., 2018). Despite this potential and many years of development, high resolution forms of dMRI remain a research tool (i.e., high angular resolution diffusion imaging [Descoteaux, 1999] acquisition schemes or other forms of multi-shell dMRI), with the only widespread, routine, substantive clinical application being limited usage for surgical planning (Fernandez-Miranda et al., 2008). The goal of this study was to use high angular resolution dMRI to create a replicable, robust and objective marker of brain changes relating directly to cognitive and emotional functions.
Despite a considerable volume of research, there is still no meaningful clinical usage of multi-shell dMRI data, such as that collected by large-scale initiatives including the Alzheimer's Disease Neuroimaging Initiative and the Human Connectome Project (HCP). In the clinic, the effects of acute or chronic brain injuries are measured by radiologists using "conventional" MRI techniques such as proton-density, T2-weighted and T1-weighted imaging, with observations usually limited to qualitative grading of imaging features (i.e., normal, mild, moderate, or severe). dMRI is part of this routine clinical imaging battery but is restricted to diffusion weighted imaging (DWI), usually calculated as the trace from an axial low angular resolution single shell dMRI dataset. Probably the most valuable clinical example of "conventional" MRI is the routine use of DWI to detect ischaemic infarction, where lesions with restricted diffusion (low ADC value) correspond quite precisely to a pathologically accurate diagnosis of focal infarction. While this is an extremely important technique, it makes no use of the higher order information available in a dMRI dataset and is only really useful for detecting a stroke for a period of 2-3 weeks following the infarction. Many brain injuries do not confer observable signs or symptoms by conventional clinical means. Even when imaging is performed, this is often using computed tomography (CT) or conventional MRI. Scans in such cases are most often reported as "normal," contrasting with the emerging consensus view that there is often (non-visualised) ultrastructural damage. Indeed, the percentage of abnormal CT scans following head injury is approximately 10% (Pandor et al., 2011), despite post-injury cognitive or emotional dysfunction in these cases occurring chronically at reported rates between 25 and 65% (Dischinger, Ryb, Kufera, & Auman, 2009; Kreutzer, Seel, & Gourley, 2001; Seel et al., 2003).
Even in the presence of visible changes such as small embolic foci of restricted diffusion following surgical cases, the DWI change does not accurately predict these outcomes, likely because the extent of underlying damage is not strongly correlated to these "positive" imaging signs (Indja, Woldendorp, Vallely, & Grieve, 2019).
An emerging technique being applied to investigate brain structure is diffusion kurtosis imaging (DKI). DKI advances the conventional model of dMRI by accounting for deviations from the normative Gaussian pattern of diffusion (Steven, Zhuo, & Melhem, 2013; Wu & Cheung, 2010). DKI-based metrics are therefore independent from diffusion tensor-based measures and reflect heterogeneity of the tissue. DKI has been shown to improve the sensitivity and specificity of diffusion measurements in a range of diseases, including concussion (Lancaster et al., 2016), motor neurone disease and depression (Kamiya et al., 2018). The improved ability of DKI to detect variation in brain diffusion characteristics may enable new effective imaging markers.
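For orientation, the deviation from Gaussian diffusion is commonly written as a second-order term in the log-signal. The expression below is the standard DKI signal representation from the general literature, not a formula given in this article; the symbols $S_0$, $D_{\mathrm{app}}$ and $K_{\mathrm{app}}$ (signal without diffusion weighting, apparent diffusivity and apparent kurtosis along one gradient direction) are introduced here for illustration only.

```latex
\ln S(b) \;=\; \ln S_0 \;-\; b\,D_{\mathrm{app}} \;+\; \tfrac{1}{6}\,b^{2}\,D_{\mathrm{app}}^{2}\,K_{\mathrm{app}}
```

Setting $K_{\mathrm{app}} = 0$ recovers the mono-exponential (Gaussian) model that underlies conventional diffusion tensor measures, which is why estimating the kurtosis term needs more than one non-zero b-value (multi-shell data).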
Here, we test the spatial covariance of cognitive and emotional domains with brain structure in a large healthy population using DKI.
Using these data, we sought to form robust signatures of the structures that underpin normal cognitive and emotional function. We reasoned that such signatures may prove to be useful as a tool to identify patients with an increased vulnerability to neurological injury, early disease processes, or potentially to measure functionally meaningful changes secondary to sub-clinical brain injury. We tested the replicability and consistency of these structural brain signatures in independent datasets since any potential clinical application mandates robust and repeatable metrics.

| Participants
To test the robustness of our findings, we developed our analyses in a "Discovery Cohort" and attempted to replicate the key outcomes in a "Replication Cohort." Our Discovery Cohort comprised 203 healthy individuals drawn from the Chronic Diseases Connectome Project (CDCP). Participants were free from psychiatric or neurological diagnoses, and we included all ages, genders and levels of education in the study. Written informed consent was obtained, and the study had institutional ethics board approval (Macquarie Medical Imaging; 5201500943). The Replication Cohort comprised 1,064 healthy subjects from the HCP (Young Adult Cohort; [Van Essen et al., 2013]).

| Cognitive and emotional testing
Neurocognitive testing was performed using the computer-based WebNeuro battery (Brain Resource Inc., Sydney, New South Wales, Australia; Silverstein et al., 2007). This validated battery of tests reports on 15 cognitive and emotional sub-scores derived from multiple tasks: negativity bias, emotional resilience, social skills, depression, anxiety, stress, motor tapping, impulsivity, attention, information processing, memory recognition, executive function, verbal interference, emotional identification and emotional bias.
Fractional anisotropy data were then aligned to a common space (the MNI152 average brain; Mazziotta, Toga, Evans, Fox, & Lancaster, 1995) using the nonlinear registration tool, FNIRT (Andersson, Jenkinson, & Smith, 2007). We performed tract-based analysis of diffusion kurtosis scalars and their covariance with each cognitive component using tract-based spatial statistics (TBSS; Smith et al., 2006).
In short, a mean FA image was created and thresholded using default settings (0.2) to create a mean FA "skeleton," which represents the centres of all tracts common to the group. Each subject's aligned diffusion and kurtosis data were projected onto this skeleton and the resulting data fed into voxelwise cross-subject statistics.
T1-weighted images were analysed using voxel-based morphometry (VBM; Douaud et al., 2007). The grey matter was segmented from these images using FAST (Andersson et al., 2007) and, as above, non-brain tissues were removed before registering the images to the MNI brain. A study-specific, symmetric template was created by averaging these images and flipping them along the x-axis. Then, all grey matter images were nonlinearly registered to the template and corrected for local expansion or contraction from the nonlinear component of the registration. The corrected grey matter images were then smoothed with an isotropic Gaussian kernel with a sigma of 3 mm.
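Smoothing kernels are often quoted as full width at half maximum (FWHM) rather than sigma. The conversion below uses the standard Gaussian identity; the resulting FWHM is an added illustration and is not a value reported in the source.

```latex
\mathrm{FWHM} \;=\; 2\sqrt{2\ln 2}\;\sigma \;\approx\; 2.355 \times 3\ \mathrm{mm} \;\approx\; 7.1\ \mathrm{mm}
```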

| Statistics
The significance threshold for all tests was set at 0.05. We initially excluded subjects who were outliers in any cognitive test (±10 SDs relative to the WebNeuro normative cohort [n > 10,000], Silverstein et al., 2007) or who had missing or incomplete cognitive or imaging data.
Raw neurocognitive test scores were transformed into Z-scores.
Where multiple measures were reported for a single test (e.g., the continuous performance test consisted of response time plus number of errors) a composite score was calculated, as was the case where multiple tests measured a single neurocognitive domain, based on previous work by Goodkind et al. (2015). This resulted in 15 individual scores which underwent further data reduction using principal component analysis with a varimax rotation. One-sample t-tests employing the WebNeuro normative cohort were used to test whether subjects were cognitively normal.
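The data reduction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (`zscore`, `pca_loadings`, `varimax`) are hypothetical, the PCA is run on the correlation matrix, and the varimax routine is the commonly used SVD-based iteration.

```python
import numpy as np

def zscore(raw, norm_mean, norm_sd):
    """Standardise raw scores against a normative mean and SD (assumed inputs)."""
    return (np.asarray(raw, dtype=float) - norm_mean) / norm_sd

def pca_loadings(Z, n_components):
    """PCA on the correlation matrix of Z (subjects x scores).

    Returns component loadings and the fraction of total variance explained.
    """
    corr = np.corrcoef(Z, rowvar=False)
    evals, evecs = np.linalg.eigh(corr)            # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]  # keep the largest components
    loadings = evecs[:, order] * np.sqrt(evals[order])
    explained = evals[order].sum() / evals.sum()
    return loadings, explained

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (scores x components) loading matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient of the varimax criterion with respect to the rotation
        G = L.T @ (Lr ** 3 - Lr * (Lr ** 2).sum(axis=0) / p)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt                                  # nearest orthogonal matrix
        d = s.sum()
        if d_old != 0.0 and d / d_old < 1.0 + tol:
            break
        d_old = d
    return L @ R, R
```

Because the rotation is orthogonal, it redistributes variance between components without changing how much of each score the retained components explain jointly, which is why rotated components tend to be easier to name from their loadings.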
For TBSS and VBM analyses, a voxelwise general linear model was applied, using permutation-based non-parametric testing in randomise/PALM (Winkler, Ridgway, Webster, Smith, & Nichols, 2014). Across space, the family-wise error rate was controlled, and threshold-free cluster enhancement applied to account for multiple comparisons. The family-wise error rate was also controlled across contrasts (Winkler et al., 2016).
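The idea behind permutation-based family-wise error control can be sketched with a maximum-statistic null distribution. This is a deliberately simplified illustration, not the randomise/PALM implementation: it assumes a single regressor with no nuisance covariates, omits threshold-free cluster enhancement, and uses hypothetical function names.

```python
import numpy as np

def t_stats(Y, x):
    """Voxelwise t-statistic for the slope of Y[:, v] regressed on x.

    Y is (subjects x voxels); x is (subjects,).
    """
    n = len(x)
    xc = x - x.mean()
    Yc = Y - Y.mean(axis=0)
    r = (xc @ Yc) / (np.sqrt((xc ** 2).sum()) * np.sqrt((Yc ** 2).sum(axis=0)))
    return r * np.sqrt((n - 2) / (1 - r ** 2))

def max_stat_fwe(Y, x, n_perm=1000, seed=0):
    """FWE-corrected two-sided p-values from the permutation maximum statistic."""
    rng = np.random.default_rng(seed)
    observed = t_stats(Y, x)
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling x breaks any subject-wise link between x and Y
        max_null[i] = np.abs(t_stats(Y, rng.permutation(x))).max()
    # Proportion of permutation maxima at least as extreme as each observed |t|,
    # counting the unpermuted data as one member of the null set
    exceed = (max_null[None, :] >= np.abs(observed)[:, None]).sum(axis=1)
    return observed, (1 + exceed) / (n_perm + 1)
```

Comparing every voxel against the distribution of the image-wide maximum is what controls the error rate across space: a voxel is only declared significant if its statistic is unusual relative to the most extreme value expected anywhere under the null.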

| Construction and validation of structural brain signatures
Based on the results of the above analyses, we used the Discovery Cohort to combine the best identified features to form potential "structural brain signatures" sensitive to functional measures. To define the signatures, we combined the significant findings from the tract-based analyses of the emotional bias and emotional memory components: at the local peak coordinates of the clusters which were strongly significant (p < .01), we extracted the mean value in each cluster and multiplied them together. Linear regression of this score against the cognitive and emotional measures was used to test sensitivity.
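One reading of the signature construction above is sketched below: take the mean diffusion metric within each significant cluster for every subject, multiply the cluster means together, and regress the product on the component score. The array layout, function names and the choice of simple least squares are assumptions made for illustration; the source does not specify these details.

```python
import numpy as np

def signature_scores(metric_maps, cluster_masks):
    """Per-subject signature: product of mean metric values across clusters.

    metric_maps: (subjects x voxels) array of one diffusion metric.
    cluster_masks: list of boolean voxel masks, one per cluster (hypothetical).
    """
    means = np.column_stack([metric_maps[:, m].mean(axis=1) for m in cluster_masks])
    return means.prod(axis=1)

def simple_regression(x, y):
    """Least-squares fit of y = a + b*x; returns the slope b and R squared."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return beta[1], r2
```

A usage pattern would be `slope, r2 = simple_regression(signature_scores(maps, masks), component_scores)`, where `maps`, `masks` and `component_scores` are placeholders for skeletonised metric data, cluster masks and principal component scores.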
A "sham signature" was also created using the same methods from the left corticospinal tract, an anatomically proximal white matter tract with a low probability of structure-function correlation. This was used to test specificity (anatomically and pertaining to the cognitive components). A mask was placed along the corticospinal tract from the level of the cerebral peduncles to the precentral gyrus white matter and transformed to the individual subjects' space.
These voxels were identified from probabilistic diffusion tractography performed in a single high resolution diffusion dataset gathered using a head-only MRI scanner with high-performance gradients (Foo et al., 2018) and registered to the individual subjects' brain images (Callaghan et al., 2018; Maller et al., 2019). To form the sham signatures, the relevant mean diffusion metrics were extracted from these regions and multiplied together, as described above (Supporting Information S1).

| Replication analysis
Emotional bias and recognition were assessed in the Replication Cohort using the Penn Emotion Recognition Test (ER40; number of correct responses; Gur et al., 2001), which we interpreted as an analogue to our emotional bias and emotional memory components.
The same initial processing steps as in the Discovery Cohort were followed (estimation of diffusion and kurtosis scalars, removal of non-brain tissues, registration to a standard space) but using the TBSS skeleton from the Discovery Cohort for consistency. Then, the structural brain signatures identified in the Discovery Cohort were applied to obtain a score for each signature and linear regression was performed to test whether the same finding could be replicated.

| Demographics and cognition
Characteristics of the Discovery Cohort are shown in Table 1. After the removal of extreme outliers in cognition (n = 20) and subjects with incomplete data (n = 11), 172 subjects remained for full analysis.
There was a wide spread of ages centred on 40 years (median; 27-53 interquartile range) and a slight female gender bias (60%; Figure 1).
Our sample was considered normal for cognition and emotion, as demonstrated by the distributions of age-, gender- and education-corrected scores (one-sample t-tests: p > .05; Supporting Information S2).
The cognitive and emotional data were used to derive five principal components which, together, explained 66% of total variance.
Based on their component loadings (Supporting Information S3), we named these "negative affect," "emotional bias," "executive," "emotional memory" and "verbal," and used them as the independent variables for the following spatial analyses of brain structure.

Note: Significance was determined by a voxelwise p-value <.05 (corrected for the family-wise error rate within clusters formed using a threshold-free approach [Smith & Nichols, 2009]). "+" indicates a direct relationship, that is, a higher diffusion metric relating to a higher component score and, vice-versa, "−" indicates an inverse relationship. Abbreviations: FA, fractional anisotropy; KFH, kurtosis FA; MD, mean diffusivity; MK, mean kurtosis.
weighted for executive performance in a maze task and attentional switching, and negatively weighted for memory. The emotional memory component was weighted toward both memory and emotional resilience and might therefore be specific to emotional memory.
Lastly, the verbal component was primarily weighted toward performance in a verbal interference test. The negative affect component was inverted so that higher values reflected better performance.

| Voxelwise analysis of diffusion kurtosis scalars
In the tract-based analysis of diffusion kurtosis scalars and their covariance with each of the five cognitive-emotional components, we found two main significant clusters across multiple diffusion and kurtosis scalars for two components: emotional bias and emotional memory (Table 2; Figure 2). As a confirmatory test, we performed a similar analysis including all intracranial voxels (again, using threshold-free cluster enhancement and p < .05). In addition to the clusters identified in the tract-based analysis described above, we found one cluster reflecting grey matter volume. (Figure 3).

| Voxel-based morphometry
We detected no clusters of significant voxels representing covariance of the cognitive-emotional components with grey matter volume.
Inspection of the non-thresholded t-statistic maps revealed a subthreshold trend (t > 2.5) in the negative affect component for greater grey matter volumes symmetrically in the hippocampi (Supporting Information S7).

| Specificity of the structural brain signatures
To demonstrate selectivity of the two signatures to their respective cognitive components, we repeated the regression analyses using each of the other cognitive components. Regression analyses of the emotional bias signature using the negative affect, executive function,
TABLE 3 Information for significant clusters identified in the tract-based spatial statistics and whole-brain spatial statistics analyses

| Replicability of the structural brain signatures
We then tested whether the same associations between our structural brain signature and emotional bias were replicable in a second, independent, large, healthy Replication Cohort. This cohort had a median age of 29 years (6-year interquartile range) and a slight female bias (54% female). The ER40 correct responses score (reflecting emotional bias) distribution was negatively skewed (skewness: −0.88; Shapiro-Wilk test: W = 0.94, p < .01), with a median of 36.0 and an interquartile range of 3.0. Linear regression of the emotional bias signature against the ER40 score, age, gender and education revealed a modest but significant direct effect in the same direction as was observed in the Discovery Cohort (F[3,1,061] = 27.28, R² = .07, p < .001; age, gender, education not selected). Residuals were approximately normally distributed. We did not test for replicability of the emotional memory component because there were no comparable cognitive data (i.e., testing emotional resilience) available for the Replication Cohort.

| DISCUSSION
The clinical applications of MRI are predominantly limited to the diagnosis of acute injuries or gross physical abnormalities. Even in cases where "changes" are detectable using conventional MRI, these data are not easily quantifiable or predictive of progression to formal cognitive decline or mood disorder. New MRI acquisition and analysis techniques can measure brain networks, so have some potential to change this. In particular, dMRI is the focus of much research, including the HCP which has, to date, generated more than 340 papers from members of the HCP consortium alone (Van Essen & Glasser, 2016).
FIGURE 3 Probabilistic diffusion tractography of the identified tracts. In blue are the probabilistic streamlines generated using the significant clusters as a seed region. The orange regions are gathered from automated NeuroSynth meta-analyses of functional MRI studies of emotional memory or emotional bias. The subfigures are as follows: (a, b, c) medial prefrontal cortex emotional bias-related cluster with intersecting tracks bilaterally mostly via the corpus callosum superior genu in order of coronal, sagittal and axial views, (d, e, f) hippocampal emotional bias clusters intersected in the right hemisphere by tracks primarily from the uncinate and inferior longitudinal fasciculi in order of axial, coronal and sagittal views, (g, h) left middle temporal gyrus emotional memory cluster intersected by tracks from the inferior longitudinal fasciculus from axial and sagittal views, and (i) left hippocampus-amygdala emotional memory cluster intersected by inferior longitudinal fasciculus tracks in a sagittal view.
Despite this activity, there is currently no clinically meaningful use of high resolution dMRI. In this article, we aimed to develop objective, replicable dMRI measurements of brain tissue diffusion that predict brain function and that might therefore form candidate markers for objective measurements of brain integrity with further testing in the future.
We applied DKI, a promising dMRI analytical approach sensitive to brain ultrastructure, to map how cognitive function relates to brain structure. Our analysis reflects the most comprehensive and highly powered analysis of this type to date. Our key findings were: (a) the identification of two structural brain signatures which are sensitive and specific to emotional bias and emotional memory; (b) demonstration that these signatures involve white matter pathways, which closely relate to functionally relevant cortical regions linked to these emotional domains; and (c) replication of the primary finding in a second large, independently acquired normal cohort. In identifying associations between microstructural white matter changes and normal variations in neurocognitive function, we show new evidence to support diffusion tensor and kurtosis techniques forming a basis for developing sensitive and specific biomarkers that can readily identify and characterise structurally based brain dysfunction, such as may occur after injury, or in early psychiatric or neurological diseases. Our data were derived from healthy cohorts, and the most realistic potential application is, therefore, as a test to objectively quantify injury and identify subjects who may be more vulnerable to future emotional and cognitive dysfunction.
Further work is clearly required to demonstrate utility in the setting of injury or disease.
The identified structural brain signatures were derived from two main clusters, the first of which related to emotional bias and was located throughout the frontal lobe, as well as extending to the parietal and temporal lobes. Emotional dysregulation associated with the frontal lobe (Phillips et al., 2003) is a frequent morbidity observed in many neuropathological states, such as dementia (Goodkind, Gyurak, McCarthy, Miller, & Levenson, 2010), traumatic brain injury (van der Horn, Liemburg, Aleman, Spikman, & van der Naalt, 2016) and surgical brain injury (Indja et al., 2017). Mood disorders and emotional dysfunction are extremely difficult aspects of such conditions to manage and contribute significantly to impaired quality of life. In our data, the emotional bias score was negatively related to the FA and kurtosis-based metrics, and positively related to the mean and radial diffusivity metrics across the frontal lobe clusters. The direction of this effect suggests that better performance in our emotional tests (i.e., less bias toward negative emotions) corresponds to greater diffusivity, reduced tissue integrity and reduced cellular complexity in these specific regions (Steven et al., 2013). The neurobiological mechanisms governing these effects may relate to the structural integrity of myelin, a theory supported by previous studies of cognitive decline in normal healthy individuals (Madden et al., 2012).
The second cluster was related to emotion and memory and was located along the left medial temporal lobe and left posterior limb of the internal capsule. In contrast to the emotional bias cluster, the direction of the effect was reversed for the relationships between emotional memory and MD, axial diffusivity and radial diffusivity: better memory performance corresponded to lower diffusivity, consistent with a greater density of membranes. The role of the medial temporal lobe and hippocampus in memory has been studied extensively; and, in dementia, there is strong evidence of an association with grey matter atrophy (Zakzanis, Graham, & Campbell, 2003). Whilst hippocampal atrophy occurs early, in the case of the medial temporal lobe, grey matter atrophy is often not seen until disease is well established (Zakzanis et al., 2003). The lack of any significant association between grey matter volume and cognition is not surprising given that grey matter volume change is a gross, non-specific measure which is impacted by many complex factors. Such atrophy is typically regarded as a downstream consequence of neurological and psychiatric dysfunction, whereas white matter changes are hypothesised to be recognisable early in a disease's course (Agosta et al., 2011; Zhuang et al., 2013). This has been demonstrated in the superior ability of the volume of white matter hyperintensities to predict the onset of Alzheimer's disease over hippocampal atrophy (Brickman et al., 2012). Diffusion tensor imaging has also been shown to estimate the presence and timing of traumatic brain injury more accurately than traditional imaging techniques (Mac Donald, Dikranian, Bayly, Holtzman, & Brody, 2007).
Following identification and validation of these emotion-related clusters, we showed that our finding regarding the relationship between emotional bias and diffusion kurtosis characteristics of the matching cluster was replicable in a second, larger cohort. The need for replication of neuroimaging findings has come to the forefront in recent years (Poldrack et al., 2017), spurred by concerns about reliability in psychology research (Simmons, Nelson, & Simonsohn, 2011) and concerns about false positives in fMRI studies (Eklund, Nichols, & Knutsson, 2016). This is exacerbated by the cost of data acquisition and the wide variety and complexity of analyses in the neuroimaging field. In response, practical guidelines for replication in neuroimaging studies have been published (Bakken, 2019; Gorgolewski & Poldrack, 2016), journals have accommodated replication studies (Picciotto, 2018), for example, the Human Brain Mapping Replication Award and the creation of a replication category in NeuroImage: Clinical (Fletcher & Grafton, 2013), and an educational course to teach computational reproducibility has been trialled (Millman, Brett, Barnowski, & Poline, 2018). Successful application of replication analysis principles has provided key advances in the neuroimaging of speech perception (Evans, 2017). We attempted to apply the same principles throughout our analysis by performing a large-scale, powered replication analysis of the key finding, with full communication of methods and availability of the data used.
One limitation of this study is that our analysis only included healthy individuals (i.e., those who are assumed not to have subclinical or clinical brain damage), limiting the generalisation of our findings to clinical cohorts. There is a paucity of available high-quality imaging data in this vulnerable population. A second limitation is that our Replication Cohort did not closely match the Discovery Cohort in terms of age, with a difference in medians of 10 years; the two cohorts also used different tests of emotional function. This was due to the limited availability of datasets which have both multi-shell diffusion data and detailed neurocognitive testing. While the regression model in the Replication Cohort was still significant, the magnitude of the regression slope was smaller than that observed in the Discovery Cohort. However, this is not unexpected: in psychology, effect sizes reported in replication studies are generally half of those reported in original studies (Open Science Collaboration, 2015; van Aert & van Assen, 2018). Further, the significant replication of results, despite the age mismatch between the Discovery and Replication cohorts, may suggest that the observed effects are robust to differences in age. Our approach using a specific, pre-defined composite cluster was not subject to the "model degrees of freedom" which purportedly have driven false-positive replications in neuroimaging (Hong, Yoo, Han, Wager, & Woo, 2019). Last, our study did not make use of multivariate data mining or machine learning approaches, which are gaining popularity in prediction studies and show high levels of accuracy (Benedict et al., 2004; Dyrba et al., 2015; Moradi, Pepe, Gaser, Huttunen, & Tohka, 2015).

| CONCLUSION
We identified structural brain signatures of white matter structure which are sensitive to and specific for cognitive function. The tracts involved are both anatomically precise and correspond to functional hubs derived from a meta-analysis of all available functional neuroimaging literature. The association of our emotional performance signature was replicated in a large independently acquired cohort. These results provide convergent evidence of a significant structural contribution to emotional and cognitive performance, even in normal healthy people. While our findings raise the possibility that MRI may, with further testing, be able to provide objective and quantifiable indicators of subclinical brain changes (e.g., in psychiatric illness, concussion or early dementia), there is a long way to go in bringing dMRI to clinical utility. Developing such markers could be a useful first step toward transforming dMRI into a clinical tool to track brain changes, relating directly to function, in a way that matters to patients.

CONFLICT OF INTEREST
The authors report no conflicts of interest.

DATA AVAILABILITY STATEMENT
The HCP Young Adult Cohort data are available at https://www.humanconnectome.org. Our analysis scripts are available at https://github.com/t-welton/structural_biomarker. Statistical maps from TBSS and VBM analyses were uploaded to https://neurovault.org/.