Statistical inference in brain graphs using threshold‐free network‐based statistics

Abstract The description of brain networks as graphs where nodes represent different brain regions and edges represent a measure of connectivity between a pair of nodes is an increasingly used approach in neuroimaging research. The development of powerful methods for edge‐wise group‐level statistical inference in brain graphs while controlling for multiple‐testing associated false‐positive rates, however, remains a difficult task. In this study, we use simulated data to assess the properties of threshold‐free network‐based statistics (TFNBS). The TFNBS combines threshold‐free cluster enhancement, a method commonly used in voxel‐wise statistical inference, and network‐based statistic (NBS), which is frequently used for statistical analysis of brain graphs. Unlike the NBS, TFNBS generates edge‐wise significance values and does not require the a priori definition of a hard cluster‐defining threshold. Other test parameters, nonetheless, need to be set. We show that it is possible to find parameters that make TFNBS sensitive to strong and topologically clustered effects, while appropriately controlling false‐positive rates. Our results show that the TFNBS is an adequate technique for the statistical assessment of brain graphs.


| INTRODUCTION
The brain is a complex network, and its function is dependent on the interactions between distributed regions (Sporns, 2016). In parallel with the development of novel acquisition and analytic strategies, interest in studying functional and structural brain networks through neuroimaging has greatly increased in the last few years in an attempt to understand the organizational principles underlying normal brain functioning (Mill, Ito, & Cole, 2017;Misic & Sporns, 2016), and the alterations underlying neural deficits in different pathological processes (Pievani, de Haan, Wu, Seeley, & Frisoni, 2011;Shi & Toga, 2017). One novel approach made possible by these recent technical improvements is the comprehensive description of brain functional or structural connections in the brain, that is, the connectome (Fornito, The study of large data sets such as the human connectome requires systematic approaches that provide relevant and reproducible parameters. In this context, graph theory, describing brain networks as a set of nodes interconnected by edges (characterized by structural or functional connections), has been widely adopted as a strategy for the study of these highly complex networks (Fornito, Zalesky, & Breakspear, 2013). Summary measures derived from graph theory provide information about whole-brain or regional, node-centric, topological network organization (Rubinov & Sporns, 2010). The connectome can also be analyzed at the connection level. Connection-level group analyses have the desirable ability of describing local effects in the connectome, thus providing easily interpretable information that complements the more abstract graph-theoretical topological parameters (Zalesky, Fornito, & Bullmore, 2010). This is especially interesting considering that certain brain regions are vulnerable to specific disorders (Meyer, 1936), implying that some portions of the connectome will be more vulnerable to specific pathological mechanisms than other regions. Topographical patterns of structural and functional connectivity disruption have indeed been shown to be associated with clinical phenotypes in diseases such as frontotemporal dementia and Alzheimer's disease (Pievani, Filippini, van den Heuvel, Cappa, & Frisoni, 2014). Connection-wise approaches, however, are usually hindered by the high dimensionality of connectomics data sets. Conventional mass-univariate testing repeated for every connection may suffer from low statistical power if appropriate correction for the high number of tests is performed (Fornito et al., 2013;Meskaldji et al., 2013). Some neurobiological assumptions, however, may prove to be useful for the development of statistical methods for assessing brain graphs.
Recent evidence indicates that connectivity disruptions in brain disorders progress along specific networks (Greicius & Kimmel, 2012;Jones et al., 2016). Even localized brain pathologies lead to more global network alterations (Stam, 2014), as might be expected in an interconnected system. Focal neuronal dysfunction can lead to changes in the firing patterns of postsynaptic cells, possibly manifesting as downstream functional connectivity alterations. Neuronal degeneration, on the other hand, can lead to deafferentation of its target cells and loss of trophic support to presynaptic neurons (Reier & Velardo, 2003), which might manifest as downstream and upstream structural connectivity abnormalities.
In the context of neurodegenerative diseases, furthermore, it is currently believed that the accumulation of toxic, misfolded proteins in highly susceptible neurons is followed by the transneuronal spread to other vulnerable cells to which they are connected (Campbell et al., 2015;Fornito et al., 2015;Raj, Kuceyeski, & Weiner, 2012;Saxena & Caroni, 2011). Analytical approaches with enhanced sensitivity to effects involving topologically neighboring connections might combine the ability to detect biologically plausible pathological alterations, while minimizing the possibility of finding disconnected, potentially spurious results related to the noisy nature of MRI measures used for connectivity estimation (Bright & Kevin, 2017;Liu, 2016;Polders et al., 2011;Zalesky & Fornito, 2009). Different statistical methods have been developed to perform inference on brain graphs (Fornito et al., 2013;Meskaldji et al., 2013;Varoquaux & Craddock, 2013). One of the most frequently used edge-wise methods is the network-based statistic (NBS) (Zalesky et al., 2010). The NBS is designed to identify clustered effects in brain graphs, and has been used in different study settings (Ab os et al., 2017;Cocchi et al., 2012;Conti et al., 2017;McColgan et al., 2017;Rigon, Voss, Turkstra, Mutlu, & Duff, 2017;Roberts et al., 2017a). Specifically, the NBS is a technique that aims to identify connected components, consisting of neighboring edges that display statistical effects above a predetermined threshold (Zalesky et al., 2010). The statistical significance of the connected components found is then established by estimating the likelihood of finding connected components with equal or greater extent (i.e., number of edges comprised in the connected component) or intensity (also considering test statistic values in the component) by chance. In the presence of statistical effects spanning adjacent edges, the NBS tends to show improved statistical power when compared with mass-univariate testing controlling for multiple testing, while still providing control over the family-wise error (FWE) rate in the weak sense (Meskaldji et al., 2013;Zalesky et al., 2010).
In this study, we describe a technique called threshold-free network-based statistics (TFNBS), designed to assess the presence of statistical effects in brain graphs. The TFNBS combines the NBS with a method commonly used in voxel-wise analyses, threshold-free cluster enhancement (TFCE) (Smith & Nichols, 2009). TFCE is a standard cluster-defining approach employed by FSL (https://fsl.fmrib.ox.ac.uk/ fsl/), one of the main software packages in neuroimaging. TFNBS enhances effects occurring in neighboring network edges, accounting for the topological dependency in the data, while maintaining sensitivity to strong localized effects and providing adequate control of the FWE rates. Our objective in this study is to test the TFNBS algorithm using different types of simulated data and different test parameters, with the goal of providing principled and evidence-based criteria for parameter choice.

| M ET HOD S
To test the properties of the TFNBS algorithm, we use synthetic data containing different ground-truth effects and data containing only random noise. As detailed below, the initial topological structure of the ground-truth effects used was derived from the comparison between a group of healthy controls and a group of Parkinson's disease patients with mild cognitive impairment using resting-state functional connectivity. In contrast with the original implementation of the NBS, TFNBS obviates the need for the a priori definition of a "hard" component-defining threshold. Other parameters, however, need to be specified, as explained below.

| TFNBS general algorithm
The first step of the TFNBS algorithm, as in NBS, is the computation of individual subjects' symmetric N3N connectivity matrices M, where N is the total number of brain regions defined as nodes (Figure 1). Entry i, j in M describes the strength of structural or functional connection between nodes i and j, in a total of N3 N21 In the NBS, a cluster-defining threshold is then applied to Mstat.
Connections that survive this initial threshold are grouped with topologically neighboring suprathreshold edges into connected components.
In In the TFNBS, Mstat undergoes a transformation whereby the value of each edge is replaced by its TFNBS score ( Figure 1). As this score is determined by the strength of statistical effect (height) at this connection and by the heights of its neighboring edges, final TFNBS scores will be influenced by how topologically "clustered" This transformed-scores matrix is then summed across the third dimension, producing the N 3 N final TFNBS score matrix. The goal of this computation is to find the TFNBS-transformed score of edge i,j by solving where dh is the interval between thresholding steps. In our algorithm, parameter dh was set at a hundredth of the maximum value in Mstat.
By setting h0 to zero, therefore, Mstat undergoes 101 thresholding steps, whereas the number of steps in each Mstat-rand varies according to its maximum height.
This nonlinear enhancement procedure is analogous to the implementation of TFCE for voxel-wise analyses proposed by (Smith & Nichols, 2009 Through permutation testing, a p value can be ascribed to each entry in the TFNBS-enhanced matrix. At each iteration of the permutation procedure, subjects can be randomly relabeled (e.g., by reshuffling group membership), and the resulting raw statistics matrix Mstat-rand undergoes TFNBS scoring. The resulting p values can be corrected for multiple testing across the connectome (by comparing a connection's scores to the distribution of maximum scores across the matrix under the null hypothesis) or uncorrected (by comparing a connection's score to its own distribution of scores under the null).

| Parameter selection
For three-dimensional voxel-wise analyses using TFCE, the recommended E and H parameters are 0.5 and 2, respectively (Smith & Nichols, 2009). Values that are appropriate for smoothed, three-dimensional voxel matrices cannot be directly extrapolated to connectomics data, however.
Here we initially test a range of 13 E (0.125 and 0.25-3 at intervals of 0.25) and 20 H parameter values (0.25-5 at intervals of 0.25), in a total of 260 E/H combinations. These combinations were tested using synthetic data with three different effect sizes and contrast-to-noise ratios (CNR), and two different types of topological organization, as described in the section Initial parameter search below. The results of the assessment of sensitivity and specificity of these initial simulations were then used to select a narrower range of E and H values, subsequently used to test simulated subject groups containing signal with different topologies and variable effect sizes/CNR (Section 3.2), and data containing only random noise to assess the occurrence of false positives in the absence of ground-truth signals. These analyses were used to more accurately assess test properties such as sensitivity, specificity, number of false-positives, and FWE rates. Figure 2 illustrates the analysis pipeline followed, described in detail below.

| Ground-truth effects matrix
We initially generated a ground-truth matrix containing a pattern of connections defined as "altered," to be used in the simulated connectivity matrices. To make these ground-truth effects reflect a somewhat realistic topology, we performed univariate comparisons between a sample of Parkinson's disease patients with mild cognitive impairment (n 5 27) and a group of healthy controls (n 5 38), seen in a previous study by our group to have significant resting-state functional connectivity differences (Ab os et al., 2017). Data acquisition and image preprocessing were identical to those used in that study. FreeSurfer v5.1.0 (http://surfer.nmr.mgh. harvard.edu/) was used to divide the cerebral gray matter into 68 cortical and 14 subcortical regions based on the Desikan-Killiany atlas (Desikan et al., 2006). Entry i,j in a subject's 82 3 82 connectivity matrix was defined by the correlation coefficient between the first eigenvariates of the time series of voxels contained in regions i and j.
Two-tailed independent-samples t tests were then applied to each matrix entry, comparing the two subject groups. Connections that survived a liberal threshold of p < .003, uncorrected for multiple comparisons, were included in the ground-truth matrix. This matrix finally included 32 "altered" connections, divided into five connected components: the largest containing 20 connections (linked to 20 nodes), the second with seven connections (linked to eight nodes), the third with three connections (linked to four nodes), and the two smallest with one connection each ( Figure 3 shows the schematic representation of these components). These components were acyclic trees, that is, comprising linear and branching segments, but no cycles (i.e., the maximum number of paths of ground-truth edges between any pair of nodes was one).

| Simulated data
Three sets of synthetic data were generated: 1. Single-CNR matrices: In this step, three sets of 100 simulated subject groups were created, with each group composed of 200 "subjects" divided into 100 "healthy subjects" and 100 "disrupted connectivity subjects." Connectivity matrices were sized 82 3 82 and fully connected. Healthy subjects' matrices contained only noise (sampled from a normal distribution with mean X 5 0 and standard deviation r 5 1). For subjects with disrupted connectivity, matrices contained Gaussian noise ( X 5 0, r 5 1) in all connections except the 32 edges previously defined as ground-truth effects. These ground-truthsignal connections were set by randomly selecting from a normal distribution of values with X 5 20.5 (r 5 1) in the first set, X 5 20.75 (r 5 1) in the second set, and X 5 21 (r 5 1) in the third set. This procedure therefore produced effects with respective average CNR (as well as a Cohen's d effect size) of 0.5, 0.75, and 1.
The TFNBS algorithm is sensitive to signal extent and intensity, not to the topological characteristics of the components detected at each step. As such, the final TFNBS score is not primarily determined by whether the effects display cyclic or tree topologies. The presence of noise, however, might have a differential impact on the detection of effects that do not contain cycles; the random reduction in statistical effect of a central edge in a linear segment can "break" a component into two smaller ones. If a similar effect reduction affects an edge contained in a cycle, on the other hand, the component will have its extent reduced by one edge, but will otherwise retain its extension.
To assess the effect of signal organized in a topology containing cycles, we generated additional single-CNR matrices, with identical effect sizes, number of components, and component extents, but with Parkinson's disease and mild cognitive impairment were initially generated. Intergroup comparisons (two-tailed independent-samples Student's t test, a 5 0.003) were performed to define ground-truth effect edges (Panel 1). Panel 2: Actual intergroup differences displayed a tree topology; additional ground-truth connected components with identical number of edges but forming cliques were generated (cyclic topology). Panel 3: Simulated subject groups (consisting of 100 "normal" and 100 "reduced connectivity" subjects each) containing ground-truth effects (tree or cyclic topology) with single contrast-to-noise ratios (CNR 5 0.5, 0.75, or 1) were compared with TFNBS using 260 E/H parameter combinations. Panel 4: Simulated subject groups (100 "normal" vs 100 "reduced connectivity" subjects) containing ground-truth effects (tree or cyclic topology) with mixed CNR (0.5, 0.75, and 1) were compared with TFNBS using a subset of 44 E/H parameter combinations. Panel 5: Simulated subject groups (200 individual matrices containing random noise) were compared using TFNBS (44 E/H parameter combinations). Panel 6: Simulated subject groups (100 "normal" and 100 "reduced connectivity" subjects) were generated, each with a single ground-truth component with size ranging from 4 to 30 edges (at steps of 2), a single CNR (0.5 or 0.75), and either tree (

| Statistical testing
For statistical inference, permutation testing (3,000 permutations) was performed using the F statistic obtained through a general linear model (GLM). At each permutation, subjects' group membership was randomly reshuffled and the F statistic was computed by comparing the 100 subjects assigned to the "healthy" group and the 100 subjects assigned to the "disrupted connectivity" group. In the analysis of random-only data,

| Initial parameter search
This initial analysis was performed using the single-CNR matrices, and was designed to assess the impact of different E and H parameter combinations on sensitivity and specificity to statistical effects with different magnitudes. As expected, the presence of stronger effects was associated with higher sensitivity. Specificity also tended to be higher in data sets with larger effect sizes. Figure 4 shows the sensitivity and specificity of different E/H combinations in the three CNR levels tested using the tree ground-truth topology. Increasing H and decreasing E led to more conservative hypothesis testing, that is, higher specificity and lower sensitivity. A similar overall pattern was observed with the cyclic topology (Supporting Information, Figure 1

| Focused parameter assessment
In this step, the mixed-CNR matrices and random-noise matrices were analyzed with TFNBS using the 44 E/H parameter combinations tern. With this topology, overall differences between more liberal and more conservative E/H combinations tended to be smaller than with the cyclic topology. This was mainly due to lower sensitivities and higher specificities at higher E combined with lower H combinations.   Due to factors inherent to the type of data, the risk of family-wise errors is therefore high when true effects are present in noisy connectivity matrices. The probabilities given above for a fully-connected matrix, however, describe the "worst case scenario." When matrices are originally sparse or are thresholded prior to statistical testing, as in structural connectivity studies (Roberts, Perry, Roberts, Mitchell, & Breakspear, 2017b;Zalesky et al., 2016) and sometimes in functional connectivity assessments (Yang et al., 2017), the reduced network density would decrease the risk of false-positives. Even considering the probabilities described above, however, whenever any false-positives occur, the total number of false-positives should be small. As shown in Supporting Information, Figure 5, this was usually the case; except when E parameter 5 1, the average number of false-positives (in the comparisons that displayed at least one false-positive using the mixed-CNR matrices) tended to be below six for the tree topology and below two for the cyclic topology. For parameters with comparable sensitivities, the NBS (especially NBS-extent) tended to yield higher FWE rates were, respectively, highly conservative and highly liberal.

| D ISC USSION
In this study, we assess the test properties of the TFNBS algorithm using different parameters and in different types of simulated data.
TFNBS is a technique that enhances topologically clustered effects without requiring a component-defining threshold. Our results show that TFNBS is sensitive to statistical effects clustered into connected components and to strong isolated signals, and appears to be a suitable statistical approach for statistical inference in brain connectivity graphs.
BAGGIO ET AL.

| 2299
The widely used NBS is a technique designed to be sensitive to statistical effects that are organized into connected components in connectivity matrices (Zalesky et al., 2010). This aspect renders NBS more powerful than edge-wise mass-univariate analyses followed by correction for multiple testing (such as control of the false-discovery rate; Benjamini, 2010), provided that effects are clustered (Zalesky et al., 2010). As shown in this study, the test properties of NBS are strongly dependent on the a priori component-defining threshold, which is often chosen arbitrarily. Also, because inference is performed at the component level, edge-wise significance levels are not obtained.
NBS is therefore analogous to cluster-based thresholding as used in voxel-based neuroimaging analyses. The TFNBS, on the other hand, is analogous to TFCE. Similar to TFCE (Smith & Nichols, 2009), the TFNBS has the advantage of producing point-wise p values, and preserving local maxima-retaining finer topological information than the NBS-without the need to set a component-defining threshold.
Nonetheless, the adequacy of the TFNBS for assessing brain In the last few years, the issue of replicability and reproducibility in neuroimaging studies has gained considerable and justified interest (Poldrack et al., 2017). In this context, the adoption of statistical methods that maximize both sensitivity and specificity is critical (Bennett, Wolford, & Miller, 2009 is a caveat to the sensitivity of the TFNBS to isolated effects. Brain connectivity graphs-either functional or structural-are inherently noisy. Low signal-to-noise ratios may lead to strong group effects arising in a few isolated connections by chance. If TFNBS is sensitive to such spurious effects, it might be vulnerable to false-positive findings.
Analysis of matrices containing only random noise, however, showed that the TFNBS offers adequate control of the FWE rate when no true effects are present.
Analyzing data containing mixed-CNR ground-truth effects plus random noise, for the range of high-specificity test parameters assessed, TFNBS and NBS FWE rates were similar. Nonetheless, the TFNBS showed higher specificity and lower FWE rates than the NBS for parameters with comparable sensitivity. This was especially true when compared with NBS-extent, but also with NBS-intensity in the case of the tree topology. These findings indicate that, as expected, the presence of a noisy background does lead to false-positives; nonetheless, for parameters with similar sensitivities, the "signal bleeding" potentially caused by the TFNBS procedure does not increase the FWE rate or reduce the test specificity compared with the NBS.
Regarding the effects of topology on TFNBS and NBS test properties, the higher specificities and lower FWE rates observed with the cyclic topology emphasize the relationship between the number of nodes linked to "actual" effects and the occurrence of false positives, as described above. The spatial specificity of the TFNBS can therefore be reduced if effects involve a high proportion of network nodes, especially if fully connected matrices are used. Nonetheless, the edge-wise p values produced with the TFNBS allow exploring the data a posteriori as information regarding the relative significance of effects is retained in the statistical significance matrices.
One limitation of this study is the fact that edge values in the simulated connectivity matrices were selected randomly from a normal distribution. These networks therefore do not replicate the topological properties of actual human functional connectivity matrices such as transitivity, modular structure, or small-world and scale-free characteristics (Stam, 2014;Zalesky, Fornito, & Bullmore, 2012). Considering that we assessed the effect of randomly distributed noise added to the connectivity matrices, the underlying topology is unlikely to have an impact on the results of the simulated data set analyses. Simulated data incorporating organized noise, nonetheless, might provide useful information regarding its effects on the sensitivity to ground-truth effects. Additionally, the inclusion of other methods for statistical inference on brain graphs, although outside the scope of this study, might have provided a more thorough comparative assessment of currently available techniques.
In summary, we have demonstrated that the TFNBS algorithm can be a valid approach for performing statistical inference on brain graphs.
Comparisons with the method it is based on, the NBS, reveal that the TFNBS may display statistical power similar to the NBS-intensity vari- AA was supported by an FI-DGR grant. BS, HCB, AIGD, CJ, YC, MJM, and FV report no disclosures.

ACKNOWLEDGMENT
This study was sponsored by the Ministerio de Economía y Competi-