The impact of graph construction scheme and community detection algorithm on the repeatability of community and hub identification in structural brain networks

Abstract A critical question in network neuroscience is how nodes cluster together to form communities, which constitute the mesoscale organisation of the brain. Various algorithms have been proposed for identifying such communities, each identifying different communities within the same network. Here, using test-retest data from the Human Connectome Project, the repeatability of thirty-three community detection algorithms, each paired with seven different graph construction schemes, was assessed. Repeatability of community partition depended heavily on both the community detection algorithm and graph construction scheme. Hard community detection algorithms (in which each node is assigned to only one community) outperformed soft ones (in which each node can belong to more than one community). The highest repeatability was observed for the fast multi-scale community detection algorithm paired with a graph construction scheme that combines nine white matter metrics. This pair also gave the highest similarity between representative group community affiliation and individual community affiliation. Connector hubs had higher repeatability than provincial hubs. Our results provide a workflow for repeatable identification of structural brain network communities, based on the optimal pairing of community detection algorithm and graph construction scheme.


| INTRODUCTION
The human brain can be modelled as a network (Bassett & Sporns, 2017) and summarised as a graph. In structural networks, the nodes of the graph are small volumes of tissue which are interconnected via white matter tracts (edges). Graph theory can provide novel insights into healthy human brain function (Braun et al., 2015) and its alteration in various diseases (Aerts, Fias, & been proposed, involving tractography with different algorithms and assigning edge weights using different diffusion MRI-based metrics.
The resulting graphs are quite different from each other and have different levels of robustness and repeatability (Owen et al., 2013; Smith, Tournier, Calamante, & Connelly, 2015; Yuan et al., 2019; Zhong, He, & Gong, 2015). We recently explored the repeatability of structural brain graphs, their edge weights and graph-theoretical metrics, for 21 different edge-weighting schemes (Messaritaki, Dimitriadis, & Jones, 2019a). We demonstrated that integrating several metrics as edge weights captures differences between populations well, and is of interest for developing biomarkers (Clarke, Messaritaki, Dimitriadis, & Metzler-Baddeley, 2021).
We constructed structural brain networks from a set of test-retest diffusion MRI scan data from the Human Connectome Project (HCP) using the b = 2000 s/mm² data and the seven most reproducible graph-construction schemes as derived from our previous study on the same data (Messaritaki et al., 2019a). We then applied thirty-three community detection algorithms. The "hard" algorithms assign every node to only one community, while the "soft" algorithms can assign a node to multiple communities. For every pair of community detection algorithm and graph construction scheme, we estimated the reproducibility of the nodal participation coefficient P_i and within-module z-score z_i, and of the provincial and connector hubs derived from these two modular network metrics. Our aim was to identify the combination of graph construction scheme and community detection algorithm with the highest agreement of individual communities between the two repeat scan sessions.
The quality criterion for the estimated community partitions was also important in our study. To this end, we compared the quality index of the community partitions estimated over the original graphs with the quality indices of the community partitions computed over surrogate null versions of the original graph (Guimerà, Sales-Pardo, & Amaral, 2004).
We previously reported a statistical procedure for performing condition and group comparisons in terms of brain communities (Dimitriadis et al., 2012). Here, we applied a similar approach to assess between-scan pairwise community similarity for every pair of graph construction schemes and community detection algorithms. As the distance metric between community partitions, we adopted the Normalised Mutual Information (NMI; Alexander-Bloch et al., 2012).
Finally, we derived a consensus cluster across participants and repeat scans (Dong, Frossard, Vandergheynst, & Nefedov, 2014; Ozdemir, Bolanos, Bernat, & Aviyente, 2015). The agreement between the consensus cluster and the individual communities, quantified with NMI, was also used as an objective criterion for selecting the optimal combination of graph construction scheme and community detection algorithm.
We note that the analysis presented here does not aim to assess how well these structural networks represent the functional organisation of the human brain. The accuracy of these networks and of the metrics used as edge-weights in representing the functional organisation of the brain has been validated in recent work by Messaritaki et al. (2021).
Additionally, the metrics used as edge-weights are routinely used in network analyses in the literature (e.g., Caeyenberghs, Metzler-Baddeley, Foley, & Jones, 2016;Nigro et al., 2016;Taylor, Cheol, Han, Weber, & Kaiser, 2015). Our analysis does, however, address one aspect of the accuracy of the partition of the structural connectome. If a partition of the structural connectome is not repeatable in the absence of changes resulting from maturation or intervention, then that partition is not an accurate representation of the modular organisation of the structural connectome. Only partitions that are repeatable can convey reliable information about the structural organisation of the human brain. In other words, even though the repeatability of a partition is not a sufficient condition for it to be representative of the brain's structural organisation, it is a necessary one.
The rest of this manuscript is organised as follows: Section 2 describes the graph-construction schemes, community detection algorithms, community partition similarity, the methodology for detecting connector and provincial hubs, and their repeatability. Section 3 reports our results in terms of repeatable community partitions across the 2D space of graph-construction schemes/community detection algorithms, the repeatability of nodal P_i / z_i and the detection of connector/provincial hubs. The Discussion summarises the main outcome of our study, explaining its advantages, limitations, and suggestions for future directions.

| Data
We analysed the test-retest MRI and diffusion-MRI data set from the multimodal neuroimaging database of the Human Connectome Project (HCP) (Glasser et al., 2013; van Essen et al., 2013). We used the data from the 37 participants for whom there were 90 gradient directions for each b-value. The participants in this test-retest data set were scanned twice, with the between-scan time interval ranging between 1.5 and 11 months. The age range of the participants was 22-41 years. The test-retest time interval is shorter than the expected time over which maturation-induced structural changes can be measured with diffusion MRI (dMRI).
The diffusion data were also registered to the structural data. We performed the following analyses using the b = 2000 s/mm² data.

| Node definition
The Automated Anatomical Labelling (AAL) atlas (Tzourio-Mazoyer et al., 2002) was used to define 90 cortical and subcortical areas (45 areas per hemisphere) as nodes of the structural brain graphs. Structural brain networks (SBN) were generated for each participant using ExploreDTI-4.8.6 (Leemans et al., 2009).

| Edge weights
Edges were weighted using the seven most reproducible graph-construction schemes identified previously with the same data set (Messaritaki et al., 2019a), which were based on different combinations of the nine metrics listed in Table 1 (see Section 2.3.4). Each graph was normalised to have a maximum edge weight of 1, while the elements in the main diagonal were set to zero (see Figure 1).

| Integrated edge-weights
Combining multiple metrics into an integrated edge weight is supported by the fact that each metric conveys information about different tissue properties, while at the same time topological properties of SBNs are affected by more than one metric. Here, using the data-driven algorithm described in our previous work (Dimitriadis et al., 2017,c), the nine metrics in Table 1 were combined into integrated edge weights. The orthogonal-minimal-spanning-tree (OMST) algorithm was then applied to the resulting networks, selecting the edges that preserve connectivity between nodes, while guaranteeing that the overall network efficiency is maximised. More details on the OMST algorithm and its implementation can be found in our previous work (Dimitriadis et al., 2017,b,c; Messaritaki, Dimitriadis, & Jones, 2019b) and the related code is freely available at https://github.com/stdimitr/multigroup-analysis-OMST-GDD.

| Graph construction schemes
Seven graph construction schemes were used in this study, summarised in Table 2 and falling broadly into two categories. We briefly explain their construction methodologies here.
The first category includes graphs constructed via the data-driven algorithm:
(a) NS-OMST: apply the OMST filtering algorithm (Dimitriadis et al., 2017,c) to the NS-weighted matrix.
(b) NS + FA-OMST: integrate the NS-weighted and FA-weighted matrices with the data-driven algorithm.
(c) 9-m OMST: integrate all nine diffusion metrics (as originally reported in Dimitriadis, Drakesmith, et al., 2017; see Table 2).
The second category includes SBNs with edges weighted by the NS or the FA, with a threshold applied to remove the edges with the lowest weights. The threshold was determined by imposing the constraint that the graphs exhibit the same sparsity as the OMST graphs that exhibited the highest reproducibility (Messaritaki et al., 2019a). Once the topology of each of those graphs was specified, the weights of the edges were either kept as they were or re-weighted with one of the remaining two metrics. These graphs are as follows (see Table 2). As we have shown previously, these seven schemes exhibit different values of similarity between them, from 0.99 to 0.42 (Messaritaki et al., 2019a, Table 2), motivating their inclusion in a study on the repeatability of community detection.
FIGURE 1 Flowchart of the construction of a structural brain network based on tractography and diffusion metrics (see Table 1)

| Community detection algorithms
Communities or modules are defined as subgroups of nodes that are more interconnected with each other than with the rest of the network (Newman & Girvan, 2004; Radicchi, Castellano, Cecconi, Loreto, & Parisi, 2004). In the present study, we compared thirty-three different community detection algorithms, comprising twenty-six with hard clustering and seven with soft clustering (see Figure 2). In hard clustering, community membership can be represented as a vector that holds the assignment of every brain area to exactly one detected graph cluster (community). In our case, this vector has dimension 1 × 90, equalling the number of brain regions in the AAL parcellation. In soft clustering, the outcome is a matrix that records which soft clusters a given node (brain area) belongs to. A more detailed description of the adopted community detection algorithms is provided in Appendix B.
To the best of our knowledge, this is the first study to consider such a large number of community detection algorithms.

| Permutation test on quality modular indices
For every participant, scan and graph construction method, we produced 1,000 surrogate null graph models by randomising the weighted connections while preserving both the degree and strength of every node and the overall connectedness of the network (Rubinov & Sporns, 2010).
All of the hard clustering algorithms (nos. 1-26) involved a Q quality index for the communities detected. For further details of Q quality indices see Le Martelot and Hankin (2011). For the soft clustering algorithms (nos. 27-33), we estimated the normalised mutual information (NMI; see Appendix C) between the original community affiliation and the surrogate null communities produced via the application of every algorithm to the surrogate graph model.
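The comparison of an observed quality index against its surrogate distribution can be sketched as follows. This is a minimal illustration and not the authors' code: the text does not state the exact estimator, so the one-sided empirical p-value with a +1 correction is an assumption, and the function name `permutation_p_value` and all numbers are hypothetical.

```python
def permutation_p_value(observed, null_values):
    """One-sided empirical p-value: share of surrogate values at least as large as the observed one.

    Assumption: the +1 correction in numerator and denominator is a common
    convention; the source does not specify which estimator was used.
    """
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)


# Hypothetical numbers for illustration only, not results from the study.
q_observed = 0.62
q_null = [0.30 + 0.0002 * k for k in range(1000)]  # 1,000 surrogate Q values, all below 0.62
p = permutation_p_value(q_observed, q_null)
print(f"p = {p:.4f}")
```

With 1,000 surrogates, the smallest p-value this estimator can return is 1/1001, which is one reason the number of surrogate graphs bounds the attainable significance level.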

| Between-scan community detection agreement
We quantified the graph-partition distance with the normalised mutual information (NMI; see Appendix C).
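As an illustration of the measure, the sketch below computes NMI between two hard partitions of the same nodes. It assumes the normalisation 2·I(A;B) / (H(A) + H(B)), which is one common choice; the definition actually used is the one in Appendix C, and soft (overlapping) partitions would need a different variant. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a community label vector."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label vectors defined on the same nodes."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI = 2 I(A;B) / (H(A) + H(B)); assumed normalisation, see lead-in."""
    if len(a) != len(b):
        raise ValueError("partitions must cover the same nodes")
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 1.0  # both partitions place every node in a single community
    return 2.0 * mutual_information(a, b) / (h_a + h_b)


# Toy partitions of six nodes: same grouping, different community labels.
scan1 = [1, 1, 1, 2, 2, 2]
scan2 = [2, 2, 2, 1, 1, 1]
print(nmi(scan1, scan2))
```

Because NMI depends only on which nodes are grouped together, relabelling the communities between scans does not change the score, which is why it suits between-scan comparisons where community labels are arbitrary.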
TABLE 2 Summary of the graph-construction schemes
The entries of the consensus matrix assume integer values between 0 and 74 (37 participants × 2 scans), counting the number of times a pair of nodes (brain areas) was assigned to the same community across the cohort and scan sessions. We converted the consensus matrix into a probability matrix by dividing each entry by 74, so that each entry denotes the probability of a pair of nodes being classified as belonging to the same community.
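The construction of the consensus matrix can be sketched as below for hard partitions. This is an illustrative reading of the description, not the authors' implementation: the function names are hypothetical, the toy input has 3 partitions of 4 nodes rather than 74 partitions of 90 nodes, and leaving the diagonal at zero is an assumption.

```python
def consensus_matrix(partitions):
    """Count how often each pair of nodes shares a community across partitions.

    partitions: list of label vectors, one per participant and scan session.
    Assumption: the diagonal is left at zero (no self-pairs).
    """
    n_nodes = len(partitions[0])
    counts = [[0] * n_nodes for _ in range(n_nodes)]
    for labels in partitions:
        if len(labels) != n_nodes:
            raise ValueError("all partitions must cover the same nodes")
        for i in range(n_nodes):
            for j in range(n_nodes):
                if i != j and labels[i] == labels[j]:
                    counts[i][j] += 1
    return counts


def to_probability(counts, n_partitions):
    """Divide every count by the number of partitions (74 in the study)."""
    return [[c / n_partitions for c in row] for row in counts]


# Toy input: three partitions of four nodes.
partitions = [[1, 1, 2, 2], [1, 1, 2, 2], [1, 2, 2, 2]]
counts = consensus_matrix(partitions)
prob = to_probability(counts, len(partitions))
for row in prob:
    print([round(v, 2) for v in row])
```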
In order to get a consensus or group representative community per graph-construction scheme and community detection algorithm, consensus matrices should be iteratively thresholded and clustered with a community detection algorithm (Lancichinetti & Fortunato, 2012). This algorithm uses an arbitrary absolute threshold to eliminate weak connections and iteratively applies a graph partition technique. Instead of an arbitrary filtering scheme, we adopted our OMST algorithm (Dimitriadis et al., 2017,b,c) to topologically filter the consensus matrix in a data-driven way. We then extracted the consensus-group representative community by applying the community detection algorithms across the graph construction schemes (Newman, 2006). See Figure 3e for an example of a consensus matrix.

| Agreement of consensus representative community with individual community structures
An important criterion of our analysis is the high similarity between the consensus clustering and individual clustering for every graph construction scheme that showed high group-averaged community similarity (NMI > 0.9). To this end, we estimated this community similarity for every case. Figure 3 illustrates the various steps of the analysis.

| Evaluating the combined graph construction schemes-Community detection algorithms
As mentioned previously, we first identified the combinations of graph construction schemes and community detection algorithms with high group-averaged between-scan community affiliation agreement (NMI > 0.9) with a p < .05 based on the bootstrapping procedure. We then adopted a criterion of highest community similarity between the consensus clustering and the individual community affiliations (clusterings). It is important that consensus clustering expresses the inter-subject variability and acts as a vector median for the whole group (Dimitriadis et al., 2012). The final ranking of pairs of graph construction schemes and community detection algorithms was based on: (a) high between-scan group-averaged community similarity quantified with NMI, and supported by a p < .05 (bootstrapping, see Appendix C), (b) Q quality index with p < .05 based on surrogate null brain models and (c) high community similarity between consensus clustering and individual community affiliations (clusterings) assessed via a two-way analysis of variance (ANOVA) (see Section 2.7).

| Modular driven structural brain hub detection
Appendix A describes in detail the computation of P_i and z_i and how hubs are classified as either provincial or connector hubs (Guimera & Amaral, 2005). Here, we applied the aforementioned hub detection methodology solely on the graph construction schemes and community detection algorithms that fulfil the evaluation criteria of Section 2.3.10.

| Reliability of nodal participation coefficient P_i and within-module z-score z_i
We also explored the intra-class correlation coefficient (ICC) of nodal participation coefficient P_i and within-module z-score z_i. As a main outcome of this hub detection approach, we quantified the consistency of connector/provincial hub detection first within participant between scans, and secondly across the cohort.
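For readers unfamiliar with the ICC, the sketch below computes one common form, ICC(3,1) (two-way mixed, consistency, single measurement), for a table with one row per participant and one column per scan session. The text does not state which ICC form was used, so this choice, the function name `icc_3_1` and the toy values are all assumptions made for illustration.

```python
def icc_3_1(scores):
    """ICC(3,1) for an n_subjects x k_sessions table.

    Assumption: ICC(3,1) is used here only as an example; the source
    does not specify the ICC form.
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical P_i values of one node for three participants in two sessions.
stable = [[0.1, 0.2], [0.4, 0.5], [0.7, 0.8]]      # same ordering in both scans
reversed_order = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]
print(icc_3_1(stable), icc_3_1(reversed_order))
```

The consistency form ignores a constant offset between sessions, so the first toy table scores close to 1 even though every value shifts by 0.1; an absolute-agreement form would penalise that shift.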

| Assessing a reproducible structural core of the human brain
We detected structural hubs for every participant, scan session and graph construction scheme by applying an absolute threshold to the participation coefficient and within-module z-score (Guimera & Amaral, 2005; Hagmann et al., 2007, 2008; van den Heuvel & Sporns, 2013). We estimated an agreement index that quantifies the percentage of connector/provincial hubs that were detected in both scans. This agreement index is defined as:
$$\text{Agreement index} = \frac{1}{N} \sum_{s=1}^{N} CH_1(s)\, CH_2(s),$$
where $CH_1$ and $CH_2$ are two vectors of size 1 × N (N = 37, the number of subjects), one per scan session, with ones in positions where a brain area is detected as connector or provincial hub in specific subjects. This agreement index is normalised by the total number of participants and takes the value 1 when a node/ROI is detected as either connector or provincial hub across all participants and in both scans. We characterised an ROI as either provincial or connector hub only if the related Agreement index equals 1.
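One reading of the description above (binary hub indicators per scan, normalisation by the number of participants, and a value of 1 only when the area is a hub in every participant in both scans) is to count the participants in whom the area is flagged in both sessions and divide by the cohort size. The sketch below follows that reading; it is an assumption rather than the authors' code, and the function name and toy vectors are hypothetical.

```python
def agreement_index(hub_scan1, hub_scan2):
    """Fraction of participants in whom a brain area is a hub in both scans.

    hub_scan1, hub_scan2: binary vectors (one entry per participant) for a
    single brain area and a single hub type.
    """
    if len(hub_scan1) != len(hub_scan2):
        raise ValueError("both scans must cover the same participants")
    both = sum(1 for a, b in zip(hub_scan1, hub_scan2) if a == 1 and b == 1)
    return both / len(hub_scan1)


# Toy example with 37 participants (cohort size taken from the text).
always_hub = [1] * 37
missed_once = [1] * 36 + [0]   # not detected in the last participant's second scan

print(agreement_index(always_hub, always_hub))    # area flagged everywhere
print(agreement_index(always_hub, missed_once))   # one miss lowers the index
```

Under this reading, requiring an index of exactly 1 is a strict criterion: a single missed detection in one participant and one session is enough to exclude the area from the consistent hub set.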

| Statistical analysis
Firstly, we determined pairs of graph construction schemes and community detection algorithms that fulfilled the evaluation criteria presented in Section 2.3.10. The first two criteria were evaluated via bootstrapping and surrogate null models, while for the third, based on the similarity of individual community partitions with the consensus community partition, we ran a two-way ANOVA over the pairs of graph construction schemes and community detection algorithms (p < .05).
We also ran a two-way ANOVA over the pairs of graph construction schemes and community detection algorithms to assess the effect of both factors, and of their interaction, on the repeatability of network topology quantified with NMI.
The detection of reproducible structural brain hubs using community-based hub detection network metrics requires reproducible communities. For that reason, we performed the hub detection analysis over the best pairs of graph construction schemes and community detection algorithms. Then, we adopted a two-way ANOVA (p < .05) to detect the best pair of graph construction scheme and community detection algorithm for hub detection, using as input the ICC for the participation coefficient and the within-module z-score across nodes, and the Agreement index.
FIGURE 3 Outline of the presented methodology. The demonstration is based on the 9-m OMST graph-construction scheme and the gso-discrete mode community detection algorithm. (a) Repeat-scan sessions. (b) Structural brain networks of participant 1 from both sessions using the 9-m OMST graph-construction scheme. (c) Individual community affiliation of participant 1 from scan session 1. Each colour represents one community. (d) Vectorised community affiliations of the whole cohort from scan sessions 1 and 2, separated with a red line. Every module is coded with a different colour. (e) The consensus matrix is built over group community affiliations across both sessions as presented in (d). Weights in the consensus matrix refer to the total number of times two brain areas are grouped together across the cohort and scan sessions, with the maximum value being (number of participants) × (scan sessions) = 74. (f) Representative community affiliation after graph partitioning of the consensus matrix presented in (e). Each community is encoded with a different colour. The NMI similarity was estimated between the representative community affiliation presented in (f) and the individual community affiliations presented in (c).
algorithms. These findings support the quality of the extracted graph partitions and allow us to include all the participants, scans, graph construction schemes and community detection algorithms in our analysis.

| Similarity of individual community partitions with consensus community partition
The highest similarity between individual community partitions and consensus community partition was detected for the combination of the 9-m OMST graph construction scheme and the mscd_so community detection algorithm. The second highest similarity was detected for mscd_afg and 9-m OMST. Mscd with the rb and rn criteria failed to produce an acceptable community similarity between a group representative community estimated via consensus clustering and the individual community affiliations (see Table 3).
FIGURE 4 Between-scan agreement of community affiliations across graph construction schemes and community detection algorithms. Every subplot refers to one of the seven graph-construction schemes. The bars define the group-averaged between-scan agreement of community affiliations. Numbers below the plot in A refer to the numbered list of community detection algorithms presented in Section 2.3.5 and in Appendix B. Community detection algorithms with the highest agreement between the two scans (NMI > 0.9) were: mscd_afg, mscd_rb, mscd_rn and mscd_so. For the abbreviations and numbering of the community detection algorithms please see Appendix B.

3.5 | ICC of nodal participation coefficient index and within-module z-score
Table 4 shows the group-averaged ICC of nodal Participation

3.6 | Reproducibility of structural hubs detection based on participation coefficient index and within-module z-score
We estimated the Agreement index of both connector and provincial hub detection across the cohort. The Agreement index of provincial hub detection across the cohort was higher than the Agreement index for connector hubs (Table 6 versus Table 7). The highest Agreement index for provincial hub detection was found for {mscd_so, 9-m OMST} (Table 6). On average across the seven graph construction schemes, the mscd_so algorithm demonstrated the highest average Agreement index for provincial hub detection.
FIGURE 5 Topological layout of the modular assignment of the 90 AAL brain areas, based on the community affiliation extracted from the consensus matrix for the 9-m OMST graph construction scheme and the mscd_so community detection algorithm. Connector hubs detected consistently across participants and repeat scans for the same combination {mscd_so, 9-m OMST} are denoted with '*' (see Section 3.5). The circular plot shows the 45 areas of the left hemisphere on the left semi-circle and the 45 areas of the right hemisphere on the right semi-circle. Our analysis gave nine communities/modules, each encoded with a different colour.
The highest Agreement index for connector hub detection was found for {mscd_rb, FA-t/NS-w} (Table 7). On average across the seven graph construction schemes, the mscd_rb algorithm demonstrated the highest average Agreement index for connector hub detection. The group of connector hubs is indicated alongside the modular representation of consensus modules illustrated in Figure 5 and also in Table 8. Interestingly, 5 of 13 consistent connector hubs are located within the inter-hemispheric modules (see Figure 5). Our conclusion is that the combination of modular network metrics P_i and z_i succeeded in uncovering a consistent core of connector hubs but failed to detect provincial hubs consistently.

| DISCUSSION
We have presented the first extensive study in the literature on the robustness of community detection in structural brain networks by exploring different graph-construction schemes (previously shown to exhibit high repeatability themselves) and various community detection algorithms. Our main findings have direct implications for longitudinal studies and studies comparing healthy controls versus diseased populations.
The key findings of our analysis can be summarised as follows:
1. The repeatability of community affiliations depends heavily on the combination of graph-construction scheme and community detection algorithm. All previously reported studies of network communities adopted a specific pair of graph-construction scheme and community detection algorithm, with the majority of them focused on Newman's modularity objective criterion (Betzel et al., 2017; Newman, 2006; Sporns & Betzel, 2016). Based on our first criterion of high repeatability of community affiliation between the two scans and across the cohort (NMI > 0.9) supported statistically via bootstrapping (p = .0001), we identified four community detection algorithms as the best choices:
A. mscd_afg.
B. mscd_rb.
C. mscd_rn.
D. mscd_so.
These four algorithms gave excellent repeatability across the entire set of graph-construction schemes (see Table 2 and Figure 4).
Two-way ANOVA showed an effect of the graph construction scheme on the between-scan repeatability of network topologies (assessed with NMI), an effect of the community detection algorithm, and an interaction between the two factors. These findings support our hypothesis that repeatable identification of structural brain network communities can be derived from the optimal pairing of community detection algorithm and graph construction scheme.
4. An important result of our analysis is that soft clustering community detection algorithms gave the least repeatable results. Therefore, we recommend the use of hard-clustering algorithms for the detection of brain communities, at least when using the AAL template.
5. The best combination of graph-construction scheme (9-m OMST) and community detection algorithm (mscd_so) revealed nine distinct modules (as illustrated topologically in Figure 5) with interesting findings:
A. Modules 1, 7 and 9 group together brain areas located exclusively within the left hemisphere, while modules 2, 6 and 8 group together brain areas located exclusively within the right hemisphere. Modules 3, 4 and 5 group brain areas from both hemispheres together.
(Cole et al., 2013).
C. Five out of thirteen consistent connector hubs are located within inter-hemispheric modules 3-5 supporting their interconnecting role ( Figure 5).
D. Interestingly, eight homologous brain areas were grouped together in either left (module 8) or right hemisphere (module 9).
Lesions of the hippocampus, parahippocampal gyrus, amygdala and fusiform gyrus in participants with temporal lobe epilepsy impaired associative memory in tasks that require learning and recall of objects and faces (Weniger, Boucsein, & Irle, 2004).
These four brain areas plus the thalamus are those most consistently implicated in neurodegenerative dementias, especially in Alzheimer's Disease, even at an early stage (Manuello et al., 2018).
E. The bilateral superior temporal gyrus, superior temporal pole and middle temporal pole play an inter-hemispheric integration role.
Inter-hemispheric functional connections between temporal lobes predict language impairment in adolescents born preterm (Northam et al., 2012). Phonological awareness, a key factor in reading acquisition, was positively correlated with radial diffusivity of the interhemispheric pathways connecting temporal lobes (Dougherty et al., 2007). This bilateral temporal module could play a key role in many functions and dysfunctions.
6. The core of our study was an extensive analysis to identify the optimal pair of graph-construction scheme and community detection algorithm. The choice of this pairing will also affect repeatability of connector and provincial hub detection based on the participation coefficient score P_i and the within-module z-score z_i.
Our results revealed a high repeatability of nodal P_i with the mscd_so algorithm across the seven graph construction schemes.
The highest ICC score was reached for the {mscd_so, 9-m OMST} pair. A significantly higher repeatability of nodal z_i was found for the mscd_so algorithm compared to the rest of the community detection algorithms. The {mscd_rb, FA-t/NS-w} pair showed the highest Agreement index for connector hubs. We detected a group of 13 repeatable connector hubs across the cohort (Agreement = 1), but no group-consistent provincial hubs (Agreement < 1). Based on our results, we therefore recommend not using these modular network metrics for the detection of provincial hubs, at least when using the AAL atlas. The designation of a brain node as a hub also depends on the scale at which brain networks are constructed. Many brain areas in a basic atlas template group together functionally heterogeneous subareas, and it is possible that a finer-grained parcellation may affect the nodes' classification as a hub or not. For example, the thalamus, despite comprising 50-60 specialised sub-nuclei (Herrero, Barcia, & Navarro, 2002), is in many studies, including ours, treated as a single node.
In our previous study on the same cohort, we examined the repeatability of network topologies, focusing on edge weights and graph-theoretical metrics. We demonstrated that network topology and edge weights are repeatable, but the repeatability depends on the graph-construction scheme (Messaritaki et al., 2019b). The important finding in this work is that the repeatability of network topologies and edge weights does not guarantee the repeatability of community detection at the mesoscale. In the present study, we focused on this important tool for mesoscale network topological investigations, and the detection of robust communities in structural brain networks over the same participants. To the best of our knowledge, this is the first study in the literature that explores the robustness of community detection over a large set of graph-construction schemes (seven) and community detection algorithms (thirty-three). Our analysis detected an optimal pair of {mscd_so, 9-m OMST} that fulfils the three basic criteria: high repeatability of community affiliations between the two scan sessions, quality over surrogate null graph partitions and high similarity of group community affiliation with the individual community affiliations.
To the best of our knowledge, this is the first time that the second and third criteria were used for the validation of representative consensus community affiliation (this includes studies using a single graph-construction scheme and community detection algorithm).
Running the comparison study for the whole set of thirty-three graph partition algorithms (including graph partition of the original graph and 1,000 surrogate null models) takes a few hours on a personal computer. We suggest that the neuroscience community always run such an analysis over an in-house test-retest data set acquired with the same settings as in the targeted data set. Optimising the set of algorithms over the test-retest study will increase the chance of repeatability of findings over the single-scan data set. This process will increase the reproducibility of research findings, which is especially important for cross-sectional studies (Welton, Kent, Auer, & Dineen, 2015).
Our study has a few limitations. This data set involves a specific data acquisition protocol and a specific tractography algorithm. We recommend following our analysis for every study because such an investigation could improve the repeatability and reproducibility of the findings at the mesoscale while also increasing the power of the study at the nodal and network level (Messaritaki et al., 2019a). Additionally, in our study we used only one of the three available b-values to perform the tractography. This was mainly done in order to supplement our previous work (Messaritaki et al., 2019b), and we chose the b-value of 2000 s/mm² because this value provides a balance between the b-value being sufficiently high to resolve crossing fibres with CSD, while at the same time ensuring sufficient SNR in the signal for robust measurements, and that higher-order effects of the diffusion do not need to be taken into account when calculating the diffusion metrics. Using one b-value also reflects acquisition protocols routinely used in other studies, and therefore makes our work more applicable to the general literature. At the same time, tractography results could be improved by combining data from all available b-values, and the implications of using different community detection algorithms in those cases should be explored as well. Moreover, three or more scan sessions would also be desirable to get a more robust assessment of repeatability. Scanning the same participant on different scanners and/or with different protocols would also allow assessment of reproducibility as well as repeatability. Lastly, the reproducibility of estimates of structural brain networks is affected by the resolution of the MR data (Vaessen et al., 2010), the parcellation scheme used (Bassett, Brown, Deshpande, Carlson, & Grafton, 2011), the time interval between the scan sessions, and other factors.

| CONCLUSIONS
In this study, we compared several graph-construction schemes and thirty-three community detection algorithms for the detection of reproducible communities in structural brain networks. Our extensive analysis showed that each choice of graph construction scheme and community detection algorithm yields a different level of reproducibility in community detection, as well as in the detection of connector and provincial hubs. We therefore recommend that our analytic pathway be adopted and performed in every study in order to extract reliable results at the mesoscale of structural brain networks.

CONFLICT OF INTEREST
The authors declare no conflicts of interest.

APPENDIX A
By recovering the community partition and estimating the participation coefficient, we can classify brain hubs into provincial and connector hubs (Guimera & Amaral, 2005). 'Provincial hubs' are high-degree nodes that primarily connect to nodes in the same module. 'Connector hubs' are high-degree nodes that show a diverse connectivity profile by connecting to several different modules within the network.
Brain hubs are important brain areas that are vulnerable and susceptible to disconnection and dysfunction in brain disorders. Rich club organisation of structural hubs supports the robustness of inter-hub connections and promotes the efficient information exchange between brain areas and its integration across the brain (van den Heuvel & Sporns, 2011).
The distinction of nodes into hubs and non-hubs by a combination of network topology and community affiliation is supported by a pair of network metrics: the participation coefficient $P_i$ and the within-module z-score $z_i$. This definition was first reported by Guimera and Amaral (2005). Here, we report for the first time the reliability of these nodal metrics in structural brain networks.
The degree of a node $i$ is defined as $k_i = \sum_j A_{ij}$, where $A_{ij}$ are the elements of the adjacency matrix of the graph. The within-module z-score for node $i$ is defined as

$$z_i = \frac{\kappa_i - \bar{\kappa}_{s_i}}{\sigma_{s_i}}$$

where $\kappa_i$ is the number of edges of node $i$ to other nodes in its module $s_i$, $\bar{\kappa}_{s_i}$ is the average of $\kappa$ over all the nodes in $s_i$, and $\sigma_{s_i}$ is the standard deviation of $\kappa$ in $s_i$.
The participation coefficient $P_i$ for node $i$ is defined as

$$P_i = 1 - \sum_{s} \left( \frac{k_{is}}{k_i} \right)^2$$

where $k_{is}$ is the number of links of node $i$ to nodes in module $s$, and $k_i$ is the total degree of node $i$. The participation coefficient of a node is therefore close to 1 if its links are uniformly distributed among all the modules and zero if all its links are within its own module.
Both provincial and connector hubs demonstrate a high within-module z-score, which means that they have many within-module edges. In this work, we used the threshold originally proposed by Guimera and Amaral (2005) for the $z_i$ dimension, $z_i > 2.5$, for both types of hubs studied (see figure 5 in Guimera and Amaral (2005)). In the $P_i$ dimension, we defined a node as a provincial hub if $P_i \leq 0.3$ and as a connector hub if $0.3 < P_i < 0.75$.
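The two nodal metrics and the hub thresholds above can be illustrated with a minimal sketch. It is not the code used in the study: the function names `hub_metrics` and `classify_hub` are hypothetical, the population standard deviation is assumed for $\sigma_{s_i}$, and nodes with $z_i > 2.5$ and $P_i \geq 0.75$ are left unclassified because the text does not assign them a class.

```python
from statistics import mean, pstdev


def hub_metrics(adj, labels):
    """Return (z, P): within-module z-score and participation coefficient per node.

    adj    : square list of lists; adj[i][j] is the weight of edge i-j (0 if absent)
    labels : labels[i] is the module of node i (hard partition)
    """
    n = len(adj)
    modules = sorted(set(labels))
    # k_is: links of node i to nodes in module s (self-connections ignored)
    k_is = [{s: 0.0 for s in modules} for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                k_is[i][labels[j]] += adj[i][j]
    degree = [sum(k_is[i].values()) for i in range(n)]  # k_i
    kappa = [k_is[i][labels[i]] for i in range(n)]      # within-module degree

    z, p = [], []
    for i in range(n):
        same = [kappa[j] for j in range(n) if labels[j] == labels[i]]
        sd = pstdev(same)  # assumption: population standard deviation
        z.append((kappa[i] - mean(same)) / sd if sd > 0 else 0.0)
        if degree[i] > 0:
            p.append(1.0 - sum((k / degree[i]) ** 2 for k in k_is[i].values()))
        else:
            p.append(0.0)  # isolated node: no links to distribute
    return z, p


def classify_hub(z_i, p_i, z_thr=2.5):
    """Apply the thresholds quoted in the text to one node."""
    if z_i <= z_thr:
        return "non-hub"
    if p_i <= 0.3:
        return "provincial"
    if p_i < 0.75:
        return "connector"
    return "unclassified"  # z_i > 2.5 and P_i >= 0.75: not defined in the text
```

For weighted networks the sketch sums edge weights instead of counting edges, which is one possible generalisation and an assumption here.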
The intra-class correlation coefficient (ICC) was estimated for every nodal participation coefficient, $P_i$, and within-module z-score, $z_i$, across the cohort, for every selected pair of graph construction scheme and community detection algorithm that showed high group-averaged community similarity (NMI > 0.9).
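The text does not state which form of the ICC was used. As an illustration only, the sketch below computes the two-way consistency form, often written ICC(3,1), for one nodal metric measured in every participant at each scan session. The choice of this form and the function name `icc_consistency` are assumptions.

```python
def icc_consistency(scores):
    """Two-way, single-measure, consistency ICC (assumed form, ICC(3,1)).

    scores : list of rows, one per participant; each row holds the value of
             one nodal metric (e.g. P_i of one node) at each scan session.
    """
    n = len(scores)      # participants
    k = len(scores[0])   # scan sessions
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(r[j] for r in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between sessions
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err
    return (ms_rows - ms_err) / denom if denom > 0 else float("nan")
```

Because this form measures consistency rather than absolute agreement, a constant offset between the two sessions does not lower the coefficient. An agreement form would penalise such an offset.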

APPENDIX B: Graph partition algorithms
We briefly describe the thirty-three graph-partition algorithms used in the present study.
Hard clustering algorithms are divided into three groups:
Fast multi-scale community detection algorithms
Hard community detection algorithms involving state-of-the-art graph partition algorithms
23. (shi_malik): From the tens of available spectral clustering algorithms, we adopted the algorithm of Shi and Malik (2000).
24. (dominant_sets): Dominant sets (Pavan & Pellilo, 2017). We have adopted this algorithm in our previous studies.
For the Louvain methods, we ran the algorithms 1,000 times and followed the consensus-matrix approach described in Section 2.3.8 for the construction of the consensus matrix across participants and scans.
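As a rough illustration of the consensus-matrix idea for repeated runs of a stochastic algorithm, the sketch below counts how often each pair of nodes is assigned to the same community. The details of Section 2.3.8 are not reproduced here, so the co-assignment frequency definition and the function name `consensus_matrix` are assumptions.

```python
def consensus_matrix(partitions):
    """Fraction of runs in which each pair of nodes shares a community.

    partitions : list of runs; each run is a list of community labels,
                 one per node (labels need not match across runs).
    """
    n_runs = len(partitions)
    n_nodes = len(partitions[0])
    counts = [[0] * n_nodes for _ in range(n_nodes)]
    for labels in partitions:
        for i in range(n_nodes):
            for j in range(n_nodes):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_runs for c in row] for row in counts]
```

A final partition is typically obtained by clustering this matrix again. That step is omitted here.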
APPENDIX C: Normalised Mutual Information: Graph-partition similarity
To assess the reproducibility of the thirty-three community detection techniques across the seven graph-construction schemes and repeat-scan sessions, we first quantified the similarity between the community partitions from the two scan sessions, separately for every participant, using the Normalised Mutual Information (NMI) (Alexander-Bloch et al. 2012), defined as follows:

$$\mathrm{NMI}(A,B) = \frac{-2 \sum_{i=1}^{C_A} \sum_{j=1}^{C_B} N_{ij} \log\left( \frac{N_{ij} N}{N_i N_j} \right)}{\sum_{i=1}^{C_A} N_i \log\left( \frac{N_i}{N} \right) + \sum_{j=1}^{C_B} N_j \log\left( \frac{N_j}{N} \right)}$$

where $A$ and $B$ are the community partitions of two SBNs from the two scan sessions, and $C_A$ and $C_B$ are the numbers of communities in partitions $A$ and $B$, respectively. $N$ denotes the number of nodes (here 90), while $N_{ij}$ is the overlap between community $i$ of $A$ and community $j$ of $B$, that is, the number of nodes the two communities have in common. $N_i$ and $N_j$ are the total numbers of nodes in community $i$ of $A$ and community $j$ of $B$, respectively. The NMI ranges from 0 to 1, where 0 corresponds to two independent partitions and 1 to identical partitions. This definition was used for hard community partition comparisons, while for soft community partitions we adopted the homologous definition of NMI tailored to soft graph clustering.
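For hard partitions, the NMI between two label vectors can be computed directly from the community overlap counts. The sketch below is illustrative rather than the code used in the study. The function name `nmi` is hypothetical, and returning 1 when both partitions consist of a single community is a convention assumed here.

```python
from collections import Counter
from math import log


def nmi(a, b):
    """Normalised Mutual Information between two hard partitions.

    a, b : sequences of community labels, one per node, in the same node order.
    """
    n = len(a)
    n_i = Counter(a)           # community sizes in partition A
    n_j = Counter(b)           # community sizes in partition B
    n_ij = Counter(zip(a, b))  # overlap between community i of A and j of B

    num = -2.0 * sum(
        c * log(c * n / (n_i[i] * n_j[j])) for (i, j), c in n_ij.items()
    )
    den = sum(c * log(c / n) for c in n_i.values()) + sum(
        c * log(c / n) for c in n_j.values()
    )
    if den == 0:
        return 1.0  # assumption: both partitions have one community each
    return num / den
```

The measure depends only on how nodes are grouped, so `nmi([0, 0, 1, 1], [5, 5, 3, 3])` equals 1 although the label values differ.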
We calculated NMI values between every possible pair of scans for each of the seven graph-construction schemes and the thirty-three community detection algorithms, giving a space of {seven graph-construction schemes x thirty-three community detection algorithms}. The NMI was then averaged across the 37 participants to create a group-averaged NMI, and we ranked the community detection algorithms that showed high test-retest reproducibility (group-averaged NMI > 0.9) in at least one of the seven graph construction schemes (see Figure 4).
To quantify the statistical significance of the group-mean community similarity quantified with NMI, we adopted a nonparametric test based on a bootstrapping procedure (Dimitriadis et al., 2012). In practice, the outcome of every community detection algorithm in our cohort is a matrix (dimensions {(no of participants) x (no of nodes)}) whose elements are the integer labels assigned to every detected community.
Analysing the scan-rescan data gives two such matrices, one per scan [...] {(no of participants) x (no of nodes)}, and the group-mean NMI between those two shuffled matrices is estimated.
d. Steps (b) and (c) are repeated 10,000 times.
e. A P-value is assigned to every graph construction scheme-community detection algorithm pair by counting the number of times the permuted between-scan group-mean similarity exceeds the actual between-scan group-mean similarity, divided by the number of permutations (here 10,000).
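The significance test can be sketched as follows. The exact shuffling scheme of steps (b) and (c) cannot be recovered from the text, so the sketch assumes that community labels are permuted across nodes independently within each participant's row of both scan matrices. The names `group_mean_nmi` and `permutation_p_value` are hypothetical, and a compact NMI helper is included so the block runs on its own.

```python
import random
from collections import Counter
from math import log


def nmi(a, b):
    """NMI between two hard partitions given as label sequences."""
    n = len(a)
    n_i, n_j, n_ij = Counter(a), Counter(b), Counter(zip(a, b))
    num = -2.0 * sum(c * log(c * n / (n_i[i] * n_j[j])) for (i, j), c in n_ij.items())
    den = sum(c * log(c / n) for c in n_i.values()) + sum(
        c * log(c / n) for c in n_j.values()
    )
    return 1.0 if den == 0 else num / den


def group_mean_nmi(scan1, scan2):
    """Mean over participants of the NMI between their two scans."""
    return sum(nmi(a, b) for a, b in zip(scan1, scan2)) / len(scan1)


def permutation_p_value(scan1, scan2, n_perm=10_000, seed=0):
    """scan1, scan2 : {participants x nodes} lists of community labels.

    Returns (observed group-mean NMI, P-value).
    """
    rng = random.Random(seed)
    observed = group_mean_nmi(scan1, scan2)
    exceed = 0
    for _ in range(n_perm):
        # Assumed null model: shuffle labels across nodes within each row.
        shuffled1 = [rng.sample(row, len(row)) for row in scan1]
        shuffled2 = [rng.sample(row, len(row)) for row in scan2]
        if group_mean_nmi(shuffled1, shuffled2) > observed:
            exceed += 1
    return observed, exceed / n_perm
```

Shuffling within rows keeps the number and size of communities fixed for each participant, so the null distribution reflects chance overlap between partitions of the same granularity.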