Fiber up‐sampling and quality assessment of tractograms – towards quantitative brain connectivity

Abstract Background and Purpose Diffusion MRI tractography enables to investigate white matter pathways noninvasively by reconstructing estimated fiber pathways. However, such tractograms remain biased and nonquantitative. Several techniques have been proposed to reestablish the link between tractography and tissue microstructure by modeling the diffusion signal or fiber orientation distribution (FOD) with the given tractogram and optimizing each fiber or compartment contribution according to the diffusion signal or FOD. Nevertheless, deriving a reliable quantification of connectivity strength between different brain areas is still a challenge. Moreover, evaluating the quality of a tractogram and measuring the possible error sources contained in a specific reconstructed fiber bundle also remains difficult. Lastly, all of these optimization techniques fail if specific fiber populations within a tractogram are underrepresented, for example, due to algorithmic constraints, anatomical properties, fiber geometry or seeding patterns. Methods In this work, we propose an approach which enables the inspection of the quality of a tractogram optimization by evaluating the residual error signal and its FOD representation. The automated fiber quantification (AFQ) is applied, whereby the framework is extended to reflect not only scalar diffusion metrics along a fiber bundle, but also directionally dependent FOD amplitudes along and perpendicular to the fiber direction. Furthermore, we also present an up‐sampling procedure to increase the number of streamlines of a given fiber population. The introduced error metrics and fiber up‐sampling method are tested and evaluated on single‐shell diffusion data sets of 16 healthy volunteers. Results and Conclusion Analyzing the introduced error measures on specific fiber bundles shows a considerable improvement in applying the up‐sampling method. Additionally, the error metrics provide a useful tool to spot and identify potential error sources in tractograms.


| INTRODUCTION
Diffusion magnetic resonance imaging (Le Bihan et al., 1986) is a compelling tool for probing microscopic tissue properties and diffusion tensor imaging (DTI) has become a popular model to inspect white matter architecture.
Tractography algorithms are able to reveal global fiber structures by estimating continuous streamline connections based This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. on the local diffusion information throughout the brain (Basser, Mattiello, & LeBihan, 1994a,b). The performance of tracking algorithms has significantly improved by considering the information contained in orientation distribution functions (ODF) or fiber orientation distribution (FOD), especially in regions with complex fiber configurations (Behrens, Berg, Jbabdi, Rushworth, & Woolrich, 2007;Fillard et al., 2011;Tournier, Mori, & Leemans, 2011). However, tractograms remain biased by algorithmic-specific parameters, that is, stopping criteria, curvature thresholds, seed point distribution, and the choice of the tracking algorithm itself, as well as partial volume effects of different fiber populations or various tissue types within the acquired data voxels. This complicates the estimation of reliable tractograms and thus the extraction of biologically meaningful connectivity measures between brain areas which are a crucial requirement for an accurate, quantitative connectome across different populations (Jbabdi & Johansen-Berg, 2011;Jones, 2010;Jones, Knösche, & Turner, 2012). Lastly, besides validation of diffusion pipelines with dedicated phantom data mainly focusing on geometrical metrics of fiber tracts (Côté et al., 2013), there is currently no objective way to inspect the quality of tractograms in vivo, especially with respect to accurate quantification of tracking errors.
The quantification of white matter properties based on diffusion data also remains challenging. Fiber-specific metrics are quantified by the generally unreliable fiber-count (Jones et al., 2012) or ROI-based approaches. The evaluation of diffusion metrics along segmented tractography bundles was introduced by (Colby et al., 2012) and (Yeatman, Dougherty, Myall, Wandell, & Feldman, 2012). The Automated Fiber Quantification (AFQ) framework allows the automatic identification and segmentation of major white matter tracts and evaluates scalar diffusion measures such as fractional anisotropy (FA) along these trajectories to quantify changes within the tract diffusion profiles among different subjects or groups (Yeatman et al., 2012). A first attempt to correct for tractography biases by estimating an actual contribution for each tract was introduced by Sherbondy et al. using a stochastic algorithm on a supercomputer architecture (Sherbondy, Dougherty, Ananthanarayanan, Modha, & Wandell, 2009;Sherbondy, Rowe, & Alexander, 2010). Another method introduced by Smith et al.
is based on a nonlinear gradient descent method called sphericaldeconvolution informed filtering of tractograms (SIFT). This approach removes fibers of an initially large fiber population to improve the fit between the streamline distribution in each voxel and the fiber ODF (Smith, Tournier, Calamante, & Connelly, 2013). Thereby, a cost function describing the deviation between fiber densities and FOD lobe integrals is minimized by iteratively removing fibers. Fiber densities are calculated by incorporating the length and tangent of reconstructed fibers within a voxel and compared to the corresponding fiber ODF lobes. However, the SIFT approach requires a large amount of initial fibers to determine an optimized subset of included and excluded fiber tracts.
Its successor, SIFT 2 (Smith, Tournier, Calamante, & Connelly, 2015) reduces this requirement, as it determines an effective crosssectional area for each streamline, represented by a floating-point weighting factor for each fiber, instead of a binary keeping or removing of fibers in comparison to the initial SIFT. Pestilli, Yeatman, Rokem, Kay, & Wandell (2014) introduced a similar method, that is, linear fascicle evaluation (LiFE), which is based on the diffusion signal, predicted from the connectome, instead of the FOD. The default forward model is a degenerated tensor representing a stick with zero radial diffusivity. To deal with isotropic compartments, the signal mean is subtracted in each voxel prior to the optimization. Daducci, Dal Palu, Lemkaddem, & Thiran (2015) pursued a similar approach introducing the Convex Optimization Modeling for Microstructure Informed Tractography (COMMIT) framework, though using a more complex forward model by describing both the intracellular stick model, and the extracellular compartment by a tensor. Furthermore, gray matter and cerebrospinal fluid (CSF) are also represented with two distinct isotropic components. It is tempting to interpret the resulting fiber weights as quantitative connectivity measures between brain regions, however, the described optimization methods have their own pitfalls. For example, in voxels with poor or incorrect fiber representations due to tracking errors, noise or partial volume contaminations, compartments are typically overcompensated by increasing the weights of the few present fibers, isotropic or extracellular compartments in order to decrease the global fit error. An overview of pitfalls and open challenges is given in (Daducci, Dal Palu, Descoteaux, & Thiran, 2016).
Here, we propose a novel approach which enables the inspection of the quality and validation of a tractogram optimization such as COMMIT by evaluating FOD characteristics of the error signal along and perpendicular to fiber bundles by utilizing the AFQ framework.
The quality metrics proposed allow for a better understanding of the accuracy and error sources of tractograms and help identifying regions with poorly fitted data. We further show that these metrics, combined with a newly introduced error FA, allow a better interpretation of the directional error distribution. These are important steps toward interpreting fiber weights from a tractogram optimization in a quantitative way to, for example, construct a more meaningful connectivity measure in a connectome. Furthermore, we also present a fiber upsampling procedure: It allows to increase the number of streamlines of a given fiber bundle, in case of, for example, underrepresentation of a certain structure due to anatomical properties, fiber geometry, seeding pattern or algorithmic constraints. Analyzing the introduced error measures on specific fiber bundles shows the benefit of using up-sampled fiber bundles.

| MATERIALS AND METHODS
The major steps of a typical connectome generation process is shown in a simplified form in Figure 1. It is crucial to perform the optimization after the segmentation and up-sampling steps in order to avoid the partial fiber problematic discussed in (Daducci et al., 2016). In this work, in contrast to a connectome pipeline, the segmentation step is not based on cortical parcellation, but performed using the AFQ framework (AFQ: RRID:SCR_014546). This choice was motivated by the ability of the AFQ framework to reliably quantify measures along tracts.
The method section is organized as follows. First, the acquisition protocol, preprocessing steps and tractography algorithm is described.
However, these parameters can easily be swapped with other protocols or tractography algorithms. Thereafter, the AFQ segmentation, fiber up-sampling, COMMIT optimization and error quantifications, including the introduced error measures are described in more detail.

| In-vivo diffusion data acquisition
Diffusion MRI data were acquired on a Philips Achieva 3T TX system (Philips Healthcare, Best, the Netherlands), equipped with 80 mT/m gradients and a 32-element receive head coil array, using a diffusionweighted single-shot spin echo EPI sequence. The study was approved by the local ethics committee and meets the guidelines of the declaration of Helsinki. Written informed consent was obtained from all subjects.
Data sets from 16 healthy volunteers (age: 31.6 ± 8.6, gender: 12 male, 4 female) were acquired with the following diffusion scan parameters: TR: 11.85 s, TE: 66 ms, FOV: 220 × 220 mm 2 , with 40 contiguous slices, slice thickness: 2.3 mm, acquisition and reconstruction matrix: 96 × 96, SENSE factor: 2, partial Fourier encoding: 60%. Diffusionweighted images were acquired along 64 directions distributed uniformly on a half-sphere with a b-value of 3000 s/mm 2 in addition to a b = 0 s/mm 2 scan, resulting in a scan time of approximately 13 min.

| Preprocessing and tractography
For each data set, the diffusion data was corrected for eddy-currents and subject motion by FSL: RRID:SCR_002823 (EDDY) (Jenkinson, Beckmann, Behrens, Woolrich, & Smith, 2012). The white matter mask was estimated from the T1-weighted data set using the tissue segmentation in SPM8: RRID:SCR_007037 (www.fil.ion.ucl.ac.uk/ spm) and transformed back to diffusion space using SPMs coregister function based on normalized mutual information. A Fiber Assignment by Continuous Tracking (FACT) inspired deterministic algorithm generalized to the Orientation Distribution Function (ODF) was used in the tractography step. The ODF was reconstructed using the FRACT method (Haldar & Leahy, 2013). The tracking direction was selected according to the local diffusion maximum of the ODF. Ten seeds were started in each white matter voxel, resulting in approximately 700,000 fibers per subject. The estimated white matter mask was only used for seeding purposes and was not utilized as a tractography stopping criterion.

| Fiber segmentation and up-sampling
The segmentation of the tractograms was performed using the AFQ framework (Yeatman et al., 2012), which is based on a waypoint ROI procedure as described in (Wakana et al., 2007). Additionally, a refinement step was applied, which compares each candidate fiber to tract probability maps (Hua et al., 2008). To avoid conflicting start and endpoints of fibers running through the two ROIs of the target fiber structure, a flip was performed on all tracts which first passed through the second ROI, resulting in consistent fiber alignment in each bundle.
These segmentation steps resulted in the selection of 20 major white matter fiber tracts (Yeatman et al., 2012) out of all white matter fibers contained in the whole-brain tractogram (18 bundles as described in (Yeatman et al., 2012), and two additional tracts as defined in the online version: https://github.com/jyeatman/AFQ).
Next, to increase the number of fibers of potentially underrepresented fiber populations in the different AFQ segmented bundles, for example, due to tractography algorithm biases, the following method was applied: The segmented fibers were equidistantly resampled using 80 interpolation points per fiber and principal component analysis (PCA) was applied to all classified and resampled fibers (Parker et al., 2013). The space was truncated to the first 80 dimensions (from the 240 point descriptors), whereby more than 99% of the explained variance was still captured. In the PCA space, for each bundle separately, new fibers were randomly generated according to the point distribution of the transformed fibers, assuming a bundle-specific multivariate Gaussian distribution. The newly generated fibers were transformed back by inverting the linear PCA transformation.
In a further step, potential outliers were identified based on the calculation of a population-mean fiber, that is, the mean value of all corresponding resampled points of the initial fibers within one fiber bundle. The distance of each randomly generated fiber to the original population-mean fiber was derived by summing up the distances to the nearest points on the mean fiber. New fibers were only accepted if F I G U R E 1 A schematic connectome pipeline is depicted including the positions for proposed up-sampling and validation steps the distance-threshold to the initial population was met. This threshold was set to the maximum fiber distance of all fibers within the initial population relative to its mean fiber. Newly generated tracts leaving the white-matter mask were also rejected. Based on these fiber population up-sampling steps, additional 10,000 fibers per bundle were generated for each data set.
Finally, the up-sampled fibers were again segmented using the AFQ framework to apply the same classification criteria to the newly generated fibers as to the initial tractogram. Around 75% of the upsampled fibers were successfully classified and therefore kept for the further analysis. With the procedure described above, a total of four tractography sets were generated:

AFQ
Classified AFQ fibers based on the initial tractogram AFQUP AFQ set combined with the up-sampled AFQ fibers WB Initial whole-brain tractogram WBUP WB combined with the up-sampled AFQ fibers

| Fiber optimization, optimized tractogram
The optimization of the different tractogram sets was performed using the COMMIT framework (Daducci et al., 2015)

| Error quantification
In addition to the normalized root mean square error (NRMSE) of the optimization fit, an actual signal estimator ŝ was calculated using Ax , by reverting the b = 0 normalization. To further examine the differences and similarities between this signal estimator ŝ and the acquired diffusion data s, a directional error FOD of the signal estimator ŝ and the original diffusion data s was calculated. Remaining signal contributions from under-or overrepresented fibers are assumed to remain in the error signal. The FOD for the diffusion signal estimator was reconstructed by applying the constrained spherical deconvolution (Tournier, Calamante, & Connelly, 2007) to the error signal, which is defined by the element wise difference between the measured and estimated diffusion signals: In order to use a meaningful deconvolution kernel and to be comparable to the FOD derived from the measured signal s, the response function was not re-estimated on the error signal; instead the fiber response from s was used. A maximum spherical harmonics order of l max = 8 was used. Furthermore, a traditional tensor fit of the signal error s err was derived in order to calculate the fractional anisotropy (FA) of s err .
To quantify the different error measures along the segmented and optimized AFQ fiber bundles, we extended the tract profile generation of the AFQ framework. In (Yeatman et al., 2012), FOD along, error FOD perpendicular, and error FA. These measures were tested for statistical significance between the initial and upsampled tractogram sets and were corrected for multiple comparison, using the nonparametric permutation test implemented in FSL (Winkler, Ridgway, Webster, Smith, & Nichols, 2014). The number of permutations were set to 5000 with a significance level of p < .05.
Furthermore, the up-sampling method was also compared with an increase of seed points during the tractography step. Therefore, the number of seed points was increased incrementally up to a factor of eight in a single subject. The resulting tractogram sets were segmented using the AFQ framework and either optimized or up-sampled and optimized for the comparison. The up-sampled tractogram sets were also segmented a second time prior to the optimization.

| RESULTS
In Figure

| DISCUSSION
We have introduced a tool to investigate the quality of a tractogram by further inspecting the directionally dependent error signal between the signal prediction and the measured diffusion signal along reconstructed fiber bundles. Additionally, we presented a method to up-sample a given fiber population in order to achieve better optimization results, that is, a decreased fit error.
The overall mean fit error averaged over all the white-matter voxels and all subjects showed only small, but nevertheless significant changes comparing the initial (AFQ, WB) with the up-sampled (AFQUP, WBUP) fiber tractograms. These small changes at the group level could be attributed to large intersubject variability; however, the up-sampled sets achieved a reduced fit error in each single subject (Figure 3a and b)  In the whole-brain sets (WB and WBUP, Figure 5) no fiber populations were purposely omitted, and therefore a much more homogeneous NRMSE distribution was found in the brain.
In a next step, we further explored the error distribution across the diffusion directions in each voxel. Therefore, the error FA was calculated and evaluated to reveal potential anisotropy in the error signal ( Figure 6). In voxels with a good fit, a low anisotropy is expected, The ultimate goal of using streamline weights to quantify connectivity strength between cortical regions is still not achievable if the tractogram or the optimization is flawed. The fiber-dependent error estimation could also be used to estimate a confidence of the resulting fiber weights aiming to verify the integrity of a quantitative connectivity matrix between cortical regions. Additionally, the difficulty of omitting fibers due to any segmentation and its influence on to the optimization was shown. In the given framework, the segmentation method can easily be exchanged with a segmentation, ,for example, based on a cortical parcellation. Thereby, the up-sampling can also be applied to more fiber bundles. However, other segmentation methods, for example, based on a cortical parcellation scheme also suffer from unclassified fibers which cannot be included in the tractogram optimization step.
The introduced fiber up-sampling method improved the tractogram representation, hence led to a significantly superior optimization expressed by a reduced NRMSE. Even though, the introduced approach did not enhance the optimization in every single structure, impairment caused by the fiber up-sampling was not observed. Additionally, the up-sampling was performed per bundle, whereby a bundle segmentation is a necessary prerequisite.
This requirement might be eliminable, if a different sampling strategy is applied to randomly draw samples in the PCA space. The assumption of a multivariate Gaussian distribution is no longer valid in a whole-brain fiber population and would lead to many implausible up-sampled fibers.
In Figure 10, the fiber up-sampling is compared with increasing the number of seed voxels during tractography. Even an eight-fold increase of fibers did not improve the optimization result substantially.
However, the up-sampling is not only computationally less intensive, F I G U R E 9 A coronal section through the Corona Radiata is shown in a single subject, whereby the signal fiber orientation distribution (FOD)s derived from the measured diffusion signal are depicted in transparent gray, the error FODs derived from the WB set are overlaid in color. Three different voxels are highlighted where the error is significantly larger compared to the other voxels F I G U R E 1 0 Different number of seed voxels and the resulting mean normalized root mean square error (NRMSE) for the automated fiber quantification (AFQ) and AFQUP tractogram sets are shown it also introduces new fibers with different features, a larger spatial extent, and therefore novel trajectories and the up-sampled fibers do differ substantially from the initial population. These new fibers contribute significantly to a more optimal tractogram set. Similar effects were reported by (Takemura, Caiafa, Wandell, & Pestilli, 2016) while different algorithm parameters were introduced instead of only increasing the number of seed points. The presented framework also enables the comparison of different tractogram sets from various sources and allows a more extensive inspection of each tractogram and its strength and weaknesses without defining an explicit gold standard.
For a reliable fiber quantification, it is crucial to eliminate error sources such as a bad fit due to wrong choice of the forward model, poor tractogram representation caused by the choice of tracking algorithm and tractography parameters or an overcompensation in voxels with a low number of streamlines. As discussed in (Smith et al., 2015), FOD lobes of voxels containing very little streamlines compared to the actual measured fiber density will result in assigning high weights to those fibers in order to reduce the error in that particular voxel, but introducing biases in all the other voxels traversed by those fibers. Similarly, the COMMIT model will assign higher volume fractions to extracellular compartments in order to compensate for missing streamlines representing the intracellular compartment. The presented up-sampling procedure can help to limit voxels containing a low number of streamlines, especially in cases where a higher track density cannot be achieved by simply increasing the number of seed voxels. With respect to global tractography algorithms, an increase of fibers is typically computationally very expensive, whereas the described up-sampling method is very efficient. The introduced error measures such as the error FA or the error FOD itself could also be fed back into the tractography process to influence the placement of new seed voxels or adjust tractography parameters in order to achieve a more representative tractogram. In this study, only single shell diffusion data was used for the optimization, whereas the COMMIT framework clearly benefits from multiple shells, for example, in order to distinguish between different isotropic compartments.

CONFLICTS OF INTEREST
None declared.