Uncertainty and its representation have an important role to play in any situation where the goal is to infer useful information from noisy data. In diffusion-weighted MRI (DW-MRI) scientists attempt to infer information about, for example, diffusion anisotropy or underlying fiber tract direction, by fitting models of the diffusion and measurement processes to DW-MRI data (e.g., Refs. 1, 2). In this scheme there is uncertainty caused both by the noise and artifacts present in any MR scan, but also by the incomplete modeling of the diffusion signal. That is, the true diffusion signal is more complicated than we choose to model. This additional complexity in the diffusion signal appears as residuals when we fit a simple model to the data, causing additional uncertainty in the model parameters. All of the uncertainty in these parameters may be represented in the form of probability density functions (pdfs). This article is essentially divided into two parts, dealing separately with uncertainty at the local and global levels. In the first part, we describe a technique for estimating the pdfs on all parameters in any local model of diffusion. We will show results derived from two simple models of the diffusion process within a voxel: The diffusion tensor model which assumes a local 3D Gaussian diffusion profile, and a simple partial volume model of local diffusion, which assumes that a fraction of diffusion is along a single dominant direction, and that the remainder is isotropic. We will then make suggestions for the extension to more complete models of the diffusion process which are able to account for one, or more, distribution of fiber directions within the voxel. In all of these models, the use of Bayesian techniques allows for the application of prior constraints on parameters in the model where such constraints are sensible. For example, in the fitting of the diffusion tensor model, the eigenvalues of the diffusion tensor are constrained to be positive.
The distributions on parameters in a diffusion model are of great significance when making inference on the basis of these parameters. Inference may be at a group level; for example, there have been studies showing reduced anisotropy in groups of multiple sclerosis patients, in comparison with groups of normal subjects (e.g., Ref. 3). However, inference may also be within a single subject. There have been many recent articles (e.g., Refs. 4, 5, 6) describing techniques for using parameters from a diffusion tensor fit to follow major white matter pathways in the brain. However, none of these techniques attempts to quantify the uncertainty in the resulting white matter connections. The output of these algorithms is a set of nodes describing the maximum likelihood pathway through the DTI data, with no measure of confidence on the location of this pathway. The lack of this information makes interpretation of the output pathways difficult, and also makes it hard to devise strategies for tracing reliably in uncertain areas. For both of these reasons, streamlining algorithms to date have chosen not to trace pathways through areas of low diffusion anisotropy (e.g., Refs. 5, 7). Diffusion anisotropy tends to be low in areas of high uncertainty in fiber direction (although the converse is not necessarily true (8)), and therefore, by tracing fibers only when anisotropy is high, streamlining algorithms have tended to generate pathways which (if they had been calculated) would have had narrow confidence bounds on them. This knowledge means that reconstructed pathways are often interpretable as major fiber tracts in the brain (9), but places limits on areas where it is possible to create them. In the second part of this article, we give the mathematical formulation for deriving a spatial PDF on connectivity between a point A and every other point in the data field, given the local pdfs. This PDF is an explicit representation of the confidence regions for pathways in the data.
We go on to present a sampling technique to generate this PDF in a computationally efficient manner and describe and discuss technical details, such as data interpolation, required in any fiber-tracing algorithm. We present resulting connectivity PDFs from seed voxels in the thalamus, a deep gray matter structure with relatively low diffusion anisotropy. We show that connectivity distributions estimated from diffusion imaging data in human correspond well with predictions from sacrificial tracer studies in primate. Further results from this study appear with detailed discussion and interpretation in Ref. 8.
An important point to note is that, throughout this article, the estimated probability distributions are pdfs on parameters in a model. This is to be contrasted with the Gaussian distribution described by the diffusion tensor fit (10), and with more recent work (e.g., Ref. 11) which has attempted to recreate the diffusion spectrum as a probability distribution on the displacement, $r_{final} - r_0$, of a particle with initial location $r_0$ in the voxel after a diffusion time $t_d$. There are crucial differences here, both conceptually and practically.
DENSITIES, BAYES, AND MCMC
When fitting a parametrized model to data, there are two general approaches which may be taken. The first is to look for the set of parameters (ω) which best fit the data. This is called a point estimate of the parameters. A special case of this is Maximum Likelihood estimation, where we look for the set of parameters which maximize the probability of seeing this realization of the data given the model and its parameters:
where Y is the data and M is the model.
The second approach is to associate a pdf with the parameters. In the Bayesian framework, this distribution is called the posterior distribution on the parameters given the data:
This posterior density allows us to ask the question of any hypervolume in parameter space Ω, “What is our belief, given the measured data, that the true value of ω lies in this hypervolume?” In the one-dimensional case, this question becomes, for any (ω0, ω1), “What is our belief that ω lies between ω0 and ω1?” These questions, and their answers, represent the uncertainty we have in the values of the parameters ω.
Unfortunately, calculating this pdf is seldom straightforward. The denominator in Eq.  is:
an integral which is often not tractable analytically. To make matters worse, this joint posterior pdf on all parameters is often not the distribution which we are most interested in. We are often interested in the posterior pdf on a single parameter or an interesting subset of parameters. Obtaining these marginal distributions again involves performing large integrals,
where ωI are the parameters of interest and ω−I are all other parameters. Again, these integrals are seldom tractable analytically.
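For reference, the four quantities discussed in this section can be written out in their standard forms. The block below is a reconstruction from the definitions given in the text (Y the data, M the model, ω the parameters, ωI the parameters of interest, ω−I all others); the symbol P for a probability density and the hat on the maximum likelihood estimate are notational assumptions rather than the article's own typesetting.

```latex
% Maximum likelihood point estimate
\hat{\omega}_{\mathrm{ML}} = \arg\max_{\omega} P(Y \mid \omega, M)

% Posterior distribution on the parameters (Bayes' rule)
P(\omega \mid Y, M) = \frac{P(Y \mid \omega, M)\, P(\omega \mid M)}{P(Y \mid M)}

% Denominator of Bayes' rule (the evidence)
P(Y \mid M) = \int P(Y \mid \omega, M)\, P(\omega \mid M)\, d\omega

% Marginal posterior on the parameters of interest
P(\omega_I \mid Y, M) = \int P(\omega \mid Y, M)\, d\omega_{-I}
```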
One solution to this problem is to draw samples in parameter space from the joint posterior distribution, implicitly performing the integrals numerically. For example, we may repetitively choose random sets of parameter values and choose to accept or reject these samples according to a criterion based on the value of the numerator in Eq. . It can be shown (e.g., Ref. 12) that a correct choice of this criterion will result in the accepted samples being distributed according to the joint posterior pdf (Eq. ). Schemes such as this are rejection sampling and importance sampling, which generate independent samples from the posterior. Any marginal distributions may then be generated by examining the samples from only the parameters of interest. However, these kinds of sampling schemes tend to be painfully slow, particularly in high-dimensional parameter spaces, as samples are proposed at random, and thus each has a very small chance of being accepted.
Markov Chain Monte Carlo (MCMC) (e.g., Refs. 12, 13) is a sampling technique which addresses this problem by proposing samples preferentially in areas of high probability. Samples drawn from the posterior are no longer independent of one another, but the high probability of accepting samples allows for many samples to be drawn and, in many cases, for the posterior pdf to be built in a relatively short period of time.
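As an illustration of the idea, the following is a minimal sketch of a random-walk Metropolis sampler on a toy problem: the posterior on the mean of Gaussian data with known noise level and a flat prior. The toy model, the function names, and the step size are assumptions made for this example and are not the models or settings used in this article. Note that only the unnormalized posterior (the numerator of Bayes' rule) is ever evaluated.

```python
import numpy as np

def log_posterior(mu, y, sigma=1.0):
    """Unnormalized log posterior for the mean of Gaussian data (flat prior)."""
    return -0.5 * np.sum((y - mu) ** 2) / sigma ** 2

def metropolis(y, n_samples=5000, burn_in=500, step_sd=0.5, seed=0):
    """Random-walk Metropolis: propose near the current value, accept or reject."""
    rng = np.random.default_rng(seed)
    mu = 0.0
    current = log_posterior(mu, y)
    samples = []
    for it in range(burn_in + n_samples):
        proposal = mu + rng.normal(0.0, step_sd)   # zero-mean Gaussian jump
        candidate = log_posterior(proposal, y)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < candidate - current:
            mu, current = proposal, candidate
        if it >= burn_in:                          # discard burn-in jumps
            samples.append(mu)
    return np.array(samples)

# Toy data: 50 noisy measurements of a quantity whose true value is 2.0.
data_rng = np.random.default_rng(1)
y = data_rng.normal(2.0, 1.0, size=50)
samples = metropolis(y)
```

A histogram of `samples` approximates the posterior pdf on the mean; successive samples are correlated, as noted above, because each proposal is centered on the previous accepted value.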
LOCAL PARAMETER ESTIMATION
In this section we present three models of the local diffusion process. The first is the familiar diffusion tensor model (10), which models the local diffusion as a 3D Gaussian. Then we choose two different models which attempt to model underlying fiber structure in a voxel and, from this, predict the diffusion-weighted signal. The first of these is a simple partial volume model allowing for a single fiber direction mixed with an isotropically diffusing compartment in a voxel. The second is a parametrized model of the transfer function between a distribution of fiber orientations in a voxel and the measured diffusion-weighted signal. We infer from the first two of these models using MCMC to estimate the posterior distributions on parameters of interest. We present detailed results from a single white matter voxel showing recovered distributions from both models. We go on to present a validation study, comparing distributions throughout a slice with those recovered from empirical measurements of uncertainty (14).
Local Parameter Estimation: Theory
Diffusion Tensor Model
The diffusion tensor has often been used to model local diffusion within a voxel (e.g., Refs. 10, 15, 16). The assumption made is that local diffusion may be characterized with a 3D Gaussian distribution (10), whose covariance matrix is proportional to the diffusion tensor, D. The resulting diffusion-weighted signal, μi along a gradient direction ri, with b-value bi is modeled as:
where S0 is the signal with no diffusion gradients applied. D, the diffusion tensor is:
When performing point estimation of the parameters in the diffusion tensor model, it has been convenient to choose the free parameters in the model to be the six independent elements of the tensor (Dxx, Dxy, Dxz, Dyy, Dyz, and Dzz) and the signal strength when no diffusion gradients are applied, S0. This parametrization allows estimation to take the form of a simple least-squares fit to the log data. When sampling, however, our choice of parametrization is far less constrained by our estimation technique. The parameters of real interest in the tensor are the three eigenvalues and the three angles defining the shape and orientation of the tensor. By choosing these as the free parameters in the model, not only do we give ourselves immediate access to the posterior pdfs on the parameters of real interest, but we also allow ourselves the freedom to apply constraints or add information exactly where we would like to. As a simple example, as will be seen later, a sensible choice of prior distribution on the eigenvalues makes it easy to constrain them to be positive. So the diffusion tensor is now parametrized as follows:
and V rotates Λ to (θ, ϕ, ψ), such that the tensor is first rotated so that its principal eigenvector aligns with (θ, ϕ) in spherical polar coordinates, and then rotated by ψ around its principal eigenvector.1
The noise is modeled separately for each voxel as independent and identically distributed (iid) Gaussian, with a mean of zero and standard deviation (SD) across acquisitions of σ. The probability of seeing the data at each voxel, Y, given the model, M, and any realization of the parameter set, ω = (θ, ϕ, ψ, λ1, λ2, λ3, S0, σ), may now be written as:
where n is the number of acquisitions, and yi and μi are the measured and predicted values of the ith acquisition, respectively. (Note that throughout this article i will be used to index acquisition number.)
Thus, the model at each voxel has eight free parameters, each of which is subject to a prior distribution. Priors are chosen to be noninformative, with the exception of ensuring positivity where sensible:2
Parameters a and b in the Gamma distributions are chosen to give these priors a suitably high variance such that they have little effect on the posterior distributions except for where we ensure positivity. Note that the noninformative prior in angle space is proportional to sin(θ), ensuring that every elemental area on the surface of the sphere, δA = sin(θ)δθδϕ has the same prior probability.
Simple Partial Volume Model
Here we take a slightly different approach to modeling in DW-MRI. Instead of modeling the diffusion shape directly, we attempt to build a model of the underlying fiber structure which predicts the diffusion shape, and hence the MR measurements. The simplest such model of fiber structure is to assume that all fibers pass through a voxel in the same direction. Assuming no diffusion–diffusion exchange, this leads to a simple two-compartment partial volume model. The first compartment models diffusion in and around the axons, with diffusion only in the fiber direction. The second models the diffusion of free water in the voxel as isotropic. One consequence of this model is that the diffusivity (and hence the restriction to water diffusion) in all directions perpendicular to the fiber axis is constrained to be the same. This is very different from the diffusion tensor model, where any ellipsoidal diffusion shape may be modeled.
The predicted diffusion signal is:
where d is the diffusivity, bi and ri are the b-value and gradient direction associated with the ith acquisition, and f and $R A R^T$ are, respectively, the fraction of signal contributed by, and the anisotropic diffusion tensor along, the fiber direction (θ, ϕ). That is, A is fixed as:
and R rotates A to (θ, ϕ).
Again, noise is modeled as iid Gaussian:
where the parameter set ω now has six free parameters (σ, S0, d, f, θ, ϕ). Each of these parameters is subject to a prior distribution, which are chosen to be noninformative except for where we ensure positivity:
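The partial volume model and its Gaussian noise model can be sketched in a few lines. This is an illustrative sketch only: it assumes that A has a single nonzero entry of 1 along the fiber axis, so that the term $r_i^T R A R^T r_i$ reduces to the squared cosine of the angle between the gradient and fiber directions (consistent with the kernel exp(−bd cos²γ) discussed later in this section). The function names and the example acquisition scheme are hypothetical.

```python
import numpy as np

def fiber_direction(theta, phi):
    """Unit vector for spherical polar angles (theta from z, phi in the x-y plane)."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def predict_signal(s0, d, f, theta, phi, bvals, bvecs):
    """Predicted signal: isotropic compartment (1 - f) plus single-fiber compartment f."""
    cos_gamma = bvecs @ fiber_direction(theta, phi)   # cosine of gradient-fiber angle
    isotropic = (1.0 - f) * np.exp(-bvals * d)
    anisotropic = f * np.exp(-bvals * d * cos_gamma ** 2)
    return s0 * (isotropic + anisotropic)

def log_likelihood(y, mu, sigma):
    """Log probability of data y under iid Gaussian noise with SD sigma."""
    n = y.size
    return (-n * np.log(sigma) - 0.5 * n * np.log(2.0 * np.pi)
            - 0.5 * np.sum((y - mu) ** 2) / sigma ** 2)

# Hypothetical acquisitions: one b=0 image, then gradients along z and along x.
bvals = np.array([0.0, 1000.0, 1000.0])            # s/mm^2
bvecs = np.array([[1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0]])
# Fiber along z (theta = 0), diffusivity 1e-3 mm^2/s, fiber fraction 0.6.
mu = predict_signal(s0=100.0, d=1e-3, f=0.6, theta=0.0, phi=0.0,
                    bvals=bvals, bvecs=bvecs)
```

In this example the gradient parallel to the fiber sees full attenuation in both compartments, whereas the perpendicular gradient sees attenuation only in the isotropic compartment.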
Increasing the Complexity—A Distribution of Fibers?
In the partial volume model presented above, only a single fiber orientation is modeled in each voxel. In fact, there will be a distribution, H(θ, ϕ), of fiber orientations in the voxel. In order to estimate this distribution we must build a model which, given this distribution, could predict the diffusion-weighted MR measurements.
Such a model clearly requires some assumptions. We start by assuming that each subvoxel has only one fiber direction through it, that the MR signal from the voxel is the sum of the signal from arbitrarily small subvoxels, and that the signal from each subvoxel behaves as described by Eq. . (Note that this final assumption is a strong assumption to make, but it is explicit in the model. Any other model of the local diffusion characteristics of a single fiber orientation may be used as a replacement.)
where μtotal is the vector of MR signal from the voxel at each gradient direction and strength, and μj is the same vector for each subvoxel.
If we now consider, instead of the individual subvoxels, the set ΘΦ of major directions (θ, ϕ) in these subvoxels (note the discretization of ΘΦ), then Eq.  is identically equivalent to (see Eq. ):
where Vθϕ is the set of all subvoxels whose principal fiber direction is (θ, ϕ) and N is the number of subvoxels. This equation, although fearsome at first sight, is actually very straightforward. The first part of the argument to the summation (on the top line) represents the signal due to all of the isotropic compartments, and the second part represents the signal due to all of the fiber compartments. If we now further assume that S0 (the signal with no diffusion gradients applied) and d (the diffusivity) are constant across the voxel, then the inner summation (over subvoxels which have the same principal direction) may be replaced by a constant for the isotropic compartment, and in the anisotropic compartments, by the distribution function H(θ, ϕ) defined earlier. With a little more manipulation and by letting the subvoxel size tend to zero, it is easy to arrive at:
where 1 − f is now the proportion of the whole voxel showing isotropic diffusion. Note that the integral is over sin(θ)dθdϕ in order to maintain elemental area over the sphere. Finally, if we write the gradient direction ri in spherical polar coordinates, $r_i = [\sin\alpha_i\cos\beta_i,\ \sin\alpha_i\sin\beta_i,\ \cos\alpha_i]$, and define γi as the angle between the gradient direction (αi, βi) and the fiber direction (θ, ϕ), then the exponent inside the integral reduces dramatically. We may now write:
This equation reveals a great deal about the diffusion measurement process. The real “signal” of interest is H(θ, ϕ), the distribution of fibers within the voxel. When we measure the diffusion profile of this signal we are measuring a version of this signal which is smoothed in angular space, with a kernel, predicted by this model, of exp(−bd cos²γ). We would like to deconvolve the effect of the measurement process from the signal. However, we leave the details of this estimation process, and validation thereof, as future work.
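Written out, the smoothing relationship described here plausibly takes the following form. This is a reconstruction from the statements in the text (isotropic fraction 1 − f, integration over sin(θ)dθdϕ, kernel exp(−bd cos²γ)), and it assumes that H(θ, ϕ) is normalized to integrate to one over the sphere; the expression for cos γi follows directly from the dot product of the two unit vectors.

```latex
% Predicted signal for acquisition i given a fiber orientation distribution H
\mu_i = S_0 \left[ (1 - f)\, e^{-b_i d}
      + f \int_{0}^{2\pi}\!\!\int_{0}^{\pi} H(\theta,\phi)\,
        e^{-b_i d \cos^2\gamma_i}\, \sin\theta \, d\theta \, d\phi \right]

% Angle between gradient direction (alpha_i, beta_i) and fiber direction (theta, phi)
\cos\gamma_i = \sin\alpha_i \sin\theta \cos(\beta_i - \phi) + \cos\alpha_i \cos\theta
```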
Local Parameter Estimation: Methods
DT-MRI datasets were acquired on a single, healthy volunteer. The images were obtained on a 3.0 T Varian Inova scanner using a diffusion-weighted single-shot EPI sequence. To minimize eddy currents, a doubly refocused spin-echo sequence was implemented (17). A birdcage RF head coil was used for both pulse transmission and signal detection. The diffusion gradients achieved a maximum gradient strength of 22 mT m⁻¹. Each dataset consisted of three nondiffusion-weighted and 60 diffusion-weighted images acquired with a b-value of 1000 s mm⁻². The diffusion gradients were uniformly distributed through space using the optimized scheme proposed by Jones et al. (18). Each set of images contained 42 contiguous slices with a 2.5 mm thickness. A half k-space acquisition was performed with a matrix size set to 62 × 96 and a field of view of 240 × 240 mm². The images were interpolated to achieve a matrix size of 128 × 128 and a final resolution of 1.875 × 1.875 × 2.5 mm³. To minimize motion artifacts, peripheral gating was used such that triggering occurred on every cardiac cycle. The echo time was set to 106 ms while the effective repetition time was 14 R-R intervals. The total scan time for each dataset was approximately 15 min, depending on heart rate.
MCMC estimation was performed for the diffusion tensor model and for the simple partial volume model. In both cases parameters were initialized with a least-squares diffusion tensor fit. The Markov chains were then jumped 500 times without sampling as a “burn-in” (12), followed by a further 2000 jumps, sampling every second jump, to give 1000 samples. A single jump of the parameter set consisted of independent jumps of each parameter. In both models samples were drawn from the precision (1/σ²) with a Gibbs sampler and from all other parameters with Metropolis-Hastings samplers. Proposal distributions for Metropolis-Hastings parameters were zero-mean Gaussians with SDs tuned adaptively to give a jump acceptance rate of 0.5. The full conditional distributions for the Gibbs sampling of the precision in both models are given in the Appendix. Computation time for diffusion data with 63 acquisitions is approximately 0.3 sec per voxel on a Pentium IV 2 GHz. Voxels are processed independently, so computation is easily parallelized.
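The Gibbs step for the precision and the sampling schedule can be illustrated as follows. This sketch assumes a Gamma(a, b) prior on the precision with b interpreted as a rate, in which case the standard conjugate result gives a Gamma full conditional with shape a + n/2 and rate b + ½Σ(yi − μi)²; the parametrization actually used is the one given in the Appendix and may differ. The function name and test values are hypothetical.

```python
import numpy as np

def sample_precision(residuals, a, b, rng):
    """Gibbs draw of the noise precision 1/sigma^2 from its Gamma full conditional."""
    n = residuals.size
    shape = a + 0.5 * n
    rate = b + 0.5 * np.sum(residuals ** 2)
    return rng.gamma(shape, 1.0 / rate)   # NumPy parametrizes Gamma by shape and scale

# Schedule described in the text: 500 burn-in jumps, then 2000 jumps keeping every second.
BURN_IN, N_JUMPS, THIN = 500, 2000, 2
kept_jumps = range(BURN_IN + THIN - 1, BURN_IN + N_JUMPS, THIN)

# Toy check: residuals with SD 2 should give a precision near 1/4.
rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 2.0, size=10000)
draws = np.array([sample_precision(residuals, 1e-3, 1e-3, rng) for _ in range(200)])
```

Because the full conditional is available in closed form, every Gibbs draw is accepted, which is why the precision does not need a tuned proposal distribution.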
Local Parameter Estimation: Results
Example Distributions From a Single Voxel
Figure 1a,b shows samples from the marginal posterior distributions on θ and ϕ from the diffusion tensor model. The voxel was chosen from the splenium of the corpus callosum. Figure 1c shows 1a,b plotted as a joint histogram around the surface of a sphere. This is then the joint marginal posterior distribution of θ and ϕ or the marginal posterior distribution of principal diffusion direction (PDD). Note how narrow this distribution is. This represents a high confidence in our calculated PDD, which is as predicted in an area of dense white matter such as the corpus callosum. Figure 2a,b shows samples from the marginal posterior distributions on θ and ϕ from the simple partial volume model. The same voxel was chosen as in Fig. 1. Again, Fig. 2c shows 2a,b plotted as a joint histogram around the surface of a sphere.
Validation: Comparison With Empirical Measurements
The posterior pdfs on the parameter estimates, in either of the above models, characterize our uncertainty in these parameters. In Ref. 14, Jones proposes an empirical method for estimating this uncertainty. Following this method, we acquired three repeats of diffusion data with 63 gradient directions and bootstrapped, to create 1000 datasets of different combinations of these repeats. We fit a diffusion tensor at each voxel in each of these new datasets and calculated the uncertainty between the 1000 principal eigenvectors at each voxel. This uncertainty is measured as the size of the 95% confidence angle from the mean direction.
Using only one of these 1000 datasets we drew 1000 samples from the posterior pdf on principal diffusion direction at each voxel under both the diffusion tensor and simple partial volume models. From these samples we computed the same 95% angle from the mean direction.
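A 95% angle of this kind can be computed from a set of direction samples along the following lines. This is a sketch under stated assumptions: the mean direction is taken to be the principal eigenvector of the mean dyadic (outer-product) tensor, which is insensitive to the sign ambiguity of fiber directions, and the 95% angle is taken as the 95th percentile of the angular deviations from that mean. The exact definitions used in Ref. 14 may differ, and the function name is hypothetical.

```python
import numpy as np

def confidence_angle(directions, level=95.0):
    """Angle (degrees) from the mean direction containing `level` percent of samples."""
    v = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    dyadic = v.T @ v / len(v)                 # mean outer product, invariant to sign flips
    eigvals, eigvecs = np.linalg.eigh(dyadic) # eigenvalues returned in ascending order
    mean_dir = eigvecs[:, -1]                 # eigenvector of the largest eigenvalue
    cosines = np.clip(np.abs(v @ mean_dir), 0.0, 1.0)
    angles = np.degrees(np.arccos(cosines))
    return np.percentile(angles, level)

# Toy example: samples tilted 10 degrees from the z-axis at evenly spaced azimuths.
azimuth = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
tilt = np.radians(10.0)
cone = np.column_stack([np.sin(tilt) * np.cos(azimuth),
                        np.sin(tilt) * np.sin(azimuth),
                        np.full_like(azimuth, np.cos(tilt))])
cone_angle = confidence_angle(cone)

# Samples that differ only in sign describe the same fiber axis.
flipped = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]] * 10)
flipped_angle = confidence_angle(flipped)
```

The same function can be applied both to the bootstrapped principal eigenvectors and to the MCMC samples of principal diffusion direction, so that the two sets of angles are directly comparable.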
Figure 3 shows these 95% angles for the diffusion tensor model in Fig. 3a and the partial volume model in Fig. 3b; Fig. 3c shows the same angles predicted by Jones' method.
There are various factors to consider when comparing these results. The first is that the empirical method in Fig. 3c is not necessarily “ground truth.” It has errors associated with it due to subject motion and interpolation related effects, but also, more subtly, due to the dependence within the bootstrapped datasets. This is likely to cause an underestimate in the measured uncertainty. The second factor is the difference in the two models. Figure 3a,c predict uncertainty levels in the principal eigenvector of a diffusion tensor model. Figure 3b predicts the same thing for the less flexible partial volume model. In areas of complex fiber structure, the partial volume model, which has only one fiber direction available to it, is forced to represent this structure as uncertainty in the single direction (this will turn out to be extremely useful when trying to do tractography, as will be seen in later sections). In contrast, the diffusion tensor model will tend to account for complex fiber structure in a voxel not only with uncertainty in the principal fiber direction, but also with a change in the diffusion profile (i.e., a change in the relative sizes of the three eigenvalues). For this reason we would predict that, in regions of complex fiber structure, the partial volume model would show more uncertainty in principal diffusion direction than the diffusion tensor model. We would expect the two models to predict very similar uncertainties in regions of high fiber co-alignment, such as in the corpus callosum (Fig. 3d).
The mean 95% confidence angles within the brain for the three methods are: diffusion tensor model and MCMC (a) 35.4°, partial volume model and MCMC (b) 36.0°, and diffusion tensor model with empirical measurements (Jones) (c) 33.9°. We further compare any two of these three methods by computing their absolute difference as a fraction of their mean value at every voxel, defining fractional deviation (Table 1):
Table 1. Fractional Deviation Values Between the Three Methods in the Whole Brain (Left) and Within the Corpus Callosum (Right)
Inside each cell is the mean with the median in parentheses.
Predictions of uncertainty by MCMC on the two models are within 10% of each other throughout the brain and within 5% in the callosal mask, showing, as predicted, very similar uncertainty where fibers are highly co-aligned, and slight differences in uncertainty in other areas. With the diffusion tensor model, uncertainties predicted by MCMC are within 15% of those predicted by the empirical method when considering the whole brain and 13% when only considering the corpus callosum. These differences are small and may be due to errors in either or both approaches.
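The fractional deviation used in this comparison, as described in the text (absolute difference as a fraction of the mean value at every voxel), can be sketched as below. The function name and the example angles are hypothetical and are not values from Table 1.

```python
import numpy as np

def fractional_deviation(a, b):
    """Voxelwise absolute difference of two maps as a fraction of their mean."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.abs(a - b) / (0.5 * (a + b))

# Hypothetical 95% angles (degrees) from two methods at three voxels.
method_a = np.array([30.0, 10.0, 50.0])
method_b = np.array([36.0, 10.0, 45.0])
fd = fractional_deviation(method_a, method_b)
summary = (fd.mean(), np.median(fd))   # mean and median, as reported per table cell
```

Normalizing by the mean of the two values, rather than by either one alone, makes the measure symmetric in the two methods being compared.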
GLOBAL CONNECTIVITY ESTIMATION
Global Connectivity Estimation: Theory
In the previous section we described techniques for estimating, at each voxel, probability distributions on every parameter in the chosen model of diffusion. In this section we use these local pdfs from the simple partial volume model to infer on a model of global connectivity. The reason we chose this model is explained in detail in the previous section: we wish to maximize the chances that complex fiber structure will be represented by uncertainty in principal direction. We now require a model to take us from the local parameters in this model to parameters describing global connectivity. Note that, throughout the remainder of this article, subscript x refers to “every voxel in the brain.” Hence, (θ, ϕ)x refers to the complete set of principal diffusion directions.
Consider the case where the values of the local parameters are known with no uncertainty. What do they tell us about anatomical connectivity between voxels in the brain? In the case where our local model describes only a single fiber direction passing through the voxel, this global model can only take one form:
where P(∃A → B|(θ, ϕ)x) is the probability of a connection existing between points A and B, given knowledge of local fiber direction.
In order to solve this equation we may simply start a connected path from a seed point, A, and follow local fiber direction until a stopping criterion is met. If B lies on this path we may say that a connection exists between A and B. This procedure is at the heart of all “streamlining” algorithms (e.g., Refs. 5, 6, 19), which choose (θ, ϕ)x to be the principal eigendirection of the estimated diffusion tensor at each voxel.
However, in the case where there is uncertainty associated with (θ, ϕ)x, we would like to compute the probability of a connection existing given the data, Yx, which is known. That is, we would like to compute P(∃A → B|Yx). In order to calculate this pdf we would have to perform the following integrations:
That is, for each possible value of fiber direction at every voxel (θ, ϕ)x, we must incorporate the probability of connection given this (θ, ϕ)x, and also the probability of this (θ, ϕ)x given the acquired MR data. This process is known as marginalization (see, e.g., Ref. 20).
It can be seen from Eq.  that P(∃A → B|Y) reduces to P(∃A → B|(θ, ϕ)x) when the local pdfs on fiber direction ((θ, ϕ)x) are delta functions. That is, when there is no uncertainty in the local fiber direction, Eq.  reduces to the streamlining (maximum likelihood) solution. However, when local fiber direction is uncertain, P(∃A → B|Y) will be nonzero for some B not on the maximum likelihood streamlines. That is, the global connectivity pattern from A will spread to incorporate the known uncertainty in local fiber direction.
However, even in the discrete data case, Eq.  represents a v-dimensional integral (where v is the number of voxels in the brain) over distributions with no analytical representation (the local pdfs, generated with MCMC), and hence clearly cannot be solved analytically.
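In compact notation, the quantities discussed in this section can be summarized as follows. This is a reconstruction from the descriptions in the text rather than the article's own typesetting: the symbol P for probability, the compact integral over the complete set of directions (θ, ϕ)x, and the sample index k in the Monte Carlo estimate are notational assumptions.

```latex
% Known local directions: a connection either exists along the streamline or it does not
P(\exists A \to B \mid (\theta,\phi)_x) =
  \begin{cases}
    1 & \text{if the path from } A \text{ following } (\theta,\phi)_x \text{ passes through } B \\
    0 & \text{otherwise}
  \end{cases}

% Uncertain local directions: marginalize over fiber direction at every voxel
P(\exists A \to B \mid Y) =
  \int P(\exists A \to B \mid (\theta,\phi)_x)\,
       P((\theta,\phi)_x \mid Y)\, d(\theta,\phi)_x

% Sampling approximation with N sets of directions drawn from the local posteriors
P(\exists A \to B \mid Y) \approx \frac{1}{N} \sum_{k=1}^{N}
  P(\exists A \to B \mid (\theta,\phi)_x^{(k)})
```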
Fortunately, as we have seen in previous sections, even when explicit integration is infeasible, it is often possible to compute integrals implicitly by drawing samples from the resulting distribution. In our case, in order to draw a sample from P(∃A → B|Y) we may draw a sample from the posterior pdf on fiber direction at each point in space and construct the streamline (henceforth referred to as a “probabilistic streamline”) from A given these directions. Computationally, this process is extremely cheap. Samples from the local pdfs at each voxel have already been generated, so to generate a single probabilistic streamline from seed point A, referring to the current “front” of the streamline as z, it is sufficient simply to start z at A and:
Select a random sample, (θ, ϕ), from P(θ, ϕ|Y) at z.
Move z a distance s along (θ, ϕ).
Repeat until stopping criterion is met.
This probabilistic streamline is said to connect A to all points B along its path. By drawing many such samples, we may build the spatial pdf of P(∃A → B|Y) for all B. We may then discretize this distribution into voxels by simply counting the number of probabilistic streamlines which pass through a voxel B, and dividing by the total number of probabilistic streamlines.
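The procedure above can be sketched as follows. This is a simplified illustration, not the implementation used for the results in this article: it assumes the local posterior samples are stored as unit vectors in an array of shape (nx, ny, nz, n_samples, 3), looks up the nearest voxel rather than interpolating, flips each sampled direction to agree with the previous step (since a fiber direction has no sign), tracks in one direction only, and stops on leaving the volume or after a fixed number of steps. All function and parameter names are hypothetical.

```python
import numpy as np

def probabilistic_streamline(samples, seed, step=0.5, max_steps=200, rng=None):
    """Trace one streamline, drawing a fresh direction sample at every step."""
    rng = np.random.default_rng() if rng is None else rng
    shape = np.array(samples.shape[:3])
    n_local = samples.shape[3]
    z = np.asarray(seed, dtype=float)      # current "front" of the streamline
    previous = None
    visited = set()
    for _ in range(max_steps):
        voxel = np.rint(z).astype(int)     # nearest voxel on the acquisition grid
        if np.any(voxel < 0) or np.any(voxel >= shape):
            break                          # stopping criterion: left the volume
        visited.add(tuple(voxel))
        direction = samples[voxel[0], voxel[1], voxel[2], rng.integers(n_local)]
        if previous is not None and np.dot(direction, previous) < 0:
            direction = -direction         # keep heading consistent with last step
        z = z + step * direction
        previous = direction
    return visited

def connectivity_map(samples, seed, n_streamlines=1000, rng_seed=0, **kwargs):
    """Fraction of probabilistic streamlines from the seed passing through each voxel."""
    rng = np.random.default_rng(rng_seed)
    counts = np.zeros(samples.shape[:3])
    for _ in range(n_streamlines):
        for voxel in probabilistic_streamline(samples, seed, rng=rng, **kwargs):
            counts[voxel] += 1             # each streamline counts a voxel once
    return counts / n_streamlines

# Toy field with no uncertainty: every sample in every voxel points along +x.
toy = np.zeros((5, 5, 5, 10, 3))
toy[..., 0] = 1.0
conn = connectivity_map(toy, seed=(0, 2, 2), n_streamlines=50)
```

With delta-function local pdfs, as in this toy field, every probabilistic streamline follows the same path and the map reduces to the ordinary streamline; with broader local pdfs the map spreads out accordingly.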
The sampling technique above relies on the local pdfs existing in continuous space. Unfortunately, we only have access to MR acquisitions, and hence these local pdfs, on a discrete acquisition grid. We need a technique to generate samples from the local pdfs at a point not on the grid.
An obvious solution to this problem would be to interpolate the original data (using a standard interpolation scheme, such as sinc or trilinear interpolation), and generate the local pdf on fiber direction given this new interpolated data. This would be extremely computationally costly, but also, on further consideration, may not conceptually be the best thing to do. In the middle of large fiber bundles, where neighboring voxels have similar fiber directions (each with low uncertainty), the choice of interpolation scheme will have very little effect. However, in places where neighboring voxels may have significantly different directions, such as at the edge of fiber bundles or where different bundles meet, such an interpolation scheme will generate a fiber direction in between the directions of the voxels on the grid. Moreover, the result of sinc or linear interpolation of data which is related to parameters in a highly nonlinear (e.g., exponent of trigonometric functions) manner is likely to produce interpolated data which does not fit well to the model, and thus the resulting most probable fiber direction will be highly dependent on the noise in the measurements at the grid locations. An alternative to interpolating the data in this fashion is to choose an interpolation scheme which will pick a sample from one of the neighboring voxels on the grid. In a probabilistic system, we also have the opportunity to use a probabilistic interpolation scheme. That is, we can choose a scheme which chooses the data from a single neighboring point on the acquisition grid, but the probabilities of choosing each neighbor will be a function, g, of their positions relative to the interpolation site. There are many possible functions for g, but we have chosen one which is analogous to trilinear interpolation. 
That is, in the x-dimension the probability of choosing data from floor(x) is g(floor(x)|x) = (ceil(x) − x)/(ceil(x) − floor(x)), and from ceil(x) is g(ceil(x)|x) = 1 − g(floor(x)|x), and the same in the y- and z-dimensions. If a streamline, z, were to pass through the same point twice, different nearest neighbors may be chosen, reflecting our lack of knowledge of the true pdf at that point.
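A minimal sketch of this probabilistic interpolation is given below. It uses the fractional part x − floor(x) as the probability of choosing the upper neighbor, which equals 1 − g(floor(x)|x) whenever x is not on the grid; for a point lying exactly on the grid, where ceil(x) = floor(x), the sketch simply returns that grid point, which is an assumption about how this case is handled. The function name is hypothetical.

```python
import numpy as np

def pick_neighbor(point, rng):
    """Choose one neighboring grid point, independently in each dimension."""
    point = np.asarray(point, dtype=float)
    lower = np.floor(point)
    frac = point - lower                           # probability of taking the upper neighbor
    take_upper = rng.uniform(size=point.shape) < frac
    return (lower + take_upper).astype(int)

rng = np.random.default_rng(0)
on_grid = pick_neighbor([2.0, 3.0, 4.0], rng)      # exactly on the grid: no randomness
picks = np.array([pick_neighbor([2.25, 7.5, 1.9], rng) for _ in range(20000)])
floor_fraction = np.mean(picks[:, 0] == 2)         # should be close to g(2 | 2.25) = 0.75
```

Averaged over many draws, the neighbors are selected with the same weights that trilinear interpolation would assign, but each individual draw returns data from a single voxel on the acquisition grid.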
Algorithms which generate streamlines based on maximum likelihood fiber directions (e.g. principal eigenvector from a diffusion tensor fit) have tended to require harsh streamline stopping criteria based on fractional anisotropy and local curvature (angle between successive steps). Fractional anisotropy thresholds have tended to be in the range of 0.2–0.4 (e.g., Ref. 7), and curvature thresholds have been as strict as requiring successive steps to be within 45° (e.g., Ref. 5). These criteria are in place to reduce the sensitivity of the streamlining to noise in the image, partial volume effects, and other related problems. The aim is to reduce the possibility of seeing false-positives in the results by only progressing when there is high confidence in fiber direction and when the direction is anatomically plausible. The downside of these constraints is the limitations that they impose on which fiber tracts may be reconstructed and where in the brain they may occur. For example, deep gray matter structures, despite displaying a high degree of order in their principal diffusion directions, tend to have low anisotropy (often below the threshold for streamlining algorithms). Streamlines will also tend to terminate well before cortex as anisotropy reduces and uncertainty in fiber direction increases.
In such circumstances a probabilistic algorithm has significant advantages. First, in regions where fiber direction is uncertain (these often coincide with regions of low anisotropy), the algorithm has available to it a direct representation of this uncertainty. Hence, even though it cannot progress along a single direction with high confidence, it can progress in many directions. The uncertainty in this area will be represented by voxels further along the path having lower probabilities associated with them; however, a high probability of connectivity to the seed voxel may still be associated with the region into which the paths progress. A second useful advantage of a probabilistic algorithm is robustness to noise. It can be difficult to track beyond a noisy voxel using a nonprobabilistic algorithm, as it may initiate a meaningless change in path. However, with a probabilistic algorithm, paths which have taken errant routes tend to disperse quickly, so that voxels along these paths are classified with low probability. In contrast, “true” paths tend to group together, giving a much higher probability of connection for voxels on these paths.
These advantages significantly reduce the need for anisotropy and curvature stopping criteria. The results presented here are generated with no anisotropy threshold and with a local curvature threshold of ±80° for each sample. This curvature threshold is required, as, without it, the sampled streamlines may track back along a path similar to one already visited, artificially increasing the probability along the path. In order to reduce this effect further, we check, at every step, whether the path is entering an area it has already visited and terminate those that are.
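The two stopping rules described above might be sketched as follows. This is an illustration under stated assumptions rather than the implementation used for the results: the "area already visited" is interpreted here as a voxel already visited, and the names `angle_between_deg` and `should_terminate` are hypothetical.

```python
import math

MAX_ANGLE_DEG = 80.0  # local curvature threshold quoted in the text


def angle_between_deg(u, v):
    """Angle in degrees between two nonzero 3D step vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_angle))


def should_terminate(prev_step, new_step, new_voxel, current_voxel, visited):
    """Return True if a sampled streamline should stop at this step.

    `visited` is the set of voxels the streamline has passed through so far
    (assumption: revisiting is checked at voxel resolution).
    """
    if angle_between_deg(prev_step, new_step) > MAX_ANGLE_DEG:
        return True
    # Staying inside the current voxel is allowed; re-entering an old one is not.
    return new_voxel != current_voxel and new_voxel in visited
```

Keeping the revisit check separate from the angle check matters because a path can loop back gradually through a sequence of steps that each satisfy the curvature threshold.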
A Note on Interpretation
The implication of accounting for the uncertainty in local fiber directions, and hence estimating a spatial probability distribution of connectivity from the seed point, is that the recovered connectivity distribution is spread in space (see Global Connectivity Estimation: Results). It is tempting to think of this distribution as a distribution of connections from the seed point. This is manifestly not the case. According to the model used earlier in this section, this spatial pdf represents confidence bounds on the location of the most probable single connection. It is certainly true that some of the uncertainty estimated locally is likely to be due to partial volume effects, such as a spread of fiber directions in the voxel, and therefore the presence in the brain of multiple connection sites from the seed may result in a diffuse spatial pdf. However, while the model of diffusion at each voxel includes only a single fiber direction, the global inference is clearly on a single pathway.
Global Connectivity Estimation: Methods
Diffusion-weighted data were acquired with an optimized method based on echo planar imaging, implemented on a General Electric 1.5 T Signa Horizon scanner with a standard quadrature head-coil and maximum gradient strength of 22 mT m⁻¹.
The diffusion weighting followed an optimized scheme (21) where the diffusion weightings were isotropically distributed along 54 directions. With the diffusion parameters δ and Δ equal to 34 and 40 ms, respectively, the b-value was 1150 s mm⁻², the optimum for white matter DTI measurements. Six diffusion-weighted volumes were acquired with b-value 300 s mm⁻², and six volumes were acquired with no diffusion weighting. Each volume covered the whole brain with 60 slices of 2.3 mm slice thickness, field of view 220 × 220 mm². An imaging matrix of 96 × 96 was used, giving isotropic voxels of 2.3 × 2.3 × 2.3 mm³, and the images were reconstructed on a 128 × 128 matrix, giving a final resolution of 1.7 × 1.7 × 2.3 mm³. An optimized cardiac gating scheme (21) was used to minimize artifacts arising from cerebrospinal fluid pulsatile flow. The total scan time for the DTI protocol was approximately 20 min, depending on heart rate.
The high-resolution T1-weighted scan was obtained with a 3D inversion recovery prepared spoiled gradient echo (IR-SPGR). Parameters for the acquisition were: FOV = 310 × 155; matrix size = 256 × 128; in-plane resolution = 1.2 × 1.2 mm²; 156 slices of 1.2 mm slice thickness; inversion time = 450 ms; repetition time = 2 s; echo time = 53 ms.
Estimation was carried out exactly as before, except that, for reasons of computational storage, when carrying out estimation on the whole brain, as opposed to a single slice, we drew samples every 20th jump instead of every 2nd.
Global Connectivity Estimation: Results
In this section, we present some results of applying this methodology to the estimation of connectivity distributions from voxels in human thalamus. Knowledge of thalamo-cortical connectivity is sparse in humans but rich in nonhuman primates. These results, and others, are analyzed, interpreted, and compared with the nonhuman literature in detail in Ref. 8. Here we present them as a first step toward validation of probabilistic tractography as presented in this article, and as evidence suggesting that connectivity studies are feasible with diffusion-weighted imaging, even between gray matter structures.
Results From the Thalamus
Figure 4a,b shows results from seeding different parts of the visual system. In Fig. 4a the seed point was in the lateral geniculate nucleus (LGN), a thalamic nucleus which processes visual information. The connectivity distribution heads anteriorly into the optic tract and posteriorly into the visual cortex, consistent with the known connections of the LGN in nonhuman primate (22, 23). However, when the seed is placed in the optic tract (Fig. 4b), two distinct pathways emerge. The two coronal scans in this figure show these two pathways just after they split (near the coronal level of LGN) and around 10 mm posterior to the split. The right-hand pathway follows the route of the pathway in Fig. 4a. The left-hand pathway (see the axial slice in Fig. 4b) heads inferior to LGN, around the posterior ventral edge of thalamus to the superior colliculus. These distributions correspond to the two known branches of the primate visual system (23): the optic radiations (via LGN) and the superior-collicular brachium.
Figure 4c,d shows connectivity distributions seeded in different areas in thalamus. Figure 4c shows a distribution seeded in a medial dorsal area in thalamus. In primate, nuclei in the medial dorsal nuclear cluster of thalamus receive projections from anterior temporal lobe (24–26) and maintain reciprocal projections with prefrontal cortex (27, 28). The connectivity distribution in Fig. 4c progresses anteriorly into the prefrontal cortex, and also runs posteriorly around the posterior edge of thalamus before turning anteriorly into the anterior temporal lobe. Figure 4d shows a distribution seeded in a ventral lateral area in thalamus. In primate thalamus, the ventral lateral nuclear group processes motor information and maintains strong connections with other motor zones (29, 30), such as the primary motor cortex and cerebellar cortex. The connectivity distribution in Fig. 4d progresses superiorly to primary motor cortex and inferiorly to cerebellar cortex and brainstem.
The interpretation issues discussed in the previous section are particularly relevant to the distributions shown in Fig. 4a–d. Figure 4a,b,d shows pathways which mainly exist in large white matter pathways, with correspondingly low uncertainty in fiber direction. Hence, the distributions seen in these figures are narrow. This should not be interpreted to mean that true connections from the seed voxel are necessarily correspondingly focused, but rather that the uncertainty on the pathway defined by the principal diffusion directions is low. The pathway seen in Fig. 4c spreads as it passes through a region of uncertainty while approaching the temporal lobe and also encounters uncertainty before entering prefrontal cortex. Again, this should be interpreted as uncertainty in the connection defined by the principal diffusion directions. To reiterate the earlier point: In order to infer on diffuse connections from a single seed, the model of diffusion within a voxel must allow for multiple fibers passing through the voxel. However, as can be seen in Fig. 4b,d, the presence of local fiber divergence may well be reflected in the local pdf at, for example, branching points in the pathways. In these two examples, branches which are known to occur in primate brain are found by accounting for uncertainty in the principal diffusion direction.
To test the consistency of the results throughout the thalamus, we seeded every voxel in thalamus (manually outlined on the T1-weighted image), and classified the results by the cortical area with the highest probability of connection to the seed. Four cortical areas were manually outlined on the T1 image to correspond with the principal projection and reception sites of thalamic nuclear clusters in primate brain (23) (Fig. 4e):
Figure 4f shows the nuclei in human thalamus, as defined by histological staining (31) with, overlaid, a color map showing predictions derived from primate data of the strongest cortical connection sites.
We skull-stripped the diffusion-weighted image (32) and performed affine registration between the diffusion-weighted and T1 images (33, 34), taking care never to resample the diffusion image. We then ran probabilistic tractography seeded from every voxel in the structural scan, classifying the results as above. The results can be seen in Fig. 4g. The classification of thalamic seed voxels by their connectivity distributions reveals a segmentation of thalamic nuclear clusters broadly consistent with the histological prediction (underlaid in Fig. 4f), and most probable connected cortical zones consistent with predictions from primate data (overlaid in color in Fig. 4f). Furthermore, the results show approximate bilateral symmetry in thalamic seed voxels.
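The classification step, in which each seed voxel is labeled by the cortical area with the highest probability of connection, can be sketched as below. The mask labels (`zone_A` to `zone_D`), the function names, and the probability values in the usage example are hypothetical placeholders, not results from this study.

```python
def classify_seed(target_probabilities):
    """Return the label of the cortical mask with the highest probability.

    `target_probabilities` maps mask label -> probability of connection.
    Ties resolve to the first label encountered (an arbitrary choice).
    """
    return max(target_probabilities, key=target_probabilities.get)


def segment(seeds):
    """Map every seed voxel to its most probable cortical target."""
    return {voxel: classify_seed(probs) for voxel, probs in seeds.items()}


if __name__ == "__main__":
    # Illustrative numbers only.
    seeds = {
        (40, 52, 30): {"zone_A": 0.05, "zone_B": 0.60, "zone_C": 0.30, "zone_D": 0.05},
        (41, 52, 30): {"zone_A": 0.70, "zone_B": 0.10, "zone_C": 0.10, "zone_D": 0.10},
    }
    print(segment(seeds))
```

A hard classification of this kind discards the probability values themselves, which is why a map of the winning label is only a summary of the underlying connectivity distributions.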
These results are examined and extended in detail in Ref. 8, including a finer segmentation of the thalamic nuclei resulting from an increased number of cortical zones and a detailed look at the information available in the probability values themselves.
In general, analysis of diffusion-weighted data has involved the fitting of a model of local diffusion to the diffusion-weighted data at each voxel. This model may assume that local diffusion is Gaussian in profile (the diffusion tensor model (10)) or may allow a more complex structure for local diffusion (e.g., a spherical harmonic decomposition (35, 36)). However, in all cases, the assumed model is of the diffusion profile and not of the underlying fiber structure, and any analysis which has occurred after the fitting of this local model has made the assumption that the parameters in this model are known absolutely.
There are two important, but separate, issues here. The first is that the parameters of real interest to the scientist are ones which relate directly to the underlying fiber structure, and not to the diffusion profile. These underlying parameters may have convincing markers within the fitted diffusion profile (for example, anisotropy measures (2, 37) from the diffusion tensor fit have been shown to be a marker for collinearity of fibers within a voxel), but any attempt to recreate the fiber structure from these profiles is essentially an educated guess. There has been no model proposed to predict how a specific structure or distribution of fiber directions within a voxel will reflect itself in the measured diffusion-weighted NMR signal. The second issue is that, even when fitting a model of local diffusion, the resulting parameters have uncertainty associated with them. Factors such as noise in the NMR signal (both physical and physiological) and, crucially, the inadequacy of the proposed model, lead to this uncertainty, which should be incorporated in any further processing (such as tractography schemes).
In this article we have presented a method for the full treatment of this uncertainty. We have shown how, using Bayes' equation, along with well-established methods for its numerical solution, it is possible to form a complete representation of the uncertainty in the parameters in any generative model of diffusion, in the form of posterior probability density functions on these parameters. We have applied this Bayesian estimation technique to two simple local models of diffusion, the diffusion tensor model and a simple partial volume model, with only a single anisotropically diffusing direction in the voxel. We have examined the results in these two cases, comparing the posterior distributions with empirical measurements of uncertainty.
We then considered uncertainty at a global level, outlining the theory behind moving from the pdfs on local PDD to an estimate of the probability distribution on global connectivity. When estimating global connectivity, we first have to choose between the available local models of diffusion. We have chosen to use a simple partial volume model. The reason for this choice is that, by choosing a model which allows for only a single fiber direction within a voxel, we maximize the chance that the effect of diverging or splitting fibers will be seen as uncertainty in the principal diffusion direction, and not as a change in the diffusion profile, as might be the case if the diffusion tensor model were chosen. However, the similarity in uncertainty between the two models that we find in the empirical validation suggests that this decision is made largely for conceptual completeness, and that the results would have been similar if the diffusion tensor model had been chosen.
The next stage is to define a model of global connectivity. The model we chose is identical to that used in streamlining algorithms (e.g., Refs. 4–6, 19). That is, given absolute knowledge of local fiber directions, connectivity is assumed between two points if, and only if, there exists a connected path between them through the data (see Eq. ). The crucial difference between the probabilistic tractography proposed here and the streamlining algorithms referenced above can be seen in Eq. . Put simply, the result of this equation incorporates every possible fiber orientation at every voxel and the probability of each of these fiber directions given the acquired MR data. We simply allow for uncertainty in fiber direction when computing streamlines. In practice, this equation is solved by an algorithm similar in nature to others presented, along with this method, at ISMRM 2002 (38–40): local pdfs are sampled repeatedly to create streamlines, and these streamlines are regarded as samples from a global pdf. A crucial difference between these methods and our method is that we choose to compute the local pdfs in a rigorous fashion given the MR data. The methods referenced above all use heuristic, experience-based relationships between the shape of the fitted diffusion tensor and the assumed pdf on local fiber orientation.
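The idea of treating sampled streamlines as draws from a global pdf can be illustrated with a deliberately simplified sketch. It assumes integer voxel-sized steps on a toy 2D grid and a made-up local pdf (`toy_local_pdf`) with a single branch point; all names are hypothetical, and the real method samples fiber directions from the estimated local posterior with sub-voxel steps.

```python
import random


def connectivity_map(seed, sample_step, n_streamlines, max_steps, rng):
    """Fraction of sampled streamlines that visit each voxel."""
    counts = {}
    for _ in range(n_streamlines):
        voxel, visited = seed, {seed}
        for _ in range(max_steps):
            step = sample_step(voxel, rng)  # one draw from the local pdf
            if step is None:                # streamline has left the data
                break
            voxel = tuple(v + s for v, s in zip(voxel, step))
            if voxel in visited:            # stop paths that revisit a voxel
                break
            visited.add(voxel)
        for v in visited:                   # count each voxel once per path
            counts[v] = counts.get(v, 0) + 1
    return {v: c / n_streamlines for v, c in counts.items()}


def toy_local_pdf(voxel, rng):
    """Toy local pdf: certain direction except at a branch point at x == 3."""
    x, _ = voxel
    if x >= 6:
        return None
    if x == 3:
        return (1, 1) if rng.random() < 0.7 else (1, -1)
    return (1, 0)


if __name__ == "__main__":
    cmap = connectivity_map((0, 0), toy_local_pdf, 5000, 20, random.Random(1))
    print(cmap[(4, 1)], cmap[(4, -1)])  # close to 0.7 and 0.3
```

In this toy case the uncertainty at the branch point splits the connectivity distribution into two arms whose values sum to one, mirroring the way local uncertainty in the principal diffusion direction spreads the global distribution in space.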
An important result of our procedure is that the recovered “connectivity distributions” are strictly probability distributions on the connected pathway through dominant fiber directions. That is, there is no explicit representation of splitting or diverging fibers in either the local or global model. We are strictly inferring on a single pathway leading from the seed point, and therefore in order to find, for example, splitting pathways, the effect of fiber divergence within a voxel must reveal itself as uncertainty in the PDD. It can be seen from the local results section that, at least in the cases presented here, this effect can be seen. Figure 4b shows sensitivity to the splitting of fibers from the optic tract, into the superior collicular brachium, and the direct fibers of the optic radiations. Figure 4d also shows sensitivity to branching fibers. Descending fibers from the ventral lateral (motor) nucleus of the thalamus split into two distinct branches, as is to be expected from primate studies. The first heads down to the brainstem and the second into the superior cerebellar cortex. However, because fiber divergence within a voxel is treated as uncertainty in principal diffusion direction, this sensitivity to diverging and branching fibers will be dependent on the experimental design; in general, the more information in the MR measurements, the lower the uncertainty in principal diffusion direction. Taking this effect to its logical extreme, if we were to gather an infinite number of MR measurements, there would be no uncertainty in principal fiber direction, and the marginal probability distribution on the dominant streamline would be infinitely narrow, i.e., the simple streamlining solution. Ideally we would like to infer, not on connectivity via a single connection, but on an anatomical distribution of connectivity. In order to do this we must allow for divergence, branching, and crossing of fibers in our local model of diffusion. 
We propose one such model which will allow for inference on an underlying distribution of fiber orientations.
Probably the most important result in this article is in Fig. 4e,f,g. Here we seed every voxel in the thalamus and compute the respective connectivity distributions, recording the probability of connectivity to each of four cortical masks. There are two striking features in this figure. The correspondence of the connectivity-based thalamic segmentation between the left and right thalami (Fig. 4g) provides strong evidence for the robustness of the technique, even when seeding from deep gray matter areas. This is backed up by the marked similarity between the predicted cortical zones from primate data (Fig. 4f) and the connectivity-based segmentation (Fig. 4g). This second feature also provides strong, albeit indirect, validation for the use of diffusion-based tractography in any guise.
In summary, we have presented a technique for characterizing the uncertainty associated with parameter estimates in diffusion-weighted MRI and for propagating this uncertainty through the diffusion-weighted data. This allows us to compute the probability distribution on the location of the dominant fiber pathway so that we may quantify our belief in the tractography results.
The authors thank Claudia Wheeler-Kingshott, Phil Boulby, and Gareth Barker from the Institute of Neurology, Queen's Square, London, and the UK Multiple Sclerosis Society for providing data for this work.
This may seem an odd way to span the angular space. The reason we chose to define these angles is that it allows us to sample directly from the principal diffusion direction (θ, ϕ).
A description of the Γ distribution may be found in the Appendix.
x has a two-parameter gamma distribution, denoted by Γ(a, b), with parameters a and b, if its density is given by:
$$ f(x \mid a, b) = \frac{b^{a}}{\Gamma(a)}\, x^{a-1} e^{-bx}, \qquad x > 0, $$
where Γ(a) is the Gamma function. A χ² distribution with ν degrees of freedom corresponds to the distribution Γ(ν/2, 1/2). The b parameter is an inverse scale (rate) parameter. The one-parameter gamma distribution corresponds to Γ(a, 1). A sample from Γ(a, b) can be obtained by taking a sample from Γ(a, 1) and dividing it by b. Note that a gamma distribution has mean a/b and variance a/b².
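The sampling recipe just described, drawing from Γ(a, 1) and dividing by b, can be checked numerically with a short sketch. The helper name `sample_gamma` is hypothetical; it relies on the standard-library `random.gammavariate(alpha, beta)`, whose second argument is a scale, so a scale of 1.0 gives the one-parameter distribution.

```python
import random


def sample_gamma(a, b, rng=random):
    """Draw from Gamma(a, b), with b acting as the divisor described above."""
    return rng.gammavariate(a, 1.0) / b


if __name__ == "__main__":
    rng = random.Random(0)
    a, b = 3.0, 2.0
    draws = [sample_gamma(a, b, rng) for _ in range(50000)]
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
    # Compare with the stated mean a/b and variance a/b**2.
    print(mean, a / b)
    print(var, a / b ** 2)
```

The empirical mean and variance should approach a/b and a/b², which confirms that dividing by b is consistent with the moments quoted above.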
Full Conditional Distribution for Precision Parameters
The full conditional distribution for Gibbs sampling from the precision parameters 1/σ² in both models is:
where Y is the data, Ω− is the set of all parameters except σ, n is the number of acquisitions, Yi is the value of the data at the ith acquisition, a and b are the parameters in the Gamma prior on the precision, and μi is the value for the ith acquisition predicted by the model. μi, for the diffusion tensor model is given by Eq. , and for the simple partial volume model, by Eq. .
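For orientation, the form of this conditional can be worked out from the symbols defined above. The following is a sketch under two assumptions that are not restated in this section: that the noise is independent and Gaussian with precision 1/σ² around the model predictions μi, and that the Γ(a, b) prior uses the parametrization of the Appendix, with mean a/b. It is the standard conjugate update, not a quotation of the original expression.

```latex
\begin{align}
P\!\left(\tfrac{1}{\sigma^{2}} \,\middle|\, Y, \Omega^{-}\right)
  &\propto
  \underbrace{\left(\tfrac{1}{\sigma^{2}}\right)^{n/2}
  \exp\!\left(-\tfrac{1}{2\sigma^{2}} \sum_{i=1}^{n} (Y_i - \mu_i)^{2}\right)}_{\text{Gaussian likelihood}}
  \;
  \underbrace{\left(\tfrac{1}{\sigma^{2}}\right)^{a-1}
  \exp\!\left(-\tfrac{b}{\sigma^{2}}\right)}_{\Gamma(a,\,b)\ \text{prior}} \\[4pt]
\tfrac{1}{\sigma^{2}} \,\Big|\, Y, \Omega^{-}
  &\sim
  \Gamma\!\left(a + \tfrac{n}{2},\;\; b + \tfrac{1}{2} \sum_{i=1}^{n} (Y_i - \mu_i)^{2}\right)
\end{align}
```

Under these assumptions, each Gibbs update of the precision reduces to a single gamma draw whose parameters depend on the current residuals between the data and the model prediction.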