### Abstract

- Top of page
- Abstract
- Introduction
- Materials and Methods
- Results
- Discussion
- Conclusion
- Acknowledgment
- Conflict of Interest
- References

This study investigates the potential of independent component analysis (ICA) to provide a data-driven approach for group level analysis of magnetic resonance (MR) spectra. ICA collectively analyzes data to identify maximally independent components, each of which captures covarying resonances, including those from different metabolic sources. A comparative evaluation of the ICA approach with the more established LCModel method in analyzing two different noise-free, artifact-free, simulated data sets of known compositions is presented. The results from such ideal simulations demonstrate the ability of data-driven ICA to decompose data and accurately extract components resembling modeled basis spectra from both data sets, whereas the LCModel results suffer when the underlying model deviates from assumptions, thus highlighting the sensitivity of model-based approaches to modeling inaccuracies. Analyses with simulated data show that independent component weights are good estimates of concentrations, even of metabolites with low intensity singlet peaks, such as scyllo-inositol. ICA is also applied to single voxel spectra from 193 subjects, without correcting for baseline variations, line-width broadening or noise. The results provide evidence that, despite the presence of confounding artifacts, ICA can be used to analyze in vivo spectra and extract resonances of interest. ICA is a promising technique for decomposing MR spectral data into components resembling metabolite resonances, and therefore has the potential to provide a data-driven alternative to the use of metabolite concentrations derived from curve-fitting individual spectra in making group comparisons.

### Introduction

- Top of page
- Abstract
- Introduction
- Materials and Methods
- Results
- Discussion
- Conclusion
- Acknowledgment
- Conflict of Interest
- References

Proton magnetic resonance spectroscopy (^{1}H-MRS) is a powerful, noninvasive method that allows in vivo estimation of metabolite concentrations in a tissue volume. It has enabled extensive investigation and characterization of biochemical profiles in a variety of healthy and pathological tissues. Many neurological studies have shown the importance of ^{1}H-MRS in diagnosis, treatment monitoring, and prognosis of major diseases including Alzheimer's, cancer, dementia, and multiple sclerosis (Jansen et al. 2006). Significant and sustained research has been conducted over the years using ^{1}H-MRS in an effort to fulfill its potential as a clinical tool.

A typical in vivo brain ^{1}H-MRS spectrum consists of resonances from metabolites of interest along with features such as residual water signal, baseline fluctuations, and other artifacts not of interest. A common approach to making meaningful comparisons across subjects, brain regions, or pathologies involves quantifying metabolites in terms of concentrations. Popular methods such as LCModel (Provencher 1993), a frequency-domain approach, or JMRUI, a time-domain approach (Naressi et al. 2001), fit a model function derived from an in vitro or simulated set of metabolite profiles to data. Both time- and frequency-domain quantification approaches employ a variety of data preprocessing techniques to remove or model confounding features in order to improve estimation accuracy (Helms 2008) and often allow semiautomated processing of data to produce consistent quantitation, without special expertise. While model-based approaches bring the ability to resolve overlapping resonances, the sensitivity of their estimates to modeling inaccuracies is a serious concern and making an appropriate choice of model spectra is essential (Kreis 2004).

In this article, we present a data-driven approach for group level analysis of MR spectra based on independent component analysis (ICA). This approach is applied collectively to all analyzed spectra as a group, and resolves individual spectra into linear combinations of a set of components maximally independent of each other. Unlike model-based approaches, ICA makes no assumptions on underlying source distributions, and therefore, can be relatively robust when applied to diverse data sets. In addition, ICA can potentially extract coherent variations between resonances from the whole spectra, which may be useful in identifying metabolites that covary. Furthermore, features of the spectra that are generally not of interest, such as line broadening and baseline fluctuations, can often be resolved into separate components, allowing the resonances of interest to be quantified without the potential confound of these artifacts.

A statistical technique that has been used for multivariate analysis of spectroscopy data is the model-independent principal component analysis (PCA) (Stoyanova and Brown 2001). ICA is a conceptually similar technique that has been widely used in functional magnetic resonance imaging analysis (Calhoun et al. 2003, 2009) and has been shown to model individual subject variations well (Allen et al. 2012). It has also been used in few prior studies to resolve ^{1}H-MR spectra and extract independent components (ICs) that could separate pathologic tissues (Ladroue et al. 2003; Pulkkinen et al. 2005). Both of those studies demonstrated, using fast ICA (Hyvarinen 1999), that components maximizing independence can group resonances effectively to classify healthy brain tissue and grades of tumor tissue. Additionally, a few simulation studies examining the effects of line broadening and noise on the extracted components have also been published (Ladroue et al. 2003; Hao et al. 2009).

However, no previous published study directly compared PCA or ICA results with more established methods, such as LCModel, which could present a more convincing case for the use of ICA in MR spectral analysis. In this article, we present comparative evaluations of ICA and LCModel in analyzing two simulated data sets, each composed of metabolites typically found in human brain, but generated using different sets of basis spectra. Though LCModel has been compared to other model-based methods (Hofmann et al. 2002; Kanowski et al. 2004), to our knowledge, the present study is the first to compare the model-based LCModel with the model-independent ICA.

Simulation results highlight the sensitivities of model-based approaches to modeling inaccuracies and the advantages of a data-driven approach in this respect. Further, we demonstrate that the components extracted based on independence criterion alone are good approximations of the underlying basis spectra and that the component weights can be used instead of concentration estimates as parameters in comparing spectra. Finally, we also apply ICA analysis to an in vivo single voxel data set of 193 spectra and compare components and component weights to the basis spectra and concentration estimates from LCModel analyses. We show that ICA component weights and LCModel results correlate to different degrees depending on the metabolite. ICA is also able to capture low intensity singlet peak signals such as those that may arise from scyllo-inositol (s-Ins). Overall, our results show that the data-driven ICA could be a valuable tool for group analysis of MR spectra.

### Materials and Methods

- Top of page
- Abstract
- Introduction
- Materials and Methods
- Results
- Discussion
- Conclusion
- Acknowledgment
- Conflict of Interest
- References

ICA is a linear time-invariant method that decomposes a set of observations into a linear combination of basis signals. It may be seen as a higher order generalization (Comon 1994) of PCA, often employed for dimension reduction prior to ICA. Unlike PCA, which imposes independence up to second order and defines orthogonal directions, ICA minimizes statistical dependence between its components, and is uniquely defined when at most one component is Gaussian (Bell and Sejnowski 1995). As MR spectra are made of contributions from individual metabolite spectra that can vary independently, estimated ICs are expected to characterize well any independently varying signals from metabolites.

The linear construct in equation (2) expresses a composite spectrum or observation **x**_{n}, as a linear combination of a set of *k* components or sources **s**_{i}, weighted by mixing coefficients *a*_{i}.

- (1)

ICA estimates the matrix **W** that demixes multivariate data **X** to extract estimates **Y** of sources **S**, such that **Y = WX** are mutually independent. If the sources are mutually independent, then **Y** is close to **S** and **W** is the pseudoinverse of **A**. A variety of algorithms implementing the iterative learning and estimations of **W** exist. They construct unmixing matrix **W** such that negentropy, or distance from normality, of **Y** is maximized. As negentropy is difficult to compute, many algorithms rely on kurtosis as its estimate. In our implementation, we use the infomax algorithm (Bell and Sejnowski 1995) on the real part of input spectral data from our simulation experiment or in vivo and demonstrate ICA's ability to resolve spectra and extract resonances having different statistical properties.

#### Data simulation

The design objective of our simulation experiment was to assess how well ICA extracts underlying components and ground truth-mixing coefficients from simulated data resembling in vivo human brain MR spectra; and to explore how ICA results compare to LCModel results from the same data. Data simulated with two different sets of modeled resonances, with no added noise or artifacts, provided a means to compare ICA approach with LCModel analysis, as well as to establish upper bounds of ICA's ability in MR spectral applications.

The composition of our basis set of metabolites was based on a list of metabolites typically included in a LCModel basis set with analysis window of 1.8–4.2 ppm, the analysis window used in a prior report on these data (Yeo et al. 2013). The basis set was composed of 12 metabolites: aspartate (Asp), creatine (Cr), γ-amino butyric acid (GABA), glucose (Glc), glutamine (Gln), glutamate (Glu), *N*-acetyl aspartate (NAA), the *N*-acetyl peak of *N*-acetylaspartylglutamate (NAAG), the trimethyl amine group of phosphocholine (PCh), taurine (Tau), and myo-inositol (m-Ins) and its isomer scyllo-inositol (s-Ins).

For each of the metabolites in our basis set, we obtained two sets of modeled resonances with matching experimental parameters: a point resolved spectroscopy (PRESS) sequence acquired at 3T with a 40-msec echo time and 1600-Hz bandwidth. The first set of spectra, called here LCModel basis, was generated from the LCModel basis set provided by the developer of LCModel. The spectra in this basis set were resampled to match the resolution and bandwidth of in vivo spectra and saved in a matrix of length 512. The second set of spectra, called here GAVA basis, was simulated using a predefined library of pulse sequences in GAVA (Soher et al. 2007), a user friendly front end for the GAMMA MRS simulation libraries; the 1024 data point timed-domain model data were converted into spectral domain using the discrete fast fourier transform (FFT) and saved in a matrix of the same dimensions as LCModel basis. We omitted Glc from GAVA basis set, but replaced it with Glycine (Gly), which was not part of the LCModel basis set we used to analyze the data.

In order to closely mimic in vivo spectra, we used concentration estimates from LCModel analysis of in vivo data as ground truth-mixing coefficients. For Cr, we used combined estimates of Cr and phosphocreatine (PCr) as the reference; likewise, we used combined estimates of PCh and glyco-phosphocholine (GPC) as the reference for PCh. For Gly in the GAVA basis, which LCModel does not use, we used concentration estimates of Glc, present in normal adult human brain at levels comparable to Gly (~1 mmol/kg) (Govindaraju et al. 2000). We obtain 193 sets of mixing coefficients from LCModel analysis of in vivo data.

Each composite spectrum was generated by linearly mixing a chosen set of basis spectra, weighted by any one set of mixing coefficients. Using the entire set of mixing coefficients, two sets of 193 simulated spectral data were generated: one using LCModel basis and the other using GAVA basis. Such simulated data can be directly analyzed by ICA, but for use with LCModel, each composite spectrum was converted into 1024 data point complex time-domain data using inverse FFT and stored in individual files.

#### In vivo acquisition

MR data were collected from 141 male, 90 female subjects (*N* = 231), aged between 18 and 56, with a median age of 30, enrolled in three substance abuse studies at the Mind Research Network, conducted in accordance with protocols approved by the human research review committee of the University of New Mexico. Subjects, none of whom are controls, provided informed consent prior to their admission to the studies, and were compensated for their participation. None of the participants were taking psychoactive medications, or had any history of a substance dependence disorder other than alcohol or tobacco dependence in the 6 months preceding enrollment.

All spectroscopic and image data were acquired on a Siemens (Erlangen, Germany) TimTrio 3T scanner equipped with 40 mT/m gradients, body coil, and 12-channel receive-only phased array head coil. *T*1-weighted structural images acquired with a single excitation 5-echo magnetization prepared rapid gradient echo sequence with TE = [1.64, 3.5, 5.36, 7.22, 9.08] msec, TR = 2.53 sec, *T*1 = 1.2 sec, flip angle = 7°, slice thickness = 1 mm and resolution = 256 × 256 mm^{2} was used to prescribe a single 12 cc (20 × 20 × 30 mm^{3}) ^{1}H-MRS voxel in the anterior cingulate region of the brain. Data were collected with body coil excitation, in conjunction with head matrix coils in receive mode, using a PRESS sequence with TR/TE = 1.5 sec/40 msec, 1600-Hz bandwidth and 192 averages. Scanner preprocessing software corrects zero-order phase differences before combining individual spectra from different channels (Natt et al. 2005), averages acquisitions from multiple scans, and saves acquired data in 1024 complex time-domain data points. For use with LCModel, a water spectrum with 16 averages was also acquired from the same voxel.

In the ICA analysis, we used water-suppressed data, which had been normalized by the scanner software using a single scan water reference acquisition (Natt et al. 2005). As ICA works collectively on all spectra, data from all subjects were read and stored in a matrix. Also, as our ICA approach requires complex, frequency-domain data, the acquired complex time-domain data were converted into spectral domain using FFT.

In vivo spectra were corrected for B_{0} variation by using real part of the *N*-acetyl peak of NAA spectrum from LCModel basis to align spectra. Following spectral alignment, we sought to exclude spectra that could unduly bias component estimation and extraction. Spectra with suspect LCModel results, such as those with large full-width half-maximum (FWHM > 0.072 ppm) or poor signal-to-noise ratio (SNR < 15) or simply a bad fit were excluded. We also excluded spectra if the associated LCModel concentration estimates of any metabolite were more than 3.5 standard deviations from the corresponding mean. Finally, we applied an objective data-driven quality control that excluded any spectrum with any data point in the analysis window more than 3.5 standard deviations from corresponding point in the mean spectrum (generated from all included spectra). Though arbitrary, such a choice allowed us to exclude very few poor quality spectra and realize an in vivo data set with no variance outliers (*N* = 193).

#### ICA analysis

ICA was performed over the same analysis window used in the LCModel analysis (1.8–4.2 ppm), using the real part of the spectra. Such an approach is suitable for the linear unmixing problem in ICA and also suits the infomax algorithm, which works well with real valued data. Without any further preprocessing, the spectra were mean centered (demeaned) and factorized using singular value decomposition to perform PCA. The number of retained principal components was determined using minimum description length criteria (Rissanen 1983; Ojanen et al. 2004) along with a priori information on the data matrix, and the expected number of biochemically interpretable components. We carried out multiple (10) ICA runs to check for spurious or false convergences and observed that, in each run, ICA converged in 1–2 min to a consistent set of components, suggesting a single run was enough. All the components were normalized and sign corrected if necessary, to resolve a permutation ambiguity that arises with ICA. The components from different runs were then clustered into groups based on correlation distances and cluster centroids were used as ICs in further analysis; corresponding component weights were extracted by projecting the components onto the data. In contrast to the LCModel estimates, which are quantifications of concentrations of individual metabolites in the basis set, ICA estimates are the weights associated with the independent resonances, which may correspond to metabolite resonances and can capture ground-truth concentrations accurately. Hereafter, the terms ICA estimates and component weights will be used interchangeably.

The extracted ICs were compared with the underlying basis spectra, to identify and associate components with modeled resonances. Each component was automatically paired with a basis spectrum based on their similarity, as measured by the Pearson product-moment correlation coefficient (*r*), called spectral correlation of the matched pair. We also calculated a weights correlation, measured by Pearson correlation coefficient of the component weights with the ground truth-mixing coefficients. For in vivo data, due the absence of absolute references, we used LCModel basis to match and identify components, and used LCModel concentration estimates as a form of ground-truth reference.

#### LCModel analysis

LCModel analysis was carried out with no explicit eddy-current compensation within a 1.8–4.2 ppm analysis window, which results in automatic exclusion of alanine, macromolecules, and lipids from the basis set. LCModel fits each individual spectrum using the remaining resonances in the window. For in vivo analysis, we use all those resonances, but for both simulation analyses, we omitted negative creatine CH_{2} singlet (-CrCH_{2}) and guanidinoacetate (Gua) from the basis. This ensures LCModel is posed the simpler problem of fitting the data with the known composition. Also, while our in vivo analysis used the acquired water spectrum as internal water reference to estimate absolute concentrations, our simulated data estimates were normalized by the Cr + PCr intensity.

#### Additional analyses

We closely examined how ICA resolves our basis set containing a mix of weak and strong metabolites having a wide array of resonances, all of which are not necessarily mutually independent. In particular, we investigated the effect of setting the number of extracted ICs to a number different than the number of basis spectra underlying our simulated data. As previously, the real part of the GAVA-simulated spectra within the analysis window was demeaned and dimension reduced. Though the simulated data were generated with 12 GAVA basis spectra, the number of components retained successively varied from six to 18 and each time ICA was performed as previously and the extracted ICs were matched with basis spectra; spectral and weights correlations were also estimated each time.

We also carried out some analysis to examine how well the ICA estimates could capture a certain relationship between ground truth and an attribute of interest (phenotype). Toward this effort, we generated a set of pseudorandom vectors, each correlating with ground truth, with a preset correlation score (*r* = 0.5). Each vector mimics a neurological, physiological, or physical attribute correlated with ground truth. We then computed how those vectors correlated with the ICA estimates. By generating multiple (*N* = 100) realizations of such vectors, and computing their correlations with the ICA estimates each time, we observed how accurately could the ICA estimates capture the relationship between “phenotypes” and ground truth.

As the ICs from in vivo analysis do not have a ground truth, we sought to show that the select ICs indeed originated from metabolic sources, and not from confounds or nuisance artifacts associated with real data. To this end, we examined how the fractional tissue volumes in the spectroscopic voxel correlated with LCModel estimates or component weights. As cerebrospinal fluid (CSF) is mostly void of observable metabolites (Gasparovic et al. 2006), when the fractional tissue volume is high, more metabolites exist in the spectroscopic voxel and therefore both the estimates are expected to correlate positively with the fractional tissue volume. However, this effect is expected to disappear when the estimates are normalized with a reference metabolite estimate from within the voxel. The fractional tissue volumes in the spectroscopic voxel were estimated by segmenting high-resolution *T*1-weighted images into gray matter, white matter, and CSF using the unified segmentation approach available in SPM5 (Ashburner and Friston 2005) and averaging fractional tissue volumes of the *T*1-pixels within the spectroscopic voxel.

### Discussion

- Top of page
- Abstract
- Introduction
- Materials and Methods
- Results
- Discussion
- Conclusion
- Acknowledgment
- Conflict of Interest
- References

Our simulation results show that ICA unmixes noise-free, multivariate data and extracts components closely resembling underlying spectra and that the ICA estimates closely track the ground truth-mixing coefficients. We also demonstrate that ICA offers superior consistency of results with simulated data compared to LCModel; while both results are nearly identical in the ideal case for LCModel, ICA is much more robust than LCModel in the nonideal case where the actual ground truth deviates from the assumed basis set, illustrating the effects of modeling inaccuracies. A close look at the effects of spectral correlations of the two sets of basis spectra reveals that the varying degrees of correlations of LCModel estimates in the nonideal case are due to the extent of the differences of spectral patterns between the models.

A wealth of information can be gleaned from the ICA results alone, by closely examining ICA's performance in extracting modeled resonances having different statistical properties. The illustration in Figure 4, where the modeled resonances of m-Ins and Gly are compared with their matching ICs, helps bring out the limitations and advantages of the ICA approach. The modeled spectra are correlated to each other, due to their common peak at ~3.56 ppm. However, as the variability associated with that peak does not covary with other peaks in the modeled m-Ins resonance, ICA resolves the peak at 3.56 ppm separately and thus provides a clean estimate of Gly. As ICA minimizes mutual information among the components, the 3.56-ppm peak does not appear in the m-Ins like component, even though modeled spectrum has a 3.56-ppm signal, thus clearly exposing the limitation of ICA in extracting such resonances. Nevertheless, the extracted component substantially resembles m-Ins and, moreover, provides highly accurate estimates of m-Ins. So, rather than a limitation, it is an opportunity that ICA provides to extract resonances with singlet peaks, even in the presence of spectrally colocated strong resonances.

At the same time, resonances with multiple peaks that tend to be correlated with other (modeled) resonances, are not likely strictly independent to begin with, and therefore are difficult to resolve exactly using ICA, as evident from the slightly lower spectral correlations of such resonances (Table 1). However, even the lowest spectral correlation (other than m-Ins), that of Glc due to strong overlaps with Tau (*r* ~0.41), is at ~0.95. The low spectral correlations do not necessarily hurt ICA estimation, especially when the resonances are strong, for an error in their estimation is acutely felt.

Our in vivo results demonstrate that ICA can resolve signals of interest from the confounding artifacts and can group covarying resonances together. The estimates of identified components resembling Cr, NAA, PCh, and m-Ins signals, while including other covarying resonances (Fig. 7), nonetheless demonstrated strong correlations with the LCModel estimates of the identified metabolites. The weak correlation involving NAAG may be attributable to LCModel's limitation in resolving NAAG from NAA; though it makes sense to present NAA + NAAG for real data, we could not present that as our estimates are NAA normalized. An ICA component associated with the s-Ins signal is also consistently extracted by ICA, perhaps due to the lack of overlap with any other signal. Elevated s-Ins in the current data set may be due to effects of alcohol abuse (Viola et al. 2004) or aging (Kaiser et al. 2005).

The ICs that are unidentified include baseline and broadening components and resonances of interest, such as those from Asp, Glu, Gln, and GABA, indiscernible from such confounds. We acknowledge the difficulty in discerning resonances with multiple peaks, such as those from Glu + Gln, from the in vivo data, which LCModel estimates with reasonable accuracy. In our future study, we will provide modifications to ICA, by incorporating prior information, in the form of constraints in the ICA algorithm (Lin et al. 2010) to improve the estimations of such metabolites. Appropriate preprocessing steps to effectively reduce noise or baseline artifacts may also improve ICA's estimation accuracy, as our simulations indicate. Finally, the ICA approach may benefit from the use of all available complex time-domain data, rather than just the real part of the data that we used in this study, with very good performance. These strategies to improve ICA performance will also be explored in the future study.

Clearly ICA, which cannot analyze spectra individually, cannot replace the curve-fitting methods, such as LCModel, in individual spectral analysis. However, ICA can be very useful in the analysis of multiple spectra, and possibly offer a systemic advantage in such cases, as it makes full use of the information by collectively analyzing “complete” spectra, rather than just the quantified estimates. The ICA approach described in this report can be potentially applied to spectra from multiple voxels in a spectroscopic imaging data set from one or more subjects. It can also be applied to analyze data from two different study populations or tissue types and discriminate one group from the other. Conceivably, ICA can also play a complimentary role to model-based methods by identifying hidden structures underlying the data and help choose better “model”.

### Conclusion

- Top of page
- Abstract
- Introduction
- Materials and Methods
- Results
- Discussion
- Conclusion
- Acknowledgment
- Conflict of Interest
- References

We systematically compare the performances of ICA and LCModel in analyzing MR spectra and demonstrate, using noise- and artifacts-free simulations, that the data-driven ICA approach is more robust to variations in the spectral profiles underlying the data. Further, we show that composite spectra can be resolved to extract components substantially resembling modeled metabolite resonances, using independence criteria alone and that ICA can extract components from simple singlet signals, such as Gly, that overlap with other signals. We discuss the limitations and advantages of ICA in spectral decomposition in detail, and show that the ICA estimates, which exhibit a highly linear relationship with ground truth, can be very useful in analyzing a group of spectra. Furthermore, we apply ICA to analyze in vivo ^{1}H-MRS spectra and show that ICA can extract components associated with NAA, Cho, Cr, and m-Ins in the presence of confounding artifacts. Finally, we show that ICA can be very useful, in extracting certain weak metabolites with singlet resonances, such as s-Ins and can provide visibility of resonances that covary. Together, these results suggest that ICA could be useful for collective analysis of multiple MR spectra.