During the last three years, INTERPRET (19), a European project, has been building a large collection of formally validated brain tumor MR data. Formal validation means that spectra were included only when the diagnosis and grade of the tumor had been agreed upon by a panel of radiologists and histopathologists after reviewing the clinical (patient's history), MRI, and histological data. In this ICA study, we used single-voxel spectra from the INTERPRET database that had been acquired from 333 patients and normal volunteers. The study was performed with the patients' informed consent and with approval from the local ethics committees. The dataset is fairly heterogeneous: spectra were acquired in four centers across Europe, on machines from three manufacturers (GE (N = 196), Philips (N = 116), Siemens (N = 21)), and with two acquisition sequences (PRESS (N = 197) and stimulated echo acquisition mode (STEAM) (N = 136)). All spectra are short-echo (TE = 20–30 ms) phased spectra. Each spectrum was represented by 220 points (dimensions) spanning the range [4.1827, −0.0212] ppm. Details of the data acquisition and preprocessing steps can be found in Ref. 20. Twenty-two tissue types are present: glioblastoma (N = 88), meningioma (N = 64), metastases (N = 49), normal brain (N = 26), astrocytoma grade II (N = 26), oligodendroglioma (N = 10), astrocytoma grade III (N = 10), lymphoma (N = 9), PNET (N = 8), abscess (N = 8), schwannoma (N = 5), and a few samples of chordoma (N = 1), anaplastic oligoastrocytoma (N = 2), anaplastic oligodendroglioma (N = 2), atypical meningioma (N = 4), oligoastrocytoma (N = 5), pilocytic astrocytoma (N = 4), ganglioma (N = 1), germinoma (N = 1), hemangioblastoma (N = 8), anaplastic meningioma (N = 1), and melanoma (N = 1).
We used the same analysis protocol as for the artificial data: the data were first denoised with PCA, and ICA was then performed. The number of PCs used to describe the data was chosen by inspecting the variance plot and the number of ICs extracted. When 10 PCs were used, 10 ICs were found; when 15 PCs were used, only 13 independent components were found. Using more PCs to describe the data carried the risk of adding too much noise (cf. next section), hence we decided to use those 13 ICs.
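The two-step protocol (PCA reduction followed by ICA) can be illustrated with a minimal NumPy sketch. Several points here are assumptions rather than details taken from this study: the ICA step is a symmetric FastICA-style fixed-point iteration with a tanh nonlinearity, the function names `pca_whiten`, `sym_decorrelate`, and `fastica_symmetric` are hypothetical, and rows of the input matrix are treated as observations. The sketch always returns as many components as PCs retained, so it does not reproduce the criterion by which only 13 ICs were found from 15 PCs.

```python
import numpy as np

def pca_whiten(X, n_components):
    """Centre X (rows = observations), keep the leading PCs, and whiten them.

    Returns the whitened scores and the explained-variance ratio of every PC
    (the quantity one would inspect in a variance plot).
    """
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Columns of U are orthonormal, so scaling by sqrt(n) gives Z.T @ Z / n = I.
    Z = U[:, :n_components] * np.sqrt(X.shape[0])
    return Z, explained

def sym_decorrelate(W):
    """Symmetric orthogonalisation: W <- (W W^T)^(-1/2) W."""
    vals, vecs = np.linalg.eigh(W @ W.T)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ W

def fastica_symmetric(Z, n_iter=200, tol=1e-6, seed=0):
    """FastICA-style fixed-point iteration on whitened data Z (assumed algorithm)."""
    rng = np.random.default_rng(seed)
    n, k = Z.shape
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        Y = Z @ W.T                      # current component estimates, shape (n, k)
        G = np.tanh(Y)                   # nonlinearity g
        Gp = 1.0 - G**2                  # its derivative g'
        # Row i: E[z g(w_i^T z)] - E[g'(w_i^T z)] w_i
        W_new = (G.T @ Z) / n - Gp.mean(axis=0)[:, None] * W
        W_new = sym_decorrelate(W_new)
        # Converged when each new row is parallel to the corresponding old row.
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return Z @ W.T, W
```

With a data matrix `X`, one would call `Z, explained = pca_whiten(X, k)` for a candidate number of PCs `k`, inspect `explained` to judge how much variance the retained PCs capture, and then call `fastica_symmetric(Z)` to obtain the component estimates. Whether the spectra or the ppm points play the role of observations depends on the ICA formulation, which this sketch leaves to the caller.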