Metabolite selection for machine learning in childhood brain tumour classification

MRS can provide high accuracy in the diagnosis of childhood brain tumours when combined with machine learning. A feature selection method such as principal component analysis is commonly used to reduce the dimensionality of metabolite profiles prior to classification. However, an alternative approach of identifying the optimal set of metabolites has not been fully evaluated, possibly due to the challenges of defining this for a multi‐class problem. This study aims to investigate metabolite selection from in vivo MRS for childhood brain tumour classification. Multi‐site 1.5 T and 3 T cohorts of patients with a brain tumour and histological diagnosis of ependymoma, medulloblastoma and pilocytic astrocytoma were retrospectively evaluated. Dimensionality reduction was undertaken by selecting metabolite concentrations through multi‐class receiver operating characteristics and compared with principal component analysis. Classification accuracy was determined through leave‐one‐out and k‐fold cross‐validation. Metabolites identified as crucial in tumour classification include myo‐inositol (P < 0.05, AUC = 0.81 ± 0.01), total lipids and macromolecules at 0.9 ppm (P < 0.05, AUC = 0.78 ± 0.01) and total creatine (P < 0.05, AUC = 0.77 ± 0.01) for the 1.5 T cohort, and glycine (P < 0.05, AUC = 0.79 ± 0.01), total N‐acetylaspartate (P < 0.05, AUC = 0.79 ± 0.01) and total choline (P < 0.05, AUC = 0.75 ± 0.01) for the 3 T cohort. Compared with the principal components, the selected metabolites were able to provide significantly improved discrimination between the tumours through most classifiers (P < 0.05). The highest balanced classification accuracy determined through leave‐one‐out cross‐validation was 85% for 1.5 T 1H‐MRS through support vector machine and 75% for 3 T 1H‐MRS through linear discriminant analysis after oversampling the minority class. The study suggests that a group of crucial metabolites helps to achieve better discrimination between childhood brain tumours.


| INTRODUCTION
Primary tumours of the central nervous system are the most common cause of cancer death in childhood. 1 Histology and molecular analysis of tumour specimens obtained at operation provide the definitive diagnosis for most children's brain tumours. However, providing an accurate noninvasive diagnosis prior to surgery has many advantages for optimal patient management, 2 including better informed surgery, earlier planning of adjuvant treatment and improved discussions with the child and family. Although clinically used to propose radiological diagnosis, structural features of tumours provided by conventional imaging have a limited accuracy. 3 Metabolites are direct signatures of biochemical activity and their detection in vivo can improve our understanding of brain tumours in situ. 4 Many studies have suggested specified metabolites as biomarkers of specific processes and types of childhood brain tumour. For instance, total choline, which consists of glycerophosphocholine, phosphocholine and free choline, is known as a general cancer marker and is associated with tumour aggressiveness and progression. 5 Taurine, the naturally occurring β-aminoacid related to neurodevelopment in infancy, is associated with embryonal tumours. 6,7 Lipids and glycine have been identified as valuable prognostic markers of childhood brain tumours. [8][9][10][11] Proton MRS ( 1 H-MRS) is a non-invasive tool to investigate the in vivo metabolite profile of tissues in tumours. 12 It has been shown to aid diagnosis and clinical management of childhood brain tumours. [13][14][15] Machine learning provides a computational method to classify brain tumours by using metabolite profiles and can offer high diagnostic accuracy. 16 Prior to use in a machine learning classifier, a feature selection method is commonly applied to the metabolite profile, to avoid over-fitting by reducing dimensionality. 
One method that has been commonly used is principal component analysis (PCA), selecting linear combinations of metabolites ranked by their contribution to the overall variability of the data. 17 However, high levels of variability across the whole dataset may not correspond to optimum discrimination between classes, and higher performing methods of feature selection are being sought. 18 Feature selection methods including PCA commonly have a complex relationship to the original data. This lack of transparency could be a barrier to clinical adoption, where tumour discrimination based on specific metabolites may be preferred by clinicians and can be supported by association with histological features. 19 A feature selection method based on selection of individual metabolites by their ability to discriminate between classes would potentially provide an intuitive approach to moving from diagnosis using single metabolites to machine learning. Whilst diagnostic performance is well defined for classification between two classes by the area under a receiver operating characteristic (ROC) curve, this is not the case for a multi-class system. Brain tumour classification usually needs to consider more than two potential diagnoses and so multiclass ROC needs to be investigated in this setting. Here we perform a thorough investigation of the accuracy of individual metabolites as discriminators between three major children's brain tumour types comparing different approaches to ROC for a multi-class problem. We then use these approaches to select the best performing metabolites for use in a machine learning classifier. We also compare this approach with feature selection using PCA, which is chosen as a benchmark, although our aim is not to determine whether metabolite selection is the optimal feature selection method when compared with all methods currently available.
The aim of this study is therefore to investigate the use of multi-class ROC for optimizing metabolite selection in the classification of childhood brain tumours by 1.5 T and 3 T short echo-time 1 H-MRS and compare the classification accuracy against PCA.

| Data acquisition
Patients with a suspected brain tumour were recruited from four sites in England, including Birmingham Children's Hospital, Alder Hey Children's Hospital Liverpool, Nottingham University Hospitals and Royal Victoria Infirmary Newcastle upon Tyne. Patients presenting with a brain tumour underwent routine MRI examination before the surgical resection of their tumours, and the diagnoses were made by histology and review at the local tumour boards. Patient data were collected from October 2004 to December 2019; those with a diagnosis of pilocytic astrocytoma, medulloblastoma or ependymoma were included in this analysis. The study was approved by the local research ethics committee (ethics number: 04/MRE04/41). Phillips, Ingenia, Intera and Achieva R3.2-5.1). Structural MRI included T1-weighted, T2-weighted and T1-weighted post-contrast sequences as well as diffusion-weighted imaging. 1H-MRS with water reference acquisition was performed after conventional imaging that included gadolinium administration by using the point-resolved spectroscopy sequence (field strength 1.5 T or 3 T, head coils or head and neck coils 8-32 channels, sampling frequency 2000-2500 Hz, chemical shift displacement less than 4% per ppm, echo time 30-46 ms, number of complex points 512 or 2048, pulse repetition time 1500-2000 ms, 128 averages collected from a 20×20×20-80×80×80 mm³ volume of interest). Water suppression was performed by using chemical shift selective saturation pulses, and no out of volume suppression was used. Volumes of interest were manually placed to be completely within the tumours according to structural images, with contrast enhancement and low apparent diffusion coefficient being used as guides where tumours exhibited some heterogeneity. 4

| 1 H-MRS quantification and quality control
All 1 H-MRS raw data were quantified using TARQUIN (Version 4.3.11), 20 which includes phasing, chemical shift calibration and metabolite amplitude estimation. The basis set used for 1 H-MRS quantification is a 1 H brain full basis set that includes lipid basis signals and macromolecule signals. The concentrations obtained were normalized based on the sum of all metabolite, lipid and macromolecule concentrations for each case.
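The normalization step can be illustrated with a minimal Python sketch. It assumes each case is held as a mapping from signal name to estimated concentration; the function name and the example values are hypothetical and are not study data.

```python
def normalise_profile(concentrations):
    """Scale one case's concentrations so that they sum to 1.

    `concentrations` maps each metabolite, lipid or macromolecule
    signal to its estimated concentration for a single case.
    """
    total = sum(concentrations.values())
    if total <= 0:
        raise ValueError("total concentration must be positive")
    return {name: value / total for name, value in concentrations.items()}


# Hypothetical values for one case (arbitrary units, not study data).
case = {"tCho": 2.0, "tCr": 3.0, "mIns": 4.0, "LMM0.9": 1.0}
normalised = normalise_profile(case)
```

Normalizing per case makes profiles comparable across scanners and voxel sizes, at the cost of making each feature depend on every other signal in the same spectrum.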
Signal-to-noise ratios (SNRs) were calculated with TARQUIN and used to evaluate the level of noise present in the 1H-MRS. Overall SNR was defined as the ratio between the maximum amplitude of the spectrum minus the baseline, and twice the root-mean-square of the residual between 0.2 and 4 ppm. Full width at half maximum (FWHM) was also taken as the value calculated by TARQUIN, and is the width of the unsuppressed water peak at half its full height.
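The SNR definition can be written out as a short Python sketch. This is not TARQUIN's code: the function and argument names are hypothetical, and it assumes that the peak search is restricted to the same 0.2 to 4 ppm window as the residual, which the text does not state explicitly.

```python
import math


def overall_snr(ppm, spectrum, baseline, residual, lo=0.2, hi=4.0):
    """Peak height above baseline divided by twice the RMS of the fit residual.

    All four sequences are sampled on the same frequency axis `ppm`.
    Assumption: both the peak and the residual are taken from [lo, hi] ppm.
    """
    window = [i for i, x in enumerate(ppm) if lo <= x <= hi]
    peak = max(spectrum[i] - baseline[i] for i in window)
    rms = math.sqrt(sum(residual[i] ** 2 for i in window) / len(window))
    return peak / (2.0 * rms)


# Toy four-point spectrum (hypothetical numbers).
snr = overall_snr(
    ppm=[0.5, 1.0, 2.0, 3.0],
    spectrum=[1.0, 5.0, 11.0, 2.0],
    baseline=[1.0, 1.0, 1.0, 1.0],
    residual=[1.0, -1.0, 1.0, -1.0],
)
```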
Quality control was then applied in the following manner. Patient data acquired were initially screened according to the following exclusion criteria: (1) histological diagnosis was missing; (2) water suppressed signals, water reference signals or structural MR images indicating the 1 H-MRS voxel location were missing; (3) the tumour did not occupy all the 1 H-MRS voxel as determined by visual inspection of the voxel location images produced on the scanner aided by reference to the available image set; (4) spectra showed very poor FWHM (>0.15 ppm, equivalent to 9.6 Hz in 1.5 T and 19.2 Hz in 3 T); (5) spectra showed very poor SNR (<4). For the cases that met these metric-based quality measures, the TARQUIN-processed frequency domain 1 H-MRS of all cases was assessed visually by experienced spectroscopists for general quality features, namely phasing, fitting, baseline variation and the presence of artefacts.
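The metric-based part of the screening, and the conversion of the FWHM threshold from ppm to Hz, can be sketched as follows. The function names are hypothetical; the proton gyromagnetic ratio of about 42.577 MHz/T is a physical constant rather than a value taken from the text.

```python
GAMMA_BAR_MHZ_PER_T = 42.577  # proton gyromagnetic ratio divided by 2*pi


def ppm_to_hz(width_ppm, field_tesla):
    """Convert a linewidth in ppm to Hz at the given field strength."""
    return width_ppm * GAMMA_BAR_MHZ_PER_T * field_tesla


def passes_metric_qc(fwhm_ppm, snr, max_fwhm_ppm=0.15, min_snr=4.0):
    """Apply exclusion criteria (4) and (5): FWHM > 0.15 ppm or SNR < 4."""
    return fwhm_ppm <= max_fwhm_ppm and snr >= min_snr


threshold_hz_1p5t = ppm_to_hz(0.15, 1.5)  # about 9.6 Hz
threshold_hz_3t = ppm_to_hz(0.15, 3.0)    # about 19.2 Hz
```

Criteria (1) to (3) and the visual assessment by spectroscopists are not captured by this sketch, since they depend on data availability and expert judgement.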

| Metabolite evaluation
Cramér-Rao lower bounds (CRLBs) were used to evaluate the uncertainty of metabolite quantification. Metabolites whose CRLB as a percentage was greater than 50%, for all cases in both 1.5 T and 3 T cohorts, were excluded. This eliminated alanine, aspartate, γ-aminobutyric acid and phosphatidylethanol from the analysis in both 1.5 T and 3 T cohorts. Multi-class diagnostic performance of individual metabolites, for all tumours, was evaluated using multi-class ROC. 21 The area under the curve (AUC) for a multi-class ROC is defined by
$$\mathrm{AUC}_{\text{multi-class}} = \frac{2}{c(c-1)} \sum_{i<j} \mathrm{AUC}_{\text{binary}}(i,j), \qquad \mathrm{AUC}_{\text{binary}}(i,j) = \frac{1}{2}\left(\hat{A}(i|j) + \hat{A}(j|i)\right),$$
where $\mathrm{AUC}_{\text{binary}}$ denotes the AUC of the binary ROC, $c$ denotes the number of classes and $\hat{A}(i|j)$ denotes the probability that a randomly selected element of class $j$ will have a lower estimated probability of belonging to class $i$ than a randomly selected element of class $i$.
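A minimal Python sketch of a pairwise-averaged multi-class AUC of this kind (the form usually attributed to Hand and Till) is given here. It assumes that per-case class probability estimates are already available; the function names and the toy probabilities are hypothetical, and the study's own R implementation is not shown in the text.

```python
from itertools import combinations


def a_hat(scores_own, scores_other):
    """Estimate P(score of a random 'other' case < score of a random 'own' case).

    Ties count as one half, as in the Mann-Whitney estimate of the AUC.
    """
    wins = 0.0
    for s_own in scores_own:
        for s_other in scores_other:
            if s_other < s_own:
                wins += 1.0
            elif s_other == s_own:
                wins += 0.5
    return wins / (len(scores_own) * len(scores_other))


def multiclass_auc(labels, probabilities):
    """Average the pairwise AUCs over all unordered class pairs.

    labels        -- list of class names, one per case
    probabilities -- list of dicts mapping class name to estimated probability
    """
    classes = sorted(set(labels))
    c = len(classes)
    total = 0.0
    for i, j in combinations(classes, 2):
        def scores(target, members):
            # Estimated probability of `target` for cases truly in `members`.
            return [p[target] for lab, p in zip(labels, probabilities) if lab == members]

        a_i_given_j = a_hat(scores(i, i), scores(i, j))
        a_j_given_i = a_hat(scores(j, j), scores(j, i))
        total += 0.5 * (a_i_given_j + a_j_given_i)
    return 2.0 * total / (c * (c - 1))


# Hypothetical, perfectly separated three-class example.
labels = ["EP", "EP", "MB", "MB", "PA", "PA"]
separated = [
    {"EP": 0.8, "MB": 0.1, "PA": 0.1}, {"EP": 0.7, "MB": 0.2, "PA": 0.1},
    {"EP": 0.1, "MB": 0.8, "PA": 0.1}, {"EP": 0.2, "MB": 0.6, "PA": 0.2},
    {"EP": 0.1, "MB": 0.2, "PA": 0.7}, {"EP": 0.2, "MB": 0.1, "PA": 0.7},
]
uninformative = [{"EP": 1 / 3, "MB": 1 / 3, "PA": 1 / 3}] * 6

auc_separated = multiclass_auc(labels, separated)
auc_uninformative = multiclass_auc(labels, uninformative)
```

Because every class pair carries the same weight, a rare class such as ependymoma influences the result as much as a common one.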
Multi-class diagnostic performance was compared with the binary diagnostic performance given by binary and pairwise ROC. For binary ROC, the multi-class problem was converted into a two-class problem by considering class A versus class non-A. In pairwise ROC, two of the three tumour types were selected, and the other type was ignored. The standard deviation of the AUC was measured through leave-one-out cross-validation. Metabolites were selected based on the diagnostic ability for the childhood brain tumours in this study derived using multi-class ROC.
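The binary (one class versus the rest) and pairwise AUCs for a single metabolite, together with a leave-one-out estimate of the spread of an AUC, might be sketched as below. The sketch assumes that the metabolite concentration itself is used as the score; all names and values are hypothetical.

```python
import statistics


def auc(pos, neg):
    """Mann-Whitney estimate of P(positive score > negative score); ties count 0.5."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_auc(values, labels, target):
    """Class `target` against all other classes pooled together."""
    pos = [v for v, lab in zip(values, labels) if lab == target]
    neg = [v for v, lab in zip(values, labels) if lab != target]
    return auc(pos, neg)


def pairwise_auc(values, labels, class_a, class_b):
    """Class `class_a` against `class_b`; cases of any other class are ignored."""
    pos = [v for v, lab in zip(values, labels) if lab == class_a]
    neg = [v for v, lab in zip(values, labels) if lab == class_b]
    return auc(pos, neg)


def leave_one_out_sd(values, labels, metric):
    """Standard deviation of `metric` recomputed with each case left out once."""
    estimates = [
        metric(values[:k] + values[k + 1:], labels[:k] + labels[k + 1:])
        for k in range(len(values))
    ]
    return statistics.stdev(estimates)


# Hypothetical normalised concentrations of one metabolite.
values = [9.0, 8.0, 7.5, 3.0, 4.0, 2.5, 6.0, 5.0, 3.5]
labels = ["EP"] * 3 + ["MB"] * 3 + ["PA"] * 3

ep_vs_rest = binary_auc(values, labels, "EP")
mb_vs_pa = pairwise_auc(values, labels, "MB", "PA")
ep_sd = leave_one_out_sd(values, labels, lambda v, l: binary_auc(v, l, "EP"))
```

An AUC below 0.5 here simply means that the metabolite is lower in the first class; a direction-free measure would use the larger of the AUC and one minus the AUC.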

| Tumour classification
The 1H-MRS analysis (Figure 1) and tumour classification were conducted for the 1.5 T and 3 T cohorts separately. By taking account of the known challenges of measuring certain metabolites with high levels of spectral overlap, 22 a group of 14 metabolites was selected to be ranked for diagnostic ability: citrate, glutathione, glycine, lactate, myo-inositol, scyllo-inositol, taurine, total N-acetylaspartate (N-acetylaspartate and N-acetylaspartylglutamate), total choline (glycerophosphocholine and phosphocholine), total creatine (creatine and phosphocreatine), combined glutamate and glutamine, and total lipids and macromolecules at 0.9 ppm, 1.3 ppm and 2.0 ppm.
Training sets and test sets were sampled from the whole set, with stratification according to tumour types. Features were extracted only from the training set and for building classifiers. Two methods of feature extraction were individually performed and compared, including PCA and multi-class ROC. The number of features was determined as the sample size of the minority group minus one. For the method based on PCA, principal components were derived by performing PCA on the matrix of all screened metabolites and ranked based on the explained cumulative variance. For the method of multi-class ROC, metabolites were ranked based on the AUC derived through multi-class ROC. Highly ranked metabolites or principal components were used as features for tumour classification.
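The ranking and the limit on the number of features can be sketched in a few lines of Python. The function names and the training-set class sizes are hypothetical; the first three AUC values are those reported in the abstract for the 1.5 T cohort, and the citrate value is invented purely for illustration.

```python
from collections import Counter


def n_features(train_labels):
    """Number of features allowed: size of the smallest training class minus one."""
    return min(Counter(train_labels).values()) - 1


def select_metabolites(auc_by_metabolite, train_labels):
    """Return the top-ranked metabolites by multi-class AUC, highest first."""
    ranked = sorted(auc_by_metabolite, key=auc_by_metabolite.get, reverse=True)
    return ranked[:n_features(train_labels)]


# Hypothetical training-set composition: 4 EP, 10 MB and 12 PA cases.
train_labels = ["EP"] * 4 + ["MB"] * 10 + ["PA"] * 12
auc_by_metabolite = {
    "total creatine": 0.77,
    "myo-inositol": 0.81,
    "citrate": 0.60,  # hypothetical value
    "total lipids and macromolecules at 0.9 ppm": 0.78,
}
selected = select_metabolites(auc_by_metabolite, train_labels)
```

To avoid information leaking from the test cases, the AUC values passed in should be computed from the training set only, in line with the text.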
Ependymomas were oversampled by 100% through the adaptive synthetic minority oversampling technique (SMOTE), in order to correct the skewness and class imbalance of the cohort. 23 Linear and non-linear classifiers were applied to evaluate the classification performance, including linear discriminant analysis (LDA), k-nearest neighbours, naïve Bayes, multinomial log-linear model fitting via a neural network, and support vector machine with a linear kernel. Discriminant functions derived through LDA and re-substitution were used to show poorly classified cases with low classification probability. Leave-one-out and k-fold cross-validation were used to determine classification accuracy, where k was determined based on the size of the minority class. The k-fold cross-validation is usually more accurate, 24 but it may lead to poor training sets being selected when the cohort is small and particularly when the minority class is very small. Both cross-validation methods were therefore performed to achieve some level of comparability, whilst not selecting a more appropriate method for either cohort. Overall (α_overall) and balanced (α_balanced) classification accuracies were used to evaluate the classification performance, defined based on the accuracy for each case (α_i) as
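A minimal sketch of the two accuracy measures follows. It assumes the standard definitions, namely overall accuracy as the fraction of all cases classified correctly and balanced accuracy as the unweighted mean of the per-class accuracies; these may differ in detail from the definitions used in the study. Function names and labels are hypothetical.

```python
from collections import defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correctly classified cases within each true class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, predicted in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    return {cls: correct[cls] / total[cls] for cls in total}


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of all cases classified correctly."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def balanced_accuracy(true_labels, predicted_labels):
    """Unweighted mean of the per-class accuracies."""
    accuracies = per_class_accuracy(true_labels, predicted_labels)
    return sum(accuracies.values()) / len(accuracies)


# Hypothetical predictions: one of two ependymomas is misclassified.
truth = ["EP"] * 2 + ["MB"] * 4 + ["PA"] * 4
predicted = ["EP", "PA"] + ["MB"] * 4 + ["PA"] * 4

alpha_overall = overall_accuracy(truth, predicted)
alpha_balanced = balanced_accuracy(truth, predicted)
```

In this toy example the overall accuracy is 0.9 while the balanced accuracy is about 0.83, which shows how a single error in a small class weighs more heavily on the balanced measure.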

| Statistical analysis
A Kruskal-Wallis H test was performed to assess the different quality metrics, the means of metabolite concentrations across the three tumour types, and the classification accuracy. Differences between tumour types or processing methods were considered statistically significant when P < 0.05, with the levels P < 0.01, P < 0.001 and P < 0.0001 also reported. All algorithms of statistical analysis, feature extraction and machine learning were implemented using R (Version 3.6.2, R Foundation, Vienna, Austria).
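For reference, the Kruskal-Wallis H statistic for three groups can be computed with the standard library alone, as in the following Python sketch. It is not the study's R code. It uses the usual chi-squared approximation for the P value, which for three groups (two degrees of freedom) reduces to exp(-H/2), and it is only approximate for small groups. The function name and data are hypothetical.

```python
import math
from collections import Counter


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for three independent samples, with tie correction."""
    groups = [a, b, c]
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Assign mid-ranks so that tied values share the average of their ranks.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j

    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)

    # Tie correction; it is zero (and H undefined) if all values are identical.
    ties = Counter(pooled)
    correction = 1.0 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    h /= correction

    p = math.exp(-h / 2.0)  # chi-squared survival function with 2 d.f.
    return h, p


# Hypothetical, fully separated groups.
h_stat, p_value = kruskal_wallis_three_groups([1, 2, 3], [4, 5, 6], [7, 8, 9])
```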

| Demographics
Diagnostic 1 H-MRS was performed on 116 patients on 1.5 T and 73 patients on 3 T MR scanners. Eighty-three (66%) 1.5 T cases and 42 (34%) 3 T cases were enrolled after screening of data availability and quality control assessment (Table 1, Figure 2). Tumours were generally located in the posterior fossa, but four ependymomas were located supratentorially, including three for 1.5 T and one for 3 T. Ages of patients ranged from 1.8 months to 18 years of age, across the three groups. Forty-seven (57%) of the 1.5 T cases and 18 (43%) of the 3 T cases were male. Histological subtypes of all tumours were grouped together within each tumour type (Table 2).

| Quality assessment
Accepted cases in the final cohort showed overall SNR and FWHM as 19 ± 13 and 5 ± 1 Hz for the 1.5 T cohort (Figure 2A), and 16 ± 11 and 7 ± 2 Hz for the 3 T cohort (Figure 2B and Table 3). Medulloblastoma cases showed a generally better signal quality than did the other two groups (P < 0.05). Median CRLB of the metabolites ranged from 18% to 299% for the 1.5 T cohort, and 21% to 247% for the 3 T cohort.
Figure 5, where multi-class ROC is compared with the various binary and pairwise ROCs. As expected, some of these metabolites have poor discriminatory ability for specific tumours, reflected in low binary or pairwise AUC, but the multi-class AUC provides a good reflection of the overall performance for each metabolite. Several metabolites showed good diagnostic ability (AUC > 0.7) at both of the field strengths. In the 1.5 T cohort (Figure 4A), myo-inositol (Figure 5A) was ranked as the most discriminatory metabolite, followed by total lipids and macromolecules at 0.9 ppm (Figure 5B), total creatine (Figure 5C) and total N-acetylaspartate (Figure 5D). In the 3 T cohort (Figure 4B), total N-acetylaspartate (Figure 5E) was ranked as the most discriminatory metabolite, followed by glycine (Figure 5F), total choline (Figure 5G) and taurine (Figure 5H). Compared with the results of 1.5 T 1H-MRS, several metabolites showed clearly improved diagnostic ability in the 3 T 1H-MRS, including glycine, total N-acetylaspartate, total choline, taurine, and total lipids and macromolecules at 2.0 ppm. However, some other metabolites showed decreased diagnostic ability, including combined glutamate and glutamine, lactate, myo-inositol, total lipids and macromolecules at 0.9 ppm, and total creatine (Figure 4A and 4B).

| Principal component analysis
The results of PCA showed that similar numbers of principal components accounted for the same proportions of the total variance in metabolite profiles in the 1.5 T and 3 T 1H-MRS (Figure 4). Four principal components were able to explain around half of the total variance, which is also the maximum number of principal components allowed to be used in 3 T 1H-MRS before oversampling the minority class (Figure 4C and 4D). Eleven principal components were able to account for around 95% of the variance and were used in feature extraction for oversampled 1.5 T 1H-MRS (Figure 4C). Meanwhile, seven principal components were available in oversampled 3 T 1H-MRS, and they were able to explain around 70% of the total variance (Figure 4D).
TABLE 1 The multi-site cohorts of 1H-MRS for childhood brain tumours
Nottingham (2%) 0 0 1 1
*Three of the 13 ependymomas were located supratentorially. **One of the four ependymomas was located supratentorially.

| Classification performance
Classification performance was evaluated using classification accuracy, with a further evaluation of misclassified cases. Poorly classified cases were compared between the two feature extraction methods (Figure 6). Machine learning showed limited ability in classifying tumours that were not representative of their diagnostic types. Oversampling the minority class helped in classifying the ependymomas.
Significantly improved classification accuracy was obtained by using selected metabolites through multi-class ROC, compared with PCA-based feature selection (P < 0.01, Figure 7A and 7B). The combination of multi-class ROC and oversampling showed further improved classification accuracy (P < 0.01, Figure 7C and 7D). The improvement was seen for both overall and balanced classification accuracy (P < 0.01) and remained significant through k-fold cross-validation (P < 0.01, Figure 8). The improvement of classification performance was consistent between different classifiers in 1.5 T (Supplementary Figures 1 and 3) and 3 T (Supplementary Figures 2 and 4) 1H-MRS, as validated through leave-one-out cross-validation (Supplementary Figures 1 and 2) and k-fold cross-validation (Supplementary Figures 3 and 4). Optimal classification accuracy was achieved through a support vector machine and oversampling for the 1.5 T 1H-MRS, with an overall accuracy of 88% and a balanced accuracy of 85%. LDA and oversampling provided the best classification accuracy for the 3 T 1H-MRS, with an overall accuracy of 84% and a balanced accuracy of 75%.

| DISCUSSION
In this study, the role of metabolite selection for optimizing childhood brain tumour classification from 1.5 T and 3 T short echo-time single-voxel 1 H-MRS has been investigated. A method of metabolite selection in childhood brain tumour classification through multi-class ROC was presented.

FIGURE 2 Images showing quantification examples with the corresponding baseline under the same scale for the acquired 1.5 T and 3 T 1H-MRS for childhood brain tumours: an ependymoma case from the 1.5 T cohort (A) and a medulloblastoma case from the 3 T cohort (B)
Across the commonly used machine learning methods, this method is able to achieve an improved classification accuracy for the three main tumour types when compared with the conventional PCA for feature selection. A combination of metabolite selection through multi-class ROC and oversampling for the minority through SMOTE is able to achieve the optimal classification accuracy, providing an accurate, efficient and transparent method for the use of metabolites in children's brain tumour diagnosis. With the focus of the feature selection being the metabolite concentrations, careful consideration of the most appropriate metabolite set is important. All metabolites in the basis set were initially quantified separately in TARQUIN as part of its standard application. Some metabolites were then combined. Creatine and phosphocreatine have 1 H-MRS signals that almost completely overlap at 1.5 T and 3 T, and we combine them.

TABLE 2 Demographic and clinical variables of patients
Phosphocholine, glycerophosphocholine and free choline all have spectra that are dominated by a singlet around 3 ppm, and whilst potentially they could be separated by their multiplets between 3 ppm and 4 ppm, these are of low intensity and in tumours this region of the spectrum has signals from many metabolites, so we decided to combine them. Glycine and myo-inositol differ from the creatine and choline containing metabolite sets in that their molecular structures and spectra are substantially different from each other, and whilst there is significant overlap in the spectra of the two metabolites, particularly at 1.5 T, this gives the opportunity to quantify them separately. Previous publications on children's brain tumour 1 H-MRS have shown that glycine and myo-inositol can be quantified separately, albeit with limited accuracy, and provide useful information using LCModel 8 and TARQUIN 11 with short echo-time point-resolved spectroscopy at 1.5 T. We would expect glycine and myoinositol to be quantified more accurately with increasing field strength, since the complex multiplet structure of myo-inositol becomes more evident, reducing the overlap in the spectra. 25 It is important to have some comparison of metabolite selection against a commonly used method, and PCA was chosen, as it has been used successfully on similar datasets. 17 It is an unsupervised learning method with categories of cases not considered, and so is valid for use in highly imbalanced data, a situation commonly encountered in children's brain tumour classification, since some tumour types are more common than others. However, the features are ranked by their variability across the whole cohort and highly variable features may not be the best at discriminating between classes. In contrast, selecting metabolites by their ability to discriminate between the various classes and using these as the features for machine learning should improve feature selection, since it is a supervised method. 
ROC is well defined between pairs of classes, but its generalization to the multi-class problem is less commonly used in biomedical applications. The multi-class problem can be reduced to a series of binary problems either by selecting pairs of classes or in a one versus the rest approach. However, this produces multiple metrics for each metabolite. This is well illustrated in Figure 5, where it is seen that the binary and pairwise AUC values show considerable variability across the methods. Most metabolites are better at diagnosing some tumours than others; for example, myo-inositol (Figure 5A) is seen as a good discriminator of ependymomas in the 1.5 T cohort by high binary and pairwise AUC values, but it has a low ability to discriminate between medulloblastomas and pilocytic astrocytomas on the pairwise ROC. Here we use a single combined multi-class ROC parameter, to which all tumour groups contribute equally and which removes this challenge to metabolite selection. One advantage of using the metabolites directly as features is that metabolite concentrations are meaningful in tumour biology and can be compared with recent findings from biopsy studies, which helps to validate their use as features. 26 Metabolites have been identified that are characteristic for specific tumours. 17 For instance, previous studies suggested that posterior fossa pilocytic astrocytomas have significantly altered levels of creatine, choline, glutamate, glutamine and myo-inositol. 27,28 Glycine was identified as a key metabolite for classifying high-grade and low-grade childhood brain tumours. 8 These findings correspond to the selected metabolites in our results. However, some metabolites were found to be less useful for discriminating between certain tumour types. For instance, total lipids and macromolecules at 0.9 ppm performed well in discriminating between ependymomas and pilocytic astrocytomas but were less useful for identifying medulloblastomas (Figure 5B). This finding demonstrated the benefits of selecting key metabolites for tumour classification. Comparing the metabolite selection from the 1.5 T and 3 T cohorts, a small group of metabolites proved to be useful discriminators at both field strengths, including total N-acetylaspartate, total choline, and total lipids and macromolecules at 1.3 ppm. In this study, glycine was suggested as the most discriminatory biomarker for classification from the 3 T cohort, whereas myo-inositol was more important in the 1.5 T cohort. This may be a result of glycine being more accurately discriminated from myo-inositol at 3 T than at 1.5 T, as these metabolites have very similar chemical shifts. There are also some other metabolites showing increased diagnostic ability from the 1.5 T to the 3 T cohort, including scyllo-inositol and taurine. Again, this could be due to the more accurate metabolite estimation in higher field-strength scanning. There are also some other metabolites performing with a varied diagnostic ability dependent on the field strength, such as combined glutamate and glutamine in the 3 T cohort and citrate in the 1.5 T cohort.
Metabolite concentrations prior to normalization are shown above. P values were calculated using non-parametric Kruskal-Wallis H test between ependymomas (EP), medulloblastomas (MB) and pilocytic astrocytomas (PA). Significance of differences for metabolite concentrations between the three tumour types is identified as *P < 0.05 and **P < 0.01. For the metabolite combinations, total choline includes glycerophosphocholine and phosphocholine, total creatine includes creatine and phosphocreatine and total N-acetylaspartate includes N-acetylaspartate and N-acetylaspartylglutamate.
Some metabolites are easier to quantify with 1 H-MRS at 3 T than at 1.5 T due to J-coupling effects reducing spectral overlap; however, there are some metabolites that have different diagnostic abilities at the two field strengths not readily explained by this. We note that T 1 and T 2 values vary between metabolites and with field strength, which will affect quantification. 29 The ROC was originally proposed for binary classification problems as applied to radiology research. 30 Its implementation for multi-class classification is still being debated. Our cohort contains imbalanced data, due to the rareness of ependymomas, which makes the problem even more complicated. The multi-class ROC method, employed here, converges with the binary ROC when the problem is converted from multiple to two classes. After evaluating the performance of ROC in binary, pairwise and multi-class evaluation, the results of multi-class ROC performed like a combination of binary and pairwise evaluations, making it ideal for situations where there are multiple classes.
Metabolite selection was used in the current study in combination with a number of machine learning classifiers. Whilst the list of classifiers was not exhaustive, major types were included and the purpose was to show the robustness of metabolite selection across a range of classifiers rather than to determine the optimum method. Some insight can be gained from the differences in the results between them. Taking the difference between SVM and LDA as an example, SVM would allow the boundary to be specified more precisely in regions where two classes have neighbours that are very close, while LDA provides a straight boundary, which may work poorly for neighbouring cases. This may well explain why SVM outperforms LDA in the 1.5 T dataset, which has several neighbouring cases from different tumour types. In the 3 T dataset, the small number of ependymomas have disparate metabolite profiles, which overlap other tumour types, leading to poor performance of all machine learning methods for this class.
The limitations of this study include the relatively small cohort sizes, particularly at 3 T, and the challenges in quantifying metabolites from 1 H-MRS acquired clinically at multiple centres. The cohorts used in the 1.5 T and 3 T analyses are different; indeed, they are mutually exclusive and caution should be exercised in any comparison of results between the two cohorts. The data are also acquired from multiple centres with some variation in protocol, which will lead to variability in the data. Many confounders may be present in addition to the effects of field strength. We have given some detail for both the patients and the methods to aid comparison. One factor that is worthy of comment is that the cohorts have somewhat different median ages, and it is known that tumour molecular subtype has an incidence that depends on age. Since it is known that molecular subtypes can have different 1 H-MRS, this is a potential confounder. 31 The SNR is not better in the 3 T compared with the 1.5 T cohort, as might have been expected, because a smaller voxel size is commonly used in the 3 T acquisition to reduce the partial volume effects even after increasing the number of excitations. In addition, the small number of ependymomas in the 3 T cohort leads to the balanced classification accuracy being a rather unstable measure of accuracy, with a large reduction in its value if just one case is misclassified. Including supratentorial ependymomas to increase the number of these tumours also makes this class more diverse in biology and probably in metabolite profile. This combination of factors probably explains the poorer accuracy of classifications for the 3 T than the 1.5 T cohort.
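As a rough worked illustration of why the balanced accuracy is unstable for a small minority class, assume (as the standard definition, not one confirmed by the text) that it is the unweighted mean of the per-class accuracies over c = 3 tumour types, evaluated on the original cases. Misclassifying one additional case in a class of n cases then lowers it by 1/(cn). Taking the four 3 T and 13 1.5 T ependymomas noted under Table 1:

```latex
\Delta\alpha_{\text{balanced}} = \frac{1}{c\,n}, \qquad
\frac{1}{3 \times 4} \approx 8.3\%\ \text{(3 T)}, \qquad
\frac{1}{3 \times 13} \approx 2.6\%\ \text{(1.5 T)}.
```

Under these assumptions a single misclassified ependymoma costs roughly three times as much balanced accuracy in the 3 T cohort as in the 1.5 T cohort.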
Many metabolites might be highly interrelated through their biochemical pathways, but the relationships between metabolites have not been considered in the current study. Where significant correlation between metabolites is established, this could be used to reduce the number of metabolites required to achieve high classification accuracy. Future work to improve the accuracy of metabolite determination particularly for metabolites present at lower concentrations will be important.

| CONCLUSION
Metabolite selection provides an effective method of feature selection for childhood brain tumour classification from 1H single-voxel MRS, comparing favourably to the classic method of PCA. The technique has the advantage of identifying the key discriminatory metabolites, thereby bridging transparently from diagnosis based on single metabolites to machine learning, making it attractive for clinical and biomedical uses. Multi-class ROC is the preferred implementation for metabolite selection in situations where there are more than two diagnoses that need to be discriminated, since it provides a single metric for each metabolite combined with high accuracy.

SUPPORTING INFORMATION
Additional supporting information may be found in the online version of the article at the publisher's website.
How to cite this article: Zhao D, Grist