Noise suppression of proton magnetic resonance spectroscopy improves paediatric brain tumour classification

Proton magnetic resonance spectroscopy (1H‐MRS) is increasingly used for clinical brain tumour diagnosis, but suffers from limited spectral quality. This retrospective and comparative study aims at improving paediatric brain tumour classification by performing noise suppression on clinical 1H‐MRS. Eighty‐three/forty‐two children with either an ependymoma (ages 4.6 ± 5.3/9.3 ± 5.4), a medulloblastoma (ages 6.9 ± 3.5/6.5 ± 4.4), or a pilocytic astrocytoma (ages 8.0 ± 3.6/6.3 ± 5.0), recruited from four centres across England, were scanned with 1.5T/3T short‐echo‐time point‐resolved spectroscopy. The acquired raw 1H‐MRS was quantified by using Totally Automatic Robust Quantitation in NMR (TARQUIN), assessed by experienced spectroscopists, and processed with adaptive wavelet noise suppression (AWNS). Metabolite concentrations were extracted as features, selected based on multiclass receiver operating characteristics, and finally used for identifying brain tumour types with supervised machine learning. The minority class was oversampled through the synthetic minority oversampling technique for comparison purposes. Post‐noise‐suppression 1H‐MRS showed significantly elevated signal‐to‐noise ratios (P < .05, Wilcoxon signed‐rank test), stable full width at half‐maximum (P > .05, Wilcoxon signed‐rank test), and significantly higher classification accuracy (P < .05, Wilcoxon signed‐rank test). Specifically, with Naïve Bayes applied to the oversampled 1H‐MRS, the cross‐validated classification accuracy improved from 81% to 88% (overall) and from 76% to 86% (balanced) for the 1.5T cohort, and from 62% to 76% (overall) and from 46% to 56% (balanced) for the 3T cohort. The study shows that fitting‐based signal‐to‐noise ratios of clinical 1H‐MRS can be significantly improved by using AWNS with insignificantly altered line width, and the post‐noise‐suppression 1H‐MRS may have better diagnostic performance for paediatric brain tumours.


| INTRODUCTION
Paediatric brain tumours (PBTs) remain amongst the most lethal cancers in childhood.1 Half of PBTs arise in the posterior fossa,2 among which the most frequent tumour types are pilocytic astrocytomas, medulloblastomas, and ependymomas.3 Pilocytic astrocytomas, a subset of gliomas, are the most common PBTs in the posterior fossa, and they are classified as World Health Organization (WHO) grade I tumours.4 Medulloblastomas are the next most common tumours and are classified as WHO grade IV. They have four main molecular groups that are associated with diverse prognoses.5 Ependymomas6 are less common and are classified as grade II or III,7 although grading is challenging and not closely linked to their malignant behaviour. The three tumour types require different treatment strategies that usually involve surgical resection, with adjuvant treatment, including radiotherapy and chemotherapy, being dependent on diagnosis and a set of clinical, pathological, and radiological factors.8 For pilocytic astrocytomas, surgical resection alone is usually curative and, even if resection is not complete, patients often require no further treatment. Surgical strategy is therefore aimed at minimising morbidity whilst aiming for as complete a resection as possible.9 For medulloblastomas, surgical resection is an important part of their treatment, but small residuals can be successfully treated with adjuvant radiotherapy and chemotherapy. Many medulloblastomas have metastases at diagnosis and the prognostic value of extensive resection of the primary is less certain in these cases, although it is often still attempted.10 For ependymomas, a complete resection is the key to maximising the chances of survival, and second-look surgery is advocated where the tumour has not been completely resected and the residual is thought amenable to further resection.11

Given the different surgical strategies required for these three tumour types, a pre-operative diagnosis is a significant contribution to their clinical care.12 In addition, an early diagnosis can allow timely planning of adjuvant treatment and more informed discussions with the family.13 An initial noninvasive diagnosis is made from clinical and imaging information, with this being confirmed or refined by histology and molecular analysis after surgical resection.14 Conventional MRI is the standard diagnostic imaging modality; it presents structural details of tissue and is normally involved in clinical diagnostic determination.8 It allows limited differential classification between the three PBTs with conventional machine learning and requires heavy computational analysis to achieve better performance.15 In contrast, proton magnetic resonance spectroscopy (1H-MRS)16 can reveal metabolite profiles of human tissue in vivo17 by observing the T2 relaxation time variations of metabolites. Metabolite profiles were observed to be cancer-specific18 and could reflect malignancy,19 supporting their use in the clinical diagnosis of PBTs.20 However, clinical 1H-MRS is challenging to acquire at high quality and in particular suffers from limited signal-to-noise ratio (SNR).21 This is due to the relatively low concentration of protons from metabolites in comparison with bulk water and fat protons and the limitation on voxel size required for tumour localisation. Even for well-designed MR systems, noise may still exist due to thermal motion of charged particles and electrons in the receiver coil.22 Spectral noise23 can introduce errors in estimating metabolite concentrations, which may lead to inaccurate metabolite- or spectrum-based PBT classification. In clinical practice, the 1H-MRS SNR may not always be of acceptable quality, and it is affected by various factors such as the number of signals averaged and voxel size.24

Apodisation is the commonly used method for clinical 1H-MRS postprocessing.25 Despite being able to increase the SNR to some degree, apodisation decreases the spectral resolution, which shows as increased full width at half-maximum (FWHM). Consequently, apodisation makes overlapping spectra more difficult to separate and metabolite quantification no longer reliable.24 Wavelet analysis is a unified multiresolution processing technique for nonstationary signals.26 Wavelets were reported to be useful in 1H-MRS for quantification,27 analysis,28 noise suppression (NS),29 and clinical imaging biomarker identification.30 By reducing the detrimental effects of noise in clinical 1H-MRS, wavelet analysis could potentially improve the accuracy of metabolite concentration estimation. This article hypothesises that the metabolite concentrations that are estimated from post-noise-suppression (postNS) 1H-MRS have improved diagnostic accuracy for PBTs. The aim is to investigate the potential of 1H-MRS NS for improving PBT classification.

This retrospective study (Figure 1) was approved by the local research ethics committee (ethics number: 04/MRE04/41). Informed consent was obtained from parents or guardians of all patients.

| Data acquisition
Structural imaging and 1H-MRS data of PBT cases, with a diagnosis of ependymoma (posterior fossa or supratentorial), medulloblastoma, or pilocytic astrocytoma (posterior fossa), were collected from our four hospitals nationally between October 2004 and December 2019. Each patient underwent MRI and 1H-MRS before surgical resection. Histological diagnosis was reviewed by local tumour boards. Histological subtypes of the medulloblastomas and ependymomas were grouped together. Multisite imaging data were acquired by using scanners including Siemens Symphony 1.5T, GE Signa LX 1.5T, Philips Ingenia 1.5T, Philips Intera 1.5T, Philips Achieva 3T, Philips X-series 3T, and Siemens Verio 3T (Tables D1-D3). Structural images were acquired by using T1-weighted, T2-weighted, T1-weighted post-contrast, and diffusion-weighted MR imaging sequences.
Following conventional imaging that included gadolinium administration, 1H-MRS with water reference was acquired by using the point-resolved spectroscopy sequence (field strength 1.5T or 3T; head coils or head and neck coils with 8-32 channels; sampling frequency 1000-2500 Hz; chemical shift displacement < 4% per ppm; echo time 30-41 ms; number of complex points 512, 1024, or 2048; number of signals averaged 128; pulse repetition time 1500-2200 ms; voxel size 13 × 13 × 13 to 20 × 20 × 20 mm³; Tables D1-D3). Water suppression was performed through chemical shift selective saturation pulses without outer-volume suppression. Volumes of interest were placed within the tumour by clinical radiologists and MR technicians with reference to structural images. Contrast enhancement and low apparent diffusion coefficient were used as guides where tumours exhibited some heterogeneity.

| Spectroscopy quantification and quality control
1H-MRS were quantified by using Totally Automatic Robust Quantitation in NMR (TARQUIN) (version 4.3.11) with the 1H brain full basis that includes the basis of lipids and macromolecules.31 Quality-control parameters, which were obtained through quantification, included fitting-based SNR (fSNR), whole-spectrum SNR (wSNR), FWHM (Appendix A.1), and Cramér-Rao lower bound (CRLB).24,32 Filtering of 1H-MRS was subsequently performed according to the aforementioned quality-control parameters, and spectra were assessed visually by experienced spectroscopists for general quality features, namely phasing, fitting, baseline drifting, and the presence of artefacts.
The following exclusion criteria were applied for raw patient 1H-MRS screening (Table D4): (1) missing histological diagnosis; (2) missing water-suppressed signals, water reference signals, or structural MR images that indicated the 1H-MRS voxel location; (3) 1H-MRS voxels only partially containing tumours, as determined by visual inspection of the voxel location images produced on the scanner, aided by reference to the available image set; (4) very poor FWHM (> 0.15 ppm) of the spectrum; (5) very poor fSNR (< 4) of the spectrum. All the cases that passed the quality-control filtering described above were included in the final cohort for the subsequent analysis in this study.
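The screening rules above can be sketched as a simple filter. This is only an illustration: the `SpectrumRecord` class, its field names, and the helper functions are hypothetical and are not part of the study's pipeline; only the two numeric thresholds (FWHM > 0.15 ppm, fSNR < 4) are taken from the text.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted in the exclusion criteria.
MAX_FWHM_PPM = 0.15
MIN_FSNR = 4.0


@dataclass
class SpectrumRecord:
    """Hypothetical per-case record; all field names are illustrative."""
    case_id: str
    has_histology: bool
    has_required_data: bool    # water-suppressed signal, water reference, voxel-location images
    voxel_within_tumour: bool  # outcome of the visual inspection
    fwhm_ppm: float
    fsnr: float


def exclusion_reasons(rec: SpectrumRecord) -> List[str]:
    """Return every exclusion criterion that the record triggers."""
    reasons = []
    if not rec.has_histology:
        reasons.append("missing histological diagnosis")
    if not rec.has_required_data:
        reasons.append("missing signals or voxel-location images")
    if not rec.voxel_within_tumour:
        reasons.append("voxel only partially contains tumour")
    if rec.fwhm_ppm > MAX_FWHM_PPM:
        reasons.append("FWHM > 0.15 ppm")
    if rec.fsnr < MIN_FSNR:
        reasons.append("fSNR < 4")
    return reasons


def screen(records: List[SpectrumRecord]) -> List[SpectrumRecord]:
    """Keep only the records that trigger no exclusion criterion."""
    return [rec for rec in records if not exclusion_reasons(rec)]
```

Returning the list of reasons, rather than a single pass/fail flag, makes it straightforward to tabulate how many cases each criterion removed.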

| Spectroscopy noise suppression
1H-MRS passing quality screening were processed by using a purpose-designed framework, adaptive wavelet noise suppression (AWNS), that combines wavelets with a data-driven approach (Figures 1 and E1, Algorithm 1). Firstly, raw 1H-MRS was processed by a series of wavelet variations (Table D5) in the frequency domain.33 Secondly, quantification of both original and postNS 1H-MRS was performed by using TARQUIN as described above. Finally, quality metrics were used as the criteria for selecting the final postNS results: among the multiple spectra produced from the initial input spectrum, one spectrum was selected based on the fSNR and FWHM for the rest of the study. Apart from the overall comparison, 1H-MRS were additionally divided into three groups according to the noise level in each cohort to perform postNS evaluation, providing a poor-quality group (4 < fSNR ≤ 10), a medium-quality group (10 < fSNR < 20), and a good-quality group (fSNR > 20). 1H-MRS NS was conducted by using MATLAB (version 2020a, MathWorks, Natick, Massachusetts, United States).
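The selection and grouping steps can be illustrated with a minimal sketch. It is not the published Algorithm 1: the `Candidate` class, the function names, and the 10% relative FWHM tolerance are assumptions made for illustration, and the wavelet processing and TARQUIN quantification that would produce the candidate metrics are outside its scope. The fSNR group boundaries are copied literally from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Quality metrics of one spectrum after quantification (hypothetical structure)."""
    label: str       # e.g. the name of a wavelet variation, or "raw"
    fsnr: float
    fwhm_ppm: float


def select_candidate(original: Candidate,
                     candidates: List[Candidate],
                     fwhm_tolerance: float = 0.10) -> Candidate:
    """Pick the candidate with the highest fSNR whose FWHM stays close to the original.

    The relative tolerance is an assumed value; the text only states that
    selection is based on the fSNR and FWHM.
    """
    limit = fwhm_tolerance * original.fwhm_ppm
    acceptable = [c for c in candidates
                  if abs(c.fwhm_ppm - original.fwhm_ppm) <= limit]
    best = max(acceptable, key=lambda c: c.fsnr, default=original)
    # Fall back to the unprocessed spectrum if nothing improves on it.
    return best if best.fsnr > original.fsnr else original


def quality_group(fsnr: float) -> Optional[str]:
    """Assign the quality group using the boundaries exactly as stated in the text."""
    if 4 < fsnr <= 10:
        return "poor"
    if 10 < fsnr < 20:
        return "medium"
    if fsnr > 20:
        return "good"
    return None  # values not covered by the stated ranges
```

A case whose candidates all broaden the line width beyond the tolerance keeps its original spectrum, which mirrors the intent of raising fSNR without altering FWHM.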

| Metabolite analysis
For each individual case, metabolite concentrations were normalised to the sum of all metabolite concentrations, including lipids and macromolecules, and used as the features for tumour classification. Metabolites whose CRLB percentage values, according to the raw spectrum, were greater than 50% in all cases were excluded (Table D6). The diagnostic ability of a given metabolite to discriminate between the three tumour types was calculated through a multiclass area under the curve (mAUC). The area under the curve (AUC) of the metabolite combinations that consist of metabolites highly overlapped in chemical shift was further evaluated. Such combinations include (1) creatine and phosphocreatine, at 3.9 ppm; (2) glucose and glutathione, at 3.8 ppm; (3) glycine and myo-Inositol, at 3.6 ppm; (4) scyllo-Inositol and taurine, at 3.3 ppm; (5) glycerophosphocholine, phosphocholine, and free choline, at 3.2 ppm; (6) citrate, glutamate, and glutamine, at 2.35 ppm; (7) N-acetylaspartate, N-acetylaspartylglutamate, lipids at 2.0 ppm, and macromolecules at 2.0 ppm; (8) lipids and macromolecules at 1.3 ppm and lactate; (9) lipids and macromolecules at 0.9 ppm. Bootstrapped multiclass receiver operating characteristics (ROC)34 were used to evaluate the diagnostic ability of individual metabolites and metabolite combinations for discriminating between the three tumour types, and bootstrapped multivariate and multiclass ROC with multinomial logistic regression were used to evaluate the diagnostic ability of all the metabolites combined.

FIGURE 1 Flowchart showing the methods used for clinical magnetic resonance spectroscopy in this project. Abbreviations: 1H-MRS, proton magnetic resonance spectroscopy; AWNS, adaptive wavelet noise suppression; FID, free induction decay; NS, noise suppression; SMOTE, synthetic minority oversampling technique; fSNR, fitting-based signal-to-noise ratio; FWHM, full width at half-maximum; mAUC, multiclass area under the curve; PreNS, pre-noise suppression; PostNS, post-noise suppression; PreOS, pre oversampling; PostOS, post oversampling.
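The normalisation and CRLB-based exclusion described in this section can be sketched as follows. The function names and the dictionary layout (metabolite name mapped to a value, one dictionary per case) are assumptions for illustration; only the sum normalisation and the "greater than 50% in all cases" rule come from the text.

```python
from typing import Dict, List


def normalise(concentrations: Dict[str, float]) -> Dict[str, float]:
    """Divide each concentration by the sum over all fitted components of one case."""
    total = sum(concentrations.values())
    if total <= 0:
        raise ValueError("sum of concentrations must be positive")
    return {name: value / total for name, value in concentrations.items()}


def metabolites_to_exclude(crlb_percent_by_case: List[Dict[str, float]],
                           threshold: float = 50.0) -> List[str]:
    """Return metabolites whose CRLB percentage exceeds the threshold in every case.

    Assumes every case reports the same set of metabolite names.
    """
    if not crlb_percent_by_case:
        return []
    names = crlb_percent_by_case[0].keys()
    return sorted(
        name for name in names
        if all(case[name] > threshold for case in crlb_percent_by_case)
    )
```

Because each case is divided by its own total, the normalised features of a case sum to one, so they describe relative rather than absolute metabolite levels.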

| Tumour classification
Normalised metabolite concentrations were used as the potential features for tumour classification. Metabolite concentrations were ranked and selected according to their mAUC, which demonstrates their diagnostic value across the three tumour types.35 The number of features used for classification was set to no more than the number of cases in the minority group minus one. Classification of tumours was performed by using linear and nonlinear classifiers, including linear discriminant analysis, k-nearest neighbours, Naïve Bayes, multinomial log-linear models via neural networks, and support vector machines with a linear kernel. Oversampling was performed with an oversampling rate of 100% or 200% for the minority class by using the adaptive synthetic minority oversampling technique (SMOTE) for classification only.36 The decision tree37 of tumour classification was assessed by using resubstitution on linear discriminant functions. Classification accuracy was determined by using both leave-one-out and k-fold cross-validation, and the statistical significance values of the difference between classification accuracies were adjusted by using Bonferroni correction (Appendix A.2). Test sets were generated through stratified sampling, and k was set based on the sample size of the minority class as 10 for the 1.5T cohort and 4 for the 3T cohort.
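For readers unfamiliar with the measures reported later, the sketch below computes the feature cap and the usual confusion-matrix summaries. It assumes the conventional definitions (balanced accuracy as the mean of per-class recall, F1 as the harmonic mean of precision and recall); the study's own definitions are in its appendix and may differ in detail. All function names are illustrative.

```python
from typing import List

# confusion[i][j] = number of cases of true class i predicted as class j
Matrix = List[List[int]]


def max_features(minority_class_size: int) -> int:
    """Feature cap stated in the text: minority group size minus one."""
    return max(minority_class_size - 1, 0)


def overall_accuracy(confusion: Matrix) -> float:
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def recall(confusion: Matrix, k: int) -> float:
    row_total = sum(confusion[k])
    return confusion[k][k] / row_total if row_total else 0.0


def precision(confusion: Matrix, k: int) -> float:
    col_total = sum(row[k] for row in confusion)
    return confusion[k][k] / col_total if col_total else 0.0


def balanced_accuracy(confusion: Matrix) -> float:
    """Mean per-class recall, so that a small class weighs as much as a large one."""
    n = len(confusion)
    return sum(recall(confusion, k) for k in range(n)) / n


def f1_score(confusion: Matrix, k: int) -> float:
    p, r = precision(confusion, k), recall(confusion, k)
    return 2 * p * r / (p + r) if (p + r) else 0.0
```

With a small minority class, overall accuracy can stay high while that class is rarely identified; balanced accuracy and the per-class F1 score expose this, which is why both are reported alongside overall accuracy.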

| Statistical analysis
The chi-squared test was performed to evaluate the association between patient sex and tumour types. The Wilcoxon rank-sum test was performed to compare patient age between tumour types and tumour classification accuracy between preNS and postNS 1H-MRS. The Kruskal-Wallis H test was performed to compare the noncategorical variables between the three tumour types. The Wilcoxon signed-rank test was performed to compare variables between preNS and postNS, namely the quality-control parameters (fSNR, wSNR, FWHM, CRLB, and CRLB percentage values) of 1H-MRS and the AUC of metabolites. Specifically, the multivariate AUC38 was used to compare multiple metabolites for their diagnostic ability, where metabolites were combined by performing regression before being used for prediction and calculating ROC. Statistical significance was defined as P < .05. All statistical analysis and machine-learning experiments were conducted by using R (version 4.2.2, The R Foundation, Vienna, Austria).

| Demographics
In the final cohort, a total of 83 patients (Table 1) were scanned at 1.5T (57% were male) and 42 patients were scanned at 3T (43% were male). Ependymomas were located in the posterior fossa except for three cases in the 1.5T cohort and one in the 3T cohort. Demographic statistics did not suggest significant differences in sex (Table D7) or age (Table D8) between groups (P > .05), with the exception of significantly younger ependymoma patients than the other two tumour types in the 1.5T cohort (P < .05).

| Spectral quality
PostNS 1H-MRS showed improved spectral quality across the three tumour types and three quality levels in both the 1.5T and 3T cohorts (Table D9).

| Metabolite concentrations
Metabolite concentrations showed significant differences between tumour types (Table 2) in both the 1.5T and the 3T postNS 1H-MRS, some of which reached significance only in postNS 1H-MRS. Most metabolites were consistently significantly different between the three tumour types (P < .05) in both pre-noise-suppression (preNS, Table D12) and postNS 1H-MRS. For the 1.5T cohort, such metabolites included citrate, total choline, total creatine, glutathione, glycine, total lipids and macromolecules at 0.9 ppm, 1.3 ppm, and 2.0 ppm, myo-Inositol, scyllo-Inositol, and taurine. For the 3T cohort, such metabolites included total choline, total creatine, glutathione, glycine, lactate, macromolecules at 0.9 ppm and 1.3 ppm, scyllo-Inositol, and taurine. However, some metabolites were originally significantly different (P < .05) but lost their significance (P > .05) in postNS 1H-MRS between the three tumour types. Such metabolites included glucose and lactate for the 1.5T cohort and total N-acetylaspartate for the 3T cohort. Conversely, some other metabolites were significantly different (P < .05) in postNS 1H-MRS but were originally not significantly different (P > .05) between the three tumour types, namely citrate, glucose, total lipids, and macromolecules at 0.9 ppm from the 3T cohort.

TABLE 1 (rows recovered; counts given as 1.5T / 3T)
Medulloblastomas, grade IV, desmoplastic-nodular: 5 / 0
Pilocytic astrocytomas, grade I: 39 / 21
† Three of the 13 ependymomas were located supratentorially. ‡ One of the four ependymomas was located supratentorially.
FIGURE 2 Plots showing an example of (A) pre- and (B) post-noise suppression 1H-MRS spectrum for a clinical case, and violin plots comparatively showing the quality-control parameters for (C, E, G) 1.5T and (D, F, H) 3T pre- and post-noise suppression proton magnetic resonance spectroscopy (1.5T, N = 83; 3T, N = 42), including (C-D) fitting-based signal-to-noise ratio (fSNR), (E-F) whole-spectrum signal-to-noise ratio (wSNR), and (G-H) full width at half-maximum (FWHM).

| Classification assessment
The decision tree of tumour classification showed misclassified cases in each group (Figure 5). Misclassified cases are dominated by borderline cases, most of which are ependymomas. In general, postNS 1H-MRS, with or without oversampling, showed fewer misclassified tumours than the preNS and pre-oversampling 1H-MRS. By performing NS on the 1.5T cohort, misclassified tumours were reduced from 7 (8.4%) to 5 (6.0%) before oversampling and from 7 to 3 (3.6%) after oversampling. Before oversampling, 2 of the 7 misclassified tumours were ependymomas, reduced to 1 of 5 after NS. For post-oversampling classification, misclassified cases were reduced from 7 to 3 for postNS 1H-MRS, among which there was still one misclassified ependymoma. As for the 3T cohort, an overall reduction appeared only after oversampling, from 5 (11.9%) to 4 (9.5%) by performing 1H-MRS NS, and two ependymomas remained misclassified after performing NS.
TABLE 2 Estimated metabolite concentrations in mmol from 1.5T and 3T post-noise suppression proton magnetic resonance spectroscopy.

Overall and balanced classification accuracies showed significant improvement after NS generally (P < .05, Figures 6 and E6-E9, Tables D15-D18). Prior to oversampling, NS significantly improved tumour classification for both the 1.5T and 3T cohorts (P < .05). For the 3T cohort in particular, oversampling improved classification further (P < .05).
Optimised cross-validated classification for the 1.5T and 3T cohorts was achieved through postNS 1H-MRS with Naïve Bayes. The balanced classification accuracy of the 1.5T cohort was improved to 86% from 76% through postNS 1H-MRS. After oversampling, the optimal classification accuracy was achieved as 88% overall and 86% balanced. For the 3T cohort, the overall classification accuracy was improved to 74% from 69% and the balanced accuracy to 55% from 51%. After performing oversampling, the overall and balanced classification accuracies were improved to 76% from 62% and to 56% from 46% through postNS 1H-MRS, respectively (Table 3, Figure 6). The results of k-fold cross-validation also showed significant improvement of classification accuracy after NS and oversampling (Figures 6 and E6-E9).
According to classification measures, postNS 1H-MRS showed optimal overall classification accuracy when combined with oversampling for both the 1.5T and 3T cohorts (Table 3). Ependymoma identification was improved after NS, shown by an F1 score improved from 0.67 to 0.76 for the 1.5T cohort and from 0 to 0.29 for the 3T cohort. After performing oversampling, optimal ependymoma identification was achieved for the 1.5T cohort, with the F1 score improved from 0.69 ± 0.02 to 0.80 ± 0.03. For the 3T cohort, ependymoma identification was not clearly improved through oversampled 1H-MRS.

... concentrations estimated from such 1H-MRS spectra may be inaccurate, which will lead to poor clinical performance of 1H-MRS. Higher field strength with an optimised scanning protocol may be able to increase the resolution of MR spectra and the certainty of metabolite concentration determination. However, the spectral noise of 1H-MRS may still be a problem in clinical practice and be a barrier to adoption. For example, the size of voxels is restricted for a small tumour to avoid partial volume effects, and such a small voxel will inevitably lead to noisy 1H-MRS. Wavelets can provide a flexible solution for 1H-MRS with a wide range of noise levels. The frequency-uniform filter used by wavelets can preserve the probability distribution of noise (Appendix C),39 indicating the valid use of CRLB for postNS 1H-MRS spectra.25

FIGURE 4 Box and scatter plots describing the diagnostic ability alteration of metabolites as (A) combinations or (B) individuals through multiclass (C) univariate or (D) multivariate receiver operating characteristics in the 3T cohort (N = 42), where the difference was compared between pre (left) and post (right) noise suppression.
AWNS achieves a robust performance by combining wavelet analysis with a data-driven approach. The use of wavelets has been considered for 1H-MRS NS, but previous studies are limited to exploring the performance of specific wavelet bases and variations.29 However, the unpredictable noise in clinical 1H-MRS makes it challenging to select a universal wavelet variation that is optimal in all situations. This question remains unanswered in previous wavelet-related approaches. AWNS is designed to address this issue: wavelet variations are selected adaptively through a data-driven approach guided by quality-control parameters. In such a way, AWNS can theoretically preserve the metabolite-related spectral components whilst largely removing signals originating from other sources. In addition, the improvement of fSNR brought by AWNS remains robust for high-SNR spectra, and the CRLB for most metabolites can be reduced as well. This is because AWNS considers the existence of noise and tries to suppress even low-level noise instead of only showing efficiency for low-SNR spectra.
Previous approaches to 1H-MRS NS evaluated their performance by observing the SNRs of experimental data, where the SNRs are defined by using regional noise.29,40 Although recommended by the recent consensus,24 regional SNRs are unlikely to reflect the accuracy of quantified metabolite concentrations, because the fitting error that is relevant to 1H-MRS quantification is not considered. In contrast, AWNS uses fSNR (Equation A.1) to guide NS, so the fitting performance of metabolites is taken into account. Furthermore, the 1H-MRS obtained from simulations, phantoms, or healthy human brain in previous studies may contain only a few metabolites that have relatively simple line shapes.27,29,40 This leaves open the question of how useful such approaches can be in clinical studies. To address this question, AWNS is designed around the needs of clinical questions and is assessed by evaluating the case-by-case fSNR alteration and the diagnostic performance of 1H-MRS. Given the complexity of the brain tumour 1H-MRS used in this study, AWNS could be a more robust, powerful, and practical solution for real-world 1H-MRS.
AWNS might be able to improve the 1H-MRS quantification performance. A naive wavelet decomposition and reconstruction process on a time-domain signal simply removes the signal components according to their locations in the frequency domain, and in most cases the signal components with higher frequency are considered as the noise to remove. Considering 1H-MRS, the self-repeated spectral components, which contain little metabolite information, are considered as noise. To make sure the removed spectral components are dominated by noise and the preserved ones are mainly contributed to by metabolites, AWNS optimises wavelet variations by maximising the fSNR and keeping the FWHM within an acceptable range. As a result, postNS 1H-MRS showed significantly lower median CRLB for all metabolites and CRLB percentage values for most metabolites. This is because CRLB41 assesses whether the observation has unknown probability distributions by providing precision estimators, and the removal of the irrelevant information makes the distribution clearer. In this process, AWNS does not add any new information to spectra, but it learns from the prior knowledge of existing metabolites in the acquired 1H-MRS and attempts to remove the spectral components that are irrelevant to such prior knowledge as much as possible. Since quantification24 is undertaken by fitting spectra with prior knowledge, AWNS may thereby be able to improve the quantification performance. Compared with apodisation, a technique that increases the line width theoretically and therefore is only recommended for visualisation purposes,42 AWNS can keep the line width stable. Some metabolites are highly overlapping in spectra, which makes it a challenge to increase the fitting performance for these overlapping metabolites, and an NS method that increases line width is undesirable.
The diagnostic ability of some metabolites differed in postNS 1H-MRS. The results show that the alteration of metabolites' diagnostic ability, determined through mAUC, varies after NS, even though most metabolites, as well as most metabolite combinations, showed significantly improved mAUC. The potential reasons for this phenomenon include the vulnerability of some metabolite spectra to noise, overlapping metabolite spectra, and the nature of the metabolites themselves. Firstly, metabolites that have slightly higher intensity than noise can be fitted successfully, but the results are affected by noise; these metabolites can therefore potentially be estimated more accurately in postNS 1H-MRS. For instance, the combination of glutamate, glutamine, and taurine showed stably better diagnostic ability in both 1.5T and 3T postNS 1H-MRS. Secondly, some metabolites overlap to such an extent that they are difficult to identify separately even if the noise has been suppressed, and so little may be gained by suppressing spectral noise. A typical example is the refined glycine in the 3T results that showed high mAUC, whilst it was mixed with myo-Inositol in 1.5T spectra and therefore showed relatively lower mAUC. Lastly, it is assumed that more accurate metabolite concentrations can provide improved classification performance of tumours, but some metabolites may then have less powerful diagnostic ability.
PostNS 1H-MRS improves the classification performance for these PBTs. The classification accuracy obtained through metabolite selection was significantly higher with postNS 1H-MRS than with preNS 1H-MRS. Metabolite selection is a newly proposed method of feature extraction for machine learning in PBT classification, which showed advantages over principal component analysis (PCA).35 Our previous results showed improved classification accuracy by using PCA on postNS 1H-MRS as well,33 indicating the robustness of noise suppression for 1H-MRS-based tumour classification. It remains unknown whether postNS 1H-MRS will provide more accurate metabolite concentration determination, but the classification performance is improved, showing the advantage of 1H-MRS noise suppression in clinical decision making.
Decision evaluation through resubstitution showed that postNS 1H-MRS provides fewer misclassified cases, which indicates that postNS 1H-MRS has the potential to improve clinical diagnosis of PBTs. However, misclassified cases were still present, and these were mostly borderline cases. Oversampling through SMOTE was performed in this study because the ependymomas had a much smaller sample size than the other two tumour classes and the balanced classification accuracy was consequently limited. The improvement of classification accuracy contributed by oversampling ependymomas depended on the machine-learning classifiers used. Whether postNS 1H-MRS provides more sustainably improved classification performance than oversampling is unclear. The results have not suggested whether the combination of NS and oversampling always outperforms NS or oversampling alone, although either of them can be helpful. NS aims to improve the classification accuracy through providing more accurately estimated metabolite concentrations, while oversampling aims to rebalance the group size by creating artificial cases for the minority group. Therefore, whether NS or oversampling is more helpful will depend on the cohort size and the 1H-MRS spectral quality.
The clinical utility of 1H-MRS NS for the tumours in this study should also be considered. The tumours all have surgical resection as their preferred initial treatment. However, the extent of surgical resection required depends greatly on the tumour type. A complete macroscopic resection is crucial if long-term survival is to be achieved in ependymomas, with even small fragments of residual tumour being difficult to control with adjuvant radiotherapy and chemotherapy. Conversely, in cases of pilocytic astrocytoma, residual tumour post-surgery often requires no further intervention, since many cases do not experience further tumour growth. In medulloblastomas, small tumour residuals post-surgery can often but not always be successfully treated with radiotherapy and chemotherapy. Prior knowledge of the tumour type is therefore important in surgical decision making. Furthermore, complex planning of radiotherapy and in particular proton therapy, often at a centre remote to the surgery, benefits from early initiation rather than when a final histological or molecular diagnosis is available several days later. At the same time, families find the time between the initial diagnostic MRI and definitive diagnosis particularly challenging, and an early noninvasive diagnosis can improve the quality of discussions with the family. In addition to the direct clinical improvement that could result for the tumour types used in this study, the method should be readily applicable to other tumour types for which surgical resection is not the preferred initial treatment and an accurate noninvasive diagnosis is particularly important.

TABLE 3 Classification measures comparing pre- and post-noise suppression proton magnetic resonance spectroscopy with Naïve Bayes.
Although more accurate determination of metabolite concentrations could be obtained through optimising MR sequences, increasing scanning durations, or applying higher field strength, limitations imposed by clinical practice mean that 1H-MRS still suffers from low SNR and improvements in postprocessing remain important. Current clinical MR investigations are often undertaken on 1.5T scanners, particularly where spinal imaging is required. Where 1H-MRS is undertaken at higher field strength, smaller voxel sizes may be used, negating the advantages. Clinically applicable sequences are also limited in acquisition time, especially for children, as longer scanning duration may not be tolerated. Instead, NS as postprocessing can assist in improving the diagnostic accuracy of metabolite concentrations, and it can be applied widely across different scanners and protocols. In addition to improving the diagnostic performance of metabolites for clinical 1H-MRS, NS may also make it possible to use some 1H-MRS data that fail to meet the SNR quality-control screening.
The study has a series of limitations. For the method itself, the selection of wavelets is unsupervised and determined only according to the fSNRs, which are dominated by the fitting residual; thus the risk of introducing signal-dependent noise variance⁴³ has not been addressed. Regarding the cohort, the sample size is limited, particularly in the 3T cohort, whose relatively smaller voxel size leads to more noise in the spectra. Meanwhile, the ependymoma patients are relatively younger than the remaining two groups in the 1.5T cohort, which could be due to the higher frequency of ependymomas in the younger population.⁴⁴ The ground truth of metabolite concentrations of in vivo brain tissues is not known, and so there is no definitive proof that the concentrations are determined more accurately in postNS 1H-MRS. Therefore, this study was undertaken with the hypothesis that the metabolite concentrations determined from postNS 1H-MRS might provide better classification accuracy. The study of PBT classification is also limited by the methodology and sample size. Cell-signalling pathways in PBTs suggest associations between metabolites,⁴⁵ whilst metabolites were generally treated as independent during this classification. The limited classification accuracy in the 3T cohort could be due to the small ependymoma group, which meant that the range of 1H-MRS for this diagnosis would not be fully represented and also limited the number of features we allowed in the classification. Higher accuracy would be expected in a dataset containing more ependymoma cases.
Considering noise suppression for clinical 1H-MRS, this article only presents a method that can improve the diagnostic ability of 1H-MRS by suppressing the noise as an initial step. In future work, further optimisation of wavelet computing⁴⁶ and selection²⁸ is required before the method can be made available as a software package and translated into clinical practice. The observed improvement of fSNR by AWNS seems to be related to metabolite concentration levels, which means the performance of AWNS could be related to the spectral line shape. Therefore, further assessment of AWNS will address not only simulated and phantom 1H-MRS, but also in vivo 1H-MRS acquired from multiple types of tissues.

| CONCLUSIONS
Noise suppression for clinical 1H-MRS can provide significantly improved spectral quality, metabolite concentrations with increased diagnostic ability, and better classification performance for paediatric brain tumours.

The fitting-based signal-to-noise ratio (fSNR) was calculated from the amplitude of the highest peak L of the spectrum and the fitting residual σ₁ between 0.2 and 4 ppm, where σ₁ denotes the fitting residual, derived as the difference between the original spectrum and the combination of fitting bases finally determined, over the range 0.2-4 ppm, and L is the height of the highest peak of the spectrum.

A.1.2 | wSNR
The whole-spectrum signal-to-noise ratio (wSNR) was calculated from the amplitude of the highest peak L of the spectrum and the fitting residual σ₀ of the whole spectrum, where σ₀ denotes the fitting residual, derived as the difference between the original spectrum and the combination of fitting bases finally determined, throughout the whole spectrum, and L is the height of the highest peak of the spectrum.
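As a rough illustration of the two signal-to-noise definitions above, the following Python sketch computes a peak-to-residual ratio over a chosen chemical-shift range. The plain ratio L/σ, the use of the standard deviation of the residual as σ, the toy numbers, and all function and variable names are assumptions made for illustration; they are not taken from TARQUIN or from the study's implementation.

```python
import math

def residual_sigma(residual, ppm, lo=None, hi=None):
    """Standard deviation of the residual inside [lo, hi] ppm (whole spectrum if None)."""
    vals = [r for r, p in zip(residual, ppm)
            if (lo is None or p >= lo) and (hi is None or p <= hi)]
    mean = sum(vals) / len(vals)
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))

def peak_to_residual(spectrum, fit, ppm, lo=None, hi=None):
    """Assumed form: height of the highest peak L divided by the residual sigma."""
    residual = [s - f for s, f in zip(spectrum, fit)]
    L = max(spectrum)
    return L / residual_sigma(residual, ppm, lo, hi)

# Toy example (hypothetical numbers): a fitted peak plus a small residual,
# with one large residual point outside the 0.2-4 ppm range.
ppm = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
fit = [0.0, 1.0, 10.0, 1.0, 0.0, 0.0]
spectrum = [f + e for f, e in zip(fit, [0.5, -0.5, 0.5, -0.5, 0.5, -2.5])]

fsnr = peak_to_residual(spectrum, fit, ppm, lo=0.2, hi=4.0)  # sigma_1: 0.2-4 ppm
wsnr = peak_to_residual(spectrum, fit, ppm)                  # sigma_0: whole spectrum
```

In this toy case the whole-spectrum value is lower than the range-restricted one, because the large residual at 5 ppm only enters σ₀.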

A.1.3 | FWHM
The full width at half-maximum (FWHM) was calculated based on the width of the unsuppressed water peak at half its full height.²⁴

A.2 | Classification accuracy parameters
Here α_LOOCV denotes the leave-one-out cross-validated overall classification accuracy, β_LOOCV the leave-one-out cross-validated balanced classification accuracy, α_k-fold the k-fold cross-validated overall classification accuracy, β_k-fold the k-fold cross-validated balanced classification accuracy, CP_X the correct predictions for tumour type X, TP_X the total predictions for tumour type X, CPF_X the correct predictions of the fold for tumour type X, and TPF_X the total predictions of the fold for tumour type X.
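The overall and balanced accuracies described above can be sketched in Python as follows. The sketch assumes the usual definitions: overall accuracy as the pooled ratio of correct to total predictions, and balanced accuracy as the unweighted mean of the per-tumour-type ratios CP_X / TP_X. It also takes TP_X to count the cases whose true type is X. These readings, the function name, and the toy labels are assumptions rather than the study's code.

```python
from collections import Counter

def accuracies(true_labels, predicted_labels):
    """Return (overall, balanced) accuracy from paired label lists."""
    total = Counter(true_labels)  # TP_X: number of cases of type X
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)  # CP_X
    overall = sum(correct.values()) / sum(total.values())
    balanced = sum(correct[x] / total[x] for x in total) / len(total)
    return overall, balanced

# Toy example: the minority class (EP) is classified worse than the others
true = ["EP", "EP", "MB", "MB", "MB", "MB", "PA", "PA", "PA", "PA"]
pred = ["MB", "EP", "MB", "MB", "MB", "MB", "PA", "PA", "PA", "MB"]
overall, balanced = accuracies(true, pred)  # overall = 0.8, balanced = 0.75
```

The balanced figure falls below the overall figure here because the poorly classified minority class carries the same weight as the larger classes.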

Notation

x(n)
A 1H-MRS signal defined in the time domain, where n denotes the time points.

Ψ
A set of wavelet variations, including wavelet basis functions, wavelet transform methods, wavelet decomposition methods, thresholding methods, and decomposition levels, where a wavelet variation is denoted by ψ_i.

F
Performing discrete Fourier transform on a time-domain signal.

x(s)
A 1H-MRS spectrum defined in the frequency domain, where s denotes the chemical shift.

{ĉ_{ψ_i,τ}}
A set of spectral components derived by using the wavelet ψ_i and indexed by τ.

x′_i(s)
The reconstructed spectrum from spectral components {ĉ_{ψ_i,τ}} with the cut-off τ_0.

Q
The quality-control metric, i.e. Q_0 for that of x(s) and Q_i for that of x′_i(s).

ŷ(s)
The finally chosen spectrum with optimal Q from all x′_i(s).

y(n)
The post-noise suppression 1H-MRS signal that is defined in the time domain.
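The notation above implies a selection loop: transform x(n) to a spectrum, build one candidate reconstruction per wavelet variation, score each with Q, and keep the best. A minimal Python sketch of that loop is given below under heavy assumptions. A one-family Haar transform with hard thresholding stands in for the set Ψ, a peak-over-noise-region ratio stands in for the fitting-based Q, the unprocessed spectrum is kept as a fallback candidate, and the toy signal and all names are hypothetical.

```python
import cmath, math, random

def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(N^2)); adequate for a short toy signal."""
    N, sign = len(x), (1 if inverse else -1)
    out = [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out

def haar(v):
    """One-level orthonormal Haar transform: (approximation, detail)."""
    r = math.sqrt(2)
    return ([(v[i] + v[i + 1]) / r for i in range(0, len(v), 2)],
            [(v[i] - v[i + 1]) / r for i in range(0, len(v), 2)])

def ihaar(a, d):
    """Inverse of haar()."""
    r, out = math.sqrt(2), []
    for ai, di in zip(a, d):
        out += [(ai + di) / r, (ai - di) / r]
    return out

def denoise(v, levels, cutoff):
    """Multi-level Haar decomposition, hard-threshold the details, reconstruct."""
    if levels == 0:
        return v
    a, d = haar(v)
    d = [c if abs(c) > cutoff else 0.0 for c in d]
    return ihaar(denoise(a, levels - 1, cutoff), d)

def quality(spec, noise_region):
    """Stand-in Q: highest peak over the standard deviation of a peak-free region."""
    reg = [spec[i] for i in noise_region]
    m = sum(reg) / len(reg)
    sd = math.sqrt(sum((r - m) ** 2 for r in reg) / len(reg))
    return max(spec) / (sd + 1e-12)

random.seed(0)
N = 64
# x(n): one decaying complex exponential (a single resonance) plus Gaussian noise
x = [cmath.exp((-0.08 + 2j * math.pi * 0.25) * n)
     + complex(random.gauss(0, 0.2), random.gauss(0, 0.2)) for n in range(N)]
spec = dft(x)                                   # x(s) = F{x(n)}
re, im = [c.real for c in spec], [c.imag for c in spec]
noise_region = range(32, 64)                    # bins assumed to be free of signal

variations = [(lv, th) for lv in (1, 2, 3) for th in (0.5, 1.0, 2.0)]  # toy stand-in for Psi
best_q, best = quality(re, noise_region), (re, im)     # Q_0 of the unprocessed spectrum
for lv, th in variations:
    cand = (denoise(re, lv, th), denoise(im, lv, th))  # x'_i(s)
    q = quality(cand[0], noise_region)                 # Q_i
    if q > best_q:
        best_q, best = q, cand
y = dft([complex(r, i) for r, i in zip(*best)], inverse=True)  # y(n)
```

Because the candidate set and the metric are swapped in for illustration, the sketch shows only the control flow of an adaptive selection, not the behaviour of AWNS itself.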

APPENDIX C: PROOF
Consider the noise ω(n) present in the free induction decay signal x(n), and assume that the original noise follows a Gaussian distribution with a mean of zero. When the noise is decomposed using the orthonormal basis B = {b_k}, 0 ≤ k < N, the decomposed noise ω_B[k] satisfies a corresponding condition on E{ω_B[k₁] ω_B[k₂]}, indicating that the suppressed noise in the processed 1H-MRS signals also follows a Gaussian distribution.
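The argument above follows a standard pattern, which can be written out as below. This is a sketch under added assumptions and should not be read as the original displayed equations: the noise is taken to be real-valued and white with variance σ², the basis vectors b_k are taken to be real, and s(n) is a hypothetical symbol for the noise-free signal.

```latex
\begin{aligned}
x(n) &= s(n) + \omega(n), \\
E\{\omega(n)\} &= 0, \qquad
E\{\omega(n_1)\,\omega(n_2)\} = \sigma^2\,\delta[n_1 - n_2], \\
\omega_B[k] &= \langle \omega, b_k \rangle = \sum_{n=0}^{N-1} \omega(n)\, b_k(n), \\
E\{\omega_B[k_1]\,\omega_B[k_2]\}
  &= \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1}
     E\{\omega(n_1)\,\omega(n_2)\}\, b_{k_1}(n_1)\, b_{k_2}(n_2) \\
  &= \sigma^2 \sum_{n=0}^{N-1} b_{k_1}(n)\, b_{k_2}(n)
   = \sigma^2\, \langle b_{k_1}, b_{k_2} \rangle
   = \sigma^2\, \delta[k_1 - k_2].
\end{aligned}
```

Each ω_B[k] is a linear combination of jointly Gaussian variables and is therefore Gaussian with zero mean; under these assumptions the coefficients are also uncorrelated with the same variance σ².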
TABLE D1 The selected clinical variables, scanning parameters, and quality-control parameters of the 1.5T cohort, Part I.

Table showing the statistical results of quality-control parameters (QCPs), namely fitting-based signal-to-noise ratio (fSNR), whole-spectrum signal-to-noise ratio (wSNR), and full width at half-maximum (FWHM), from pre-noise suppression (preNS) and post-noise suppression (postNS) clinical brain tumour proton magnetic resonance spectroscopy. The evaluation was presented for specific cases based on tumour type or spectral quality, namely ependymomas (EP), medulloblastomas (MB), and pilocytic astrocytomas (PA) for tumour type-based groups, and poor quality (PQ), medium quality (MQ), and good quality (GQ) for spectral quality-based groups. The difference between preNS and postNS QCPs was compared by performing Wilcoxon signed-rank tests, where the significance levels are determined by the conditions P < .05 (*), P < .01 (**), P < .001 (***), and P < .0001 (****).

TABLE 1 Demographic and clinical variables of patients.

FIGURE 3 Box and scatter plots describing the alteration in diagnostic ability of metabolites as (A) combinations or (B) individuals through multiclass (C) univariate or (D) multivariate receiver operating characteristics in the 1.5T cohort (N = 83), where the difference was compared between pre (left) and post (right) noise suppression.

This study aimed to address the issue of 1H-MRS spectral quality in PBT classification by performing NS on a multisite dataset. PostNS 1H-MRS showed significantly improved fSNR and insignificantly altered FWHM across all three tumour types and quality levels. Machine-learning-based classification performance was significantly improved through postNS 1H-MRS. Although some metabolites showed decreased mAUC, most metabolites showed increased mAUC in postNS 1H-MRS. This finding corresponds to the final improved classification accuracy across all the classifiers and indicates the potential utility of postNS 1H-MRS for PBT diagnosis. The improved classification performance of postNS 1H-MRS might indicate better metabolite concentrations for tumour diagnosis. Wavelets are a useful tool and are expected to enhance the clinical value of 1H-MRS. Clinical 1H-MRS often has a limited fSNR (< 10) due to an inherent lack of sensitivity and limitations on acquisition time.

The metabolite scatter plots showing the classification performance of common paediatric brain tumours with discriminant functions (DF) through resubstitution and eligible metabolites. Classification was comparatively evaluated between (A, C, E, G) pre- and (B, D, F, H) post-noise suppression (NS) and (A-B, E-F) pre- and (C-D, G-H) post-oversampling (OS) 1.5T (A-D, preOS N = 83, postOS N = 96) and 3T (E-H, preOS N = 42, postOS N = 46) proton magnetic resonance spectroscopy. Investigated tumour types include ependymomas as square markers, medulloblastomas as circular markers, and pilocytic astrocytomas as triangle markers, among which ependymomas were oversampled, shown by white square markers, to 200% as comparison (C-D, G-H). Uncertainly classified cases are shown as transparent, with the contrast indicating the probability of classification. Misclassified cases are marked with a cross mark, and misclassified ependymomas are additionally marked with a plus mark.

FIGURE 6 Box plots showing the significantly improved overall and balanced classification accuracy (AccOvra and AccBlcd) of the three brain tumour types, ependymomas, medulloblastomas, and pilocytic astrocytomas, determined through (A-B, E-F) leave-one-out (LOO) and (C-D, G-H) k-fold cross-validation (CV) and Naïve Bayes for (A, C, E, G) pre- and (B, D, F, H) post-oversampling (OS) 1.5T (A-D, preOS N = 83, postOS N = 96) and 3T (E-H, preOS N = 42, postOS N = 46) pre- (grey) and post- (black) noise suppression (NS) proton magnetic resonance spectroscopy, where oversampling was performed for ependymomas with an oversampling rate of 100%. Level of significance: ****, P < .0001.

It is noteworthy that resubstitution indicates ependymomas were classified well, but classification accuracy estimated through cross-validation indicates ependymomas were poorly classified. This suggests that ependymomas are diverse and often different from other groups, and that such ependymomas are hard to classify because prior knowledge of them does not exist in the training set when performing cross-validation, even when oversampling has been performed. At the same time, these results also indicate that postNS 1H-MRS may have limited ability for classifying these diverse ependymomas but can improve the classification of the remaining tumours.
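The oversampling referred to above can be illustrated with a minimal SMOTE-style sketch in Python. At a 100% rate each minority case contributes one synthetic case, which matches the cohort sizes quoted in the caption (83 to 96 for 1.5T, 42 to 46 for 3T). The function name, the neighbour count k, and the two-feature toy data are assumptions for illustration, and this simplified version is not the implementation used in the study.

```python
import random

def smote(minority, k=2, per_sample=1, seed=0):
    """Return synthetic points interpolated between each minority sample
    and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for base in minority:
        # Other minority samples, nearest first (squared Euclidean distance)
        others = sorted((s for s in minority if s is not base),
                        key=lambda s: sum((a - b) ** 2 for a, b in zip(s, base)))
        for _ in range(per_sample):
            neighbour = rng.choice(others[:k])
            u = rng.random()  # position along the segment base -> neighbour
            synthetic.append([a + u * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical two-metabolite feature vectors for four minority-class cases
ependymomas = [[1.0, 4.0], [1.5, 3.5], [2.0, 5.0], [3.0, 4.5]]
new_cases = smote(ependymomas)  # per_sample=1 corresponds to a 100% rate
```

Because every synthetic case lies on a segment between two existing cases, oversampling cannot supply ependymoma spectra unlike any in the training set, which is consistent with the limited cross-validated benefit noted above.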

Conceptualisation: Teddy Zhao and Andrew C. Peet. Methodology, software, validation, formal analysis, investigation, data curation, writing-original draft, visualisation: Teddy Zhao. Resources: all authors. Writing-review and editing: Teddy Zhao, Andrew C. Peet, Barry Pizer, Dorothee P. Auer, Heather E. L. Rose, Lesley MacPherson, James T. Grist, Martin Wilson, Nigel P. Davies, and Theodoros N. Arvanitis. Supervision: James T. Grist and Andrew C. Peet. Project administration, funding acquisition: Andrew C. Peet.

FINANCIAL DISCLOSURE
None reported.

APPENDIX A: DEFINITIONS
A.1 | Quality-control parameters
A.1.1 | fSNR

APPENDIX D: TABLES

APPENDIX E: FIGURES

FIGURE E1 The flowchart of adaptive wavelet noise suppression (AWNS).

FIGURE E2 Diagrams showing the quality-control parameters of the 1.5T proton magnetic resonance spectroscopy data, including (A-D) fitting-based signal-to-noise ratio (fSNR), (E-H) whole-spectrum signal-to-noise ratio (wSNR), and (I-L) full width at half-maximum (FWHM), for each brain tumour type, including (A, E, I) ependymoma (EP), (B, F, J) medulloblastoma (MB), and (C, G, K) pilocytic astrocytoma (PA), with (D, H, L) all the cases as the reference.

FIGURE E3 Diagrams showing the quality-control parameters of the 3T proton magnetic resonance spectroscopy data, including (A-D) fitting-based signal-to-noise ratio (fSNR), (E-H) whole-spectrum signal-to-noise ratio (wSNR), and (I-L) full width at half-maximum (FWHM), for each brain tumour type, including (A, E, I) ependymoma (EP), (B, F, J) medulloblastoma (MB), and (C, G, K) pilocytic astrocytoma (PA), with (D, H, L) all the cases as the reference.

FIGURE E4 Diagrams showing the quality-control parameters of the 1.5T proton magnetic resonance spectroscopy data, including (A-D) fitting-based signal-to-noise ratio (fSNR), (E-H) whole-spectrum signal-to-noise ratio (wSNR), and (I-L) full width at half-maximum (FWHM), for each level of spectral quality, including (A, E, I) poor, (B, F, J) medium, and (C, G, K) good quality, with (D, H, L) all the cases as the reference.

FIGURE E5 Diagrams showing the quality-control parameters of the 3T proton magnetic resonance spectroscopy data, including (A-D) fitting-based signal-to-noise ratio (fSNR), (E-H) whole-spectrum signal-to-noise ratio (wSNR), and (I-L) full width at half-maximum (FWHM), for each level of spectral quality, including (A, E, I) poor, (B, F, J) medium, and (C, G, K) good quality, with (D, H, L) all the cases as the reference.

FIGURE E6 Box plots showing the significantly improved overall and balanced classification accuracy (AccOvra and AccBlcd) of the three brain tumour types, ependymomas, medulloblastomas, and pilocytic astrocytomas, determined through (A-B, E-F) leave-one-out (LOO) and (C-D, G-H) k-fold (k = 10 for the 1.5T cohort and k = 4 for the 3T cohort) cross-validation (CV) and k-nearest neighbours for (A, C, E, G) pre- and (B, D, F, H) post-oversampling (OS) (A-D) 1.5T and (E-H) 3T pre- (grey) and post- (black) noise suppression proton magnetic resonance spectroscopy data, where oversampling was performed for ependymomas with an oversampling rate of 100%. Level of significance: ns, P > .05; **, P < .01; ****, P < .0001.

FIGURE E7 Box plots showing the significantly improved overall and balanced classification accuracy (AccOvra and AccBlcd) of the three brain tumour types, ependymomas, medulloblastomas, and pilocytic astrocytomas, determined through (A-B, E-F) leave-one-out (LOO) and (C-D, G-H) k-fold (k = 10 for the 1.5T cohort and k = 4 for the 3T cohort) cross-validation (CV) and linear discriminant analysis for (A, C, E, G) pre- and (B, D, F, H) post-oversampling (A-D) 1.5T and (E-H) 3T pre- (grey) and post- (black) noise suppression proton magnetic resonance spectroscopy data, where oversampling was performed for ependymomas with an oversampling rate of 100%. Level of significance: ns, P > .05; **, P < .01; ***, P < .001; ****, P < .0001.

FIGURE E8 Box plots showing the significantly improved overall and balanced classification accuracy (AccOvra and AccBlcd) of the three brain tumour types, ependymomas, medulloblastomas, and pilocytic astrocytomas, determined through (A-B, E-F) leave-one-out (LOO) and (C-D, G-H) k-fold (k = 10 for the 1.5T cohort and k = 4 for the 3T cohort) cross-validation (CV) and single-layer neural network for (A, C, E, G) pre- and (B, D, F, H) post-oversampling (A-D) 1.5T and (E-H) 3T pre- (grey) and post- (black) noise suppression proton magnetic resonance spectroscopy data, where oversampling was performed for ependymomas with an oversampling rate of 100%. Level of significance: ns, P > .05; **, P < .01; ***, P < .001; ****, P < .0001.

FIGURE E9 Box plots showing the significantly improved overall and balanced classification accuracy (AccOvra and AccBlcd) of the three brain tumour types, ependymomas, medulloblastomas, and pilocytic astrocytomas, determined through (A-B, E-F) leave-one-out (LOO) and (C-D, G-H) k-fold (k = 10 for the 1.5T cohort and k = 4 for the 3T cohort) cross-validation (CV) and support vector machine with a linear kernel for (A, C, E, G) pre- and (B, D, F, H) post-oversampling (A-D) 1.5T and (E-H) 3T pre- (grey) and post- (black) noise suppression proton magnetic resonance spectroscopy data, where oversampling was performed for ependymomas with an oversampling rate of 100%. Level of significance: *, P < .05; ****, P < .0001.
TABLE D2 The selected clinical variables, scanning parameters, and quality-control parameters of the 1.5T cohort, Part II.
Three of the 13 ependymomas were located supratentorially.
‡ One of the four ependymomas was located supratentorially.
Abbreviations: AHCH, Alder Hey Children's NHS Foundation Trust, Liverpool; BCH, Birmingham Children's Hospital, Birmingham Women's and Children's NHS Foundation Trust; QMC, Queen's Medical Centre, Nottingham University Hospitals NHS Trust; RVI, Royal Victoria Infirmary, The Newcastle upon Tyne Hospitals NHS Foundation Trust. NCC, number of coil channels; FS, sampling frequency; TE, echo time; TR, repetition time; NCP, number of complex points; fSNR, fitting-based signal-to-noise ratio; preNS, pre-noise suppression; postNS, post-noise suppression.

TABLE D3 The selected clinical variables, scanning parameters, and quality-control parameters of the 3T cohort.

TABLE D4 Details of quality filtering for the acquired brain tumour proton magnetic resonance spectroscopy.

List of wavelet variations used.

TABLE D6 Excluding metabolites according to duplication and Cramér-Rao lower bound percentage values.

Statistics of patient age. Table showing the statistics for patient age between two tumour types. Significance levels are determined by the conditions P < .05 (*), P < .01 (**), P < .001 (***), and P < .0001 (****).

TABLE D7 Statistics of patient sex. Table showing the statistics for patient sex between two tumour types. Significance levels are determined by the conditions P < .05 (*), P < .01 (**), P < .001 (***), and P < .0001 (****).

TABLE D9 Statistics of quality-control parameters.

Statistics of Cramér-Rao lower bound percentage values. Table comparing the uncertainty of metabolites as well as lipids and macromolecules (LM) measured from 1.5T and 3T pre- and post-noise suppression proton magnetic resonance spectroscopy, as given by Cramér-Rao lower bound (CRLB) values. Significance levels are determined by the conditions P < .05 (*), P < .01 (**), P < .001 (***), and P < .0001 (****).

Estimated metabolite concentrations in mmol from 1.5T and 3T pre-noise suppression proton magnetic resonance spectroscopy. The concentrations of metabolites as well as lipids and macromolecules (LM) prior to normalisation. P values, which evaluated the difference between the three tumour types, were calculated through Kruskal-Wallis H tests. Significance levels are determined by the conditions P < .05 (*), P < .01 (**), P < .001 (***), and P < .0001 (****).

Statistics of multiclass areas under the curve for metabolites as well as lipids and macromolecules. Table showing the multiclass area under the curve of metabolites as well as lipids and macromolecules (LM), compared between pre- and post-noise suppression proton magnetic resonance spectroscopy. Significance levels are determined by the conditions P < .05 (*), P < .01 (**), P < .001 (***), and P < .0001 (****).

Statistics of multiclass areas under the curve for the combinations of metabolites as well as lipids and macromolecules.

Statistics of the overall classification accuracy for the 1.5T cohort. Table comparatively showing the overall classification accuracy (ACC) for the 1.5T cohort between pre- and post-noise suppression (NS) in either pre- or post-oversampling (OS) conditions. The ACC was generated by using either leave-one-out cross-validation (LOOCV) or k-fold cross-validation (k = 4), and the machine-learning classifiers considered include k-nearest neighbours (kNN), linear discriminant analysis (LDA), naïve Bayes (NB), single-layer neural network (NN), and support vector machine (SVM). Significance levels are determined by the conditions P < .05 (*), P < .01 (**), P < .001 (***), and P < .0001 (****).

Statistics of the balanced classification accuracy for the 1.5T cohort.

Statistics of the overall classification accuracy for the 3T cohort.