Integration of Raman spectra with transcriptome data in glioblastoma multiforme defines tumour subtypes and predicts patient outcome

Abstract Raman spectroscopy is an imaging technique that has been applied to assess molecular compositions of living cells to characterize cell types and states. However, owing to the diverse molecular species in cells and challenges of assigning peaks to specific molecules, it has not been clear how to interpret cellular Raman spectra. Here, we provide firm evidence that cellular Raman spectra (RS) and transcriptomic profiles of glioblastoma can be computationally connected and thus interpreted. We find that the dimensions of high‐dimensional RS and transcriptomes can be reduced and connected linearly through a shared low‐dimensional subspace. Accordingly, we were able to predict global gene expression profiles by applying the calculated transformation matrix to Raman spectra and vice versa. From these analyses, we extract a minimal gene expression signature associated with specific RS profiles and predictive of disease outcome.


| INTRODUC TI ON
Glioblastoma multiforme (GBM; WHO Grade IV), the most aggressive form of primary brain tumours, harbour a dreadful prognosis despite invasive treatments including maximal safe resection, radiotherapy and chemotherapy. 1 Although all glioblastomas share common histopathological characteristics, they form a highly heterogeneous group of tumours in terms of underlying molecular and genetic alterations. Some alterations already were linked to strong therapeutic and survival outcomes, such as the methylation of the MGMT promoter and the presence of mutations in the isocitrate deshydrogenase (IDH) gene. 2,3 As such, one of the keys of future targeted therapies is the optimization of the stratification of patients and tumours. Currently, two main molecular sub-classifications of glioblastomas are widely used. The first is based on the presence of a pre-existing lowgrade tumour (primary vs. secondary GBM, correlated with IDH mutations) whereas the second derives from a multiparametric clustering, mostly based on genomic data, which differentiated four subtypes of GBM (mesenchymal, pro-neural, neural and classical). 3,4 This classification is strongly correlated with clinical features, but the limited therapeutic arsenal against GBM, as well as the genetic information, it requires preclude it from being used in clinical settings. Furthermore, no metabolic data have been so far used in either classification, thus, limiting a complete exploration of tumour physiology.
Raman or vibrational spectroscopy (RS) is a non-invasive, labelfree technique allowing chemical analysis based on the analysis of the reflection of a monochromatic light on a sample. As the wavelength shift of the scattered light is correlated with some molecular structural properties, RS is able to provide an immediate chemical fingerprint of a sample. RS is widely used in chemistry, but its application in biology has long been restricted due to technical limitations and issues in processing complex spectra related to the multiplicity of different chemical compounds found in biological samples such as tumours. 5,6 Two main spectroscopy techniques were described in the literature (Raman and infrared spectroscopy), with highly variable conditions across studies, and numerous subtypes of technologies. 7 Raman and IR spectroscopy might provide complementary information, considering that they do not assess the same physical phenomena. However, one of the main advantages of RS is the absence of need for sample preparation, thus, allowing direct analysis of tissues.
Several studies showed that RS is able to discriminate normal brain from tumour tissue with high specificity, and even differentiate different types of primary brain tumours. [8][9][10] Recent developments of miniaturization now enable RS to be performed during surgical removal, thus, providing continuous information about the nature of the tumour and resection margins. 8,9,11 However, if the diagnostic power of RS has been studied considering the whole spectra, no detailed analysis of the data provided by individual peaks has been performed to our knowledge.
Herein, we hypothesized that tumour RS might correlate with genomic data and, therefore, allow the rapid and accurate determination of tumour features and prognosis. To test this hypothesis, we utilized transcriptome data on RS-based GBM clustering. We identified a RS-based signature that correlates with the expression of specific genes and predicts specific tumour features associated with aggressiveness. We discuss how this tool could be used in a clinical context.

| Raman spectroscopy
Raman spectra were acquired with a ThermoFisher Raman Microscope DRX2. Frozen samples were cut in slices of 30 µm thickness. Three fields of acquisition were randomly selected in non-necrotic areas (100 µm × 100 µm, 121 spectra by field). The acquisition conditions were as follows: 532 nm laser, laser power 3.5 mW, exposure time 0.2 s, 45 iterations, pinhole diameter 25 µm.
Each sample's final spectrum corresponded to a mean of all acquired spectra.

| Clustering of Raman spectra
Model-based clustering of spectra was performed, using the R package mclust in order to find the optimal number of clusters. The selected optimal model comprised 2 clusters maximizing the distance amongst distinct groups. Each spectrum was then labelled according to its belonging to either group (Raman Group 1/Group 2). Clustering of Raman spectra: Model-based clustering of spectra was performed, using the R package mclust to find the optimal number of clusters. The selected optimal model comprised 2 clusters maximizing the distance amongst distinct groups. Each spectrum was then labelled according to its belonging to either group (Raman Group 1/Group 2).  15 The analysis considered only peaks from 600 to 1800 cm −1 and 2800 to 3100, which correspond to biologically meaningful signal ranges. MFA produced composite Eigen vectors (linear combinations of variables) consisting of Raman peak frequencies and gene symbols. Table 2 shows the prioritized genes using as ranking measure their contribution to the first two Eigen vectors (principal components). The top 36 genes were selected as the minimum subset that maximizes the distance between the two clusters.

| Survival analyses
We performed a cross-study analysis to assess the prognostic value of the RS-based signature. Gene expression was used as a predictor and survival time (in months) as the response. We used the TCGA cohort (Affymetrix microarray data) to train the prognostic model.

| Histochemistry and immunohistochemistry
Paraffin blocks corresponding to the tumour samples analysed using Raman were processed for immunohistochemistry (IHC). IHC were carried out on the H2P2 core facility. The sections were incubated 1 h at room temperature with anti-IBA1 (1:10,000 dilution; proteintech). Immunostaining was carried out using the discovery ultra (Ventana Medical Systems) with the Rhodamine kit (a "biotin-free" system using multimer technology, Roche) and a Tris borate EDTA pH8 buffer for antigen retrieval. Sections were converted on to digital slides with the scanner Nanozoomer 2.0-RS and immunostaining were quantified with the NIS software (Nikon).

| Histological classification of primary brain tumours using Raman spectroscopy
Our initial objective was to evaluate whether Raman spectroscopy (RS) was a relevant tool to analyse primary brain tumour in order to classify them and to potentially predict patient outcome. To this end, we designed an experimental approach relying on a collection of frozen tumour samples conserved in our institutional biobank, the "Centre de Resources Biologiques (CRB) de Rennes" and analysed using RS and histochemistry ( Figure 1A). Sixteen primary brain tu- with Oligo II and III as part of the same group. However, GBM shared some characteristics with oligo II suggesting similar metabolic pathways involvement in pathogenesis ( Figure 1D). As such, one might conclude from these analyses that RS spectra could sort the tumours based on their nature as determined by histopathology.

| GBM stratification using Raman spectroscopy
To further push this idea, we next sought to test whether the power of RS would be sufficient to identify different groups in GBM, a tumour type known for its high heterogeneity. 16  methylene (-CH2-) groups. When these spectra were used to hierarchically cluster the tumours, two highly distant groups were obtained ( Figure 2B). Next, in an attempt to functionally annotate the two groups of tumours, we performed MFA in order to identify the genes whose expression correlated with Raman group 1 and those whose expression correlated with Raman group 2. A total of 36 genes were selected, the expressions of which maximized the distance between the two groups in the clustering analysis ( Figure 2C). Of those genes, 24 correlated with group 1 and 12 with group 2 ( Figure 2C, Table 2 results in more intense peaks around ~3000 cm −1, and less intense peaks around 1000-1600 cm −1 , whilst a negative correlation value signifies the exact opposite. Interestingly, tumour clustering based on the expression of these 30 genes almost perfectly recapitulated the segregation obtained using RS data. At last, we functionally analysed the genes identified through this approach and found that genes whose expression correlated with Raman profiles might relate to an altered immune cell infiltration ( Figure 2D). This prediction was evaluated using immunohistochemistry with anti-IBA1 antibodies (for the staining of macrophages and microglial cells) in matched tumour sections and revealed that RS-based Group2 tumours were more infiltrated by myeloid cells than RS-based Group1 tumours ( Figure 2E), which could be indicative of a differential tumour aggressiveness provided that the infiltrate could be immune promoting or suppressive.

| Establishment of GBM RS-based molecular signatures
Since we established that RS-based classification of GBM provided a relevant tool to characterize the tumour features, we next sought to further document the links between RS analysis and tumour characteristics. The first step in this initiative was to identify the nature of the peaks marking Group1 and Group2.
The most remarkable difference between the spectra identified two groups corresponding to the intensity of the two peaks 1156 and 1518, four times more important in group 1. These peaks are part of the peaks representing the carotenoids. 5 We also noticed a smaller increase of peak 957 and peak 1004, the latter being a mixed Raman peak, with contribution of carotenoids and phenylalanine ( Figure 3A). As the main peaks identified in Group1 corresponded to retinoids, we next evaluated how the expression of genes involved in the biosynthesis of vitamin A was altered in both groups. Interestingly, the expression of seven selected genes involved in this pathway did not yield to a clear segregation of the two groups as performed following RS-based analysis ( Figure 3B). This observation might indicate a non-transcriptionally dependent regulation of retinoid production in GBM and led us to further inquire the existence of a minimal gene signature correlating with the RS-based groups.
An in-depth analysis of the data presented in Figure 2 led us to identify six genes (WSDC, RIMS4, PAQR9, F13A1, CBF and RASSF9) whose expression variation robustly reflected the classification obtained using RS. The expression of these genes was then evaluated in a validation cohort of 10 tumours classified using RS ( Figure 3C) and confirmed the results obtained on the test cohort ( Figure 3D). The approach described above allowed us to demonstrate that RS-based classification of GBM correlates with specific gene expression signatures and, thus, might have a predictive value regarding tumour outcome.

| RS-based molecular signature and prediction of tumour outcome
To follow up on this analysis, we used the TCGA cohort, the genes included in the prognostic model based on the genes found to correlate with RS showed differential expression between high-risk and low-risk patients ( Figure 4A). This was also observed in the validation GBM-MARK cohort ( Figure 4B). In the latter cohort, the expression of these signature genes was also significantly lower for patients from the Raman Group 1 compared to those in the Raman Group 2 ( Table 3). The multi-gene survival model based on this molecular signature was used to stratify each cohort into high-risk and low-risk patients. In the TCGA cohort, the overall survival (OS) was significantly higher in low-risk patients (n = 56) than in high-risk patients

| DISCUSS ION
In a recent study, Riva and colleagues 10 demonstrated that Raman spectra effectively and accurately discriminated glioma tissue from healthy brain ex-vivo in fresh samples. This observation, together with the use of intraoperative Raman spectroscopy 9 to delineate the tumour boundaries, point towards RS as a relevant and versatile tool to classify and characterize brain tumours. In the present study, we have pushed forward this concept by testing the depth of RS-based brain tumour classification using transcriptome-related information as our reference. We first showed that RS effectively discriminated different types of primary brain tumours and also at least two major types of the very heterogeneous GBM. As such, we used a well-defined GBM cohort for which transcriptome data and F I G U R E 2 Correlation between GBM Raman profiles and gene expression data. (A) Raman spectra (after baseline correction, smoothing and normalization) corresponding to 28 GBM that segregate in two groups based on their respective profiles (blue and red). (B) Hierarchical clustering of patient tumours based on the corresponding Raman spectra (blue and red). (C) Heat map representation of gene expression profiles matching the groups formed based on Raman spectra. Gene profiles corresponding to the blue and red groups are indicated. (D) String-derived network comprising genes correlating with Raman Group 1 (red) and 2 (blue). Functional enrichment might be indicative of an immune infiltration in tumours from group 1. (E) Immunofluorescence analysis of tumour sections from group 1 and 2 using anti-IBA1 antibodies (staining macrophages and microglial cells; left panels, scale bar: 1 mm) and quantitation of the staining (right panel) the corresponding tumour tissues were available to test whether RS profiles corresponded to specific gene expression patterns.
As previously illustrated in many tumour types 5 and in brain tumours, 6,9,10 we first evaluated whether RS could discriminate between different types of primary brain tumours and normal brain using frozen sections. Our results indicated that not only RS allowed to discriminate brain tumours from normal tissue but also classify tumours based on their lineage (astrocytoma vs. oligodendrioglioma vs. glioblastoma [GBM]; Figure 1). In our principal components analysis, glioblastoma was classified closer to oligodendroglioma than astrocytomas. Considering the very different oncogenesis mechanisms and prognosis of these two tumoral entities, it is somehow surprising that the Raman spectra showed similar patterns. Two main hypotheses could be considered to explain this phenomenon, first, the existence of a strong oligodendroglial pattern in the GBM sam-  Some of the genes related to survival have showed to play a role in the regulation of stem cells and resistance (IL6, CFB), as well as migration and invasion (TFPI2). [20][21][22] Interestingly, F13A1 was found in both minimal genes set. Factor XIII is an enzyme, which plays a role in the stabilization of fibrin and has been showed to be associated with survival. 23,24 However, its role in glioblastoma aggressiveness has been scarcely studied. These six genes were expressed at higher level in group 2 than in group 1 and were associated a better survival with group 2 than in group 1 in two independent cohorts of patients (GBM-MARK and TCGA; Figure 4). As such, one might conclude that RS-based classification of GBM can document almost in real-time during surgery not only some characteristics of the tumour (e.g., immune infiltration) but also of its aggressiveness (as indicated by patient survival), both potentially of interest in the decision-making regarding follow-up patient handling.
In summary, this study provides the first association between gene expression and Raman profiles associated with tumour phenotypes (e.g., immune infiltrate). The approach presented in this study could, therefore, pave the way for near real-time intraoperative tumour characterization and could represent a relevant tool for helping in patient management at a very early stage.

CO N FLI C T O F I NTE R E S T
EC is a founder of Cell Stress Discoveries Ltd (https://cells tress disco veries.com/) and of Thabor Therapeutics. AC is the founder of e-NIOS Applications PC (https://e-nios.com/).

CO N S E NT FO R PU B LI C ATI O N
All the authors read and approved the current version of the manuscript.

DATA AVA I L A B I L I T Y S TAT E M E N T
All transcriptomic data are available on open web portals. Raman spectra will be provided upon request.