The synergism of spatial metabolomics and morphometry improves machine learning‐based renal tumour subtype classification

Dear Editor, Although mass spectrometry imaging generates both morphometric and metabolomics data, they have never been combined to improve machine learning-based tumour typing. We demonstrate that the synergy of spatial metabolomics and morphometric data improves the classification of tumours of the kidney and thus has the potential to improve artificial intelligence (AI)-based diagnostics.
Tumours of the kidney are a heterogeneous group of cancers with characteristic histologic or genetic features that require tumour type-specific therapies.[1] Chromophobe renal cell carcinomas (chRCC) and renal oncocytomas, two tumour types that can sometimes be difficult to distinguish based on morphology alone, are associated with different prognoses, and the former has the potential to progress and metastasize.[2,3] Both immuno-oncological and targeted therapies are under investigation; however, immunotyping and genotyping are laborious and fall short of standardization, and immunohistochemical markers have been shown to be unreliable.[4,5] We used matrix-assisted laser desorption/ionization (MALDI) mass spectrometry imaging (MSI) because one of its greatest strengths is the ability to combine in situ mass spectrometric data with conventional histology or immunohistochemistry, making it a powerful tool for multiparametric, high-dimensional multiomics analyses.[6–10] This potential has also been successfully applied to biomarker discovery and machine learning-based renal tumour subtyping using unique molecular data.[11–15] Our study was performed on a large patient cohort (n = 853, Table 1) and on clinically relevant formalin-fixed paraffin-embedded (FFPE) tissue samples to distinguish clear cell renal cell carcinomas (ccRCC, n = 552), papillary renal cell carcinomas (pRCC, n = 122), chRCC (n = 108) and renal oncocytomas (RO, n = 71). For details about the MALDI imaging and the morphometric analysis of H&E-stained tissue sections, see the supporting information.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Morphometric features (n = 110) describing the colour, shape or size of tissue or cell compartments (Table S1) and untargeted metabolomic features (n = 2,111) were used for classifier training (Figure 1). Patients were randomly split into a training set (2/3) and an independent validation set (1/3). The data were normalized separately for the training and validation sets. Features were selected with a Kruskal-Wallis test (p < 0.01) on the training sets. Details about the classifier training are in the supporting information.
FIGURE 1 Workflow to analyze the synergistic effect of morphometric and molecular data on classifier performance. Eight hundred fifty-three patient tissues were analyzed with MALDI mass spectrometry imaging (MSI), followed by hematoxylin & eosin staining and morphometric image analysis. Morphometric and molecular data of tumour regions were used both separately and in synergy to train three random forest classifiers. Patients were split into training and independent validation sets, followed by data normalization, feature selection, and classifier training and validation.
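The preprocessing steps named above (random 2/3 to 1/3 patient split, separate normalization, Kruskal-Wallis feature filter) can be sketched as follows. This is a minimal, standard-library illustration and not the authors' pipeline: the letter does not state the normalization method, so z-scoring is an assumption, and all function names are hypothetical. Instead of computing a p-value, the sketch compares the tie-corrected H statistic with the chi-square critical value for three degrees of freedom (four tumour classes) at α = 0.01; a library routine such as `scipy.stats.kruskal` would return the p-value directly.

```python
import random
from statistics import mean, pstdev

# Chi-square critical value for df = 3 (four classes) at alpha = 0.01.
# Assumption: H is compared to this value instead of computing a p-value.
CHI2_CRIT_DF3_P01 = 11.345


def split_patients(patient_ids, train_fraction=2 / 3, seed=0):
    """Randomly split patient IDs into a training and a validation list."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def zscore_columns(rows):
    """Z-score every feature column using statistics of this set only."""
    stats = [(mean(c), pstdev(c) or 1.0) for c in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def kruskal_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def select_features(rows, labels, critical=CHI2_CRIT_DF3_P01):
    """Return indices of feature columns whose H exceeds the critical value."""
    classes = sorted(set(labels))
    keep = []
    for j, col in enumerate(zip(*rows)):
        groups = [[v for v, y in zip(col, labels) if y == c] for c in classes]
        if kruskal_h(groups) > critical:
            keep.append(j)
    return keep
```

In this sketch, `select_features` would be called on the training rows only, and the returned column indices would then be applied unchanged to the validation set, so that the validation data never influence which features are kept.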
The third classifier was trained on the combination of both data sets, morphometric and molecular, and outperformed the two previous classifiers for each tumour subtype (Figure 2A). It reached a mean accuracy of 88.04% and F1-scores of 92.54% (ccRCC), 76.73% (pRCC), 77.15% (chRCC) and 84.65% (RO). Across the individual statistical measures, the synergistic classifier trained on both data sets almost consistently outperforms the other two (Figure 2B).
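For readers less familiar with the reported measures, the sketch below shows how per-class precision, recall and F1-score follow from a confusion matrix. It is a generic illustration with hypothetical function names. The letter does not spell out how the mean accuracy was computed, so the overall accuracy shown here is only one plausible reading.

```python
def per_class_scores(confusion, labels):
    """Precision, recall and F1 per class from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i
    that were predicted as class j.
    """
    scores = {}
    for k, name in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                   # missed members of class k
        fp = sum(row[k] for row in confusion) - tp    # others labelled as class k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[name] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


# Hypothetical two-class counts, for illustration only (not data from the study).
example = [[8, 2],
           [1, 9]]
example_scores = per_class_scores(example, ["chRCC", "RO"])
```

Because the F1-score is the harmonic mean of precision and recall, it penalizes a classifier that achieves one at the expense of the other, which is relevant for the smaller classes in an imbalanced cohort such as this one.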
The synergy of morphometric and metabolite data not only improves general performance, but also seems to compensate for weaknesses of the two individual classifiers trained on either morphometric or metabolite data. For instance, the metabolomic classifier performed better than the morphometric classifier for ccRCC (F1-scores: 91.06% vs. 86.22%), chRCC (F1-scores: 76.46% vs. 52.17%) and RO (F1-scores: 83.04% vs. 64.75%), while the morphometric classifier performed better for pRCC (F1-scores: 63.66% vs. 67.56%). The classifier trained on [...]
FIGURE 2 Synergistic effect of morphometric and molecular data on classifier performance on renal cell carcinoma subtypes. (A and B) Classifier performance of the random forests revealing a synergistic effect of morphological and molecular data. In (B), the best performing classifier is visualized with a larger marker. The synergistic effect is best seen for papillary renal cell carcinoma (pRCC), where the performance is improved by up to 10 per cent. (C-E) Feature importance of the top 50 features for the random forest trained on morphometric data (C), metabolite data (D) and the synergy of both (E). In the latter, the top 50 comprise a mixture of both types of features.
The Gini importance of each feature was calculated for the three classifiers, and the top 50 features were compared (Figure 2C-E). The top 50 features from the classifier trained on both data sets represent an even mixture of morphometric and metabolomic features (Figure 2E). Example metabolite ion images of the top features in the classifier are illustrated in Figure S1. Interestingly, the four most important features are metabolites, and while half of the top features are morphometric, the importance of the best metabolite is twice as high as that of the top morphometric feature. This further illustrates the synergistic impact both data sets have on the classifier's performance.
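The comparison of top-ranked features can be sketched as a simple ranking step. The function below assumes that an importance value per feature is already available (for example, scikit-learn exposes Gini importances of a fitted random forest as `feature_importances_`) and reports which share of the top k features belongs to each data type. The function name, feature names and values are hypothetical and serve only as an illustration.

```python
def top_k_composition(importances, feature_type, k=50):
    """Rank features by importance and summarize the top k by data type.

    importances:  dict mapping feature name -> importance value
    feature_type: dict mapping feature name -> "morphometric" or "metabolite"
    Returns the ranked top-k names and the fraction contributed by each type.
    """
    ranked = sorted(importances, key=importances.get, reverse=True)[:k]
    counts = {}
    for name in ranked:
        kind = feature_type[name]
        counts[kind] = counts.get(kind, 0) + 1
    shares = {kind: c / len(ranked) for kind, c in counts.items()}
    return ranked, shares


# Hypothetical importances, for illustration only (not values from the study).
toy_importances = {"mz_a": 0.40, "mz_b": 0.30, "nucleus_area": 0.20, "hue": 0.10}
toy_types = {"mz_a": "metabolite", "mz_b": "metabolite",
             "nucleus_area": "morphometric", "hue": "morphometric"}
toy_ranked, toy_shares = top_k_composition(toy_importances, toy_types, k=4)
```

A summary of this kind only counts how many features of each type reach the top k. As the text notes, the magnitude of the importances matters as well, since a few metabolite features can carry more weight than an equal number of morphometric ones.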
Up to this point, the classifiers were trained to distinguish four different tumour subtypes, which is a much more difficult task than separating only two tumour subtypes. However, since ccRCC and pRCC are relatively easy to recognize by a pathologist using histology alone, we also tested our approach exclusively on the two remaining tumour subtypes, chRCC and RO (Figure 3A).
The same synergistic effect can be observed for the three classifiers (Figure 3B). On morphometric data, a mean accuracy of 81.27% is achieved, while on metabolomics data, the mean accuracy is higher at 89.49%. The synergy of the data further increases the accuracy to 91.0%, with an F1-score of 92.70% for chRCC and 88.07% for RO. As these two subtypes can be histologically similar, morphometry plays a minor role, while metabolite data are of higher importance for classification. Hence, fewer morphometric features are ranked among the top 50 (40%), but they are still beneficial for tumour subtype classification. The synergistic effect is reflected by the high ranking of morphometric features within the classifier (Figure 3F).
Even though morphometric data are readily available in any MSI experiment, they have not so far been exploited to improve the predictive quality of molecular classifiers. This study provides evidence that the synergy of morphometric and molecular data improves renal carcinoma subtyping. Our study was performed on a large patient cohort (n = 853) and on clinically relevant FFPE tissue samples using metabolite data. The classifier trained on the combined data set of morphometric and metabolite data outperformed the classifiers trained on the individual data sets for each tumour subtype and reached an accuracy of 88.04%. Finally, the classifier was trained on chRCC and RO, two tumour subtypes that are sometimes difficult to distinguish based on histology alone, and was able to distinguish the subtypes with high accuracy (91%). In conclusion, we propose to utilize the so far unrecognized potential and synergy of computer-aided image analysis and spatial metabolomics, both types of data available in all MSI experiments, to improve AI-based diagnostics and tumour subtyping in general.