Identification of a metabolic gene panel to predict the prognosis of myelodysplastic syndrome

Abstract Myelodysplastic syndrome (MDS) is clonal disease featured by ineffective haematopoiesis and potential progression into acute myeloid leukaemia (AML). At present, the risk stratification and prognosis of MDS need to be further optimized. A prognostic model was constructed by the least absolute shrinkage and selection operator (LASSO) regression analysis for MDS patients based on the identified metabolic gene panel in training cohort, followed by external validation in an independent cohort. The patients with lower risk had better prognosis than patients with higher risk. The constructed model was verified as an independent prognostic factor for MDS patients with hazard ratios of 3.721 (1.814‐7.630) and 2.047 (1.013‐4.138) in the training cohort and validation cohort, respectively. The AUC of 3‐year overall survival was 0.846 and 0.743 in the training cohort and validation cohort, respectively. The high‐risk score was significantly related to other clinical prognostic characteristics, including higher bone marrow blast cells and lower absolute neutrophil count. Moreover, gene set enrichment analyses (GSEA) showed several significantly enriched pathways, with potential indication of the pathogenesis. In this study, we identified a novel stable metabolic panel, which might not only reveal the dysregulated metabolic microenvironment, but can be used to predict the prognosis of MDS.

pro-inflammatory of bone marrow microenvironment and so on. 3,4 At present, the MDS Revised International Prognostic Scoring System (IPSS-R) is one of the gold standards for risk stratification and prognostic assessment in MDS patients, in which, patients are categorized into five well-defined risk groups according to platelet count, haemoglobin levels, absolute neutrophil count (ANC), marrow blast percentage and cytogenetics. 5,6 Although patients in intermediate-risk group are reported to have an intermediary survival, it is possible that the disease course might vary, with variable outcome actually. 7 In the meantime, MDS lacks a diversified prognostic classification system at present. Therefore, identification of more diversified prognostic models would better guide therapeutic decisions, further assisting to design more perfect clinical trials.
Furthermore, MDS is a stem cell-derived disorder affecting multiple lineages. 8 MDS stem cells with CD123 + have been reported to have higher levels of protein synthesis and change cellular energy metabolism, 9 which are similar with AML. 10,11 The anti-leukaemia mechanism of B cell lymphoma 2 (BCL-2) inhibitor (venetoclax) combined with demethylated drugs (azacytidine) is the eradication of LSCs by disrupting the tricarboxylic acid (TCA) cycle for further and durable remissions for older AML patients. 12 Moreover, isocitrate dehydrogenase 2 (IDH2) enzyme inhibitor has been approved by US Food and Drug Administration (FDA) in 2017 for refractory or relapsed AML patients by targeting tumour energy metabolism for.
BM microenvironment is vitally involved in the pathogenesis of MDS according to the 'seed soil' theory, which consists of cellular compo-

nents (haematopoietic cells and stromal cells at various stages) and
non-cellular components (metabolites, cytokines, hormones and angiogenic factors). 13 Leukaemia cells use oxidative phosphorylation for survival, while HSCs depend on glycolysis for energy production. 12 Leukaemia cells are likely to uptake mitochondria from stromal cells by endocytosis. 14 As a consequence, metabolism plays key roles for non-cellular components. Accumulative studies have revealed that the relationship between pathogenesis, treatment and metabolism of MDS recently. Therefore, we established a prognostic panel of metabolic gene by downloading data from Gene Expression Omnibus (GEO) datasets in the training cohort, which was further validated in an independent external cohort. In conclusion, we constructed a metabolic panel to predict the prognosis of MDS and revealed that metabolism played significant roles in the prognosis of MDS.

| Data collection
The mRNA expression profiles and relevant clinical information were downloaded from GSE58831 15 and GSE11 4922 16 datasets from the GEO database. The metabolic gene sets utilized as the candidate metabolic gene lists were retrieved from 'c2.cp.kegg. v7.0.symbols' in gene set enrichment analysis (GSEA). In addition, perl scripts were used to retrieve metabolic genes for further analysis.

| Identification of differentially expressed (DE) mRNA in MDS
Transcripts per million normalization and log2 transformation were performed on the expression profiles. DE analysis was conducted on 861 annotated metabolic-related genes with protein coding functions by the Limma. 17 The expression pattern of metabolic genes was examined in training cohort. Genes were subjected to prognostic analysis in the case of consistent expression pattern in training cohort and independent external cohort.

| Establishment of the prognostic metabolic gene panel
GSE53381 dataset was used as the training cohort to construct metabolic risk panel. The LASSO regression penalizes the data fitting criteria in a way that eliminates less informative predictor variables to yield simpler and more interpretable models. Therefore, the metabolic panel was constructed according to the penalized maximum likelihood estimator with 1000-fold cross-validation.
The least criteria of the penalized maximum likelihood estimator were employed to determine the optimal values of penalty parameter λ. In addition, GSE11 4922 dataset served as an independent external validation cohort. The unified formula determined in the training cohort was used to generate the metabolic risk score in every patient, who were further categorized into high-and lowrisk groups according to the median metabolic risk score.

| Independence of the prognostic panel
Univariate and multivariate forwarding stepwise Cox regression analyses were conducted in both training and validation cohorts. A P < .05 indicated statistical significance.

| Statistical analysis
Time-dependent receiver operating characteristic (ROC) curve was performed to assess the predictive performance of metabolic signature in the raining and validating cohorts, followed by calculation of area under the curve (AUC) using survival ROC package. Overall survival (OS) was defined as the primary outcome, which was calculated as the date of the study entry until death due to all causes. Kaplan-Meier curve was plotted by 'survival' package, followed by comparison by log-rank test. Univariable and multivariable Cox analyses were used to evaluate the prognostic performance of clinical and genetic features. Categorical variables were compared by chi-square test or Fisher's exact test. SPSS ® version 24.0 (IBM) and R software (version 3.6.0) were used for statistical analysis. A two sided P < .05 indicated statistical significance.

| Establishment and validation of the prognostic metabolic gene panel
Among the 861 metabolic genes subjected to DE analysis by the Limma, 140 genes were differently expressed between healthy sample and MDS sample ( Figure 1A,B). Further, the prognostic values of these 140 genes were analysed via Univariate Cox regression analysis. Ultimately, 22 genes that were differentially expressed as well as survival-related (P < .05) were identified.
( Figure 1C). Afterwards, the Lasso-penalized Cox analysis regression was used to select the most useful predictive genes from the 22 genes. A penalized maximum likelihood estimator was performed with 1000 bootstrap replicates. The regularization parameter lambda was used to identify the optimal weighting coefficients via the least criteria (Figure 2A,B). Afterwards, 15 metabolic genes were selected and the coefficient was estimated to construct the

| Independent prognostic role of the metabolic gene panel
The

| Association between the metabolic risk level and clinicopathological features
In total, 201 patients with complete clinical data including age, gender, WHO category, karyotype, IPSS, transfusion dependent, haemoglobin, bone marrow blasts cells, platelet count and absolute neutrophil count were included in the training and validation cohort. High-risk patients were associated with male, higher numbers of bone marrow blast cells, higher IPSS score and lower absolute neutrophil count (Table 1) Figure 6A,B.

| GSEA
GSEA identified 36 significantly enriched KEGG pathways in the training or validation cohort. The majority of the metabolism-associated pathways were enriched in the low-risk group, and the metabolic pathways ranked by NES were cysteine and methionine metabolism, glycine serine and threonine metabolism, fatty acid metabolism and pyrimidine metabolism. On the contrary, the majority of the non-metabolism-associated pathways were enriched in the high-risk group.
Additionally, most enriched pathways were correlated with cancer (such as the cell cycle and phosphatidylinositol signalling system) or metabolism (such as the glycine serine and threonine metabolism, cysteine and methionine metabolism) ( Figure 7A,B).

| External validation using online database
The mutation variants of the metabolic gene panel were explored in CCLE database by the cBioportal for Cancer Genomics. 18,19 As was expected, the gene amplification, which can change  Fifteen metabolic genes were identified to construct the metabolic model, most of which have been reported to be involved in malignancy. PFKFB2, a vital regulator of glucose metabolism, has been defined as a candidate gene for GC-triggered apoptosis according to comparative expression profiling in childhood acute lymphoblastic leukaemia (ALL). 23 Interestingly, PFKFB2 was suppressed by miR-613 in gastric cancer, which could further inhibit cell proliferation and invasion. 24 The expression pattern is reported for the first time as a potential marker in MDS PLCB2, involved in inositol phosphate metabolism, has been narrowly linked to the poor prognosis in patients with hepatocellular carcinoma, lung cancer and mammary carcinoma. 25 In our study, PLCB2 was negatively correlated with study should also be acknowledged. Firstly, we are unavailable to more clinical information due to the data driving from GEO database.

| D ISCUSS I ON
Secondly, the significance of the metabolic panel should be further confirmed in real-world research, and further basic experiments are simultaneously necessary to explore the underlying pathogenesis.
In summary, we constructed a novel prognostic prediction model based on metabolic genes from GEO database for MDS, and further validated in the validation cohort. The prognostic model was not only an independent prognostic predictor for MDS but also reflected the disordered metabolism of MDS. Moreover, this panel could be utilized as an effective approach for prognostic prediction in MDS patients in clinical practice.

ACK N OWLED G EM ENTS
The authors sincerely thank GEO for sharing the huge amount of data.

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The GSE58831 and GSE11 4922 datasets were collected via the Gene Expression Omnibus (GEO) database, which were utilized for retrieving clinicopathological data and RNA expression patterns. All data or code generated or used during this study are available from the corresponding author by reasonable request.