Genetic analysis of potential biomarkers and therapeutic targets in ferroptosis from coronary artery disease

Abstract Ferroptosis plays a key role in the death of cells including cardiomyocytes, and it is related to a variety of cardiac diseases. However, the role of ferroptosis‐related genes (FRGs) in coronary artery disease (CAD) is not well characterized. We downloaded CAD‐related information and FRGs from the gene expression omnibus (GEO) database and Ferroptosis Database (FerrDb) respectively. A total of 10 CAD‐related DE‐FRGs were obtained, which were closely linked to autophagy regulation and immune response. Subsequently, CA9, CBS, CEBPG, HSPB1, SLC1A4, STMN1 and TRIB3 among the 10 DE‐FRGs were identified as marker genes by LASSO and SVM‐RFE algorithms, which had tolerable diagnostic capabilities. Subsequent functional enrichment analysis showed that these marker genes may play a corresponding role in CAD by participating in the regulation of immune response, amino acid metabolism, cell cycle and multiple pathways related to the pathogenesis of CAD. Furthermore, a total of 58 drugs targeting 7 marker genes had been obtained. On the contrary, the ceRNA network revealed a complex regulatory relationship based on the marker genes. Also, CIBERSORT analysis showed that the changes in the immune microenvironment of CAD patients may be related to CBS, HSPB1 and CEBPG. We developed a diagnostic potency and provided an insight for exploring the mechanism for CAD. Before clinical application, further research is needed to test its diagnostic value for CAD.

symptoms in the early stage, only abnormal ST-T changes during exercise plate electrocardiogram examination. Therefore, it is an urgent need for biomarkers that can be detected in peripheral blood to facilitate the early detection of CAD.
Ferroptosis is an iron-dependent programmed cell death mode newly discovered in 2012, 2 with its mechanism different from apoptosis, necrosis, pyroptosis and autophagy. Ferroptosis is characterized by mitochondrial atrophy and increased mitochondrial membrane density, the accumulation of iron and lipid reactive oxygen species (L-ROS) and the involvement of a unique set of genes. 3,4 The biochemical mechanism of iron is catalysed by the formation of lipid radicals and the depletion of glutathione (GSH) or the inactivation of lipid peroxidase 4 (GPX4). 5 Circulating iron plays a key role in the development of iron death. The use of iron chelating agents can inhibit iron death induced by Erastin, and the expression of transferrin on the cell membrane also increases the sensitivity of cells to iron death. 2 Ferrostatin 1 (FER-1), a specific inhibitor of ferroptosis, significantly reduced the cardiotoxicity induced by doxorubicin (DOX) and effectively improved the survival rate in mice treated with cell death inhibition and cell death pathway-related gene knockout, revealing that ferroptosis is an important mechanism of myocardial injury. 6 In addition, high levels of L-ferritin were observed in the coronary arteries of patients with coronary heart disease, indicating iron accumulation in atherosclerotic plaques. 7 Non-transferrin-bound iron (NTBI) is thought to be the pathologic trigger of iron overload, and iron in NTBI is more likely to be utilized by various plaque cell types, including endothelial cells, macrophages and vascular smooth muscle cells. 8 The use of deferoxamine can inhibit the development of atherosclerotic lesions and reduce ferroptosis of cardiomyocytes following cardiac ischaemic-reperfusion. 9,10 Therefore, we analysed and verified the accuracy of iron death-related genes as biomarkers of CAD and their roles in the cardiac immune microenvironment via bioinformatics analysis.

| Data source
In this study, the gene expression data for CAD and normal samples were obtained from the GEO database. The GSE20680 dataset embodied a total of 139 samples, including 52 normal samples and 87 CAD samples. This dataset was considered as a training set for analysis by the main body of this research. The GSE20681 dataset containing 99 normal samples and 99 CAD samples was used to verify the expression of the marker genes. Additionally, the FRGs (n = 259) used in this study were obtained from FerrDb, and the detailed genes were shown in Table S1.
Drug Gene Interaction Database (DGIdb) was used to predict drugs targeting marker genes. Also, the structural information of the targeted drugs of the marker gene was retrieved from the DrugBank database.

| Differential expression analysis
We first extracted expression data of 237 FRGs (only 237 FRGs expressed in this dataset) in normal samples and CAD samples from the GSE20680 database. Subsequently, the student's t-test in R was used to detect the FRGs that were differentially expressed in two different samples (Table S2). Genes with p < 0.05 were considered significant.

| Functional enrichment performed in Metascape
Metascape (http://metas cape.org/) was used to analyse the potential functions associated with DE-FRGs. These analyses included Gene Ontology (GO), Reactome pathway enrichment, and Immunologic Signatures enrichment analysis (Table S3). Among them, Immunologic Signatures database used in the immunologic signatures enrichment analysis is based on the integration of the immune-related enrichment analysis of the target gene in the published literature. In addition, Reactome is a database of articles written by experts and peer-reviewed on various reactions and biological pathways in the human body.

| Identification of optimal diagnostic gene biomarkers for CAD
The least absolute shrinkage and selection operator (LASSO) algorithm was applied with the glmnet package to reduce the dimensions of the data. 11,12 The DE-FRGs between CAD patients and normal samples were retained for feature selection, and gene biomarkers for CAD were identified with the LASSO algorithms. Meanwhile, a support vector machine-recursive feature elimination (SVM-RFE) model was established with a SVM package, which was compared by the average misjudgement rates of their 10-fold cross-validations. 13 Furthermore, optimal gene biomarkers for CAD were identified by overlapping biomarkers derived from the two algorithms. The diagnostic ability of the optimal gene biomarkers was assessed by calculating the receiver operating characteristic (ROC) curve, and measuring the area under the curve (AUC), accuracy, sensitivity and specificity. Furthermore, a logistic regression model based on seven marker genes was constructed to predict the sample types in the GSE20680 dataset using the predict function through the R package glm. Similarly, the diagnostic power of the logistic regression model was evaluated using ROC curves.

| Single-gene Gene Set Enrichment Analysis (GSEA) enrichment analysis
This analysis is implemented in the GSEA (V.4.1.0) package in R. To further explore the related pathways of the seven marker genes, we calculated the correlation between the marker genes and all other genes in the GSE20680 dataset. Subsequently, all genes were sorted according to their correlations from high to bottom, and these sorted genes were considered to be the gene set to be tested. Meanwhile, the KEGG signalling pathway set was invoked as a predefined set to detect its enrichment in the gene set. Specific enrichment results of each marker gene were integrated into Table S4.

| Single-gene Gene Set Variation Analysis (GSVA) enrichment analysis
This analysis was implemented in the GSVA (V.1.38.0) package in R. GSVA is a gene set variation analysis. 14 In this study, we utilized the KEGG pathway set as the background gene set to perform GSVA analysis on each marker gene. Simultaneously, we applied the limma package to analyse the difference in GSVA score of the marker gene's high-and low-expression group samples. The difference screening condition was |t| >2, p < 0.05. If t > 0, we considered the pathway to be activated in the high-expression group, on the contrary, if t < 0, we considered the pathway to be activated in the low-expression group. Specific enrichment results of each marker gene were integrated into Table S5.

| Immune infiltration analysis
CIBERSORT, a method to characterize the cell composition of complex tissues from the gene expression profile. 15 In this study, we predicted the proportion of 22 types of infiltrating immune cell types in each tissue from the GSE20680 dataset by CIBESORT software (Table S6). For each sample, the sum of all evaluated immune cell type fractions equalled 1. 16

| Construction of ceRNA network
The starBase database was used to predict mRNA-miRNA interaction pairs based on the 7 marker genes. Meantime, RNA sequences of 7 marker genes were downloaded from National Center for Biotechnology Information (NCBI), and the human miRNA sequences were obtained from miRbase. The miranda software predicted the binding of mRNA-miRNA nucleic acid, and the binding score threshold was increased to 170 (the default was 140). Then, we searched the predicted miRNA in starBase and screened miRNA-lncRNA so that we obtained the ceRNA network of mRNA-miRNA-lncRNA.

| Statistical analysis
The comparison between the two groups used the Student's t-test.
Pearson correlation analysis was used to reveal the relationship between 10 DE-FRGs. The drawing of the Venn diagram was achieved through the Jvenn package. Cytoscape was used to visualize the ceRNA network. p < 0.05 was regarded as significant. All analyses were performed in R.

| Identification of DE-FRGs in the GSE20680 cohort
Ten of 237 FRGs were differentially expressed between CAD and normal samples, including 5 up-regulated and 5 down-regulated genes, which were identified from the GSE20680 dataset ( Table 1).
The clustering heatmap showed the expression pattern of DE-FRGs among samples ( Figure 1A). The correlation between these genes was presented in Figure 1B. HSPB1 had a negative correlation with ALOX5, CBS, CEBPG, ATG3 and HMGB1. ATG3 was positively correlated with ALOX5, CBS, CEBPG and HMGB1. Interestingly, TRIB3 and CA9 were not correlated with any DE-FRGs.

| Functional analyses for the DE-FRGs
To elucidate the biological functions and pathways that were asso-

| 7 DE-FRGs were identified as diagnostic genes for CAD
For considering the variation between CAD patients and healthy people, we aimed to estimate the diagnostic potential of DE-FRGs.
Next, we performed two distinct machine learning algorithms in the GSE20680 dataset, the LASSO and SVM-RFE, to screen the significant DE-FRGs to distinguish CAD from normal people.
LASSO logistic regression algorithm, with penalty parameter tuning conducted by 10-fold cross-validation, was used to select 9 CAD-related features ( Figure 3A,B). We then applied the SVM-RFE algorithm to filter the 10 DE-FRGs to identify the optimal combination of feature genes. Finally, 7 genes (maximal accuracy =0.759, minimal RMSE =0.241) were identified as the optimal feature genes ( Figure 3C,D). The marker genes obtained from the LASSO and SVM-RFE models were intersected, and 7 marker genes (CA9, CBS, CEBPG, HSPB1, SLC1A4, STMN1 and TRIB3) were identified for subsequent analysis ( Figure 3E).
Based on the above 7 marker genes, we constructed a logistic regression model by R package glm, and the subsequent ROC curves indicated that the 7 marker gene-based logistic regression model differentiated normal and CAD samples with AUC = 0.748 ( Figure 3F). Moreover, to elucidate the ability of individual genes in distinguishing CAD from normal samples, ROC curves were generated for the 7 marker genes. As shown in Figure 3G, the AUC for all genes was greater than 0.6. The above evidence suggested that for differentiating CAD samples from normal samples, the logistic regression model provided a superior accuracy and specificity than the individual marker genes.

| Marker genes were closely linked to a variety of CAD-related pathways
To further explore the potential function of marker genes to distinguish diseased samples from normal samples, we conducted a single-gene GSEA-KEGG pathway analysis. The top10 pathways enriched for each marker gene were illustrated in Figure 4A-G.
After a comprehensive analysis, we found that these genes were

| Immune landscape analysis
The previous results indicated that the marker genes were closely related to the immune response. Meanwhile, much pieces of evidence pointed to the inseparable connection between the immune microenvironment and CAD. [17][18][19] Therefore, we implemented the CIBERSORT algorithm to explore the differences in the immune microenvironment between CAD patients and normal samples. As shown in Figure 6A,  Table S7). These pieces of evidence indicated that changes in the immune microenvironment of CAD patients may be linked to CBS, HSPB1 and CEBPG.

| Prediction of marker gene-targeted drugs
We further revealed the drugs that may target marker genes through the DGIdb database and analysed the interaction relationship between the two parameters were set to default values; (Table S8). The results visualized by Cytoscape software were shown in (Figure 7).
We had queried 58 drugs targeting marker genes, including 17

| A ceRNA networks based on marker genes
Next, we constructed a ceRNA network based on 7 marker genes through starBase and miranda databases. The network included 408 nodes (7 marker genes, 74 miRNAs and 327 lncRNAs) and 688 edges ( Figure 8). In detail, we found that a total of 170 lncRNAs

| Expression of the marker gene in the validation set
Finally, we also verified the expression of marker genes in the GSE20681 dataset. We discovered that the expression trends of CBS, HSPB1 and STMN1 were consistent with the GSE20680 dataset. Among them, the expression of CBS (p = 0.0319) in CAD patients was greater than of normal samples, while the HSPB1 (p = 0.0279) and STMN1 (p = 0.0146) were lower in CAD samples (Figure 9).

| DISCUSS ION
Coronary atherosclerosis is a complex, persistent and progressive inflammatory disease characterized by remodelling of the coronary arteries that deliver oxygen and nutrition to the cardiac tissue. It has patients may suffer from coronary atherosclerosis for many years, even decades. 20,21 Atherosclerosis is the basic pathogenesis of CAD. Abnormal apoptosis of vascular endothelial cells (VECs), macrophages or vascular smooth muscle cells (VSMCs) is a common feature of atherosclerosis, which can lead to the formation of atherosclerotic plaques or plaque instability. 22 Endothelial cell repair after injury is associated with multiple genes, some of which are also involved in the maintenance of normal cardiac function and the formation of aortic aneurysms. [23][24][25] Autophagy is also involved in the process of coronary atherosclerosis, but its specific role is controversial. Some F I G U R E 6 Immune landscape analysis. (A) Implemented the CIBERSORT algorithm to explore the differences in the immune microenvironment between CAD patients and normal samples. (B) Pearson correlation analysis revealed that neutrophils had strong positive and negative correlations with CBS and HSPB1 respectively. CEBPG was positively correlated with monocytes (*p < 0.05, **p < 0.01) studies suggest that it is a protective factor against atherosclerosis, and that autophagy is inhibited in patients with cardiovascular disease, featured by the decrease of LC3 and Atg5 genes. 26 However, other contrary findings suggest that autophagy worsens coronary atherosclerosis. 27 It is worth noting that the relationship between coronary atherosclerosis and ferroptosis has not yet been reported.
Thus it is important for the selection of gene chip data, two kinds of disease by gene expression data analysis, enrichment of single-gene analysis, enrichment of bioinformatics methods, such as, finding a common point, to analyse genetic differences in CAD and normal ferroptosis, explore the molecular pathogenesis in CAD ferroptosis.
In the present article, a large number of gene chips most suitable for this research was selected and the use of multiple samples of genes as well as a large number of microarray data makes the experimental results more reliable and reduces the error rate, thus providing a valuable clinical reference for the treatment and prevention of CAD.
A total of 7 differential genes related to ferroptosis were After phosphorylation, the oligomer of HSPB1 is dissociated into monomer, which then acts as an inhibitor of apoptosis and an inducer of autophagy, 29,30 The GSAV analysis of HSPB1 also confirmed that the autophagy regulatory pathway was activated in the low-expression group. HSPB1 is a key regulator of ferroptosis in cancer cells, and HSPB1 is a negative regulator of ferroptosis by reducing iron-mediated production of lipid reactive oxygen species. 31 Other studies have shown that HSPB1 plays an important role in the progression of liver cancer and ferroptosis, and that HSPB1 may be regulated by transcription factor ATF3. 32 Studies have found that protease degradation is more abundant in atherosclerosis than in normal arteries. 33  CBS. H 2 S has been proved to play an important role in the process of anti-atherosclerosis. 37 Thus, CBS plays a role in both atherosclerosis and ferroptosis. Our study found that CBS is highly expressed in the CAD group, which may be a protective mechanism of the body against the disease and control the continued development of the disease.
Immune cells maintain the homeostasis of the heart and all kinds of immune cells that reside or penetrate into the heart tissue play an important role in the process of injury repair. 17 The immune cells identified in the heart include macrophages, monocytes, neutrophils, dendritic cells (DC), T and B cells, eosinophils and mast cells, which also play an important role in maintaining heart function. 38 Our analysis shows that T-cell helper and monocytes are highly expressed in the CAD group, while B-cell naive is lower than the normal group. Circulating monocytes and resident vascular macrophages are the first white blood cells known as early atherosclerotic plaques. 39 Local inflammation is caused by damaged endothelial cells that release monocyte chemotactic protein-1 (also known as C-C motifchemotactic factor ligand 2 (CCl2)). Monocyte chemokine protein-1 interacts with chemokine receptor 2 and chemokine receptor 4 expressed on circulating monocytes to recruit them to the lesions. 40,41 Migrating monocytes differentiate into macrophages, leading to inflammation and plaque development. 42 Another study shows that T-cell follicular helper promotes atherosclerosis, and its consumption reduces atherosclerosis. 43 These studies have been confirmed in our analysis.
CBS was positively correlated with neutrophils among different iron death-related genes. CBS is a key enzyme for H 2 S production in vivo, and H2S can stimulate neutrophils adhesion and neutrophils tissue infiltration. These effects are associated with the up-regulation of various adhesion receptors and proinflammatory mediators. 44,45 Other studies have found that H 2 S inhibits neutrophil tissue infiltration. 46,47 Whether the positive correlation between CBS and neutrophils in CAD patients is mediated by H 2 S remains to be verified. HSPB1 has been shown to have a cell protective effect. 48 HSPB1 is involved in a number of important physiological functions, including inhibition of cytokine expression and inhibition of neutrophil infiltration. 49 My study found that HSPB1 was negatively correlated with neutrophils, and the expression of HSPB1 was decreased in CAD patients. Therefore, HSPB1 may be a potential target for improving cardiac immune microenvironment in CAD patients.
Finally, we analysed the marker gene for gene-targeted drugs and the ceRNA network. Among the five CBS target drugs retrieved, ademetionine was confirmed to be an allosteric activator of CBS. 50 The drug has been used in mentally related diseases and liver diseases, 51,52 and its application in cardiovascular diseases has not been reported. Pyridoxal phosphate acts as a cofactor for CBS in the body, and its deficiency will lead to the loss of CBS activity. 53 The combination of the antisense oligonucleotide Apatorsen of HSPB1 mRNA and a variety of anti-cancer drugs enhances the effect of anti-cancer drugs. 54,55 As a predictive targeted drug of HSPB1, cloxyquin has been proved to play a cardioprotective effect by regulating autophagy, 56 but the study of whether HSPB1plays a role in this process has not been clarified. Non-coding RNA plays an important role in the development of atherosclerosis, miR-18a-5p, miR-27a-3p, miR199a-3p, miR-223-3p and miR-652-3p abundance and atherosclerosis Closely related to cardiovascular-related reocclusion. 57 Whether our predicted gene-targeted drugs and non-coding RNA can play a role is unclear, and the specific pathways need to be further studied. Therefore, the selected drugs and non-coding RNA can be prospectively studied.
CA9, CBS, CEBPG, HSPB1, SLC1A4, HSPB1 and TRIB3 are genes that we have screened for iron death in coronary heart disease samples. Among them, we focus on the two genes-CBS and HSPB1.
These two genes are not only related to iron death. In addition to being closely related, it may also be involved in the regulation of the cardiac immune microenvironment of CAD patients. Although gene expression may not be directly equivalent to protein expression, the biomarkers in this study should be regarded as genes, not proteins, but the significance of its research cannot be denied. We will continue to pay attention to these genes to deepen our understanding of the pathogenesis and treatment of coronary heart disease.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no competing interests.

DATA AVA I L A B I L I T Y S TAT E M E N T
The datasets generated and/or analysed during the current study are available in the following locations: the gene expression data for CAD and normal samples were obtained from the GEO database: ttp://www.ncbi.nlm.nih.gov/geo/query/ acc.cgi?acc=GSE20680 and http://www.ncbi.nlm.nih.gov/geo/query/ acc.cgi?acc=GSE20681.