Identification of MYH6 as the potential gene for human ischaemic cardiomyopathy

Abstract The present study aimed to explore the potential hub genes and pathways of ischaemic cardiomyopathy (ICM) and to investigate the possible associated mechanisms. Two microarray data sets (GSE5406 and GSE57338) were downloaded from the Gene Expression Omnibus (GEO) database. The limma package was used to analyse the differentially expressed genes (DEGs). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment, Disease Ontology (DO) and Gene Ontology (GO) annotation analyses were performed. A protein‐protein interaction (PPI) network was set up using Cytoscape software. Significant modules and hub genes were identified by the Molecular Complex Detection (MCODE) app. Then, further functional validation of hub genes in other microarrays and survival analysis were performed to judge the prognosis. A total of 1065 genes were matched, with an adjusted p < 0.05, and 17 were upregulated and 25 were downregulated with|log2 (fold change)|≥1.2. After removing the lengthy entries, GO identified 12 items, and 8 pathways were enriched at adjusted p < 0.05 (false discovery rate, FDR set at <0.05). Three modules with a score >8 after MCODE analysis and MYH6 were ultimately identified. When validated in GSE23561, MYH6 expression was lower in patients with CAD than in healthy controls (p < 0.05). GSE60993 data suggested that MYH6 expression was also lower in AMI patients (p < 0.05). In the GSE59867 data set, MYH6 expression was lower in CAD patients than in AMI patients and lower in heart failure (HF) patients than in non‐HF patients. However, there was no difference at different periods within half a year, and HF was increased when MYH6 expression was low (p < 0.05–0.01). We performed an integrated analysis and validation and found that MYH6 expression was closely related to ICM and HF. However, whether this marker can be used as a predictor in blood samples needs further experimental verification.


| INTRODUC TI ON
Ischaemic cardiomyopathy (ICM) refers to the failure of the heart to pump blood normally due to myocardial damage caused by ischemia, and ICM is the leading cause of death globally according to the WHO. In addition, ICM is also the common cause of heart failure (HF) in the developed world. 1 Coronary artery disease (CAD) is one of the most common ischaemic cardiomyopathy diseases and is caused by coronary artery stenosis and myocardial insufficiency. 2 Compared with a nonischaemic aetiology, heart failure secondary to ICM has been shown to be independently associated with mortality. 3 In view of the high mortality rates caused by CAD and HF, the prevention and timely treatment of ICM are particularly important. 4 With the development of bioinformatics analysis and highthroughput sequencing technology, many sequencing data have provided notable results to identify the hub genes, interaction networks and pathways of ICM. As a complex multifactorial disease, ICM can be caused by genetic and environmental factors and their interactions, and many previous studies have been conducted on ICM from different aspects, especially at the genome level. 5 However, most of these analyses have focused on a certain aspect of ischaemic cardiomyopathy, for instance, several studies have paid more attention to the aetiologic aspect, and others have focused on the pathogenic aspect, which lacks systematic analysis. 6 In this way, to investigate the molecular mechanisms of ICM pathogenesis in depth, we first downloaded expression profile data related to CAD and acute myocardial infarction (AMI) from the Gene Expression Omnibus (GEO), and after systematic analysis, we selected potential genes for CAD pathogenesis. Then, we performed validation in several different data sets to further screen out the hub genes to identify the mechanisms of these hub genes in the pathogenesis of AMI. Finally, we validated the expression levels of these hub genes in additional datasets in relation to the onset of HF after AMI to identify the causative genes for ICM.

| Microarray data sets
A total of five microarray datasets were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/ geo/). The five microarray datasets were GSE5406, 7 GSE57338, 8 GSE23561, 9 GSE60993 10 and GSE59867, 11 among which GSE5406 and GSE57338 were mainly used for data analysis to identify hub genes related to ICM, while the other datasets were used for validation. GSE5406 was retrieved from the GPL96 Affymetrix Human Genome U133A Array. In this data set, totally 124 subjects (including 108 related to ICM and 16 controls) were analysed. GSE57338 was retrieved from the GPL11532 Affymetrix Human Gene 1.1 ST Array. A total of 231 subjects (including 95 related to ICM and 136 controls) were chosen for further analysis. Samples of both of the above datasets were derived from heart tissue. The Affy package in R 12 was used to transform CEL files into an expression value matrix, and the matrix was normalized using the RMA method. Then, we converted the probe data to genes with the Bioconductor package in R software. 13 When multiple probes corresponded to a gene, the Arithmetic mean expression value of the probe was chosen for further analysis. GSE23561, GSE60993 and GSE59867 were used as the datasets set for validation, and the analysis methods were the same as those above.
We compared ICM subjects with healthy controls to identify the differentially expressed genes (DEGs) with the limma package in R. 14 Then, we set|log 2 (fold change)|≥1.2 and adjusted p < 0.05 as the threshold for DEGs. Subsequently, we employed DOSE 15 and the clusterProfiler 16 package in R to perform the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, Disease Ontology (DO) and Gene Ontology (GO) analyses for DEGs. An adjusted p-value (Qvalue) of <0.05 was regarded as statistically significant. In addition, in our analysis, we focused our attention on hits with Q < 0.05 and, to avoid very general sets, limited our final list of hits to examine pathway sets that annotated fewer than 200 genes.

| Protein-protein interaction analysis (PPI) network construction and module analysis
The STRING database (version 11.0) 17 is used to detect predicted and experimental interactions in a protein database, and we employed it to identify and predict protein-protein interactions for the DEGs in this study. The database uses several prediction methods, such as cooccurrence, gene fusion, coexpression experiments, databases, neighbourhoods and text mining. In addition, the protein pair interactions are shown with the combined fraction in the database. In our current study, a combined score >0.9 18 was set as a cut-off value. We used degrees to reveal the roles of protein nodes in the network. We defined the key protein networks as network modules and speculated that they may have a specific biological effect in previous studies. Subsequently, Cytoscape (version 3.71) software and the Molecular Complex Detection (MCODE) app 19,20 were employed to detect the major and most notable clustering modules. For further subsequent analysis, we set EASE ≤0.05 and count ≥2 as the cut-off value and MCODE score >8 as the threshold.

| Validation of interest and survival analysis
Initially, we wanted to further understand the relationships between hub genes and several diseases related to ICM and downloaded GSE23561 for the training set. Next, we compared the relative expression levels of hub genes between healthy subjects and patients with acute coronary syndrome in GSE60993.
Subsequently, in GSE59867, we compared the relative expression levels of hub genes in patients with CAD and acute myocardial infarction (AMI) and AMI patients at different time points who suffered from heart failure. We used the ggplot2 package 21 to compare the expression differences, and the 'survival' package 22 in R was used to perform overall survival (heart failure) and diseasefree survival analyses. Patients were divided into two groups (high vs. low) based on the hub gene expression level in comparison with the mean expression level of that hub gene. A Kaplan-Meier survival plot was also constructed.

| Data preprocessing
Before analysing GSE5406 and GSE57338, we first judged the quality of these two samples. Figure S1 shows that after quality control, all of the samples were well normalized. We obtained 54,560 expression probes separately from each gene expression profile, and the expression matrices of 19,537 genes were obtained from GSE5406, while 18,334 genes were obtained from GSE57338.

| Identification of differentially expressed genes and functional annotation
Upon comparing the case and control samples, we obtained a total of 1326 items with adjusted p < 0.05, though only 8 were upregulated and 7 were downregulated with|log 2 (fold change)|≥1.2 in GSE5406, while among 8922 items, 9 were upregulated and 11 were downregulated in GSE57338. All of these DEGs are shown in Table 1, and the heatmaps and volcano plots are shown in Figure 1.  The genes related to these items were selected for further analysis.
The details of these items of KEGG analysis can be found in Table 2.

| PPI network construction and the identification of hub genes
There were 1065 genes duplicated in these two data sets, with adjusted p < 0.05 (Figure 4), and we used the STRING database to elucidate the gene-gene interaction network of these selected genes. When the cut-off was set as a combined score >0.9, in total of 6625 protein pairs and 868 nodes were included. Figure 5A shows the net analysis in Cytoscape. Only three modules with a score >8 were found and are shown in Figure 5B-D for detection using the MCODE app.
After integrating the above analysis, we found that only myosin heavy chain 6 (MYH6) could satisfy the requirement at the same time, as it was differentially expressed, was related to the ICM functional pathway and was in the MCODE-enriched module. Next, we validated the function of MYH6 through several different microarray data sets to further understand its relationship with ICM.

| Hub gene validation
First, we validated the expression of MYH6 in GSE23561. This data set contained several ICM-related diseases. As shown in Figure 6A Figure 6B).
In GSE59867, we also found that MYH6 expression levels in AMI patients were lower than those in patients with CAD (p < 0.05) ( Figure 7A). Subsequently, we compared the gene expression levels at different times (1 day, 4-6 days, 30 days and 180 days) after myocardial infarction. We found that there was no difference in MYH6 expression within six months after myocardial infarction (p > 0.05) ( Figure 7B). However, during the 180day follow-up period, we found that MYH6 expression was lower in patients with heart failure than in those without heart failure (p < 0.01) ( Figure 7C). A survival analysis of heart failure events after myocardial infarction was performed. As shown in Figure 7D, we found that the incidence of heart failure was higher when the

| DISCUSS ION
The core of ischaemic cardiomyopathy is myocardial ischemia, the most common cause of which is coronary artery disease. Due to the narrowing of the coronary arteries, the blood supply to the myocardium is insufficient to meet the metabolic demands of the myocardial cells, causing myocardial necrosis, which gradually leads to myocardial fibrosis and heart enlargement, eventually progressing to heart failure. 23,24 In our current study, we found that the expression of MYH6 was lower in CAD patients than in healthy controls, lower in AMI patients than in healthy controls, lower in CAD patients than in AMI patients and lower in HF patients than in non-HF patients. However, there was no difference in the relative expression of MYH6 in AMI at different periods within half a year, and the incidence of heart failure increased when MYH6 expression was low.
Cardiac muscle myosin is a hexamer consisting of two heavy chain subunits, two light-chain subunits and two regulatory sub-

| CON CLUS ION
Two microarray data sets from GEO were systematically analysed.
After functional enrichment and protein-protein interaction analyses, only one gene (MYH6) was ultimately identified. When validated with another microarray data set, we found that the alteration in MYH6 expression was lower in CAD patients than in healthy controls, lower in AMI patients than in healthy controls, lower in CAD patients than in AMI patients and lower in HF patients than in non-HF patients. However, there was no difference in the relative expression of MYH6 in AMI patients at different periods within half a year, and the incidence of heart failure increased when MYH6 expression was low. The specific mechanism may be related to ventricular remodelling and myocardial fibrosis caused by MYH6, but further largescale experiments are needed to verify and elaborate on the specific mechanism.
F I G U R E 5 Protein-protein interaction analysis. (A) Protein-protein interaction network of the module genes. Each edge indicates the interaction between two genes. A colour scale was used to indicate the importance of protein nodes in the network; for example, dark red represents a high degree of importance, and light red represents a low degree importance.