Identification of competitive endogenous RNAs network in breast cancer

Abstract Background MiRNAs can regulate gene expression directly or indirectly, and long noncoding RNAs as competing endogenous RNA (ceRNAs) can bind to miRNAs competitively and affect mRNA expression. The ceRNA network is still unclear in breast cancer. In this study, a ceRNA network was constructed, and new treatment and prognosis targets and biomarkers for breast cancer were explored. Methods A total of 1 096 cancer tissues and 112 adjacent normal tissues to cancer from the TCGA database were used to screen out significant differentially expressed mRNAs (DEMs), lncRNAs (DELs), and miRNAs (DEMis) to construct a ceRNA network. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were used to predict potential functions. Survival analysis was performed to predict which functions were significant for prognosis. Results From the analysis, 2 139 DEMs, 1 059 DELs, and 84 DEMis were obtained. Targeting predictions for DEMis‐DELs and DEMis‐DEMs can yield 26 DEMs, 90 DELs, and 18 DEMis. We performed GO enrichment analysis, and the results showed that the upregulated DEMs were involved in nucleosomes, extracellular regions, and nucleosome assembly, while the downregulated DEMs were mainly involved in Z disk, muscle contraction, and structural constituents of muscle. KEGG pathway analysis was performed on all DEMs, and the pathways were enriched in retinol metabolism, steroid hormone biosynthesis, and tyrosine metabolism. Through survival analysis of the ceRNA network, we identified four DEMs, two DELs, and two DEMis that were significant for poor prognosis. Conclusions This study suggested that constructing a ceRNA network and performing survival analysis on the network could screen out new significant treatment and prognosis targets and biomarkers.


| INTRODUCTION
Breast cancer is one of the most common types of cancer worldwide and is the second leading cause of death in American women. 1,2 The prevalence of breast cancer is increasing. 3 Mammographic density, body fat, and diet are the risk factors for breast cancer. 4,5 The 12th St Gallen International Breast Cancer Conference proposed that breast cancer be divided into four subtypes using immunohistochemical definitions of estrogen and progesterone receptors, the human epidermal growth factor receptor 2 (HER2), and the Ki-67 labeling, index including Luminal A, Luminal B, HER2, and Triple negative (ductal) subtypes. 8 The treatment methods include surgery, radiation therapy, endocrine therapy, chemical therapy, and bio-targeted therapy. In recent years, targeted therapy has become the research focus. However, certain patients have high recurrence rates and drug resistance, the mechanisms of which are still unclear.
Currently, approximately 60% of breast cancers are first detected through mammographic screening, 9 but several studies have suggested that cancers diagnosed by mammographic screening may represent overdiagnosis. 10,11 It is urgent to find novel diagnostic and prognostic biomarkers and new targeted therapies for breast cancer.
The competing endogenous RNA (ceRNA) hypothesis revealed a new mechanism for interactions between RNAs. 12 MicroRNAs play an important role in the development of tumors, the 5' regions of which can bind to sequences with partial complementarity on the target RNAs' 3'UTRs, called microRNA recognition elements, usually inhibiting the expression of the target genes. 13,14 The hypothesis supposed that a certain concentration of ceRNAs could bind to miRNAs competitively to regulate the expression of the target mRNAs. The RNA-microRNA regulatory pathway includes microRNAs and the transcriptome (the protein coding genes, pseudogenes, and long noncoding RNAs [lncRNAs]). It has been reported that the pseudogene PTENP1 could bind to and compete for microRNAs, increasing the expression levels of PTEN. 15 Mouse breast cancer Pbcas4 is the pseudogene of human breast cancer AS4 that might regulate the expression of breast cancer through binding mir-185. 16 LncRNAs are more than 200 nucleotides in length and play an important role in many biological processes. 17 Li et al reported that SNHG7 might act as a ceRNA to regulate GALNT7 expression by inhibiting miR-34a in CRC cell lines. 18 In addition, it was found that NEAT1 might as a ceRNA for hsa-miR-377-3p, regulating its endogenous targets E2F3 in NSCLC. 19 There is increasing evidence that ceRNA might be important in the progression of cancer. In this study, we used the TCGA database to obtain normal and tumor samples of breast cancer, and the edgeR package was used to detect the differentially expressed mRNAs (DEMs), lncRNAs (DELs), and miRNAs (DEMis). We then constructed ceRNA networks and performed an overall survival (OS) analysis. In addition, we further predicted the potential functions and regulatory pathways by performing gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis.

| Gene data
The RNA sequence data of 1 208 cases of breast cancer were extracted from the TCGA database (https://portal. gdc.cancer.gov/). The 1 208 samples include 1 096 cancer tissues and 112 adjacent normal tissues to cancer. All file data were downloaded using the GDC Data Transfer Tool.
Since the data come from the TCGA database, no further approval was required from the Ethics Committee. Gene expression dataset GSE96670 was downloaded from GCBI (https://www.gcbi.com.cn/gclib/html/index, version 1.4.2).

| Data processing
The RNA and miRNA sequence data were derived from the Illumina HiSeq_RNASeq and the Illumina HiSeq_miRNASeq sequencing platforms. The RNA sequences included mRNA sequences and lncRNA sequences, and we mainly used Perl and R language to analyze and interpret the RNA data.

| Identification of mRNAs (DEMs), lncRNAs (DELs), and miRNAs (DEMis)
We was used to analyze the differential expression of mRNAs, lncRNAs, and miRNAs. P < 0.01 and |logFC| ≥ 2 were set as the cutoff criteria. The differential expression analysis of the GSE96670 dataset was performed using the GCBI website.

| Construction of a ceRNA network
First, we predicted interactions between lncRNA and miR-NAs using the miRcode database (http://www.mircode.org/ index.php). Next, the miRDB, mirTarBase and Target Scan databases were used to retrieve miRNA targeting mRNAs. Finally, a ceRNA network was constructed using the Cytoscape 3.5.1 online website (http://www.cytoscape.org/).

| GO and KEGG pathway enrichment analysis
Gene ontology enrichment analysis was performed with The Database for Annotation, Visualization and Integrated Discovery (DAVID, https://david.ncifcrf.gov/summary. jsp), and FDR < 0.05 was set as the cutoff criterion. KEGG pathway analysis was applied by an R package (http:// www.bioconductor.org/packages/release/bioc/html/DOSE. html, version 3.6.1), and the threshold was P < 0.05. GO and KEGG pathway analysis were used to predict potential functions.

| Survival analysis
The Survival package of R (https://CRAN.R-project.org/ package=survival, version 2.42-6) was used to perform survival analysis of DEMs, DELs and DEMis from the network, setting P < 0.05 as the cutoff criterion.

| Cell lines
The MCF-7 and BCAP-37 human breast carcinoma cell lines were obtained from the Chinese Academy of Sciences Committee from the Type Culture Collection Cell Bank (Shanghai, China). All cell lines were cultured in DMEM (HyClone) supplemented with 10% FBS (Gibco-BRL, Gaithersburg, MD) and were maintained in a humidified atmosphere containing 5% CO 2 at 37°C.

| RNA extraction and qRT-PCR analysis
Total RNA was isolated using TRIzol reagent (Invitrogen) according to the manufacturer's instructions. Quantitative realtime PCR (qRT-PCR) assays were performed using SYBR Premix Ex Taq (Vazyme, Nanjing, China) and were carried out in an ABI7500 RT-PCR System (Applied Biosystems, Foster City, CA). The primer sequences are listed in Table 6.
All the experiments were performed in triplicate.

| Cell transfection
The siRNAs of human LINC00536 were synthesized by GeneWiz (Beijing, China). The siRNA lentivirus

| Cell proliferation
After transfection, cells were seeded into each well of 96-well plates at a density of 2 000 cells per 100 μL.  Cell proliferation was measured using CCK-8 (Dojindo Laboratory, Kumamoto, Japan) following the manufacturer's protocol. The reagent was added in the media and was incubated for 24, 48, 72, and 96 hours. Aliquots were taken, and the absorbance of each well was measured at a 450-nm wavelength.

| Identification of DELs, DEMs, and DEMis
There were 1 096 cancer tissues and 112 adjacent normal tissues to cancer. EdgeR was applied to identify the differentially expressed DEMs, DELs, and DEMis (P < 0.01). From a total of 2 139 DEMs, 1 375 upregulated mRNAs, and 763 downregulated mRNAs (Table 1, Figures 1A and 2B), 842 upregulated DELs and 217 downregulated DELs ( Table  2, Figures 1B and 2B), and 65 upregulated DEMis and 19 downregulated DEMis (Table 3, Figures 1C and 2C) were detected. The top 10 from each list are shown in the Tables. Full information for the DEMs, DELs, and DEMis is shown in Tables S1-S3.

| GO and KEGG pathway enrichment analysis
To better understand the potential functions of the identified genes, we performed GO and KEGG pathway enrichment analysis. We imported the up-and downregulated DEMs to the DAVID and GO analysis platforms and then selected the top 10 most enriched GO terms. Four biological processes, five cellular components, and one molecular function were identified for the upregulated DEMs ( Figure 3A), and three biological processes, five cellular components, and two molecular functions were identified for the downregulated DEMs ( Figure 3B). For upregulated DEMs, the biological processes included nucleosome assembly, telomere organization, chromatin silencing at rDNA and DNA replicationdependent nucleosome assembly; the cellular components included nucleosome, extracellular region, extracellular space, nuclear nucleosome, and nuclear chromosome; and the molecular function included protein heterodimerization activity. For the downregulated DEMs, the biological processes included muscle contraction, muscle filament sliding, and cardiac muscle contraction; the cellular component included Z disk, sarcomere, I band, sarcolemma, and M band; and the molecular function included structural constituents of muscle and actin binding. The KEGG pathway analysis elucidated the potential biological functions, which were applied using the Iranges package (P < 0.05). The results showed that the DEMs were enriched in retinol metabolism, steroid hormone biosynthesis, and tyrosine metabolism (Figure 4).

| Target prediction and construction of a ceRNA regulatory network in breast cancer
Based on the Mircode database, we compared the DELs and DEMis, and the results revealed 323 DEMis-DELs interactions, with 18 DEMis targeting 90 DELs (Table 4). We next conducted target prediction for 18 modified DEMis in all

F I G U R E 4 (A-B) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment of differentially expressed mRNAs (DEMs)
(P < 0.05) three databases (miRTarBase, miRDB, and TargetScan) and obtained 506 target genes (Table S4), which intersected with 2 138 DEMs, resulting in 26 DEMs ( Figure 5). We then found that 26 DEMs were targeted by 12 DEMis (Table 5). To detect the significant relationships between the three intuitively, we constructed the ceRNA network using Cytoscape, which included 26 DEMs, 90 DELs, and 18 DEMis ( Figure S1).
Breast cancer is divided into four subtypes using the TCGA database: Ductal and Lobular Neoplasms, Cystic, Mucinous and Serous Neoplasms, Complex Epithelial Neoplasms, and other subtypes. Different breast cancer subtypes have different strategies for treatment, therefore the ceRNA networks of different subtypes were performed by using the same methods ( Figures S2-S5). LGALS8

DEMs, and DEMis in the ceRNA network
To explore the significance of the DELs, DEMs, and DEMis in the network, we investigated how their respective expression levels related to breast cancer patient survival using the Survival package of R (P < 0.05 was the threshold). The expression levels of LINC00536 (P = 8.1918e-04) and ADAMTS9-AS1 (2.94e-03) were associated with poor OS, as well as KPNA2 (P = 4.525e-02), NTRK2 (P = 1.579e-02), SFRP1 (P = 2.504e-02), SPRY2 (P = 4.432e-02), mir-204 (P = 3.251e-02), and mir-301b (P = 8.862e-04) ( Figure 6). Many studies have reported that mir-204 is significant for breast cancer. From the network, we predicted that LINC00536 and CDH2 could combine with mir-204 competitively from the ceRNA network. Using GraphPad Prism to analyze the expression levels of LINC00536, CDH2, and mir-204, the results indicated that the expression levels of LINC00536 and CDH2 were increased in breast cancer ( Figures 7A,B and 8A,B), while mir-204 expression was decreased ( Figures 7C and 8C). It can be speculated that the high expression of LINC00536 could bind to mir-204, repressing the binding of CDH2 to mir-204, thereby increasing the expression of CDH2. In addition, the LINC00536, CDH2, and mir-204 expression levels in the GSE96670 dataset from GCBI website were consistent with the ceRNA network that we constructed.

| Effects of LINC00536 expression on breast cancer cells
To explore the function of LINC00536 in breast cancer cells, we performed qRT-PCR in MCF-7 and BCAP-37 cells and in the normal breast cell line MCF-10A. LINC00536 expression levels were upregulated in MCF-7 and BCAP-37 cells ( Figure 9A). We then transfected LINC00536-siRNA1, LINC00536-siRNA2, and LINC00536-siRNA3 into MCF-7 and BCAP-37 cells. LINC00536 expression was significantly F I G U R E 5 Venn diagram of differentially expressed mRNAs (DEMs) involved in the competing endogenous RNA (ceRNA) regulation network. The mRNAs expressed in the red area are all the DEMs in breast cancer. The target number is in the blue area, and the purple area represents the DEMs that are located in both the differential expression and target groups T A B L E 5 26  decreased in LINC00536-siRNA-transfected cells compared with si-NC (P < 0.001) ( Figure 9B). LINC00536-siRNA1 was selected for the subsequent assays. CCK-8 proliferation assays suggested that suppression of LINC00536 in MCF-7 and BCAP-37 cells markedly promoted cell proliferation ( Figure 9C,D). To confirm the interaction between LINC00536, CDH2, and mir-204 in the ceRNA network in breast cancer cells, we detected the expression levels of  Figure 9E), whereas mir-204 expression was increased in transfected cells ( Figure 9F). These findings were consistent with the ceRNA regulatory network that we constructed.

| DISCUSSION
Although the 5-year survival rate of breast cancer has improved in the past 10 years, breast cancer is still the main cause of cancer death in Chinese women. 20 It is imperative to find more sensitive biomarkers and effective therapeutic targets.
In the ceRNA network constructed in this study, we found 90 DELs and 26 DEMs that were targeted by 18 common DEMis (Tables 4 and 6). Survival analysis of the genes in the network identified significant poor prognosis genes, including LINC00536, ADAMTS9-AS1, KPNA2, NTRK2, SFRP1, SPRY2, mir-204, and mir-301b.
Many studies have reported that a low level of miR-204 expression was associated with tumor progression in breast cancer. 21,22 Mir-204 can directly target FOXA1 to regulate  21 and can affect tumor angiogenesis by regulating the expression of ANGPT1/ TGFBR2. 22 Chang et al suggested that miR-130b-5p from the miR-301b-130b cluster repressed the cyclin G2 gene in breast cancer cells. 25 KPNA2 has been reported to regulate the subcellular localization of key proteins such as BRCA1, BARD1, PIAS1, RAD51, and CHK1, which are associated with poor prognosis. 26 Howe et al proved that NTRK2 is directly targeted and downregulated by miR-200c, thereby affecting the progression of EMT in breast cancer. 27 It was reported that low expression of SFRP1 may activate the Wnt/β-catenin signaling pathway to promote the proliferation, migration, and invasion of breast cancer cells. 28 Overexpression of SPRY2 promoted EMT through upregulation of ZEB1 in colon cancer. 29 In recent years, more attention has been paid to lncRNA research. In this study, we found several lncRNAs in the ceRNA network that were closely linked to breast cancer patients. Some of these lncRNAs have been shown to be associated with the development of breast cancer, such as HOTAIR, UCA1, and LINC00518, 30,31 but the research on the role of lncRNAs in breast cancer is still very scarce. From the survival analysis, LINC00536 and ADAMTS9-AS1 were significantly associated with poor prognosis of breast cancer, but they have not been reported; therefore, they may become new therapeutic targets and prognostic biomarkers for breast cancer.