Roles of HOTAIR in lung cancer susceptibility and prognosis

Abstract Background Long noncoding RNA (lncRNA) single‐nucleotide polymorphisms (SNPs) are associated with susceptibility to various malignant tumors. The aim of this study was to investigate the roles of HOX transcript antisense intergenic RNA (HOTAIR) and its SNPs in lung cancer. Methods Initially, the expression of HOTAIR in different tumors was investigated using the online Gene Expression Profiling Interactive Analysis (GEPIA) resource. Three SNPs (rs920778, rs1899663, and rs4759314) of HOTAIR were identified using the MassArray system. Following this, the relationship between these SNPs and susceptibility to lung cancer was investigated. Results Expression of HOTAIR was found to be increased in a variety of cancers, including nonsmall cell lung cancer (NSCLC). We found that the genotypes of these SNPs (rs920778, rs1899663, and rs4759314) were not significantly associated with lung cancer type, family history, lymph node metastasis, or lung cancer stage. In gender stratification, compared to genotype AA, the AG (OR = 0.344, 95% CI: 0.133–0.893, p = .028) and AG + GG (OR = 0.378, 95% CI: 0.153–0.932, p = .035) genotypes of rs920778 were protective factors against NSCLC in females. In smoking stratification, compared with genotype AA of rs920778, the genotype AG + GG (OR = 0.507, 95% CI: 0.263–0.975, p = .042) was a protective factor against NSCLC in nonsmokers. No statistical differences were observed in the stratified analyses of rs1899663 and rs4759314 genotypes. Linkage disequilibrium analysis revealed a high linkage disequilibrium between rs920778 and rs1899663 (D′ = 0.99, r² = .74), rs920778 and rs4759314 (D′ = 0.85, r² = .13), and rs1899663 and rs4759314 (D′ = 0.79, r² = .00). Conclusion Our study demonstrated that HOTAIR expression increased in NSCLC, and that the genotypes of rs920778 could be useful in the diagnosis and prognosis of lung cancer.


| INTRODUCTION
Lung cancer, as a common malignant tumor, poses a serious threat to human health and is a grave public health problem worldwide, with more than one million people dying from lung cancer every year (Vachani, Sequist, & Spira, 2017). Although remarkable progress has been made in conventional treatments for lung cancer, many patients are diagnosed at a late stage due to the absence of clear early-stage symptoms. The prognosis of lung cancer is poor, the incidence of recurrence is high, drug resistance is common (Ma et al., 2017), and due to delayed diagnosis, the 5-year survival rate is only 15%. With timely screening and control of high-risk factors, patients with lung cancer can be effectively treated or achieve a better prognosis; therefore, early diagnosis and treatment are key to prolonging the survival time of patients with lung cancer (Xue et al., 2015). There is consequently an urgent need for reliable biomarkers for screening and diagnosing lung cancer.
Lung cancer can be divided into small cell lung cancer and nonsmall cell lung cancer (NSCLC). The latter accounts for approximately 85% of the total incidence of lung cancers and includes lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC). The main pathogenesis of NSCLC involves epigenetic variations in chromatin, and the changes in chromatin modification products may alter the growth mode of cells and result in the loss of the original cellular characteristics (Scott, 2018).
Research from the last decade revealed that lncRNAs are involved in cancer development (Minotti, Agnoletto, Baldassari, Corra, & Volinia, 2018). LncRNA is a class of RNA with a length of more than 200 nucleotides that is rarely involved in encoding proteins. It has important molecular biological functions, such as transcriptional regulation, posttranscriptional regulation, translation regulation, and chromatin reconstruction, and plays a role in epigenetics. Increasing evidence shows that lncRNA is associated with the risk and prognosis of breast cancer, colorectal cancer, gastric cancer, and other malignant tumors (Xue et al., 2015). LncRNA expression is also correlated with tumor metastasis, late pathological stage, and prognoses in patients with lung cancer (Loewen, Jayawickramarajah, Zhuo, & Shan, 2014). Some researchers have suggested that lncRNAs could be significant biomarkers for cancer diagnosis and metastasis (Tong et al., 2015).
Single-nucleotide polymorphisms (SNPs) and somatic mutations in lncRNAs might play a critical role in the pathogenesis of cancer, indicating a strong potential for further development of lncRNAs as biomarkers (Tong et al., 2015). HOX transcript antisense intergenic RNA (HOTAIR, HGNC ID HGNC:33510), with a length of 2,158 nt, is located in the homeobox C (HOXC) gene cluster on chromosome 12, and is found in the transcriptional D group of homeobox genes. HOTAIR was initially identified as a lncRNA that interacts with polycomb repressive complex 2 (PRC2) and further inhibits the HOXD gene by binding to its 5′ domain. Its molecular scaffolds are regulated by inhibition of expression of PRC2 and lysine demethylase (Gupta et al., 2010). HOTAIR is highly expressed in various cancers, including lung cancer, and induces the proliferation and metastasis of cancer cells. It can also induce tumorigenesis through epithelial-mesenchymal transition (Lee et al., 2016; Padua Alves et al., 2013; Tong et al., 2015). However, due to the limited understanding of its molecular mechanism, its role as a diagnostic marker in lung cancer is still unclear (Tan et al., 2018).
SNPs are DNA sequence polymorphisms caused by single-nucleotide variation at the genome level (Cooper, Smith, Cooke, Niemann, & Schmidtke, 1985). They are one form of genetic change and play an important role in gene mutation. SNP variations in the coding or noncoding regions of genes are associated with some diseases. For example, deleterious nonsynonymous SNPs in the tumor suppressor protein TP53 gene affect the p53-estrogen receptor α interaction and are associated with breast cancer (Chitrala, Nagarkatti, Nagarkatti, & Yeguvapalli, 2019). Furthermore, the SNP rs915894 in the NOTCH4 gene may be a genetic marker for the prognosis of NSCLC in the Chinese population and may have an interactive relationship with epidemiologic factors (Xu, Lin, et al., 2019). In addition, an association was observed between the +276G/T SNP in the adiponectin gene and breast cancer incidence in postmenopausal women after adjustment for all other variables (Geriki et al., 2019). The AA genotype of SNP rs10889677 was significantly correlated with increased risk of colorectal cancer (Mosallaei et al., 2019). The toll-like receptor 2 gene polymorphism, rs3804100, may be a potential prognostic biomarker for Helicobacter pylori infection-independent gastric cancer (Zhao et al., 2019). Moreover, HOTAIR SNPs are associated with the susceptibility to various malignant tumors, including lung cancer (Bayram, Sumbul, Batmaci, & Genc, 2015; Guo et al., 2015; Pan et al., 2016; Xavier-Magalhaes et al., 2017).
In order to explore the potential of HOTAIR as a diagnostic marker for lung cancer, we first investigated the expression of HOTAIR in different tumors using the online Gene Expression Profiling Interactive Analysis (GEPIA) resource. Subsequently, the relationship between HOTAIR gene SNPs (rs920778, rs1899663, and rs4759314) and susceptibility to lung cancer was investigated. The correlation of the SNP loci with gender and smoking was also tested to provide a scientific basis for the early diagnosis, prognosis, and therapy of lung cancer.

| Analyzing HOTAIR expression on GEPIA
The GEPIA server (http://gepia.cancer-pku.cn/) allows the analysis of the prevalence of a gene signature in TCGA and GTEx samples (Tang et al., 2017). Here, the online resource was used to analyze the expression of the HOTAIR gene in different tumors and the expression of genes in LUAD and LUSC. Kaplan-Meier survival analysis was utilized to examine the relationship between HOTAIR expression and the prognoses of patients with lung cancer.

| Analyzing HOTAIR levels on UALCAN
UALCAN (http://ualcan.path.uab.edu/index.html) is an interactive website based on PERL-CGI that analyzes publicly available cancer transcriptome data in the TCGA database. In this study, UALCAN was used to analyze HOTAIR expression in LUAD and LUSC patients of different genders or of different races. Kaplan-Meier survival analysis was utilized to estimate the relationship between HOTAIR expression and the prognoses of patients with LUAD and LUSC.

| Sample collection
Blood samples from 196 patients were collected between January 1, 2015 and November 30, 2017, from the Dongying People's Hospital, Binzhou Medical College Affiliated Teaching Hospital. Patients in this study had been clinically diagnosed with lung cancer, but had not received radiotherapy or chemotherapy. Healthy control samples (n = 196) were collected from people who underwent physical examinations at the hospital during the same period, and who had no tumor or lung disease. All experiments were subject to approval by the Ethics Committee of Binzhou Medical University. Prior to inclusion, all the patients and controls provided written informed consent. The sample size was estimated using the formula for a mismatching design described in the Supplemental Methods. Blood samples (2 ml each) were collected under aseptic conditions and centrifuged at 2,000 g for 10 min (Eppendorf AG 22331). The centrifuged serum and blood cells were stored at −80°C.

| DNA isolation and genotyping
Genomic DNA was isolated using a Mammalian Genomic DNA Extraction Kit according to the manufacturer's protocol (D0063, Beyotime Biotechnology). The extracted DNA was genotyped using a chip microarray detection system (Shanghai Ouyi Biomedical Technology Co., Ltd.). Briefly, three SNPs of the HOTAIR gene were identified using the MassArray system (Agena iPLEX assay). Approximately 10-20 ng of genomic DNA was used for genotype analysis and was amplified by a multiplex PCR reaction (primers are detailed in Table S1). The PCR products were then used for locus-specific single-base extension reactions, and the resulting products were transferred to a 384-element SpectroCHIP array. The alleles were discriminated by mass spectrometry (Agena).

| Statistical analysis
Basic data were analyzed using Student's t test. The chi-squared test was used to compare gender, ancestry, residence, occupations, the gene frequency, smoking, alcohol consumption, family history, lymph node metastasis, and lung cancer staging. The genotype of the control group was expected to meet the Hardy-Weinberg equilibrium (HWE; p > .05). A logistic regression model was used to analyze the association between the gene polymorphisms and susceptibility to lung cancer, as well as the difference between the genotypes of different sites within each stratification. Linkage disequilibrium analysis and haplotype analysis were performed on the three sites by using SHEsis software (http://analysis.bio-x.cn/myAnalysis.php). Statistical analysis was conducted using SPSS 22.0 software (IBM Corp.) with a two-sided test, and the test level was α = 0.05.
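As an illustration of the association measure described above, the following is a minimal sketch of an unadjusted odds ratio with a Wald 95% confidence interval computed from a 2 × 2 table. It is a simplification: the study fitted logistic regression models in SPSS, which can also adjust for covariates, whereas this sketch covers only the single-predictor case. The function name and the genotype counts are hypothetical and are not taken from the study data.

```python
import math


def odds_ratio_wald_ci(carrier_cases, carrier_controls,
                       ref_cases, ref_controls, z_crit=1.96):
    """Unadjusted odds ratio and Wald CI from a 2 x 2 table.

    carrier_* : counts with the genotype of interest (e.g., AG + GG)
    ref_*     : counts with the reference genotype (e.g., AA)
    All four counts must be positive, otherwise the estimate is undefined.
    """
    odds_ratio = (carrier_cases * ref_controls) / (carrier_controls * ref_cases)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / carrier_cases + 1 / carrier_controls
                          + 1 / ref_cases + 1 / ref_controls)
    lower = math.exp(math.log(odds_ratio) - z_crit * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z_crit * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_, lo, hi = odds_ratio_wald_ci(10, 20, 30, 20)
print(f"OR = {or_:.3f}, 95% CI: {lo:.3f}-{hi:.3f}")
```

An interval that excludes 1 corresponds to a significant association at α = 0.05, and an OR below 1 is read as a protective effect of the genotype relative to the reference.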

| HOTAIR expression in different types of cancers
The HOTAIR gene is located in the HOXC gene cluster on chromosome 12 [30], and plays an important role in the development of malignant cancers [16,31,32]. In order to further investigate the roles of HOTAIR in cancers, the GEPIA online resource (http://gepia.cancer-pku.cn/) was used to study the expression levels of HOTAIR in different cancers. The results showed that HOTAIR is expressed in a variety of cancers, including breast cancer (BRCA), esophageal cancer (ESCA), glioblastoma multiforme (GBM), head and neck squamous cell carcinoma (HNSC), and kidney renal clear cell carcinoma (KIRC). HOTAIR expression was significantly higher in the tissues of lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), kidney renal papillary cell carcinoma (KIRP), stomach adenocarcinoma (STAD), BRCA, GBM, and HNSC than in the control tissues, which supports the oncogenic role of HOTAIR in these cancers (Figure 1).

| Higher levels of HOTAIR in LUAD and LUSC
HOTAIR is highly expressed in various cancers, and induces the proliferation and metastasis of these cancers. In order to investigate its roles in NSCLC, we next analyzed the expression levels of HOTAIR in LUAD and LUSC using the GEPIA server. Results showed that HOTAIR expression was significantly increased in patients with LUAD (n = 483) compared to that in corresponding control samples (n = 347, p < .05). Significantly higher levels of HOTAIR were also found in patients with LUSC (n = 486) compared to controls (n = 338, p < .01, Figure 2a).
Using the UALCAN website (http://ualcan.path.uab.edu/analysis.html), we investigated whether gender could affect the expression of HOTAIR in LUAD or LUSC tissues. Results showed that the HOTAIR levels were higher in male (n = 238) and female patients (n = 276) with LUAD compared to control samples (n = 59, p < .001, Figure 2b). Levels of HOTAIR were also found to be much higher in male (n = 366) and female patients (n = 128) with LUSC compared to control samples (n = 52, p < .001, Figure 2c). However, there was no significant difference in HOTAIR expression between male and female patients with either LUAD (p = .058, Figure 2b) or LUSC (p = .319, Figure 2c).

| HOTAIR expression and survival analysis of patients with lung cancer
In order to investigate the effect of HOTAIR expression on the overall survival of patients with lung cancer, GEPIA software was used to analyze the relationship between the HOTAIR levels and prognoses in patients with LUAD or LUSC. The results showed that there was no significant difference between high and low expression of HOTAIR and the overall survival of patients with LUAD (p = .12, Figure 3a). Similarly, in patients with LUSC, no significant difference was found in survival analysis between patients with high and low expression of HOTAIR (p = .70, Figure 3b).
The relationship between HOTAIR expression and the prognoses of male or female patients with lung cancer was analyzed using UALCAN. No significant differences in disease prognosis were found between high and low expression of HOTAIR in male patients with either LUAD (p = .47, Figure 3c) or LUSC (p = .55, Figure 3d), or in female patients with either LUAD (p = .23, Figure 3e) or LUSC (p = .99, Figure 3f).

| HOTAIR SNPs and the susceptibility of patients with lung cancer
The abovementioned results indicate that the HOTAIR gene is highly expressed in various cancers, including lung cancer. Studies have shown that SNPs can affect gene expression levels and are closely related to tumorigenesis. Therefore, we further explored the relationship between the HOTAIR gene SNPs (rs920778, rs1899663, and rs4759314) and the susceptibility of patients to lung cancer.

| Patient demographics
In order to investigate the relationship between the HOTAIR SNPs (rs920778, rs1899663, and rs4759314) and the susceptibility to lung cancer, we analyzed DNA from 196 cases of patients with NSCLC and 196 healthy controls. No statistical differences were observed in the age and gender between the case group and controls (p > .05). However, statistical differences were found between patients and the controls in terms of occupation, smoking, and alcohol consumption (p < .05, Table 1).

| Three SNPs were detected and were in accordance with Hardy-Weinberg equilibrium
Using a SNP MassArray system, we detected three SNP genotypes at different loci, including rs920778 in 183 controls and 184 cases, rs1899663 in 188 controls and 187 cases, and rs4759314 in 184 controls and 175 cases (Figure 4a-c).
Genetic variation of SNPs rs920778, rs1899663, and rs4759314 within the population was analyzed using the Hardy-Weinberg equilibrium (HWE) law of population genetics. The genotype distributions of all three SNPs were in accordance with HWE (p > .05), indicating that the controls were representative for these three loci (Table 2).
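The HWE check described above can be sketched as a chi-square goodness-of-fit test with one degree of freedom for a biallelic locus. This is a minimal illustration and not the software the authors used; the function name is hypothetical and no study genotype counts are reproduced here.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed genotype counts at a biallelic locus.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    if p in (0.0, 1.0):
        # A monomorphic locus trivially satisfies HWE.
        return 0.0, 1.0
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A p value above .05 means that the observed genotype counts do not deviate significantly from the proportions p², 2pq, and q² expected under random mating. With small expected counts, an exact test is usually preferred over this approximation.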

| HOTAIR SNPs and lung cancer risk analysis
Chi-square tests were used to analyze alleles of HOTAIR SNPs rs920778, rs1899663, and rs4759314 in patients with NSCLC and in healthy controls. Gene frequencies of rs920778 alleles A and G were 78.3% and 21.7% in the NSCLC group, and 75.7% and 24.3% in healthy controls, respectively. Gene frequencies of rs1899663 alleles A and C were 16.0% and 84.0% in NSCLC cases, and 18.6% and 81.4% in healthy controls, respectively. The gene frequencies of rs4759314 alleles A and G were 95.1% and 4.9% in NSCLC patients, and 95.7% and 4.3% in controls, respectively. The gene frequencies of the three loci were not significantly different between cases and controls (Table 3).
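The allele comparison above can be illustrated with a Pearson chi-square test on a 2 × 2 table of allele counts. In this sketch the rs920778 allele counts are approximations back-calculated from the rounded percentages reported above and from the numbers of genotyped samples (184 cases and 183 controls, i.e., 368 and 366 alleles); they are an assumption, not the published counts in Table 3. No continuity correction is applied, so the result may differ slightly from SPSS output.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (chi2, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Approximate rs920778 allele counts reconstructed from rounded frequencies
# (assumption): cases 288 A / 80 G, controls 277 A / 89 G.
chi2, p_value = chi_square_2x2(288, 80, 277, 89)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

Under these reconstructed counts the test gives p > .05, in line with the nonsignificant allele-frequency difference reported for this locus.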
The genotypes of the rs920778, rs1899663, and rs4759314 in NSCLC cases and healthy controls were analyzed using chi-square tests. Results demonstrated that the rs920778 locus genotypes AA, AG, GG, and AG + GG, the rs1899663 locus genotypes AA, AC, CC, and AC + CC, and the genotypes AA and AG of the rs4759314 locus between NSCLC and healthy controls were not significantly associated with the susceptibility to lung cancer (Table S2).
We further investigated whether the genotypes of loci rs920778, rs1899663, and rs4759314 are associated with lung cancer type, family history, lymph node metastasis, and lung cancer stage. Results show that AA, AG, and GG genotypes of rs920778 were not significantly associated with lung cancer type, family history, lymph node metastasis, or lung cancer stage (p > .05, Table S3). Similarly, the genotypes of rs1899663 (AA, AC, and CC) and of rs4759314 (AA and AG) were not significantly associated with lung cancer type, family history, lymph node metastasis, or lung cancer stage (p > .05, Tables 4 and 5).

| SNPs and the risk of lung cancer
In order to further explore the relationship between SNPs and lung cancer risk, stratified analysis was carried out according to gender, smoking, alcohol consumption, and occupation. Gender was divided into men and women, smoking into smoking and nonsmoking, alcohol consumption into drinking and nondrinking, and occupation into employees of government, farmers, and other occupations.
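Stratified estimates of this kind are reported as an OR with a 95% CI and a p value, for example OR = 0.344, 95% CI: 0.133–0.893, p = .028 for the AG genotype of rs920778 in females (see the Abstract). As a consistency check, the following sketch recovers an approximate two-sided Wald p value from a reported OR and CI. It assumes that the interval is a symmetric Wald interval on the log scale; the function name is hypothetical, and small discrepancies from the published p values are expected because the inputs are rounded.

```python
import math


def p_value_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate two-sided Wald p value from an OR and its 95% CI."""
    # The CI spans log(OR) +/- z_crit * SE, so its log-width gives the SE.
    se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se_log_or
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Values reported in the Abstract for rs920778.
for label, or_, lo, hi in [
    ("AG vs AA, females", 0.344, 0.133, 0.893),
    ("AG + GG vs AA, females", 0.378, 0.153, 0.932),
    ("AG + GG vs AA, nonsmokers", 0.507, 0.263, 0.975),
]:
    print(f"{label}: p = {p_value_from_or_ci(or_, lo, hi):.3f}")
```

Applied to the three reported estimates, the recovered p values agree with the published ones (.028, .035, and .042) to within rounding.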

| HOTAIR SNPs
In order to further explore the relationship between HOTAIR SNPs and the susceptibility to lung cancer, we used the online software SHEsis to analyze the linkage disequilibrium of these three SNPs. The pairwise results were as follows: rs920778 and rs1899663 (D′ = 0.99, r² = .74), rs920778 and rs4759314 (D′ = 0.85, r² = .13), and rs1899663 and rs4759314 (D′ = 0.79, r² = .00), indicating a high linkage disequilibrium between rs920778 and rs1899663. A linkage disequilibrium was also observed between rs920778 and rs4759314 (Figure 5).
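The two linkage disequilibrium measures reported above follow from standard definitions, which can be sketched for a pair of biallelic loci as below. This is not the SHEsis implementation, which also has to estimate haplotype frequencies from unphased genotypes; here the haplotype frequency is taken as given, both loci are assumed to be polymorphic, and all names and example frequencies are hypothetical.

```python
def ld_measures(p_ab, p_a, p_b):
    """Pairwise linkage disequilibrium for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele a and allele b
    p_a  : frequency of allele a at locus 1 (0 < p_a < 1)
    p_b  : frequency of allele b at locus 2 (0 < p_b < 1)
    Returns (d_prime, r_squared).
    """
    d = p_ab - p_a * p_b
    # D is scaled by its largest attainable magnitude given the
    # allele frequencies, which depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r_squared


# Hypothetical example: a rare allele (5%) that always occurs on the
# background of a more common allele (20%) at the other locus.
d_prime, r_squared = ld_measures(p_ab=0.05, p_a=0.20, p_b=0.05)
print(f"D' = {d_prime:.2f}, r2 = {r_squared:.2f}")
```

The hypothetical example shows why a high D′ can coexist with a low r², as seen for the pairs involving rs4759314: when the allele frequencies at two loci differ strongly, D′ can approach 1 even though one locus predicts the other poorly.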
The haplotypes of these three SNPs were studied using SHEsis software. The results showed that the common alleles in the lung cancer group and control group were A (rs920778), C (rs1899663), and A (rs4759314). The proportion of alleles in the lung cancer group and healthy control group was 78.3% and 76.4%, respectively; however, no significant difference was observed in the allele frequency between these two groups (Table S6).

| Bioinformatics analysis of SNP locus of HOTAIR gene
Following linkage disequilibrium analysis and haplotype analysis, these three loci were bioinformatically annotated using the public database HaploReg v4.1 (http://pubs.broadinstitute.org/mammals/haploreg/haploreg_v4.php). The three loci were all located on chromosome 12, and the affected bases of the three loci were all A. The reference base of site rs920778 and site rs4759314 was G, and the reference base of site rs1899663 was C. Some highly sensitive DNA regions were also found in these three loci. In addition, some change-related motifs were observed at locus rs920778 and locus rs1899663, and these loci were also quantitative trait loci for the expression of many genes (Table 7).

| DISCUSSION
HOTAIR is one of the earliest studied lncRNAs, and is closely related to lung cancer progression (Herrera-Solorio et al., 2017; Li et al., 2017). HOTAIR is a prognostic factor for various kinds of tumors (Kogo et al., 2011; Li et al., 2013; Zhuang et al., 2013), but the understanding of its role in tumor pathogenesis remains limited. In this case-control study, we explored the relationship between HOTAIR SNPs and susceptibility to lung cancer, and found that the AG and AG + GG genotypes of rs920778 are protective factors against NSCLC in females, and that the genotype AG + GG was a protective factor against NSCLC in nonsmoking people. These data offer potential new tumor markers for the screening and diagnosis of lung cancer. Analysis of data from public databases has become an important tool in determining cancer pathogenesis, in seeking treatment, and in identifying tumor markers (Minotti et al., 2018). The TCGA database is known as the most comprehensive database of cancer information worldwide, covering 39 types of cancer involving 29 cancer organs (X. Liu et al., 2019). The GTEx database contains more normal sample data than the TCGA database (Consortium, 2013). The GEPIA website mainly contains the data of the TCGA and GTEx databases, whereas the UALCAN website only contains the data of the TCGA database (Deng, Xu, & Wang, 2019). Therefore, this study first used different analytical functions of GEPIA and UALCAN to explore HOTAIR expression and its relationship with the prognoses of patients with LUAD and LUSC, with the aim of identifying effective biomarkers of LUAD and LUSC that could provide evidence for lung cancer prognosis.

TABLE 4 (Continued) Note: The logistic regression model was used to adjust for age, ancestral home, past history, and place of residence. Pa and aOR were calculated by logistic regression with adjustment for age, gender, occupation, smoking, and alcohol consumption.
TABLE 5 Stratified analysis of rs1899663 and lung cancer risk
We found that HOTAIR expression increased in LUAD and LUSC, and that the expression level was higher than in paracancerous tissues. We then analyzed the relationship between HOTAIR levels and prognoses of patients with LUAD and LUSC on the GEPIA website; no significant difference in prognosis was observed between patients with high expression levels and those with low expression levels, possibly due to the limited sample size and significant regional differences (Lv et al., 2019). The expression of HOTAIR in lung cancer patients of different genders was also analyzed. The expression level of HOTAIR in patients with LUAD or LUSC was higher than that in normal samples. However, no significant difference was observed between male and female patients.
The SNP is the most typical type of genetic variant. The human genome probably contains approximately 10 million common SNPs (Mooney, 2005). SNPs that occur within the lncRNA transcripts can affect the structure and function of multiple RNA molecules (Cairns et al., 2019; Chen et al., 2019), whereas the presence of a SNP in the promoter region of a lncRNA could alter its expression level. In addition, somatic mutations that occur within lncRNAs exert important effects in cancer, and preliminary data are promising. Previous studies also revealed that SNPs in miR-219-1 (rs213210, rs421446, and rs107822) significantly affect the susceptibility and prognosis of NSCLC (Zheng et al., 2017).
Genetic variation of HOTAIR may affect its function and is related to the susceptibility of individuals to cancer development (Bayram, Ülger, et al., 2015; Xue et al., 2015). The minor alleles of HOTAIR rs4759314 and rs200349340 were significantly associated with pancreatic cancer susceptibility. HOTAIR rs920778 was associated with esophageal cancer and esophageal squamous cell carcinoma risk (Tian, Liu, Liu, Zuo, & Chen, 2019). The T allele or TT genotype of HOTAIR polymorphisms could serve as a potential genetic marker for cancer risk, especially in Asians (Xu, Zhou, et al., 2019). HOTAIR SNPs rs12427129 and rs3816153 were associated with the risk of hepatocellular carcinoma (HCC) in dominant genetic models. Additionally, SNP-environment interactions for rs12427129, rs3816153, and HBsAg status were found to enhance the risk of HCC. Although HOTAIR rs920778 and rs1899663 have been reported to significantly increase susceptibility to lung cancer (Wang et al., 2018), the roles of HOTAIR SNPs in lung cancer still need to be further studied.
In this study, three SNPs of HOTAIR (rs920778, rs1899663, and rs4759314) were investigated in order to explore the relationship of these SNPs with the pathogenesis of lung cancer. Our results showed no significant difference in the allele frequency of these three SNPs between lung cancer patients and healthy controls. The allele results for rs920778 are similar to those of Li's study, but the allele results for rs4759314 differ from Li's study, which reported that G allele carriers had a 2.598-fold increased risk of developing lung cancer compared to A allele carriers. This discrepancy may be due to regional differences and sample size (Chow et al., 2019). Interestingly, we found that AG or AG + GG genotypes of rs920778 were protective factors against lung cancer in female patients. In the smoking stratification, the AG + GG genotype of rs920778 was also a protective factor against lung cancer in nonsmokers. In addition, significant differences were observed among the three genotypes of rs1899663 in the type of lung cancer. Similarly, Wang et al. found a significant association between HOTAIR SNP rs920778 and susceptibility to lung cancer (Wang et al., 2018).
In addition, rigorous literature screening and quality evaluation were used in this study. The relationship between SNPs in rs920778, rs1899663, and rs4759314 loci of the HOTAIR gene and susceptibility to lung cancer was analyzed from a genetics perspective.
In summary, this study investigated the roles of HOTAIR and its SNPs in lung cancer. We found that HOTAIR expression increased in NSCLC, and that the AG and AG + GG genotypes of rs920778 are protective factors in female patients and nonsmokers; these findings may be useful for the screening and prognosis of lung cancer.

ETHICS STATEMENT
The experiments were approved by the Ethics Committee of Binzhou Medical University.

PATIENT CONSENT FOR PUBLICATION
Prior to inclusion, all the patients and controls provided written informed consent.