Polymorphisms in lncRNA CCAT1 on the susceptibility of lung cancer in a Chinese northeast population: A case–control study

Abstract Objective To explore the association of rs1948915 and rs7013433 in long noncoding RNA (lncRNA) CCAT1 and rs6983267 in the MYC enhancer region with the risk of lung cancer in a Chinese northeast population, a case–control study was conducted. Methods The hospital‐based case–control study included 669 lung cancer patients and 697 healthy controls. Taqman® probe allelic discrimination was used for genotyping. The differences between the case and control groups were analyzed using Student's t‐test and the chi‐square test. Logistic regression analysis was used to assess the relationship between the genotypes and the risk of lung cancer. Crossover analysis was used to explore the relationship between gene–environment interaction and lung cancer. Results There was no association between the three selected single‐nucleotide polymorphisms (SNPs) and the susceptibility to lung cancer in the overall population. Rs1948915 CT was correlated with lung adenocarcinoma. In the female stratum, rs1948915 CT/CC was significantly associated with decreased susceptibility to lung cancer. Additionally, the additive and multiplicative interaction models showed no interaction between the three selected SNPs and smoking status in lung cancer. Conclusions There may be an association between lung adenocarcinoma and the rs1948915 polymorphism in the Chinese northeast population, while rs7013433 and rs6983267 might have no association. There was no interaction between the three selected SNPs and smoking status.


| INTRODUCTION
Cancer is one of the most serious diseases today. According to the International Agency for Research on Cancer (IARC), 18.1 million new cases and 9.6 million cancer deaths were estimated worldwide in 2018. 1 By 2030, the global burden of cancer is projected to rise to around 22.2 million new cases and 13.2 million deaths. 2 Notably, 2.1 million people were newly diagnosed with lung cancer and 1.8 million people died of it. 1 Moreover, the 5-year survival rate of lung cancer is as low as 19%, the second lowest after pancreatic cancer, suggesting that increasing attention should be given to lung cancer. 3 Accumulating research has indicated that the occurrence of lung cancer is an intricate process affected by a variety of factors, including genetic factors, environmental factors, and their interaction. 4,5 With the rapid growth of genome-wide association studies (GWAS), many studies have examined the relationship between long noncoding RNA (lncRNA) single-nucleotide polymorphisms (SNPs) and cancer susceptibility, and genetic risk factors have received greater attention. LncRNAs are noncoding RNAs longer than 200 nucleotides that have no protein-coding function. 6,7 According to their function in tumors, lncRNAs can be divided into tumor-promoting lncRNAs and tumor-suppressive lncRNAs. As gene regulators, lncRNAs may play an important role in trans-, cis-, and post-transcriptional gene regulation through complex mechanisms in oncogenic pathways. [8][9][10][11] LncRNA polymorphisms might regulate lncRNA function and expression, affecting an individual's cancer susceptibility. [12][13][14] That is to say, polymorphisms in functional lncRNAs, just like SNPs in protein-coding genes, can also promote the development of cancer. 15,16
CCAT1 (colon cancer-associated transcript 1), also termed LOC100507056 or CARLo-5 (cancer-associated region long noncoding RNA), is a 2682-nucleotide lncRNA located on chromosome 8q24.21 near c-Myc, a well-known transcription factor. [17][18][19][20][21] In 2012, Nissan et al. reported the highly specific expression of CCAT1 in colorectal cancer (CRC) for the first time, finding that the average expression level of CCAT1 in colon cancer tissues was 235 times higher than that in normal colon mucosa; CCAT1 was therefore once considered to be specifically expressed in CRC. 19 However, emerging studies have shown that lncRNA CCAT1 is overexpressed in many types of cancer besides CRC, such as gastric carcinoma (GC) 20 and hepatocellular carcinoma (HCC). 22
In gastric cancer, CCAT1 can regulate the expression of miR490, and miR490 can in turn inhibit CCAT1 expression. The two are negatively correlated, and high miR490 expression can decrease the expression of CCAT1 and significantly restrain the metastasis of gastric cancer. 23 Upregulation of CCAT1 expression is directly related to c-Myc acting on the E-box (enhancer box) element in its promoter region. If the E-box element is mutated, c-Myc will not promote CCAT1 expression. [24][25][26] Xiang et al. showed that CCAT1 promoted long-range chromatin looping and regulated MYC transcription. The absence of CCAT1 decreased the long-range interaction between its enhancers and the c-MYC promoter. 17 LncRNA CCAT1 is closely correlated with c-MYC transcription and cell growth in a variety of cancer types. 27,28 Zhao et al. reported that CCAT1 expression is closely regulated by the carcinogenic SNP rs6983267 in the MYC enhancer region, which is correlated with endometrial carcinoma. 29 Various studies have shown an association between lncRNA CCAT1 polymorphisms and cancer susceptibility. Previous studies analyzed European patients with multiple myeloma by GWAS and found that the lncRNA CCAT1 rs1948915 polymorphism was closely related to multiple myeloma. 30,31 Through a case-control study, Li et al. concluded that the lncRNA CCAT1 rs7013433 polymorphism was closely associated with advanced-stage colorectal cancer in the population of Fujian and Zhejiang provinces, China. 32 Park et al. found, through a case-control study, that SNP rs6983267 was associated with the susceptibility to lung cancer in smoking stratification. 33 Nevertheless, Zhang et al. reported that, in a Chinese population, subjects with the GG homozygous genotype had a higher susceptibility to lung cancer than individuals carrying the TT homozygous genotype, and that the difference was more significant among non-smokers in smoking stratification. 34
These conclusions are clearly inconsistent and need to be verified. SNP rs6983267 is located on lncRNA CCAT2. Many studies have shown that high expression of CCAT1 and CCAT2 is significantly related to the poor prognosis of CRC patients and is strongly associated with the MYC enhancer. 35 These two lncRNAs, independently or in combination, can be used as important biomarkers for the prognosis of CRC. 35,36 Therefore, rs1948915 and rs7013433 in CCAT1 and rs6983267 in the MYC enhancer region were selected for this study. Considering the significant role of CCAT1 in the development of cancer and its unclear effects in lung cancer, we conducted a case-control study to analyze the relationship of the polymorphisms rs1948915 and rs7013433 in lncRNA CCAT1 and rs6983267 in the MYC enhancer region with lung cancer susceptibility in northeast China. We also explored the interaction between the selected SNPs and smoking exposure in relation to the risk of lung cancer, which is necessary for probing cancer etiology and reducing environment-related risk factors for cancer prevention.

| GEPIA2 dataset
GEPIA2 (Gene Expression Profiling Interactive Analysis 2) is an online bioinformatics tool for analyzing the RNA sequencing expression data of 9736 tumors and 8587 normal samples from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) projects, using a standard processing pipeline (http://gepia2.cancer-pku.cn/#analysis). For mining expression and prognosis, GEPIA2 provides customizable tumor/normal differential expression analysis, patient survival analysis, and other functions. 37

| Study subjects
We performed a hospital-based case-control study in Shenyang City, northeast China, including 669 cases and 697 healthy controls. The patients were diagnosed with lung cancer between January 2011 and December 2013 at The First Affiliated Hospital of China Medical University, The Fourth Affiliated Hospital of China Medical University, and General Hospital of the Northern War Zone of the Chinese People's Liberation Army. During the same period, we selected the corresponding controls from individuals undergoing physical examination at the same hospitals. The inclusion criteria for cases were: (1) newly diagnosed by two expert pathologists, without metastatic cancer or any previous cancer; (2) no prior therapy (neither radiotherapy nor chemotherapy); and (3) willing and able to be interviewed. The inclusion criterion for healthy controls was no history of any cancer or other lung disease. Importantly, all participants were unrelated Han Chinese, and none had received a blood transfusion in the past 6 months. We obtained approval from the Ethics Committee of China Medical University, and each subject signed informed consent. After the interview, each subject donated 10 ml of peripheral venous blood as a specimen for SNP genotyping. A subject who had smoked fewer than 100 cigarettes in the past was classified as a non-smoker; otherwise, the subject was classified as a smoker.

| SNP selection and genotyping
We selected the tagSNPs of CCAT1 using the pairwise option of the Haploview 4.2 software (setting r² ≥ 0.8, minor allele frequency > 0.05), based on the Han Chinese data from the 1000 Genomes Project. We then took previously published domestic and international studies into account. Finally, we selected rs1948915 and rs7013433 in lncRNA CCAT1 and rs6983267 in the MYC enhancer region. The IDs of the corresponding assays are, in order, C_3052970_10, C_1523520_20, and C_29086771_20. The minor allele frequencies (MAF) of the selected SNPs are all greater than 5% in the Chinese population. Genomic DNA samples were isolated from venous blood by the phenol-chloroform method. Next, an Applied Biosystems 7500 Real-Time PCR System (Foster City, CA) was used with Taqman® allelic discrimination and the primer-probe sets for SNP genotyping. Appropriate positive, negative, and blank controls were included in each run. For quality control, over 10% of the samples were randomly chosen and genotyped twice by two persons, and the two sets of results were completely concordant.

| Statistical analysis
We tested the differences in demographic variables between the case group and the control group with the chi-squared test and Student's t-test. Whether the selected SNPs were in Hardy-Weinberg equilibrium (HWE) in the control population was checked with the goodness-of-fit chi-squared test. We obtained odds ratios (ORs) and their 95% confidence intervals (CIs) by unconditional logistic regression analysis to estimate the associations of the selected SNPs with the susceptibility to lung cancer. The relationship between lung cancer and the interaction of the selected SNPs with smoking status was evaluated by logistic regression models (multiplicative interaction) and crossover analysis (additive interaction). We regarded those with both the protective genotype and no smoking exposure as the reference group in our analysis. Multiplicative interaction terms were included in the logistic regression models. All statistical analyses were conducted with SPSS software (Version 20.0; IBM SPSS, Inc., Chicago, IL, USA). Statistical significance was defined as a two-sided p < 0.05.

FIGURE 3 The significant differences in gene expression between different pathological stages in LUAD and LUSC patients, p < 0.05.
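As a rough illustration of the odds ratios and 95% confidence intervals described in this section, the sketch below computes a crude (unadjusted) OR with a Wald interval from a 2x2 genotype-by-status table. It is not the authors' SPSS procedure: the study used unconditional logistic regression with covariate adjustment, whereas this sketch has no adjustment. The function name `odds_ratio_ci` and the example counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Wald confidence interval from a 2x2 table.

    a: cases carrying the genotype of interest
    b: cases carrying the reference genotype
    c: controls carrying the genotype of interest
    d: controls carrying the reference genotype
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR), Woolf's method.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only; not taken from the study.
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An interval that excludes 1 corresponds to significance at roughly the two-sided 0.05 level used in this study. An adjusted estimate would instead come from the coefficient of the genotype term in a logistic regression model.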

| Expression analysis of CCAT1 in lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) tissues
According to GEPIA2, CCAT1 expression in LUSC tissues was significantly higher than that in normal tissues (486 LUSC tissues and 338 normal tissues), whereas there was no significant difference between 483 LUAD tissues and 347 normal tissues (Figure 1). Overall survival was significantly associated with the expression level of lncRNA CCAT1 in LUAD patients (Figure 2, p < 0.01), but not in LUSC patients (Figure 2B, p > 0.05). We also generated expression violin plots based on the patients' pathological stage. The results showed statistically significant differences in gene expression between different pathological stages in LUAD and LUSC patients (Figure 3, p < 0.05).

| Baseline characteristics
This epidemiologic study recruited a total of 1366 participants, including 669 patients with lung cancer and 697 healthy controls, whose demographic characteristics were tabulated. The smoking exposure rate was 51.9% in the patients, whereas it was 31% in the control group, indicating that smoking exposure was a clear risk factor for lung cancer (p < 0.001). In addition, Student's t-test revealed that the age difference between the two groups was statistically significant, with mean ages of 60.51 ± 11.12 and 56.26 ± 14.97 years in the cases and controls, respectively (p < 0.001). Therefore, all further statistical analyses were adjusted for gender, age, and smoking status to control for potential confounders. The genotype frequencies of rs1948915, rs7013433, and rs6983267 in the control group were in Hardy-Weinberg equilibrium (χ² = 1.049, p = 0.306 for rs1948915; χ² = 0.0006, p = 0.980 for rs7013433; χ² = 0.419, p = 0.518 for rs6983267), indicating that the selected subjects were a good representative sample of the general population.
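The Hardy-Weinberg check reported above can be sketched as a goodness-of-fit chi-squared test with one degree of freedom. The code below is a minimal illustration and not the authors' SPSS workflow. The function names and genotype counts are hypothetical, since the per-genotype control counts are not given in this text. For one degree of freedom the p-value has the closed form erfc(sqrt(χ²/2)), so only the standard library is needed.

```python
import math


def chi2_p_value_df1(chi2):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def hwe_chi_square(n_major_hom, n_het, n_minor_hom):
    """Goodness-of-fit test for Hardy-Weinberg equilibrium (biallelic SNP)."""
    n = n_major_hom + n_het + n_minor_hom
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Major allele frequency estimated from the observed genotype counts.
    p = (2 * n_major_hom + n_het) / (2 * n)
    q = 1 - p
    observed = [n_major_hom, n_het, n_minor_hom]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    # Classes with zero expected count (monomorphic SNP) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, chi2_p_value_df1(chi2)


# Hypothetical control genotype counts, for illustration only.
chi2, p_value = hwe_chi_square(30, 40, 30)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")

# Recompute p-values from the chi-squared statistics reported in the text.
for snp, stat in [("rs1948915", 1.049), ("rs7013433", 0.0006), ("rs6983267", 0.419)]:
    print(snp, round(chi2_p_value_df1(stat), 3))
```

The recomputed p-values should agree closely with the reported ones, allowing for rounding of the published χ² statistics.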

| Genotype distribution and lung cancer susceptibility
The relationship between the genotypes of the three SNPs and the susceptibility to lung cancer and NSCLC was summarized in tables.

| SNPs and smoking exposure
The results of the crossover analysis are provided in Table 7. Here, we evaluated the interaction between the selected SNP genotypes and smoking status in lung cancer and NSCLC. We found that smokers, whether carrying protective or risk genotypes, had a significantly increased susceptibility to lung cancer and NSCLC compared with nonsmokers, indicating that there might be a gene-environment interaction. Therefore, we further investigated the interaction using additive and multiplicative models. However, there was no interaction between the selected SNP genotypes and smoking exposure in lung cancer or NSCLC risk in either the additive or the multiplicative model, as summarized in Table 8.
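The additive interaction measures abbreviated in this paper as RERI, AP, and S can be computed from the three crossover odds ratios, each relative to the doubly unexposed reference group. The sketch below uses the standard textbook definitions and assumes that the crossover analysis followed them. The function and argument names are hypothetical, the example ORs are invented, and confidence intervals are omitted.

```python
def additive_interaction(or_gene_only, or_smoke_only, or_both):
    """Additive and multiplicative interaction measures from crossover ORs.

    or_gene_only:  OR for risk genotype without smoking exposure
    or_smoke_only: OR for smoking exposure without the risk genotype
    or_both:       OR for risk genotype together with smoking exposure
    All three are relative to the same reference group
    (protective genotype, no smoking exposure).
    """
    # Relative excess risk due to interaction; 0 means no additive interaction.
    reri = or_both - or_gene_only - or_smoke_only + 1.0
    # Attributable proportion due to interaction; 0 means no additive interaction.
    ap = reri / or_both
    # Synergy index; 1 means no additive interaction.
    denom = (or_gene_only - 1.0) + (or_smoke_only - 1.0)
    s = (or_both - 1.0) / denom if denom != 0 else float("nan")
    # Ratio of the joint OR to the product of single ORs; 1 means no
    # multiplicative interaction.
    mult_ratio = or_both / (or_gene_only * or_smoke_only)
    return {"RERI": reri, "AP": ap, "S": s, "multiplicative_ratio": mult_ratio}


# Invented odds ratios, for illustration only.
print(additive_interaction(2.0, 3.0, 4.0))
```

With these invented inputs the joint effect equals the sum of the separate excess risks, so RERI is 0 and S is 1. That is the pattern expected when there is no additive interaction, as reported here.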

| DISCUSSION
Cancer is a major threat to human health, and lung cancer is the leading killer among all cancers. Given the growing incidence and poor prognosis of lung cancer, we conducted a hospital-based case-control study to assess the association between the three selected SNPs and lung cancer susceptibility. Because gene-environment interaction is important in the development of lung cancer, we further assessed whether there was an interaction between the polymorphisms at the selected loci and smoking exposure by crossover analysis. Through analysis of the GEPIA2 and TCGA databases, we found that CCAT1 expression in LUSC tissues, but not in LUAD tissues, was significantly higher than that in normal tissues. In the overall survival analysis, the Kaplan-Meier curve showed a significant association between overall survival and the expression level of lncRNA CCAT1 in LUAD patients, but not in LUSC patients. A recent study indicated that high expression of lncRNA CCAT1 in NSCLC was correlated with tumor malignancy, and that lncRNA CCAT1 directly inhibited microRNA-218 (miR-218) and indirectly increased the expression of BMI-1 (B lymphoma Mo-MLV insertion region 1 homolog), thereby enhancing tumor growth in NSCLC. 38 The study of Zhao et al. did not divide NSCLC into subtypes (LUAD and LUSC), which may be the reason for the inconsistent results. Further validation studies in large, independent samples are needed before a reliable conclusion can be drawn.
We found that rs1948915 CT/CC significantly decreased the risk of lung cancer in the female stratum, and even in the nonsmoking female population, compared with TT genotype carriers. Moreover, compared with their reference genotypes, rs1948915 CT and rs6983267 GG conferred a lower risk of lung cancer in the population ≤ 58 years old. In our results, rs1948915 C and rs6983267 G are protective alleles. However, there was no association of the three selected SNPs with lung cancer in either the overall population or the other stratified analyses, including the additive and multiplicative models. Thomsen et al. analyzed European patients with multiple myeloma by GWAS and found that the lncRNA CCAT1 rs1948915 CC genotype was closely related to multiple myeloma, 30 indicating that allele C was a risk allele. However, in our present study, the rs1948915 CT/CC genotypes significantly decreased the susceptibility to lung cancer in the female population, compared with the TT genotype. Thus, the C allele appeared protective in lung cancer, in contrast to multiple myeloma. A possible reason is that the mechanisms through which rs1948915 C/T acts in CCAT1 differ between the two cancers. These mechanisms remain unclear.

TABLE 7 Crossover analysis of interaction between the selected SNP genotypes and smoking exposure in lung cancer and NSCLC
Regarding rs6983267 and lung cancer, Park et al. found, through a multicenter case-control study, that rs6983267 GG was closely correlated with the susceptibility to lung cancer in smoking stratification. 33 However, Zhang et al. performed a case-control study in Zhejiang and Fujian provinces, China, and revealed that individuals carrying the GG homozygous genotype had a higher susceptibility to lung cancer than those with the TT homozygous genotype. In our study, there was no correlation between the rs6983267 polymorphism and lung cancer in either the overall or the stratified populations, except among people no more than 58 years old in the age-stratified analysis. Moreover, existing evidence has shown that SNP rs6983267 GG could increase the risk of many cancers (e.g., colorectal cancer, 39,40 gastric cancer, 41 thyroid cancer, 42 etc.). This inconsistency suggests that ethnic or regional differences may be a possible cause. Another possible reason is the small sample size of our study, which might introduce bias. In our stratified analyses, especially the age stratification, the number of subjects in both the case group and the control group is small, which could result in a false positive. Further mechanistic studies are needed to explain these inconsistent results.
Consequently, the key characteristics of SNPs in carcinogenic lncRNAs need to be explored in future studies, to uncover the potential of lncRNAs for early diagnosis and cancer prevention. In this study, we applied clear inclusion and exclusion criteria in selecting newly diagnosed patients with lung cancer, which can effectively avoid Neyman bias. During the statistical analysis, all ORs and 95% CIs were adjusted for gender, age, and smoking status in unconditional logistic regression analysis to reduce confounding bias. Nevertheless, our study has some limitations that should not be ignored. First, although we selected cases and controls from multiple hospitals, Berkson bias is possible in the present study. Second, participants in the control group were recruited from physical examinations at the same hospitals and may not represent the entire healthy population. Third, the sample size of our present study is limited, especially in the stratified subgroups. Therefore, large-scale studies across different ethnicities and regions are needed to verify our results.

| CONCLUSIONS
There may be an association between lung adenocarcinoma and the rs1948915 polymorphism in the Chinese northeast population, while the rs7013433 and rs6983267 genetic variants might have no association with lung cancer. There was no interaction between the three selected SNPs and smoking status in lung cancer.

TABLE 8 Additive interaction between the selected SNP risk genotypes and smoking exposure in lung cancer and NSCLC
Abbreviations: AP, attributable proportion due to interaction; CI, confidence interval; RERI, relative excess risk due to interaction; S, synergy index.

AUTHOR CONTRIBUTIONS
Yangtao Ji conducted experiments and wrote the paper; Yue Yang collected samples and collated data; Zhihua Yin revised the paper. All authors have reviewed the final version of the manuscript and approved its submission.