SNPs in LncRNA genes are associated with non‐small cell lung cancer in a Chinese population

Background It has been indicated that single nucleotide polymorphisms (SNPs) in regions encoding non‐coding transcripts are associated with lung cancer susceptibility. In a previous microarray study, we identified 13 differentially expressed long non‐coding RNAs (lncRNAs) in non‐small cell lung cancer (NSCLC), but the associations of SNPs in these lncRNA genes with lung cancer were unknown. We conducted a case‐control study to address this issue.
Methods Using the TaqMan method, we genotyped 17 SNPs located in the 13 lncRNA genes in 1294 cases with NSCLC and 1729 healthy controls. Unconditional logistic regression and Cox proportional hazards regression were used to analyze the associations of these SNPs with NSCLC risk and patient survival, respectively. These analyses were repeated in subgroups of cases and controls stratified by gender, age group, smoking status, disease stage, and histological type.
Results We identified three SNPs associated with NSCLC risk. For SNP rs498238, the CC genotype was associated with lower risk compared to the TT genotype (adjusted OR = 0.33, 95%CI: 0.11‐0.97, P = 0.043). For rs16901995, the CT/TT genotypes were associated with lower risk compared to the CC genotype in non‐smokers (adjusted OR = 0.78, 95%CI: 0.62‐0.98, P = 0.035). Variant genotypes of rs219741 were associated with increased NSCLC risk in young patients, with an adjusted OR of 1.47 (95%CI: 1.03‐2.10, P = 0.033) compared to the wild-type genotype. No SNPs were found to be associated with overall patient survival in the study.
Conclusion The study suggests that some genetic polymorphisms in lncRNA genes may influence the risk of NSCLC among Chinese.

approximately 80% of all lung cancer cases, and small-cell lung cancer (SCLC). 3 Although tobacco smoking is the major risk factor, 4 the etiology of lung cancer is multifactorial and includes inherited genetic characteristics, such as single nucleotide polymorphisms (SNPs), 5 which help explain an individual's susceptibility to the development of lung cancer. During the past decade, genome-wide association studies (GWAS) have identified many common SNPs associated with the risk and outcome of lung cancer. However, heritability analysis indicated that the identified genetic loci could explain only a small fraction of lung cancer susceptibility. 6 Additional efforts are needed to search for more lung cancer-related genetic factors, especially rare variants and loci in non-coding regions.
Long non-coding RNAs (lncRNAs) are a class of RNA transcripts that are more than 200 nucleotides in length and lack translational capability. LncRNAs have been found to have diverse biological functions, some of which are involved in various tumorigenic processes. 7 A number of dysregulated lncRNAs have also been shown to be potential diagnostic or prognostic biomarkers for lung cancer, such as metastasis associated in lung adenocarcinoma transcript 1 (MALAT1) 8 and HOX antisense intergenic RNA (HOTAIR), 9 which are overexpressed in NSCLC and recognized as onco-lncRNAs. In contrast, maternally expressed gene 3 (MEG3), 10 taurine-upregulated gene 1 (TUG1), 11 and BRAF-activated non-protein coding RNA (BANCR), 12 which are downregulated in NSCLC, are considered tumor suppressors. These dysregulated lncRNAs are involved in the regulation of cell growth, proliferation, migration, and invasion.
Evidence also indicates that SNPs in lncRNA genes affect tumorigenic processes and chemotherapy response. Gong et al 13 found that SNPs in HOTTIP, H19, and CCAT2 were associated with lung cancer risk, and SNPs in MALAT1, H19, CCAT2, HOTAIR, and ANRIL were related to lung cancer patients' response to platinum-based chemotherapy. Yuan et al 14 conducted a meta-analysis of eight GWAS on subjects with European ancestry and discovered that rs114020893 in the lncRNA NEXN-AS1 was associated with lung cancer risk. This SNP may influence lung cancer susceptibility through genotype-specific differences in secondary structure stability. Hu et al 15 reported a SNP in CASC8 associated with lung cancer risk as well as chemotherapy response and toxicity.
Findings from the above studies indicate that identifying SNPs in lncRNA genes associated with lung cancer may help to elucidate the biological mechanisms of lncRNAs in lung cancer. Currently, our knowledge of the involvement of lncRNAs in lung cancer is still limited; more studies are needed to discover SNPs in lncRNAs that are associated with lung cancer risk or outcome. Based on the findings of our previous study on lncRNAs in NSCLC, 16 we conducted a case-control study on SNPs of the lncRNAs that were differentially expressed between tumor and matched adjacent normal tissues. In this study, we analyzed the association of lung cancer with 17 SNPs in 13 selected lncRNAs. We also investigated these SNPs in relation to lung cancer survival. Results of our association study are described in this report.

| SNP selection and genotyping
In our previous study, 16  Our genotyping method has been described elsewhere. 18 In brief, genomic DNA was extracted from peripheral blood leukocytes of cases and controls using the standard phenol-chloroform method. SNP genotyping was performed with the TaqMan assay on the ABI 7900 FAST real-time polymerase chain reaction (PCR) system (Thermo Fisher Scientific, Waltham, MA, USA). All primers and probes were purchased from Thermo Fisher Scientific. Ten percent of the DNA samples were randomly selected for replication, and the results of the repeats were in complete concordance.

| Statistical analysis
Distributions of subject characteristics and genetic polymorphisms were compared between cases and controls using the chi-square test. Student's t test was used to compare continuous variables between groups. Hardy-Weinberg equilibrium was tested for each SNP in the control subjects. To balance the distributions of age and gender between the case and control groups, propensity score matching (PSM) analysis was conducted. Associations between SNPs and NSCLC risk were analyzed using the unconditional logistic regression model. Odds ratios (OR) and 95% con-
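The Hardy-Weinberg check mentioned above can be illustrated with a minimal sketch. It assumes a biallelic SNP and a 1-degree-of-freedom chi-square goodness-of-fit test; the function name `hwe_chi_square` and the genotype counts are hypothetical and are not taken from the study, whose exact test procedure is not specified here.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg equilibrium.

    Arguments are observed genotype counts for a biallelic SNP.
    Assumes both alleles are present (otherwise an expected count is zero).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts (illustrative only, not study data).
chi2, p_value = hwe_chi_square(810, 740, 179)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

A P value above 0.05 in the control group would be read as no evidence of departure from equilibrium.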

| Study population
The demographic characteristics of the initial 1294 cases and 1729 controls were summarized in

| Associations of SNPs and NSCLC risk
Allele distributions of the 17 SNPs selected for study were all in Hardy-Weinberg equilibrium in the control group (P > 0.05,  to those with CC genotype (adjusted OR = 0.78, 95%CI: 0.62-0.98, P = 0.035; Table 4). Similarly, when analyzing the relationship in subgroups, we found that SNP rs219741 was associated with increased risk of NSCLC among younger subjects (age < 60 years). The adjusted OR was 1.47, and 95%CI was between 1.03 and 2.10 (P = 0.033).
But subjects with CC genotype had a reduced risk in a recessive model (adjusted OR = 0.74, 95%CI: 0.54-1.00, P = 0.050). There was no significant difference in the dominant model, nor in stratified analyses.
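As a rough consistency check on the reported estimates, the standard error of the log odds ratio can be back-calculated from a 95% confidence interval and used to recompute a two-sided Wald P value. The sketch below assumes a symmetric Wald interval on the log scale; the helper name `wald_from_ci` is hypothetical, and because the published ORs and CIs are rounded, the recomputed P values agree with the reported ones only approximately.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover SE(log OR), the Wald z statistic, and a two-sided P value
    from an odds ratio and its 95% confidence interval (assumed Wald-type)."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return se, z, p_value


# Estimates as reported in the text (OR, lower bound, upper bound).
reported = {
    "rs16901995, non-smokers": (0.78, 0.62, 0.98),
    "rs219741, age < 60 years": (1.47, 1.03, 2.10),
}

for label, (or_, lo, hi) in reported.items():
    se, z, p = wald_from_ci(or_, lo, hi)
    print(f"{label}: SE = {se:.3f}, z = {z:.2f}, P = {p:.3f}")
```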

| Associations of SNPs and NSCLC outcome
Patient characteristics and clinical features are shown in Table S3.
Survival analysis was performed to assess the genotypes of the four selected SNPs in association with NSCLC outcome (Table 5). The analysis showed no significant associations between these genotypes and NSCLC overall survival before or after adjustment for age, gender, smoking status, disease stage, and histological type. To further investigate the association of SNPs with NSCLC survival in patients with different clinical characteristics, we conducted stratified analyses in the dominant model (Table S4). The results showed that rs219741 was associated with survival only in patients with lung adenosquamous carcinoma (ASC). However, the sample size (deaths/patients: 19/23 vs 5/6, in GG vs GA + AA genotypes, respectively) was too small to draw a conclusion.

| DISCUSSION
In this study, we evaluated 17 SNPs in 13 lncRNAs with regard to their associations with NSCLC risk and survival. We found that NSCLC risk was significantly associated with SNPs rs3113503, rs498238, rs16901995, and rs219741. These SNPs are located in initially named as loc100130502, is predicted to stay mainly in the nucleus of A549 cells. 27 In the NCBI GEO database, loc100130502 was shown to be upregulated in NSCLC tumors compared to matched adjacent non-tumor tissues of non-smoking women in one dataset, GSE19804 (Figure 1A), but showed no difference in another dataset, GSE18842 (Figure 1B). The LINC01833 gene is located close to the gene SIX3, and this non-coding transcript is considered a Wnt/β-catenin pathway-related lncRNA. 28 SIX3 was reported to inhibit the pathway in the development of vertebrate forebrain. 29 Kumar et al 30  is present mainly in cell nucleus, 27 and significant downregulation was observed in NSCLC when we analyzed the online datasets GSE19804 (Figure 1C) and GSE18842 (Figure 1D). No expression information was found for lnc-NDUFS6-5:5 (rs16901995) and loc105369301 (rs219741). The LncRNASNP database indicates that SNP rs219741 may change the secondary structure of the lncRNA lnc-CHAF1B-3:1 (Figure 1E for wild type and Figure 1F for mutant type).
Our data suggest that SNP rs498238 and rs3113503 may have allele-specific influences on lncRNA expression in NSCLC.
The SNPs we investigated in this study were selected from a list of lncRNAs that showed significant differences in expression between NSCLC tumor and matched adjacent normal tissues. The initial analysis of lncRNAs was performed with an expression microarray, and the study population was Han Chinese. Thus, the findings of our SNP analysis are likely to be limited to Chinese populations and to the lncRNAs included on the microarray chip.
In addition to these limitations, our sample size for analyzing the SNP associations was relatively small, and we performed neither independent validation nor P value adjustment for multiple comparisons. We also did not perform any functional experiments to demonstrate the biological relevance of these SNPs in NSCLC. Despite these shortcomings, our preliminary data suggest that SNPs in non-coding regions, especially in lncRNA genes, may have implications for cancer etiology. More studies are needed to characterize these non-coding region SNPs and to elucidate their biological relevance and molecular mechanisms in relation to lncRNA function and tumorigenesis.
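To make the point about unadjusted P values concrete, the following sketch applies a Bonferroni threshold to the nominal P values reported in the abstract. It assumes only 17 tests (one per SNP), which understates the true number because subgroup and genetic-model analyses were also run; the choice of Bonferroni is an illustration and not a method used in the study.

```python
# Nominal P values as reported in the abstract.
reported_p = {
    "rs498238": 0.043,
    "rs16901995": 0.035,
    "rs219741": 0.033,
}

alpha = 0.05
n_tests = 17  # assumption: one test per genotyped SNP

# Bonferroni: each test is compared against alpha divided by the test count.
threshold = alpha / n_tests

survivors = [snp for snp, p in reported_p.items() if p < threshold]

print(f"Bonferroni threshold: {threshold:.5f}")
print(f"SNPs below threshold: {survivors}")
```

Under this assumption none of the nominal associations would remain below the adjusted threshold, which is consistent with treating the findings as preliminary.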
In summary, we analyzed 17 SNPs in the genes of lncRNAs with differential expression in NSCLC and identified three of them as associated with the risk of NSCLC among Chinese. These findings suggest that SNPs in non-coding regions of the genome, like those in coding regions, may be important. Further analysis of this type of SNP may provide new insights into the functions of lncRNAs and their involvement in cancer.