SNP interactions of PGC with its neighbor lncRNAs enhance the susceptibility to gastric cancer/atrophic gastritis and influence the expression of involved molecules

Abstract Multidimensional interactions among multiple factors are more important in promoting cancer initiation. Gene‐gene interactions between protein‐coding genes have received great attention, while few studies have addressed interactions between coding and noncoding genes. Our research group previously found that polymorphisms in the protein‐coding gene PGC could affect susceptibility to atrophic gastritis (AG) and gastric cancer (GC). Interestingly, several SNPs in long noncoding RNA (lncRNA) genes adjacent to PGC were subsequently found to be associated with AG risk and GC prognosis. This study aimed to explore SNP interactions between PGC and its neighbor lncRNAs in relation to the risk of AG and GC. Genotyping of seven PGC SNPs and seven lncRNA SNPs was conducted using the Sequenom MassARRAY platform in a total of 2228 northern Chinese subjects, including 536 GC cases, 810 AG cases, and 882 controls. We found that 15 pairs of PGC‐lncRNA SNPs showed interactions: five pairs were associated with AG risk, and ten pairs were associated with GC risk. Moreover, two GC‐related interactions, those of PGC rs6939861 with lnc‐C6orf‐132‐1 rs7749023 and rs7747696, survived the Bonferroni correction (P correction = 0.049 and 0.007, respectively). Several combinations showed obvious epistasis and cumulative effects on disease risk. Some three‐way interactions of SNPs with smoking and drinking were also observed. In addition, a few interacting SNPs were correlated with the expression levels of PGC protein and related lncRNAs in serum. Our study provides research clues for further screening of combination biomarkers that unite protein‐coding and noncoding genes and have potential for predicting susceptibility to GC and its precursor.


| INTRODUCTION
Single nucleotide polymorphisms (SNPs), the most common form of genetic variation, have been extensively investigated as potential biomarkers for cancer risk prediction. [1][2][3] However, the diagnostic efficacy of a single SNP is limited because multiple factors are involved in carcinogenesis. 4 There is a consensus that multidimensional interactions among various factors, such as gene-gene and gene-environment interactions, are more important in promoting cancer initiation. Knowledge of gene-gene and gene-environment interactions could help to reveal substantial hidden heritability within the architecture of cancer susceptibility. 5 In recent years, most investigations of gene-gene interactions have focused on protein-coding genes, while few studies have addressed interactions between coding and noncoding genes. Among the sequences constantly transcribed in the human genome, only 1% are protein-coding sequences, and the vast majority are noncoding RNA (ncRNA). 6 The number and types of known functional ncRNAs have increased considerably. A subset of both short and long species are known to be involved in the regulation of target genes located at or near the same genomic locus. Their expression is often coordinated with that of nearby protein-coding genes, and in many cases, related transcripts can influence each other at one step or another during their biogenesis. 7 Therefore, exploring interactions between coding genes and their neighbor noncoding genes would greatly benefit a comprehensive understanding of how genes affect physiological and disease states. PGC protein, encoded by the pepsinogen C (PGC) gene, is a specific marker of the terminal differentiation of gastric mucosa, and its aberrant expression occurs in many gastric diseases. [8][9][10] Our research group previously found that PGC polymorphisms could affect susceptibility to atrophic gastritis (AG) and gastric cancer (GC). 11,12 Interestingly, several SNPs in long noncoding RNA (lncRNA) genes adjacent to PGC were subsequently found to be associated with AG risk and GC prognosis. 13 However, it remains unclear whether SNPs in PGC and its neighbor lncRNAs interact with each other in susceptibility to GC/AG.
In this study, we explored the SNP interactions between PGC and its neighbor lncRNAs on the risk of AG and GC, the modifying effects of environmental factors such as smoking, drinking, and Helicobacter pylori (H. pylori) infection, and the influence of SNP interactions on the expression of PGC protein and related lncRNAs. Our study aims to provide research clues for the identification of combination biomarkers uniting both protein-coding and noncoding genes with the potential in risk prediction of GC and its precursor.

| Study subjects and epidemic-clinical information collection
The study was approved by the Ethics Committee of China Medical University First Hospital. Written informed consent was obtained from all participants. A total of 2228 subjects were involved in our study, including 536 GC cases, 810 AG cases, and 882 controls. All enrolled individuals were recruited between 2002 and 2013 from the Zhuanghe Gastric Diseases Screening Program or from hospitals in Zhuanghe and Shenyang, Liaoning Province, China, as previously reported. 14 The controls were frequency-matched to the GC and AG cases, respectively, on the basis of gender and age (±5 years). That is, a control could be matched to an AG case and a GC case simultaneously, provided that the control was of the same sex as both cases and differed in age from each of them by no more than 5 years. Epidemiological data for each subject were obtained by face-to-face interview or from inpatient medical records. After admission, gastroscopy was performed by experienced endoscopists. Histopathological diagnoses were made independently by two gastrointestinal pathologists according to the updated Sydney system and the seventh edition of TNM staging. [15][16][17] Patients in the AG group were confirmed to have moderate to severe AG with or without intestinal metaplasia, and individuals confirmed to have a normal stomach or mild superficial gastritis were selected for the control group. Fasting venous blood samples (5 mL) were collected from each participant.

| Genotyping
Genomic DNA was extracted from each blood sample using the phenol-chloroform method. SNP genotyping was performed by Bio Miao Biological Technology (Beijing, China) using the Sequenom MassARRAY platform (Sequenom, San Diego, CA, USA). Additionally, we randomly selected 10% of the samples for repeated assays, and the results of all duplicated samples were 100% consistent.

| Detection of H. pylori-IgG titer, PGC protein, and lncRNAs in serum
The serum H. pylori-IgG titer and PGC protein concentration were detected using an enzyme-linked immunosorbent assay (ELISA kit, Biohit, Helsinki, Finland). Individuals with a titer > 34 IU were diagnosed as H. pylori-positive. Total RNA was isolated from 400 μL of serum using a Blood Total RNA Isolation Kit (Bioteke, Beijing, China) and converted into complementary DNA using PrimeScript RT Master Mix (TaKaRa Biotech, Dalian, China). The lncRNAs and an internal control gene, glyceraldehyde 3-phosphate dehydrogenase (GAPDH), were quantified using SYBR Premix Ex Taq II (TaKaRa Biotech, Dalian, China). Quantitative real-time polymerase chain reaction (qRT-PCR) was performed in an Eppendorf Mastercycler Gradient System (Eppendorf AG, Hamburg, Germany) according to the manufacturer's protocol. The sequences of the primers used in qRT-PCR are presented in Table S1. All the primers were synthesized by The Beijing Genomics Institute (Beijing, China). Melting curve analysis was performed to exclude the presence of nonspecific products and primer dimers. No-template controls were included in each experiment. The relative quantification of lncRNA levels was calculated using the 2^(−ΔCt) method.
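The relative quantification step can be illustrated with a short sketch. It assumes the usual convention that ΔCt is the Ct of the target lncRNA minus the Ct of the reference gene (GAPDH here); the function name and the Ct values are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Relative level of a target transcript by the 2^(-dCt) method.

    dCt = Ct(target) - Ct(reference); a larger dCt means lower expression.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values for illustration only (not study data).
lncrna_ct = 28.0
gapdh_ct = 20.0
print(relative_expression(lncrna_ct, gapdh_ct))
```

Because the result is a ratio to the reference gene within each sample, values from different samples can be compared directly, provided the reference gene is stably expressed.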

| Statistical analysis
The differences in epidemiological characteristics between case and control groups were evaluated using the chi-squared test. Multinomial logistic regression was applied to estimate the risk of gastric diseases by calculating odds ratios (ORs) with 95% confidence intervals (CIs). The log-likelihood ratio test was employed to assess the interactions among PGC SNPs, lncRNA SNPs, and environmental factors by comparing the model that contained only the main effects of each factor with the full model that also contained interaction terms. The ORs with 95% CIs were adjusted for gender, age, and H. pylori infection status unless H. pylori was regarded as an interaction term. The Cochran-Armitage test for linear trend was used to assess the dosage effect on disease risk with an increasing number of interacting factors. Differences in serum PGC protein and lncRNA levels between two groups were compared using Student's t test. The statistical analyses mentioned above were conducted using SPSS 22.0 software (SPSS, Chicago, IL, USA). All tests were two-sided, and P < 0.05 was considered statistically significant. The Bonferroni correction was used to adjust P values for multiple tests as needed. Additionally, the dominant, recessive, and overdominant models were defined as heterozygote + homozygote variant vs homozygote wild, homozygote variant vs heterozygote + homozygote wild, and heterozygote vs homozygote wild + homozygote variant, respectively. 18
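The model comparison behind the likelihood ratio test can be written out for a pairwise SNP interaction. The notation below is an assumption introduced for illustration: $G_{PGC}$ and $G_{lnc}$ are binary genotype indicators under one of the genetic models, $\mathbf{z}$ collects the adjustment covariates (gender, age, H. pylori status), and $D$ is case status. The analysis in this study used multinomial logistic regression in SPSS, so this binary-outcome form is a simplified sketch rather than the exact specification.

```latex
\begin{align}
\text{reduced:}\quad \operatorname{logit} P(D=1) &= \beta_0 + \beta_1 G_{PGC} + \beta_2 G_{lnc}
    + \boldsymbol{\gamma}^{\top}\mathbf{z} \\
\text{full:}\quad \operatorname{logit} P(D=1) &= \beta_0 + \beta_1 G_{PGC} + \beta_2 G_{lnc}
    + \beta_3\, G_{PGC}\, G_{lnc} + \boldsymbol{\gamma}^{\top}\mathbf{z} \\
\Lambda &= -2\left(\ell_{\text{reduced}} - \ell_{\text{full}}\right)
    \;\sim\; \chi^2_{1} \quad \text{under } H_0\colon \beta_3 = 0
\end{align}
```

Here $\ell$ denotes the maximized log-likelihood of each model. With a single binary-coded interaction term the test has one degree of freedom, and $\exp(\beta_3)$ measures the departure from multiplicative joint effects on the odds scale.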

| Baseline characteristics of the subjects
The study subjects consisted of 810 AG cases matched with 880 controls and 536 GC cases matched with 748 controls. No significant difference was found in the distribution of gender and age between the two pairwise groups of cases and controls (Table S2).

| Association of single SNPs with AG and GC risk
A total of 14 SNPs were involved in the study, of which the relationship with the susceptibility to GC/AG had been [...] However, no association with any disease risk was found for rs6941539 and rs6912200. 11,12 As for the seven lncRNA polymorphisms, rs61516247 was suggested to be associated with AG risk in the overall population, while none was observed to be associated with GC risk. 13

| Epistasis and cumulative effects of the interacting SNPs on AG and GC risk
We further examined epistasis in the pairwise interacting SNPs. When the PGC rs6912200 CC genotype was present, the lnc-C6orf-132-1 rs7747696 AG+GG genotype conferred the highest increase in risk among AG-related combinations, at 1.79-fold (P = 0.006, OR = 1.79). For GC risk, when the PGC rs6939861 GA+AA genotype was present, the lnc-C6orf-132-1 rs7747696 AG+AA genotype elevated the risk most markedly (P = 0.023, OR = 2.06, Table 2).
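The idea of examining one SNP within strata of another can be illustrated with a crude calculation. The sketch below computes an unadjusted odds ratio with a Woolf (log-scale) confidence interval from a 2x2 table in each stratum. It is not the adjusted logistic regression used in the study, and the function name, stratum labels, and all counts are hypothetical.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    All four cells must be non-zero, otherwise the estimate is undefined.
    """
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                          + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (NOT study data). Within each stratum of the first SNP,
# "exposed" means carrying the variant genotype of the second SNP.
strata = {
    "first SNP wild-type genotype": (60, 40, 50, 60),
    "first SNP variant genotype": (30, 30, 35, 35),
}
for label, counts in strata.items():
    or_, lo, hi = odds_ratio_ci(*counts)
    print(f"{label}: OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An odds ratio that differs clearly between strata is the pattern described as epistasis here; a formal assessment still requires the interaction test and covariate adjustment.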
To further evaluate the diagnostic efficacy of these two-way combinations, we calculated their cumulative ORs. The samples were divided into three subgroups according to the number of interacting SNPs that each individual carried. Four pairs of PGC-lncRNA SNPs showed significant dosage effects. Three of them conferred an elevated GC risk with an increasing number of risk genotypes, including rs6939861-rs7747696 (P trend = 0.043), rs6939861-rs72855279 (P trend = 0.048), and rs6939861-rs80112640 (P trend = 0.044); only rs6939861-rs61516247 had the opposite effect on GC risk (P trend = 0.049, Table 3).
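The trend test used for these dosage effects can be sketched in a few lines. The example assumes scores of 0, 1, and 2 for the three subgroups and uses the standard large-sample form of the Cochran-Armitage statistic; the function name and the counts are hypothetical, and the study itself ran the test in SPSS.

```python
import math


def cochran_armitage_trend(cases, totals, scores=None):
    """Cochran-Armitage test for linear trend in proportions.

    cases[i] / totals[i] is the case proportion in ordered group i.
    Returns the z statistic and a two-sided p-value (normal approximation).
    """
    if scores is None:
        scores = list(range(len(cases)))  # assumed scores 0, 1, 2, ...
    n_total = sum(totals)
    p_bar = sum(cases) / n_total  # overall case proportion
    # Score-weighted deviation of observed from expected case counts.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, cases, totals))
    sum_ns = sum(n * s for s, n in zip(scores, totals))
    sum_ns2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_total)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (NOT study data): cases and totals among subjects
# carrying 0, 1, or 2 risk genotypes.
z, p = cochran_armitage_trend(cases=[10, 15, 20], totals=[30, 30, 30])
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

A positive z indicates that the case proportion rises with the number of risk genotypes, and a negative z indicates the opposite direction.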

| Interactions of three dimensions among the SNPs and environmental factors
We next investigated three-way interaction effects among the interacting PGC-lncRNA SNPs and environmental factors, including H. pylori infection, smoking, and drinking. For AG risk, two combinations demonstrated positive interactions with smoking: rs9471643-rs7749023 (P interaction = 0.045, interaction index = 3.19) and rs9471643-rs7747696 (P interaction = 0.020, interaction index = 3.81, Table S3). A negative interaction was found between rs6941539-rs7738341 and drinking for GC risk (P interaction = 0.049, interaction index = 0.17, Table S4). However, no significant effect on disease risk was observed for any three-way combination of PGC SNP, lncRNA SNP, and H. pylori (P interaction > 0.05, Table S5).
The cumulative ORs of the three-way interacting combinations for disease risk were also calculated. No significant dosage effect was indicated in them (Table S6).

| Correlations of the interacting SNPs with PGC protein expression levels
To explore the possible mechanism underlying the SNP interactions of PGC with its neighbor lncRNAs, we analyzed the influence of the interacting SNPs on PGC protein expression in serum. Among AG-related combinations, the PGC rs6912200 CT+TT genotype showed a significantly lower PGII concentration than the CC genotype in both total subjects and controls when lnc-C6orf132-1 rs7749023 and rs7747696 had the AA genotype (rs6912200-rs7749023 in total: P = 0.027; rs6912200-rs7749023 in control: P = 0.013; rs6912200-rs7747696 in total: P = 0.021; and rs6912200-rs7747696 in control: P = 0.014, Table 4).

| Correlations of the interacting SNPs with lncRNA expression levels
The association between single lncRNA SNPs and expression of the three involved lncRNAs had not been clarified before, and thus, we investigated their expression levels in four genetic models of each SNP. Only rs7749023 was found to be associated with lncRNA expression. Its AC+AA genotype had a significantly higher level of lnc-C6orf132-1 when compared with CC genotype (P < 0.001, Table S7).
We next explored the influence of the interacting SNPs on lncRNA expression in different disease groups. The levels of lnc-C6orf132-1 and lnc-LRFN-2 were correlated with several GC-related combinations. Notably, the AC+AA genotype of rs7749023 showed a higher level of lnc-C6orf132-1 in total subjects only in the presence of rs6939861 GA+AA genotype (P < 0.001), while no difference was observed when rs6939861 had GG genotype.
When the GA+GG genotype of rs61516247 was present, the expression level of lnc-LRFN-2 in total subjects was higher in PGC rs6939861 GA+AA genotype than GG genotype (P = 0.042, Table 5).

| DISCUSSION
In the present study, we found that SNP interactions between PGC and its neighbor lncRNAs could enhance susceptibility to GC/AG. Among the 15 pairs of interacting PGC-lncRNA SNPs, five pairs were associated with AG risk and ten pairs were associated with GC risk. Furthermore, several combinations showed obvious epistasis and cumulative effects on disease risk. Building on these results, three-way interactions were discovered when environmental factors were taken into account. We also found that the interacting SNPs could affect the expression of PGC protein and the involved lncRNAs. To our knowledge, this is the first report of SNP interactions between protein-coding genes and neighbor noncoding genes for the risk of gastric diseases.
The human body is a complex organism with tens of thousands of genes, both coding and noncoding. Each exerts a different function, and they cooperate with one another to coordinate normal life activities. Exploring interactions of genetic variation between coding genes and their neighbor noncoding genes can reveal more about crosstalk between genes and improve understanding of the mechanisms of mutual regulation. The effect of an individual SNP on disease risk is usually reported to be weak (OR < 1.5), whereas a combination of interacting SNPs can have a moderate (OR ≥ 1.5) or strong (OR ≥ 2) effect on susceptibility to cancer. 19,20 In our previous single-SNP studies, the seven PGC SNPs and seven lncRNA SNPs involved here were investigated; PGC rs6939861 was associated with a weakly increased GC risk (OR = 1.32), while rs6941539 and rs6912200 had no association with any disease risk. 11,12 None of the lncRNA SNPs had an effect on GC risk. 13 In the present combined analysis, we found that PGC rs6941539 combined with lnc-C6orf-132-1 rs72855279 and rs80112640 had interaction ORs of 4.65 and 4.60 for GC risk; PGC rs6912200 combined with rs72855279 and rs80112640 had interaction ORs of 6.34 and 6.27; and PGC rs6939861 paired with lnc-C6orf-132-1 rs7749023, rs7747696, rs72855279, and rs80112640 had interaction ORs of 3.87, 4.88, 5.70, and 5.76 for GC risk, respectively. All of these risk effects are strong and greater than the individual effects of the related SNPs, suggesting that PGC-lncRNA SNPs could synergistically enhance susceptibility to GC and serve as more effective markers for risk prediction. Notably, two interactions, rs6939861-rs7749023 and rs6939861-rs7747696, also survived the Bonferroni correction, a strict method for multiple comparisons. Given their strong significance for GC, we further calculated the population attributable fraction (PAF) to assess their clinical or public health value. The RRs were 1.19 and 1.22, and the PAFs were 0.093 and 0.105, respectively. These results suggest that about 9.3% and 10.5% of GC cases in our study might be attributable to carrying the combined risk genotypes of PGC rs6939861 with lnc-C6orf-132-1 rs7749023 and rs7747696, respectively. These statistics may guide studies in larger populations and indicate the potential value of combined detection of polymorphisms in PGC and its neighbor lncRNAs for early GC screening.
Among the studied polymorphisms, some SNPs made no significant contribution to GC/AG risk in the main effect analysis. For example, the PGC rs6912200 CT+TT genotype showed no association with AG/GC risk compared with the CC genotype (P = 0.637, OR = 1.06; P = 0.547, OR = 0.93, respectively). 11 However, when it was combined with some lncRNA SNPs, obvious epistasis was observed (P = 0.026, OR = 1.46; P = 0.026, OR = 0.16, respectively). Similarly, lnc-C6orf-132-1 rs7747696 alone was not associated with AG/GC (P = 0.086, OR = 1.19; P = 0.670, OR = 1.05, respectively). 13 But in the presence of the PGC rs6912200 CC genotype, the rs7747696 AG+AA genotype was linked to a moderate 1.79-fold increase in AG risk, the highest among AG-related SNPs. For GC risk, it showed a uniquely strong effect (OR = 2.06) when the PGC rs6939861 GA+AA genotype was present. The two SNPs also demonstrated a significant cumulative effect, suggesting that they cooperate to confer GC susceptibility. Therefore, the lnc-C6orf-132-1 rs7747696 AG+GG genotype combined with the PGC rs6912200 CC genotype and with the rs6939861 GA+AA genotype might be the superior SNP models for determining AG and GC risk, respectively.
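The population attributable fraction discussed above can be illustrated with a small sketch. The text does not state which PAF formula was applied, so the example assumes Levin's formula, one common choice, in which the PAF depends on the prevalence of the risk genotype combination in the population and on the relative risk. The function name and the prevalence value are hypothetical.

```python
def paf_levin(exposure_prevalence: float, relative_risk: float) -> float:
    """Population attributable fraction by Levin's formula.

    PAF = p * (RR - 1) / (1 + p * (RR - 1)), where p is the proportion of
    the population carrying the exposure (here, a risk genotype combination).
    """
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical prevalence (NOT study data) paired with an RR of the same
# order as those reported in the text.
print(round(paf_levin(exposure_prevalence=0.5, relative_risk=1.2), 3))
```

Under this formula a modest relative risk can still yield a noticeable PAF when the risk genotype combination is common, which is why PAF is often reported alongside ORs when judging screening value.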
Apart from host genetics, environmental factors also play critical roles in the development of gastric diseases. In this study, two environmental factors, smoking and drinking, were found to have modifying effects on PGC-lncRNA SNP interactions. The carcinogenic effect of tobacco smoke on various organs is well recognized, and it accounts for about a 50% increase in GC risk. 21,22 Many previous studies have investigated interactions between other genes and smoking in GC, including TNF, Exo1, CYP1A1, IL-10, ERCC8, GSTP1, and hTERT. [23][24][25][26][27][28][29] They all suggest that the effects of gene polymorphisms on gastric carcinogenesis can be exacerbated by cigarette smoking. Here, the interactions of PGC rs9471643 with lnc-C6orf-132-1 rs7749023/rs7747696 on AG risk could also be affected by smoking, although the mechanism is not yet understood. Like tobacco smoke, alcohol drinking is a well-acknowledged independent risk factor for GC and has been reported to have interactions with genetic variations in several metabolic enzyme genes such as GSTM1 and ALDH2. 30,31 Alcohol is initially metabolized to an intermediate metabolite, acetaldehyde, which is further metabolized and eliminated from the body. 32 Reactive oxygen species (ROS) are produced during the generation of NADH from the conversion of ethanol to acetaldehyde by alcohol dehydrogenase and may induce gastric mucosal oxidative injury. 33,34 A modifying effect of drinking on SNP interactions for GC risk was also demonstrated in our study, for PGC rs6941539 and lnc-C6orf-132-1 rs7748341. However, whether PGC and its neighbor lncRNAs participate in alcohol metabolism needs verification.
The effects of gene polymorphisms on cancer susceptibility are often achieved by affecting the expression of the encoded protein. In the present study, we evaluated the influence of interacting SNPs on PGC protein expression in serum. For PGC rs6912200 with lnc-C6orf132-1 rs7749023, the CT/TT+AA genotype significantly reduced the PGII level compared with the CC+AA genotype, suggesting that PGC rs6912200 could affect serum PGC expression. Our research group has also found that healthy subjects carrying the rs6912200 CT, TT, and CT/TT variant genotypes have lower serum expression levels of PGC protein. 11 However, the difference was not observed in subjects carrying the rs7749023 AC/CC genotype, indicating that lnc-C6orf132-1 rs7749023 might counteract rs6912200 and upregulate PGC expression. It has been revealed that SNP interactions between PGC and some host genes such as IL1B and PTPN11 may result from the alteration of PGC protein expression. PGC can also interact with polymorphisms in miRNAs that target it, including let-7e, miR-4795, and miR-365b. These miRNAs can bind to the 3′-UTR of PGC and inhibit its expression. 35 Our study is the first to report that SNP interactions of PGC with its neighbor lncRNAs could also affect PGC expression.
PGC protein is a well-known marker of gastric epithelial cell differentiation. It serves as a proteinase involved in protein digestion in the stomach, and its levels decrease significantly in AG and dysplasia, implying that the cells are poorly differentiated and more susceptible to GC. 10 Furthermore, the serum PGII level has proven to be a promising biomarker for the diagnosis of GC and AG in recent years. 36,37 Therefore, PGC protein is closely related to malignancy of the gastric mucosa and could help recognize the risk of malignant gastric lesions. Based on the above findings, we speculate that SNP interactions between PGC and its neighbor lncRNAs may enhance susceptibility to GC/AG through their influence on PGC expression.
The association of the studied SNPs with lncRNA expression was also investigated. One single lncRNA SNP, rs7749023, was found to affect the expression level of lnc-C6orf132-1. Interestingly, when rs7749023 was combined with PGC rs6939861, the influence on lnc-C6orf132-1 expression was observed only in the presence of the rs6939861 GA+AA genotype and not the GG genotype, suggesting that the expression of the involved lncRNAs might also be affected by their SNP interactions. Our lncRNA expression profiling and Gene Ontology (GO) analysis have shown that lnc-C6orf132-1 can upregulate or downregulate the expression of some oncogenes or tumor suppressor genes related to GC progression. 13 As an important class of molecular regulators in the human genome, lncRNAs can influence the expression of nearby genes through transcription-related processes such as enhancing the activity of gene promoters, which is called cis-acting regulation. 38 In our study, three interacting PGC SNPs are located in the promoter region: rs6912200, rs6941539, and rs9471643. The neighbor lncRNAs may exert regulatory roles on PGC in cis, act together with the SNPs in the PGC promoter, and thus demonstrate gene-gene interactions. Further in-depth study is needed to elucidate the molecular mechanism involved.
FIGURE 1 The pattern diagram of SNP interactions between PGC and its neighbor lncRNAs. A total of 14 individual SNPs are involved, including seven in PGC, five in lnc-C6orf132-1, one in lnc-LRFN2-1, and one in lnc-LRFN2-2. The effects of interacting SNPs modified by environmental factors could enhance the susceptibility to GC/AG and influence the expression of PGC protein and related lncRNAs.
In summary (Figure 1, Table S8), we conducted a case-control study to explore the SNP interactions between PGC and its neighbor lncRNAs for the risk of GC and AG, the modifying effects of environmental factors, and the influence of SNP interactions on the expression of PGC protein and the involved lncRNAs. A total of 15 pairs of interacting PGC-lncRNA SNPs were discovered, of which five pairs were associated with AG risk and ten pairs were associated with GC risk. By comparing the epistasis and cumulative effects, superior SNP diagnostic models for AG and GC were identified, respectively. Some three-way interactions of SNPs with smoking and drinking were also observed. In addition, a few interacting SNPs were correlated with the expression levels of PGC protein and related lncRNAs in serum, which might account for their gene-gene interactions in GC/AG. Our study provides research clues for further screening of combination biomarkers that unite protein-coding and noncoding genes and have potential for predicting susceptibility to GC and its precursor.