Characterization of lncRNA LINC00520 and functional polymorphisms associated with breast cancer susceptibility in Chinese Han population

Abstract
Background: The aim was to evaluate the association between LINC00520 genetic polymorphisms and breast cancer (BC) susceptibility.
Methods: Nine single-nucleotide polymorphisms (SNPs) in LINC00520 were genotyped in 504 BC patients and 505 cancer-free controls from the Chinese Han population to study the relationship between LINC00520 polymorphisms and BC susceptibility. qRT-PCR and luciferase assays were used to explore how rs12880540 affected the expression of LINC00520.
Results: The genotype GG (OR:3.58, 95%CI:1.32-9.69) in rs8012083 increased the risk of triple-negative BC. The genotype GG (OR:0.31, 95%CI:0.14-0.69) in rs8012083, the genotype AA (OR:2.74, 95%CI:1.01-7.42) in rs2152275, and the genotype TG (OR:1.62, 95%CI:1.04-2.52) in rs12880540 were associated with HER-2 status. The dominant (OR:0.65, 95%CI:0.45-0.95) and overdominant (OR:0.67, 95%CI:0.46-0.98) genetic models consistently showed that rs11622641 T was significantly associated with a lower risk of BC. In contrast, the recessive genetic model (OR:1.57, 95%CI:1.07-2.30) of rs12880540 and the dominant (OR:1.62, 95%CI:1.24-2.11) and overdominant (OR:1.56, 95%CI:1.19-2.03) genetic models of rs2152278 were associated with an increased risk of BC. The relative expression of LINC00520 increased linearly with the number of rs12880540 mutations. LINC00520 and miR-3122 interacted at the rs12880540 T allele, but the rs12880540 G > T mutation had no effect on the binding ability of LINC00520 and miR-3122.
Conclusion: A genetic variant of rs8012083 in LINC00520 may be used as a biomarker for triple-negative BC after further evaluation in diagnostic tests. Genetic variants of LINC00520 were related to BC susceptibility, and rs12880540 might affect the expression of lncRNA LINC00520.


| INTRODUCTION
Breast cancer (BC) is the second most common malignant tumor in the world, ranked after lung cancer. BC remains the leading cause of cancer deaths in Chinese women, placing a huge burden on public health. 1 In China, new BC cases accounted for 12.2% of all new cancer cases and 9.6% of all deaths. 2 With the Chinese population growing and aging at an accelerating pace, the number of new BC cases has risen continually in recent years and the age of onset has tended to be younger. 3 The high incidence and mortality of BC in China have become a public health problem that seriously threatens women's physical and mental health. 4 Finding specific susceptibility biomarkers and identifying individuals at higher risk of BC could help improve the early diagnosis rate of BC.
With the development of high-throughput technology, emerging evidence has demonstrated that genetic alterations, including single-nucleotide polymorphisms (SNPs), insertions, deletions, and copy number variations, are associated with the risk of various diseases including BC. 5,6 Notably, SNPs, especially those located in irregular regions of protein-coding genes and related to diseases, have been extensively studied and widely reported. 7 The expression of the same lncRNA transcripts varies with health condition, age, tissue, and even cell type, which indicates their potential as biomarkers for the diagnosis and prognosis of diseases. 8,9 Numerous studies have shown that lncRNAs are associated with tumorigenesis, progression, metastasis, and drug resistance, functioning as oncogenes, tumor suppressor genes, or both, and are involved in shaping multiple biological characteristics of cancer at the transcriptional, posttranscriptional, and epigenetic levels. [10][11][12] In addition, abnormal expression levels of different lncRNAs with oncogenic and tumor suppressor functions have frequently been found in the development of BC. 13 In particular, adverse roles of miRNA-sponging lncRNAs involved in ceRNA networks (ceRNETs) have been shown in BC. 14 In 2015, Eades et al. discovered a lincRNA-ROR/miR-145/ARF6 mRNA network associated with metastasis and prognosis in triple-negative BC. 15
Long intergenic nonprotein coding RNA 520 (LINC00520) is located at human chromosome 14q22.3. LINC00520, a highly conserved long noncoding RNA with a length of about 20 kb, is widely expressed in various tissues. A study reported that LINC00520 was overexpressed in laryngeal squamous cell carcinoma tissue. 16 The upregulation of LINC00520 expression might play an important biological role in the metastasis of laryngeal squamous cell carcinoma. The expression of LINC00520 was also upregulated in primary nasopharyngeal carcinoma. 17 In addition, the level of LINC00520 was significantly increased in renal cell carcinoma. 18 LINC00520 promoted the migration of BC cells induced by Src, STAT3, and PI3K and played a functional role in BC. 19 Although lncRNA LINC00520 has been reported to be associated with BC, the complexity of its function is also determined by the complexity of the lncRNA structure. Moreover, the association between LINC00520 SNPs and BC susceptibility, and its underlying mechanism, is not yet clear. In this study, databases, literature retrieval, and bioinformatics techniques were combined to screen for lncRNAs and their SNP sites associated with BC. The association between long noncoding RNA genetic variation and BC susceptibility was studied, and the biological function was explored using molecular epidemiology methods.

KEYWORDS
breast cancer, LINC00520, lncRNA, single-nucleotide polymorphisms, susceptibility

| Study subjects
A total of 1009 subjects were enrolled in the study, including 504 cases with BC and 505 age-matched (±2 years) healthy controls (Table 1). Subjects were included in accordance with strict criteria: (a) patients with BC had a definite pathological diagnosis as new cases; (b) the patients with BC were selected from the First Affiliated Hospital and the Third Affiliated Hospital of Zhengzhou University from 2013 to 2015; (c) patients with BC had never received radiotherapy, chemotherapy, or any other treatment before; (d) apart from BC, the cases had no other diseases.
The controls were recruited from the site of an epidemiological survey of cardiovascular disease involving more than 20 000 people in Henan Province during the same period. The healthy controls were matched with the BC group by age (±2 years) and had no history of cancer or any other chronic disease. None of the participants were related to one another.
All subjects volunteered to participate in the study and signed informed consent. The study has been approved by the Medical and Health Research Ethics Committee of Zhengzhou University.

| Data collection
The data of the patients with BC were obtained from hospital medical records, including name, age, nationality, place of origin, age at menarche, menopausal status, history of breastfeeding and childbearing, cancer history of first-degree relatives (parents, siblings, and children) and second-degree relatives (grandparents, uncles, aunts), along with estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor-2 (HER-2) status. The data of controls were collected by trained investigators using a face-to-face questionnaire designed by experts. The content of the data was the same as for the BC cases. All the collected information was entered in parallel with EpiData software. The whole process of investigation was carried out under strict quality control, including the collection, collation, and entry of information.

| DNA extraction
Peripheral blood (5 mL) was collected from each subject in an anticoagulation tube containing ethylenediaminetetraacetic acid (EDTA) for DNA extraction and genotyping. Genomic DNA was extracted using a genomic DNA extraction kit (DK601-02, centrifugal column type) from Shanghai Lifefeng Biotech Co., Ltd, strictly in accordance with the kit instructions. All the extracted genomic DNA was labeled, measured, and finally stored at −80°C for future use.
In this study, rs11622641, rs7157819, rs12880540, and rs2152275 located in LINC00520 were genotyped by SNPscan. Because the other SNPs scored too low, the SNPscan method was not suitable for them. The polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) technique was used for genotyping rs4144657, rs2152278, and rs7142488 in LINC00520, while rs8008130 and rs8012083 were genotyped using created restriction site PCR (CRS-RFLP). Where there was no suitable restriction endonuclease site near the mutation site, a base in the primer was changed so that a new restriction site could be formed together with the base at the mutation site; hence the CRS-RFLP method was used. Primers for PCR amplification were designed using Primer 6.0 software. The NCBI Primer-BLAST website (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) was used to evaluate the specificity of the primers. Gradient PCR was used to standardize the conditions of DNA amplification and optimize the annealing temperature for each primer set. Suitable restriction enzymes were selected using the WatCut website (http://watcut.uwaterloo.ca/template.php; Table S3). Moreover, approximately 5% of samples were randomly selected for Sanger sequencing, yielding a 100% concordance rate between the two methods.

| Quantitative real-time PCR analysis
The relative expression of LINC00520 was detected with the Eco Real-Time PCR System (Illumina). The primers for LINC00520 and GAPDH were synthesized by TaKaRa Biotechnology Co. Ltd. The qRT-PCR reactions were performed in strict accordance with the instructions. The specific primers used in quantitative RT-PCR were as follows: LINC00520 primers, 5′-AAGCAGGACACAATTACAAC-3′ and 5′-GCAGGTCCGAGGTATTCGTC-3′; GAPDH primers, 5′-CGGAGTCAACGGATTTGGTCGTAT-3′ and 5′-AGCCTTCTTCATGGTGGTGAAGAC-3′. All samples were independently measured in triplicate, and the relative expression of the gene was taken as the average of the three replicates per sample. The relative expression of LINC00520 was calculated by the 2^−ΔΔCT method, with GAPDH as the internal reference gene, and is reported as mean ± SD. 20
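The 2^−ΔΔCT calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name, the Ct values, and the choice of calibrator sample are all hypothetical, since the text does not state which sample served as the calibrator.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Return the 2^-ddCt fold change of a sample relative to a calibrator.

    Each argument is a list of replicate Ct values (triplicates in the text).
    The calibrator is an assumption of this sketch.
    """
    # dCt: target gene (e.g. LINC00520) normalised to the reference gene (e.g. GAPDH)
    delta_ct_sample = mean(ct_target) - mean(ct_reference)
    delta_ct_calibrator = mean(ct_target_cal) - mean(ct_reference_cal)
    # ddCt: sample dCt relative to calibrator dCt
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical triplicate Ct values, for illustration only
fold = relative_expression(
    ct_target=[24.0, 24.0, 24.0],
    ct_reference=[18.0, 18.0, 18.0],
    ct_target_cal=[25.0, 25.0, 25.0],
    ct_reference_cal=[18.0, 18.0, 18.0],
)
print(fold)
```

A lower Ct for the target at an unchanged reference Ct gives a fold change above 1, which is why one cycle of difference in this toy input corresponds to a doubling.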

| Cell culture, transfection, and luciferase reporter assay
To further examine whether rs12880540 affects the binding of miR-3122 and LINC00520, as predicted by the lncRNASNP2 database (Figure S1), 293T cells were obtained and grown in DMEM supplemented with 10% fetal bovine serum (GIBCO) in a humidified atmosphere of 5% CO2 at 37°C. The cells were never subcultured for more than 3 months and were authenticated by DNA profiling using the Applied Biosystems AmpFLSTR Identifiler kit, most recently in September 2018. The miR-3122 mimics and negative control (NC) were synthesized by Genecreate. For transfection assays, 293T cells were seeded in 48-well plates and simultaneously transfected with PGL3-basic-luc vector and miR-3122 or NC mimics using Lipofectamine 3000 (Invitrogen). After 48 hours, cells were harvested, and Renilla luciferase/Firefly luciferase activities for the different alleles were detected and analyzed according to the manufacturer's instructions (dual luciferase assay system, Promega). All experiments were performed independently in triplicate.

| Quality control
A negative control was included in each batch of PCR amplification to avoid false positives caused by sample contamination. About 5% of samples were randomly selected for sequencing to ensure the reliability of the genotyping results. Figure S2 shows the PCR-RFLP and CRS-RFLP results, while the sequencing results are shown in Figure S3. The qRT-PCR experiments were repeated three times independently to reduce measurement error.

| Statistical analysis
The sample size of the study (n = 460) was calculated using Power Analysis and Sample Size (PASS) software based on the study power (0.9) and minor allele frequency (0.1). Student's t test was used for normally distributed continuous variables and the Chi-square test was used for categorical variables. Hardy-Weinberg equilibrium (HWE) was examined by comparing the observed genotype frequencies of cancer-free controls with the expected genotype frequencies using a goodness-of-fit Chi-square test. Odds ratios (ORs) and their 95% confidence intervals (CIs) were calculated to evaluate the association between SNPs and BC susceptibility, adjusted for potential confounding factors including age, age at menarche, age at menopause, menopausal status, number of pregnancies, number of abortions, breastfeeding history, and family history of BC in first-degree relatives. MDR 2.0 software was used to analyze the interaction between polymorphic sites in the lncRNA gene and reproductive factors. SHEsis (http://analysis.bio-x.cn/myAnalysis.php) was used to calculate the difference in haplotype frequencies between patients and controls. Data were entered with EpiData software using double parallel entry. The statistical analysis was performed with SPSS 23.0 software.
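The HWE goodness-of-fit test mentioned above can be illustrated with a short sketch. It is not the authors' SPSS procedure; the function name and the genotype counts are hypothetical, and the p-value uses the closed form of the Chi-square survival function for 1 degree of freedom.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Goodness-of-fit Chi-square test for Hardy-Weinberg equilibrium.

    Takes observed genotype counts (AA, AB, BB) and returns (chi2, p_value).
    With two alleles the test has 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    if p in (0.0, 1.0):
        raise ValueError("monomorphic site: HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts, for illustration only
chi2_ok, p_ok = hwe_chi_square(25, 50, 25)    # exactly at HWE proportions
chi2_bad, p_bad = hwe_chi_square(40, 20, 40)  # strong heterozygote deficit
print(chi2_ok, p_ok, chi2_bad, p_bad)
```

A control group conforms to HWE in the sense used here when the p-value exceeds .05, as with the first toy input.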

| Characteristics of the participants
The average age of BC cases was 48.00 years (±9.85), while that of the control group was 48.15 years (±9.61). No statistical differences were found in the distribution of age, menopausal status, age at menopause, and number of abortions between the BC cases and cancer-free controls (all P > .05). As shown in Table 2, the genotype distributions of the nine SNPs in the control group all conformed to HWE (P > .05). The relationship between SNPs and BC risk based on different genetic models is summarized in Table 2.
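To make the genetic models concrete, the sketch below collapses genotype counts under the dominant, recessive, and overdominant models and computes a crude OR with a Woolf (log-scale) 95% CI. It is only an illustration under stated assumptions: the ORs reported in this study were adjusted for covariates, which this unadjusted calculation does not reproduce, and the function names and counts are hypothetical.

```python
import math


def collapse(counts, model):
    """Collapse (wild homozygote, heterozygote, variant homozygote) counts.

    Returns (exposed, unexposed) under the named genetic model.
    """
    ww, wv, vv = counts
    if model == "dominant":       # carriers of the variant vs wild homozygotes
        return wv + vv, ww
    if model == "recessive":      # variant homozygotes vs all others
        return vv, ww + wv
    if model == "overdominant":   # heterozygotes vs both homozygotes
        return wv, ww + vv
    raise ValueError(f"unknown model: {model}")


def crude_odds_ratio(case_counts, control_counts, model, z=1.96):
    """Unadjusted OR and Woolf 95% CI for one SNP under one genetic model."""
    a, b = collapse(case_counts, model)     # exposed / unexposed cases
    c, d = collapse(control_counts, model)  # exposed / unexposed controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical genotype counts, for illustration only
cases = (100, 80, 20)
controls = (120, 60, 20)
or_dom, lo_dom, hi_dom = crude_odds_ratio(cases, controls, "dominant")
print(or_dom, lo_dom, hi_dom)
```

An adjusted analysis would typically fit a logistic regression with the collapsed genotype as one predictor alongside the covariates listed in the statistical analysis section.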

| Stratified analysis of SNP genotypes and BC risk
We also examined the effect of SNPs on BC susceptibility in different subgroups of demographic factors (Table 3). CT + TT in rs11622641 was a protective genotype of BC susceptibility in females with age less than or equal to 50 years (OR:0.55, 95%CI:0.33-0.92), age at menarche less than 14 years (OR:0.55, 95%CI:0.34-0.88), number of pregnancy more than twice (OR:0.60, 95%CI:0.36-0.98), and number of

| Receptor status and the genotypes of nine SNPs
The genotype TG (OR:1.62, 95%CI:1.04-2.52) in rs12880540, the genotype AA (OR:2.74, 95%CI:1.01-7.42) in rs2152275, and the genotype GG (OR:0.31, 95%CI:0.14-0.69) in rs8012083 were associated with HER-2 status. No association was found between the SNPs and ER or PR status. Table 5 shows the association between the nine SNPs of LINC00520 and the different molecular subtypes of BC. The genotype GG (OR:3.58, 95%CI:1.32-9.69) in rs8012083 increased the risk of triple-negative BC.

| Analysis of haplotype
The haplotype analysis was conducted to evaluate the combined effect of four LINC00520 SNPs. As shown in Table 6, C(rs7157819)-T(rs12880540)-A(rs2152275)-T(rs11622641) was the common haplotype in cases and controls with a frequency of 0.05% and 0.078%, respectively, and could decrease BC risk by 37%   showed that no haplotype was associated with BC risk with P < .05.

| Gene-reproductive factor interaction analysis
We further explored the interaction between functional SNPs and other factors on BC risk using multifactor dimensionality reduction (MDR) analysis (Table 7). The optimal gene-environment interaction model, with a testing balanced accuracy (TBA) of 0.70 and a cross-validation consistency (CVC) of 7/10, revealed that women with more than two pregnancies and more than two abortions who carried the rs2152278 T allele had a 5.24 times higher risk of BC than those without these characteristics.
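The core idea of MDR can be sketched in a few lines: each combination of factor levels is labeled high-risk when its case:control ratio exceeds the overall ratio, and the resulting one-dimensional classifier is scored by balanced accuracy. This is a simplified illustration, not the MDR 2.0 software: it omits the 10-fold cross-validation that produces TBA and CVC, so the value returned is a training balanced accuracy, and the function name and records are hypothetical.

```python
from collections import Counter


def mdr_balanced_accuracy(records):
    """Label factor-level cells as high-risk and score the labeling.

    records: iterable of (factor_levels_tuple, is_case) pairs.
    Returns (set of high-risk cells, balanced accuracy on the same data).
    """
    cases, controls = Counter(), Counter()
    for cell, is_case in records:
        (cases if is_case else controls)[cell] += 1
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    threshold = n_cases / n_controls  # overall case:control ratio
    # A cell is high-risk when its own case:control ratio exceeds the threshold
    high_risk = {
        cell for cell in set(cases) | set(controls)
        if cases[cell] > threshold * controls[cell]
    }
    sensitivity = sum(cases[c] for c in high_risk) / n_cases
    specificity = sum(n for c, n in controls.items() if c not in high_risk) / n_controls
    return high_risk, (sensitivity + specificity) / 2


# Hypothetical records: (pregnancies, abortions, rs2152278 allele), case status
risky = (">2", ">2", "T")
other = ("<=2", "<=2", "C")
records = (
    [(risky, True)] * 8 + [(risky, False)] * 2
    + [(other, True)] * 2 + [(other, False)] * 8
)
cells, accuracy = mdr_balanced_accuracy(records)
print(cells, accuracy)
```

In a full MDR run this labeling would be fitted on nine tenths of the data and scored on the held-out tenth, repeated across folds, with the best model chosen by testing balanced accuracy and cross-validation consistency.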

| Functional correlation of rs12880540 genotypes with the relative expression of LINC00520 in plasma
As shown by qRT-PCR analyses (Figure 1), the average relative expression of LINC00520 in the plasma of individuals with rs12880540 TG (1.61 ± 0.35, n = 44) and rs12880540 GG (2.25 ± 0.29, n = 22) was higher (both P < .0029) than in those with rs12880540 TT (1.03 ± 0.48, n = 40). Furthermore, LINC00520 expression appeared to rise with an increasing number of G alleles.

| Effect of rs12880540 mutation on the binding of miR-3122 and LINC00520
As the database predicted, the rs12880540 G > T mutation could gain or lose a binding site for miR-3122 on LINC00520, which might regulate the expression of LINC00520 (Figure 2). To investigate the effect of rs12880540 on the binding of miR-3122 and LINC00520, we constructed luciferase reporters containing part of LINC00520 covering the rs12880540 G > T allele, cotransfected them with miRNA mimics in 293T cells, and measured and calculated the relative luciferase activity. Luciferase reporter assays showed that the luciferase activity of the construct with the risk rs12880540 G allele decreased when transfected with miR-3122 mimics and did not change when transfected with NC, which did not indicate that rs12880540 G would affect the binding of miR-3122 and LINC00520.

Note: The best model was selected as the one with the maximum testing balanced accuracy and maximum cross-validation consistency. In this study, the best interaction model was the three-factor model including number of pregnancies-number of abortions-LINC00520 rs2152278.

FIGURE 1
The relative expression of LINC00520 with different rs12880540 genotypes in plasma. The relative expression of LINC00520 in blood plasma from 106 cancer-free controls was significantly higher in the TG (1.61 ± 0.35) and GG genotypes (2.25 ± 0.29) than in the TT genotype (1.03 ± 0.48) (all P < .001).

| DISCUSSION
To our knowledge, no association between genetic variation of LINC00520 and BC susceptibility has previously been explored. In this study, a case-control design was used to analyze the association between LINC00520 genetic variants and the risk of BC. After adjustment for age, age at menarche, menopausal status, number of pregnancies, number of miscarriages, history of breastfeeding, and family history of BC, the study found that variation at rs11622641 was statistically associated with a reduced risk of BC, whereas variation at rs12880540 and rs2152278 increased the risk of BC. The results showed that rs12880540 genotype TG, rs2152275 genotype AA, and rs8012083 genotype GG were significantly correlated with HER-2 receptor status. Haplotype analysis showed that the LINC00520 C(rs7157819)-T(rs12880540)-A(rs2152275)-T(rs11622641) haplotype could reduce the risk of BC. The homozygous mutant GG at rs8012083 could significantly increase the risk of triple-negative BC. A genetic variant of rs8012083 in LINC00520 could be used as a biomarker for triple-negative BC. However, the mechanism by which it influences susceptibility to triple-negative BC needs further study.
Previous studies have shown that SNPs can affect gene expression levels through allelic changes and thereby participate in carcinogenesis. Compared with the rs619586 A allele, the expression of MALAT1 was lower in individuals carrying the rs619586 G allele, which might be related to decreased susceptibility to BC. 20 The risk G allele at rs6983267 of lncRNA CCAT2 could increase the expression of CCAT2, which was related to the risk of colorectal cancer. 21 Our results showed that the genotype GG at rs12880540 significantly increased the risk of BC compared with the genotype TT. The relative expression of LINC00520 showed a linear trend with the increase in the number of rs12880540 mutations. This suggests that LINC00520 polymorphism could increase the risk of BC and affect the expression of LINC00520 in plasma, which plays an important role in the occurrence and development of BC, consistent with the reported increase in LINC00520 expression in BC. 19
SNP mutations can change the activity of lncRNA transcriptional regulatory regions and affect the expression of lncRNAs by changing their secondary structure so that miRNA binding sites are gained or lost, thus affecting the development, metastasis, and prognosis of tumors. [22][23][24] The effects of the LINC00520 SNPs on secondary structure and miRNA binding ability were predicted using the lncRNASNP2 database (http://bioinfo.life.hust.edu.cn/lncRNASNP) and the RNAfold website (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi). The predictions showed that the secondary structure of LINC00520 changed after the mutation of the rs12880540 allele G to T, while the two binding sites for miR-92a-2-5p and miR-4648 and the four binding sites for miR-3122, miR-3913-5p, miR-4259, and miR-4425 were lost.
The results of the dual luciferase reporter gene experiment showed that LINC00520 and miR-3122 interacted with each other at the T allele, but the rs12880540 G > T variation had little effect on the binding ability of LINC00520 and miR-3122. Therefore, the effect of rs12880540 G > T on BC susceptibility may not be due to a change in the binding ability of LINC00520 to miR-3122.
BC is a disease with extremely complex pathogenesis. It is generally believed that BC results from the interaction of many factors, such as personal lifestyle, environment, genetics, and reproductive factors. 25 So far, studies have identified smoking, drinking, obesity, earlier menarche, late menopause, infertility, multiple miscarriages, individual or family history of BC or ovarian cancer, genetic mutations, endogenous hormone exposure, and exogenous hormone intake (oral contraceptives and hormone replacement therapy), as well as other environmental and reproductive factors, as risk factors for BC, while breastfeeding and physical activity are well-known protective factors. [26][27][28][29] The results of this study showed an obvious interaction between the number of pregnancies, the number of miscarriages, and rs2152278. However, the mechanism of this interaction needs to be further studied.
For the first time, we found the association between LINC00520 gene polymorphism and BC susceptibility. The