An in silico approach to identify and prioritize miRNAs target sites polymorphisms in colorectal cancer and obesity

Abstract Colorectal cancer (CRC) and obesity are linked clinical entities with a series of complex processes being engaged in their development. MicroRNAs (miRNAs) participate in these processes through regulating CRC and obesity‐related genes. This study aimed to develop an in silico approach to systematically identify and prioritize miRNAs target sites polymorphisms in obesity and CRC. Data from genome‐wide association studies (GWASs) were used to retrieve CRC and obesity‐associated variants. The polymorphisms that were resided in experimentally verified or computationally predicted miRNA target sites were retrieved and prioritized using a range of bioinformatics analyses. We found 6284 CRC and 38931 obesity unique variants. For CRC 33 haplotypes variants in 134 interactions were in miRNA targetome, while for obesity we found more than 935 unique interactions. Functionally prioritized SNPs revealed that, SNPs in 153 obesity and 50 CRC unique interactions were have disruptive effects on miRNA:mRNA integration by changing on target RNA secondary structure. Structural accessibility of target sites were decreased in 418 and 103 unique interactions and increased in 516 and 79 interactions, for obesity and CRC, respectively. The miRNA:mRNA hybrid stability was increased in 127 and 17 unique interactions and decreased in 33 and 24 interactions for the effect of obesity and CRC SNPs, respectively. In this study, seven SNPs with 15 interactions and three SNPs with four interactions were prioritized for obesity and CRC, respectively. These SNPs could be used for future studies for finding potential biomarkers for diagnoses, prognosis, or treatment of CRC and obesity.

evidence that functionally links obesity to the risk of several cancers including CRC. [2][3][4][5] A range of genetic and environmental factors contributes to the risk of CRC and obesity. Experimentations have documented a range of pathological molecular alterations contributing to CRC and obesity, among which dysregulation of miRNAs is specially highlighted.
MiRNAs are small noncoding RNAs that regulate tumorigenesis through functioning as either tumor suppressors or oncogenes. 6 They participate in transcriptome regulation by binding to 3'-untranslated region (3'-UTR) of target mRNAs and either suppressing translation or inducing mRNA degradation. It has been shown that the intricate miRNA:mRNA network may be influence by the presence of single nucleotide polymorphism (SNP) within or near miRNA binding site. Such miRNA binding site polymorphism may influence miRNA:mRNA interaction through altering miRNA:mRNA hybrid stability, secondary structure of local RNA, structural accessibility of target sites, or even creating novel biding sites. Several recent genetic association studies have highlighted the contribution of miRNA binding site polymorphisms to the risk of complex disease specially CRC. 7,8 In this study, we leveraged data from genome-wide association studies (GWASs) on CRC, obesity, and obesity-related traits to explore putative disease-associated polymorphisms that influence miRNA target sites. A range of bioinformatics analyses was also performed to predict functional consequences of these polymorphisms and provide a list of prioritized variants for future experimentations.

| METHOD AND ANALYSIS
The bioinformatics methods applied in this study are depicted in Figure 1. Variants from GWA studies on CRC risk, CRC survival, obesity risk, and obesity-related traits were retrieved from NHGRI-EBI GWAS catalog (gwas_catalog_ v1.0.1-associations_e90_r2017-10-10), available at (https:// www.ebi.ac.uk/gwas/). The HaploReg (version 4.1) database was used to construct population-specific association blocks for each GWAS lead variant based on 1000 Genome project Phase I populations and a defined linkage disequilibrium threshold (i.e., r 2 of at least 0.6). The obtained association blocks, containing all putative disease-associated variants were intersected with a miRNA targetome data set to identify putative disease-associated variants that are resided within or in flanks of an experimentally verified or a computationally predicted miRNAs target site. The functional effects of these polymorphism on different aspects of miRNA:mRNA interactions, including local RNA secondary structures, structural accessibility of target sites, and miRNA:mRNA hybrid stability, were analyzed.

| The MIRNA targetome data set
A comprehensive data set of experimentally verified and computationally predicted miRNA target sites were obtained by combining miRNA:mRNA interactions from the StarBase (version 2, available at http://starb ase.sysu.edu.cn/starb ase2/ index.php), the targetScan (version 7.1, available at http:// www.targe tscan.org/vert_71/), and the microRNA.org (available at http://www.micro rna.org/micro rna/home.do) databases. The data set includes both computationally predicted target sites and those that are validated using a range of methods such as CLIP-Seq. 9 The 25 nucleotides upstream and downstream of target sites were considered as flanking regions.

| Putative CRC/obesity-associated variants and association blocks
The lead SNP of GWA studies with a p-value ≤1.0 × 10 −6 were retrieved. The HaploReg (version 4.1, available at https://pubs.broad insti tute.org/mamma ls/haplo reg/haplo reg. php) was used to retrieve all proxy SNPs (defined as r 2 ≥0.6) of GWAS lead variants based on 1000 Genomes project super-populations. 10 Five categories of obesity-related traits were considered (Table 1). Moreover, we obtained GWAS index variants from GWA studies associated with CRC risk and CRC survival.

| Effect of targetome SNPS on local RNA secondary structures
The RNAsnp program (version 1.2, available athttps://rth. dk/resou rces/rnasn p/) 11 used to assess impacts of CRC or obesity-associated SNPs in target sites and flanking regions on secondary structures of local RNA. RNA sequences and SNPs were inputted in RNAsnp to generate probability matrix associated to wild-type (WT) and mutant (ALT) alleles. RNAsnp shows the structural difference between WT and ALT alleles with the Euclidean distance measure (d) for all sequence intervals and reports the polymorphism with the maximum base pair distances (d max ) and the corresponding p-value. A p-value less than 0.2 is considered significant. 11

| Structural accessibility of target sites
As described previously, changes in structural accessibility of target sites may interfere with miRNA binding. RNAplfold program was used to calculate structural accessibility for 3′-UTRs with the ALT and WT alleles. The difference between the target site accessibility of WT and ALT allele was computed using ∆Pu. 12,13 2.5 | MIRNA:mRNA hybrid stability A target site polymorphism may also interfere with miRNA binding through altering the stability miRNA:mRNA hybrid structure. The RNA hybrid v2.1.2 (available at https://bibis erv.cebit ec.uni-biele feld.de/rnahy brid/) 14 was employed to identify free energy of hybridization (ΔGhybrid) for ALT and WT alleles. ΔΔGhybrid for each miRNA target site polymorphism (calculated as ΔΔGhybrid = ΔGhybrid assassinate to ALT-ΔGhybrid assassinate to WT), is a measure of the SNP effect on the stability of hybridization. A positive ΔΔGhybrid is an indication of a decreased stability that is imposed by the ALT allele.

| Putative obesity/CRC-associated SNPs in miRNAs targetome
We obtained 159 GWAS index SNPs from 32 studies on CRC risk and 45 GWAS index SNPs from three studies on CRC survival by mining GWAS catalog. After extending polymorphisms to association haplotypes, 5163 and 1121 putative disease variants were retrieved for CRC risk and survival, respectively. The CRC-associated index and haplotypes variants were reported in a range of populations including Europeans (n = 4464), Asians (n = 2216), Africans (n = 53), and Americans (n = 130), with some variants being reported in multiple populations. After mapping putative CRC-associated variants to the miRNA targetome data set, 134 and 48 unique miRNA:mRNA:SNP were identified for CRC risk and survival, respectively. In other words, 33 the CRC-associated index and haplotypes variants were identified to reside within or near miRNA binding sites, potentially influencing 134 miRNA:mRNA:SNP interactions (Table 2). Moreover, 11 unique CRC survival-associated variants were found to reside within or near miRNA binding sites, influencing 48 miRNA:mRNA:SNP interactions (Table 3).
We retrieved 1079 GWAS index variants from 75 studies pertaining to five obesity-related categories. After extending to association blocks, 38931 unique putative obesity-related variants were obtained. Since the most of included GWA studies were related to European populations, the more of putative obesity-related variants belonging to the European.

| Structural accessibility of target sites
Accessibility of target site plays an important role in miRNAmediated regulation of gene expression. 16,18 The impact of CRC/obesity-associated variants residing within or near miRNA binding sites on accessibility of target sites is measure by ΔPu. For obesity, we found a decreasing accessibility effect 45% (up to −37%) and increasing effect for 55% (up to 30%) of target sites, which 418 unique interactions decreased and 516 interactions increased accessibility effect. While the results for CRC were 67% (up to −39%) and 43% (form up to 9%), respectively. In another word, for CRC survival 24 unique interactions shows decreased and 24 interactions shows increased accessibility effect, and for CRC risk 79 interactions decreased while 55 interactions increased accessibility effect. These results shown that a significant numbers of target site SNPs could effect on the role of miRNA on target RNA and consequently on the role of these genes on CRC and obesity. Detailed results are shown in Figure 3 and supplementary file 3.

| MIRNA:mRNA hybrid stability
The effects on target site variant on interaction between miRNA and its target gene can be measured by change in hybrid stability. This alteration is based on base pair creation or disruption. 16 Here, we applied ΔΔGhybrid to measure target-SNPs effects on included miRNA-mRNA interactions. The miRNA:mRNA hybrid stability was increased in 127 interactions (up to 6.6 kcal/mol) and 17 interactions (up to 4.5 kcal/ mol) for the effect of obesity and CRC SNPs, respectively, while decreased in 33 interactions (up to −2.4 kcal/mol) and 24 interactions (up to −4.7 kcal/mol) for them. The results are shown in Figure 4. Finally, we prioritize interactions for CRC and obesity based on effect of targetome SNPs on local RNA secondary structures, structural accessibility of target sites, miRNA:m-RNA hybrid stability, and annotation of polymorphisms with expression quantitative trait loci (eQTL). To validate the prioritized genes, as well as to investigate the association between the obesity genes with CRC, we analyzed the genes in TCGA data (available at https://portal.gdc.cancer.gov/). 19 The obtained results are presented in Table 7.

| DISCUSSION
Obesity is one of the most common complex diseases in the world and CRC is a common obesity-related cancer. Recently many GWA studies focused on discovering the variants related to these diseases and found many novel polymorphisms. According to these data several genes appears to be involved in these diseases pathogenesis while different miRNAs have a role in their regulation. Thus, variants in miRNA:mRNA binding sites may play important role in CRC and obesity. Many of the newly identified associations in GWAS are related to noncoding regions variants. 20 The role of these variants in tumor development and cancers risk were previously investigated in many studies. [21][22][23] These variants can be prioritized by bioinformatics analyses without any cost and laboratory works which could be used in future experimental studies. Here, we used a bioinformatics approach to find polymorphisms which could potentially effect these diseases, and finally, we prioritized the most important miRNA:mRNA interactions based on using different bioinformatics tools to find the functional significance of these variants. The effect of miRNAs binding site polymorphisms on the binding sites structural changes have been studied previously 16,24 which may influence on miRNA regulatory effect. In our study, 10 and 33 miRNA binding site SNPs significantly changed on local RNA secondary structures in obesity and CRC, respectively. According to the miRNA:mRNA hybrid stability plot ( Figure 4) for obesity most of SNPs in target sites do not have any effect on hybrid stability but for CRC most of miRNA binding site SNPs were effective on hybrid stability. While about the accessibility of target sites most of CRC and obesity-related SNPs do not have any significant effect. Finally, seven SNPs with 15 interactions and three SNPs with four interactions were identified for obesity and CRC, respectively.  These results revealed that miRNA:mRNA hybrid stability, structural accessibility, and RNA secondary structures could be influenced by obesity or CRC-associated variants in miRNA target sites. We used different common algorithms for miRNA:mRNA interaction prediction with highly conserved target site to manage balance between specificity and sensitivity of bioinformatics algorithms. According to our knowledge, all prioritized SNPs (10 SNPs in Table 7) were not studied for the association with obesity or CRC. About the association of these SNPs with obesity or CRC, only rs4759058 was predicted which to be related to waist-hip ratio. 25 From them rs16912239 and rs11085537 were not eQTL SNPs. We finally found that two and six miRNA binding site SNPs have related to obesity and CRC, respectively.
The TCGA data analysis validated prioritized CRCrelated genes and displayed that the obesity prioritized genes were also effective on CRC. Based on the obtained results, the ZSCAN32, PLCD4, HOXC13, and ADCY9 from obesity predicted genes, as well as LIMA1, and SLC22A23 from CRC predicted genes were significantly related to CRC survival rate. Furthermore, there are several lines of evidence on the role of prioritized genes in pathogenesis of CRC or obesity. The GWA studies confirm that, the genes in prioritized interactions for CRC were associated to the cancer. [26][27][28] The expression of these genes have been considered in the recent studies, for example, the GOLGA2 newly identified as novel differentially expressed proteins in CRC, 29 the expression of LIMA1 changed in Cholangiocarcinoma, 30 and the expression of SLC22A23 expression increased in subjects with Laryngeal squamous cell carcinoma. 31 Besides, the CRC prioritized genes were also associated to obesity for instance, the LIMA1, 32 SLC22A23, 33 and LOC440518 34 related to body fat distribution, obesity-related traits, and BMI, respectively.
The association of predicted obesity genes with obesity approved in the GWA studies. 33,[35][36][37][38][39] For instance, LEMD2 is associated with body weight, 38 HOXC13 and ADCY9 polymorphisms are associated with waist-to-hip ratio and BMI, respectively. [39][40][41] The GWAS and recent published studies significantly proved the genes and miRNAs in obesity prioritized interaction were also associated to cancer. For instance, the ADCY9, 40 and HOXC13 41 were related to cancer, and adverse response to chemotherapy, respectively. The ADCY9 expression remarkably increased in colon tumor tissue and can be a poor prognostic factor for colon survival. 42 The HOXC13 and CSNK2B expression increase in breast cancer and play a key role in the progression of breast cancer. 43,44 The expression of cyclins regulated by HOXC13 and knockdown of this gene resulted to cell cycle arrest and apoptosis, in the colon cancer cell lines. 45 The roles of identified miRNAs in cancer were also determined. The miRNAs in prioritized interactions were related to CRC and other type of cancers. In this case, the miR-30 family, miR-504, miR-34c, miR-488, miR-15b, miR-195, miR-497, miR-15a, miR-503, miR-141, miR-182, miR-96, and miR-136 were also related to invasion, proliferation, metastasis and tumor growth, or survival of several cancers including CRC. [46][47][48][49][50][51][52][53][54][55][56][57][58][59][60][61][62][63][64][65] However, there are few studies on the role of included prioritized miRNAs for interactions with obesity. For obesity there are some studies on mice, for instance, miR-16-5p decrease with high weight and a mutation in miR-16 gene lead to increasing in body weight. 66 The upregulation of miR-16 was observed in calorie-restricted mice with lower body weight. 67 The miR-30e-5p and miR-141-3p were upregulated in high-fat diet mice. 68,69 The miR-30b, miR-16, miR-15b, and miR-15a were downregulated in diet-induced obesity mice. 70 According to our results and above described documents, the miRNAs and mRNAs in obesity prioritized interactions played significant roles in CRC, this represented a strong genetic linkage on the mentioned diseases. Therefore, prioritized polymorphisms with miRNA:mRNA interactions identified in this study could be important for future investigation on the role of miRNAs and their targeted genes on CRC and obesity.

| IN CONCLUSION
This was the first comprehensive systematic and bioinformatics approach for identification and prioritization of variants in miRNA binding sites of genes related to obesity or CRC as two most common complex and related diseases. The results of our study will be valuable for future association studies and functional studies to examine the role of these miRNA target site polymorphisms and genes and their association based on these identified interactions. These SNPs and interactions could be used for future studies for finding potential markers for diagnoses, prognosis, or treatment of CRC and obesity.