In recent years, SNPs in miRNA target sites have been widely reported to be associated with diseases ranging from hereditary disorders to various cancers (reviewed in [Sethupathy and Collins, 2008]). SNPs in miRNA genes have also been reported to be involved in altered miRNA processing [Duan et al., 2007; Harnprasopwat et al., 2010], thyroid cancer [Jazdzewski et al., 2009], gastric cancer risk [Peng et al., 2009], ulcerative colitis [Okubo et al., 2011], squamous cell carcinoma [Liu et al., 2010], and nonsyndromic progressive hearing loss [Mencia et al., 2009]. Thus, identifying functional miRNA-related SNPs is of interest for studies of diseases and complex traits. However, the effects of SNPs in miRNA genes on miRNA biogenesis and target selection have not been studied extensively. In the first step of this study, we identified miRNA-related SNPs and summarized their features. We then assessed the potential effects of SNPs in miRNA genes on miRNA biogenesis and target binding through both computational prediction and experimental validation. Finally, we compiled all the data into miRNASNP, a free online database. Given the broad regulatory role of miRNAs and the abundance of SNPs, the functional miRNA-related SNPs identified here will be a useful resource for mining SNP-associated diseases or phenotypes in populations.
Promising SNPs in Human miRNA Precursors
Because miRNAs act as upstream regulators in a wide range of regulatory processes, SNPs in miRNA genes may affect miRNA function by influencing miRNA biogenesis or target interactions, and may thus have serious consequences. In this study, we identified 757 SNPs in human miRNA genes and examined the HapMap data for these SNPs. Although many of the SNPs lack frequency information in HapMap, 69 of them were sampled in HapMap, and 40 of these have a relatively high MAF (q ≥ 0.1) in at least one population. In the Results section, we inferred the effects of SNPs in pre-miRNAs on maturation by summarizing published examples. According to these proposed rules, eight of the 40 SNPs (rs11614913, rs13299349, rs13447640, rs6971711, rs11844707, rs72246410, rs4822739, and rs17797090) are located in stem regions with ΔΔG > 2 kcal/mol, which may decrease MIR production. Another seven SNPs (rs2910164, rs2292832, rs10505168, rs5997893, rs12780876, rs10934682, and rs2043556) are located in stem regions with ΔΔG < −2 kcal/mol, which may increase MIR production. In theory, these SNPs would substantially change the production of the mature miRNAs and may thus contribute to genetic differences among populations. However, an SNP in a pre-miRNA with a relatively low MAF may also have serious consequences in the individuals who carry it. For example, two SNPs (+13 G > A and +14 C > A) in the miR-96 seed region were observed in a Spanish family with autosomal dominant progressive high-frequency hearing loss, attributed to impaired maturation and disturbed target sites [Mencia et al., 2009].
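The prioritization rule described above (high MAF, stem location, and a ΔΔG threshold of ±2 kcal/mol) can be illustrated with a minimal Python sketch. The class name, field names, function name, and example records are hypothetical and are not part of the miRNASNP pipeline; the ΔΔG values are assumed to be supplied by a separate folding step and are not real measurements.

```python
from dataclasses import dataclass


@dataclass
class PreMirnaSnp:
    """Hypothetical record for one SNP in a pre-miRNA."""
    snp_id: str
    in_stem: bool    # True if the SNP lies in a stem region of the hairpin
    ddg: float       # ΔΔG in kcal/mol (SNP type minus wild type), assumed precomputed
    max_maf: float   # highest minor allele frequency across populations


def predict_effect(snp, ddg_cutoff=2.0, maf_cutoff=0.1):
    """Apply the thresholds quoted in the text to a single SNP."""
    if snp.max_maf < maf_cutoff or not snp.in_stem:
        return "not prioritized"
    if snp.ddg > ddg_cutoff:
        return "may decrease production"   # hairpin destabilized
    if snp.ddg < -ddg_cutoff:
        return "may increase production"   # hairpin stabilized
    return "minor predicted change"


# Illustrative inputs only; these are not SNPs from the study.
examples = [
    PreMirnaSnp("snp_A", True, 2.5, 0.20),
    PreMirnaSnp("snp_B", True, -3.1, 0.15),
    PreMirnaSnp("snp_C", False, 4.0, 0.30),
]
for snp in examples:
    print(snp.snp_id, predict_effect(snp))
```

A low-MAF SNP falls out of this filter even though, as the miR-96 example shows, it can still have strong effects in carriers, so the sketch reflects population-level prioritization only.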
Because the seed region of an miRNA is the most important feature for target binding [Bartel, 2009], SNPs in the seed region directly influence miRNA target binding and selection. Here, we identified 50 SNPs in the seed regions of 41 human miRNA genes and predicted the target gain and loss effects of these SNPs (Fig. 2 and miRNASNP website). Our predictions indicate that SNPs in miRNA seed regions would, on average, cause the loss and gain of nearly half of the targets. In our dataset, five miRNAs with an SNP in the seed region (miR-124, miR-125a-5p, miR-1302, miR-379, and miR-499-3p) are conserved in mammals (chimpanzee, mouse, rat, and dog). We extracted their conserved targets and performed KEGG pathway and Gene Ontology enrichment analyses. The results suggest that miRNA-mediated functions change substantially when the SNP variant is present. For example, the conserved targets of wild-type miR-124 show significant enrichment (Benjamini-corrected P value < 0.05) in the terms “regulation of apoptosis,” “intracellular membrane-bounded organelle,” and “regulation of cellular biosynthetic process,” whereas the conserved targets of its variant show no such enrichment. Of these five miRNAs, miR-124 and miR-125a-5p have experimentally validated targets in TarBase [Sethupathy et al., 2006] and miR2Disease [Jiang et al., 2009]. We found that 135 validated targets of miR-124 and one validated target of miR-125a-5p would be lost once the SNP allele changes.
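The target gain and loss comparison described above amounts to set differences between the predicted targets of the wild-type and SNP-type miRNA. A minimal sketch follows, assuming the two target lists have already been produced by a target prediction tool; the function name and gene identifiers are hypothetical placeholders.

```python
def target_gain_loss(wild_targets, variant_targets):
    """Compare predicted targets of a wild-type and an SNP-type miRNA."""
    wild, variant = set(wild_targets), set(variant_targets)
    lost = wild - variant        # targets bound only by the wild type
    gained = variant - wild      # targets bound only by the SNP type
    retained = wild & variant    # targets bound by both forms
    loss_fraction = len(lost) / len(wild) if wild else 0.0
    return {
        "lost": lost,
        "gained": gained,
        "retained": retained,
        "loss_fraction": loss_fraction,
    }


# Placeholder gene names, for illustration only.
wild_type = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
snp_type = ["GENE_C", "GENE_D", "GENE_E"]

result = target_gain_loss(wild_type, snp_type)
print(sorted(result["lost"]))     # ['GENE_A', 'GENE_B']
print(sorted(result["gained"]))   # ['GENE_E']
print(result["loss_fraction"])    # 0.5
```

The same comparison can be restricted to experimentally validated targets by intersecting the lost set with a list of validated pairs.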
Notably, rs12220909 in miR-4293 is the only SNP that is located in a seed region and sampled in HapMap with q > 0.1. We further analyzed the MAF of rs12220909 in the HapMap populations and found that the frequency of allele C is 0 in both Utah residents with Northern and Western European ancestry from the CEPH collection (CEU) and the Yoruba population, 0.034 in Japanese, and 0.211 in Han Chinese. The frequency of the C allele is significantly higher in Han Chinese (χ² test, P < 0.01). The ΔΔG caused by rs12220909 is −0.5 kcal/mol, which means that the SNP type is slightly more stable than the wild type and may increase the expression of the mature miRNA. Because the SNP is located in the seed region, our target gain and loss prediction indicated that miR-4293 would lose 1,735 target genes and gain only 199 target genes after the G→C substitution. Gene Ontology and KEGG analyses show that the lost target genes are significantly enriched in the terms “ion binding,” “plasma membrane part,” and “small GTPase regulator activity” (Benjamini-corrected P < 0.05), whereas the target genes gained by the SNP variant show no significant enrichment in any category. Although no study has yet reported the function of this miRNA, its function and associated phenotypes merit investigation.
Besides the seed region, other residues in the mature miRNA sequence have been suggested to play a modest role in target recognition [Bartel, 2009; Grimson et al., 2007]. After computational prediction of miRNA target gain and loss, we performed experiments to validate the effects of SNPs in the seed region and in the rest of the mature region on target binding. We selected 11 miRNA-target pairs involving three target genes: ATP6V0E1, BCL2, and SEMA3F (Table 3). Two of these miRNA-target pairs (miR-34a/BCL2 and miR-124/ATP6V0E1) had been validated by others [Wang et al., 2009; Wang and Wang, 2006], and we confirmed both of them. In our experiments, five of the eight SNPs in seed regions were shown to dysregulate their targets. The three SNPs in mature sequences outside the seed region, including an indel, all had only slight effects on target binding. These results support the conclusion that residues in the seed region play key roles, whereas other residues in the mature sequence have modest effects on miRNA target binding [Bartel, 2009].
It is worth noting that in this study we experimentally demonstrated target gain caused by SNP rs2620381 in the miR-627 seed region. Wild-type miR-627 cannot bind the 3′UTR of ATP6V0E1, whereas SNP-type miR-627 gained the ability to target the ATP6V0E1 3′UTR and repressed its expression dramatically in our luciferase experiments. To the best of our knowledge, this is the first experimentally validated example of target gain caused by an SNP in an miRNA. It provides a new mechanism for miRNA dysregulation among individuals. Our results show that SNPs in an miRNA gene, especially in the seed region, can alter the target profile of the miRNA through the loss of original targets and the gain of new targets. These SNPs in miRNA seed regions, together with their target gain and loss information, are potentially useful clues for studying miRNA function and for finding SNP-associated diseases or phenotypes.
Promising SNPs in 3′UTRs of Human Protein Coding Genes
In contrast to SNPs in miRNA genes, the effects of SNPs in 3′UTRs have been studied in more reports. Loss of a potential miRNA target site may increase protein expression, whereas gain of a functional miRNA target site will repress protein expression; either change may affect physiological function and clinical phenotype. Here, using our pipeline, we identified tens of thousands of SNPs located in potential miRNA target sites, some of which show a high MAF, a large MAF difference between populations, or positive selection pressure during evolution. These SNPs are important candidates for causal variants of human disease. Genome-wide association studies have so far uncovered many SNPs associated with traits and diseases. The NHGRI GWAS catalogue (http://www.genome.gov/gwastudies, accessed 2010-12-16) described 1,227 unique SNPs associated with one or more traits (P < 5 × 10⁻⁸) [Hindorff et al., 2009]. Among these SNPs, six are in our 3′UTR dataset and three are present in our target loss and gain dataset. These three are rs1036819, associated with longevity; rs28927680, associated with triglycerides; and rs1042725, associated with height. The original papers also mentioned that these 3′UTR SNPs may be involved in the traits through miRNA-mediated regulation, but gave no detailed miRNA information. With our database miRNASNP, users can find detailed information about the miRNAs involved and their target gain and loss. For example, a search for rs28927680 in miRNASNP shows that the SNP is located in the potential target sites of six miRNAs (hsa-miR-1323, hsa-miR-548a-3p, hsa-miR-548e, hsa-miR-548f, hsa-miR-548o, and hsa-miR-548t) in the 3′UTR of the gene BUD13. SNP rs28927680 is reported to be associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, or triglycerides in humans [Kathiresan et al., 2008]; hence, the SNP-associated miRNA and target site information may guide further experiments.
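To make the notion of target site gain and loss in a 3′UTR concrete, the following Python sketch checks whether a single-base change creates or destroys a perfect match to miRNA positions 2-8. This simple seed-match criterion is an assumption chosen for illustration and is not the prediction pipeline used in this study; the function names and all sequences are hypothetical.

```python
# Watson-Crick complement table for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna):
    """Return the 3'UTR motif (5'->3') complementary to miRNA positions 2-8."""
    seed = mirna[1:8]                        # positions 2-8, 1-based
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement


def has_site(utr, mirna):
    """True if the UTR contains a perfect match to the miRNA seed."""
    return seed_site(mirna) in utr


def snp_effect(utr_ref, utr_alt, mirna):
    """Classify a 3'UTR variant as target site loss, gain, or unchanged."""
    ref, alt = has_site(utr_ref, mirna), has_site(utr_alt, mirna)
    if ref and not alt:
        return "loss"
    if alt and not ref:
        return "gain"
    return "unchanged"


# Made-up sequences; they do not correspond to any real miRNA or gene.
mirna = "UACGGAUCCAGUUCGAUAGCA"
utr_ref = "AAACCGAUCCGUAAGG"   # contains the seed match GAUCCGU
utr_alt = "AAACCGAUCAGUAAGG"   # one base changed inside the site

print(snp_effect(utr_ref, utr_alt, mirna))   # loss
print(snp_effect(utr_alt, utr_ref, mirna))   # gain
```

Under this criterion, a "loss" call corresponds to the case in which protein expression may increase, and a "gain" call to the case in which it may be repressed.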
There are more than 1,000 experimentally validated miRNA-target pairs in the miR2Disease and TarBase databases. Based on these data and the miRNA-related SNPs in our miRNASNP database, we identified 31 SNPs in 3′UTRs that could disturb experimentally validated miRNA-target pairs. Three of them (rs5186, rs12720208, and rs56109847) have been experimentally confirmed to dysregulate their corresponding targets and to be associated with diseases. Sethupathy et al. demonstrated that the SNP rs5186 in the AGTR1 3′UTR mediates allele-specific targeting of miR-155 to AGTR1, thereby modulating AGTR1 protein levels [Sethupathy et al., 2007]. SNP rs12720208 was shown to mediate allele-specific in vitro targeting of miR-433 to the FGF20 3′UTR and to confer risk for Parkinson disease [Wang et al., 2008]. Kapeller et al. identified rs62625044 (now merged into rs56109847) in the 3′UTR of HTR3E, which could mediate allele-specific miR-510 targeting. This SNP was associated with diarrhea-predominant irritable bowel syndrome (IBS-D) in females from the United Kingdom, and the association was confirmed in a German cohort by a replication study [Kapeller et al., 2008]. Besides these three validated SNPs, the remaining 28 are attractive candidates in human miRNA target sites for future studies.