MicroRNAs (miRNAs) are a family of endogenous small noncoding RNAs that participate in various developmental and physiological processes by negatively regulating gene expression [Bartel, 2004; Zhang et al., 2007]. To date, thousands of miRNAs from humans and other species have been catalogued in the miRBase database [Kozomara and Griffiths-Jones, 2011]. miRNA genes may reside in introns of protein coding genes or in intergenic regions. They are initially transcribed in the nucleus as long primary transcripts (pri-miRNAs), which the RNase Drosha processes into hairpin precursor miRNAs (pre-miRNAs) [Lee et al., 2003]. The pre-miRNA hairpins are then exported to the cytoplasm and processed by the RNase Dicer into a ∼22-nt miRNA duplex [Bartel, 2004]. The strand of the duplex (miR-5p/miR-3p) with the less stable 5′ end is preferentially selected and loaded onto the RNA-induced silencing complex (RISC) to produce a functional, mature miRNA (MIR) [Khvorova et al., 2003]. MIRs recognize their target mRNAs mainly through base pairing between nucleotides 2 to 8 from the MIR 5′ end (the seed region) and complementary nucleotides in the 3′ untranslated region (3′UTR) of the target mRNA [Lai, 2002; Lewis et al., 2003]. It is currently estimated that a single miRNA may regulate hundreds of target genes and that most human protein coding genes are regulated by miRNAs [Betel et al., 2008; Friedman et al., 2009; Krek et al., 2005].
Single nucleotide polymorphisms (SNPs) are important sources of diversity among individuals and contribute to phenotypes, traits, and diseases [Shastry, 2009]. Because miRNAs are widespread, key regulators of gene expression, miRNA-related SNPs, including SNPs in miRNA genes and in target sites, may act as regulatory SNPs that modify miRNA regulation and thereby affect phenotypes and disease susceptibility [Ryan et al., 2010]. Moreover, SNPs located in MIRs are likely to have complex effects by influencing MIR maturation, functional strand selection, and target selection. A number of studies have shown that SNPs in target sites or miRNA genes are associated with diseases [Jazdzewski et al., 2008; Mencia et al., 2009; Ryan et al., 2010; Saunders et al., 2007; Sethupathy and Collins, 2008; Sun et al., 2009]. For example, an SNP in the 3′UTR of the KRAS gene, located in a binding site of the miRNA let-7, weakens let-7-mediated repression and increases the risk of nonsmall cell lung cancer [Chin et al., 2008]. An SNP in pre-miR-146a has been reported to decrease mature miRNA expression and to predispose to papillary thyroid carcinoma [Jazdzewski et al., 2008]. Sun et al. validated several SNPs in human miRNA genes that could affect miRNA biogenesis and function [Sun et al., 2009]. A single mutation in pre-miR-155 that creates a mismatch near the 3′ end of miR-155 shifts strand selection, thereby fine-tuning its targets and producing a butterfly effect on global gene expression [Lee et al., 2011]. Another study reported that a mutation in the seed region of human miR-96 was responsible for nonsyndromic progressive hearing loss [Mencia et al., 2009].
Since 2005, several studies have systematically identified and analyzed human polymorphisms in miRNAs and/or miRNA target sites [Bhartiya et al., 2011; Duan et al., 2009; Iwai and Naraba, 2005; Landi et al., 2008; Ryan et al., 2010; Saunders et al., 2007]. Features of these SNPs, such as SNP density, allele frequency, evolutionary conservation, and effects on miRNA guide strand selection, were analyzed based on the data available at the time. For the convenience of biologists, several online databases and tools covering SNPs in miRNA target sites or miRNAs have been developed by different groups, such as dbSMR [Hariharan et al., 2009], Patrocles [Hiard et al., 2010], PolymiRTS [Bao et al., 2007], microSNiPer [Barenboim et al., 2010], and miRvar [Bhartiya et al., 2011]. However, most of them focus on SNPs in target sites and their effects; few address SNPs in miRNA genes and their influence on target selection and miRNA biogenesis. Moreover, the numbers of known miRNAs and SNPs have increased greatly in the past 2 years. For example, miRBase release 16 (September 2010) contains 1,048 human miRNAs [Kozomara and Griffiths-Jones, 2011], versus 706 in release 13 (March 2009). The current dbSNP build 132 (September 2010) contains more than 30 million human SNPs, compared with only 14 million in dbSNP 129 (April 2008). Therefore, despite these previous efforts, a more comprehensive database of SNPs in miRNA genes based on the greatly expanded data is both necessary and useful. To this end, we systematically characterized all miRNA-related SNPs, summarized their features, and analyzed their effects on target binding and mature miRNA biogenesis using both prediction and experiments. All data on miRNA-related SNPs and target alterations were compiled into a user-friendly database, miRNASNP, freely available at http://www.bioguo.org/miRNASNP/.
Materials and Methods
Identification of miRNA-Related SNPs and Their Context Information
miRNA data (including chromosomal location, host gene, conservation among species, and miRNA cluster membership) for the nine studied species (human, chimpanzee, mouse, rat, dog, horse, cow, chicken, and zebrafish) were obtained from the miRBase database (release 16.0). SNP information was downloaded from the latest version of NCBI dbSNP (build 132 for human). We then compared the chromosomal locations of pre-miRNAs and SNPs to identify SNPs in pre-miRNAs and in their adjacent 1,000-bp upstream and downstream regions. SNPs in pre-miRNAs were further classified as located in mature miRNAs or in miRNA seed regions according to their positions. We calculated the SNP density of each miRNA and its flanking regions and used a t-test to compare SNP densities between miRNA regions and flanking regions. The SNP density at each mature miRNA site (position) was defined as the number of SNPs at that site per 1,000 miRNAs.
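The region assignment described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Perl pipeline: coordinates are taken as 1-based and inclusive on a single chromosome, the `PreMiRNA` record and all helper names are hypothetical, the seed is taken as nucleotides 2 to 8 from the mature miRNA 5′ end, and density is expressed per kilobase only as one simple way to compare regions of different lengths.

```python
from dataclasses import dataclass

FLANK_BP = 1000  # flank size on each side, as stated in the text


@dataclass
class PreMiRNA:
    """Hypothetical record: 1-based, inclusive genomic coordinates."""
    name: str
    start: int
    end: int
    mature: tuple  # (start, end) of the mature miRNA within the hairpin
    strand: str = "+"


def seed_interval(pre):
    """Genomic interval of seed nucleotides 2-8, counted from the mature 5' end."""
    ms, me = pre.mature
    if pre.strand == "+":
        return (ms + 1, ms + 7)
    return (me - 7, me - 1)  # on the minus strand the 5' end is the highest coordinate


def classify(pre, pos):
    """Return the most specific region containing an SNP position, or None."""
    seed_start, seed_end = seed_interval(pre)
    if seed_start <= pos <= seed_end:
        return "seed"
    if pre.mature[0] <= pos <= pre.mature[1]:
        return "mature"
    if pre.start <= pos <= pre.end:
        return "pre-miRNA"
    if pre.start - FLANK_BP <= pos < pre.start or pre.end < pos <= pre.end + FLANK_BP:
        return "flank"
    return None


def density_per_kb(n_snps, length_bp):
    """SNPs per kilobase of sequence."""
    return 1000.0 * n_snps / length_bp
```

Checking the most specific region first (seed, then mature, then hairpin, then flank) means each SNP is counted once, in the innermost region that contains it.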
Population allele frequencies and integrated haplotype scores (iHS) for SNPs were obtained from HapMap (http://www.hapmap.org/) and Haplotter (http://haplotter.uchicago.edu/) [Voight et al., 2006], respectively. Experimentally validated miRNA targets were obtained from the miR2Disease [Jiang et al., 2009] and TarBase [Sethupathy et al., 2006] databases; in total, 1,440 miRNA-target pairs with experimental evidence were included in our analysis. Minimum free energies of pre-miRNA hairpin structures were calculated with RNAfold [Denman, 1993], and images of pre-miRNA secondary structures were generated with RNAplot. All data processing scripts were written in Perl.
miRNA Target Gain and Loss Analysis by Prediction
To predict miRNA target sites, we combined the results of two popular tools, TargetScan (http://www.TargetScan.org/) [Friedman et al., 2009; Lewis et al., 2005] and miRanda (http://www.microrna.org) [Betel et al., 2008], which are regularly updated and are considered to perform relatively well. Specifically, we ran miRanda v3.3a with default parameters and cutoffs (score S ≥ 140 and energy E ≤ −7.0 kcal/mol). For TargetScan, we used default parameters and considered a target site conserved if it was present in at least one of the orthologous 3′UTRs of mouse, rat, or dog. 3′UTR sequences of human, chimpanzee, rat, mouse, and dog were obtained from the UCSC Genome Browser (http://genome.ucsc.edu/).
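The two miRanda cutoffs and the conservation rule stated above can be expressed as simple filters. In this sketch the dictionary keys (`score`, `energy`, `species_with_site`) are hypothetical names for values assumed to have been parsed from miRanda and TargetScan output beforehand; neither tool's actual output format is reproduced here.

```python
MIRANDA_MIN_SCORE = 140.0   # score S >= 140
MIRANDA_MAX_ENERGY = -7.0   # energy E <= -7.0 kcal/mol
ORTHOLOG_SPECIES = {"mouse", "rat", "dog"}


def passes_miranda(hit):
    """Apply the miRanda score and energy cutoffs to one predicted site."""
    return hit["score"] >= MIRANDA_MIN_SCORE and hit["energy"] <= MIRANDA_MAX_ENERGY


def is_conserved(site):
    """Treat a TargetScan site as conserved if at least one of mouse, rat,
    or dog also carries it (hypothetical 'species_with_site' field)."""
    return not ORTHOLOG_SPECIES.isdisjoint(site["species_with_site"])
```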
For SNPs in miRNA seed regions, both tools were used to predict target sites for the wild-type miRNAs and the SNP-miRNAs. This yielded four sets of target genes: WT (targets of wild-type miRNAs predicted by TargetScan), WM (targets of wild-type miRNAs predicted by miRanda), ST (targets of SNP-miRNAs predicted by TargetScan), and SM (targets of SNP-miRNAs predicted by miRanda). If an miRNA/target pair was present in both WT and WM but in neither ST nor SM, we called it a target loss. Conversely, if a pair was predicted in both ST and SM but in neither WT nor WM, we defined it as a target gained by the SNP-miRNA. In addition, for each lost or gained miRNA/target pair, we extracted the target site sequence (±50 bp) and used RNAhybrid [Kruger and Rehmsmeier, 2006] to calculate the minimum hybridization energy of the miRNA–target interaction. Generally, a larger energy change is expected to affect the miRNA–target interaction more strongly. The binding energy changes between wild-type miRNA/target and SNP-miRNA/target pairs are provided in our database as additional information to help users make further judgments.
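The loss and gain definitions above reduce to set operations on the four prediction sets. A minimal Python sketch, assuming each set holds the target gene symbols predicted for one miRNA (the function name is illustrative):

```python
def target_loss_gain(wt, wm, st, sm):
    """Classify target genes of one miRNA as lost or gained.

    wt, wm: targets of the wild-type miRNA (TargetScan, miRanda)
    st, sm: targets of the SNP-miRNA (TargetScan, miRanda)
    """
    wt, wm, st, sm = map(set, (wt, wm, st, sm))
    # Loss: both tools predict the wild-type pair, neither predicts the SNP pair.
    loss = (wt & wm) - (st | sm)
    # Gain: both tools predict the SNP pair, neither predicts the wild-type pair.
    gain = (st & sm) - (wt | wm)
    return loss, gain


if __name__ == "__main__":
    lost, gained = target_loss_gain({"A", "B", "C"}, {"A", "B", "D"},
                                    {"B", "E", "F"}, {"C", "E", "F"})
    print(sorted(lost), sorted(gained))  # ['A'] ['E', 'F']
```

In this toy example, gene A is lost and genes E and F are gained, while B (still predicted by TargetScan for the SNP-miRNA) and C (predicted by only one tool on each side) fall in neither category. Pairs on which the two tools disagree are therefore never counted as a loss or a gain.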
For SNPs in mRNA 3′UTRs, gains and losses of miRNA-target binding and the associated energy changes were analyzed with the same strategy. The only difference is that the miRNA sequence is unchanged, whereas each 3′UTR is analyzed in both its wild-type and its SNP (mutant) form.
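To make the wild-type versus SNP-3′UTR comparison concrete, the sketch below substitutes an SNP allele into a 3′UTR and scans both versions for perfect matches to the miRNA seed (7mer-m8 sites, complementary to miRNA nucleotides 2 to 8). This simple seed scan is a stand-in chosen for illustration; it is not TargetScan or miRanda, which apply additional pairing, context, and energy criteria. The miRNA in the example is the let-7a sequence, the 3′UTR is a made-up toy string, and the function names are hypothetical.

```python
# Maps each RNA base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna):
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def apply_snp(utr, pos, ref, alt):
    """Return the SNP-3'UTR: substitute alt at 0-based pos after checking ref."""
    if utr[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: {utr[pos]!r} != {ref!r}")
    return utr[:pos] + alt + utr[pos + 1:]


def seed_sites(utr, mirna):
    """0-based start positions of 7mer-m8 sites (pairing with miRNA nt 2-8)."""
    site = revcomp(mirna[1:8])
    return {i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site}


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"
    utr = "AAACUACCUCAAA"          # toy 3'UTR containing one let-7 site
    mutant = apply_snp(utr, 5, "A", "G")
    print(seed_sites(utr, let7a), seed_sites(mutant, let7a))  # {3} set()
```

Here the A>G substitution inside the site destroys the match, which this simplified scan would report as a target loss for the SNP-3′UTR.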
DAVID bioinformatics online tools (http://david.abcc.ncifcrf.gov/) [Huang da et al., 2009] were used to identify enriched functional annotation categories among genes whose conserved target sites were lost or gained because of SNPs in miRNA seed regions. Gene Ontology (GO) terms [Barrell et al., 2009] and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways [Kanehisa et al., 2004] were evaluated. A Benjamini-corrected P < 0.05 was considered statistically significant.
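DAVID reports Benjamini-corrected P values. Assuming this correction corresponds to the Benjamini-Hochberg false discovery rate procedure, the adjustment can be sketched in plain Python (no DAVID API is involved):

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For raw P values [0.01, 0.04, 0.03, 0.20], the adjusted values are approximately [0.04, 0.053, 0.053, 0.20], so only the first term would pass a 0.05 cutoff.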
miRNA Target Gain and Loss Analysis by Experiments
To test the effects of SNPs in miRNA genes on target binding, nine miRNAs and three target genes were selected for luciferase reporter assays. miR-627 and miR-379 were each tested twice, with different targets or different SNPs.
The dual-luciferase reporter system was constructed by amplifying 3′UTRs from genomic DNA and cloning them into the psiCHECK™-2 vector (Promega, Madison, WI). The primer sequences for ATP6V0E1 were 5′-GCCTCGAGGGTCACGAGAAGAGAATGCC-3′ and 5′-CAGCGGCCGCAGATTTCCATAGAAAGGAGG-3′ (product length, 516 bp); the primer sequences for BCL2 were 5′-ATTCTAGGCGATCGCTCGAGGTCAACATGCCTGCCCCAAACA-3′ and 5′-TTATTGCGGCCAGCGGCCGCTCCATCCGTCTGCTCTTCAGAT-3′ (product length, 457 bp); and the primer sequences for SEMA3F were 5′-CTCGAGCTCGAGATTGGTGGGTTGAA-3′ and 5′-GCGGCCGCCTGTCCCACCCAGGCAGGAT-3′ (product length, 489 bp). Mimics of wild-type miRNAs and SNP-miRNAs were chemically synthesized by Shanghai Bioladder Company (Shanghai, China).
For each wild-type miRNA/target pair or variant miRNA/target pair, the reporter vector carrying the target gene 3′UTR was cotransfected with the miRNA mimic into HEK-293 cells, which are commonly used in miRNA luciferase experiments. Luciferase assays were performed 48 h after transfection using the Luciferase Reporter Gene Assay kit (Promega). Renilla luciferase (target) and firefly luciferase (internal control) activities were measured, and the Renilla/firefly luciferase ratio was calculated for comparison among groups. All experiments were performed in triplicate.
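The normalization step can be illustrated with a short sketch. The replicate readings below are invented for illustration, and expressing each group relative to the mean ratio of a control group is one common convention that we assume here; the text does not specify the exact statistics used.

```python
from statistics import mean, stdev


def normalized_ratios(renilla, firefly):
    """Renilla/firefly ratio for each replicate well."""
    if len(renilla) != len(firefly):
        raise ValueError("renilla and firefly readings must be paired")
    return [r / f for r, f in zip(renilla, firefly)]


def relative_activity(sample_ratios, control_ratios):
    """Mean normalized activity of a group relative to the control mean (control = 1.0)."""
    return mean(sample_ratios) / mean(control_ratios)


if __name__ == "__main__":
    # Invented triplicate readings, for illustration only.
    control = normalized_ratios([100, 110, 90], [50, 55, 45])
    mimic = normalized_ratios([50, 55, 45], [50, 55, 45])
    rel = relative_activity(mimic, control)
    print(f"relative activity = {rel:.2f}, SD of mimic ratios = {stdev(mimic):.2f}")
```

With identical firefly readings and Renilla readings halved relative to the control, the relative activity is 0.5, which is how a miRNA that represses its reporter by half would appear.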
miRNASNP Database Construction
Summary and Features of miRNAs with SNPs
There are 1,048 human pre-miRNA entries in miRBase (release 16, September 2010) and more than 30 million human SNPs in dbSNP (Build 132, September 2010). After mapping the SNPs onto human pre-miRNA genes, we identified 757 SNPs (including indel polymorphisms) in 440 pre-miRNAs. Of these, 178 SNPs are in 149 MIRs and 50 SNPs are in the seed regions of 41 MIRs. There are 177 miRNAs with more than one SNP; among them, miR-3939, a human-specific miRNA, has the most SNPs (13) in its pre-miRNA region. Besides human, we also identified SNPs in the pre-miRNAs of the other eight species: chimpanzee, mouse, rat, dog, cow, horse, chicken, and zebrafish (Table 1). Unless otherwise stated, the analyses below concern human miRNAs only, because far fewer data are available for the other species. To compare SNP density among regions, we further characterized SNPs in the flanking regions of human pre-miRNAs (−1 kb to +1 kb). The SNP density of pre-miRNAs is lower than that of the flanking regions, and the SNP density of MIR seed regions is significantly lower than that of pre-miRNAs and flanking regions (Fig. 1A). The same trends were found in mouse and chicken, which have relatively more SNPs than the other six species (Supp. Fig. S1). Within human MIRs, SNP density is higher at the middle positions than at the 5′ and 3′ ends (Fig. 1B). Based on genomic location and conservation among species, we classified human miRNAs into categories and, within each category, statistically compared miRNAs with and without SNPs (Table 2). Conserved miRNAs and miRNAs in clusters tend to have fewer SNPs (χ2 test, P < 0.01), whereas miRNAs in host genes and those in intergenic regions show no significant difference in SNP distribution.
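The χ2 comparison between miRNAs with and without SNPs in a given category can be sketched for a 2 × 2 table. The counts used in the check are hypothetical, not the values from Table 2, and the sketch omits Yates' continuity correction because the text does not say whether it was applied.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],   e.g. rows: in category / not in category
         [c, d]]        cols: miRNAs with SNP / without SNP
    Returns (statistic, P value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p
```

For the hypothetical table [[10, 40], [30, 20]], the statistic is 50/3 ≈ 16.7 and P is well below 0.01; a table with equal counts in every cell gives a statistic of 0 and P = 1.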
Table 1. A Summary of Single Nucleotide Polymorphisms (SNPs) in pre-miRNAs in Nine Species
Species | Total SNPs in dbSNP | No. of poly-miRs(a) | No. of SNPs in poly-miRs
Human | >30 million | 440 (149, 41)(b) | 757 (178, 50)(b)
(a) poly-miRs: miRNAs containing at least one SNP.
(b) A total of 178 SNPs in the mature sequences of 149 poly-miRs and 50 SNPs in the seed sequences of 41 poly-miRs.
Table 2. The Features of miRNA Distribution in Different Categories
Columns: miRNAs with SNP | miRNAs without SNP
Categories: Conserved in primates; Conserved in mammals; In 5-kb cluster; In 10-kb cluster
*P < 0.01 by χ2 test comparing miRNAs with and without SNPs in different categories.
miRNA Target Alteration by SNPs in miRNA Seed Regions
Because miRNA target selection relies mainly on seed-region base pairing, the alternative allele of an SNP in the seed region may greatly change an miRNA's target spectrum, causing many target losses and gains and thereby dramatically affecting miRNA function. As mentioned above, we identified 50 SNPs in the seed regions of 41 human miRNAs. Because two insertion SNPs do not change the seed sequences, we analyzed target alteration only for the other 48 SNPs (Supp. Table S1). Targets were predicted with the combined TargetScan and miRanda strategy for miRNAs carrying the reference sequence (wild-type miRNAs) and for miRNAs carrying the SNP alleles (SNP-miRNAs). Comparing the targets of each wild-type miRNA with those of its SNP-miRNA yielded the potential target losses and gains (Fig. 2). On one hand, 55,887 miRNA-target pairs were predicted by both tools for wild-type miRNAs, and 53% of them would be disrupted when the reference allele is replaced by the alternative SNP allele. On the other hand, 58,683 miRNA-target pairs were predicted by both tools for SNP-miRNAs, and 52% of them are newly created by the SNP allele (Fig. 2).
As shown in Figure 2, the total number of putative targets of miR-518d-3p, miR-3622a-5p, and miR-1304(1) (rs76857625) increased greatly after variation, by 2,075, 1,703, and 1,593, respectively. In contrast, the number of putative targets of miR-642a, miR-4293, and miR-3614-5p decreased dramatically after allele substitution, by 1,895, 1,429, and 1,406, respectively. Notably, miR-4293 lost nearly 93% of its target genes after mutation. Interestingly, miR-1304 has two SNPs in its seed region with different effects on its target spectrum: the number of putative targets increased greatly with SNP rs76857625 but decreased sharply with SNP rs79759099.
Target Binding Alteration by Experimental Validation
To validate these predicted effects experimentally, we selected 11 miRNA-target pairs and their corresponding SNP-miRNA/target pairs for luciferase reporter assays. Of these 11 candidate SNPs, eight are in miRNA seed regions and three are in the mature region outside the seed region (Table 3).
Table 3. Experimentally Validated Human miRNA-Target Pairs and the SNPs in miRNA Genes
(a) The rectangle marks the SNP position.
Of the eight SNPs in miRNA seed regions, four showed complete or partial loss of target binding and one showed gain of target binding in our experiments, consistent with the predictions (Fig. 3). The target-loss pairs were miR-627(rs2620381)/SEMA3F, miR-379(rs61991156)/SEMA3F, miR-499-3p(rs3746444)/BCL2, and miR-124(rs34059726)/ATP6V0E1, and the target-gain pair was miR-627(rs2620381)/ATP6V0E1. Of these five cases with significantly different luciferase activity, SNP rs34059726 in the miR-124 seed region had the greatest influence on the miRNA–target interaction, with normalized luciferase activity changing from 10% to 80%. SNP rs2620381 in miR-627 is the first example of an SNP in a seed region experimentally confirmed to create novel target binding. Wild-type miR-627 does not bind the 3′UTR of ATP6V0E1, as the relative luciferase activity was the same as that of the controls, whereas cotransfection of SNP-type miR-627 with the reporter reduced luciferase activity by half compared with the control (Fig. 3).
To investigate whether SNPs in the mature sequence outside the seed region could affect miRNA–target interactions, we also tested three such SNPs (rs35356504 in miR-940, rs72631818 in miR-379, and rs35301225 in miR-34a) (Table 3). Among them, rs35356504 in miR-940 is an indel SNP (a one-nucleotide deletion). Surprisingly, all three SNPs in the mature sequence affected target binding (Fig. 3). Although their effects were slightly weaker than those of SNPs in seed regions, they were still statistically significant.
Effects on Mature miRNA Production by SNPs in miRNA Genes
Several groups have reported that SNPs in miRNA genes can affect miRNA biogenesis [Duan et al., 2007; Gottwein et al., 2006; Jazdzewski et al., 2008; Li et al., 2010; Mencia et al., 2009; Sun et al., 2009, 2010]. Are there general rules for how SNPs affect miRNA biogenesis? To address this question, we collected published results for SNPs in pre-miRNAs (Table 4), including their energy changes and effects on mature miRNA production. We observed that 11 SNPs that reduced mature miRNA production are located in pre-miRNA stem regions and destabilize the hairpin structure, for example by changing a G:C pair to a G:G mismatch. The energy change (ΔΔG) of the hairpin caused by these SNPs is often relatively large, ranging from 2.1 to 7.1 kcal/mol. Two SNPs increased mature miRNA production: one in the loop and the other in the stem region, where it slightly increased hairpin stability (G:U to A:U) (Table 4).
Table 4. A List of Published SNPs in pre-miRNAs Influencing Mature miRNA Production
Wild base pairing | SNP base pairing | Mature miR production
"5p 1" represents the first nucleotide at the 5′ end of the mature miRNA.
Based on these observations, we tentatively inferred the effects of SNPs in miRNA genes on mature miRNA production. For an SNP in the miRNA stem, if it decreases the stability of the hairpin structure, it is expected to reduce mature miRNA production; if it increases stability, it is expected to increase production. The larger the energy change, the more likely production is affected. In our data, the average energy change of pre-miRNA secondary structures caused by SNPs (|ΔΔG|) is 2.1 kcal/mol, and about 44% of the energy changes exceed 2.0 kcal/mol, which may significantly affect mature miRNA production. However, because this inference rests on rules summarized from the limited published data currently available, exceptions are possible and more experimental data are needed for validation.
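The tentative rule above can be written as a small decision function. The sign convention follows the text (ΔΔG = ΔG of the SNP hairpin minus ΔG of the wild-type hairpin, so a positive value means a less stable hairpin), and the 2 kcal/mol threshold comes from the text. The minimum free energies would come from a folding tool such as RNAfold; the function names, output labels, and example energies are illustrative assumptions.

```python
def delta_delta_g(mfe_wt, mfe_snp):
    """ddG = MFE(SNP hairpin) - MFE(wild-type hairpin), in kcal/mol.
    Positive values mean the SNP hairpin is less stable."""
    return mfe_snp - mfe_wt


def predict_mature_effect(in_stem, ddg, threshold=2.0):
    """Tentative rule from the text; returns a label, not a measurement."""
    if not in_stem:
        return "uncertain"  # the rule as stated covers stem SNPs only
    if ddg > 0:
        effect = "decrease"  # destabilized hairpin -> less mature miRNA
    elif ddg < 0:
        effect = "increase"  # stabilized hairpin -> more mature miRNA
    else:
        return "no change expected"
    strength = "likely" if abs(ddg) > threshold else "possible"
    return f"{strength} {effect}"
```

For example, a stem SNP whose hairpin MFE shifts from −35.2 to −30.1 kcal/mol (ΔΔG = 5.1) is labeled "likely decrease", while a stem SNP with ΔΔG = −0.5 kcal/mol is labeled "possible increase".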
miRNA Target Alteration by SNPs in 3′UTR
To analyze the influence of SNPs in 3′UTRs on miRNA/mRNA interactions, we first mapped SNPs to the 3′UTRs of all human protein coding genes and identified 225,759 SNPs in these 3′UTRs. After substituting each SNP allele into the reference (wild-type) 3′UTR sequence to obtain the corresponding SNP-3′UTR sequence, we predicted all putative miRNA target sites in wild-type 3′UTRs and SNP-3′UTRs with both TargetScan and miRanda. In total, 1,916,262 and 2,154,323 target sites (including nonconserved sites) were predicted for wild-type 3′UTRs and SNP-3′UTRs, respectively. By comparing these target sites, we found that 58,977 SNPs disrupted 90,784 original miRNA target sites, while 59,810 SNPs created 91,711 new potential miRNA target sites. Of these SNPs, 20,779 both disrupt original target sites and create new ones. Therefore, a total of 98,008 (58,977 + 59,810 − 20,779) SNPs can be considered possibly functional. Target site gain and loss were defined with the same strategy as for SNPs in miRNA seed regions (Fig. 2 and Materials and Methods). All target site gain and loss data are available in the miRNASNP database. In the target loss dataset, we identified 31 SNPs with the potential to disrupt 30 experimentally verified miRNA-target pairs collected from the TarBase and miR2Disease databases (Supp. Table S2).
To further support these predictions, we used RNAhybrid to quantify the change in binding energy between each miRNA and the wild-type 3′UTR versus the SNP-3′UTR. The average energy changes caused by SNPs in 3′UTRs were 11.5 kcal/mol and 11.7 kcal/mol in the target loss and target gain datasets, respectively. About 50% of the energy changes in the whole dataset were >10 kcal/mol, and some even exceeded 30 kcal/mol; changes of this size are expected to strongly affect miRNA binding. All energy change data are available in the online miRNASNP database.
Frequencies of miRNA-Related SNPs
Functional SNPs are generally of greater importance in population genetics if they occur at high frequency or are under positive selection. To learn more about the functional SNPs we identified, we mapped them to HapMap data and to iHS data from Haplotter (http://haplotter.uchicago.edu/) to check population allele frequencies and to test for positive selection. HapMap contains genotype and frequency data for different populations, mainly for SNPs of relatively high frequency. The iHS is a genome-wide standardized measure of recent positive selection at a given SNP in a population. Only 69 of the 757 SNPs in human pre-miRNAs were sampled in HapMap (Supp. Table S3), and 40 of them have relatively high minor allele frequencies (MAF, q ≥ 0.10), including five in MIRs and one in a seed region (rs12220909 in miR-4293). There are 11,150 SNPs in the target loss dataset and 11,190 SNPs in the target gain dataset sampled in HapMap (Supp. Tables S4 and S5), of which 7,443 and 7,339, respectively, have a relatively high MAF (q ≥ 0.10) in at least one population. Moreover, 45 SNPs in the target loss dataset and 37 in the target gain dataset show large allele frequency differences (q ≥ 0.8) between populations (Supp. Tables S6 and S7). Because most miRNA-related SNPs lack iHS information, we found only 100 SNPs with a relatively high iHS (iHS > 2.5) (Supp. Table S8); these correspond to the most extreme 1% of genome-wide iHS values and indicate recent positive selection [Voight et al., 2006]. SNPs with high MAF, large frequency differences between populations, or signs of positive selection will be important candidates for studies of population phenotypes and complex traits.
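The frequency and selection filters described above can be sketched as follows. Two assumptions are made: allele frequencies are supplied per HapMap population for one allele, and the "q ≥ 0.8" population-difference criterion is interpreted as the largest minus the smallest frequency of that allele across populations. The iHS cutoff is applied as written in the text (iHS > 2.5). Function names are hypothetical, and the example uses the rs12220909 C-allele frequencies reported in the Discussion, with HapMap codes YRI, JPT, and CHB assumed for the Yoruba, Japanese, and Han Chinese samples.

```python
def high_maf(freqs_by_pop, min_maf=0.10):
    """True if the minor allele frequency reaches min_maf in any population.
    freqs_by_pop maps population -> frequency of one allele (0..1)."""
    return any(min(f, 1 - f) >= min_maf for f in freqs_by_pop.values())


def large_pop_difference(freqs_by_pop, min_diff=0.8):
    """True if the same allele's frequency differs by >= min_diff between
    some pair of populations (max - min across populations; an assumption)."""
    values = list(freqs_by_pop.values())
    if not values:
        return False
    return max(values) - min(values) >= min_diff


def high_ihs(ihs, cutoff=2.5):
    """iHS threshold as stated in the text."""
    return ihs > cutoff


if __name__ == "__main__":
    rs12220909_c = {"CEU": 0.0, "YRI": 0.0, "JPT": 0.034, "CHB": 0.211}
    print(high_maf(rs12220909_c), large_pop_difference(rs12220909_c))
```

Under these assumptions rs12220909 passes the MAF filter (0.211 in Han Chinese) but not the 0.8 population-difference filter.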
miRNASNP, an Online Database of miRNA-Related SNPs
To make these miRNA-related SNPs and their potential target loss and gain information available to all researchers, we compiled all the data into a MySQL database and developed a user-friendly website, miRNASNP. The miRNASNP database contains five major modules (Fig. 4A): (1) SNPs in human pre-miRNAs; (2) SNPs in human pre-miRNA flanking regions (−1 kb to +1 kb); (3) SNPs in pre-miRNAs of the other eight species; (4) target gain/loss caused by SNPs in miRNA seeds; and (5) target gain/loss caused by SNPs in target 3′UTRs (Fig. 4C). In addition to basic information on human miRNAs with SNPs in their precursors, we marked each SNP on the pre-miRNA stem-loop and displayed the secondary structure graphically (Fig. 4B and E). The species module provides data on SNP-miRNAs in the other eight species: chimpanzee, mouse, rat, dog, cow, horse, chicken, and zebrafish. The two target gain/loss modules allow users to examine predicted target gains and losses due to SNPs in miRNA seed regions or in target mRNA 3′UTRs. miRNASNP supports data browsing and searching, including a quick search box at the top left of each page (by miRNA ID, SNP ID, or gene symbol) and an advanced search page covering all data in the five modules (Fig. 4D).
In recent years, SNPs in miRNA target sites have been widely studied for associations with diseases ranging from hereditary diseases to various cancers (reviewed in [Sethupathy and Collins, 2008]). SNPs in miRNA genes have also been reported to be involved in altered miRNA processing [Duan et al., 2007; Harnprasopwat et al., 2010], thyroid cancer [Jazdzewski et al., 2009], gastric cancer risk [Peng et al., 2009], ulcerative colitis [Okubo et al., 2011], squamous cell carcinoma [Liu et al., 2010], and nonsyndromic progressive hearing loss [Mencia et al., 2009]. Thus, identifying functional miRNA-related SNPs is of interest for disease and complex trait studies. However, the effects of SNPs in miRNA genes on miRNA biogenesis and target selection have not been studied extensively. In this study, we first identified miRNA-related SNPs and summarized their features. We then focused on the potential effects of SNPs in miRNA genes on miRNA biogenesis and target binding, using both prediction and experimental validation. Finally, we compiled all the data into miRNASNP, a free online database. Given the broad regulatory roles of miRNAs and the abundance of SNPs, the functional miRNA-related SNPs we identified should be a useful resource for mining SNP-associated diseases or phenotypes in populations.
Promising SNPs in Human miRNA Precursors
Because miRNAs are upstream regulators of a wide range of processes, SNPs in miRNA genes may affect miRNA function by influencing miRNA biogenesis or target interactions, with potentially serious consequences. In this study, we identified 757 SNPs in human miRNA genes and examined their HapMap data. Although many of these SNPs lack frequency information in HapMap, 69 of them were sampled in HapMap and 40 have a relatively high MAF (q ≥ 0.1) in at least one population. In the Results section, we inferred the effects of SNPs in pre-miRNAs on maturation by summarizing published examples. According to these tentative rules, eight of the 40 SNPs (rs11614913, rs13299349, rs13447640, rs6971711, rs11844707, rs72246410, rs4822739, and rs17797090) are located in stem regions with ΔΔG > 2 kcal/mol and may decrease MIR production. Another seven SNPs (rs2910164, rs2292832, rs10505168, rs5997893, rs12780876, rs10934682, and rs2043556) are located in stem regions with ΔΔG < −2 kcal/mol and may increase MIR production. In principle, these SNPs could substantially change mature miRNA production and thus contribute to genetic differences among populations. However, an SNP in a pre-miRNA with a relatively low MAF may also have serious consequences in the individuals who carry it. For example, two SNPs (+13 G > A and +14 C > A) in the miR-96 seed region were observed in a Spanish family with autosomal dominant progressive high-frequency hearing loss, caused by impaired maturation and disrupted target sites [Mencia et al., 2009].
Because the seed region is the most important determinant of miRNA target binding [Bartel, 2009], SNPs in miRNA seed regions directly influence target binding and selection. Here, we identified 50 SNPs in the seed regions of 41 human miRNA genes and predicted their target gain and loss effects (Fig. 2 and miRNASNP website). Our predictions indicate that, on average, an SNP in a miRNA seed region disrupts about half of the original targets and creates a comparable proportion of new targets. In our dataset, five miRNAs with SNPs in their seed regions (miR-124, miR-125a-5p, miR-1302, miR-379, and miR-499-3p) are conserved in mammals (chimpanzee, mouse, rat, and dog). We extracted their conserved targets and performed KEGG pathway and Gene Ontology enrichment analyses. The results suggest that miRNA-mediated functions could change greatly as a result of SNP variants. For example, the conserved targets of wild-type miR-124 are significantly enriched in the terms “regulation of apoptosis,” “intracellular membrane-bounded organelle,” and “regulation of cellular biosynthetic process,” whereas the conserved targets of its variant show no enrichment (Benjamini-corrected P < 0.05). Of these miRNAs, miR-124 and miR-125a-5p have experimentally validated targets in TarBase [Sethupathy et al., 2006] and miR2Disease [Jiang et al., 2009]; we found that 135 validated targets of miR-124 and one validated target of miR-125a-5p would be lost when the SNP allele changes.
Notably, SNP rs12220909 in miR-4293 is the only SNP that is both located in a seed region and sampled in HapMap with q ≥ 0.1. Its HapMap allele frequencies show that the C allele has a frequency of 0 in both the Utah residents with Northern and Western European ancestry from the CEPH collection (CEU) and the Yoruba populations, 0.034 in Japanese, and 0.211 in Han Chinese; the Han Chinese population thus has a significantly higher frequency of the C allele (χ2 test, P < 0.01). The ΔΔG caused by rs12220909 is −0.5 kcal/mol, meaning that the SNP-type hairpin is slightly more stable than the wild type and may increase mature miRNA expression. Because the SNP lies in the seed region, our prediction indicated that miR-4293 would lose 1,735 target genes and gain only 199 after the G→C substitution. Gene Ontology and KEGG analyses show that the lost target genes are significantly enriched in the terms “ion binding,” “plasma membrane part,” and “small GTPase regulator activity” (Benjamini-corrected P < 0.05), whereas the gained target genes show no significant enrichment in any category. Although no study has yet reported the function of this miRNA, its function and associated phenotypes merit investigation.
Besides the seed region, other residues in the mature miRNA sequence have been suggested to play a modest role in target recognition [Bartel, 2009; Grimson et al., 2007]. After computationally predicting target gains and losses, we performed experiments to validate the effects of SNPs in the seed and mature regions on target binding. We selected 11 miRNA-target pairs involving three target genes, ATP6V0E1, BCL2, and SEMA3F (Table 3). Two of these pairs (miR-34a/BCL2 and miR-124/ATP6V0E1) had been validated by others [Wang et al., 2009; Wang and Wang, 2006], and we confirmed both. In our experiments, five of the eight SNPs in seed regions were shown to dysregulate their targets. All three SNPs in the mature sequence outside the seed region, including an indel SNP, had slight effects on target binding. These results support the view that residues in the seed region play key roles in miRNA target binding, while other residues in the mature sequence have modest effects [Bartel, 2009].
Notably, our experiments validated a target gain caused by the SNP rs2620381 in the miR-627 seed region. Wild-type miR-627 cannot bind the 3′UTR of ATP6V0E1, whereas the SNP-type miR-627 gained the ability to target the ATP6V0E1 3′UTR and strongly repressed its expression in our luciferase assays. To the best of our knowledge, this is the first experimentally validated example of target gain caused by an SNP in an miRNA, and it reveals a new mechanism by which miRNA regulation can differ between individuals. Our results show that SNPs in an miRNA gene, especially in the seed region, can alter the target profile of the miRNA by causing the loss of original targets and the gain of new ones. These seed-region SNPs and their predicted target gains and losses may provide useful clues for studying miRNA function and for identifying SNP-associated diseases or phenotypes.
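To make the notion of target gain and loss concrete, here is a deliberately simplified sketch: a gene is called a target when its 3′UTR contains an exact 7-mer match to seed nucleotides 2 to 8, and the target sets of the wild-type and SNP alleles are compared by set difference. This is an assumption-laden stand-in, not the prediction pipeline used in this study; practical predictors also weigh site type, context, and conservation. All sequences and gene names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(mature):
    """7-mer site in a 3′UTR that pairs with nucleotides 2-8 of the miRNA."""
    return reverse_complement(mature[1:8])


def predicted_targets(mature, utrs):
    """Genes whose 3′UTR contains at least one exact seed-matched site."""
    site = seed_site(mature)
    return {gene for gene, utr in utrs.items() if site in utr}


def target_gain_loss(wt_mature, snp_mature, utrs):
    """Return (gained, lost) target sets for the SNP allele vs. wild type."""
    wt_targets = predicted_targets(wt_mature, utrs)
    snp_targets = predicted_targets(snp_mature, utrs)
    return snp_targets - wt_targets, wt_targets - snp_targets


# HYPOTHETICAL sequences: the SNP changes seed position 4 from G to C.
wt = "AUCGAGUCAACGUAUCGAUCGA"
snp = "AUCCAGUCAACGUAUCGAUCGA"
utrs = {
    "GENE_A": "AAAGACUCGAUUU",     # site for the wild-type seed only
    "GENE_B": "CCGACUGGACC",       # site for the SNP seed only
    "GENE_C": "GACUCGAAAGACUGGA",  # sites for both alleles
    "GENE_D": "AUAUAUAUAU",        # no site
}
gained, lost = target_gain_loss(wt, snp, utrs)
print("gained:", sorted(gained), "lost:", sorted(lost))
```

A gene with sites for both alleles (GENE_C) is neither gained nor lost, which is why the comparison is done on whole target sets rather than on individual sites.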
Promising SNPs in 3′UTRs of Human Protein Coding Genes
Compared with SNPs in miRNA genes, SNPs in 3′UTRs have been studied more extensively. Loss of a potential miRNA target site may increase protein expression, whereas gain of a functional miRNA target site may repress it; either change can affect physiological function and clinical phenotype. Using our pipeline, we identified tens of thousands of SNPs located in potential miRNA target sites, some of which show a high MAF, a large MAF difference between populations, or evidence of positive selection during evolution. These SNPs are important candidate causal variants for human disease. Genome-wide association studies have uncovered many SNPs associated with traits and diseases. The NHGRI GWAS catalogue (http://www.genome.gov/gwastudies, accessed on 2010-12-16) listed 1,227 unique SNPs associated with one or more traits (P < 5 × 10⁻⁸) [Hindorff et al., 2009]. Of these SNPs, six are in our 3′UTR dataset and three are present in our target gain and loss dataset: rs1036819, associated with longevity; rs28927680, associated with triglycerides; and rs1042725, associated with height. The original papers suggested that these 3′UTR SNPs might influence the traits through miRNA-mediated regulation but did not identify the miRNAs involved. Our miRNASNP database provides detailed information on the relevant miRNAs and their target gains and losses. For example, a search for rs28927680 in miRNASNP shows that this SNP lies in the potential target sites of six miRNAs (hsa-miR-1323, hsa-miR-548a-3p, hsa-miR-548e, hsa-miR-548f, hsa-miR-548o, and hsa-miR-548t) in the 3′UTR of BUD13. Because rs28927680 has been reported to be associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, or triglyceride levels in humans [Kathiresan et al., 2008], the associated miRNA and target-site information may guide further experiments.
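The overlap counts reported here come down to three steps: filter the catalogue at the genome-wide threshold, deduplicate SNP IDs (one SNP can be listed for several traits), and intersect the result with each SNP dataset. A minimal sketch, assuming each dataset is available as a set of rs IDs; the row layout, function names, identifiers, and P values are all hypothetical.

```python
GWAS_THRESHOLD = 5e-8  # genome-wide significance cutoff used in the text


def genome_wide_hits(records, threshold=GWAS_THRESHOLD):
    """Unique SNP IDs with at least one association below the threshold.
    records: iterable of (rsid, trait, p_value)."""
    return {rsid for rsid, _trait, p_value in records if p_value < threshold}


def overlap_with_datasets(records, utr_snps, gain_loss_snps):
    """Return (GWAS hits in the 3′UTR set, GWAS hits in the gain/loss set)."""
    hits = genome_wide_hits(records)
    return hits & utr_snps, hits & gain_loss_snps


# HYPOTHETICAL catalogue rows and dataset contents.
catalogue = [
    ("rsA", "trait 1", 1e-9),
    ("rsA", "trait 2", 2e-10),  # same SNP, second trait: counted once
    ("rsB", "trait 3", 3e-8),
    ("rsC", "trait 4", 2e-7),   # not genome-wide significant
    ("rsD", "trait 5", 4e-12),
]
utr_snps = {"rsA", "rsC", "rsD"}
gain_loss_snps = {"rsC", "rsD"}
in_utr, in_gain_loss = overlap_with_datasets(catalogue, utr_snps, gain_loss_snps)
print(sorted(in_utr), sorted(in_gain_loss))
```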
The miR2Disease and TarBase databases contain more than 1,000 experimentally validated miRNA-target pairs. Based on these data and the miRNA-related SNPs in our miRNASNP database, we identified 31 SNPs in 3′UTRs that could disrupt experimentally validated miRNA-target pairs. Three of them (rs5186, rs12720208, and rs56109847) have been experimentally confirmed to dysregulate their corresponding targets and to be associated with disease. Sethupathy et al. demonstrated that the SNP rs5186 in the AGTR1 3′UTR mediates allele-specific targeting of miR-155 to AGTR1, thereby modulating AGTR1 protein levels [Sethupathy et al., 2007]. SNP rs12720208 was shown to mediate allele-specific in vitro targeting of miR-433 to the FGF20 3′UTR and to confer risk for Parkinson disease [Wang et al., 2008]. Kapeller et al. identified rs62625044 (now merged into rs56109847) in the 3′UTR of HTR3E, which could mediate allele-specific miR-510 targeting. This SNP was associated with diarrhea-predominant irritable bowel syndrome (IBS-D) in females from the United Kingdom, and the association was confirmed in a German replication cohort [Kapeller et al., 2008]. The remaining 28 SNPs are attractive candidates for future studies of human miRNA target sites.
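Finding SNPs that can disrupt validated interactions amounts to joining the validated (miRNA, gene) pairs against (SNP, miRNA, gene) records of predicted target-site changes. A minimal sketch, assuming both inputs already use the same miRNA and gene names (reconciling miRBase and gene-symbol versions may be needed first). The three SNP/pair records are those named in the text; the function name and the last record are hypothetical.

```python
def snps_disrupting_validated_pairs(validated_pairs, snp_site_pairs):
    """Map each SNP ID to the experimentally validated (miRNA, gene) pairs
    whose target site it affects.

    validated_pairs: set of (mirna, gene) tuples.
    snp_site_pairs: iterable of (rsid, mirna, gene) tuples.
    """
    hits = {}
    for rsid, mirna, gene in snp_site_pairs:
        if (mirna, gene) in validated_pairs:
            hits.setdefault(rsid, []).append((mirna, gene))
    return hits


# Pairs and SNPs named in the text, plus one HYPOTHETICAL record whose
# pair is not in the validated set and should be filtered out.
validated = {("miR-155", "AGTR1"), ("miR-433", "FGF20"), ("miR-510", "HTR3E")}
snp_sites = [
    ("rs5186", "miR-155", "AGTR1"),
    ("rs12720208", "miR-433", "FGF20"),
    ("rs56109847", "miR-510", "HTR3E"),
    ("rsHYPOTHETICAL", "miR-XXX", "GENE_X"),
]
result = snps_disrupting_validated_pairs(validated, snp_sites)
for rsid, pairs in sorted(result.items()):
    print(rsid, pairs)
```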
We thank Wei Liu and Hui Liu for helpful discussions.