MirPlex: A Tool for Identifying miRNAs in High-Throughput sRNA Datasets Without a Genome


Daniel Mapleson and Simon Moxon contributed equally to this work.

Correspondence to: Vincent Moulton, University of East Anglia, Norwich NR4 7TJ, United Kingdom. E-mail: vincent.moulton@uea.ac.uk


MicroRNAs (miRNAs) are a class of small non-coding RNA (sRNA) involved in gene regulation through mRNA decay and translational repression. In animals, miRNAs have crucial regulatory functions during embryonic development, and they have also been implicated in several diseases, such as cancer and cardiovascular and neurodegenerative disorders. It is therefore important to characterize new miRNAs so that their function can be studied further. Recent advances in sequencing technologies have made it possible to capture a high-resolution snapshot of the complete sRNA content of an organism or tissue. A common approach to miRNA detection involves searching such data for telltale miRNA signatures. However, current miRNA prediction tools usually require a sequenced genome, because they analyse the regions flanking aligned sRNA reads to identify characteristic miRNA hairpin secondary structures. Since only a handful of published genomes are available, there is a need for novel methods that identify miRNAs in high-throughput sRNA sequencing datasets without requiring a reference genome. This paper presents miRPlex, a tool for miRNA prediction that requires only sRNA datasets as input. Mature miRNAs are predicted from such datasets through a multi-stage process involving filtering, miRNA:miRNA* duplex generation, and duplex classification using a support vector machine. Tests on sRNA datasets from model animals demonstrate that the tool is effective at predicting genuine miRNA duplexes and, for some datasets, achieves a high degree of precision when considering only the mature sequence. J. Exp. Zool. (Mol. Dev. Evol.) 320B:47–56, 2013. © 2012 Wiley Periodicals, Inc.
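
To make the first two stages named in the abstract (filtering and miRNA:miRNA* duplex generation) more concrete, the following is a minimal Python sketch and not the miRPlex implementation. The function names, the length and abundance thresholds, the ungapped pairing model with 2-nt 3' overhangs, and the 0.8 pairing cutoff are all assumptions chosen for illustration. The support vector machine stage is not reproduced; the pairing score computed here merely stands in for the kind of feature a classifier might receive.

```python
from itertools import combinations

# Watson-Crick pairs plus the G:U wobble pair (RNA alphabet).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq):
    """Return the reverse complement of an RNA sequence."""
    return seq[::-1].translate(_COMPLEMENT)


def filter_reads(counts, min_len=19, max_len=24, min_count=2):
    """Keep reads of miRNA-like length and sufficient abundance.

    `counts` maps read sequence -> read count. All thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    return {
        seq: n
        for seq, n in counts.items()
        if min_len <= len(seq) <= max_len and n >= min_count
    }


def paired_fraction(guide, star, overhang=2):
    """Fraction of paired positions in an ungapped antiparallel duplex.

    The duplex is anchored so that the last `overhang` bases at the 3' end
    of each strand are left unpaired (a simplifying assumption).
    """
    span = min(len(guide), len(star)) - overhang
    if span <= 0:
        return 0.0
    last = len(star) - 1 - overhang  # star position facing guide[0]
    hits = sum((guide[i], star[last - i]) in PAIRS for i in range(span))
    return hits / span


def candidate_duplexes(counts, min_paired=0.8):
    """Pair filtered reads, treating the more abundant read as the guide."""
    reads = sorted(filter_reads(counts), key=counts.get, reverse=True)
    duplexes = []
    for guide, star in combinations(reads, 2):
        score = paired_fraction(guide, star)
        if score >= min_paired:
            duplexes.append((guide, star, score))
    return duplexes


if __name__ == "__main__":
    # Toy data: a 22-nt read, a synthetic perfectly paired partner,
    # and reads that the filter or the pairing cutoff should reject.
    guide = "UGAGGUAGUAGGUUGUAUAGUU"
    star = reverse_complement(guide[:-2]) + "CU"
    toy_counts = {guide: 50, star: 5, "A" * 21: 10, "ACGU": 100}
    for g, s, score in candidate_duplexes(toy_counts):
        print(g, s, round(score, 2))
```

In this sketch every pair of surviving reads is compared, which is quadratic in the number of reads; a practical tool would need a more selective way to propose duplex partners and a richer feature set for classification.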