BM-Map: Bayesian Mapping of Multireads for Next-Generation Sequencing Data
Version of Record online: 22 APR 2011
© 2011, The International Biometric Society
Volume 67, Issue 4, pages 1215–1224, December 2011
How to Cite
Ji, Y., Xu, Y., Zhang, Q., Tsui, K.-W., Yuan, Y., Norris Jr., C., Liang, S. and Liang, H. (2011), BM-Map: Bayesian Mapping of Multireads for Next-Generation Sequencing Data. Biometrics, 67: 1215–1224. doi: 10.1111/j.1541-0420.2011.01605.x
- Issue online: 14 DEC 2011
- Received August 2010. Revised March 2011. Accepted March 2011.
Keywords
- Data augmentation
- Read alignment
- Short reads
- Solexa sequencing
Summary
Next-generation sequencing (NGS) technology generates millions of short reads, which provide valuable information about various aspects of cellular activities and biological functions. A key step in NGS applications (e.g., RNA-Seq) is to map short reads to their correct locations within the source genome. While most reads map to a unique location, a significant proportion align to multiple genomic locations with equal or similar numbers of mismatches; these are called multireads. The ambiguity in mapping multireads may bias downstream analyses. Currently, most practitioners discard multireads in their analysis, resulting in a loss of valuable information, especially for genes with similar sequences. To refine the read mapping, we develop a Bayesian model that computes the posterior probability of mapping a multiread to each competing location. The probabilities are used for downstream analyses, such as the quantification of gene expression. We show through simulation studies and RNA-Seq analysis of real-life data that the Bayesian method yields better mapping than the current leading methods. We provide a C++ program for download, which is being packaged into user-friendly software.
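The idea of assigning each multiread a posterior probability over its competing locations can be illustrated with a minimal sketch. This is not the BM-Map model itself, whose details are not given in the summary. It is a toy calculation under stated assumptions: sequencing errors are independent with a single per-base error rate, each mismatch is one specific wrong base (hence the division by 3), and the prior weights over locations are supplied by the caller. The function name `multiread_posteriors` and all of its parameters are hypothetical.

```python
from math import exp, log


def multiread_posteriors(mismatches, read_length, error_rate=0.01, priors=None):
    """Toy posterior over candidate locations for a single multiread.

    mismatches  -- number of mismatches at each candidate location
    read_length -- length of the read in bases
    error_rate  -- assumed per-base sequencing error rate (illustrative value)
    priors      -- assumed positive prior weight per location; uniform if omitted
    """
    n = len(mismatches)
    if priors is None:
        priors = [1.0 / n] * n

    log_weights = []
    for m, prior in zip(mismatches, priors):
        # Log-likelihood of observing the read given this location:
        # m erroneous bases (each a specific wrong base) and the rest correct.
        log_lik = m * log(error_rate / 3) + (read_length - m) * log(1 - error_rate)
        log_weights.append(log(prior) + log_lik)

    # Normalize in log space to avoid underflow for long reads.
    top = max(log_weights)
    weights = [exp(w - top) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# A 36-base read that aligns to three locations with 1, 1, and 2 mismatches.
posteriors = multiread_posteriors([1, 1, 2], read_length=36)
```

In a downstream analysis such as expression quantification, these probabilities could serve as fractional weights for a read's contribution to each location, rather than discarding the read outright.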