Communicated by: Keiichi I. Nakayama
Mass spectrum sequential subtraction speeds up searching large peptide MS/MS spectra datasets against large nucleotide databases for proteogenomics
Article first published online: 12 JUN 2012
© 2012 The Authors Journal compilation © 2012 by the Molecular Biology Society of Japan/Blackwell Publishing Ltd.
Genes to Cells
Volume 17, Issue 8, pages 633–644, August 2012
How to Cite
Helmy, M., Sugiyama, N., Tomita, M. and Ishihama, Y. (2012), Mass spectrum sequential subtraction speeds up searching large peptide MS/MS spectra datasets against large nucleotide databases for proteogenomics. Genes to Cells, 17: 633–644. doi: 10.1111/j.1365-2443.2012.01615.x
- Issue published online: 25 JUL 2012
- Manuscript Accepted: 14 APR 2012
- Manuscript Received: 19 MAR 2012
We have developed a novel bioinformatics method called mass spectrum sequential subtraction (MSSS) for searching large peptide spectra datasets produced by liquid chromatography/tandem mass spectrometry (LC-MS/MS) against protein and large nucleotide sequence databases. The main principle of MSSS is to search the peptide spectra set against the protein database first, then remove the spectra corresponding to the identified peptides, leaving a smaller set of unassigned spectra to be searched against the nucleotide sequence database. Reducing the number of spectra to be searched in this way limits the peptide search space. A comparison of MSSS with the conventional search approach, using a dataset of 27 LC-MS/MS runs of rice cultured cells, indicated that MSSS reduced the number of search queries to 50% and the search time to 75% on average. In addition, MSSS had no effect on the identification false-positive rate (FPR) or on the ability to identify novel peptide sequences. We used MSSS to analyze another dataset of 34 LC-MS/MS runs, which identified 74 additional novel peptides. Proteogenomic analysis with these additional peptides yielded 47 new genomic features in 24 rice genes, plus 24 intergenic peptides. These results demonstrate the utility of MSSS in searching large databases with large MS/MS datasets for proteogenomics.
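The two-stage principle described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `sequential_subtraction`, the representation of spectra as plain identifiers, and the two toy search functions are all hypothetical stand-ins for a real database search engine, and steps such as scoring and FPR filtering are omitted.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical type: a search function takes spectrum identifiers and
# returns a mapping from each identified spectrum to its matched peptide.
SearchFn = Callable[[Iterable[str]], Dict[str, str]]


def sequential_subtraction(
    spectra_ids: Iterable[str],
    search_protein_db: SearchFn,
    search_nucleotide_db: SearchFn,
) -> Tuple[Dict[str, str], Dict[str, str], List[str]]:
    """Search the protein database first, subtract the identified spectra,
    then search only the remaining spectra against the nucleotide database."""
    all_ids = list(spectra_ids)

    # Stage 1: search every spectrum against the (smaller) protein database.
    protein_hits = search_protein_db(all_ids)

    # Subtraction: keep only spectra that received no protein-level match.
    remaining = [sid for sid in all_ids if sid not in protein_hits]

    # Stage 2: search the reduced set against the (larger) nucleotide database.
    nucleotide_hits = search_nucleotide_db(remaining)

    return protein_hits, nucleotide_hits, remaining


# Toy stand-ins for real search engines (assumed data, for illustration only).
def toy_protein_search(ids: Iterable[str]) -> Dict[str, str]:
    known = {"s1": "PEPTIDEA", "s3": "PEPTIDEB"}
    return {i: known[i] for i in ids if i in known}


def toy_nucleotide_search(ids: Iterable[str]) -> Dict[str, str]:
    known = {"s2": "NOVELPEP"}
    return {i: known[i] for i in ids if i in known}
```

Calling `sequential_subtraction(["s1", "s2", "s3", "s4"], toy_protein_search, toy_nucleotide_search)` sends only the two unassigned spectra to the second search, which is the source of the reduction in queries that the abstract reports.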