
Refining similarity scoring to enable decoy-free validation in spectral library searching

Authors

  • Wenguang Shao,

    1. Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
  • Kan Zhu,

    1. Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
  • Henry Lam

    Corresponding author
    1. Division of Biomedical Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
    2. Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
    • Correspondence: Professor Henry Lam, Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China

      E-mail: kehlam@ust.hk

      Fax: +852-2358-0054


Abstract

Spectral library searching is a maturing approach for peptide identification from MS/MS spectra, offering an alternative to traditional sequence database searching. It relies on direct spectrum-to-spectrum matching between the query data and the spectral library, which affords better discrimination between true and false matches and thereby improves sensitivity. However, due to the inherent diversity of the peak location and intensity profiles of real spectra, the resulting similarity score distributions often take on unpredictable shapes. This makes it difficult to model the scores of the false matches accurately, necessitating the use of decoy searching to sample the score distribution of the false matches. Here, we refined the similarity scoring in spectral library searching to enable the validation of spectral search results without the use of decoys. We rank-transformed the peak intensities to standardize all spectra, making it possible to fit a parametric distribution to the scores of the non-top-scoring spectral matches. The statistical significance of the top-scoring match can then be estimated in a rigorous manner according to Extreme Value Theory. The overall result is a more robust and interpretable measure of the quality of the spectral match, which can be obtained without decoys. We tested this refined similarity scoring function on real datasets and demonstrated its effectiveness. This approach reduces search time, increases sensitivity, and extends spectral library searching to situations where decoy spectra cannot be readily generated, such as in searching unidentified and nonpeptide spectral libraries.
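The scoring idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: spectra are taken to be already binned into equal-length intensity vectors, similarity is a normalized dot product, the null scores are fitted with a normal distribution (the abstract does not name the parametric family), and the significance of the top match is taken as the probability that the maximum of n draws from that fitted null reaches the observed top score. The function names `rank_transform`, `similarity`, and `top_match_p_value` are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def rank_transform(intensities):
    """Replace each non-zero peak intensity by its rank (1 = weakest peak).

    Empty bins (intensity 0) stay at 0. Ties are broken by position,
    which is a simplification made for this sketch.
    """
    nonzero = [i for i, x in enumerate(intensities) if x > 0]
    nonzero.sort(key=lambda i: intensities[i])
    ranks = [0.0] * len(intensities)
    for rank, idx in enumerate(nonzero, start=1):
        ranks[idx] = float(rank)
    return ranks


def similarity(a, b):
    """Normalized dot product between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_match_p_value(scores):
    """Estimate the significance of the best score without decoys.

    The null distribution is fitted to all scores except the top one
    (assumed normal here). The p-value is the chance that the maximum
    of n null draws is at least as large as the observed top score.
    Needs at least three scores, and the non-top scores must not all
    be identical.
    """
    ranked = sorted(scores, reverse=True)
    top, rest = ranked[0], ranked[1:]
    null = NormalDist(mean(rest), stdev(rest))
    n = len(scores)
    return 1.0 - null.cdf(top) ** n
```

In use, a query spectrum and each candidate library spectrum would be rank-transformed, scored with `similarity`, and the resulting list of candidate scores passed to `top_match_p_value`. A small value suggests that the top match stands out from the other candidates more than chance alone would explain under the assumed null.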
