Unifying protein inference and peptide identification with feedback to update consistency between peptides


  • Colour Online: See the article online to view Figs. 1–4 in colour.

Correspondence: Professor Fang-Xiang Wu, Division of Biomedical Engineering, and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, Saskatchewan S7N 5A9, Canada

E-mail: fangxiang.wu@usask.ca

Fax: +1-306-966-5427


We first propose a new method for processing peptide identification reports from database search engines. Building on it, we develop a method that unifies protein inference and peptide identification by adding feedback from protein inference to peptide identification. The feedback information is a list of high-confidence proteins, which is used to update an adjacency matrix between peptides. This adjacency matrix is used to regularize the peptide scores. Logistic regression (LR) is then applied to the regularized scores to compute the probability that each peptide identification is correct, and protein scores are calculated from these LR peptide probabilities. Instead of selecting only the best peptide match for each MS/MS spectrum, we select multiple peptides. Tests on two datasets show that the proposed method robustly assigns accurate probabilities to peptides and has higher discrimination power than PeptideProphet in distinguishing correctly from incorrectly identified peptides. In addition, our method infers more true positive proteins and fewer false positive proteins than ProteinProphet at the same false positive rate. The coverage of inferred proteins is also significantly increased, owing to the selection of multiple peptides for each MS/MS spectrum and the improvement of their scores by the feedback from the inferred proteins.
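The feedback loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formulas, so every concrete choice here is an assumption made for illustration. In particular, the sketch assumes that two peptides are adjacent when they share a high-confidence protein, that regularization blends each score with the mean score of its neighbours (weight `alpha`), that the LR model has fixed hypothetical coefficients `w` and `b` rather than fitted ones, and that a protein score is the probability that at least one of its peptides is correct. The selection of multiple peptides per MS/MS spectrum is not modelled.

```python
import numpy as np

def adjacency_from_proteins(pep_to_prot, confident):
    """Assumed rule: peptides are adjacent if they share a high-confidence protein.

    pep_to_prot: (n_peptides, n_proteins) 0/1 membership matrix.
    confident:   boolean mask over proteins (the feedback list).
    """
    m = np.asarray(pep_to_prot, dtype=float)[:, np.asarray(confident, dtype=bool)]
    adj = (m @ m.T > 0).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-links
    return adj

def regularize_scores(scores, adj, alpha=0.5):
    """Assumed regularizer: blend each score with the mean of its neighbours."""
    scores = np.asarray(scores, dtype=float)
    deg = adj.sum(axis=1)
    # Peptides without neighbours keep their own score.
    neighbour_mean = np.divide(adj @ scores, deg, out=scores.copy(), where=deg > 0)
    return (1.0 - alpha) * scores + alpha * neighbour_mean

def lr_probability(scores, w=1.0, b=0.0):
    """Logistic function with hypothetical, unfitted coefficients w and b."""
    return 1.0 / (1.0 + np.exp(-(w * np.asarray(scores, dtype=float) + b)))

def protein_scores(pep_prob, pep_to_prot):
    """Assumed protein score: probability that at least one peptide is correct."""
    member = np.asarray(pep_to_prot) > 0
    one_minus = np.where(member, 1.0 - pep_prob[:, None], 1.0)
    return 1.0 - one_minus.prod(axis=0)

def unify(scores, pep_to_prot, threshold=0.9, n_iter=3):
    """Iterate: peptide scores -> protein scores -> feedback list -> adjacency."""
    pep_to_prot = np.asarray(pep_to_prot)
    confident = np.zeros(pep_to_prot.shape[1], dtype=bool)  # no feedback at first
    for _ in range(n_iter):
        adj = adjacency_from_proteins(pep_to_prot, confident)
        pep_prob = lr_probability(regularize_scores(scores, adj))
        prot = protein_scores(pep_prob, pep_to_prot)
        confident = prot >= threshold  # updated list of high-confidence proteins
    return pep_prob, prot

# Toy input: peptides 0-2 map to protein 0, peptide 3 maps to protein 1.
membership = np.array([[1, 0], [1, 0], [1, 0], [0, 1]])
raw_scores = [3.0, 2.0, -1.0, -2.0]
peptide_prob, protein_prob = unify(raw_scores, membership)
```

In this toy input, the low-scoring peptide 2 shares a protein with two high-scoring peptides, so once that protein enters the feedback list its regularized score rises, while the isolated peptide 3 is left unchanged.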