Software and data: www.informatics.indiana.edu/predrag/files/knnspectra.zip
Extending the coverage of spectral libraries: A neighbor-based approach to predicting intensities of peptide fragmentation spectra
Version of Record online: 4 FEB 2013
© 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
Volume 13, Issue 5, pages 756–765, March 2013
How to Cite
Ji, C., Arnold, R. J., Sokoloski, K. J., Hardy, R. W., Tang, H. and Radivojac, P. (2013), Extending the coverage of spectral libraries: A neighbor-based approach to predicting intensities of peptide fragmentation spectra. Proteomics, 13: 756–765. doi: 10.1002/pmic.201100670
Colour Online: See the article online to view Fig. 5 in colour.
- Issue online: 8 MAR 2013
- Accepted manuscript online: 9 JAN 2013 09:55AM EST
- Manuscript Accepted: 11 NOV 2012
- Manuscript Revised: 19 OCT 2012
- Manuscript Received: 27 DEC 2011
- National Institutes of Health. Grant Numbers: R01 RR024236-01A1, R01 GM103725-04, R01 AI090077
- National Cancer Institute. Grant Number: U24 CA126480-01
As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer reviewed and may be re-organized for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.
Figure S1. For each length l and charge state, the pairwise spectral correlations between all peptides were calculated (for precursors of charge +2 and length 7 to 19, only 10,000 randomly sampled peptides were used). The distributions of spectral correlation are shown for peptide pairs with Hamming distance ranging from 1 to l.
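The grouping described in the Figure S1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each peptide is assumed to be a (sequence, intensity vector) pair, all peptides in one call share the same length and charge state, and Pearson correlation is assumed as the spectral correlation. The function names hamming, pearson and correlations_by_hamming are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pearson(u, v):
    """Pearson correlation of two equal-length intensity vectors."""
    if len(u) != len(v) or not u:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((x - mean_u) * (y - mean_v) for x, y in zip(u, v))
    norm_u = sqrt(sum((x - mean_u) ** 2 for x in u))
    norm_v = sqrt(sum((y - mean_v) ** 2 for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return cov / (norm_u * norm_v)


def correlations_by_hamming(peptides):
    """Group pairwise spectral correlations by the pair's Hamming distance."""
    groups = defaultdict(list)
    for (seq1, vec1), (seq2, vec2) in combinations(peptides, 2):
        groups[hamming(seq1, seq2)].append(pearson(vec1, vec2))
    return dict(groups)
```

Each list in the returned dictionary holds the correlations for one Hamming distance, so a histogram of each list would give one distribution per distance.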
Figure S2. For each length l and charge state, we consider all peptide pairs with spectral similarity above an upper cutoff or below a lower cutoff (top and bottom 5% of peptides). The number of times the two peptide sequences agree in amino acid identity at each position (1 to l − 1; the C-terminal position is ignored) is shown.
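The per-position count in the Figure S2 caption can be sketched as below. This is an illustrative sketch only: it assumes the high- or low-similarity pairs have already been selected and are passed in as pairs of equal-length sequences. The name position_agreement and the ignore_last flag are hypothetical.

```python
def position_agreement(pairs, ignore_last=True):
    """Count, per position, how many sequence pairs share the same residue.

    With ignore_last=True the final (C-terminal) position is skipped,
    so the result covers positions 1 to l - 1.
    """
    if not pairs:
        return []
    length = len(pairs[0][0])
    span = length - 1 if ignore_last else length
    counts = [0] * span
    for a, b in pairs:
        if len(a) != length or len(b) != length:
            raise ValueError("all sequences must have the same length")
        for i in range(span):
            if a[i] == b[i]:
                counts[i] += 1
    return counts
```

Running it once on the high-similarity pairs and once on the low-similarity pairs would give the two count profiles to compare.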
Figure S3. The average intensities corresponding to the most dominant fragment ions in the standard spectra. For each length l and charge state, the vectors representing the standard spectra for each peptide were averaged over all peptides. For simplicity we only show b- and y-ions for charge +2 and b-, y-, b++- and y++-ions for charge +3.
Figure S4. The 2-D heat map (bin size = 0.01 for both axes) of actual similarity and predicted similarity. 636,290 pairs with known actual similarity and predicted similarity were generated from the cross-validation process for each target-neighbor pair.
Figure S5. Scatter plot of prediction accuracy and confidence score. Prediction accuracy is defined as the correlation coefficient between actual and predicted standard spectra. These two quantities are calculated for each of the 8,934 target peptides (charge +2, length 20). The scatter plot shows only 1,000 points, obtained by grouping the original 8,934 points into 1,000 bins of equal size by prediction accuracy. The x and y coordinates of each point represent the averaged prediction accuracy and confidence score for each bin.
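The binning described in the Figure S5 caption can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes the points are sorted by prediction accuracy and split into consecutive groups, and because 8,934 is not a multiple of 1,000 it lets group sizes differ by at most one point. The name bin_and_average is hypothetical.

```python
def bin_and_average(points, n_bins):
    """Sort (accuracy, score) points by accuracy, split them into n_bins
    consecutive groups of near-equal size, and return the per-group means."""
    ordered = sorted(points, key=lambda p: p[0])
    n = len(ordered)
    if not 0 < n_bins <= n:
        raise ValueError("n_bins must be between 1 and the number of points")
    averaged = []
    for k in range(n_bins):
        lo = k * n // n_bins          # first index of group k
        hi = (k + 1) * n // n_bins    # one past the last index of group k
        chunk = ordered[lo:hi]
        mean_accuracy = sum(p[0] for p in chunk) / len(chunk)
        mean_score = sum(p[1] for p in chunk) / len(chunk)
        averaged.append((mean_accuracy, mean_score))
    return averaged
```

Calling bin_and_average(points, 1000) on 8,934 points returns 1,000 averaged points, each summarizing 8 or 9 of the original ones.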
Figure S6. The average correlation coefficient between predicted and actual standard spectra as a function of K. The correlation coefficients were averaged over 8,934 target peptides (charge +2, length 20).
Table S1. The Area Under the ROC Curve (AUC) and spectral similarity for the predictors constructed for each length and charge state.
Table S2. Average of confidence scores (Score) and spectral similarities (SS) between predicted standard spectra and their real counterparts for unique peptides identified (1) in NIST but not in NISTKNN library and (2) in both NIST and NISTKNN libraries.