

Figure S1. For each peptide length l and charge state, the pairwise spectral correlation between all peptides was calculated (for charge +2 precursors of lengths 7 to 19, only 10,000 randomly sampled peptides were used). The distributions of spectral correlation are shown for peptide pairs with Hamming distances ranging from 1 to l.
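The grouping behind Figure S1 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each standard spectrum is already a numeric intensity vector on a shared m/z grid and that "spectral correlation" means the Pearson correlation coefficient. The function names and the dict input format are hypothetical.

```python
import itertools
import math
import random
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation of two equal-length intensity vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant vector; reported as 0 here
    return sxy / math.sqrt(sxx * syy)


def hamming(p, q):
    """Number of positions at which two equal-length peptides differ."""
    return sum(a != b for a, b in zip(p, q))


def correlation_by_hamming(spectra, max_peptides=None, seed=0):
    """Group pairwise spectral correlations by Hamming distance.

    spectra: dict mapping peptide sequence -> intensity vector, for peptides of
    one length l and one charge state (hypothetical input format).
    max_peptides: optional random subsample size, e.g. 10,000.
    Returns a dict mapping Hamming distance (1..l) -> list of correlations.
    """
    peptides = sorted(spectra)
    if max_peptides is not None and len(peptides) > max_peptides:
        peptides = random.Random(seed).sample(peptides, max_peptides)
    groups = defaultdict(list)
    # Assumption: "spectral correlation" is taken to be Pearson correlation.
    for p, q in itertools.combinations(peptides, 2):
        groups[hamming(p, q)].append(pearson(spectra[p], spectra[q]))
    return dict(groups)
```

Sampling happens before pairs are formed, so a 10,000-peptide subsample yields 49,995,000 pairs rather than the full quadratic count.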

Figure S2. For each length l and charge state, we consider all peptide pairs whose spectral similarity falls in the top 5% or the bottom 5% of all pairs. For each position from 1 to l − 1 (the C-terminal position is ignored), the number of pairs in which the two peptide sequences carry the same amino acid is shown.
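A minimal sketch of the per-position counts in Figure S2 is given below. It assumes pairs arrive as (sequence, sequence, similarity) tuples for a single length and charge state, and that each 5% tail is selected by rank; the function name and input format are hypothetical.

```python
import math


def positional_agreement(pairs, length, fraction=0.05):
    """Per-position residue agreement for the most and least similar pairs.

    pairs: list of (peptide_a, peptide_b, similarity) tuples for one length
    and charge state (hypothetical input format).
    Returns (top_counts, bottom_counts). Element i of each list is the number
    of selected pairs whose residues match at position i + 1, for positions
    1 to length - 1; the C-terminal position is skipped.
    """
    ranked = sorted(pairs, key=lambda t: t[2])
    # Assumption: each tail holds floor(fraction * number_of_pairs) pairs (at least 1).
    n_tail = max(1, math.floor(fraction * len(ranked)))
    bottom, top = ranked[:n_tail], ranked[-n_tail:]

    def count(selected):
        counts = [0] * (length - 1)
        for a, b, _ in selected:
            for i in range(length - 1):  # stop before the C-terminal residue
                if a[i] == b[i]:
                    counts[i] += 1
        return counts

    return count(top), count(bottom)
```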

Figure S3. Average intensities of the most dominant fragment ions in the standard spectra. For each length l and charge state, the vectors representing the standard spectra were averaged over all peptides. For simplicity, only b- and y-ions are shown for charge +2, and b-, y-, b++- and y++-ions for charge +3.

Figure S4. 2-D heat map (bin size = 0.01 on both axes) of actual versus predicted similarity. A total of 636,290 (actual, predicted) similarity pairs, one for each target-neighbor pair, were generated by the cross-validation process.
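The binning in Figure S4 amounts to a 2-D histogram. The sketch below is illustrative only and not taken from the authors' plotting code: it assumes similarities are plain floats and assigns each value to a 0.01-wide bin by floor division. The function name is hypothetical; `numpy.histogram2d` provides the same counting with explicit bin edges.

```python
import math
from collections import Counter


def heatmap_counts(actual, predicted, bin_size=0.01):
    """Count (actual, predicted) similarity values per 2-D bin.

    Returns a Counter keyed by (i, j): bin i covers actual similarity in
    [i * bin_size, (i + 1) * bin_size), and bin j covers predicted similarity
    in the same way. Floating-point division can shift values that sit
    exactly on a bin edge into the neighboring bin.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for a, p in zip(actual, predicted):
        counts[(math.floor(a / bin_size), math.floor(p / bin_size))] += 1
    return counts
```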

Figure S5. Scatter plot of prediction accuracy versus confidence score. Prediction accuracy is defined as the correlation coefficient between the actual and predicted standard spectra. Both quantities were calculated for each of the 8,934 target peptides (charge +2, length 20). For readability, the 8,934 points were sorted by prediction accuracy and grouped into 1,000 bins of equal size, so the plot shows 1,000 points; the x and y coordinates of each point are the average prediction accuracy and the average confidence score of one bin.
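The bin averaging that reduces 8,934 points to 1,000 in Figure S5 can be sketched as below. Because 8,934 is not a multiple of 1,000, "equal size" can only hold approximately; this sketch assumes bins whose sizes differ by at most one point (934 bins of 9 points and 66 bins of 8). The function name is hypothetical.

```python
def binned_means(accuracy, confidence, n_bins=1000):
    """Sort points by accuracy, split them into n_bins near-equal bins, average each.

    Returns a list of (mean_accuracy, mean_confidence) tuples, one per bin.
    Assumption: when the point count is not a multiple of n_bins, bin sizes
    differ by at most one point.
    """
    points = sorted(zip(accuracy, confidence))  # ordered by prediction accuracy
    n = len(points)
    if n_bins > n:
        raise ValueError("more bins than points")
    result = []
    for b in range(n_bins):
        start, stop = b * n // n_bins, (b + 1) * n // n_bins
        chunk = points[start:stop]  # never empty because n >= n_bins
        result.append((
            sum(a for a, _ in chunk) / len(chunk),
            sum(c for _, c in chunk) / len(chunk),
        ))
    return result
```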

Figure S6. The average correlation coefficient between predicted and actual standard spectra as a function of K. The correlation coefficients were averaged over 8,934 target peptides (charge +2, length 20).

Table S1. Area under the ROC curve (AUC) and spectral similarity for the predictors constructed for each length and charge state.

Table S2. Average confidence scores (Score) and spectral similarities (SS) between predicted standard spectra and their actual counterparts for unique peptides identified (1) in the NIST library but not in the NISTKNN library and (2) in both the NIST and NISTKNN libraries.
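Table S1 reports AUC values. As a point of reference, the AUC can be computed without drawing the ROC curve, through its equivalence with the Mann-Whitney U statistic. The captions do not state which classes the predictors separate, so the positive and negative score lists below are hypothetical inputs, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive example scores
    higher than a randomly chosen negative one, counting ties as one half.
    scores_pos / scores_neg are hypothetical inputs (class definitions are
    not given in the captions).
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```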

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.