On behalf of the Editors, we write regarding the need for rules governing protein identifications by mass spectrometry that are published in Rapid Communications in Mass Spectrometry (RCM). While protein identification by mass spectrometry is well established, rules governing exactly what parameters constitute an identification are not. We suggest here rules that we hope are not overly burdensome on authors, but that provide readers with assurance of quality.
The following information should be included in manuscripts that describe ‘identification’ of proteins by mass spectrometry:
- 1. The names of the programs used to convert raw MS and MS/MS data into ‘database searchable’ files.
- 2. The name of the software used to query a sequence database using MS data, e.g. SEQUEST,1 Mascot,2 X!Tandem,3 or Peaks.4
- 3. The name of the database searched. For a public database, any specific time stamp and where the database may be found should be included as a reference. For private (or contrived) databases, a full description of the contents, including the number of proteins in the database and the average sequence length, should be included.
- 4. A description of the type of scoring and cutoff criteria used to decide that a set of data (MS or MS/MS) indicates the presence of a protein in the sample. For example, if SEQUEST was used for the database search, then state the XCORR and ΔCORR cutoff values.
- 5. A measure of the false positive rate. The simplest method to calculate this for MS/MS data is to search the data against a reverse sequence database.5 Proteins identified with an estimated false positive rate greater than 10% are considered dubious identifications (see item 11 below).
- 6. For each protein identified, a list of the peptides matched with their scores and an accounting of protein sequence coverage. Very long lists may need to be supplied as supplementary data, but should be included at the time of submission for review.
- 7. Proteins identified by a single peptide MS/MS spectrum match are considered dubious identifications (see item 11) and are discouraged.
- 8. Should identification be based on peptide mass fingerprinting (PMF), the mass accuracies of the peptides used for the identification should be stated. Proteins identified by PMF using m/z peaks with a signal-to-noise ratio of less than 1.5 are considered dubious (see item 11).
- 9. For identifications from either MS/MS or PMF, authors are strongly encouraged to use a software package that assigns an objective statistical basis to the identification, e.g. Protein Prophet.6
- 10. Where peptides used for identification match to homologous proteins, authors must indicate why a particular species was chosen over all possible homologues.
- 11. Dubious protein/peptide identifications (see items 5, 7 and 8) must be verified by complementary means. This should include the manually annotated tandem mass spectrum as supplementary data, and any corroborating non-MS data such as reaction with protein-specific antibodies in a Western blot or ELISA.
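As an illustration only, the quantities requested in items 5, 6 and 8 and the ‘dubious’ criteria of item 11 might be computed along the following lines. This is a minimal sketch and not a prescribed procedure: the function names and the data layout are hypothetical, the false positive rate is assumed to come from separate forward and reverse database searches and is taken as the ratio of reverse hits to forward hits at the same score cutoff, and mass accuracy is assumed to be reported in parts per million. Other estimators and reporting units may be equally acceptable.

```python
from typing import Iterable, Optional, Tuple


def estimated_false_positive_rate(forward_scores: Iterable[float],
                                  reverse_scores: Iterable[float],
                                  cutoff: float) -> float:
    """Ratio of reverse-database hits to forward-database hits at or above
    the cutoff. Assumes two separate searches of the same data (an assumption
    of this sketch, not a requirement of the guidelines)."""
    n_forward = sum(1 for s in forward_scores if s >= cutoff)
    n_reverse = sum(1 for s in reverse_scores if s >= cutoff)
    if n_forward == 0:
        return 0.0  # nothing accepted, so nothing can be a false positive
    return n_reverse / n_forward


def sequence_coverage(protein_length: int,
                      peptide_spans: Iterable[Tuple[int, int]]) -> float:
    """Fraction of residues covered by at least one matched peptide.
    Spans are (start, end) residue positions, 1-based and inclusive."""
    covered = set()
    for start, end in peptide_spans:
        covered.update(range(start, end + 1))
    return len(covered) / protein_length


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass accuracy of a matched peptide in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def is_dubious(false_positive_rate: float,
               n_msms_matches: Optional[int] = None,
               min_signal_to_noise: Optional[float] = None) -> bool:
    """Apply the thresholds stated in items 5, 7 and 8."""
    if false_positive_rate > 0.10:          # item 5
        return True
    if n_msms_matches == 1:                 # item 7
        return True
    if min_signal_to_noise is not None and min_signal_to_noise < 1.5:  # item 8
        return True
    return False


if __name__ == "__main__":
    # Made-up scores, purely to show the calling pattern.
    forward = [3.1, 2.8, 2.5, 2.2, 1.9, 1.2]
    reverse = [2.1, 1.0, 0.8]
    rate = estimated_false_positive_rate(forward, reverse, cutoff=2.0)
    print(f"estimated false positive rate: {rate:.2f}")
    print(f"sequence coverage: {sequence_coverage(100, [(1, 10), (5, 20)]):.2f}")
    print(f"dubious: {is_dubious(rate, n_msms_matches=3)}")
```

An identification flagged by `is_dubious` would, under item 11, need to be supported by an annotated spectrum and any corroborating non-MS data.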
Finally, we point out that it could be argued that data used to justify any publicly available manuscript must be publicly available also. While RCM has no authority beyond the manuscript in its published form to make data public, authors are reminded that many granting agencies are increasingly interested in public availability of data generated with public funds. Thus we remind authors that they may receive requests for raw files that support specific identifications, and we encourage dissemination of this information, as it should increase the quality of their own work as well as aid the community at large.