Scan2S: Increasing the precision of PROSITE pattern motifs using secondary structure constraints

Authors

  • Lucy Skrabanek

    Corresponding author
    1. Department of Physiology and Biophysics, Weill Medical College of Cornell University, New York, New York 10021
    2. HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College of Cornell University, New York, New York 10021
  • Masha Y. Niv

    Corresponding author
    1. Faculty of Agricultural, Food, and Environmental Quality Sciences, Institute of Biochemistry, Food Science, and Nutrition, The Hebrew University of Jerusalem, Rehovot 76100, Israel

Abstract

Sequence signature databases such as PROSITE, which include protein pattern motifs indicative of a protein's function, are widely used for function prediction studies, cellular localization annotation, and sequence classification. Correct annotation relies on high precision of the motifs. We present a new and general approach for increasing the precision of established protein pattern motifs by including secondary structure constraints (SSCs). We use Scan2S, the first sequence motif-scanning program to optionally include SSCs, to augment PROSITE pattern motifs. The constraints were derived from either the DSSP secondary structure assignment or the PSIPRED predictions for PROSITE-documented true positive hits. The secondary structure-augmented motifs were scanned against all SwissProt sequences, for which secondary structure predictions were precalculated. Against this dataset, motifs with PSIPRED-derived SSCs exhibited improved performance over motifs with DSSP-derived constraints. The precision of 763 of the 782 PSIPRED-augmented motifs remained unchanged or increased compared to the original motifs; 26 motifs showed an absolute precision increase of 10–30%. We provide the complete set of augmented motifs and the Scan2S program at http://physiology.med.cornell.edu/go/scan2s. Our results suggest a general protocol for increasing the precision of protein pattern detection via the inclusion of SSCs. Proteins 2008. © 2008 Wiley-Liss, Inc.
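
The idea of augmenting a pattern motif with a secondary structure constraint can be illustrated with a minimal sketch. This is not the Scan2S implementation or its input syntax: the function names, the choice to express the SSC as a regular expression over a three-state string (H, E, C), and the toy sequence are all assumptions made for illustration. Only a subset of PROSITE pattern syntax is handled (no `<` or `>` anchors).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regex (subset only)."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # {P} means "any residue except P"
        element = element.replace("{", "[^").replace("}", "]")
        # x means "any residue"
        element = element.replace("x", ".")
        # (n) or (n,m) repetition counts become {n} or {n,m}
        element = re.sub(r"\((\d+(?:,\d+)?)\)", r"{\1}", element)
        parts.append(element)
    return "".join(parts)


def scan(sequence, ss, pattern, ss_constraint=None):
    """Return (start, end) spans of pattern hits, optionally filtered by an SSC.

    `ss` is a secondary structure string aligned to `sequence` (hypothetical
    H/E/C alphabet). `ss_constraint` is a regex that the secondary structure
    of the matched span must fully match (an assumed representation).
    """
    # A lookahead lets overlapping hits be reported as well.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    hits = []
    for m in regex.finditer(sequence):
        start, end = m.start(1), m.end(1)
        if ss_constraint is None or re.fullmatch(ss_constraint, ss[start:end]):
            hits.append((start, end))
    return hits


def precision(true_positives: int, false_positives: int) -> float:
    """Precision = TP / (TP + FP); defined as 0.0 when there are no hits."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0


# Toy data, invented for illustration only.
sequence = "AACGGCAAACKKCAA"
ss = "CCHHHHCCCEEEECC"

plain_hits = scan(sequence, ss, "C-x(2)-C")             # sequence pattern only
helix_hits = scan(sequence, ss, "C-x(2)-C", "H+")       # hit must be all-helix
```

In this toy case the sequence-only pattern matches twice, while the helix constraint keeps only the first match. If the discarded match were a false positive, precision would rise from `precision(1, 1)` to `precision(1, 0)`, which is the kind of gain the SSCs are meant to deliver.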
