Identification of GATC- and CCGG-recognizing Type II REases and their putative specificity-determining positions using Scan2S—A novel motif scan algorithm with optional secondary structure constraints

Authors

  • Masha Y. Niv,

    Corresponding author
    1. Department of Physiology and Biophysics, Weill Medical College of Cornell University, 1300 York Ave., New York, New York 10021
    • Current address: Institute of Biochemistry, Food Science and Nutrition, Faculty of Agricultural, Food and Environmental Quality Sciences, The Hebrew University of Jerusalem, PO Box 12, Rehovot 76100, Israel

  • Lucy Skrabanek,

    1. Department of Physiology and Biophysics, Weill Medical College of Cornell University, 1300 York Ave., New York, New York 10021
    2. HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College of Cornell University, 1305 York Ave., New York, New York 10021
  • Richard J. Roberts,

    1. New England Biolabs, 240 County Road, Ipswich, Massachusetts 01938-2723
  • Harold A. Scheraga,

    1. Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853-1301
  • Harel Weinstein

    1. Department of Physiology and Biophysics, Weill Medical College of Cornell University, 1300 York Ave., New York, New York 10021
    2. HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College of Cornell University, 1305 York Ave., New York, New York 10021

Abstract

Restriction endonucleases (REases) are DNA-cleaving enzymes that have become indispensable tools in molecular biology. Type II REases are highly divergent in sequence despite their common structural core, common function and, in some cases, common specificity towards DNA sequences. This divergence makes it difficult to identify and classify them functionally on the basis of sequence, and has hampered efforts at specificity engineering. Here, we define novel REase sequence motifs that extend beyond the PD-(D/E)XK hallmark and incorporate secondary structure information. The automated search using these motifs is carried out with a newly developed, fast regular-expression matching algorithm that accommodates long patterns with optional secondary structure constraints. Using this new tool, named Scan2S, motifs derived from REases with specificity towards GATC- and CCGG-containing DNA sequences successfully identify REases of the same specificity. Notably, some of these sequences are not identified by standard sequence detection tools. The new motifs highlight potential specificity-determining positions that do not fully overlap between the GATC- and the CCGG-recognizing REases; these positions are candidates for specificity re-engineering. Proteins 2008. © 2007 Wiley-Liss, Inc.
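To make the idea of a sequence motif with optional secondary structure constraints concrete, here is a minimal sketch in Python. It is not the Scan2S algorithm, whose internals are not described here. It assumes a per-residue secondary structure string (H/E/C) of the same length as the protein sequence, and it encodes each residue together with its structural state as a two-character token so that one standard `re` pattern can constrain both at once. An element whose structure class is `.` is unconstrained. The helper names (`compile_motif`, `interleave`, `scan`), the toy motif loosely modeled on a PD-(D/E)XK-like layout, and the sequence and structure strings are all hypothetical and for illustration only.

```python
import re
from typing import List, Tuple

# Hypothetical motif element: (residue class, SS class, min repeats, max repeats).
# Each class is a one-character regex ("P", "[DE]", "E", or "." for "any").
Element = Tuple[str, str, int, int]


def compile_motif(elements: List[Element]) -> re.Pattern:
    """Build one regex over interleaved (residue, SS-state) character pairs."""
    parts = []
    for res, ss, lo, hi in elements:
        token = f"(?:{res}{ss})"  # always consumes exactly two characters
        parts.append(token if (lo, hi) == (1, 1) else f"{token}{{{lo},{hi}}}")
    return re.compile("".join(parts))


def interleave(seq: str, ss: str) -> str:
    """Pair each residue with its secondary structure state: 'PC', 'DC', ..."""
    if len(seq) != len(ss):
        raise ValueError("sequence and secondary structure must have equal length")
    return "".join(r + s for r, s in zip(seq, ss))


def scan(pattern: re.Pattern, seq: str, ss: str) -> List[Tuple[int, int]]:
    """Return 0-based, end-exclusive residue spans of matches, one per start residue."""
    paired = interleave(seq, ss)
    hits = []
    # Try a match only at even offsets so residue/SS pairs stay aligned.
    for pos in range(0, len(paired), 2):
        m = pattern.match(paired, pos)
        if m:
            hits.append((m.start() // 2, m.end() // 2))
    return hits


# Hypothetical toy motif: P, D, a 5-20 residue gap, then [DE] and one more
# residue both in a strand (E), then K with no structural constraint.
MOTIF = compile_motif([
    ("P", ".", 1, 1),
    ("D", ".", 1, 1),
    (".", ".", 5, 20),
    ("[DE]", "E", 1, 1),
    (".", "E", 1, 1),
    ("K", ".", 1, 1),
])
SEQ = "MSPDAVLRGQWEVKLA"  # made-up sequence
SS = "CCCCHHHHHCCEEECC"   # made-up H/E/C assignment

if __name__ == "__main__":
    print(scan(MOTIF, SEQ, SS))  # [(2, 14)]
```

With the made-up strands at positions 11 and 12, the motif matches residues 2 to 13 (`PDAVLRGQWEVK`). If the same sequence is scanned with an all-coil structure string, the strand-constrained elements fail and no match is reported. This naive approach tries a backtracking match at every residue, so it is only meant to show how sequence and structure constraints can be combined in one pattern, not to reproduce the speed on long patterns that the abstract attributes to Scan2S.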
