A hybrid clustering of protein binding sites

Authors

  • Gábor Iván,

    1.  Protein Information Technology Group, Department of Computer Science, Eötvös University, Budapest, Hungary
    2.  Uratim Ltd., Budapest, Hungary
  • Zoltán Szabadka,

    1.  Protein Information Technology Group, Department of Computer Science, Eötvös University, Budapest, Hungary
    2.  Uratim Ltd., Budapest, Hungary
  • Vince Grolmusz

    1.  Protein Information Technology Group, Department of Computer Science, Eötvös University, Budapest, Hungary
    2.  Uratim Ltd., Budapest, Hungary

V. Grolmusz, Protein Information Technology Group, Department of Computer Science, Eötvös University, Pázmány Péter stny. 1/C, H-1117 Budapest, Hungary and Uratim Ltd., H-1118 Budapest, Hungary
Fax: +36 1 381 2231
Tel: +36 1 381 2226
E-mail: grolmusz@cs.elte.hu

Abstract

The Protein Data Bank contains the description of approximately 27 000 protein–ligand binding sites. Most of the ligands at these sites are biologically active small molecules that affect the biological function of the protein. Classifying their binding sites may therefore yield results relevant to drug discovery and design. Here, clusters of similar binding sites were created by a hybrid approach, based on both sequence and spatial structure, using the OPTICS clustering algorithm. A dissimilarity measure was defined as a distance function on the amino acid sequences of the binding sites. All binding sites in the Protein Data Bank were clustered according to this distance function, and the clusters were found to characterize the Enzyme Commission (EC) numbers of the entries well. The results, covering the 20 967 clustered binding sites and color coded by the EC numbers of the proteins, are available as HTML files in three parts at http://pitgroup.org/seqclust/.
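The abstract names the two ingredients of the method, a sequence-based distance between binding sites and the OPTICS algorithm, without giving their details. The following is a minimal sketch of how such ingredients fit together, under loudly stated assumptions: each binding site is represented as a string of one-letter residue codes, and the dissimilarity is a normalized Levenshtein distance. Both the representation and `site_distance` are hypothetical stand-ins, not the authors' distance function. `optics` implements the standard OPTICS ordering on a precomputed distance matrix, and `extract_dbscan` cuts the reachability ordering at a threshold `eps_prime` to obtain flat clusters; the toy sequences are invented.

```python
import heapq
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance; insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def site_distance(a: str, b: str) -> float:
    """HYPOTHETICAL binding-site dissimilarity: edit distance scaled to [0, 1]."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def optics(dist, min_pts, eps=math.inf):
    """OPTICS ordering on a precomputed distance matrix.

    Returns (order, reachability, core_distance). A point counts as its own
    neighbour, so the core distance is the distance to the min_pts-th nearest
    point including itself.
    """
    n = len(dist)

    def core_distance(p):
        ds = sorted(d for d in dist[p] if d <= eps)
        return ds[min_pts - 1] if len(ds) >= min_pts else math.inf

    core = [core_distance(p) for p in range(n)]
    reach = [math.inf] * n
    processed = [False] * n
    order = []

    def update(p, seeds):
        # Lower the reachability of unprocessed eps-neighbours of core point p.
        for o in range(n):
            if not processed[o] and dist[p][o] <= eps:
                r = max(core[p], dist[p][o])
                if r < reach[o]:
                    reach[o] = r
                    heapq.heappush(seeds, (r, o))

    for start in range(n):
        if processed[start]:
            continue
        processed[start] = True
        order.append(start)
        seeds = []
        if core[start] < math.inf:
            update(start, seeds)
        while seeds:
            r, q = heapq.heappop(seeds)
            if processed[q] or r > reach[q]:
                continue  # stale heap entry superseded by a smaller reachability
            processed[q] = True
            order.append(q)
            if core[q] < math.inf:
                update(q, seeds)
    return order, reach, core


def extract_dbscan(order, reach, core, eps_prime):
    """Flat clusters at threshold eps_prime from an OPTICS ordering; -1 = noise."""
    labels = [-1] * len(order)
    cluster = -1
    for p in order:
        if reach[p] > eps_prime:
            if core[p] <= eps_prime:  # core point that starts a new cluster
                cluster += 1
                labels[p] = cluster
        else:
            labels[p] = cluster
    return labels


# Invented toy "binding sites" (not PDB data).
sites = ["HHEGD", "HHEGN", "HHDGD", "CSKRW", "CSKRF", "CAKRW", "PQLMV"]
D = [[site_distance(a, b) for b in sites] for a in sites]
order, reach, core = optics(D, min_pts=2)
print(extract_dbscan(order, reach, core, eps_prime=0.3))  # [0, 0, 0, 1, 1, 1, -1]
```

With `min_pts=2` and `eps_prime=0.3`, the toy sites fall into two clusters plus one noise point. The quadratic neighbour scan keeps the sketch short; for some 20 000 sites one would compute the distance matrix once and could instead use an existing implementation such as scikit-learn's `OPTICS` with `metric="precomputed"`.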
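The abstract reports that the clusters characterize the EC numbers of the entries well, but it does not say how this agreement was measured. One simple way to quantify it, offered here only as a hypothetical illustration, is cluster purity: for each cluster, the fraction of members whose EC number, truncated to a chosen number of fields, equals the cluster's most common one. The function names `ec_prefix` and `cluster_purity`, the treatment of noise, and the toy labels are all assumptions, not the study's procedure or data.

```python
from collections import Counter


def ec_prefix(ec: str, level: int) -> str:
    """Truncate an EC number such as '3.4.21.4' to its first `level` fields."""
    return ".".join(ec.split(".")[:level])


def cluster_purity(labels, ec_numbers, level=4):
    """HYPOTHETICAL metric: per-cluster share of the most common EC prefix.

    Noise points (label -1) are ignored. Returns {cluster_id: purity}.
    """
    members = {}
    for lab, ec in zip(labels, ec_numbers):
        if lab == -1:
            continue
        members.setdefault(lab, []).append(ec_prefix(ec, level))
    return {lab: Counter(ecs).most_common(1)[0][1] / len(ecs)
            for lab, ecs in members.items()}


# Toy cluster labels and EC annotations (illustrative, not real PDB entries).
labels = [0, 0, 0, 1, 1, 1, -1]
ecs = ["3.4.21.4", "3.4.21.4", "3.4.21.5",
       "2.7.11.1", "2.7.11.1", "2.7.11.1", "1.1.1.1"]
print(cluster_purity(labels, ecs, level=3))  # {0: 1.0, 1: 1.0}
```

Comparing at `level=3` (sub-subclass) is more forgiving than at `level=4` (full EC number): in the toy data, cluster 0 is pure at the sub-subclass level but only two-thirds pure at the full level.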
