
ProDomAs, protein domain assignment algorithm using center-based clustering and independent dominating set

Authors

  • Elnaz Saberi Ansari,

    1. Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
  • Changiz Eslahchi,

    Corresponding author
    1. Department of Computer Science, Shahid Beheshti University, G.C., Tehran, Iran
    2. School of Biological Science, Institute for Research in Fundamental Science (IPM), Tehran, Iran
    • Correspondence to: Changiz Eslahchi, School of Biological Science, Institute for Research in Fundamental Science (IPM), Tehran, Iran. E-mail: ch-eslahchi@sbu.ac.ir

  • Hamid Pezeshk,

    1. School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
  • Mehdi Sadeghi

    1. National Institute of Genetic Engineering and Biotechnology, Tehran, Iran

ABSTRACT

Decomposing proteins into structural domains is an essential task in classifying protein structures, predicting protein function, and many other proteomics problems. As the number of known protein structures in the PDB grows exponentially, the need for accurate automatic domain decomposition methods becomes more pressing. In this article, we introduce a bottom-up algorithm for assigning protein domains using a graph-theoretical, center-based clustering approach. To construct the initial clusters, the members of an independent dominating set of the protein's graph representation are taken as the centers. A distance matrix is then defined for these clusters. To obtain the final domains, the clusters are merged according to the compactness principle of domains, using a method similar to the neighbor-joining algorithm subject to a set of thresholds. The thresholds are computed from a training set of 50 protein chains. The algorithm is implemented in C++ and is named ProDomAs. To assess its performance, the results of ProDomAs are compared with those of seven automatic methods on five publicly available benchmarks. The results show that ProDomAs outperforms the other methods on these benchmarks. The performance of ProDomAs is also evaluated on 6342 chains obtained from ASTRAL SCOP 1.71. ProDomAs is freely available at http://www.bioinf.cs.ipm.ir/software/prodomas. Proteins 2014; 82:1937–1946. © 2014 Wiley Periodicals, Inc.
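The initial clustering step described in the abstract can be illustrated with a minimal C++ sketch. It is not the ProDomAs implementation: the abstract does not say how the protein graph is built or how the independent dominating set is chosen, so the sketch assumes an undirected graph given as adjacency lists and uses a simple greedy pass (any maximal independent set is also a dominating set). The names `Graph`, `independentDominatingSet`, and `initialClusters` are hypothetical, and the later merging stage (distance matrix, neighbor-joining-like merging, trained thresholds) is not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed representation: undirected graph as adjacency lists,
// one vertex per residue (or other structural unit).
using Graph = std::vector<std::vector<int>>;

// Greedy maximal independent set. Because it is maximal, every vertex
// outside the set has a neighbor inside it, so the set is also dominating.
// This greedy choice is an assumption, not the selection rule of the paper.
std::vector<int> independentDominatingSet(const Graph& g) {
    const std::size_t n = g.size();
    std::vector<bool> blocked(n, false);  // chosen, or adjacent to a chosen vertex
    std::vector<int> centers;
    for (std::size_t v = 0; v < n; ++v) {
        if (blocked[v]) continue;
        centers.push_back(static_cast<int>(v));
        blocked[v] = true;
        for (int u : g[v]) {
            assert(u >= 0 && static_cast<std::size_t>(u) < n);
            blocked[u] = true;
        }
    }
    return centers;
}

// Center-based initial clusters: each center labels its own cluster, and
// every other vertex joins the first adjacent center in its adjacency list.
// Returns cluster[v] = index into `centers`.
std::vector<int> initialClusters(const Graph& g, const std::vector<int>& centers) {
    std::vector<int> centerId(g.size(), -1);
    for (std::size_t c = 0; c < centers.size(); ++c)
        centerId[centers[c]] = static_cast<int>(c);

    std::vector<int> cluster = centerId;  // centers are already labeled
    for (std::size_t v = 0; v < g.size(); ++v) {
        if (cluster[v] != -1) continue;
        for (int u : g[v]) {
            if (centerId[u] != -1) {  // domination guarantees such a neighbor exists
                cluster[v] = centerId[u];
                break;
            }
        }
    }
    return cluster;
}
```

On a five-vertex path graph 0-1-2-3-4, this sketch selects vertices 0, 2, and 4 as centers and attaches vertices 1 and 3 to the centers 0 and 2, respectively. A real implementation would presumably break ties between adjacent centers by a structural distance rather than by list order.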
