
Performance evaluation of a new algorithm for the detection of remote homologs with sequence comparison


  • Maricel G. Kann
    Department of Chemistry, University of Michigan, Ann Arbor, Michigan
    Current affiliation: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894
  • Richard A. Goldstein (corresponding author)
    Biophysics Research Division, University of Michigan, Ann Arbor, Michigan
    Department of Chemistry, University of Michigan, Ann Arbor, MI 48109-1055


We made a detailed analysis of the performance of hybrid, a new sequence alignment algorithm developed by Yu and coworkers that combines Smith-Waterman local dynamic programming with a local version of the maximum-likelihood approach, to assess the applicability of this algorithm to the detection of distant homologs by sequence comparison. We analyzed the statistics of hybrid with a set of nonhomologous protein sequences from the SCOP database and found that the scores from the hybrid algorithm follow an extreme value distribution (EVD) with λ ≈ 1, as previously shown by Yu et al. for artificially generated sequences. Local dynamic programming was compared with the hybrid algorithm on two different test sets of distant homologs drawn from the PFAM and COGs protein sequence databases. The comparisons used several score functions in current use, including OPTIMA, a new score function originally developed to detect remote homologs with the Smith-Waterman algorithm. OPTIMA was the best score function for both dynamic programming and the hybrid algorithm. On the two sets of distantly related sequences, dynamic programming discriminates between homologs and nonhomologs slightly better than the hybrid algorithm does. However, the hybrid algorithm's ability to produce accurate score statistics from only a few simulations may outweigh these small differences in performance, making it suitable for homolog detection in conjunction with a wide range of score functions and gap penalties. Proteins 2002;48:367–376. © 2002 Wiley-Liss, Inc.
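As background on local dynamic programming, the core Smith-Waterman recurrence can be sketched as follows. This is a minimal illustration, not the implementation evaluated in the study: the match/mismatch scores and the linear gap penalty are assumptions chosen for brevity, whereas the comparisons in the paper used substitution score functions such as OPTIMA together with gap penalties. The function name `smith_waterman` is hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of sequences a and b.

    Toy scoring (fixed match/mismatch values, linear gap penalty) is an
    illustrative assumption; real searches use substitution matrices and
    affine gap penalties.
    """
    cols = len(b) + 1
    prev = [0] * cols          # scores for the previous row of the DP matrix
    best = 0                   # highest cell seen anywhere = local alignment score
    for i in range(1, len(a) + 1):
        cur = [0] * cols
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Flooring at 0 lets an alignment start fresh at any position.
            cur[j] = max(0,
                         prev[j - 1] + s,   # align a[i-1] with b[j-1]
                         prev[j] + gap,     # gap in b
                         cur[j - 1] + gap)  # gap in a
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))
```

Flooring each cell at zero is what makes the alignment local: a poorly matching region is discarded rather than dragging down the score of a well-matching segment, so the result is the score of the best-matching pair of subsequences.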
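The practical value of EVD statistics is that a raw alignment score can be turned into a significance estimate. The sketch below assumes the standard Gumbel form P(S ≥ x) = 1 − exp(−e^{−λ(x−μ)}), where μ corresponds to ln(Kmn)/λ in Karlin-Altschul notation, and fits λ and μ to simulated nonhomolog scores by the method of moments. The moment fit, the synthetic Gumbel scores, and all function names are illustrative assumptions; the abstract does not state how the parameters were estimated. The helper `fit_mu_known_lambda` illustrates why a known λ ≈ 1 is convenient: only μ then has to be estimated from the simulated scores.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Estimate (lam, mu) of a Gumbel EVD by the method of moments.

    For a Gumbel variable: mean = mu + gamma / lam, variance = pi^2 / (6 lam^2).
    """
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    lam = math.pi / math.sqrt(6.0 * var)
    mu = mean - EULER_GAMMA / lam
    return lam, mu


def fit_mu_known_lambda(scores, lam=1.0):
    """Estimate only mu when lam is assumed known (e.g. lam close to 1)."""
    mean = sum(scores) / len(scores)
    return mean - EULER_GAMMA / lam


def evd_pvalue(x, lam, mu):
    """P(S >= x) = 1 - exp(-exp(-lam * (x - mu))), computed stably via expm1."""
    return -math.expm1(-math.exp(-lam * (x - mu)))


def simulate_gumbel(n, lam, mu, seed=0):
    """Draw n synthetic Gumbel(lam, mu) scores; -log(E) with E ~ Exp(1) is standard Gumbel."""
    rng = random.Random(seed)
    return [mu - math.log(rng.expovariate(1.0)) / lam for _ in range(n)]


if __name__ == "__main__":
    null_scores = simulate_gumbel(20000, lam=1.0, mu=10.0)
    lam, mu = fit_gumbel_moments(null_scores)
    print(f"fitted lambda = {lam:.3f}, mu = {mu:.3f}")
    print(f"P(S >= 20) = {evd_pvalue(20.0, lam, mu):.2e}")
```

With λ fixed near 1, `evd_pvalue(x, 1.0, fit_mu_known_lambda(null_scores))` gives a significance estimate from the mean of the simulated scores alone. The full two-parameter fit additionally depends on the sample variance, which is sensitive to the extreme tail and therefore needs more simulated scores to stabilize.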
