Comparison of sequence and structure alignments for protein domains

Authors

  • Aron Marchler-Bauer,

    1. Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland
  • Anna R. Panchenko,

    1. Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland
  • Naomi Ariel,

    Corresponding author
    1. Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland
    • Computational Biology Branch, National Center for Biotechnology Information, Building 38A, Room 8N805, National Institutes of Health, Bethesda, MD 20894
  • Stephen H. Bryant

    Corresponding author
    1. Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland
    • Computational Biology Branch, National Center for Biotechnology Information, Building 38A, Room 8N805, National Institutes of Health, Bethesda, MD 20894

  • This article is a US Government work and, as such, is in the public domain in the United States of America.

Abstract

Profile search methods based on protein domain alignments have proven to be useful tools in comparative sequence analysis. Domain alignments used by currently available search methods have been computed by sequence comparison. With the growth of the protein structure database, however, alignments of many domain pairs have also been computed by structure comparison. Here, we examine the extent to which information from these two sources agrees. We measure agreement with respect to identification of homologous regions in each protein, that is, with respect to the location of domain boundaries. We also measure agreement with respect to identification of homologous residue sites by comparing alignments and assessing the accuracy of the molecular models they predict. We find that domain alignments in publicly available collections based on sequence and structure comparison are largely consistent. However, the homologous regions identified by sequence comparison are often shorter than those identified by 3D structure comparison. In addition, when overall sequence similarity is low, alignments from sequence comparison produce less accurate molecular models, suggesting that they less accurately identify homologous sites. These observations suggest that structure comparison results might be used to improve the overall accuracy of domain alignment collections and the performance of profile search methods based on them. Proteins 2002;48:439–446. © 2002 Wiley-Liss, Inc.
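
As an illustration of the two kinds of agreement described in the abstract, the sketch below computes (a) the fraction of residue pairs in a reference alignment that a second alignment reproduces, and (b) the overlap between two domain regions relative to the longer one. These particular measures, the function names, and the toy data are assumptions chosen for illustration; the abstract does not specify the scoring used in the study.

```python
def aligned_pairs(row_a, row_b):
    """Return the set of (i, j) residue index pairs aligned in a gapped pairwise alignment.

    row_a and row_b are equal-length strings in which '-' marks a gap.
    Indices are 0-based positions in the ungapped sequences.
    """
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def pair_agreement(reference, candidate):
    """Fraction of reference residue pairs that also occur in the candidate alignment."""
    if not reference:
        return 0.0
    return len(reference & candidate) / len(reference)


def region_overlap(region_a, region_b):
    """Shared length of two inclusive (start, end) regions, relative to the longer region."""
    (s1, e1), (s2, e2) = region_a, region_b
    shared = max(0, min(e1, e2) - max(s1, s2) + 1)
    longer = max(e1 - s1 + 1, e2 - s2 + 1)
    return shared / longer


# Toy data (hypothetical): the same two sequences aligned in two different ways.
structure_aln = aligned_pairs("ACDEFGH", "ACD-FGH")
sequence_aln = aligned_pairs("ACDEFGH", "AC-DFGH")
agreement = pair_agreement(structure_aln, sequence_aln)

# Hypothetical domain boundaries: a longer structure-based region and a shorter sequence-based one.
overlap = region_overlap((10, 109), (25, 99))
```

In this toy case the two alignments differ only in where the gap is placed, so one residue pair from the structure-based alignment is missed, and the shorter sequence-based region covers three quarters of the structure-based one.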
