Predicting disulfide connectivity patterns

Authors

  • Chih-Hao Lu,

    1. Institute of Bioinformatics, National Chiao Tung University, Hsinchu 30050, Taiwan
  • Yu-Ching Chen,

    1. Institute of Bioinformatics, National Chiao Tung University, Hsinchu 30050, Taiwan
  • Chin-Sheng Yu,

    1. Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
  • Jenn-Kang Hwang

    Corresponding author
    1. Institute of Bioinformatics, National Chiao Tung University, Hsinchu 30050, Taiwan
    2. Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
    3. Core Facility for Structural Bioinformatics, National Chiao Tung University, Hsinchu, Taiwan
    Correspondence: Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 30050, Taiwan

Abstract

Disulfide bonds play an important role in stabilizing protein structure and regulating protein function. The ability to infer disulfide connectivity from protein sequence would therefore be valuable for structural modeling and functional analysis. However, predicting disulfide connectivity directly from sequence is challenging because disulfide bonds are nonlocal: two cysteines that are close in space and form a disulfide bond are not necessarily close in sequence. Recently, Chen and Hwang (Proteins 2005;61:507–512) treated this problem as a multiclass classification task by defining each distinct disulfide pattern as a class. They used multiple support vector machines based on a variety of sequence features to predict the disulfide patterns, and their results compare favorably with those in the literature on a benchmark dataset whose sequences share less than 30% identity. However, because the number of possible disulfide patterns grows rapidly with the number of disulfide bonds, their method performs poorly for proteins with many disulfide bonds. In this work, we propose a novel method that represents disulfide connectivity in terms of cysteine pairs instead of disulfide patterns. Since the number of bonding states of a cysteine pair (bonded or nonbonded) does not depend on the number of disulfide bonds, the problem of class explosion is avoided. The bonding states of the cysteine pairs are predicted using support vector machines, with a genetic algorithm used to optimize feature selection. The complete disulfide patterns are then determined from connectivity matrices constructed from the predicted bonding states of the cysteine pairs. Our approach outperforms existing approaches reported in the literature. Proteins 2007. © 2007 Wiley-Liss, Inc.
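To make the contrast between the pattern-as-class and cysteine-pair formulations concrete, here is a minimal Python sketch. It assumes that every cysteine in the protein is disulfide-bonded (so the number of cysteines is even) and that a pairwise classifier, such as an SVM, has already produced a symmetric matrix of bonding scores. The function names are hypothetical. Decoding the pattern as the maximum-weight perfect matching over that matrix is one common choice and is not necessarily the exact procedure used in this work.

```python
def num_patterns(n_bonds):
    """Number of distinct connectivity patterns for 2*n_bonds cysteines
    that are all bonded: (2n - 1)!! = 1 * 3 * 5 * ... * (2n - 1)."""
    count = 1
    for k in range(1, 2 * n_bonds, 2):
        count *= k
    return count


def num_pairs(n_cys):
    """Number of cysteine pairs, each needing one bonded/nonbonded decision."""
    return n_cys * (n_cys - 1) // 2


def perfect_matchings(indices):
    """Yield every way to split `indices` into disjoint pairs.

    Assumption: len(indices) is even (all cysteines bonded)."""
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def decode_pattern(scores):
    """Hypothetical decoder: pick the perfect matching whose summed pairwise
    scores (e.g., SVM decision values) is largest. Exhaustive search only."""
    n = len(scores)
    if n % 2:
        raise ValueError("sketch assumes an even number of bonded cysteines")
    best, best_total = None, float("-inf")
    for matching in perfect_matchings(list(range(n))):
        total = sum(scores[i][j] for i, j in matching)
        if total > best_total:
            best, best_total = matching, total
    return best
```

For five disulfide bonds, the pattern-as-class formulation needs `num_patterns(5) == 945` classes, whereas the pair formulation still makes one binary decision for each of the `num_pairs(10) == 45` cysteine pairs. For a four-cysteine score matrix with high scores at (0, 1) and (2, 3), `decode_pattern` returns `[(0, 1), (2, 3)]`. Exhaustive enumeration is practical only for small numbers of cysteines; for larger cases, a polynomial-time matching algorithm such as Edmonds' blossom algorithm (available, for example, as `networkx.max_weight_matching` with `maxcardinality=True`) would be the natural replacement.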
