Predicting the disulfide bonding state of cysteines using protein descriptors

Authors

  • M.H. Mucchielli-Giorgi,

    1. Equipe Statistique des Séquences Biologiques, UPRESA CNRS, Université d'Evry, Département de Mathématiques, Evry, France
  • S. Hazout,

    1. Equipe de Bioinformatique Génomique et Moléculaire, INSERM U436, Université Paris, Paris CEDEX, France
  • P. Tufféry

    Corresponding author
    1. Equipe de Bioinformatique Génomique et Moléculaire, INSERM U436, Université Paris, Paris CEDEX, France
    • Equipe de Bioinformatique Génomique et Moléculaire, INSERM U436; Université Paris, case 7113, 2 Place Jussieu, 75251 Paris CEDEX 05, France

Abstract

Knowledge of the disulfide bonding state of the cysteines in a protein is of major interest for designing many molecular biology experiments and for predicting the protein's three-dimensional structure. Previous methods that use information from aligned sets of sequences have reached success rates of up to 82% in predicting the oxidation state of cysteines. In the present study, we assess the relative efficiency of different descriptors in predicting cysteine disulfide bonding states. Our results suggest that the residues flanking the cysteines are less informative about the disulfide bonding state than the amino acid content of the whole protein. Using a combination of logistic functions learned on subsets of proteins that are homogeneous in amino acid content, we propose a simple prediction approach, starting from a single sequence, that reaches success rates close to 84%. This score can be improved by withholding predictions for cysteines for which the decision is not well marked. For example, we obtain close to 87% correct predictions when we abstain on 10% of the cysteines. Proteins 2002;46:243–249. © 2002 Wiley-Liss, Inc.
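
The general idea described in the abstract (a logistic function of whole-protein amino acid content, with no prediction made when the decision is not well marked) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a single logistic function rather than a combination learned on homogeneous protein subsets, and the function names, the `weights`, `bias`, and `margin` parameters, and the "bonded"/"free" labels are all hypothetical choices made for illustration.

```python
import math

# The 20 standard amino acids, in a fixed order used for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def logistic_probability(features, weights, bias):
    """Logistic function of a weighted sum of the features.

    `weights` and `bias` are hypothetical; in practice they would be
    learned from proteins with known disulfide bonding states.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict_with_rejection(probability, margin):
    """Return 'bonded', 'free', or None when the decision is not well marked.

    A prediction is withheld when the probability lies within `margin`
    of 0.5 (an assumed rejection rule, chosen for illustration).
    """
    if abs(probability - 0.5) < margin:
        return None
    return "bonded" if probability > 0.5 else "free"
```

Widening `margin` withholds more predictions in exchange for keeping only the more clearly marked decisions, which is the trade-off the abstract reports (close to 87% correct when 10% of the cysteines are excluded, versus close to 84% overall).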
