
Two-stage support vector regression approach for predicting accessible surface areas of amino acids

Authors

  • Minh N. Nguyen

    1. BioInformatics Research Centre, School of Computer Engineering, Nanyang Technological University, Singapore
  • Jagath C. Rajapakse

    Corresponding author
    1. BioInformatics Research Centre, School of Computer Engineering, Nanyang Technological University, Singapore
    2. Biological Engineering Division, Massachusetts Institute of Technology, USA
    Correspondence: BioInformatics Research Centre, School of Computer Engineering, Nanyang Technological University, Singapore 639798

Abstract

We address the problem of predicting the real-valued solvent accessible surface area (ASA) of amino acid residues in protein sequences, without classifying residues into buried and exposed types. We propose a two-stage support vector regression (SVR) approach that predicts real values of ASA from position-specific scoring matrices (PSSMs) generated by PSI-BLAST. The second SVR stage captures how the ASA values of a residue's neighbors influence its own ASA value. With this second stage, the approach improves mean absolute errors by up to 3.3% and achieves correlation coefficients of 0.66, 0.68, and 0.67 on the Manesh dataset of 215 proteins, the Barton dataset of 502 nonhomologous proteins, and the Carugo dataset of 338 proteins, respectively; these scores are better than those previously published on these datasets. A Web server for protein ASA prediction using the two-stage SVR method has been developed and is available at http://birc.ntu.edu.sg/~pas0186457/asa.html. Proteins 2006. © 2006 Wiley-Liss, Inc.
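The two-stage scheme summarized above can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation. The abstract gives no kernel, window sizes, or SVR hyperparameters, so several choices here are hypothetical and made only for illustration: a linear kernel trained by stochastic subgradient descent on the primal ε-insensitive objective, the window half-widths `half1`/`half2`, the values of `C`, `epsilon`, `lr`, and `epochs`, and all function and class names. Stage one maps a window of PSSM rows around each residue to that residue's ASA. Stage two maps a window of stage-one predictions for neighboring residues to a refined ASA. Mean absolute error and a correlation coefficient (assumed here to be Pearson's) serve as evaluation metrics.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Protein = Tuple[List[Vector], List[float]]  # (PSSM rows, per-residue ASA values)

def window_features(rows: Sequence[Sequence[float]], half: int) -> List[Vector]:
    """For each position i, concatenate rows i-half..i+half, zero-padding past the ends."""
    width = len(rows[0])
    feats: List[Vector] = []
    for i in range(len(rows)):
        vec: Vector = []
        for j in range(i - half, i + half + 1):
            vec.extend(rows[j] if 0 <= j < len(rows) else [0.0] * width)
        feats.append(vec)
    return feats

class LinearSVR:
    """Hypothetical linear epsilon-SVR fitted by stochastic subgradient descent on
    0.5*||w||^2 + C * sum_i max(0, |y_i - (w.x_i + b)| - epsilon)."""

    def __init__(self, C: float = 1.0, epsilon: float = 0.05,
                 lr: float = 0.005, epochs: int = 50, seed: int = 0) -> None:
        self.C, self.epsilon, self.lr = C, epsilon, lr
        self.epochs, self.seed = epochs, seed
        self.w: Vector = []
        self.b = 0.0

    def _raw(self, x: Sequence[float]) -> float:
        return sum(wk * xk for wk, xk in zip(self.w, x)) + self.b

    def fit(self, X: List[Vector], y: Sequence[float]) -> "LinearSVR":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        rng = random.Random(self.seed)
        order = list(range(n))
        for _ in range(self.epochs):
            rng.shuffle(order)
            for i in order:
                r = y[i] - self._raw(X[i])
                # Subgradient of the epsilon-insensitive loss w.r.t. the prediction.
                g = -1.0 if r > self.epsilon else (1.0 if r < -self.epsilon else 0.0)
                # Each sample carries 1/n of the 0.5*||w||^2 regularizer.
                self.w = [wk - self.lr * (wk / n + self.C * g * xk)
                          for wk, xk in zip(self.w, X[i])]
                self.b -= self.lr * self.C * g
        return self

    def predict(self, X: List[Vector]) -> List[float]:
        return [self._raw(x) for x in X]

def fit_two_stage(proteins: List[Protein], half1: int = 2,
                  half2: int = 1) -> Tuple[LinearSVR, LinearSVR]:
    # Stage 1: window of PSSM rows around each residue -> ASA of that residue.
    X1 = [f for pssm, _ in proteins for f in window_features(pssm, half1)]
    y = [v for _, asa in proteins for v in asa]
    stage1 = LinearSVR().fit(X1, y)
    # Stage 2: window of stage-1 predictions (the neighbors' ASA estimates) -> ASA.
    # For brevity this reuses in-sample stage-1 outputs as stage-2 inputs.
    X2: List[Vector] = []
    for pssm, _ in proteins:
        first = stage1.predict(window_features(pssm, half1))
        X2 += window_features([[v] for v in first], half2)
    stage2 = LinearSVR().fit(X2, y)
    return stage1, stage2

def predict_two_stage(stage1: LinearSVR, stage2: LinearSVR, pssm: List[Vector],
                      half1: int = 2, half2: int = 1) -> List[float]:
    first = stage1.predict(window_features(pssm, half1))
    return stage2.predict(window_features([[v] for v in first], half2))

def mae(y: Sequence[float], p: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def pearson(y: Sequence[float], p: Sequence[float]) -> float:
    my, mp = sum(y) / len(y), sum(p) / len(p)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, p))
    sy = math.sqrt(sum((a - my) ** 2 for a in y))
    sp = math.sqrt(sum((b - mp) ** 2 for b in p))
    return cov / (sy * sp)

if __name__ == "__main__":
    rng = random.Random(1)
    def toy_protein(length: int) -> Protein:
        # Synthetic stand-in data: 20 PSSM columns per residue, ASA clipped to [0, 1].
        pssm = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(length)]
        asa = [min(1.0, max(0.0, 0.4 + 0.1 * r[0] - 0.1 * r[1])) for r in pssm]
        return pssm, asa
    train = [toy_protein(30) for _ in range(3)]
    test_pssm, test_asa = toy_protein(30)
    s1, s2 = fit_two_stage(train)
    pred = predict_two_stage(s1, s2, test_pssm)
    print(f"MAE={mae(test_asa, pred):.3f}  r={pearson(test_asa, pred):.3f}")
```

Running the script trains on three synthetic proteins and prints MAE and r on a fourth; these numbers say nothing about real ASA data. Training stage two on in-sample stage-one outputs keeps the sketch short, but held-out (for example, cross-validated) stage-one predictions would give stage two inputs with realistic errors.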
