Mining of protein contact maps for protein fold prediction



The three-dimensional structure of a protein is essential to carrying out its biophysical and biochemical functions in a cell. Approaches to protein structure/fold prediction typically extract features from the amino acid sequence and then apply machine learning methods to the resulting classification problem. Protein contact maps are two-dimensional representations of the contacts among the amino acid residues in the folded protein structure. This paper highlights the need for a systematic study of these contact networks. Mining contact maps for features that carry fold information offers a new mechanism for fold discovery from the protein sequence via the contact map. These ideas are explored in the structural class of all-alpha proteins to identify structural elements. A simple and computationally inexpensive algorithm based on the triangle subdivision method is proposed to extract additional features from the contact map. The method successfully characterizes the off-diagonal interactions in the contact map for predicting specific ‘folds’. The decision tree classification results show promise for developing a new and simple tool for the challenging problem of fold prediction. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 362–368 DOI: 10.1002/widm.35
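To make the notion of a contact map and its off-diagonal interactions concrete, the following minimal Python sketch builds a binary contact map from residue coordinates and lists the contacts between residues that are far apart in sequence. It is an illustration only and is not the triangle subdivision algorithm proposed in the paper. The function names, the 8 Å distance threshold, the minimum sequence separation of 4, and the toy coordinates are all assumptions chosen for the example.

```python
import math

def contact_map(coords, threshold=8.0):
    """Return a binary n x n matrix: 1 if two residues lie within `threshold`.

    `coords` is a list of (x, y, z) positions, one per residue (for example
    C-alpha atoms). The 8.0 threshold is an assumed, commonly used cutoff.
    """
    n = len(coords)
    return [
        [1 if math.dist(coords[i], coords[j]) <= threshold else 0 for j in range(n)]
        for i in range(n)
    ]

def off_diagonal_contacts(cmap, min_separation=4):
    """List contacts (i, j) with j - i >= min_separation (upper triangle only).

    Contacts near the main diagonal reflect chain connectivity; the assumed
    `min_separation` filters them out so only longer-range contacts remain.
    """
    n = len(cmap)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if cmap[i][j]
    ]

# Toy hairpin-like chain (hypothetical coordinates, 3.8 units between neighbours).
toy_coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 3.8, 0.0), (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]
toy_map = contact_map(toy_coords)
long_range = off_diagonal_contacts(toy_map)
```

In this sketch the map is symmetric with ones on the diagonal, and `long_range` holds the residue pairs brought together by the fold rather than by chain adjacency, which is the kind of off-diagonal signal the abstract refers to.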