SEARCH

SEARCH BY CITATION

Abstract

Knowledge discovery in databases has become an integral part of practically every aspect of bioinformatics research, which usually produces, and has to process, very large amounts of data. Rational drug design is one of the current scientific areas that has greatly benefited from bioinformatics, particularly a step, which analyzes receptor–ligand interactions via molecular docking simulations. An important challenge is the inclusion of the receptor flexibility since they can become computationally very demanding. We have represented this explicit flexibility as a series of different conformations derived from a molecular dynamics simulation trajectory of the receptor. This model has been termed as the fully flexible receptor (FFR) model. In our studies, the receptor is the enzyme InhA from Mycobacterium tuberculosis, which is the major drug target for the treatment of tuberculosis. The FFR model of InhA (named FFR_InhA) was docked to four ligands, namely, nicotinamide adenine dinucleotide, pentacyano(isoniazid)ferrate II, triclosan, and ethionamide, thus, generating very large amounts of data, which needs to be mined to produce useful knowledge to help accelerate drug discovery and development. Very little work has been done in this area. In this article, we review our work on the application of classification decision trees, regression model tree, and association rules using properly preprocessed data of the FFR molecular docking results, and show how they can provide an improved understanding of the FFR_InhA-ligand behavior. Furthermore, we explain how data mining techniques can support the acceleration of molecular docking simulations of FFR models. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 532–541 DOI: 10.1002/widm.46