• Protein-protein interaction pocket;
• L-shaped PLS;
• Generative topographic mapping;
• Lipophilic potentials;
• Pharmacophoric points


Protein-protein interaction (PPI) pockets in a host-guest protein system were predicted using an L-shaped partial least squares (LPLS) method. LPLS is an extension of standard PLS regression in which, in addition to the response vector y and the regressor matrix X, an extra data matrix Z is constructed that summarizes background information on X. The regressor matrix X is a similarity matrix of Tanimoto coefficients computed between paired pocket fingerprints, while the background matrix Z consists of eleven physicochemical and geometrical parameters describing each pocket. The Boolean response vector y specifies whether each pocket is PPI or non-PPI (indicated by 1 and 0, respectively). By constructing two LPLS models, we successfully predicted the PPI pockets of two protein clusters. Clusters 1 and 2 comprised the X-ray crystal structures of protein-peptide complexes and protein-protein complexes, respectively. From the loading plots derived from each model, we could infer the geometrical constraints of the PPI pockets. The two models are specific to their respective clusters, as validated by cross-prediction simulations. The PPI pockets of cluster 1 were projected onto 2D maps using generative topographic mapping (GTM) and molecular lipophilic potentials (MLP). Across three examples, the MLP distributions were highly similar because the structures shared the same p53 guest peptides.

Contribution to the Autumn School of Chemoinformatics in Nara, Japan, November 27–28, 2013
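
As an illustration of how a regressor matrix X of pairwise Tanimoto coefficients can be assembled from binary pocket fingerprints, a minimal Python sketch follows. The fingerprints, their length, and the function names `tanimoto` and `similarity_matrix` are hypothetical and chosen only for this example; the fingerprint definition actually used in this work is not specified here, and the LPLS fit itself is not shown.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    common = sum(1 for x, y in zip(a, b) if x and y)  # bits set in both
    union = sum(1 for x, y in zip(a, b) if x or y)    # bits set in either
    # Convention assumed here: two all-zero fingerprints count as identical.
    return common / union if union else 1.0


def similarity_matrix(fingerprints):
    """Square, symmetric matrix of pairwise Tanimoto coefficients."""
    return [[tanimoto(a, b) for b in fingerprints] for a in fingerprints]


# Hypothetical 6-bit fingerprints for three pockets (illustrative values only).
fingerprints = [
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
]
# Hypothetical Boolean response vector: 1 = PPI pocket, 0 = non-PPI pocket.
y = [1, 1, 0]

X = similarity_matrix(fingerprints)
for row in X:
    print(" ".join(f"{value:.2f}" for value in row))
```

Each row of X describes one pocket by its similarity to every other pocket, so X is square with ones on the diagonal. In the LPLS setting described above, the background matrix Z would hold the eleven pocket descriptors alongside X and y.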