• Pgp
• topological models
• data fusion

P-glycoprotein (Pgp) is an ATP-dependent efflux transporter associated with multidrug resistance in several diseases, such as cancer, epilepsy and AIDS. It is preferentially expressed in organs and tissues that act as barriers (e.g. the gut wall or the blood–brain barrier) or that promote the elimination of xenobiotics from the organism (e.g. liver and kidney). Because Pgp limits drug bioavailability, recognizing Pgp substrates at the early stages of the drug development cycle is essential for developing new chemotherapeutic agents that address multidrug resistance. Here we present several classifier models based on topological descriptors that identify potential Pgp substrates and are intended for use as a secondary filter in virtual screening campaigns. Receiver operating characteristic (ROC) curves show that combining individual models through data fusion in a three-model ensemble yields higher areas under the curve and better overall behavior in terms of sensitivity or specificity. The individual discriminant functions (dfs) presented perform similarly to previously reported models and, remarkably, include only low-dimensional (up to 2D) molecular descriptors, which makes them suitable for the virtual screening of increasingly large virtual chemical repositories. Copyright © 2011 John Wiley & Sons, Ltd.
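To make the data fusion step concrete, the following minimal Python sketch combines the scores of three classifiers into one ensemble score and compares areas under the ROC curve. It is an illustration under stated assumptions, not the procedure used in this work: the abstract does not specify the fusion operator, so minimum, maximum and mean of rank-normalized scores are shown as common choices, and the scores are synthetic stand-ins for discriminant function outputs. The function names (`auc`, `to_ranks`, `fuse`) are hypothetical.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def to_ranks(scores):
    """Map raw scores to ranks scaled to [0, 1] so that models are comparable.

    Tied scores receive distinct consecutive ranks (stable sort order).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for rank, index in enumerate(order):
        ranks[index] = rank / (len(scores) - 1)
    return ranks


def mean(values):
    return sum(values) / len(values)


def fuse(score_lists, combine):
    """Fuse per-model scores compound by compound with the given operator."""
    ranked = [to_ranks(scores) for scores in score_lists]
    return [combine(values) for values in zip(*ranked)]


# Synthetic data (assumption): 50 substrates (label 1) and 50 non-substrates
# (label 0), scored by three noisy hypothetical models.
rng = random.Random(0)
labels = [1] * 50 + [0] * 50
models = [[y + rng.gauss(0.0, 1.0) for y in labels] for _ in range(3)]

individual_auc = [auc(m, labels) for m in models]
fused_auc = {
    name: auc(fuse(models, op), labels)
    for name, op in (("min", min), ("max", max), ("mean", mean))
}

print("individual:", [round(a, 3) for a in individual_auc])
print("fused:", {k: round(v, 3) for k, v in fused_auc.items()})
```

Rank normalization is used here so that models with different score scales contribute equally. Whether a given operator improves on the best individual model depends on the data and should be checked on a held-out set.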