Calculation of molecular lipophilicity: State-of-the-art and comparison of log P methods on more than 96,000 compounds

Authors

  • Raimund Mannhold,

    1. Molecular Drug Research Group, Heinrich-Heine-Universität, Universitätsstraße 1, D-40225 Düsseldorf, Germany
  • Gennadiy I. Poda,

    1. Pfizer Global R & D, 700 Chesterfield Parkway West, Mail Zone BB2C, Chesterfield, Missouri 63017
  • Claude Ostermann,

    1. Nycomed GmbH, Byk-Gulden-Str. 2, D-78467 Konstanz, Germany
  • Igor V. Tetko

    Corresponding author
    1. Helmholtz Zentrum München—German Research Center for Environmental Health (GmbH), Institute of Bioinformatics and Systems Biology, Ingolstädter Landstraße 1, Neuherberg D-85764, Germany
    2. Institute of Bioorganic & Petrochemistry, Ukrainian National Academy of Sciences, UA-02660 Kyiv, Ukraine
    • Helmholtz Zentrum München—German Research Center for Environmental Health (GmbH), Institute of Bioinformatics and Systems Biology, Ingolstädter Landstraße 1, Neuherberg D-85764, Germany. Telephone: +49-89-3187-3575; Fax: +49-89-3187-3585.

Abstract

We first review the state of the art in the development of log P prediction approaches, which fall into two major categories: substructure-based and property-based methods. We then compare the predictive power of representative methods on one public dataset (N = 266) and two in-house datasets, from Nycomed (N = 882) and Pfizer (N = 95,809). Thirty methods were tested on the public dataset and 18 on the industrial datasets. The accuracy of the models declined as the number of nonhydrogen atoms increased. The Arithmetic Average Model (AAM), which predicts the same value (the arithmetic mean) for all compounds, was used as a baseline for comparison. Methods with a Root Mean Squared Error (RMSE) greater than the RMSE produced by the AAM were considered unacceptable. The majority of the analyzed methods produced reasonable results for the public dataset, but only seven methods were successful on both in-house datasets. We propose a simple equation based on the number of carbon atoms, NC, and the number of hetero atoms, NHET: log P = 1.46(±0.02) + 0.11(±0.001) NC − 0.11(±0.001) NHET. This equation outperformed a large number of the programs benchmarked in this study. Factors influencing the accuracy of log P predictions were elucidated and discussed. © 2008 Wiley-Liss, Inc. and the American Pharmacists Association. J Pharm Sci 98:861–893, 2009
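
The equation and the AAM acceptance criterion described in the abstract can be illustrated with a minimal Python sketch. The function names (`simple_logp`, `rmse`, `aam_rmse`, `is_acceptable`) are hypothetical and do not come from the paper. The coefficients are the point estimates quoted above with their uncertainties dropped, and the sketch assumes that the AAM mean is taken over the experimental values of the dataset being evaluated, which the abstract does not state explicitly.

```python
import math


def simple_logp(n_carbon, n_hetero):
    """Point estimate of log P from atom counts.

    Uses the coefficients quoted in the abstract (1.46, 0.11, 0.11);
    the reported uncertainties are ignored in this sketch.
    """
    return 1.46 + 0.11 * n_carbon - 0.11 * n_hetero


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def aam_rmse(observed):
    """RMSE of the Arithmetic Average Model.

    Assumption: the AAM predicts the arithmetic mean of the observed
    values of the same dataset for every compound.
    """
    mean = sum(observed) / len(observed)
    return rmse([mean] * len(observed), observed)


def is_acceptable(predicted, observed):
    """A method is unacceptable if its RMSE exceeds the AAM's RMSE."""
    return rmse(predicted, observed) <= aam_rmse(observed)


# Hypothetical atom counts, for illustration only (not data from the paper).
print(round(simple_logp(6, 0), 2))  # six carbons, no hetero atoms
print(round(simple_logp(2, 1), 2))  # two carbons, one hetero atom
```

With measured log P values in a list `observed` and a method's predictions in `predicted`, `is_acceptable(predicted, observed)` applies the baseline comparison described in the abstract. Because the AAM uses no structural information at all, a method that fails this check adds nothing over predicting the dataset mean.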
