• Permeability;
• Lipophilicity;
• QSAR;
• logD;
• Random forest;
• Caco-2;
• LLC-PK1;
• ADME;
• Molecular modeling


A QSAR model for predicting passive permeability (Papp) was derived from Papp values measured in the LLC-PK1 cell line. In cross-validation, the best-performing combination of QSAR method and descriptor set was random forest with AP, DP, and MOE_2D descriptors combined. The model was then used to predict Caco-2 cell permeability for 313 compounds described in the literature, with good success. We find that passive permeability in different cell lines can be predicted with similar molecular properties and descriptors. We show that the variation in experimental measurements of Papp is smaller than the error in the QSAR predictions, indicating that the predictions are qualitatively useful but not quantitatively perfect. Predictions are better when the training set is large and diverse rather than small and more internally consistent: because prediction accuracy falls off quickly with decreasing similarity to the training set, it is better to have as large a training set as possible. Although no single physical parameter predicts Papp as well as the full QSAR model, logD appears to be the most important one, and intermediate values of logD are associated with higher Papp.
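The argument for a large, diverse training set rests on prediction accuracy depending on how similar a query compound is to its nearest training-set neighbor. The following is a minimal sketch of how that dependence could be checked, not the method used in this work. It assumes each compound is represented as a count vector of descriptor features (a `Counter` keyed by hypothetical feature strings standing in for AP or DP descriptors) and assumes a count-based Tanimoto (min/max) similarity; the similarity measure in the study may differ, and all helper names are illustrative.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def minmax_similarity(a: Counter, b: Counter) -> float:
    """Count-based Tanimoto (min/max) similarity between two descriptor count vectors.

    Reduces to the ordinary binary Tanimoto coefficient when all counts are 0 or 1.
    Feature keys are assumed, hypothetical descriptor strings.
    """
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)  # Counter returns 0 for missing keys
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 0.0


def max_similarity_to_training_set(query: Counter, training: Sequence[Counter]) -> float:
    """Similarity of a query compound to its nearest neighbor in the training set."""
    return max((minmax_similarity(query, t) for t in training), default=0.0)


def rmse_by_similarity_bin(
    sims: Sequence[float],
    observed: Sequence[float],
    predicted: Sequence[float],
    edges: Sequence[float],
) -> List[Tuple[float, float, Optional[float], int]]:
    """RMSE of predictions grouped by nearest-neighbor similarity.

    edges are ascending bin boundaries, e.g. [0.0, 0.3, 0.5, 0.7, 1.0]; a compound
    falls in bin i when edges[i] <= sim < edges[i + 1], and the last bin also
    includes its upper edge. Returns (low, high, rmse or None, count) per bin.
    """
    results = []
    for i in range(len(edges) - 1):
        low, high = edges[i], edges[i + 1]
        is_last = i == len(edges) - 2
        sq_errors = [
            (o - p) ** 2
            for s, o, p in zip(sims, observed, predicted)
            if low <= s < high or (is_last and s == high)
        ]
        rmse = math.sqrt(sum(sq_errors) / len(sq_errors)) if sq_errors else None
        results.append((low, high, rmse, len(sq_errors)))
    return results
```

Given nearest-neighbor similarities for a set of test compounds, an RMSE that grows in the lower-similarity bins would be the pattern that motivates using as large a training set as possible.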