Structure-based Drug Metabolism Predictions for Drug Design


  • Editor’s invited manuscript to celebrate the 4th Anniversary of Chemical Biology & Drug Design.

* Corresponding author: Hao Sun


Significant progress has been made in structure-based drug design by pharmaceutical companies at different stages of drug discovery, such as identifying new hits, enhancing binding affinity in hit-to-lead, and reducing toxicities in lead optimization. Drug metabolism is a major consideration for modifying drug clearance and also a primary source of drug metabolite-induced toxicity. With the structures of the major cytochrome P450s recently determined and characterized, structure-based drug metabolism prediction has become increasingly attractive. In silico methods based on molecular and quantum mechanics, such as docking, molecular dynamics and ab initio chemical reactivity calculations, bring us closer to understanding drug metabolism and predicting drug–drug interactions. In this review, we summarize important progress in drug metabolism and the common in silico techniques adopted to predict regioselectivity, stereoselectivity, reactive metabolites, induction, inhibition and mechanism-based inactivation, as well as their implementation in hit-to-lead drug discovery.

The goal of drug discovery is to find the best medicines to prevent, treat and cure diseases quickly and efficiently. To this end, computational tools have helped medicinal chemists modify and optimize molecules into potent drug candidates, have led biologists and pharmacologists to explore new disease genes and novel drug targets, and have guided drug metabolism scientists toward better pharmacokinetic profiles and away from drug toxicities. These in silico approaches have been widely applied to predict drug absorption, distribution, metabolism, excretion and toxicity (ADMET). Optimizing chemical space in the early discovery stage with these tools is expected to shorten the overall drug discovery cycle and, at the same time, improve the late-stage survival rate of drug candidates. Lipinski’s ‘Rule-of-Five’, a rule of thumb for predicting the oral bioavailability of small molecules, initiated the era of data mining and computer-aided drug design in the pharmaceutical industry (1). Another example of virtual ‘prediction’ is the ‘structure alert chart’ compiled by medicinal chemists for drug design. Developing comprehensive drug metabolism prediction methods will offer a powerful means to identify and analyze potential pharmacokinetic and toxicological problems.
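The Rule-of-Five mentioned above can be written as a short filter. The sketch below is a minimal illustration, assuming the four descriptors (molecular weight, calculated logP, hydrogen-bond donor and acceptor counts) have already been computed by some cheminformatics package; the `Descriptors` class and the function names are hypothetical. The thresholds are Lipinski’s (MW ≤ 500, logP ≤ 5, at most 5 donors and 10 acceptors), with poor absorption flagged when more than one criterion is violated.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float   # molecular weight, Da
    clogp: float        # calculated octanol/water partition coefficient
    h_donors: int       # hydrogen-bond donors (OH and NH groups)
    h_acceptors: int    # hydrogen-bond acceptors (N and O atoms)


def rule_of_five_violations(d: Descriptors) -> list:
    """Return a description of each Lipinski criterion the compound fails."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500")
    if d.clogp > 5:
        violations.append("logP > 5")
    if d.h_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def passes_rule_of_five(d: Descriptors) -> bool:
    """Poor absorption is flagged only when more than one rule is broken."""
    return len(rule_of_five_violations(d)) <= 1


# Usage with made-up descriptor values:
candidate = Descriptors(mol_weight=612.7, clogp=5.8, h_donors=2, h_acceptors=7)
print(rule_of_five_violations(candidate))  # ['molecular weight > 500', 'logP > 5']
print(passes_rule_of_five(candidate))      # False
```

Being a rule of thumb, the filter only flags likely absorption problems; it says nothing about metabolism, which is the subject of the methods reviewed below.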

Metabolism is a major consideration for modifying drug clearance and a primary source of drug metabolite-induced toxicity. An appropriate pharmacokinetic profile, such as a suitable half-life, is controlled mainly by drug metabolism; it should be adapted to the intended purpose and is extremely important for drug development. Drug-metabolizing enzymes catalyze phase I (such as hydrolysis, reduction and oxidation) and phase II (such as glucuronidation, sulfation, acetylation, methylation and glutathione conjugation) reactions (2). Phase I oxidation reactions often cause toxicities and/or drug–drug interactions. Cytochrome P450s are the major enzymes catalyzing oxidative bioactivation; other such enzymes include flavin monooxygenase, aromatase, monoamine oxidase, and alcohol/aldehyde dehydrogenase. Elucidating the underlying oxidation mechanisms will help improve pharmacokinetic and drug safety profiles. To date, 57 human P450 isoforms are known (3). They catalyze the metabolism of xenobiotics/drugs, sterols, fatty acids, eicosanoids and vitamins. Isoforms expressed in human liver include CYP1A2, 2A6, 2B6, 2C8, 2C9, 2C18, 2C19, 2D6, 2E1, 3A4 and 3A5; others are expressed in human lung, such as CYP1A1, 1B1, 2A6, 2B6, 2E1, 2F1, 2J2, 2S1, 3A5 and 4B1, or at minor sites including kidney, brain, small intestine, peripheral blood cells, platelets, neutrophils, and seminal vesicles (3). CYP1A2, 2C9, 2C19, 2D6, 2E1, and 3A4/5 are the major isoforms generally used for in vitro drug metabolism profiling in current discovery settings. Together they account for the metabolism of over 90% of marketed drugs, and their individual contributions to total hepatic P450 content are approximately 15%, 20%, 5%, 5%, 10%, and 30%, respectively (3).

Over the past 30 years, researchers have attempted to pinpoint the nature and mechanism of P450-catalyzed drug metabolism by comparing substrate selectivity, identifying reaction intermediates, characterizing products, and, most recently, studying enzyme structures at the molecular level. The most widely accepted catalytic mechanism comprises several major steps. The catalytic cycle starts with binding of the substrate to the P450 active site, followed by reduction of the ferric heme iron to the ferrous state by an electron delivered from the cofactor via the reductase; molecular oxygen then binds, and a second one-electron transfer and two protonation steps form the iron-oxo intermediate (compound I), which then oxidizes the substrate (3). It is difficult to simulate this whole metabolic process accurately by computer, but as human P450 structures have gradually become available, structure-based modeling has been developed to facilitate drug design.
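To make the order of intermediates explicit, the catalytic cycle described above can be written down as a simple data structure. This is only a reading aid, not a kinetic model; the step labels and iron species follow common textbook descriptions of the cycle, and `P450_CYCLE` and `next_step` are hypothetical names.

```python
# Simplified P450 catalytic cycle as an ordered tuple of (step, heme-iron species).
# A reading aid only: no rates, spin states or alternative shunt pathways are modeled.
P450_CYCLE = (
    ("substrate binding", "Fe(III), substrate-bound ferric heme"),
    ("first one-electron reduction", "Fe(II), ferrous heme"),
    ("O2 binding", "Fe(II)-O2, oxyferrous complex"),
    ("second one-electron reduction", "Fe(III)-peroxo"),
    ("first protonation", "Fe(III)-hydroperoxo (compound 0)"),
    ("second protonation, loss of H2O", "Fe(IV)=O porphyrin radical cation (compound I)"),
    ("substrate oxidation and product release", "Fe(III), resting ferric heme"),
)


def next_step(index: int) -> int:
    """Index of the step after `index`; the last step returns to substrate binding."""
    return (index + 1) % len(P450_CYCLE)


for i, (step, species) in enumerate(P450_CYCLE):
    print(f"{i + 1}. {step}: {species}")
```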

Before structure-based drug design emerged, in silico drug metabolism prediction methods were ligand-based, such as pharmacophore building and quantitative structure–activity relationship (QSAR) modeling (4). QSAR modeling still plays a big role in the pharmaceutical industry because of the significant growth of high-throughput screening data. In QSAR, neural networks and classification methods such as recursive partitioning, decision trees, and support vector machines are built on topological, structural and electronic descriptors, and ADMET-related properties such as lipophilicity and solubility are then predicted from these descriptors. QSAR predictions usually work particularly well for structurally similar compounds, whereas molecules from other regions of chemical space can become outliers. Furthermore, QSAR is limited by the quality of the available experimental data. QSAR can sometimes provide hints about the active site when corresponding descriptors, such as steric constraints and lipophilicity, appear in the regression equation. Pharmacophore models are constructed by overlaying the structures of known ligands to represent the spatial and chemical properties of the binding site (5). Pharmacophore models, structure-based docking and molecular dynamics are all mechanism-based approaches. Recently, predictive power has been improved by building 3D-QSAR pharmacophore models that take spatial atomic descriptors into account, including molecular interaction fields, electronic properties, and active site shapes.
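The observation that QSAR predictions are reliable mainly for compounds similar to the training set is often turned into an applicability-domain check. The sketch below is one simple variant, assuming fingerprints are supplied as sets of ‘on’ bit indices (in practice they would come from a cheminformatics toolkit); the similarity threshold of 0.5 and the neighbor count `k` are illustrative assumptions, not recommended values.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of 'on' bits."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def in_applicability_domain(query: set, training: list, threshold: float = 0.5, k: int = 1) -> bool:
    """True if the mean similarity of the query to its k most similar training compounds
    reaches `threshold`; queries below it are extrapolations and potential outliers."""
    if not training or not 1 <= k <= len(training):
        raise ValueError("need a non-empty training set and 1 <= k <= len(training)")
    sims = sorted((tanimoto(query, t) for t in training), reverse=True)
    return sum(sims[:k]) / k >= threshold


# Hypothetical fingerprints (bit indices) for a tiny training set
training_set = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6}]
print(in_applicability_domain({1, 2, 3, 4}, training_set))  # True
print(in_applicability_domain({10, 11, 12}, training_set))  # False
```

A prediction for a compound that fails this check is not necessarily wrong, but it deserves less trust than one for a near neighbor of the training data.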
The ultimate aim of in silico ADMET prediction is to predict drug absorption from the combination of lipophilicity, solubility, permeability, metabolism, and transporter activity; to predict drug distribution from the volume of distribution, plasma protein binding and blood–brain barrier penetration; and to predict clearance from renal clearance, hepatic metabolism, biliary excretion, and gut stability.

Structure-based methods have been an integral part of drug design. These mechanistic approaches are increasingly being applied to drug metabolism studies, for instance to predict substrate binding modes, enzyme conformational changes, catalysis, and their physiological consequences. For example, docking and local binding energy calculations were used to evaluate the relationship between metabolism and carcinogenicity (6). Indeed, structure-based modeling that combines molecular and quantum mechanics enables us to predict substrate affinity, metabolic lability and metabolic pathways. In practice, mass spectrometry, nuclear magnetic resonance spectroscopy, proteomics, metabolomics, and other bioanalytical techniques have improved significantly and can help validate these in silico predictions quickly. Challenges remain with these tools, such as difficulties in calculating free energies in thermodynamic cycles, enthalpy and entropy differences, and solvation (7), but they have already saved time and resources in discovering new drugs faster and more efficiently, and thus provide more opportunities in drug discovery. We believe that, with further breakthroughs, in silico methods will play an increasingly important role in drug discovery.

To reduce the complexity of drug metabolism prediction, a hybrid method combining molecular and quantum mechanics has been proposed to predict substrate binding and the rate-limiting steps of catalysis by cytochrome P450s. The primary goal of this review is to present these computational approaches to drug metabolism prediction. We first review drug-induced toxicities and their mechanisms, and then specific prediction examples for regioselectivity, stereoselectivity, inhibition, induction, and mechanism-based inhibition, as well as applications in hit-to-lead drug discovery.

Drug Metabolite-induced Toxicities

The attrition rate of drug candidates due to toxicity remains high. Toxicity is still a major safety concern leading to drug withdrawal (for example, suprofen, terfenadine, rofecoxib, mibefradil, troglitazone, nefazodone, cisapride, remoxipride, tolcapone, and tienilic acid), ‘black box’ warnings (most recently for GlaxoSmithKline’s anti-diabetic drug rosiglitazone), and the discontinuation of clinical trials (such as the withdrawal of Pfizer’s hypercholesterolemia drug torcetrapib from Phase III) (7,8). An analysis of ten large pharmaceutical companies showed that only about 10% of compounds progressed from first-in-human studies to final FDA approval (8). The failure rate is even higher when all drug candidates in preclinical research are included. It is therefore important to define strategies for drug safety assessment. Traditional drug safety testing approaches include in vivo animal models and in vitro cell-based assays, and more recently in silico assessment has also been introduced. QSAR-based expert systems are mainly used in early drug discovery to predict toxicological endpoints including carcinogenicity, teratogenicity, mutagenicity, immunotoxicity, neurotoxicity, developmental toxicity, respiratory sensitization, and skin irritation (9), but the quality of experimental datasets (10) remains a major challenge for developing predictive models.

Drug metabolism continues to be a prominent contributor to drug safety concerns (11). Understanding the molecular and cellular mechanisms responsible for drug- and metabolite-induced toxicity will be critical to reducing the attrition rate. On-target and off-target toxicities are two different mechanisms (12). In more detail, toxicity can be categorized into four types (13): type 1 is related to pharmacological pathways on either the primary target or a non-primary target; type 2 refers to idiosyncratic toxicities such as drug-induced allergies; type 3 is due to chemical activation, such as drug metabolism and the adduction of metabolites to macromolecules; and type 4 is chronic toxicity such as carcinogenesis and teratogenesis. Mitochondrial dysfunction is another mechanism: it is quite possible that the withdrawn anti-diabetic drug troglitazone and the cholesterol-lowering drug cerivastatin caused mitochondrial dysfunction (14). Long-term effects may arise from impaired mitochondrial replication and protein synthesis, and short-term effects from inactivation of the electron transport complexes in the inner mitochondrial membrane through covalent binding of reactive metabolites (15).

Although in silico predictions of drug metabolism seem unlikely to replace well-established in vitro and in vivo methods in the near future, these computational tools greatly help early drug design. One potential toxicological factor that can be eliminated at an early stage is mechanism-based toxicity. Some xenobiotics, such as nicotine, are directly toxic, whereas the toxicity of drugs is largely due to their metabolites, which are either highly reactive electrophiles or free radicals. Various mechanisms have been proposed, such as reactive metabolite formation and mechanism-based inactivation. Metabolic activation depends primarily on the physicochemical properties of compounds, on the interactions between compounds and active site residues of drug-metabolizing enzymes, and on catalysis. In addition, transporters such as P-glycoprotein or organic anion-transporting polypeptides can carry drug metabolites to off-site targets, where they cause toxicity. Moreover, drug toxicity can be idiosyncratic or caused by genetic diversity: polymorphisms in P450, N-acetyltransferase and transporter genes affect individual responses to drugs because these genes differ across populations. The ultimate objective of drug metabolism prediction is to evaluate toxicity, but it currently focuses on selecting and optimizing drug candidates to advance drug discovery pipelines, with the goals of reducing resource use and generating more reliable candidates (11).

One of the key roles of preclinical drug metabolism assessment is to identify potential liabilities in new chemical series as early as possible. For example, identifying reactive metabolites offers promise for decreasing the high attrition rate, because reactive metabolites may bind covalently, at the site of their formation or at a distant site, to inactivate target proteins, alter biochemistry and signal transduction, or even trigger an immunological response. Various reactive metabolites and their precursor moieties have been identified and are used in in vitro reactive metabolite screening. These include quinones, quinone methides, quinone imines, imine methides, epoxides, anilines and derivatives, furans, benzylamines, thiophenes, glitazones, thioureas, alkenes, and alkynes (16). However, some reactive metabolites still escape detection even with the most advanced bioanalytical instruments, and there is also a gap between reactive metabolite formation and its toxicological consequences (15). One solution, particularly at the design stage, is to incorporate in silico tools such as those reviewed here.

Several P450-related pharmacological processes can cause drug–drug interactions, including induction, inhibition, and mechanism-based inactivation. Induction refers to an increase in P450 enzyme expression, an adaptive response of the human body to xenobiotics or certain drugs, which is mediated by nuclear hormone receptors such as the constitutive androstane receptor (CAR), pregnane X receptor (PXR), peroxisome proliferator-activated receptor (PPAR) and glucocorticoid receptor (GR). Induction-caused drug–drug interactions can reduce the efficacy and/or increase the toxicity of co-administered drugs (17). Inhibition occurs during substrate or molecular oxygen binding. Mechanistically, P450 inhibitors are categorized as reversible or irreversible, and they are characterized by measuring how the inhibitor affects the metabolism of typical P450 substrates; for example, Michaelis–Menten kinetics are used to determine reversible inhibition (18). Mechanism-based inactivation is also referred to as ‘suicide’ or ‘time-dependent’ inactivation. It is a process in which reactive intermediates inactivate the enzyme without leaving the active site of the P450 (19). Mechanism-based inactivators can either bind covalently to amino acid residues of the apoprotein or destroy the prosthetic heme group of P450s. Known mechanism-based inactivators include acetylenes (such as 2-ethynylnaphthalene, 9-ethynylphenanthrene, 7-ethynylcoumarin and 5-phenyl-1-pentyne), organosulfur compounds (such as disulfiram, cimetidine, tienilic acid, ticlopidine, and thiazolidinediones), arylamines (such as 1-aminobenzotriazole), cyclic tertiary amines (such as phencyclidine), and furanocoumarins (such as bergamottin, 8-geranyloxypsoralen, and 8-methoxypsoralen). Others include tamoxifen and raloxifene, both selective estrogen receptor modulators, glabridin (an isoflavan) and the anticancer drug N,N′,N″-triethylenethiophosphoramide (19).
In addition, the pneumotoxin 3-methylindole was found to be bioactivated by CYP2F1 in human lung to form electrophilic methylene imine reactive intermediates that cause mechanism-based inactivation. Studies have shown that structural analogues of 3-methylindole are also potential mechanism-based inactivators (Figure 1). For example, zafirlukast (a leukotriene receptor antagonist), MK-0524 (a prostaglandin D2 receptor antagonist), and SPD-304 (a tumor necrosis factor-α inhibitor) were found to form methylene imine reactive intermediates, and mechanism-based inactivation of CYP3A4 was observed with zafirlukast, SPD-304 and tadalafil.
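The kinetic treatments mentioned above can be sketched with two standard rate equations: the Michaelis–Menten velocity in the presence of a competitive (reversible) inhibitor, and the hyperbolic dependence of the observed inactivation constant on inactivator concentration that is used for mechanism-based (time-dependent) inactivation. Competitive inhibition is only one reversible mode (noncompetitive and uncompetitive inhibition use different expressions), the function names are hypothetical, and all parameter values below are made up.

```python
import math


def competitive_inhibition_rate(s, i, vmax, km, ki):
    """Michaelis-Menten velocity with a competitive (reversible) inhibitor:
    v = Vmax*[S] / (Km*(1 + [I]/Ki) + [S])."""
    return vmax * s / (km * (1.0 + i / ki) + s)


def inactivation_rate_constant(i, kinact, ki_inact):
    """Observed first-order inactivation constant for a mechanism-based inactivator:
    k_obs = kinact*[I] / (KI + [I])."""
    return kinact * i / (ki_inact + i)


def remaining_activity(t, i, kinact, ki_inact):
    """Fraction of enzyme still active after preincubation time t,
    assuming first-order loss of activity and no enzyme resynthesis."""
    return math.exp(-inactivation_rate_constant(i, kinact, ki_inact) * t)


# Hypothetical parameters: concentrations in uM, kinact in 1/min, time in min.
v0 = competitive_inhibition_rate(s=10.0, i=0.0, vmax=100.0, km=10.0, ki=5.0)     # 50.0
v_inh = competitive_inhibition_rate(s=10.0, i=5.0, vmax=100.0, km=10.0, ki=5.0)  # about 33.3
k_obs = inactivation_rate_constant(i=2.0, kinact=0.1, ki_inact=2.0)              # 0.05 per min
```

The practical difference is visible in the equations: reversible inhibition lowers the rate only while the inhibitor is present, whereas mechanism-based inactivation removes active enzyme progressively with time, which is why it is assessed with preincubation experiments.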

Figure 1.

 3-Methylindole and its structural analogues are potential mechanism-based inactivators of cytochrome P450s (86–91).

Drug Metabolism Prediction: Insights from Docking, Molecular Dynamics and Quantum Chemical Calculations

Structure-based drug design has a well-known record of success: by 2004, approximately 50 compounds designed with its help had entered clinical trials and/or been approved (20). To date, many more drug candidates have been discovered with the help of advanced computational techniques, because in silico tools have improved considerably and have been more widely used by pharmaceutical companies in the last several years. For example, the use of AutoDock combined with the relaxed complex method led to Merck’s raltegravir (Isentress), the first HIV integrase inhibitor, approved in 2007 (21). Furthermore, structure-based drug metabolism prediction not only predicts the site of metabolism (metabolic liability) but also elucidates important molecular interactions between substrates and active site residues of drug-metabolizing enzymes, providing invaluable information for chemical design and redesign to enhance pharmacokinetic profiles and reduce toxicities.

To perform structure-based drug metabolism prediction, experimentally determined structures (for example, from X-ray crystallography) of P450s or other drug-metabolizing enzymes are required. If no corresponding crystal structure is available, an appropriate homology model is needed, but it must be carefully evaluated during construction against enzyme kinetics data and/or site-directed mutagenesis results. The first mammalian P450 structure, rabbit microsomal CYP2C5, was determined in 2003 by Dr. Eric Johnson’s group at the Scripps Research Institute (22); subsequently, the crystal structures of major human forms became available, including CYP1A2, 2A6, 2A13, 2C8, 2C9, 2D6, 2E1, and 3A4 (Table 1). These structures yielded invaluable molecular information on P450 enzymes and also heralded an era of rapid crystallization of drug-metabolizing enzymes.

Table 1.   Available structures of human cytochrome P450s for public access, as determined by X-ray crystallography (37,77–85)
CYP | PDB code | Resolution (Å) | Notes
--- | --- | --- | ---
CYP1A2 | 2HI4 | 1.95 | Bound with α-naphthoflavone
CYP2A6 | 1Z10, 1Z11 | 1.90, 2.05 | Bound with coumarin and methoxsalen
 | 2FDU, 2FDV, 2FDW, 2FDY | 1.65–2.05 | Bound with N,N-dimethyl(5-(pyridin-3-yl)furan-2-yl)methanamine, N-methyl(5-(pyridin-3-yl)furan-2-yl)methanamine, (5-(pyridin-3-yl)furan-2-yl)methanamine, and adrithiol
 | 2PG6, 2PG7, 2PG5 | 1.95–2.80 | 2A6 mutants L240C/N297Q, N297Q/I300V and N297Q
CYP2A13 | 2P85 | 2.35 | Bound with indole
CYP2C8 | 1PQ2 | 2.70 | No substrate bound
 | 2NNI, 2NNJ, 2NNH | 2.28–2.80 | Bound with montelukast, felodipine, and 9-cis-retinoic acid
CYP2C9 | 1OG2, 1OG5 | 2.55, 2.60 | Bound with warfarin
 | 1R9O | 2.00 | Bound with flurbiprofen
CYP2D6 | 2F9Q | 3.00 | No substrate bound
CYP2E1 | 3E4E, 3E6I | 2.60, 2.20 | Bound with 4-methylpyrazole and indazole
CYP3A4 | 1TQN | 2.05 | No substrate bound
 | 1W0E, 1W0F, 1W0G | 2.65–2.80 | No substrate bound, and bound with metyrapone and progesterone
 | 2J0D, 2V0M | 2.75, 3.80 | Bound with erythromycin and ketoconazole

Modeling drug metabolism requires predicting substrate binding to the active site and adapting computational approaches to account for catalysis. No single fully integrated structure-based computational tool can simulate all the possible processes of drug metabolism. However, a combined computational strategy was designed to predict the regioselectivity of the hydroxylation and O-dealkylation reactions of the macrolide immunosuppressants sirolimus and everolimus catalyzed by CYP3A4 (23). To address the molecular mechanisms of sirolimus and everolimus bioactivation, the proposed prediction method consisted of three separate steps. First, the substrates were docked into the active site of CYP3A4, and energetically favored binding poses were selected. Second, the selected poses were subjected to molecular dynamics simulations with the AMBER force field, and the substrate binding orientation and the distance to the heme iron were calculated. Third, the energetics of hydrogen atom abstraction from selected fragments of both substrates were calculated by quantum chemical methods (Figure 2). This prediction strategy has gradually been accepted and adopted by other computational chemists and drug metabolism scientists.
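The second step of the strategy above, monitoring substrate–iron distances during molecular dynamics, can be summarized as the fraction of frames in which each candidate site sits close to the heme iron. The sketch assumes per-frame coordinates have already been extracted from the trajectory; the 5 Å cutoff, the function names and the site labels are hypothetical choices for illustration, not the parameters of the cited study (23).

```python
import math


def fraction_within_cutoff(site_xyz, iron_xyz, cutoff=5.0):
    """Fraction of MD frames in which a candidate site of metabolism lies within
    `cutoff` angstroms of the heme iron. Both inputs are per-frame (x, y, z) lists."""
    if not site_xyz or len(site_xyz) != len(iron_xyz):
        raise ValueError("trajectories must be non-empty and of equal length")
    close = sum(1 for s, fe in zip(site_xyz, iron_xyz) if math.dist(s, fe) <= cutoff)
    return close / len(site_xyz)


def rank_sites(site_trajectories, iron_xyz, cutoff=5.0):
    """Rank candidate sites by how often they are close to the iron, most often first."""
    scores = {site: fraction_within_cutoff(xyz, iron_xyz, cutoff)
              for site, xyz in site_trajectories.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Four hypothetical frames; the iron stays at the origin.
iron = [(0.0, 0.0, 0.0)] * 4
sites = {
    "C-a": [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0), (6.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    "C-b": [(7.0, 0.0, 0.0), (8.0, 0.0, 0.0), (3.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
}
print(rank_sites(sites, iron))  # [('C-a', 0.75), ('C-b', 0.25)]
```

Sites that rarely approach the iron can be dropped before the expensive quantum chemical step, which is the point of running the calculations in sequence.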

Figure 2.

 Energy change from substrate binding to product formation in cytochrome P450-catalyzed drug metabolism. A combination of docking, molecular dynamics and quantum chemical calculation is proposed for in silico prediction of drug metabolism.

One goal of drug metabolism prediction is to find the lowest-energy enzyme–substrate complex, which in many cases corresponds to one of the most probable orientations for catalysis. Other goals include predicting rates of metabolism based on the activation mechanisms of drug molecules. Successful prediction requires an accurate energy potential function for each of the sequential calculations described above. Simplification is invariably necessary because of computing time. A single-step model is not adequate from a mechanistic point of view, but it sometimes still yields relative energies that are sufficient for drug design, especially for compounds within the same chemical series. For example, docking was successfully used to predict the CYP2C9-mediated metabolism of the COX-2 inhibitor celecoxib and its 13 analogues (26). Figure 3 shows the two predicted binding pose clusters of midazolam obtained with docking alone, and Figure 5 shows a quantum chemistry-based calculation of the activation energies for the carbon–hydrogen bonds of testosterone, indicating that the 6β position is the most likely site of CYP bioactivation.
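Binding-pose clusters such as the two midazolam clusters mentioned above are commonly obtained by grouping docked poses by root-mean-square deviation (RMSD). A minimal greedy (‘leader’) clustering is sketched below, assuming all poses share the same atom order and were docked into the same receptor frame, so no superposition is needed; the 2.0 Å cutoff is a common but arbitrary choice, and the coordinates and energies are made up.

```python
import math


def rmsd(pose_a, pose_b):
    """In-place RMSD (no superposition) between two poses with the same atom order;
    appropriate for poses docked into the same receptor frame."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must have the same number of atoms")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(pose_a, pose_b)) / len(pose_a))


def cluster_poses(poses, energies, cutoff=2.0):
    """Greedy leader clustering: visit poses from lowest to highest energy and add each
    to the first cluster whose leader (its lowest-energy pose) is within `cutoff` RMSD."""
    order = sorted(range(len(poses)), key=lambda i: energies[i])
    clusters = []
    for i in order:
        for members in clusters:
            if rmsd(poses[i], poses[members[0]]) <= cutoff:
                members.append(i)
                break
        else:
            clusters.append([i])  # no leader close enough: start a new cluster
    return clusters


# Four hypothetical two-atom poses and docking scores (lower is better)
poses = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
    [(10.0, 0.5, 0.0), (11.0, 0.5, 0.0)],
]
energies = [-8.0, -7.0, -7.5, -6.0]
print(cluster_poses(poses, energies))  # [[0, 1], [2, 3]]
```

Each cluster’s leader is its best-scoring pose, so the clusters can be compared directly, for example to ask which position of the substrate each cluster places nearest the heme.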

Figure 3.

Two major energetically favored binding clusters of midazolam in the active site of CYP3A4 from molecular docking prediction: (A) 1′-hydroxylation position; (B) 4-hydroxylation position. CYP3A4 is shown in cartoon format (transparent gray), and midazolam (yellow) and heme (pink) are shown as colored sticks: nitrogen, blue; oxygen, red; chlorine, green; fluorine, cyan.

Figure 4.

Dynamic analysis of active site residues in CYP3A4 crystal structures: (A) active site of CYP3A4; (B) overlay of selected active site residues from crystal structures with different ligands bound.

Docking, which predicts the conformation and orientation of a substrate within the active site of an enzyme (27), is a tandem process of posing and ranking. Posing searches for energetically favored binding orientations of the substrate, typically with a dedicated search algorithm. Ranking then scores the poses generated during posing with interaction energy functions that may include electrostatic, hydrogen bonding, van der Waals, entropic, and explicit solvation terms (24). Search protocols may be systematic, stochastic, or simulation-based. A systematic search usually grows the substrate incrementally within the CYP active site; in other words, rigid core fragments of the substrate are docked first and the flexible parts are then added, based on libraries of pre-generated conformations. Such searches are less demanding than stochastic methods, which apply random changes to the substrate and accept or reject them according to a predefined probability function. Monte Carlo and genetic algorithms are classic stochastic algorithms. Simulation methods such as molecular dynamics, discussed later, settle ligands into local minima of the energy surface.
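The stochastic search described above can be illustrated with a Metropolis Monte Carlo minimizer. To keep it short, the sketch moves a single coordinate on a hypothetical double-well ‘score’ rather than a full pose (translation, rotation and torsions); the step size, temperature and number of steps are illustrative assumptions.

```python
import math
import random


def metropolis_search(score, x0, n_steps=5000, step=0.3, temperature=1.0, seed=0):
    """Minimize `score` over one coordinate with Metropolis Monte Carlo.
    Returns the lowest-scoring coordinate visited and its score."""
    rng = random.Random(seed)
    x, e = x0, score(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)      # random perturbation of the 'pose'
        e_new = score(x_new)
        # Metropolis criterion: always accept downhill moves, uphill ones with
        # Boltzmann probability exp(-dE / T)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def toy_score(x):
    """Hypothetical double-well score: local minimum near x = 1, global minimum near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


best_x, best_e = metropolis_search(toy_score, x0=1.0)
print(f"best x = {best_x:.2f}, score = {best_e:.3f}")
```

Occasionally accepting uphill moves is what lets the search leave the local minimum near x = 1, where it starts, and find the lower well near x = -1; a purely downhill search from the same start would stay trapped.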

A scoring function is required to rank the favored ligand conformations; force field-based, empirical, and knowledge-based scoring functions are used. Fundamentally, these scoring functions rely on various assumptions, so the evaluation is simplified and can only roughly reproduce the biological process. Researchers have tried to eliminate assumptions and combine as many energy terms as possible to improve reliability. In force field-based scoring functions, the energetic terms are derived from classical mechanics force fields such as AMBER95 or CHARMm22, and atomic interactions between protein and ligand are simulated based on known factors controlling molecular recognition. Both the enzyme–substrate interaction energy and the substrate internal energy are calculated, but the internal energy of the protein is generally excluded. For example, the van der Waals energy is given by a Lennard–Jones potential, and the electrostatic energy is calculated with a Coulombic formulation using a distance-dependent dielectric function. Empirical scoring functions are derived by regression against datasets of experimentally determined binding energies. They are usually simpler than force field-based scoring functions but do not provide mechanistic information to guide molecular modification. Knowledge-based scoring functions are designed to reproduce experimental structures rather than binding energies. Their advantage is computational simplicity, but they are derived from limited sets of protein–ligand complex structures and are also limited by the atom types used to derive the corresponding potentials of mean force.
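The force field-type terms mentioned above can be written down directly: a 12-6 Lennard–Jones term for the van der Waals energy and a Coulomb term with a distance-dependent dielectric. The sketch assumes ε(r) = 4r and Lorentz–Berthelot mixing of per-atom parameters, both common conventions but not necessarily those of any particular docking program; the `Atom` type is hypothetical, and, consistent with the text, only intermolecular ligand–protein terms are summed (ligand internal energy is left out of this sketch).

```python
import math
from typing import NamedTuple

COULOMB_CONST = 332.0636  # kcal*A/(mol*e^2): converts q_i*q_j/r into kcal/mol


class Atom(NamedTuple):
    xyz: tuple       # coordinates, angstrom
    charge: float    # partial charge, e
    epsilon: float   # Lennard-Jones well depth, kcal/mol
    sigma: float     # Lennard-Jones zero-crossing distance, angstrom


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones potential: 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb_ddd(r, qi, qj, scale=4.0):
    """Coulomb energy with a distance-dependent dielectric eps(r) = scale * r."""
    return COULOMB_CONST * qi * qj / (scale * r * r)


def interaction_energy(ligand, protein):
    """Sum of pairwise vdW + electrostatic terms between ligand and protein atoms;
    protein-protein and ligand-ligand terms are not included."""
    total = 0.0
    for a in ligand:
        for b in protein:
            r = math.dist(a.xyz, b.xyz)
            eps = math.sqrt(a.epsilon * b.epsilon)   # Lorentz-Berthelot mixing
            sig = 0.5 * (a.sigma + b.sigma)
            total += lennard_jones(r, eps, sig) + coulomb_ddd(r, a.charge, b.charge)
    return total
```

A quick sanity check: at r = 2^(1/6)·σ the Lennard–Jones term equals -ε, its minimum. Real programs add cutoffs, hydrogen-bond terms and precomputed grids for speed.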

Our understanding of the dynamic changes in the P450 active site grows with the expanding number of X-ray crystal structures, as well as from site-directed mutagenesis and molecular dynamics analysis. Molecular dynamics simulates the flexibility of CYP active site residues over time, generally on the order of nanoseconds. Depending on the size of the binding pocket, multiple binding poses may be possible, giving rise to different metabolic products. This may help explain why P450 substrates often show regioselectivity or stereoselectivity. Docking methods can also be enhanced by combining them with molecular dynamics. Although our understanding of P450-catalyzed drug metabolism and our ability to model the process are expanding, predicting time-dependent changes of active site residues remains a challenge. The stability of the substrate–P450 complex likely results from attractive van der Waals interactions, solvation free energy, hydrogen bonds, and hydrophobic and polar interactions. In light of these interactions, molecular dynamics approaches are often applied to identify specific residues that direct substrate binding. Ito et al. simulated the motion of active site residues of CYP2D6 during propranolol binding and found significant movement of the F-G and H-I loops, identifying several key residues such as Phe120, Glu216, Asp301, Phe483, Phe219, and Glu222 (25). Their theoretical analysis correlated with site-directed mutagenesis results. In another study, molecular dynamics analysis was combined with neural network modeling to predict the binding of 82 compounds to CYP2D6 and showed good predictive performance (28). Another study combining molecular dynamics with spectroscopy showed a large-scale structural transition of CYP2B4 upon binding of the ligand 4-(4-chlorophenyl)imidazole.
Interestingly, Muralidhara and coworkers proposed that P450 enzymes adopt an ‘open form’ and a ‘closed form’ (29), with the structural variations located mainly in the B′ helix and F-G region.
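Residue flexibility in a molecular dynamics trajectory is often quantified by the root-mean-square fluctuation (RMSF) of each residue about its average position. The sketch assumes frames have already been superposed on a reference structure and that one representative atom per residue (for example, the C-alpha) is tracked; the residue labels echo those above, but the coordinates are invented and the function names are hypothetical.

```python
import math


def rmsf(positions):
    """Root-mean-square fluctuation of one atom about its mean position.
    `positions` holds its (x, y, z) in each frame, already superposed on a reference."""
    n = len(positions)
    mean = tuple(sum(p[k] for p in positions) / n for k in range(3))
    return math.sqrt(sum(math.dist(p, mean) ** 2 for p in positions) / n)


def residue_rmsf(trajectory):
    """Map each residue label to the RMSF of its representative atom."""
    return {res: rmsf(pos) for res, pos in trajectory.items()}


# Two-frame toy trajectory (angstrom); real analyses use thousands of frames.
traj = {
    "Phe120": [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    "Asp301": [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0)],
}
print(residue_rmsf(traj))
```

Residues with high RMSF, such as loop residues, are natural candidates for the flexible side chains discussed in the next paragraph.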

A continuous time-course structural determination would be the ideal way to visualize dynamic changes of P450 active site residues, but this is not yet possible experimentally. However, co-crystallization with different bound substrates has created a small database illustrating the dynamic changes of P450 enzymes, especially the varying orientations of side chains. By overlaying several CYP3A4 structures, we analyzed the dynamic changes of active site residues. Focusing on the binding of different substrates, this analysis showed that some residues are highly flexible, with obvious movement of Arg212, Ser312, Phe304, and Glu308 (Figure 4). The interplay between these residues makes it difficult, if not impossible, to fully grasp the protein dynamics when an individual substrate binds. Nevertheless, this information is important for setting docking parameters, for example treating part of the protein as flexible and the rest as rigid to increase prediction accuracy and reduce computing time.
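The overlay analysis above can be turned into a simple rule for choosing flexible residues in docking: residues whose side chains move more than a cutoff across superposed crystal structures are treated as flexible, and the rest as rigid. The sketch assumes side-chain centroids have been computed for each structure after superposition; the 1.5 Å cutoff, the label `Res-X` and all coordinates are hypothetical.

```python
import itertools
import math


def max_displacement(coords):
    """Largest distance between any two observations of the same centroid (angstrom)."""
    if len(coords) < 2:
        return 0.0
    return max(math.dist(a, b) for a, b in itertools.combinations(coords, 2))


def split_flexible(centroids, cutoff=1.5):
    """Partition residues into (flexible, rigid) lists by side-chain centroid movement
    across crystal structures that have already been superposed."""
    flexible, rigid = [], []
    for residue, coords in sorted(centroids.items()):
        (flexible if max_displacement(coords) > cutoff else rigid).append(residue)
    return flexible, rigid


# One centroid per crystal structure for each residue (made-up coordinates)
centroids = {
    "Arg212": [(0.0, 0.0, 0.0), (3.1, 0.4, 0.0), (1.2, 0.0, 0.0)],
    "Res-X": [(5.0, 5.0, 5.0), (5.3, 5.1, 5.0), (5.1, 4.9, 5.0)],
}
print(split_flexible(centroids))  # (['Arg212'], ['Res-X'])
```

Keeping the flexible set small is the trade-off the text describes: each flexible side chain improves the chance of finding the right pose but enlarges the search space.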

Figure 5.

Hydrogen atom abstraction from testosterone calculated by density functional theory (DFT). The DFT energies show that abstraction of the 6β hydrogen is the most energetically favored.

Quantum chemical calculation is a major tool for predicting P450 catalysis. From calculated energy barriers, the absolute or relative susceptibility of a site to oxidation during xenobiotic bioactivation can be estimated. Ab initio Hartree–Fock and density functional theory calculations are often used to calculate the reactivity of substrates. For aliphatic hydroxylation and N- and O-dealkylation reactions, hydrogen atom abstraction to form a radical is the rate-limiting step of catalysis. However, the mechanism of aromatic hydroxylation is still unclear. Arene epoxides are generally accepted intermediates, followed by migration of hydrogen from the hydroxylation site to the adjacent carbon (the ‘NIH shift’) (3). Besides the activation energy, the orientation of the aromatic ring is also important: the ring may approach for either ‘side-on’ addition (ring perpendicular to the heme porphyrin) or ‘face-on’ addition (ring parallel to the heme porphyrin). To include the active-site environment that determines this orientation, the heme-oxo species and other interacting active site residues need to be incorporated into the quantum chemical calculation. The hybrid quantum mechanical/molecular mechanical (QM/MM) method is well suited to this calculation (30). For example, Bathelt et al. used a hybrid QM/MM method to simulate benzene hydroxylation by CYP2C9; their results suggested roughly equal contributions from the ‘side-on’ and ‘face-on’ pathways, challenging the earlier view that ‘face-on’ addition dominates because of π-stacking interactions (31). Quantum chemical calculations have also shed light on the regioselectivity of cyclohexene and propene oxidation. Cohen et al. (32) and Hirao et al. (33) studied the ratio of hydroxylation to epoxidation pathways and concluded that both electrostatic and hydrogen bonding interactions are important for the difference; for propene in particular, penta-radical and/or tri-radical species determine the hydroxylation and epoxidation mechanisms. In compound I, the Fe–S bond length changes when the substrate binds, which can be calculated by QM/MM methods; for example, the Fe–S bond of CYP3A4 compound I was found to elongate (34). Moreover, parameters calculated by quantum mechanics have been tested as theoretical descriptors for building QSAR models, such as molecular orbital-derived descriptors for flavone substrates binding to CYP1A (35).
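Relative activation barriers from quantum chemical calculations, such as the testosterone hydrogen-abstraction energies discussed earlier, can be converted into rough relative site preferences with a Boltzmann factor, fraction_i ∝ exp(-ΔE‡_i / RT). This sketch assumes all sites share the same pre-exponential factor and ignores binding orientation and accessibility, which is exactly why the text combines such calculations with docking and molecular dynamics; the barrier values and site labels are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def site_fractions(barriers, temperature=310.15):
    """Convert activation barriers (kcal/mol) into relative product fractions,
    assuming a common pre-exponential factor: fraction_i ~ exp(-dE_i / RT)."""
    rt = R_KCAL * temperature
    e_min = min(barriers.values())  # shift energies so the largest weight is 1
    weights = {site: math.exp(-(e - e_min) / rt) for site, e in barriers.items()}
    total = sum(weights.values())
    return {site: w / total for site, w in weights.items()}


# Hypothetical barriers for three C-H positions of one substrate
fractions = site_fractions({"6beta": 13.0, "2beta": 14.2, "16alpha": 14.8})
for site, f in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{site}: {f:.2f}")
```

At body temperature RT is about 0.62 kcal/mol, so a barrier difference of roughly 1.4 kcal/mol already corresponds to a tenfold rate difference; small errors in computed barriers therefore matter.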

Water molecules often play an essential role in drug–enzyme interactions: displacement of water molecules from the binding site is a principal source of binding free energy (24). Water molecules have been found in the binding pockets of drug-metabolizing enzymes, where they can interact favorably with either ligands or active-site residues. Active-site water molecules are therefore often included in computational methods for predicting drug metabolism. However, current efforts to simulate these water molecules mainly focus on molecular dynamics of their 'on' and 'off' states, to unravel their roles in substrate binding and their effects on docking and virtual screening accuracy. Some docking protocols allow water molecules to rotate around their three principal axes. Using this approach, a study predicting the binding of 65 CYP2D6 substrates and their sites of metabolism showed that the water molecules settled in energetically favored locations, and the accuracy of the docking algorithm was significantly enhanced (36). An improved drug metabolism prediction strategy could therefore plausibly include water molecules in the simulation setup. Not all water molecules are resolved in crystal structures of cytochrome P450s, but some of these structural water molecules have been identified recently (Figure 6). For example, an ordered water (wat733) was found in the active site of CYP1A2 at an appropriate distance to form hydrogen bonds with the carbonyl group of the inhibitor α-naphthoflavone (2.92 Å) and with the carbonyl group of Gly316 (2.84 Å) in helix I (83). Several water molecules were identified as important for CYP2C9 substrate binding: one (wat600) lies between the substrate flurbiprofen and the heme iron, at a distance of <3.0 Å that allows a hydrogen bond with the backbone carbonyl group of Ala297; one (wat819) is located in the distorted region of helix I; and another (wat842) lies between flurbiprofen and the turn region of substrate recognition site 6 (85). In CYP3A4, a cluster of water molecules was also observed above the heme porphyrin (37), which can be displaced totally or partially depending on the size of the substrate.
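Whether an active-site water counts as structurally conserved often comes down to how frequently a water oxygen occupies a given site during a molecular dynamics trajectory (its 'on' and 'off' states), and whether it sits within hydrogen-bonding distance of a partner atom. The Python sketch below computes both quantities from plain coordinate lists. The 1.4 Å site radius, the 3.5 Å heavy-atom hydrogen-bond cutoff, the function names and all coordinates are hypothetical assumptions for illustration; a real analysis would read frames from a trajectory file and would usually also check hydrogen-bond angles.

```python
import math

# Hypothetical thresholds (angstroms); tune for the system being studied.
SITE_RADIUS = 1.4
HBOND_CUTOFF = 3.5


def site_occupancy(frames, site, radius=SITE_RADIUS):
    """Fraction of MD frames in which any water oxygen lies within `radius` of `site`.

    frames: one list of water-oxygen (x, y, z) coordinates per snapshot.
    """
    if not frames:
        raise ValueError("need at least one frame")
    occupied = sum(
        1 for oxygens in frames if any(math.dist(o, site) <= radius for o in oxygens)
    )
    return occupied / len(frames)


def can_hydrogen_bond(water_o, partner, cutoff=HBOND_CUTOFF):
    """Distance-only heuristic: heavy atoms within `cutoff` angstroms (no angle check)."""
    return math.dist(water_o, partner) <= cutoff


# Hypothetical water-oxygen coordinates (angstroms) near one site in three snapshots.
site = (0.0, 0.0, 0.0)
frames = [
    [(0.2, 0.1, 0.0), (5.0, 5.0, 5.0)],  # site occupied ("on")
    [(3.0, 0.0, 0.0)],                   # site empty ("off")
    [(0.0, -0.5, 0.3)],                  # site occupied ("on")
]
print(f"occupancy = {site_occupancy(frames, site):.2f}")  # occupancy = 0.67
print(can_hydrogen_bond((0.0, 0.0, 0.0), (2.9, 0.0, 0.0)))  # True
```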

Figure 6.

Essential water molecules in the active sites of cytochrome P450s. (A) CYP1A2 with the inhibitor α-naphthoflavone (ANF); (B) CYP2C9 with the substrate flurbiprofen (FBP); and (C) CYP3A4. Water molecules are depicted as red spheres.


Many drug metabolism prediction efforts have focused on predicting P450 regioselectivity or metabolic pathways (38,39). By definition, the regioselectivity of a P450-catalyzed reaction is the preference for one or several sites of metabolism of a substrate over others. This information is important for reducing metabolic liabilities during hit-to-lead optimization. Camitro's metabolism models were widely used in early years to predict enzyme–substrate binding affinities and sites of metabolism using empirical and quantum chemical approaches (40). Zhang et al. used docking to investigate the metabolism of the C3′ substituents of a series of taxane analogues by CYP2C8 and 3A4 (41), and found that replacing the C3′ aromatic ring (paclitaxel) with an aliphatic chain (cephalomannine) shifted the preferred binding orientation from one favoring 3′-p-hydroxylation to one favoring 4″-hydroxylation. Several combined models have also been used to predict regioselectivity. In one, docking was combined with an empirical activation energy model and with reaction enthalpy and ionization potential descriptors to predict substrate reactivity (42); evaluated on 72 well-known CYP3A4 substrates, it achieved a 76% success rate. In another, docking and ligand-based in silico methods were used successfully to predict the sites of metabolism of 70 known CYP2C9 substrates (43). These studies also indicated that regioselectivity is probably controlled by one or several active-site residues. For example, Lafite et al. investigated the hydroxylation of terfenadine derivatives by CYP2J2 (44) and found that the bulky residues Ile127, Ala311, Ile375, Ile376, and Val380 above the heme iron form a hydrophobic core, and that the keto groups of terfenadine derivatives form a hydrogen bond with Arg117. These results suggest that controlling substrate accessibility can direct regioselectivity, although improved calculations are needed to test this hypothesis further. In another study, docking was used to compare the bioactivation of the tobacco procarcinogen 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK) by CYP2A6 and 2A13. Met365 and Ile366 of CYP2A13 were found to provide more space for positioning NNK in the active site for α-methyl and α-methylene hydroxylation, which was not the case in CYP2A6 (84); this explains the difference in regioselectivity between the two enzymes. Regioselectivity changes have also been observed with enzyme mutants. Keizers et al. found that O-demethylenation of 3,4-methylenedioxy-N-alkylamphetamines was dominant in wild-type CYP2D6, whereas N-demethylation and N-hydroxylation occurred instead in its F120A mutant (45).
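The combined models above score each candidate site by both its intrinsic reactivity and its accessibility to the heme iron in the docked pose. The Python sketch below illustrates that idea with a deliberately simple additive score, activation energy plus a penalty proportional to the site's distance from the iron, and then computes a top-1 success rate against observed sites. The functional form, the 0.8 kcal/mol per Å weight, the function names and all site data are hypothetical assumptions; the cited models use their own, different parameterizations.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    activation_energy: float  # kcal/mol, e.g. from an empirical or QM reactivity model
    distance_to_fe: float     # angstroms, measured in the docked pose


def rank_sites(sites, weight=0.8):
    """Rank sites by a hypothetical combined score (lower = more likely metabolized).

    score = activation_energy + weight * distance_to_fe
    """
    return sorted(sites, key=lambda s: s.activation_energy + weight * s.distance_to_fe)


def top1_success_rate(predictions, observed):
    """Fraction of substrates whose top-ranked site is among the observed sites."""
    hits = sum(1 for mol, ranked in predictions.items() if ranked[0].name in observed[mol])
    return hits / len(predictions)


# Hypothetical data for two substrates (not taken from the cited studies).
substrate_a = [Site("C4", 12.0, 7.0), Site("N-CH3", 14.0, 4.0), Site("C7", 16.0, 4.2)]
substrate_b = [Site("O-CH3", 13.5, 4.5), Site("C2", 13.0, 7.0)]
predictions = {"A": rank_sites(substrate_a), "B": rank_sites(substrate_b)}
observed = {"A": {"N-CH3"}, "B": {"C2"}}

print([s.name for s in predictions["A"]])
print(f"top-1 success rate: {top1_success_rate(predictions, observed):.0%}")
```

In substrate A the intrinsically most reactive site (C4) loses to N-CH3 because it lies farther from the iron, which mirrors the point above that accessibility, not reactivity alone, can direct regioselectivity.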


Structure-based drug metabolism prediction is also well suited to elucidating metabolic differences between enantiomers. Indeed, many drugs are developed as single enantiomers to avoid complications arising from stereoselective metabolism by drug-metabolizing enzymes (Figure 7). The metabolic profiles of the phenylahistin enantiomers were compared by calculating the energies of the docked enantiomers with the molecular mechanics Poisson–Boltzmann surface area (MM-PBSA) approach, and (-)-phenylahistin was shown to be metabolized by CYP3A4 1.5–8 times less than (+)-phenylahistin (46). Docking of (+)-phenylahistin in the CYP3A4 active site showed that both the isoprenyl group and the nitrogen atom of the imidazolyl ring can interact with the heme iron for the hydroxylation reaction; for (-)-phenylahistin, however, the predicted binding differed dramatically (the isoprenyl group is far from the heme and the lone pair of the imidazolyl nitrogen does not point toward the heme iron), even though the two enantiomers have very similar physicochemical properties. Enantioselectivity can also arise when an achiral substrate is converted into enantiomeric metabolites. Bikadi and Hazai studied the CYP2C-catalyzed O-demethylation of methoxychlor (47) and showed that CYP2C9 and 2C19 both favored formation of S-mono-OH-methoxychlor, CYP2C3 showed no enantioselectivity, and CYP2C5 favored formation of R-mono-OH-methoxychlor. Molecular dynamics simulations attributed this chiral preference to differences in the composition and flexibility of the active-site residues. In addition, deoxypodophyllotoxin was found to be stereoselectively converted by CYP3A4 to epipodophyllotoxin but not to podophyllotoxin (48). Docking and Monte Carlo-based refinement revealed that deoxypodophyllotoxin binds in an orientation more appropriate for epipodophyllotoxin formation; specifically, the hydroxylation is mediated by hydrogen bonding between deoxypodophyllotoxin and the active-site residues Arg212, Glu374 and Arg105.
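The MM-PBSA comparison of enantiomers rests on an end-state estimate of binding free energy: for each snapshot, ΔG_bind ≈ G(complex) − G(receptor) − G(ligand), where each G is the sum of a molecular mechanics energy, a Poisson–Boltzmann polar solvation term and a surface-area nonpolar term, averaged over snapshots. The Python sketch below performs only that bookkeeping on precomputed energy terms and, as is often done, neglects the solute entropy term. The function names and all numbers are hypothetical and do not reproduce the phenylahistin study.

```python
from statistics import mean, stdev


def free_energy(e_mm, g_pb, g_sa):
    """G = E_MM + G_PB (polar solvation) + G_SA (nonpolar solvation), in kcal/mol."""
    return e_mm + g_pb + g_sa


def mmpbsa_binding(snapshots):
    """Mean and spread of the end-state binding free energy (solute entropy neglected).

    Each snapshot maps 'complex', 'receptor' and 'ligand' to an (E_MM, G_PB, G_SA) tuple.
    """
    values = []
    for snap in snapshots:
        g = {species: free_energy(*terms) for species, terms in snap.items()}
        values.append(g["complex"] - g["receptor"] - g["ligand"])
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


# Hypothetical energy terms (kcal/mol). Both enantiomers share the same free-ligand
# terms because enantiomers are energetically identical in an achiral solvent.
receptor = (-4950.0, -2990.0, 148.0)
ligand = (-20.0, -25.0, 5.0)
plus = [
    {"complex": (-5000.0, -3000.0, 150.0), "receptor": receptor, "ligand": ligand},
    {"complex": (-5004.0, -2998.0, 150.0), "receptor": receptor, "ligand": ligand},
]
minus = [
    {"complex": (-4996.0, -3000.0, 150.0), "receptor": receptor, "ligand": ligand},
    {"complex": (-4998.0, -3000.0, 150.0), "receptor": receptor, "ligand": ligand},
]
for label, snaps in (("(+)", plus), ("(-)", minus)):
    dg, sd = mmpbsa_binding(snaps)
    print(f"{label}: dG_bind = {dg:.1f} +/- {sd:.1f} kcal/mol")
```

Because only the complex term differs between enantiomers, any predicted difference comes entirely from how each enantiomer sits in the active site, which is why such energies are best read together with the pose geometry described above.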

Figure 7.

 Enantioselectivity and stereoselectivity of several known compounds.

Reactive metabolites

The mechanism of reactive metabolite formation by CYPs has been studied extensively by computation, mainly with quantum chemical calculations. By calculating the relative free energy change of activation for lead compounds bearing different substituents, such studies can guide lead optimization. Docking and molecular dynamics are often combined with quantum chemical calculations to increase predictive power, and many successful applications have been documented. For example, the bioactivation of phenacetin to the reactive quinone imine was studied with ab initio energy and spin distribution calculations, comparing hydrogen atom abstraction at the α-methylene carbon with electron abstraction from the nitrogen atom of the acetylamino group (49). In a similar study of the metabolic activation of acetaminophen to its quinone imine, Loew and Goldblum predicted that hydrogen atom abstraction from the phenol group is more probable than from the amide nitrogen (50), which was also confirmed by in vitro studies (51). In another study, density functional theory calculations on the bioactivation of carbamazepine by CYP3A4 showed that formation of the reactive metabolite 10,11-epoxide is more probable than hydroxylation of the aromatic ring (52). With such mechanistic results, new compounds can be designed with reduced oxidation potential and thus with less or no reactive metabolite formation. For example, the bioactivation energy of the antipsychotic drug remoxipride to form a hydroquinone metabolite, and further an electrophilic quinone, was calculated by hybrid density functional theory (53); the authors then redesigned remoxipride by adding amide and methoxy groups. In addition, ab initio calculations were used to compare the oxidation potentials of nefazodone and buspirone for forming reactive quinone imine intermediates (54). They showed that buspirone undergoes two-electron oxidation much less readily than nefazodone, so less reactive metabolite formation is expected. These drug design examples show that by calculating the oxidation potential for reactive metabolite formation, we can modify molecules to improve their metabolic profiles or reduce toxicity.
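Comparisons such as nefazodone versus buspirone ultimately reduce to differences between computed total energies. The Python sketch below converts hypothetical electronic energies (in hartree) of a reduced and an oxidized form into reaction energies in kcal/mol and ranks analogues from easiest to hardest to oxidize. The compound names, energies and function names are invented placeholders; the sketch assumes each oxidized-form energy already includes whatever reference species balance the reaction, and a real study would use free energies with solvation and thermal corrections rather than bare electronic energies.

```python
HARTREE_TO_KCAL = 627.509474  # 1 hartree expressed in kcal/mol


def oxidation_energy_kcal(e_reduced_h, e_oxidized_h):
    """Reaction energy (kcal/mol) for reduced -> oxidized form, from energies in hartree."""
    return (e_oxidized_h - e_reduced_h) * HARTREE_TO_KCAL


def rank_by_oxidation(analogues):
    """Sort analogues from easiest to hardest to oxidize (lowest to highest energy)."""
    scored = [(name, oxidation_energy_kcal(red, ox)) for name, (red, ox) in analogues.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical (reduced, oxidized) energies in hartree; the oxidized value is assumed
# to already include any reference species needed to balance the reaction.
analogues = {
    "parent": (-500.000, -499.960),
    "analogue-F": (-599.200, -599.155),
    "analogue-SO2Me": (-1050.400, -1050.345),
}
for name, de in rank_by_oxidation(analogues):
    print(f"{name:15s} {de:6.1f} kcal/mol")
```

Under these assumptions a higher oxidation energy points to a lower propensity to form the reactive species, which is the same logic used to favor the electron-withdrawing modifications discussed later in this review.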


Mechanisms of inhibition have been studied by docking and molecular dynamics approaches. Quinidine is a potent competitive inhibitor of CYP2D6 but not a substrate. McLaughlin et al. investigated the binding modes of quinidine together with site-directed mutagenesis experiments (55) and found that in the CYP2D6 F120A and E216F mutants quinidine did become a substrate, indicating the importance of these two residues in controlling the switch between substrate and inhibitor behavior. Both substrates and inhibitors bind to enzymes in energetically favored modes, but inhibitors are either not oriented appropriately for catalysis or bind to non-catalytic sites such as the ligand entrance region of P450s. Several research groups evaluated structurally diverse compounds for CYP3A4 inhibition using docking, taking ligand conformation, orientation, stereoisomerism and protonation state into account; the results demonstrated that docking is a good option for screening a library for potential inhibition and drug–drug interactions (56,57). Cooperativity can also influence inhibition: for example, homotropic and/or heterotropic cooperativity was shown for 1-alkoxy-4-nitrobenzenes and 1,4-phenylene diisocyanide with CYP1A2 (58). Such calculations have been used to design new compounds with modified inhibition. For example, replacing the benzylic alcohol group of terfenadine with a ketone increased CYP2J2 inhibition 10-fold (59).
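Docking can flag likely inhibitors, but the consequence of an inhibition constant depends on inhibitor concentration. For a competitive inhibitor, the Michaelis–Menten rate becomes v = Vmax[S]/(Km(1 + [I]/Ki) + [S]); the Cheng–Prusoff relation Ki = IC50/(1 + [S]/Km) converts a measured IC50 into Ki; and a basic static estimate of the fold increase in victim-drug exposure is R = 1 + [I]/Ki. The Python sketch below evaluates all three. The function names and all concentrations and constants are hypothetical, and the static model deliberately ignores factors such as the fraction of the victim drug metabolized by the inhibited enzyme, protein binding and intestinal metabolism.

```python
def competitive_rate(s, vmax, km, i=0.0, ki=float("inf")):
    """Michaelis-Menten rate in the presence of a competitive inhibitor."""
    km_app = km * (1.0 + i / ki)  # competitive inhibition raises the apparent Km
    return vmax * s / (km_app + s)


def ki_from_ic50(ic50, s, km):
    """Cheng-Prusoff relation for a competitive inhibitor: Ki = IC50 / (1 + [S]/Km)."""
    return ic50 / (1.0 + s / km)


def static_ddi_ratio(i_unbound, ki):
    """Basic static estimate of the fold increase in victim-drug AUC: R = 1 + [I]/Ki."""
    return 1.0 + i_unbound / ki


# Hypothetical values, all concentrations in micromolar.
km, vmax, s = 5.0, 100.0, 5.0
ki = ki_from_ic50(ic50=0.2, s=s, km=km)
remaining = competitive_rate(s, vmax, km, i=0.1, ki=ki) / competitive_rate(s, vmax, km)
print(f"Ki = {ki:.2f} uM")
print(f"activity remaining at [I] = 0.1 uM: {remaining:.2f}")
print(f"static DDI ratio R = {static_ddi_ratio(0.1, ki):.1f}")
```

At a fixed inhibitor concentration, a 10-fold drop in Ki, as in the terfenadine ketone example, multiplies the [I]/Ki term tenfold, so a modest structural change can move a compound from negligible to meaningful interaction risk.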


CYP3A4 is induced through activation of the pregnane X receptor (PXR), whose structure has been determined recently (60). Docking showed that adding polar groups to one end of an activator attenuated activation, because the new molecules destabilize the hydrophobic interactions in the binding pocket (Figure 8) (61). As more structures of key induction regulators are determined, we expect that induction can be predicted using similar in silico tools.

Figure 8.

Structure of the pregnane X receptor (PXR). The monomers are shown in yellow and green, ligands in hot pink, and steroid receptor coactivator 1 (SRC-1) in red at the back.

Mechanism-based inactivation

The heme porphyrin ring and some active-site residues are primary targets of reactive metabolites, which can bind to them and consequently cause mechanism-based inactivation before being released from the P450 active site (19). Recent structural modeling and proteomic studies have shed light on possible inactivation mechanisms involving alkylation of either the heme porphyrin or apoprotein residues by drug molecules. Raloxifene is a mechanism-based inhibitor of CYP3A4 that is bioactivated to diquinone methide or epoxide reactive metabolites. Several research teams have identified adduction sites such as Cys239 and Tyr75 that quench reactive intermediates (62–64); in particular, it was postulated that the nucleophilic OH group of Tyr75 or the thiol group of Cys239 can form an adduct with the reactive metabolite directly. CYP3A4 and CYP3A5 were also compared to elucidate the mechanism of raloxifene bioactivation, and mechanism-based inactivation was found only in CYP3A4 and not in CYP3A5, because Cys239 is replaced by a serine residue in CYP3A5 (64). Kang et al. showed that the reactive 10,11-epoxide, 2,3-arene oxide and iminoquinone metabolites of carbamazepine form adducts with Cys239, causing mechanism-based inactivation (65). The cysteine-specific modifying reagent iodoacetamide was also used to test the importance of cysteine residues; iodoacetamide prevented mechanism-based inactivation (63), indicating that Cys239 may be one of the cysteines that quench the reactive metabolite. Moreover, apoprotein alkylation by reactive metabolites also depends on substrate structure. The furan ring of bergamottin is bioactivated to an epoxide that inactivates CYP3A5 and CYP2B6, and docking showed the furan ring to be the site primarily exposed to the oxyferryl center of the heme (66). Ser360 was also found to be an adduction site for the reactive metabolites of 17α-ethynylestradiol (67).
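Proteomic identification of an adducted residue such as Cys239 typically involves searching mass spectra for peptides whose mass is shifted by the mass of the bound metabolite. The Python sketch below computes expected m/z values for an unmodified and a modified peptide at several charge states using m/z = (M + z·m_proton)/z, and includes the +57.02146 Da carbamidomethyl shift that iodoacetamide adds to cysteine. The peptide mass, the metabolite adduct shift and the function names are hypothetical placeholders, not values for raloxifene or carbamazepine adducts.

```python
PROTON_MASS = 1.007276       # Da
CARBAMIDOMETHYL = 57.021464  # Da, mass added to cysteine by iodoacetamide


def mz(neutral_mass, charge):
    """m/z of a peptide carrying `charge` extra protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def expected_ions(peptide_mass, shifts, charges=(1, 2, 3)):
    """Return {label: {charge: m/z}} for the unmodified and each modified form."""
    forms = {"unmodified": peptide_mass}
    forms.update({label: peptide_mass + delta for label, delta in shifts.items()})
    return {label: {z: mz(m, z) for z in charges} for label, m in forms.items()}


# Hypothetical monoisotopic peptide mass and metabolite adduct shift (Da).
peptide_mass = 1500.700
shifts = {
    "metabolite adduct (hypothetical +300.100)": 300.100,
    "iodoacetamide-blocked Cys": CARBAMIDOMETHYL,
}
for label, ions in expected_ions(peptide_mass, shifts).items():
    print(label, {z: round(v, 4) for z, v in ions.items()})
```

Under these assumptions, a shifted peak that disappears when cysteines are first blocked with iodoacetamide would be consistent with cysteine adduction, mirroring the logic of the iodoacetamide experiment described above.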

Application in hit-to-lead drug discovery

It is important to maintain both the quality and the quantity of early lead series to discover a new therapeutic agent in a reasonable time frame (68). In hit-to-lead drug discovery, high-throughput screening requires a suitable assay to be developed; as attractive alternatives, fragment-based and docking methods are especially useful for focused screening. Better hit-to-lead approaches can help us achieve the goal of fast and efficient drug discovery. Structure-based drug metabolism prediction has the potential to offer new, metabolism-aware drug design approaches for hit-to-lead discovery, which should help expand chemical space and identify high-quality drug candidates. The drug metabolism predictions using docking, molecular dynamics and quantum chemical methods reviewed above illustrate how these in silico tools can be applied in drug discovery. We have applied similar methods successfully to predict drug metabolism in our own drug discovery projects. Indeed, we need a cost-effective and highly efficient hit-to-lead process to expand chemical space and find the best lead compounds with minimal resources, so the in silico drug metabolism approaches described in this review can be of great benefit in optimizing chemical space to find best-in-class medicines (Figure 9).

Figure 9.

 Schematic representation of hit-to-lead and lead optimization approaches to defining chemical space based on both drug receptor space and ADME space.

Substantial attention has been given to using docking-related molecular mechanics approaches to make molecules more potent. Modifications such as adding or changing functional groups primarily aim to strengthen ligand–protein interactions, which generally lowers the binding free energy and increases potency. For example, a docking and scoring method was applied to explore potent β-secretase inhibitors (69). More importantly, a safer compound has a better chance of reaching the market. For example, the cyclooxygenase-2 inhibitor celecoxib is less potent than rofecoxib but appeared safer in postmarketing meta-analyses, which was a main reason it remains on the market, whereas other drugs of the same class such as rofecoxib were withdrawn because of increased cardiovascular toxicity. As discussed above, the current trend in hit-to-lead and lead optimization is early evaluation of ADMET properties, such as optimizing clearance profiles and eliminating P450 inhibition (70). Improving ADMET properties sometimes comes at the cost of potency, and vice versa. Therefore, identifying early the region where acceptable ADMET and potency properties overlap is essential for reducing chemical liabilities and accelerating the hit-to-lead process. Lead compounds can then be optimized based on the collected pharmacological and ADMET properties. Binding sites are difficult to define for many drug targets (71), whereas in P450 enzymes the active site lies in a highly conserved, well-defined region. For this reason, drug metabolism modeling may achieve higher success rates than target protein simulations. Overall, these in silico approaches are helpful for identifying tighter or more specific binders with better ADMET properties (72).
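Finding the region where potency and ADMET properties overlap can be framed as a multi-parameter filter followed by a Pareto analysis of the compounds that pass. The Python sketch below keeps compounds meeting hypothetical thresholds (pIC50 ≥ 7, intrinsic clearance ≤ 20 µL/min/mg, P450 inhibition IC50 ≥ 10 µM) and then reports the Pareto-optimal subset, meaning compounds that no other compound beats on every objective. The thresholds, compound data and function names are illustrative assumptions, not recommended cut-offs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    pic50: float     # target potency, higher is better
    clint: float     # intrinsic clearance (uL/min/mg), lower is better
    cyp_ic50: float  # P450 inhibition IC50 (uM), higher is better


def passes(c, min_pic50=7.0, max_clint=20.0, min_cyp_ic50=10.0):
    """Hypothetical multi-parameter filter for the potency/ADMET overlap region."""
    return c.pic50 >= min_pic50 and c.clint <= max_clint and c.cyp_ic50 >= min_cyp_ic50


def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    at_least = a.pic50 >= b.pic50 and a.clint <= b.clint and a.cyp_ic50 >= b.cyp_ic50
    better = a.pic50 > b.pic50 or a.clint < b.clint or a.cyp_ic50 > b.cyp_ic50
    return at_least and better


def pareto_front(compounds):
    """Compounds not dominated by any other compound in the list."""
    return [c for c in compounds if not any(dominates(o, c) for o in compounds if o is not c)]


# Hypothetical compound data.
series = [
    Compound("cpd-1", 8.2, 45.0, 30.0),  # potent but high clearance
    Compound("cpd-2", 7.4, 12.0, 25.0),
    Compound("cpd-3", 7.1, 8.0, 15.0),
    Compound("cpd-4", 7.0, 15.0, 12.0),  # passes, but dominated by cpd-2
    Compound("cpd-5", 6.5, 5.0, 50.0),   # clean ADMET but not potent enough
]
overlap = [c for c in series if passes(c)]
print("overlap:", [c.name for c in overlap])
print("pareto:", [c.name for c in pareto_front(overlap)])
```

The Pareto step keeps the trade-off explicit: cpd-2 and cpd-3 survive because each is better on a different objective, which is exactly the potency-versus-ADMET tension described above.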

A common strategy for blocking metabolically labile sites is to attach a metabolically stable group such as a fluorine atom or a trifluoromethyl group. For example, in a study of the metabolic activation of a series of celecoxib structural analogues, replacing the methyl group of the 4-methylphenyl moiety with a trifluoromethyl group completely blocked methyl hydroxylation by CYP2C9 while COX-2 inhibition remained unchanged (73), indicating that the new compounds had improved metabolic properties without losing potency and selectivity. However, the benefit of adding a trifluoromethyl or fluorine group is unpredictable, because switching of metabolic pathways can introduce new metabolic liabilities. Therefore, routinely adding such groups in every drug design without considering substrate–protein interactions can narrow our view when exploring a wider chemical space. Once the structures of receptors or P450 enzymes are available, we can use structure-based computational tools to expand chemical space and also achieve better pharmacokinetic profiles for safer drugs.
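The caveat about metabolic switching can be illustrated with a toy calculation: if each candidate site carries a predicted lability score, blocking the top site (for example by fluorination) simply promotes the next-best site rather than removing metabolism. The Python sketch below, with hypothetical site names, scores and function names, blocks sites one at a time and records which site becomes the predicted primary site. It assumes that blocking removes a site entirely and leaves the other scores unchanged, an idealization that real substituent effects need not respect.

```python
def primary_site(scores, blocked=frozenset()):
    """Return the most labile unblocked site (higher score = more labile), or None."""
    candidates = {site: s for site, s in scores.items() if site not in blocked}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def block_sequentially(scores, n_rounds):
    """Block the current primary site each round and record which site takes over."""
    blocked = set()
    history = []
    for _ in range(n_rounds):
        site = primary_site(scores, blocked)
        if site is None:
            break
        history.append(site)
        blocked.add(site)
    return history


# Hypothetical lability scores for a lead compound.
scores = {"4-methyl (benzylic)": 0.90, "N-dealkylation": 0.65, "aromatic C-H": 0.40}
print(block_sequentially(scores, n_rounds=3))
```

A structure-based model would instead re-dock and re-score the modified compound, since the substituent can also change binding orientation and thus which remaining site reaches the heme iron.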

A promising strategy to reduce attrition and shorten the discovery cycle is thus to use these in silico tools to refine ADMET properties at an early stage. Indeed, many recent drug discovery efforts have incorporated drug metabolism prediction early. In the discovery of chemokine receptor CCR5 antagonists, the original hits contained imidazopyridine moieties, a common feature of CYP2D6 inhibitors (74). The authors redesigned the candidates after modeling the binding of selected compounds in the active site of CYP2D6, which showed that the pyridine nitrogen was the atom closest to the heme iron. Based on this interaction, the imidazopyridine moieties were replaced by their carbon analogues, benzimidazoles. Consequently, the drug–drug interaction liability was avoided early, and a high-quality lead was discovered. The development of metabotropic glutamate receptor antagonists tells a similar story. The pyrazolo[3,4-d]pyrimidin-4-one series was among the original hits as potent and selective antagonists but showed high clearance; the authors then improved the ADMET properties by adding electron-withdrawing groups to the molecule (75). In addition, rational approaches to eliminating reactive metabolite formation are worth considering. The trifluoromethylpyrimidine series was found to inhibit proline-rich tyrosine kinase 2 but was also positive in reactive metabolite screening. Clearly, the potential toxicity had to be removed before further development. Replacing the carbonyl group of the 5-aminooxindole moiety with a more strongly electron-withdrawing sulfone group eliminated reactive metabolite formation, as correctly predicted by density functional theory calculations (92). These successful hit-to-lead efforts to improve metabolic profiles suggest that combining in silico tools in both receptor space and drug metabolism space is likely to turn the serendipity of drug discovery into powerful drug design strategies.
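The CCR5 redesign hinged on a simple geometric observation from the docked pose: which ligand atom sits closest to the heme iron. The Python sketch below finds that atom and flags nitrogen atoms within a chosen distance of the iron, since an sp2 nitrogen close to the iron may coordinate it and inhibit the enzyme. The atom labels, coordinates, the 3.0 Å cutoff and the function names are hypothetical; a real workflow would read the pose from a structure file and would also check the direction of the nitrogen lone pair.

```python
import math


def nearest_atom(atoms, fe):
    """Return (label, element, distance) of the ligand atom closest to the iron."""
    return min(
        ((label, element, math.dist(xyz, fe)) for label, element, xyz in atoms),
        key=lambda item: item[2],
    )


def iron_coordination_alerts(atoms, fe, cutoff=3.0):
    """Labels of nitrogen atoms within `cutoff` angstroms of the heme iron."""
    return [
        label
        for label, element, xyz in atoms
        if element == "N" and math.dist(xyz, fe) <= cutoff
    ]


# Hypothetical docked pose: (label, element, xyz in angstroms), with Fe at the origin.
fe = (0.0, 0.0, 0.0)
pose = [
    ("N1 (pyridine)", "N", (0.0, 0.0, 2.2)),
    ("C5", "C", (1.3, 0.0, 2.9)),
    ("N3 (imidazole)", "N", (2.5, 1.0, 4.0)),
    ("C8", "C", (-1.5, 2.0, 5.5)),
]
print(nearest_atom(pose, fe))
print(iron_coordination_alerts(pose, fe))
```

Replacing a flagged nitrogen with carbon, as in the benzimidazole redesign, removes the alert in this kind of check while leaving the rest of the scaffold in place.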

Several limitations of these tools should be noted. Crystal structure resolution and the accuracy of force fields, docking algorithms and quantum chemical calculations all determine predictive power. Crystal structure quality varies because the flexible regions of some proteins are difficult to resolve, and a higher-quality structure will generally give higher predictive power. For trend analysis, high-throughput docking methods have been developed to make the discovery process fast and economical (71). Various strategies have been applied to overcome these limitations. Scientists at Merck found that the virtual screening success rate was improved by combining two docking methods (93). In addition, free-energy perturbation calculations within Monte Carlo statistical mechanics simulations have been used concurrently to optimize docked compounds. For example, modification of the core heterocyclic moieties of 2-anilinyl-5-benzyloxadiazole guided by free-energy calculations led to potent non-nucleoside inhibitors of human immunodeficiency virus reverse transcriptase (76). Moreover, de novo ligand design approaches such as fragment-based design and 'scaffold hopping' are attractive because they allow more hit series to be explored, and thus improve the chance of successful hit-to-lead drug discovery.
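Free-energy perturbation estimates the free-energy change between two states A and B from configurations sampled in state A using the Zwanzig relation ΔG = −kT ln⟨exp(−ΔU/kT)⟩_A, where ΔU = U_B − U_A for each sampled configuration. The Python sketch below evaluates that exponential average in a numerically stable way and sums it over intermediate windows, as is usual when a large change is split into smaller steps. The energy samples and function names are hypothetical; the sketch does not perform the Monte Carlo sampling itself, and the estimate is only reliable when the sampled distributions of neighboring states overlap.

```python
import math

KB_KCAL = 0.0019872043  # Boltzmann constant, kcal/(mol*K)


def zwanzig_delta_g(delta_u, temperature_k=298.15):
    """FEP estimate dG = -kT ln < exp(-dU/kT) >_A from sampled energy differences.

    delta_u: U_B - U_A (kcal/mol) evaluated on configurations sampled from state A.
    A log-sum-exp shift avoids overflow in the exponential.
    """
    if not delta_u:
        raise ValueError("need at least one sample")
    kt = KB_KCAL * temperature_k
    exponents = [-du / kt for du in delta_u]
    shift = max(exponents)
    log_mean = shift + math.log(sum(math.exp(x - shift) for x in exponents) / len(exponents))
    return -kt * log_mean


def fep_windows(window_samples, temperature_k=298.15):
    """Sum the Zwanzig estimates over consecutive intermediate (lambda) windows."""
    return sum(zwanzig_delta_g(s, temperature_k) for s in window_samples)


# Hypothetical energy differences (kcal/mol) for three lambda windows.
windows = [
    [0.42, 0.55, 0.38, 0.61],
    [-0.10, 0.05, -0.20, 0.00],
    [-0.90, -0.75, -1.05, -0.80],
]
print(f"dG(A->B) = {fep_windows(windows):.2f} kcal/mol")
```

Because the exponential average is dominated by the lowest ΔU values, each window's estimate never exceeds the plain mean of its samples, which is one reason poorly overlapping windows tend to give biased results.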

In summary, the recent drug metabolism prediction efforts reviewed here highlight the importance of computational methods for drug design and hit-to-lead drug discovery. Given the rapid accumulation of new structures of drug-metabolizing enzymes, it is increasingly clear that computational tools for predicting drug metabolism will be crucial for faster and more efficient drug discovery. Considerable challenges remain for structure-based in silico drug metabolism prediction, but the opportunities for it to become a mainstream component of drug design are now apparent, and accurate prediction appears to be an achievable goal in the near future.