Novel Cruzain Inhibitors for the Treatment of Chagas’ Disease

The protozoan parasite Trypanosoma cruzi, the etiological agent of Chagas’ disease, affects millions of individuals and continues to be an important global health concern. The poor efficacy and unfavorable side effects of current treatments necessitate novel therapeutics. Cruzain, the major cysteine protease of T. cruzi, is one potential novel target. Recent advances in a class of vinyl sulfone inhibitors are encouraging; however, as most potential therapeutics fail in clinical trials and both disease progression and resistance call for combination therapy with several drugs, the identification of additional classes of inhibitory molecules is essential. Using an exhaustive virtual-screening and experimental validation approach, we identify several additional small-molecule cruzain inhibitors. Further optimization of these chemical scaffolds could lead to the development of novel drugs useful in the treatment of Chagas’ disease.

The World Health Organization estimates that over 10 million people are infected by the protozoan parasite Trypanosoma cruzi, with another 25 million at risk. The associated illness, called Chagas' disease, is spread through a triatomine vector or through blood transfusion (1,2). The acute phase lasts at most a few months and is characterized by mild symptoms such as fever, malaise, facial edema, generalized lymphadenopathy, and hepatosplenomegaly. In approximately 30% of infected patients, parasite multiplication via asynchronous cycles contributes to the chronic stage of the disease, with the associated cell destruction, reinfection within the reticuloendothelial system, and organ infection (3). Infection of the heart, digestive tract, and central nervous system can lead to fatal heart-rhythm abnormalities, megacolon, and dementia, respectively (4,5).
Trypanosoma cruzi is not susceptible to many of the drugs used to treat closely related parasites like Trypanosoma brucei. Benznidazole and nifurtimox are the only available therapies for acute-phase Chagas' disease. These nitroheterocyclics are highly toxic and have poor efficacy in long-lasting chronic infections (6-8). No extensive studies of the long-term sequelae of these therapeutics have been conducted in humans, but several reports of neuropathy and tumorigenic or carcinogenic effects have been described (6,7). Efforts to develop a vaccine against T. cruzi have also failed thus far, likely because the disease pathology has an autoimmune component (9).
The major T. cruzi cysteine proteinase cruzain (also referred to as cruzipain, the full-length native enzyme) has been shown to be crucial for all stages of the parasite life cycle. This papain-like cysteine protease is thought to play an important role in differentiation, cell invasion, intracellular multiplication, and immune evasion (10,11). Furthermore, studies have demonstrated that cysteine proteinase inhibitors have trypanocidal activity with negligible mammalian toxicity (12).
Previous efforts have identified vinyl sulfones, sulfonates, and sulfonamides as high-affinity cruzain inhibitors (13,14); one of these vinyl sulfones, K11777, is currently undergoing Investigational New Drug enabling studies (15,16). α-Ketoamide-, α-ketoacid-, α-ketoester-, aldehyde-, and ketone-based inhibitors have also been described (17-19). While these successes are encouraging, many potential drugs, including those that enter clinical trials, ultimately fail to gain approval (20), and those that are approved are subject to growing parasitic resistance. Consequently, a diverse set of inhibitory scaffolds that can be optimized into distinct therapeutic candidates is urgently needed.
Hoping to contribute to this ever-growing diverse set of compounds, we here use an advanced virtual-screening methodology that accounts for receptor flexibility to identify three promising non-covalent inhibitors of T. cruzi cruzain.

Experimental Methods
Ligand preparation
A small-molecule library was prepared from the ligands of the NCI Diversity Set II using the Schrödinger LIGPREP program. Protonation states were assigned at pH 5.5 to mimic the natural acidic environment of the T. cruzi digestive vacuole. Multiple tautomers and stereoisomers were generated. One ligand could not be processed by LIGPREP; instead, Discovery Studio was used to add hydrogen atoms to this ligand and to optimize its geometry.
Initial screen against the crystal structure
The prepared ligand models of this small-molecule library were docked into a 1.20 Å crystal structure of cruzain (PDB ID: 1ME4) (18), with hydrogen atoms included using PDB2PQR (21,22) at pH 5.5. Residues CYS25 and HIS159 (called HIS162 by some) formed the thiolate-imidazolium pair required for the catalytic mechanism (23) of the proteinase at this pH. This initial virtual screen was performed using the CDOCKER docking software with a docking sphere 15 Å in diameter centered on the coordinates of the crystallographic ligand.

Molecular dynamics simulations
The molecular dynamics simulations used in the current study have been described previously (28). In brief, the simulations were based on a 1.20 Å cruzain crystal structure (PDB ID: 1ME4) (18) protonated at pH 5.5 to mimic the natural acidic environment of the T. cruzi digestive vacuole. Following appropriate minimization and equilibration, five distinct 20-ns simulations of the cruzain protein bound to a hydroxymethyl ketone inhibitor, [1-(1-BENZYL-3-HYDROXY-2-OXO-PROPYLCARBAMOYL)-2-PHENYL-ETHYL]-CARBAMIC ACID BENZYL ESTER, were performed. The gromos clustering algorithm (29) was used to cluster 4002 conformations extracted from the simulations every 50 fs. We found that decreasing the cutoff below 0.95 Å resulted in a precipitous rise in the number of clusters; consequently, we chose an RMSD cutoff of 0.95 Å, which yielded 24 clusters. The central member of each cluster, considered most representative, was selected for subsequent analysis; this set of central members is said to constitute an ensemble.
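The neighbor-counting idea behind gromos-style clustering can be illustrated with a minimal sketch. This is not the code used in the study: the function name `gromos_cluster`, the list-of-lists RMSD matrix, and the tie-breaking rule (lowest frame index) are assumptions made here for illustration only.

```python
def gromos_cluster(rmsd, cutoff):
    """Cluster frames by repeatedly extracting the frame with the most neighbors.

    rmsd   -- symmetric matrix (list of lists) of pairwise RMSD values
    cutoff -- two frames count as neighbors if their RMSD is <= cutoff
    Returns a list of (center_index, member_indices) tuples in extraction order.
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        # Neighbors of each remaining frame, counted among remaining frames only.
        # Every frame is its own neighbor because rmsd[i][i] == 0.
        neighbors = {
            i: [j for j in remaining if rmsd[i][j] <= cutoff]
            for i in remaining
        }
        # The frame with the most neighbors becomes the cluster center.
        # Ties are broken by lowest index (an arbitrary choice in this sketch).
        center = min(remaining, key=lambda i: (-len(neighbors[i]), i))
        members = sorted(neighbors[center])
        clusters.append((center, members))
        # Remove the whole cluster from the pool and repeat.
        remaining -= set(members)
    return clusters


# Toy example: five "frames" placed on a line, with |difference| standing in for RMSD.
positions = [0.0, 0.5, 1.0, 3.0, 3.4]
toy_rmsd = [[abs(a - b) for b in positions] for a in positions]
loose = gromos_cluster(toy_rmsd, cutoff=0.95)
tight = gromos_cluster(toy_rmsd, cutoff=0.4)
```

In this toy data the looser cutoff groups the frames into two clusters, while the tighter cutoff splits them into four, mirroring in miniature how the cluster count rises as the cutoff is lowered.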

Relaxed-complex screen
The 302 compound models of the enriched small-molecule library were docked into the 24 clusters of the ensemble using CDOCKER (Accelrys). Each of these docked small-molecule models was rescored with the following scoring functions: LigScore2 (24), PLP1, PLP2 (25), PMF (26), and PMF04 (27). For each ligand/scoring function pair, an ensemble-average score was calculated according to the following equation:

$$E = \frac{\sum_{i=1}^{24} w_i E_i}{\sum_{i=1}^{24} w_i}$$

where $E$ is the weighted ensemble-average score, $w_i$ is the size of cluster $i$, and $E_i$ is the best score of each unique ligand, independent of tautomeric or stereoisomeric form, docked into the centroid of cluster $i$.
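The cluster-size-weighted average described above can be sketched in a few lines. The function name `ensemble_average` is hypothetical, and the sketch assumes that the weighted sum is normalized by the total cluster size; the normalization does not affect how compounds rank within one scoring function.

```python
def ensemble_average(cluster_sizes, best_scores):
    """Return the cluster-size-weighted average of per-cluster docking scores.

    cluster_sizes -- w_i, the number of conformations in cluster i
    best_scores   -- E_i, the best score of the ligand docked into centroid i
    """
    if len(cluster_sizes) != len(best_scores):
        raise ValueError("need exactly one score per cluster")
    total = sum(cluster_sizes)
    if total <= 0:
        raise ValueError("cluster sizes must sum to a positive number")
    weighted_sum = sum(w * e for w, e in zip(cluster_sizes, best_scores))
    # Assumption of this sketch: normalize by the total number of conformations.
    return weighted_sum / total
```

A heavily populated cluster therefore dominates the result: `ensemble_average([3, 1], [2.0, 4.0])` gives 2.5 rather than the unweighted mean of 3.0.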
Two methods were used to select compounds for subsequent experimental validation. First, for each of the five scoring functions, the compounds were ranked from best to worst by the ensemble-average score. The top seven compounds were selected from each of these five ordered lists and merged into a single list of potential binders. Second, the average rank of each compound across all five scoring functions was calculated. The compounds were then reordered by this average rank, and the top thirty were likewise identified as potential binders. Any compound indicated by either of these two protocols was subsequently recommended for preliminary experimental validation.
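The two selection protocols can be expressed as a short sketch. All names below are hypothetical, and two assumptions are made for illustration: a higher ensemble-average score means a better predicted binder, and tied scores receive distinct, arbitrarily ordered ranks.

```python
def rank_compounds(scores, higher_is_better=True):
    """Map each compound to its rank (1 = best) under one scoring function."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {compound: rank for rank, compound in enumerate(ordered, start=1)}


def select_candidates(scores_by_function, top_per_function=7, top_by_average=30):
    """Union of the per-function top lists and the top list by average rank.

    scores_by_function -- {function_name: {compound: ensemble_average_score}};
                          every function must score the same set of compounds.
    """
    ranks = {name: rank_compounds(s) for name, s in scores_by_function.items()}
    selected = set()

    # Protocol 1: the best compounds according to each individual function.
    for per_function in ranks.values():
        selected.update(
            c for c, rank in per_function.items() if rank <= top_per_function
        )

    # Protocol 2: the best compounds according to the rank averaged over functions.
    compounds = list(next(iter(ranks.values())))
    average_rank = {
        c: sum(per_function[c] for per_function in ranks.values()) / len(ranks)
        for c in compounds
    }
    by_average = sorted(compounds, key=lambda c: (average_rank[c], c))
    selected.update(by_average[:top_by_average])
    return selected


# Toy example with two scoring functions and four compounds.
toy = {
    "f1": {"A": 9, "B": 8, "C": 1, "D": 2},
    "f2": {"A": 1, "B": 8, "C": 9, "D": 2},
}
picked = select_candidates(toy, top_per_function=1, top_by_average=1)
```

In the toy data, compounds A and C each win under one function, while B is never first but has the best average rank, so the union contains all three. This illustrates why the two protocols can nominate different compounds.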

Enzymatic assays
Each compound was obtained from the National Cancer Institute's Developmental Therapeutics Program, which guaranteed 90% purity. Compounds were tested for cruzain enzymatic inhibition using a protocol that has been described previously (30). The eight compounds with the lowest IC50 values were assessed for aggregation by observing enzymatic activity under varying experimental conditions. As detergent is known to disrupt colloidal aggregation (31), inhibition in the absence of detergent was compared with inhibition in the presence of Triton X-100 (0.02%) and Tween (0.002%). The reducing agent was also varied (10 mM DTT or 10 mM β-mercaptoethanol). Each experimental condition was tested in at least two separate experiments. Finally, a dynamic light scattering technique, described in detail elsewhere (32), was applied to the top four compounds to further confirm the presence or absence of aggregation. Additional experimental details can be found in the Supporting Information.
Final pose predictions
All compounds submitted for experimental validation were subsequently docked a final time into the binding pocket of the crystal structure (PDB ID: 1ME4) using the Induced Fit Docking (IFD) module of the Maestro 9.2 (Schrödinger, LLC) computer package with Glide XP precision (33,34). For each of the top three nonaggregating ligands, the best-scoring pose that positioned the inhibitor near the crucial catalytic triad was selected and visualized using PYMOL. Although the top poses using this IFD protocol were generally similar to those from the relaxed complex scheme (RCS) CDOCKER work, we chose to show them here in the commonly represented crystal structure conformation so that readers can readily recognize the subsites within the well-characterized active site of cruzain.

Results and Discussion
Discovered by Carlos Chagas in 1909 (35,36), T. cruzi is one of only two known pathogenic Trypanosoma species. Current trypanocidal therapeutics like nifurtimox and benznidazole are inadequate because they are toxic (6-8), subject to growing resistance (37), and ineffective at eradicating the parasite and preventing cardiomyopathy over the long term (38). Given the dire need for novel therapies, we here use virtual-screening methods to identify three promising inhibitors of cruzain, a critical cysteine protease required for T. cruzi survival.

Weaknesses of virtual screening
Virtual-screening techniques have been used to identify a number of inhibitors in recent years [see, for example, references (39-45)]. Though widely used, these screens are often characterized by many false positives and negatives. Two principal weaknesses explain these inaccuracies. First, there are errors intrinsic to the scoring functions themselves. Because virtual-screening efforts often attempt to identify true binders from among the many thousands of molecules in a compound library, the scoring functions are generally optimized for speed at the expense of accuracy. Second, current docking programs account for, at best, only limited receptor flexibility. When a small-molecule binder encounters its receptor in vivo, that receptor often undergoes conformational rearrangements or an 'induced fit' to better accommodate the ligand. These holo conformations can differ significantly from those of X-ray crystallographic structures. Even if a perfect docking scoring function did exist, it could not accurately predict binding affinity if receptor flexibility were ignored. In the current work, we use a scheme designed to minimize both these sources of error.
Overcoming inaccuracies inherent to the scoring functions themselves
The ligands of the NCI Diversity Set II were initially docked into a cruzain crystal structure using CDOCKER because this program was able to capture the crystallographic poses of two positive controls. Docking with AutoDock (46) was initially performed, but docked and crystallographic poses differed significantly. The CDOCKER-docked poses were subsequently rescored using several different scoring functions, and potential binders were selected by consensus rather than by the score of any single function. Consensus scoring has two advantages. First, when multiple scoring functions are combined, the errors of each may in part cancel out (47). Second, different scoring functions likely have different intrinsic weaknesses and strengths. Some, for example, may better account for hydrophobic contacts, while others better capture electrostatic interactions. When combined, accuracy may be improved if the weaknesses and strengths of the constituent functions are complementary (48).
The scoring functions used in the current work come from different classes and thus provide independent assessments of ligand binding that may be complementary. Scoring functions fall into one of three categories: force field, empirical, and knowledge-based. Force field scoring functions like that used by CDOCKER, based on the CHARMm force field, evaluate ligand binding by accounting for bonded and nonbonded atomic interactions explicitly. Empirical scoring functions like LigScore1, LigScore2, PLP1, and PLP2 are based on counting the number of different types of receptor-ligand interactions (e.g., hydrogen-bond and hydrophobic interactions) as well as countable changes in molecular properties (e.g., the number of rotatable bonds immobilized upon ligand binding). Finally, knowledge-based scoring functions like PMF and PMF04 are based on statistical analyses of large structural databases like subsets of the Protein Data Bank (49). Intermolecular interactions that occur more often than expected by pure chance are assumed to be energetically favorable.
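The statistical reasoning behind knowledge-based functions can be illustrated with the generic inverse-Boltzmann relation, in which an interaction observed more often than expected receives a favorable (negative) pseudo-energy. The sketch below is a simplified illustration only: the function name and the count-based inputs are hypothetical, and the actual PMF and PMF04 functions include distance dependence and correction terms that are not shown.

```python
import math


def statistical_potential(observed, expected, kT=1.0):
    """Generic inverse-Boltzmann pseudo-energy: -kT * ln(observed / expected).

    observed -- how often an atom-pair contact appears in the structural database
    expected -- how often it would appear by chance (reference state)
    kT       -- energy scale; 1.0 gives the result in units of kT
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("counts must be positive")
    return -kT * math.log(observed / expected)


# A contact seen twice as often as chance predicts is scored as favorable,
# and one seen half as often as chance predicts is scored as unfavorable.
favorable = statistical_potential(observed=200, expected=100)
unfavorable = statistical_potential(observed=50, expected=100)
```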
The top candidate binders from this initial crystal structure screen, as judged by consensus scoring, were compiled into a single list of 302 small-molecule models that were subsequently subjected to further study.
Overcoming inaccuracies caused by ignoring full receptor flexibility
While consensus scoring may have helped overcome in part the errors intrinsic to each individual scoring function, those caused by ignoring receptor flexibility remained. To overcome this second challenge, we first studied receptor dynamics by performing five distinct 20-ns molecular dynamics simulations of cruzain, described elsewhere (28). The protein conformations sampled during these simulations were clustered into 24 groups using an RMSD-based clustering algorithm (29). The centroid member of each cluster, considered to be the most representative, was identified, and the group of all centroids is said to constitute an ensemble.
Having identified multiple cruzain conformations, we redocked the 302 small-molecule models identified in the initial crystal structure screen into each of the 24 members of the ensemble, again using CDOCKER. The ligands were then reranked by an ensemble-average score that was not dependent on a single crystal structure but rather accounted for receptor flexibility. This multireceptor docking protocol, called the RCS, has been used successfully in the past to identify inhibitors of FKBP (40), HIV integrase (39), and T. brucei RNA editing ligase 1 (41), among others (42-44).

Final rescoring
As each of the 302 small-molecule models was docked into 24 different cruzain conformations, there were 7248 docked poses in all. Each of these 7248 poses was rescored, again using multiple scoring functions. Ensemble-average scores were calculated for each scoring function, and the 302 small-molecule models were appropriately ranked.
Two criteria were used to identify likely inhibitors from among these ranked compounds. First, we selected the seven best inhibitors as predicted by each of the individual scoring functions and merged them into a single list of 22 candidates. Second, the top 30 predicted ligands as judged by the ensemble-average rank were likewise identified. In all, 37 unique candidate inhibitors were selected; of these, the best 30 were tested experimentally.

Predicted binding modes
Of the 30 compounds tested, eight inhibited cruzain at 100 μM (Figure 1, all structures below the first arrow). Using standard conditions, the best-scoring compound had an initial IC50 value of 471 nM (see Figure 1, asterisk). Subsequent experiments suggested that this compound inhibited cruzain non-specifically via aggregation (Supporting Information). Fortunately, three other compounds, although less potent, did in fact appear to inhibit cruzain specifically (Supporting Information). These three compounds, NSC 227186, NSC 67436, and NSC 260594, have IC50 values of 16, 63, and 66 μM, respectively (see Figure 1, bottom row). While these compounds lack the nanomolar affinity characteristic of approved drugs, they do possess the low micromolar affinity typical of lead compounds. With proper optimization, including fragment addition, moiety swapping, and similarity searching, these compounds could be transformed into viable drug candidates. We are hopeful that these new leads will be helpful to those in the drug discovery community targeting Chagas' disease.
Figure 2 shows the predicted binding poses (A, C, E) and important interactions (B, D, F) associated with each of the three most promising compounds using the IFD protocol and visualized in PYMOL (described in Experimental Methods). Figure 2B shows the predicted polar contacts of NSC 227186 with GLN19, THR185, and TRP184; TRP184 is also oriented for possible aromatic stacking with one of the ligand aromatic moieties. GLU208 (called GLU205 by some) is predicted to swing even further away from the cruzain S2 subsite than is seen in the 1ME4 crystal structure, allowing a ligand methylpyrrolidine moiety to occupy the site, reminiscent of the Phe and Tyr rings of several known inhibitors (13-18,50,51). An oxane moiety is also predicted to occupy the S3′ subsite (13).
NSC 67436 (Figure 2D) is predicted to form even more interactions with the cruzain receptor, including various hydrogen-bond interactions with residues GLN19, GLY66, ALA141, ASP161, and ASN182, as well as possible aromatic stacking with HIS162. Two distinct cyclohexanamine moieties are predicted to occupy the S2 and S1′ subsites, and a chlorocyclohexane (4-chlorocyclohexane-1,3-dicarbaldehyde) moiety is predicted to bind near S1. S4 and S3′ are also occupied, both with distinct imidazoline rings (13).
Figure 2F shows the hydrogen bonds predicted to form between NSC 260594 and MET68, ASN69, and ASP161. Again reminiscent of several known ligands, the S2 subsite is occupied by an aromatic ring system, and a slightly rotated GLU208 accommodates the larger aromatic moiety (1-methyl-6-nitro-decahydroquinolin-4-amine).
In an attempt to identify predicted binding characteristics that might aid the identification of future inhibitors, we used the IFD module of the Schrödinger Maestro computer package to re-dock the 30 experimentally tested compounds into a crystallographic active site model (PDB ID: 1ME4) at pH 5.5. Unlike CDOCKER docking, the IFD protocol allows for local active site conformational changes and so is judged to better predict ligand binding, albeit at the cost of speed. The most favorable IFD poses of the validated inhibitors (Figure 2) consistently placed ligand atoms near the catalytic triad in a position that could conceivably compromise cruzain enzymatic activity. Additionally, the IFD poses of these compounds place at least one aromatic ring in the S2 subsite, a site known to be favorable for the binding of Phe and Tyr aromatic side chains (13-18,50,51).
These predicted binding poses are promising because they represent unique scaffolds that differ significantly from those of previously identified small-molecule inhibitors. While the poses included here are mere predictions, we hope that they will guide future optimization of these experimentally validated ligands.

Conclusions
Chagas' disease is the leading cause of heart failure in Latin America, affecting over 10 million individuals worldwide. Although progress has been made in the treatment of the disease, especially given the recent success of K11777, multiple cruzain inhibitors are needed given the difficulty of obtaining FDA approval and ever-progressing drug resistance. The work presented here provides chemical scaffolds that, with further optimization, could be developed into promising therapeutics for Chagas' disease. The exhaustive virtual-screening approach and thorough experimental validation used to identify these leads also represent a promising and useful method for inhibitor identification.
Appendix S1. Description of experimental validation methods and assessment of non-specific inhibition via aggregation.