Potential anti-SARS-CoV-2 drug candidates identified through virtual screening of the ChEMBL database for compounds that target the main coronavirus protease.

A novel coronavirus [severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2), or 2019 novel coronavirus] has been identified as the pathogen of coronavirus disease 2019. The main protease (Mpro, also called 3‐chymotrypsin‐like protease) of SARS‐CoV‐2 is a potential target for treatment of COVID‐19. A Mpro homodimer structure suitable for docking simulations was prepared using a crystal structure (PDB ID: https://doi.org/10.2210/pdb6Y2G/pdb; resolution 2.20 Å). Structural refinement was performed in the presence of peptidomimetic α‐ketoamide inhibitors, which were previously disconnected from each Cys145 of the Mpro homodimer, and energy calculations were performed. Structure‐based virtual screenings were performed using the ChEMBL database. Through a total of 1 485 144 screenings, 64 potential drugs (11 approved, 14 clinical, and 39 preclinical drugs) were predicted to show high binding affinity with Mpro. Additional docking simulations for predicted compounds with high binding affinity with Mpro suggested that 28 bioactive compounds may have potential as effective anti‐SARS‐CoV‐2 drug candidates. The procedure used in this study is a possible strategy for discovering anti‐SARS‐CoV‐2 drugs from drug libraries that may significantly shorten the clinical development period with regard to drug repositioning.

A novel coronavirus [severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), or 2019 novel coronavirus] has been identified as the pathogen of coronavirus disease 2019. The main protease (M pro , also called 3-chymotrypsin-like protease) of SARS-CoV-2 is a potential target for treatment of COVID-19. A M pro homodimer structure suitable for docking simulations was prepared using a crystal structure (PDB ID: 6Y2G; resolution 2.20 A). Structural refinement was performed in the presence of peptidomimetic a-ketoamide inhibitors, which were previously disconnected from each Cys145 of the M pro homodimer, and energy calculations were performed. Structure-based virtual screenings were performed using the ChEMBL database. Through a total of 1 485 144 screenings, 64 potential drugs (11 approved, 14 clinical, and 39 preclinical drugs) were predicted to show high binding affinity with M pro . Additional docking simulations for predicted compounds with high binding affinity with M pro suggested that 28 bioactive compounds may have potential as effective anti-SARS-CoV-2 drug candidates. The procedure used in this study is a possible strategy for discovering anti-SARS-CoV-2 drugs from drug libraries that may significantly shorten the clinical development period with regard to drug repositioning.
A novel coronavirus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2], or 2019 novel coronavirus [2019-nCoV]) has been identified as the pathogen of coronavirus disease 2019. The coronavirus has spread worldwide and exhibits strong contagious and infective characteristics. There are no effective anti-SARS-CoV-2 drugs at the time of writing. The discovery of potential anti-SARS-CoV-2 drugs from known drug libraries is thought to be an effective drug repositioning strategy for shortening the clinical development period.
Severe acute respiratory syndrome coronavirus 2 belongs to the betacoronavirus group. One of the bestcharacterized drug targets among viral constituents is the main protease (M pro , also called 3-chymotrypsin-like protease) [1]. Crystal structures of the M pro dimer, the biological active form, have been resolved with or without synthetic inhibitors, some of which covalently bind to Cys145 at the catalytic dyad (i.e., Cys145 and His41) or have been designed to covalently bind to Cys145 (Fig. 1) [2,3]. However, side effects, toxicity, and lower potency often cause covalent inhibitors to drop out. Therefore, noncovalent inhibitors with high binding affinity are more suited for treatment of such viral infections.
In this study, I performed stepwise structure-based virtual screenings using two different docking simulations in order to discover potential drugs that target M pro using the ChEMBL database [4], which mainly lists drugs and known bioactive compounds. I present these potential anti-SARS-CoV-2 drug candidates here. The structural information of the potential drugs will be useful for improving their pharmacokinetic properties more effectively and for developing specific anti-SARS-CoV-2 drugs.

Methods
Refinement of the M pro crystal structure I prepared the M pro homodimer structure suitable for docking simulations using a crystal structure (PDB ID: 6Y2G [2]; resolution 2.20 A). Structural refinement was performed using Homology Modeling Professional for HYPERCHEM (HMHC) software [5,6], and energy calculations were performed under the AMBER99 force field with the following parameters: root-mean-square gradient, 1.0 kcalÁmol À1 Á A À1 ; algorithm, Polak-Ribi ere; cutoffs, none; 1-4 van der Waals scale factor, 0.5; 1-4 electrostatic scale factor, 0.833; dielectric scale factor, 1.0; and distance-dependent dielectric condition. Structural refinement was conducted in the presence of peptidomimetic a-ketoamide inhibitors, which were previously disconnected from each Cys145 of the M pro homodimer. After adding hydrogen atoms automatically, I assigned Mulliken atomic charges of inhibitors; their a-carbonyl moiety (disconnected from Cys145) was treated as a hydroxy carbocation using single-point calculations of the semiempirical MNDO/d method. Mulliken atomic charges obtained from the MNDO/d calculation showed empirically good correlation with AMBER charges [7]. In addition, AMBER99 atom types were assigned. N-and Cterminals of the M pro homodimer were treated as zwitterions, aspartic and glutamic acid residues were treated as anions, while lysine, arginine, and histidine residues were treated as cations under physiological conditions. Next, a free glycine included in the crystal structure was removed, and the initial coordinates of hydrogen atoms of crystal waters (one water, WAT620, was removed, since the initial coordinate of the hydrogen atoms could not be assigned) were predicted using HMHC. Subsequently, partial optimization by Belly calculations for all components, except heavy atoms, was performed, and distance restraint conditions (7.0 kcalÁmol À1 Á A À2 ) were applied to all heavy atoms of the above structure; next, geometry optimization calculations were performed. The resulting structure was subjected to low-temperature molecular dynamics annealing (starting temperature 0 K; heat time 30 ps; simulation temperature 300 K; run time 100 ps; final temperature 0 K; cooling time 30 ps; step size 0.001 ps; and temperature step 0.01 K). Finally, all distance restraint conditions were removed, and the structure was further optimized to obtain the final structure. Precision of the final structure was confirmed using a Ramachandran plot program of HMHC.

Preparation of 3D structures from the ChEMBL database
Planar structures of the ChEMBL database (ChEMBL26; 1 950 760 distinct compounds, including 13 308 drugs) [4] were downloaded from the ChEMBL website in SDF file format. MayaChemTools (2019) [8] was used to remove counterions and inorganic compounds from the database. Then, 3D structures were obtained using BALLOON version 1.6.9 [9] under an MMFF94 force field. The resulting 3D structure database was treated using BABEL version 2.4.1 [10]; the compounds' protonation state was prepared under physiological conditions (pH = 7.4) and filtered by molecular weight (MW ≥ 100 and ≤ 500) to reduce the database to a more drug-like library. A total of 1 485 144 compounds were used for subsequent virtual screenings.

Structure-based virtual screenings
Structure-based virtual screenings were performed using RDOCK (2013) [11] and AUTODOCK VINA version 1.1.2 [12]; both interfaces are available in Docking Study with HYPER-CHEM (DSHC) software [5,13], and the resulting docking modes filtered by the RDOCK score threshold were more precisely simulated using AUTODOCK VINA.
Prior to docking simulations, the inhibitor of the A chain of the M pro homodimer was removed from the system. Then, docking simulations for the 3D structure database (1 485 144 compounds) were performed using the relatively reliable, high-speed docking simulation program RDOCK under the default condition. I expected that the concentrated potential drugs would consist of~70 distinct drugs, the others being bioactive compounds. On the basis of this, I determined the RDOCK score threshold to be ≤ À50 kcalÁmol À1 . The cavity condition for RDOCK docking simulations was prepared using peptidomimetic a-ketoamide inhibitor of the A chain under a default condition. Docking simulations output the top three docking modes per trial compound in SDF file format. As a result, 4 455 432 docking modes were obtained. These docking modes were filtered by the RDOCK score threshold of ≤ À50 kcalÁmol À1 to obtain 27 561 distinct compounds (57 649 docking modes) in SDF file format. The ChEMBL IDs of these distinct compounds were subjected to KNIME version 4.1.2 [14] to collect compound information from the ChEMBL web server; some information was manually collected from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [15].
From the 57 649 docking modes obtained by virtual screenings, the 27 561 distinct hit compounds had two docking modes on average. The hit compounds, including the 64 drugs I found, could be more precisely investigated using AUTODOCK VINA docking simulations with these docking modes as the initial structure. Subsequently, the resulting 57 649 docking modes were separated and converted into individual PDBQT files using DSHC. Then, more precise docking simulations were performed using the AUTO-DOCK VINA In Silico Screenings interface integrated into DSHC. The M pro homodimer system prepared above in PDB file format was also converted to a PDBQT file using DSHC. A configuration file with cavity information was prepared using DSHC, and other docking conditions were set to default values (the top nine docking modes per trial compound were maximally outputted). Docking simulations with AUTODOCK VINA produced 513 597 docking modes, which were filtered by the AUTODOCK VINA score (empirical binding free energy) threshold of À10 kcalÁmol À1 . Since the AUTODOCK VINA score is an empirical binding free energy, I expected that À9 kcalÁmol À1 of a score would theoretically show an nM order of binding affinity with M pro . When the threshold for screening was set to less than this value, I obtained 659 distinct compounds (1216 docking modes) as hit compounds. To more realistically concentrate the number of hit compounds, I determined the threshold value to be ≤ À10 kcalÁmol À1 . As a result, I obtained 29 distinct compounds (total 41 docking modes). The ChEMBL IDs of these distinct compounds were subjected to KNIME to collect compound information from the ChEMBL web server.

Results and Discussion
Structure-based virtual screenings of the ChEMBL database In the ChEMBL database, drugs, including approved, clinical, and preclinical drugs, constitute~0.7% of the total number of compounds; the others are mainly bioactive compounds, whose synthesis is, therefore, promising. The advantage for using the ChEMBL database is that it covers all types of drugs, from preclinical to approved stages. I expected that the hit compounds would largely differ from candidates obtained from virtual screenings using focused and targeted libraries [16,17]. With regard to drug repositioning, the ChEMBL database is more suitable for searching for effective known drugs or bioactive compounds when urgent therapy is necessary and effective drugs are not known. The RDOCK score threshold of ≤ À50 kcalÁmol À1 showed relatively high binding affinity with M pro . Table 1 shows the 64 potential drugs that showed high binding affinity with M pro , with some drug information collected from the ChEMBL web server using KNIME. I found 11 approved, 14 clinical, and 39 preclinical drugs from the hit compounds (27 561 distinct compounds with 57 649 docking modes); the other 27 497 were bioactive compounds. The 64 drugs were largely classified into antibacterial, antidiabetic, antiinfective, anti-inflammatory, antineoplastic, cardiovascular, gastrointestinal, human immunodeficiency virus, and neuropsychiatric drugs. Interestingly, the potential drugs obtained contained sepimostat and curcumin, which are recommended as potential anti-SARS-CoV-2 drugs by researchers [18,19]. Table 2 shows the 29 hit compounds obtained using AUTODOCK VINA virtual screenings with ≤ À10 kcalÁmol À1 of binding free energy for M pro . For the 64 drugs,  Virtual screenings predict potential anti-SARS-CoV-2 drugs M. Tsuji  Virtual screenings predict potential anti-SARS-CoV-2 drugs M. Tsuji AUTODOCK VINA scores of the most stable docking modes are also shown in Table 1.

Additional docking simulations for hit compounds
Almost all the 29 hit compounds were bioactive compounds registered to the ChEMBL database, except for eszopiclone (CHEMBL1522; approved drug for neuropsychiatric disease). I believe that these hit compounds were not developed for common targets, although the structural feature could be categorized into some mother skeletons, such as diazole, azine, and sulfone derivatives ( Table 2). Figure 2 shows the most stable docking modes of sepimostat ( Fig. 2B; À7.9 kcalÁmol À1 ), curcumin ( Fig. 2C; AUTODOCK VINA score À7.3 kcalÁmol À1 ), and eszopiclone ( Fig. 2D; AUTODOCK VINA score À10.0 kcalÁmol À1 ) obtained from AUTODOCK VINA docking simulations, in addition to the binding mode of peptidomimetic a-ketoamide inhibitor in the crystal structure ( Fig. 2A). Researchers consider sepimostat, curcumin, and the a-ketoamide inhibitor to be potential anti-SARS-CoV-2 drugs. In this study, eszopiclone was also the only approved drug with the highest score on AUTODOCK VINA docking simulations (Tables 1 and 2). The carbonyl moiety of these compounds was close to the catalytic site of the Cys145 and His41 catalytic dyad, and docking modes were similar to each other. These results suggest that these compounds may function through the same underlying mechanism.

Conclusions
This study was performed to rapidly identify potential anti-SARS-CoV-2 drug candidates from a known drug library on the basis of drug repositioning. The drug candidates presented in this study could be further examined for their anti-SARS-CoV-2 activities, together with those of earlier studies using more limited drug libraries [20][21][22]. Bioactive compounds with high binding affinity for SARS-CoV-2 M pro could be used as a basis for improving pharmacokinetic properties and for developing specific anti-SARS-CoV-2 drugs. Combinations of structure-based docking simulations are valuable for high-throughput virtual screenings to identify urgently needed therapies for viral infections. Determination of the effect of potential anti-SARS-CoV-2 drugs obtained in this study is in progress, and the results will be published in the near future.