A community effort in SARS‐CoV‐2 drug discovery

The COVID‐19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small‐molecule drugs that are widely available, including in low‐ and middle‐income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the “Billion molecules against COVID‐19 challenge”, to identify small‐molecule inhibitors against SARS‐CoV‐2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find ‘consensus compounds’. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding‐, cleavage‐, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS‐CoV‐2 treatments.


| INTRODUCTION
There is great interest in highly effective small-molecule therapeutics for COVID-19 to save human lives. More than three years after the outbreak of the pandemic, and despite the availability of vaccines [1], COVID-19 still poses a threat to individuals across the world [2]. The initially developed vaccines and boosters have so far proven protective against COVID-19, but because of multiple factors, such as new variants of the virus [2], the disease continues to pose a substantial risk to life and health. Recent studies also show that reinfections act cumulatively, which is worrisome in the long term [3]. Additionally, many people cannot be vaccinated due to their medical status or refuse vaccination, and breakthrough infections occur despite vaccination. Therefore, a small-molecule therapy is in high demand as an additional option or alternative [4]. The applicability of currently available small-molecule treatments, such as nirmatrelvir [5], baricitinib [6], remdesivir [7], and molnupiravir [8], is still restricted. For instance, the use of Paxlovid (nirmatrelvir and ritonavir) is limited by drug-drug interactions [9], drug resistance [10][11][12], and rebound effects [13,14]. In addition, molnupiravir is a mutagenic antiviral, which could possibly increase the emergence of new variants [15,16]. Ensitrelvir has recently been developed as a small-molecule antiviral specifically targeting SARS-CoV-2 [17] and has been shown to shorten the time to viral clearance by 50 h [18]. Overall, improved pharmacological approaches are still needed.
The standard drug development process is slow compared with the time scale on which the SARS-CoV-2 virus emerged and mutates, and can easily last up to 15 years [19]. This period comprises pre-clinical phases, in which large numbers of virtual or physically available molecules are considered and tested, followed by clinical phases, in which a few molecules are validated in human trials. In the early phases of the drug discovery process, computational methods have been shown to help in screening and navigating the large chemical space [20]. Such methods can also suggest promising new ligands [21][22][23]. However, 90 % of drug candidates fail later, somewhere between phase I trials and regulatory approval [24]. Therefore, using accurate computational methods to screen and filter chemical space is key to a successful and fast drug development process. With accurate computational methods, the early phases of drug discovery, which usually require 3-6 years [19], might be reduced to a few weeks, after which pre-clinical studies could start [25].
In response to the pandemic, scientists and research groups around the world started to self-organize and work together (e.g., https://covid19-nmr.de/participants/core-team/; https://insidecorona.net/; https://app.jogl.io/; https://foldingathome.org; https://news.cnrs.fr/articles/covid-19-15-billion-compounds-to-undergo-virtual-screening; MEDIATE [38]; EXSCALATE [39]). The COVID Moonshot project [40][41][42], for example, yielded new potential inhibitors through a collaborative, crowdsourced Open Science Discovery approach [43] and is now continued within the Drugs for Neglected Diseases initiative. Here we present the results of an ad hoc crowdsourced community initiative, the "Billion molecules against COVID-19 Challenge", which was organized as a competition (starting May 2020) to identify inhibitors of SARS-CoV-2 proteins. Participating teams each screened at least one billion molecules using diverse computational methods. The most promising compounds were then synthesized and evaluated in wet-lab experiments. We present the computational approaches taken, the biological assays performed, and the overall lessons learned from the challenge.

| Setup of the community challenge
Our community effort to identify SARS-CoV-2 inhibitors was organized as a challenge in which academic and industry researchers worldwide were asked to form teams, virtually screen at least a billion small molecules each, and then submit 10,000 virtual molecules as potential inhibitors of SARS-CoV-2 progression, within the timeframe May-June 2020. In response to the call, 130 teams registered, of which 31 made the submission deadline. In addition to compound lists, teams had to deliver a report outlining the methods used (see Supporting Information Section 1). Of these 31 teams, 20 were admitted after peer review of their reports by an ad hoc scientific committee.
Overall, a four-step process was used during the challenge (Figure 1). The goal set for the teams was to find a < 100 nM binder to a SARS-CoV-2 protein or human receptor of their choice, which should ideally produce a 100-fold reduction of live SARS-CoV-2 viral replication in whole-cell assays. The teams were initially free to identify the most promising protein targets. As screening databases, ZINC15 [44], CAS (antivirals) [45], and SweetLead [46] were suggested by the organizing team, but the computational teams were free to choose other sources. The following sections describe the four steps in detail, followed by a discussion and conclusions.
In terms of hit rate, the team that ended up with the most hit compounds (team jku, see below) used descriptor-based deep learning methods with small molecules as inputs, i.e., a ligand-based approach. The self-normalizing network approach is intended to make the models robust against domain shifts from training data to test data. The second-ranked method, by team kyuken, used shallow, ligand-based, and descriptor-based machine learning methods as a first step and subsequently used structure-based approaches to refine the search. The hit compound of kyuken showed significant viral reduction in cell-based assays (see section 2.6.5 below). The third-ranked method (team aiwinter) used docking-based methods and QSAR models. For details, see Supporting Information Section 1.

| Molecule selection and consensus ranking
A single list of molecules was compiled for subsequent synthesis and testing against each of the six selected SARS-CoV-2 (or host) protein targets. In total, 639,024 molecules (423,466 of them unique) were submitted across all targets and teams. In many cases, different teams suggested identical compounds for the same protein target: 656 such compounds for Nsp5, 155 for Nsp3, 57 for TMPRSS2, and 54 for Nsp12.
Interestingly, 7391 compounds were suggested by more than one team (across all protein targets), and in 3843 of these cases the teams disagreed on what the target was. Duplicate entries of the same compound for the same target were removed.
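The duplicate and disagreement counts above can, in principle, be derived from a flat table of submissions. The sketch below assumes each submission is a (team, target, compound_id) tuple in which compound_id is already a canonical identifier (for example a canonical SMILES or InChIKey), and it uses its own definitions of "shared" and "disagreement"; the function name consensus_stats and the toy data are hypothetical and not the organizers' code.

```python
from collections import defaultdict


def consensus_stats(submissions):
    """Summarize overlap among (team, target, compound_id) submissions."""
    rows = list(submissions)
    entries = set(rows)  # drops repeated rows of the same team/target/compound

    teams_per_target_compound = defaultdict(set)  # (target, compound) -> teams
    targets_per_compound = defaultdict(set)       # compound -> targets
    teams_per_compound = defaultdict(set)         # compound -> teams
    for team, target, cid in entries:
        teams_per_target_compound[(target, cid)].add(team)
        targets_per_compound[cid].add(target)
        teams_per_compound[cid].add(team)

    # Compounds proposed for the same target by more than one team, counted per target.
    shared_same_target = defaultdict(int)
    for (target, _cid), teams in teams_per_target_compound.items():
        if len(teams) > 1:
            shared_same_target[target] += 1

    # Compounds proposed by more than one team, and among those the ones
    # for which the teams named different targets.
    multi_team = [c for c, teams in teams_per_compound.items() if len(teams) > 1]
    disagreement = [c for c in multi_team if len(targets_per_compound[c]) > 1]

    return {
        "submitted": len(rows),
        "unique_compounds": len(teams_per_compound),
        "shared_same_target": dict(shared_same_target),
        "multi_team": len(multi_team),
        "target_disagreement": len(disagreement),
    }


if __name__ == "__main__":
    toy = [  # hypothetical toy data
        ("jku", "Nsp5", "A"), ("kyuken", "Nsp5", "A"),
        ("cermn", "Nsp3", "B"), ("lci", "Nsp5", "B"), ("lci", "Nsp5", "B"),
        ("jku", "S", "C"),
    ]
    print(consensus_stats(toy))
```

In the toy data, compound A is shared by two teams for the same target, while compound B is proposed by two teams for different targets, so it counts as a disagreement.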
The screening capacity was estimated at a maximum of 2,000 compounds for each of the 6 protein targets, considering the time and cost to synthesize compounds and perform experimental assays. Approximately 40 % of this screening capacity was reserved for testing the top-ranked molecules from each team, i.e., according to the ranking each team had determined for its own list. The remaining ~60 % was reserved for testing consensus molecules, i.e., molecules that had been suggested by multiple teams or for which very similar molecules had been suggested. Two different approaches were employed to determine the set of consensus molecules: a) k-medoids clustering and b) generative topographic mapping (GTM) [79] (see Supporting Information Section 2). The 'selected molecules list' for each of the 6 protein targets ended up consisting of 38 % top-ranked, 15 % k-medoids, and 47 % GTM compounds (see the yellow/green/blue cartoon in Figure 1). Overall, six sets of compounds, one for each protein target, were obtained, amounting to 11,440 unique compounds in total.
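One way to turn pooled submissions into consensus picks is to cluster them and keep one representative (medoid) per cluster. The k-medoids settings actually used in the challenge are given in Supporting Information Section 2; the sketch below is a generic, simple alternating (Voronoi-iteration) k-medoids rather than PAM, and it assumes a precomputed pairwise distance matrix (for molecules, e.g., 1 − Tanimoto similarity of fingerprints) and caller-supplied initial medoids. The function names are hypothetical.

```python
def assign_to_medoids(dist, medoids):
    """Cluster index (0..k-1) of the nearest medoid for every point."""
    return [min(range(len(medoids)), key=lambda k: dist[i][medoids[k]])
            for i in range(len(dist))]


def k_medoids(dist, init_medoids, max_iter=100):
    """Alternating k-medoids on a square distance matrix (list of lists).

    Returns (medoid indices, cluster label for each point).
    """
    medoids = list(init_medoids)
    for _ in range(max_iter):
        labels = assign_to_medoids(dist, medoids)
        new_medoids = []
        for k, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if not members:  # keep the medoid of an empty cluster unchanged
                new_medoids.append(old)
                continue
            # New medoid: the member with the smallest summed distance to its cluster.
            new_medoids.append(min(members, key=lambda c: sum(dist[c][j] for j in members)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids, assign_to_medoids(dist, medoids)


if __name__ == "__main__":
    points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]  # toy 1D "molecules"
    dist = [[abs(a - b) for b in points] for a in points]
    print(k_medoids(dist, init_medoids=[0, 1]))
```

The toy example uses 1D distances only so that the result is easy to check by hand; in a consensus setting, medoids of clusters that contain submissions from several teams would be natural candidates for testing.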

| Chemical synthesis of selected compounds
All compounds were synthesized by WuXi Apptec (China), based on instructions from the organizing team, which provided them with 11,440 compound suggestions across the 6 protein targets. WuXi Apptec selected the compounds to be synthesized based on 3 criteria, using proprietary methods: 1) cADME (computational absorption, distribution, metabolism, and excretion) filtering was applied to retain compounds with molecular weight (MW) below 500 g mol−1, CLogP < 5, HBA (hydrogen-bond acceptors) < 10, HBD (hydrogen-bond donors) < 5, TPSA < 140 Å², and fewer than 5 rotatable bonds; in addition, possible PAINS (pan-assay interference compounds) were removed; 2) chemical feasibility: a similarity search against the WuXi Apptec virtual library was performed to assess feasibility (see Supporting Information Section 4.1); 3) reagent availability and cost were considered.
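The cADME thresholds in criterion 1 amount to a simple rule-based filter. The sketch below assumes the descriptors (MW, CLogP, HBA, HBD, TPSA, rotatable bonds) and a PAINS flag have already been computed with a cheminformatics toolkit, and it treats every threshold as a strict upper bound, as written above. WuXi Apptec's actual implementation is proprietary, so the class and function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mw: float               # molecular weight, g/mol
    clogp: float            # calculated logP
    hba: int                # hydrogen-bond acceptors
    hbd: int                # hydrogen-bond donors
    tpsa: float             # topological polar surface area, Å^2
    rot_bonds: int          # rotatable bonds
    is_pains: bool = False  # flagged by a PAINS substructure filter


# Thresholds from criterion 1, all applied as strict upper bounds.
CADME_LIMITS = {"mw": 500.0, "clogp": 5.0, "hba": 10, "hbd": 5, "tpsa": 140.0, "rot_bonds": 5}


def passes_cadme(d: Descriptors) -> bool:
    """True if the compound is not a PAINS hit and satisfies every cADME threshold."""
    if d.is_pains:
        return False
    return all(getattr(d, name) < limit for name, limit in CADME_LIMITS.items())


if __name__ == "__main__":
    candidate = Descriptors(mw=350.4, clogp=2.1, hba=5, hbd=2, tpsa=80.0, rot_bonds=4)
    print(passes_cadme(candidate))  # True: all thresholds met
```

Keeping the limits in one dictionary makes it easy to see, and to communicate to participants in advance, exactly which property ranges survive the filter.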
After this selection, 1414 compounds remained, and synthesis was started. The synthesis period lasted from November 2020 to February 2021, and 878 compounds were delivered as 20 mM DMSO (dimethylsulfoxide) stock solutions in well plates. It was not feasible to synthesize all compounds due to delays in the delivery of starting materials or due to practical synthetic issues (e.g., low reactivity, difficulties in purification, etc.). The compound purity was determined by LC-MS and has been reported previously [83]; that data set also includes information on solubility and compound chirality. Of all 878 compounds, 58 (i.e., 6.6 %) had a purity below 90 % but were nonetheless included in the experimental assays. Duplicate well plates with DMSO stock solutions were shipped to the MIT-Broad Institute (USA), Crelux GmbH (Germany), Pasteur Institute (France), and the Diamond Light Source (UK) for further experiments (see next sections).
Biases in compound selection and synthesis. Both the methods used to obtain the list of selected compounds (from 423,466 unique ones to 12,081 selected, see section 2.3) and the synthesizability of the compounds (878, see section 2.4) introduced biases. Table 1 shows that teams imolecule, lci, lci, virtualflow, molecule, and cermn had the largest numbers of compounds selected for the targets N, Nsp3, Nsp5, Nsp12, S, and TMPRSS2, respectively (see bold numbers). Figure S5 breaks these results down by selection method, i.e., GTM, k-medoids, or top-ranked. Some teams had most of their selected compounds originate from consensus selection. For example, lci, cermn, kyuken, and pharmai had many compounds selected by GTM (Figure S5a,b). In contrast, other teams (e.g., covid19ddc and sarswars) had most of their selected compounds taken directly from their top-ranked ones. Overall, both the selected and the synthesized compound lists are skewed toward the top 200 positions of each team for each protein target (Figure S6). For jku, there was a bias in the number of synthesized compounds (62) versus those selected (259), likely due to their chemical similarity and the fact that they can be easily synthesized (see the 'benzotriazolyl acetamide' family in the next sections and in discussion section 3 below). Some teams had large numbers of molecules selected in the first step, but none were finally synthesized. For example, team belarus had 32 compounds selected for Nsp5 and 67 for S, but none of them were chosen by WuXi Apptec, since these compounds did not pass its ADME filters and/or cost/feasibility analysis. Had the filtering criteria been known a priori, the teams could likely have included more suitable compounds in their submitted lists, thus avoiding the fact that some teams ended up with […] S7), but the percentage of GTM compounds did increase ~10 % in favor of top-ranked compounds (Figure S8). We do not deem this significant; that is, the selection method did not influence which compounds were eliminated by the WuXi Apptec filtering.

TABLE 1 Overview of selected and synthesized molecules across teams (rows) and drug targets (columns), with the column groups "Selected molecules (section 2.3)" and "Synthesized compounds (section 2.…)". Molecules that were selected or tested for a specific target but submitted for another target do not contribute to the team counts. For statistics that also include molecules originally submitted for a different target, see Supporting Information Section 3. *Of 929 total team-selected compounds, there are 878 unique chemical compounds; that is, 51 entries were identical compounds suggested more than once. A dash indicates that the team did not submit compounds for that specific target.

| Comparison of computational methods
Hit rate.With the four-stage procedure described above (see Section 2.1), 27 compounds were found to have detectable biological activity (see Figure 2, Figure 3, 4, Table 2, and details in Table 3) across all SARS-CoV-2 protein targets.The experimental testing is described in the following paragraphs.Due to the multiple team F I G U R E 3 Chemical structures of 27 hit compounds that bind to one of the protein targets or have biological activity.Molecules are grouped with respect to the experimental protein target they were found to have activity, which is not always the one that was initially predicted by the teams.The benzotriazolyl acetamide family (14 compounds) of Nsp5 is shown in the dashed box.
T A B L E 2 Number of active compounds, i. e. hits, confirmed with in-vitro testing and hit-rates (ratio of active against tested compounds).The best hit-rate is marked bold.The number of tested compounds is taken from Table  [a] This is the hit rate from the pooled analysis described in this paper.Some teams performed their own analysis with different results (see Supporting Information Section 3).
[b] One hit was found by two teams.
submissions and the compound selection procedure some teams submitted compounds which were tested on a target which is different to the suggested one.We tackle this issue by providing a) an analysis for which these compounds are excluded (Table 1 and Table 2) and b) an analysis for which these compounds are included (Supporting Information Section 3).For a) 14 hits had been suggested by the team jku and bind to Nsp5 (see Table 2).This amounts to a hit rate of 20.9 % [95 % confidence interval: 11.9-32.6%] (14 actives of 67 tested) of the best team, which is followed by the teams kyuken with a hit rate of 7. Novelty of hits.To evaluate the novelty of the found hits, the hit compounds are compared to priorart molecules, which are molecules either used in filtering operations such as similarity searches or used as an active training instance for Machine Learning methods by any of the teams.The activity cut-offs for the metrics pKi, pKd, pIC50 and pChEMBL were set to 6.3.Scatterplots in t-SNE coordinates (Figure 2a and Figure S9a,b) show the relative location of the hit compounds in comparison to the prior-art compounds.Notably, compared to Nsp12 and S, Nsp3 and Nsp5 contain many prior-art molecules, due to the availability of SARS-CoV data that was assumed by the teams to be similar (in terms of binding sites) as compared to SARS-CoV-2.The hits identified by jku (14 compounds) and aiwinter (1 compound) build a cluster and overlap in the Nsp5 scatterplot.Looking in more detail we find many benzotriazolyl acetamide derivatives in the prior art data in this cluster (Figure S10).The benzotriazole family had been considered indeed for SARS-CoV in 2008 by Verschueren [84], with published protein databank structures.For secondary clusters of hits (e. 
g., cermn & virtualflow; lower left quadrant of Nsp5 scatter plot in Figure S10), we could not identify similar functional groups or motifs in the proximal prior art compounds.The S hits (kyuken and deeplab) and Nsp12 hit compound (imolecule) do not reside in the neighborhood of prior art compounds which is why they can be considered as highly novel (Figure S10).For targets other than Nsp5, too few hits were found to draw statistically relevant conclusions on cluster size or novelty.
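The bracketed hit-rate intervals are consistent with exact (Clopper-Pearson) binomial confidence intervals; for example, 1 active out of 20 tested gives roughly [0.1 %, 24.9 %], the interval reported for team aiwinter. Assuming that is the method used, the sketch below computes such an interval without external dependencies by inverting the binomial CDF with bisection (in practice a beta-distribution quantile function from a statistics library would normally be used instead). The function names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(g, target: float, increasing: bool, tol: float = 1e-12) -> float:
    """Find p in [0, 1] with g(p) == target for a monotone function g."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Move the bracket toward the side where g reaches the target.
        if (g(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for a hit rate k/n."""
    # Lower bound: P(X >= k | p) = alpha/2, which increases with p.
    lower = 0.0 if k == 0 else _bisect(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, True)
    # Upper bound: P(X <= k | p) = alpha/2, which decreases with p.
    upper = 1.0 if k == n else _bisect(lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper


if __name__ == "__main__":
    for team, actives, tested in [("jku", 14, 67), ("aiwinter", 1, 20)]:
        lo, hi = clopper_pearson(actives, tested)
        print(f"{team}: {100 * actives / tested:.1f} % [{100 * lo:.1f}-{100 * hi:.1f} %]")
```

Exact intervals are conservative, which is why the interval for a single active out of 20 is so wide; with so few tested compounds per team, hit rates below roughly 10 % cannot be distinguished reliably.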

| Experimental testing of candidates
The 878 synthesized compounds were tested for inhibitory or binding activity against SARS-CoV-2 targets using various assays and X-ray crystallography. Protease cleavage assays (Nsp5, TMPRSS2, Nsp3) were performed by the MIT-Broad Foundry to determine activity. Microscale thermophoresis (MST) assays for RdRp (Nsp12 domain), N, and S proteins were performed by NanoTemper GmbH. Details on the assays can be found in Supporting Information Section 4. In this section, we detail salient experimental issues encountered during assay development, as many of the assays (especially the binding assays using MST) were not yet available or described in the literature. Initially, compound sets were only tested against their virtually predicted protein target, but because the full chemical library was available, some assays were performed for all compounds (irrespective of the predicted target).
TABLE 3 Overview of active compounds and their labels used in the manuscript.

GC376 and GRL0617 served as positive controls for Nsp5 and Nsp3, respectively [85,86] (see Supporting Information Section 4). A first brute-force screen at 100 μM identified a single compound for each of the three proteases (see red bars in Figure 4a-c). These compounds were selected for dose-response measurements, in which the compound concentration was varied to calculate IC50 values (see Supporting Information Section 4). Nsp5-1 produced an atypical dose-response: activity was first enhanced by ~50 % and then dropped to < 50 % at 100 μM (Figure 4d), which hampered the calculation of the IC50. Nsp3-1 showed classical inhibition with IC50 = 24.7 ± 3.7 μM (Figure 4f). In addition, 5 compounds identified in the cell-based Nsp5 assays (see section 2.6.2 below), which had not passed the 50 % inhibition threshold of the primary screen, were measured in dose-response using the same cleavage assay (Figure 4e). These measurements gave an IC50 of ~288 μM for Nsp5-2, whereas the remaining compounds, Nsp5-3 to Nsp5-6, had much higher IC50 values that could not be determined.
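The IC50 values in this section come from dose-response fits described in Supporting Information Section 4. As a rough, fit-free cross-check, an IC50 can be read off a dose-response series by interpolating in log-concentration where the relative activity crosses 50 %. The sketch below is that simpler interpolation, not the fitting procedure used here; it returns None when the curve never crosses 50 % within the tested range, which would correspond to the cases (Nsp5-3 to Nsp5-6) where no IC50 could be determined. The function name and the synthetic data are hypothetical.

```python
import math


def ic50_by_interpolation(concs, activity, threshold=50.0):
    """Rough IC50: first downward crossing of `threshold` (% activity),
    interpolated linearly in log10(concentration).

    Returns None if activity never drops below the threshold in the tested range.
    """
    points = sorted(zip(concs, activity))
    for (c1, a1), (c2, a2) in zip(points, points[1:]):
        if a1 >= threshold > a2:
            frac = (a1 - threshold) / (a1 - a2)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None


if __name__ == "__main__":
    # Hypothetical data from a Hill curve with IC50 = 24.7 uM and Hill slope 1.
    concs = [1, 3, 10, 30, 100]  # uM
    act = [100 / (1 + c / 24.7) for c in concs]
    print(ic50_by_interpolation(concs, act))
```

For these synthetic points the estimate lands within a few percent of 24.7 μM; a proper fit (for example a four-parameter logistic model) uses all points and also yields an uncertainty, which interpolation cannot.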

| Nsp5 protease cleavage assays in cells
The Pasteur Institute in Paris had previously set up a cell-based Nsp5 protease assay [87], in which cleavage of a reporter Rev-Nluc protein by Nsp5 decreases the luminescence signal.In the presence of an inhibitor, the luminescence signal is restored (see Supporting Information Section 4.3).
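Because cleavage by Nsp5 lowers the Rev-Nluc luminescence, readings can be rescaled so that the uninhibited signal maps to 0 % and a fully inhibited reference maps to 100 %, the convention used for the data in this section. A minimal sketch, assuming per-plate reference wells for both ends of the scale (which reference wells were actually used is described in Supporting Information Section 4.3 and is not reproduced here); the function name is hypothetical.

```python
def percent_restored(signal: float, uninhibited: float, fully_inhibited: float) -> float:
    """Rescale a luminescence reading to % restored activity.

    uninhibited: mean signal with active Nsp5 and no inhibitor (maps to 0 %).
    fully_inhibited: mean signal of an assumed reference without Nsp5 cleavage (maps to 100 %).
    """
    span = fully_inhibited - uninhibited
    if span == 0:
        raise ValueError("reference signals must differ")
    return 100.0 * (signal - uninhibited) / span


if __name__ == "__main__":
    print(percent_restored(3000.0, 1000.0, 5000.0))  # 50.0
```

Values below 0 % or above 100 % can occur with noisy wells and are best reported as-is rather than clipped, so that outliers remain visible.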
Here we show the data in terms of % restored activity, where no inhibition corresponds to 0 % and full inhibition to 100 %. GC376 was used as a control inhibitor and yielded IC50 = 4.2 ± 1.0 μM. Out of all 878 compounds screened, 6 compounds had activity in the high micromolar range; the best inhibitor was Nsp5-3, albeit a weak one with IC50 = 37 ± 6 μM (see Figure 5). Interestingly, the same compound had shown negligible activity in the (cell-free) Nsp5 cleavage assay (see Figure 4e, purple line).

Microscale thermophoresis (MST) is extensively used in the pharmaceutical industry and in CROs [88,89]. Therefore, this method was used for the three protein targets without protease activity, i.e., S, RdRp (Nsp12 domain), and N. Various constructs of the full-length targets or their subdomains are available from commercial sources. In this section, we describe the assay development, the choice of positive controls (which are essential for MST), and the binding outcomes.
For S, we decided to use the stabilized trimer (R683A, R685A, K986P, V987P), since participating teams had also modeled trimer-specific or cryptic binding sites other than the classical receptor-binding domain (RBD). As a positive control, the natural choice was the Ace2 (human receptor) protein. Surprisingly, recombinantly expressed Ace2 did not show binding to S (stabilized trimer), which we suspect was due to improper folding of the construct. Fortunately, His-tagged Ace2 did provide good binding curves, with a KD of 4.25 ± 1.52 nM (over 6 runs performed during the 3 days of assay measurements, see Supporting Information Section 4.2.3). This binding is stronger than in previous Surface Plasmon Resonance measurements [90], which gave 94.6 ± 6.5 nM for (monomeric) SARS-CoV-2-S1, but the difference can be explained by the multivalency of the trimer, as shown by Kruse et al. [91]. All 152 compounds were first analyzed using 8-point dilution series between 50 nM and 100 μM, revealing 7 potential binders. These 7 were measured in triplicate 12-point dilutions from 0.2 nM to 200 μM, and 3 compounds were identified as high-micromolar binders: S-1, S-2, and S-3 (see Figure 6 and Table 3 above).
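Two pieces of arithmetic underlie these MST measurements: the geometric dilution series (an 8-point series from 50 nM to 100 μM corresponds to a constant dilution factor of 2000^(1/7), about 2.96) and the 1:1 binding model from which KD is fitted. The sketch below assumes the standard quadratic (law of mass action) model for a labeled target at fixed concentration titrated with ligand; the labeled-target concentration of 20 nM in the usage example is a hypothetical value, not the assay condition, and the function names are illustrative.

```python
import math


def dilution_series(c_min: float, c_max: float, n_points: int) -> list[float]:
    """Constant-factor (geometric) dilution series from c_max down to c_min."""
    factor = (c_max / c_min) ** (1.0 / (n_points - 1))
    return [c_max / factor**i for i in range(n_points)]


def fraction_bound(ligand_total: float, target_total: float, kd: float) -> float:
    """Fraction of target in a 1:1 complex (quadratic, law-of-mass-action model).

    All concentrations must be in the same unit.
    """
    s = ligand_total + target_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * ligand_total * target_total)) / 2.0
    return complex_conc / target_total


if __name__ == "__main__":
    series_nM = dilution_series(50.0, 100_000.0, 8)  # 8-point, 50 nM to 100 uM
    target_nM = 20.0  # hypothetical labeled-target concentration
    kd_nM = 4.25      # His-tagged Ace2 control value reported above
    for c in series_nM:
        print(f"{c:10.1f} nM  fraction bound = {fraction_bound(c, target_nM, kd_nM):.3f}")
```

A useful check of the model is that exactly half of the target is bound when the total ligand concentration equals KD plus half the target concentration (i.e., when the free ligand concentration equals KD). It also shows why a high-micromolar binder gives only a partial curve over a series that tops out at 100-200 μM.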
For RdRp, we were unable to obtain the stable trimeric complex of Nsp7/8/12 (see Supporting Information Section 4.2.1) and therefore used only the Nsp12 subdomain. As a first control, we tried the remdesivir metabolite GS-443902 but could not detect binding. This is because this compound is incorporated into the RNA chain during polymerization and thereby inhibits RdRp function, but it does not bind efficiently to Nsp12 on its own. Instead, suramin was used as a control, with a determined KD = 827 ± 306 nM (over 4 triplicate measurements). Dilution series (8-point) from 0.5 nM to 250 μM were performed on 147 predicted compounds, and after pre-selection of 8 compounds and further triplicate 12-point experiments, 2 high-micromolar binders were identified: Nsp12-1 and Nsp12-2 (see Table 3 below).

For N, we used full-length nucleocapsid (see Supporting Information Section 4.2.2) and nanobodies developed to bind to the N- and C-terminal domains. A total of 119 compounds were analyzed in 8-point and 12-point dilution assays between 45 nM and 100 μM. However, N showed a drop in normalized fluorescence intensity (Fnorm) upon the addition of 1-5 % DMSO (see also Figure S12), likely due to slow polymerization and sedimentation of N over time. This made it impossible to determine KD values, and the assay development had to be abandoned.

| X-ray structures
In collaboration with the Diamond Light Source (DLS), crystallization and X-ray diffraction experiments were carried out with the compounds predicted for Nsp5 and Nsp3. For Nsp5, 148 compounds were soaked at 2 mM and measured by synchrotron X-ray diffraction, which identified 14 potential hits, all from the benzotriazolyl acetamide family: Nsp5-1 and Nsp5-7 to Nsp5-19. Comparison with the DLS database (accessible via https://fragalysis.diamond.ac.uk/viewer/react/preview/target/Mpro, using the tag 'JEDI -Benzotriazole') showed that several other benzotriazoles had previously been identified for Nsp5. Some representative structures are shown in Figure 7. For Nsp3, we found two compounds that could be resolved (also shown in Figure 7).

| Viral reduction assays
For a selection of compounds, we performed whole-cell live-virus reduction assays using either Vero-TMPRSS2 or HeLa-ACE2 cells (see Supporting Information Section 4.3.3). Figure 8 shows the dose-response curves of % infection and cell viability. Remdesivir was used as a positive control, with IC50 = 347 nM (95 % confidence interval (CI): 161-533 nM), which is in agreement with previous reports [92]. Most of the compounds show no significant reduction of viral replication in this assay. Nsp5-3 gave significant viral reduction with IC50 = 9.41 μM (95 % CI: 5.32-19.27 μM), but had a cytotoxicity of CC50 = 19.16 μM (95 % CI: 7.191-70.01 μM), so we cannot exclude that the cytotoxicity is responsible for the reduction in viral replication.
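A compact way to express the concern about cytotoxicity is the selectivity index (SI), the ratio of the cytotoxic to the antiviral concentration. Using the point estimates reported above for Nsp5-3 (this ratio is not part of the original analysis):

```latex
\mathrm{SI} = \frac{\mathrm{CC}_{50}}{\mathrm{IC}_{50}}
            = \frac{19.16\,\mu\mathrm{M}}{9.41\,\mu\mathrm{M}} \approx 2.0
```

Antiviral activity and cytotoxicity are thus separated by only about a factor of two, and the lower bound of the CC50 confidence interval (7.191 μM) lies below the IC50 point estimate, which is consistent with the caution that the viral reduction may be a side effect of cytotoxicity.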
We have summarized the experimental findings of the previous sections in Table 3 (above). We found 6 compounds with a quantifiable binding interaction (S: 3, Nsp3: 1, Nsp12: 2), of which only Nsp3-1 also showed activity in the in vitro (cell-free) protease cleavage assay. This compound shows structural similarity to previously found SARS PLpro inhibitors derived from GRL-0617 [95,96]. In live-cell Nsp5 assays, 6 compounds showed weak inhibition, the best being Nsp5-3 with IC50 = 37 ± 6 μM. The same compound also reduced viral replication in whole-cell live-virus assays, with IC50 = 9.41 μM (95 % CI: 5.32-19.27 μM), but we cannot exclude that this inhibition is a side effect of cytotoxicity. Further studies will be needed to chemically optimize Nsp5-3 to increase its antiviral activity.

| DISCUSSION
The COVID-19 pandemic has given an unprecedented push to scientists in academia and industry to try their hand at drug discovery. We saw this during our "Billion molecules against COVID-19 challenge", in which even private individuals initially participated (but did not pass our internal peer review at the report submission stage). Some novice teams were allowed to continue and submitted their compound lists, but because they did not take synthetic feasibility or ADME into account, none of their compounds were physically made. We realized during the challenge that mistakes can easily be made when starting from 3D protein structures of questionable quality from the Protein Data Bank (PDB). Fortunately, Insidecorona.net helped point the teams to the best-quality PDB entries for the protein targets they were working on. Since the challenge was organized as a winner-takes-all competition, initial communication and sharing of results among teams was limited. The organizing team (coordinated by the last author) arranged the synthesis of compounds and all experimental studies. In hindsight, it would have been better to communicate fully openly with the teams immediately after the compound list submissions (July 2020). This would have further strengthened collaboration between protein crystallographers, computational scientists, and experimentalists. Overall, the challenge helped bridge research fields and accelerated communication compared with the more traditional route via peer-reviewed publications.
In addition, the teams were free to choose the protein target they deemed most promising, and 6 final targets were selected by the organizing team. The experimental studies needed to validate each compound therefore took considerable effort, funding, and time (~2 years). An iterative approach on fewer targets would likely have been better and faster. With the experimental protocols in place, subsequent rounds of predicted compounds could likely be screened in < 3 months and could have served as input for additional computational rounds. Screening a library of off-the-shelf compounds, or even better, known drugs [97], would also have accelerated things (on-demand synthesis is slower and significantly more expensive, and new molecules have to go through all clinical phases).
The computational teams chose from a vast variety of methods (see Figure 2) and therefore covered diverse, orthogonal approaches. However, from today's perspective, few- (and zero-) shot methods, developed more recently, would have been an intuitive fit [98][99][100][101][102][103][104][105]. An important aspect of this challenge was its emphasis on exploring billions of candidate compounds for activity against the target proteins. This deviates from the more common strategy of focusing on either known drugs (e.g., DrugBank [106], DrugCentral [107]) or bio-like molecules (e.g., ChEMBL [108], SWEETLEAD [46], GEOM [109]), in that it explores a massive space of synthesizable molecules that may bear little recognized similarity to known bioactive compounds. While known drugs offer a faster path to clinical distribution, and bio-like molecules are generally perceived as more likely to translate successfully to clinical relevance, there is reason to expect that exploring a much larger set of candidates may yield drugs unlike those identified previously. For example, Lyu et al. [110] observe that billion-scale libraries are dramatically depleted of bio-like molecules relative to more focused libraries, yet still contain many experimentally confirmed actives, as well as thousands of high-ranking molecules in docking screens. This observation justifies continued emphasis on developing methods for computationally screening billion-scale libraries. We also note that de novo generation of candidate molecules may offer a viable path to discovery. Whereas consensus scoring has long been established in docking [111], to our knowledge it had not been extended to other computational methods before the current work. The discovered compounds have weak micromolar affinities and thus require further hit-to-lead development. Overall, the most potent compound found, Nsp5-3, has an IC50 = 9.41 μM (95 % CI: 5.32-19.27 μM) in live-cell assays, but with significant cytotoxicity that would need to be addressed. The most prominent family was the benzotriazolyl acetamide family (Figure 3, Nsp5 dashed box), which has also been found in other studies [112,113]. Its prominence is likely because several teams used ML methods starting from similar training sets, combined with the fact that benzotriazoles in general can easily be synthesized using 'click chemistry' [114], which is high-yielding and fast and thus preferred by the CRO that performed the chemical synthesis. In addition, the CRO performed a proprietary synthetic feasibility and ADME screening that introduced a bias in the number of compounds eventually synthesized for each individual team.
In addition to the evaluation in this paper, some teams independently validated their predictions (see Supporting Information Section 3). Pharm.ai compared their top 100 predictions for Nsp5 against public data published after the competition deadline and obtained a hit rate of 17 % on a highly diverse set of scaffolds. An interaction-based drug discovery screen explains known SARS-CoV-2 inhibitors and predicts new compound scaffolds [115]. The sarstrooper team experimentally tested top-ranked compounds they had submitted and found 7 compounds with IC50 < 10 μM (Mukherjee et al., in preparation).
Overall, we are convinced that open communication (Open access/Open data/Open source [37]) is of the greatest importance, as previously advocated [40][41][42][116]. For example, leads from the COVID Moonshot have recently been advanced by others to find a broad-spectrum nM inhibitor for SARS-CoV-2 [113]. That study [113] and the recent success story of ensitrelvir (Xocova), which emerged from ultra-large computational approaches [17], demonstrate the soundness of the approach. To further accelerate the response to future pandemics, large and chemically diverse government-managed compound libraries should be readily available (such as the "Chimiothèque Nationale" [117], containing 80,000 compounds and 15,000 natural extracts; EU-OPENSCREEN's unique compound collections, containing over 96,000 compounds [118]; and the NCATS library, containing over 10,000 compounds including about 3,000 drugs [119]) to provide the first experimental activity/structural data, immediately and publicly shared, that computational researchers need as a starting point.

| CONCLUSIONS
Using a crowdsourced approach, we performed the hit-finding stage of (antiviral) drug discovery with a wide range of computational approaches that were bundled using a consensus approach. Many participating teams chose docking- or machine-learning-based computational methods, for which little data was available at the start of the project (May 2020). The communication between different fields, e.g., protein crystallization, computational methods, and wet-lab experiments, was suboptimal and should be improved by direct communication and collaboration (vs. 'communication via the scientific literature'). This would ensure that critical know-how that is easily overlooked (or not explicitly written down) in papers is efficiently transferred. Overall, the pandemic has accelerated the breaking down of silos [120] between research fields, but more is needed to respond more quickly to future pandemics [121].

FIGURE 1
Overview of the main stages of the Billion Molecules Against COVID-19 Challenge.
Team kyuken reached a hit rate of 7.1 % [95 % confidence interval: 2.0-17.3 %] (4 actives out of 56 tested) and team aiwinter a hit rate of 5.0 % [0.1-24.9 %] (1 active out of 20 tested). Note that three different types of assays, a) in vitro (cell-free or live-cell) activity, b) biophysical binding, and c) X-ray crystallography, were used to experimentally test the compounds (see Section 2.6).

FIGURE 4
Overview of protease cleavage assays. a-c) Relative activity over triplicate experiments at a fixed compound concentration of 100 μM for Nsp5, Nsp3, and TMPRSS2, respectively. Red bars show compounds that reduce cleavage (relative) activity by more than 50 %. Asterisks mark highly fluorescent compounds that could not be analyzed. For clarity, not all compound labels are shown. d-f) Dose-response curves at different compound concentrations. Solid lines in panels e-f show fits; in panel d, the line only serves as a guide to the eye.

2.6.1 | Protease cleavage assays
Protease cleavage tests were performed for the compound sets of Nsp5, Nsp3, and TMPRSS2. In the assay, a peptide FRET (Förster resonance energy transfer) substrate is cleaved by the protease, which results in an increase in fluorescence intensity. The rate of this fluorescence increase is proportional to the cleavage rate of the protease, so inhibitors can be identified by adding compounds at different concentrations. The measured IC50 values of the positive controls were 9.4 ± 2.5 nM for GC376 (Nsp5) and 2.8 ± 0.4 μM for GRL0617 (Nsp3).
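Since the rate of fluorescence increase reports on cleavage, a primary screen like the one at 100 μM can be scored by comparing initial slopes with and without compound. The sketch below assumes that vehicle-only (DMSO) wells serve as the uninhibited reference and that the traces are in the linear regime; it flags compounds whose relative activity falls below 0.5, matching the "more than 50 %" reduction criterion of Figure 4a-c. Highly fluorescent compounds (asterisks in Figure 4) would need to be excluded before this step. Function names and data are hypothetical.

```python
def ols_slope(times, values):
    """Least-squares slope of values versus times (initial cleavage rate)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def relative_activity(times, compound_trace, vehicle_trace):
    """Cleavage rate with compound divided by the rate in vehicle-only wells."""
    return ols_slope(times, compound_trace) / ols_slope(times, vehicle_trace)


def is_primary_hit(rel_activity, cutoff=0.5):
    """Flag compounds that reduce relative activity by more than 50 %."""
    return rel_activity < cutoff


if __name__ == "__main__":
    t = [0, 1, 2, 3]                 # minutes (hypothetical)
    vehicle = [100, 120, 140, 160]   # fluorescence, DMSO only
    compound = [100, 108, 116, 124]  # fluorescence with 100 uM compound
    ra = relative_activity(t, compound, vehicle)
    print(ra, is_primary_hit(ra))    # 0.4 True
```

Using slopes rather than end-point fluorescence makes the readout insensitive to a compound's constant background fluorescence, although strongly fluorescent compounds can still saturate the detector.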

2.6.3 | Binding assays to N, RdRp (Nsp12 domain), S
Microscale thermophoresis emerged as a high-throughput, label-free method to evaluate binding constants.

FIGURE 5 Dose-response curves of compounds in the cell-based Nsp5 protease assay. IC50 values are also given in Table 3. Solid line: curve fit result. Dashed lines: 95 % confidence interval. Data are expressed as the mean ± standard deviation of 3 independent experiments, each performed in triplicate. Green triangles show positive controls for the inhibitor GC376 (see Supporting Information Section 4.3.2). Cytotoxicity was detected above 20 μM, so higher concentrations were excluded.

FIGURE 6
Binding curves of S compounds using microscale thermophoresis performed in triplicate. Error bars show standard deviations. The gray region shows the KD for the positive control Ace2. See Supporting Information Section 4.2 for details on assay conditions.

FIGURE 7
Crystal structures with examples of the Nsp5 benzotriazolyl acetamide family and Nsp3 (macrodomain) binders. The compounds are shown as purple ball-and-stick models, and the PanDDA event map is shown as an orange mesh. PDB files can be downloaded from https://github.com/hermanslab/COVID-19.

FIGURE 8
Viral reduction assays of compounds found by the teams, compared with Remdesivir as the control. Error bars show standard deviations over triplicate measurements. An IC50 value could only be determined for Nsp5-3.