Beam Search for Automated Design and Scoring of Novel ROR Ligands with Machine Intelligence

Abstract
Chemical language models enable de novo drug design without the requirement for explicit molecular construction rules. While such models have been applied to generate novel compounds with desired bioactivity, the actual prioritization and selection of the most promising computational designs remains challenging. Herein, we leveraged the probabilities learnt by chemical language models with the beam search algorithm as a model‐intrinsic technique for automated molecule design and scoring. Prospective application of this method yielded novel inverse agonists of retinoic acid receptor‐related orphan receptors (RORs). Each design was synthesizable in three reaction steps and presented low‐micromolar to nanomolar potency towards RORγ. This model‐intrinsic sampling technique eliminates the strict need for external compound scoring functions, thereby further extending the applicability of generative artificial intelligence to data‐driven drug discovery.


Introduction
Generative deep learning, [1,2] that is, a class of machine learning models able to generate new data, can be applied to computationally design pharmacologically active compounds de novo. [3-5] Deep learning-based molecular design algorithms can extract high-level molecular features from "raw" molecular representations, [6-10] such as molecular graphs and the Simplified Molecular Input Line Entry System (SMILES, Figure 1a), [11] potentially allowing them to access unexplored regions of chemical space. [12] Previous studies showed that chemical language models (CLMs), [13,14] in particular generative deep learning models trained on SMILES strings, can generate novel molecules with experimentally validated bioactivity. [9,15,16] CLMs have shown the ability to learn focused chemical features from small collections of template molecules by means of transfer learning, that is, a method to reuse previously learned knowledge on a new task for which the available data are scarce. [15,17,18] Transfer learning is performed in two steps. In the first step, a model is trained on a large amount of data that relate to the task to be performed ("pre-training"). In the case of CLMs, this is usually done using large collections of molecules (e.g., in the order of 200 000 to 1 000 000 [9,16,17]). Pre-training enables the generative model to capture a) the SMILES "syntax" (i.e., how alphanumeric characters should be assembled to generate strings that correspond to valid molecules, Figure 1) and b) the properties of the pre-training dataset, such as physicochemical features and synthesizability of the molecules in the dataset. In the second step, the pre-trained CLM is further trained ("fine-tuned") with a smaller set of task-specific molecules. [13,19,20]
During this transfer learning process, the CLM is biased towards the chemical space of interest, that is, molecules with desired biological and physicochemical properties. This ability to learn in a low-data regime ("few-shot" learning [21,22]) renders CLMs particularly useful for application to biological targets for which only few ligands are known. The fully trained CLM can be used to generate new molecules in the form of SMILES strings. Such data generation is performed by predicting one character of a SMILES string ("token") at a time, based on all the previous tokens. Importantly, this process does not require handcrafted molecule design rules, as CLMs learn solely from the SMILES strings used for training.
Previous prospective applications of CLMs for de novo molecule generation used so-called "temperature sampling" to generate large virtual molecular libraries. [9,13,15] Temperature sampling generates new SMILES strings by adding tokens to the (growing) string according to the probabilities learned by the CLM, wherein the most likely token at a given position will be sampled more often (Figure 1b). However, the generated SMILES strings might not always be "chemically meaningful" (invalid strings), or they might not match the feature distribution of the training data because of the random component of temperature sampling. Therefore, additional methods are usually needed to select the most promising designs from the virtual molecular libraries, e.g., based on the similarity to known bioactive molecules, external activity prediction, or reward functions. [9,13,15,23] Here, we use the beam search algorithm as a model-intrinsic alternative to temperature sampling. This method enables the CLM to simultaneously generate and prioritize the molecular designs in an automated fashion, without employing additional selection methods. [24,25] Beam search scoring was successfully validated in a prospective application aiming to generate new retinoic acid receptor-related orphan receptor (ROR) [26] ligands from scratch.
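The role of the temperature parameter can be illustrated with a minimal sketch. The text above does not give the formula, so the sketch assumes the common formulation in which the model's unnormalized token scores ("logits") are divided by a temperature T before a softmax; the function names and the three-token example are hypothetical and not taken from the published code.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn unnormalized token scores into probabilities at a given temperature.

    Assumption: probabilities are softmax(logits / T). T < 1 sharpens the
    distribution, T > 1 flattens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index at random according to the tempered probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical scores for three candidate next tokens.
example_logits = [2.0, 1.0, 0.0]
for t in (0.1, 1.0, 10.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(example_logits, t)])
```

Because every token with non-zero probability can be drawn, repeated sampling yields different strings, which is the random component referred to above.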
RORs were chosen as molecular targets because these receptor proteins are an attractive but not extensively studied family of potential drug targets. They constitute a family of ligand-activated transcription factors that mainly act as monomers and are involved in the circadian control of energy homeostasis [27,28] and immune system regulation, [29,30] among other functions. RORs hold promising pharmacological potential for various indications, specifically for autoimmune diseases. [29,30] No ROR ligand has reached drug approval to date, partially owing to compound-related issues such as poor aqueous solubility, lack of selectivity, and clinical safety concerns. [29,31,32]

Results and Discussion

Chemical Language Model and Beam Search Sampling for De Novo Design
We explored the beam search algorithm [33] to generate molecules from a CLM as a potential alternative to temperature sampling combined with an external ranking method. Given the probabilities learnt by a CLM, a vast number of SMILES strings could in theory be sampled. As it is computationally not feasible to sample all outputs, a heuristic method such as beam search can be used to find likely outputs. Here, our underlying hypothesis was that the probability of generating a certain SMILES string correlates with the quality of the corresponding molecule regarding the implicit design objective as represented in the fine-tuning set (e.g., desired bioactivity, physicochemical properties). During molecule generation by beam search sampling, the algorithm progressively adds tokens to a SMILES string while keeping track of the k most likely SMILES string(s). To add a new token, the algorithm computes the conditional probability of each possible token given the tokens in the existing string and retains the k most likely extensions (Figure 1c). These k extensions are selected with a scoring function ("beam search score"), which is computed as the product of the probabilities of each token in the string (Figure 1c). This process is repeated until the SMILES string is completed (i.e., the "end-of-string" token is added) or a predefined maximal string length is reached. Thus, beam search can be used to generate highly probable molecules, as computed by (i) the underlying model and (ii) the beam search score. The beam search score allows the de novo designs to be ranked according to the probability of their SMILES tokens.
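The procedure described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the published implementation: the hand-written lookup table `toy_model` stands in for the fine-tuned CLM, the `$` end-of-string token is hypothetical, and scores are accumulated as sums of log-probabilities, which is equivalent to the product of token probabilities but numerically safer. In this simple variant a completed string leaves the beam, so more than k strings may finish overall; only the k best are returned.

```python
import math

END = "$"  # hypothetical end-of-string token

def toy_model(prefix):
    """Stand-in for a CLM: next-token probabilities given the tokens so far."""
    table = {
        (): {"C": 0.6, "N": 0.4},
        ("C",): {"C": 0.5, "O": 0.3, END: 0.2},
        ("N",): {"C": 0.7, END: 0.3},
    }
    # Any prefix not listed above simply terminates the string.
    return table.get(prefix, {END: 1.0})

def beam_search(next_token_probs, k=3, max_len=10):
    """Return up to k completed strings with their beam search scores."""
    beams = [((), 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            for token, p in next_token_probs(tokens).items():
                if p > 0.0:
                    candidates.append((tokens + (token,), logp + math.log(p)))
        # Keep only the k most likely extensions across all current beams.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, logp in candidates[:k]:
            if tokens[-1] == END:
                # Score = product of token probabilities = exp(sum of logs).
                finished.append(("".join(tokens[:-1]), math.exp(logp)))
            else:
                beams.append((tokens, logp))
        if not beams:
            break
    finished.sort(key=lambda f: f[1], reverse=True)
    return finished[:k]

for smiles, score in beam_search(toy_model, k=2):
    print(smiles, round(score, 3))
```

With a beam width of k = 1 the search reduces to greedy decoding; larger k trades computation for a broader view of the high-probability strings.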
As a framework to probe beam search sampling, we employed a recently published CLM based on a recurrent neural network with long short-term memory (LSTM) cells, which are suited for sequence modeling. [34] The CLM was trained with the SMILES strings of 365 063 molecules from ChEMBL [35] to iteratively predict the next token of each SMILES string given the preceding tokens (Figure 1b). The training procedure was carried out over ten epochs, meaning that each molecule used for training was seen by the CLM ten times. This pre-trained CLM was then fine-tuned by transfer learning using sets of known ROR ligands (Figure S1, Table S1) to bias it towards the design objective, namely the generation of new molecules with bioactivity on RORs. Open-source code for the CLM and the beam search algorithm, and the data used in this study, are available at https://github.com/ETHmodlab/molecular_design_with_beam_search.
[...] learning from purely synthetic molecules, because of the overall higher structural diversity, greater three-dimensionality, and often superior selectivity of bioactive natural products. [38,39] We aimed to obtain de novo designs possessing three properties: (i) natural product-inspired chemical structure, (ii) synthesizability, and (iii) bioactivity on RORγ. Aiming to fulfil all three objectives during transfer learning, the CLM previously pre-trained on bioactive molecules from ChEMBL [17] was fine-tuned on one synthetic and four natural product RORγ modulators described in the literature [30] (Figure S1). Beam search sampling from the fine-tuned model was started after the fifth epoch of fine-tuning, to ensure that the CLM had sufficiently captured the molecular features of the small fine-tuning set.
All valid SMILES strings generated between epochs 5 and 16 (last fine-tuning epoch) were ranked by beam search scoring. The top five designs according to the beam search score (Figure 2a) were deemed synthetically inaccessible by medicinal chemists. This was further highlighted by the predictions of a machine learning algorithm for retrosynthetic analysis (IBM RXN), [40] which did not find a synthetic route for any of these molecules. Thus, while the CLM captured natural product likeness, the model failed to meet the generic design criterion of synthesizability. These findings point to a benefit of beam search sampling in revealing the most likely CLM molecules to assess the success of fine-tuning in terms of the design objectives.
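The pooling and ranking step can be sketched as follows. Several details are assumptions made for illustration only: the data layout (a mapping from epoch to `(SMILES, score)` pairs), the function names, the choice to keep the best score when the same string appears in several epochs, and the toy validity check based on balanced parentheses. In practice, validity would be checked with a cheminformatics toolkit.

```python
def toy_is_valid(smiles):
    """Crude stand-in for a real SMILES validity check (assumption)."""
    return bool(smiles) and smiles.count("(") == smiles.count(")")

def rank_designs(samples_by_epoch, is_valid, first_epoch=5, last_epoch=16, top_n=5):
    """Pool valid designs from the chosen epochs and rank them by score.

    samples_by_epoch maps an epoch number to a list of (smiles, score) pairs.
    Duplicates across epochs are merged, keeping the highest score (assumption).
    """
    best = {}
    for epoch, samples in samples_by_epoch.items():
        if not first_epoch <= epoch <= last_epoch:
            continue  # ignore epochs outside the sampling window
        for smiles, score in samples:
            if not is_valid(smiles):
                continue
            if score > best.get(smiles, (float("-inf"), None))[0]:
                best[smiles] = (score, epoch)
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)
    return [(smiles, score, epoch) for smiles, (score, epoch) in ranked[:top_n]]

# Made-up scores for illustration; these are not results from the study.
toy_samples = {
    4: [("CCO", 0.9)],                  # before the sampling window
    5: [("CCN", 0.2), ("C(C", 0.8)],    # "C(C" fails the toy validity check
    6: [("CCN", 0.3), ("CCC", 0.25)],
}
print(rank_designs(toy_samples, toy_is_valid, top_n=2))
```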
Aiming to improve upon these results, we performed a second experiment in which we applied a two-step fine-tuning strategy. First, the pre-trained model was fine-tuned for 20 epochs on 255 synthetic RORγ ligands from the US patent subset of the Protein Data Bank [41] (Table S1) to capture both bioactivity and synthesizability. Then, the model was fine-tuned with four natural product RORγ modulators [30] (Figure S1) for 16 epochs, aiming to bias the model towards natural-product-likeness. Again, valid SMILES strings generated by beam search sampling between epochs 5 and 16 of the (second) fine-tuning step were considered. With this second approach, the top five sampled molecules (Figure 2b) were synthetically accessible according to IBM RXN, [40] which could propose a synthetic route for each of them. Importantly, the computer-generated molecules possess certain natural product characteristics (Figure 3, Table S2), as indicated by a high fraction of sp3-hybridized carbon atoms (Fsp3). The top five designs have Fsp3 values ranging from 50 % to 75 %. These values are comparable to those computed for the MEGx natural product library (Analyticon Discovery GmbH, rel. 09-01-2018), and exceed the average Fsp3 value of the ChEMBL molecules used for pre-training (51 ± 30 % and 33 ± 20 %, respectively). These results suggested that the two-step fine-tuning procedure complied with the design objectives, and this approach was chosen for prospective application.
We then compared the beam search designs obtained with the chosen computational strategy to known RORγ modulators and to the fine-tuning molecules (Figure 3a,b). Despite favoring only some of the most likely tokens while generating new SMILES strings, and examining only a limited set of possibilities, beam search sampling still explored chemical space beyond the regions that are populated by the fine-tuning compounds (Figure 3a). Compared to the inverse RORγ agonists annotated in ChEMBL (IC50 < 1 μM, Figure 2d), the beam search designs are structurally more diverse in terms of substructure fragments, as represented by Morgan fingerprints. [42] Still, the designs possess a certain degree of similarity to the known active molecules in terms of their three-dimensional shape and partial charge distribution (as represented by the Weighted Holistic Atom Localization and Entity Shape [WHALES] descriptors [43,44]). Apparently, the CLM, in addition to learning the SMILES "syntax", also learned certain "semantic" ligand features that are relevant for binding to macromolecules, such as their molecular shape and partial charge patterns.
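For reference, the Tanimoto similarity used for fingerprint comparisons is the number of shared "on" bits divided by the number of bits set in either fingerprint. The following minimal sketch represents each fingerprint as a set of on-bit indices; the function names, the toy bit sets, and the convention of returning 1.0 for two empty fingerprints are assumptions for illustration. Real Morgan fingerprints would be computed with a cheminformatics toolkit.

```python
from itertools import combinations

def tanimoto_similarity(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 1.0  # convention assumed here for two empty fingerprints
    return len(a & b) / union

def tanimoto_distance(bits_a, bits_b):
    """Tanimoto distance, defined as 1 - similarity."""
    return 1.0 - tanimoto_similarity(bits_a, bits_b)

def mean_pairwise_distance(fingerprints):
    """Average Tanimoto distance over all pairs: a simple diversity measure."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)

# Toy bit sets, not real molecular fingerprints.
fp_design = {1, 2, 3, 4}
fp_known = {3, 4, 5, 6}
print(round(tanimoto_similarity(fp_design, fp_known), 3))
```

A higher mean pairwise distance within a compound set indicates greater fragment-level diversity, which is the sense in which the designs are compared with the known modulators above.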

Prospective Experimental Validation
Three beam search designs were synthesized and characterized in vitro. We selected them based on their beam search score. From the five most likely designs (Figure 2b), we selected molecules 1 and 2, which were ranked first and third. Compound 2 showed the highest Tanimoto similarity (Morgan fingerprints) to a known RORγ modulator (Figure 2b). The scaffolds of both compounds were also prominent among the beam search designs not included in the top five, suggesting a structural preference. The scaffold of 1 also appeared in the design ranked 6th. Molecules ranked 10th and 13th resembled compound 2. Hence, we additionally chose compound 3 of this abundant chemotype from rank 13 for prospective validation. Compounds 1-3 were synthesized according to Scheme 1.
In vitro characterization of compounds 1, 2, and 3 in Gal4-ROR hybrid reporter gene assays confirmed inverse RORγ agonism with micromolar to sub-micromolar IC50 values (Table 1). The top-ranked compound 1 counteracted RORγ activity with an IC50 value of 4.6 μM. It was additionally active on RORα and RORβ, but precise IC50 values could not be determined due to cytotoxicity. Compounds 2 and 3 blocked RORγ activity with IC50 values of 0.37 μM (2) and 0.68 μM (3), respectively. In addition to being inverse RORγ agonists, all three synthesized designs revealed a pronounced preference for the RORγ subtype, with compounds 2 and 3 possessing more than tenfold higher potency on RORγ compared to the related RORα and RORβ isoforms. These results show that the CLM with beam search sampling conserved the bioactivity of the training molecules in the computational designs.

Conclusion
Herein, beam search sampling from CLMs was applied to generate new molecules with desired bioactivity on the ligand-activated transcription factor RORγ. The algorithm automatically generated and scored the designs, without the need for additional prioritization rules. Prospective experimental validation yielded three novel, potent inverse agonists of the nuclear receptor with various degrees of similarity to known RORγ modulators (ranging from 0.28 to 0.71, as captured by Tanimoto similarity on Morgan fingerprints). Apparently, the beam search approach coupled with a CLM conserves structural features necessary for the desired bioactivity but still generates structurally diverse compounds in terms of fragments. This observation corroborates beam search sampling as a technique for the de novo design of bioactive molecules by a CLM. The computational and experimental results suggest two attractive properties of the beam search algorithm. Firstly, by searching for the most likely molecules a CLM can generate, the beam search algorithm probes the suitability of a CLM for the given task. Evaluation of the resulting designs makes it possible to check the compliance of the CLM designs with the design objectives and to assess the success of fine-tuning. This is in contrast to standard temperature sampling, which might lead chemists to consider designs that are not likely according to the model. Secondly, beam search sampling could potentially overcome the need for external compound prioritization. It should be noted, however, that the number of designs that can be sampled by beam search is limited compared to temperature sampling, which can generate a virtually infinite number of chemical structures. The two techniques complement each other, and both offer characteristic advantages. The desired application should guide the choice of either strategy. If corroborated in future prospective studies, beam search sampling may help to further the applicability of CLMs for de novo molecular design in medicinal chemistry.
Figure 3. Characteristics of designs from the CLM with double fine-tuning. a) Stochastic neighbor embedding (t-SNE) [45] projection of the compound sets based on Morgan fragment fingerprints (length = 1024, 2-bond radius, Tanimoto similarity). The location of the two fine-tuning sets, the RORγ modulators annotated in ChEMBL (IC50 < 1 μM, 1091 compounds), and the beam search designs are shown. b) Comparison of the sampled molecular designs with known RORγ modulators (IC50 < 1 μM) in terms of Morgan fragment fingerprints ("Morgan") and three-dimensional shape and electrostatics descriptors (WHALES). The pairwise distance distribution among known RORγ modulators contained in ChEMBL is shown as a reference. For Morgan fingerprints, the Tanimoto distance is shown; for WHALES, the range-scaled Euclidean distance is shown. "Beam (15)" and "Beam (5)" indicate the top 15 and top 5 designs, respectively. Boxplots indicate 25th, 50th, and 75th percentiles (lines), mean values (circle), and outlier boundaries (whiskers, 1.5 × interquartile range).