Hit identification of novel small molecules interfering with MALAT1 triplex by a structure‐based virtual screening

Nowadays, RNA is an attractive target for the design of new small molecules with different pharmacological activities. Among several RNA molecules, long noncoding RNAs (lncRNAs) are extensively reported to be involved in cancer pathogenesis. In particular, the overexpression of lncRNA metastasis‐associated lung adenocarcinoma transcript 1 (MALAT1) plays an important role in the development of multiple myeloma (MM). Starting from the crystallographic structure of the triple‐helical stability element at the 3'‐end of MALAT1, we performed a structure‐based virtual screening of a large commercial database, previously filtered according to the drug‐like properties. After a thermodynamic analysis, we selected five compounds for the in vitro assays. Compound M5, characterized by a diazaindene scaffold, emerged as the most promising molecule enabling the destabilization of the MALAT1 triplex structure and antiproliferative activity on in vitro models of MM. M5 is proposed as a lead compound to be further optimized for improving its affinity toward MALAT1.

Administration (FDA) for spinal muscular atrophy, is a successful example of drug discovery for small molecules targeting RNAs. [5,6] Among all RNA classes, lncRNAs have recently emerged as key controllers of gene expression and potential therapeutic targets. [7][8][9] These molecules show developmental- and tissue-specific expression and are able to regulate many cellular processes, including the expression of oncogenes. [10] Structured RNA elements are constantly being identified in these transcripts, so the use of small molecules to probe the specific functions and interactions of these domains is a promising way to better characterize noncoding RNA (ncRNA) biology. The rational design of small molecules interfering with lncRNAs requires three main steps: (i) elucidation of the mechanism of action of the lncRNA, (ii) analysis of the related functional structural pocket, and (iii) search for molecules that can be accommodated in specific binding sites within this pocket. [2,11,12] While small interfering RNA (siRNA), antisense oligonucleotide (ASO), and clustered regularly
Metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) is a well-known nuclear lncRNA of ~8 kb. It is highly conserved and has mostly been studied as a cancer biomarker because of its upregulation and its role in promoting the hallmarks of cancer. [13] It is involved in several physiological processes, such as alternative splicing, nuclear organization, and epigenetic modulation of gene expression. [14] MALAT1 is overexpressed in a wide variety of both hematological and solid tumors [15] and plays a key role in the maintenance of the undifferentiated status of hematopoietic stem cells [16] and in B-cell activation. [17] In particular, high expression of MALAT1 has been associated with disease onset and with the progression from normal to malignant plasma cells (PCs), leading to multiple myeloma. [18] Accordingly, the use of ASOs [19][20][21] or gene knockdown strategies on in vitro cellular models [21][22][23] demonstrated promising antiproliferative effects due to MALAT1 depletion. Moreover, MALAT1 can be considered an excellent candidate as an antitumor target, since its knockdown was found to reduce tumor growth and metastasis in preclinical models of various types of cancer with minimal side effects in normal tissues. [24] Concerning its three-dimensional (3D) structure, four domains have been experimentally characterized: (i) two stem hairpin regions that promote cell cycle progression through the G2/M phase by binding to the heterogeneous nuclear ribonucleoprotein C (HNRPC) [25]; (ii) a transfer RNA (tRNA)-like structure involved in the 3′-end processing of MALAT1, leading to the production of MALAT1-associated small cytoplasmic RNA (mascRNA), a small noncoding RNA [25]; and (iii) a stabilizing 3′-end triple-helical domain, which prevents RNA degradation and also acts as a nuclear retention element (ENE), similarly to Kaposi's sarcoma-associated herpesvirus polyadenylated nuclear RNA. [26] Currently, the triple-helical domain is the only part of MALAT1 with a crystal structure deposited in the Protein Data Bank (PDB; code 4PLX). [26] In particular, it consists of an A-rich tract, characterized by a bipartite triple helix containing stacks of five and four U·A-U triplets separated by a C+·G-C triplet and a C-G doublet, extended by two A-minor interactions. In vivo decay assays showed that this blunt-ended triple helix is important for inhibiting rapid nuclear decay of the transcript.
Recently, some studies have supported the druggability of the MALAT1 triple helix. In particular, Hargrove and coworkers demonstrated, for the first time, that the MALAT1 triple helix can be selectively targeted with small molecules by developing a library of compounds based on the RNA-binding scaffold diphenylfuran (DPF). [27,28] Through a high-throughput small-molecule microarray, Abulwerdi et al. identified a specific and bioactive compound able to decrease the cellular levels of MALAT1 in a mammary tumor organoid model. [29] Furthermore, starting from the analysis of the different possible targetable pockets, Khanna and coworkers found a new benzenesulfonamide compound with weak affinity but high specificity for MALAT1. [30] However, to our knowledge, no small molecule interfering with MALAT1 has reached an advanced phase of preclinical development with a view to clinical studies. Here, we performed a high-throughput in silico screening of a large commercial database using a structure-based virtual screening (SBVS) approach, with the aim of identifying promising MALAT1 binders with anticancer activity. We then performed in vitro assays to confirm their structure-function relationships and actual pharmacological effects, and we selected in vitro models of MM to assess the effect of the most promising compound on tumor cell viability. This work paves the way to understanding, at the atomistic level, the molecular features needed to obtain MALAT1 binders through a virtual screening approach.

| Computational studies
In this study, we used an SBVS approach to screen a multiconformational database from the Asinex vendor, comprising ∼103 thousand compounds, against the 3D structure of the MALAT1 triple helix. The virtual screening protocol applied in this work is summarized in Figure 1.
First, to save computational time, the database was reduced in size by using QikProp, [31] which predicts a wide variety of pharmaceutically relevant properties, to retain only candidates with suitable absorption, distribution, metabolism, and excretion (ADME) properties. According to the popular empirical Lipinski's rule of five (RO5), [32] we obtained 71,943 compounds.
Then, we further filtered these compounds by removing 495 molecules that were identified as Pan-Assay Interference Compounds (PAINS). The resulting 71,448 compounds were submitted to docking simulations against the 3D structure of the MALAT1 triplex and evaluated on the basis of their G-score value. In particular, we focused our screening on a specific portion of MALAT1 characterized by a C+·G-C triplet and an adjacent C-G doublet that induces a helical reset. Since the C-G doublet stabilizes the U·A-U triplet in vivo, we targeted this site to find compounds able to destabilize it and, with it, the entire triplex structure. The PDB model 4PLX does not contain a co-crystallized ligand, so after docking simulations we selected the top 2000 molecules, with G-score values ranging from −11.40 to −8.17 kcal/mol, and submitted them to a thermodynamic analysis.
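The filtering funnel described above (RO5, PAINS removal, ranking by G-score) can be illustrated with a minimal Python sketch. It is not the QikProp/Glide workflow itself: the `Compound` record, the precomputed `is_pains` flag and `g_score` field, the example values, and the choice of allowing zero RO5 violations are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    # Hypothetical record; in practice these values would come from
    # property-prediction and docking software.
    name: str
    mol_weight: float   # Da
    logp: float         # octanol/water partition coefficient
    h_donors: int
    h_acceptors: int
    is_pains: bool      # assumed to be flagged upstream
    g_score: float      # kcal/mol, more negative = better predicted binding


def passes_ro5(c: Compound) -> bool:
    """Strict Lipinski rule of five (assumption: no violation tolerated)."""
    return (c.mol_weight <= 500 and c.logp <= 5
            and c.h_donors <= 5 and c.h_acceptors <= 10)


def screening_funnel(library, top_n):
    """Keep drug-like, non-PAINS compounds and return the top_n by G-score."""
    drug_like = [c for c in library if passes_ro5(c)]
    clean = [c for c in drug_like if not c.is_pains]
    ranked = sorted(clean, key=lambda c: c.g_score)  # most negative first
    return ranked[:top_n]


# Made-up example library (values are illustrative only).
library = [
    Compound("A", 350.0, 2.1, 2, 5, False, -9.2),
    Compound("B", 620.0, 3.0, 1, 6, False, -11.0),  # fails RO5 (weight)
    Compound("C", 410.0, 4.2, 3, 7, True, -10.5),   # flagged as PAINS
    Compound("D", 290.0, 1.0, 1, 4, False, -8.4),
    Compound("E", 480.0, 4.9, 5, 10, False, -10.1),
]
hits = screening_funnel(library, top_n=2)
print([c.name for c in hits])
```

In the study the G-score is only available after docking the filtered library; here it is stored on the record purely to keep the sketch short.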
Thus, by applying the Multi-Ligand Bimolecular Association with Energetics (eMBrAcE) tool, we estimated the free energy (ΔG (calc) ) of the best docking complex for each previously selected compound.
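For orientation, the energy-difference calculation commonly associated with eMBrAcE can be written as below. This is the form generally reported for this tool, given here as an assumption rather than as the exact expression used by the authors; the Methods refer to the same quantity as ΔE, and the subscripts are illustrative labels.

```latex
\Delta E = E_{\mathrm{complex}} - \left( E_{\mathrm{ligand}} + E_{\mathrm{receptor}} \right)
```

A more negative value indicates a more favorable predicted association between the ligand and the receptor.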
After this calculation, the ligand-receptor complex was relaxed, allowing MALAT1 to adjust its conformation to better accommodate the docked small molecule, according to the process known as induced fit. Both energetic and geometric evaluations were carried out for each molecule. Specifically, the energetic evaluation was used to identify the compounds able to bind to the MALAT1 triple helix with the best thermodynamic profile. Therefore, we selected the best 100 compounds, with ΔG(calc) values ranging between −72.79 and −50.14 kcal/mol. The geometric evaluation, in turn, allowed us to identify the compounds inducing the largest structural alteration of the triplex.
Indeed, considering that compounds interfering with MALAT1 have to destabilize its 3D structure, we calculated the root-mean-square deviation (RMSd) value between the crystallographic model and the minimized one complexed to each compound, by taking into account only the heavy atoms of the two triplexes. Thus, the first 50 molecules with the highest RMSd values were submitted to a careful visual inspection process. Finally, after checking their commercial availability, five compounds were purchased (Table 1).
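The geometric criterion and the two-stage selection can be sketched as follows. This is a simplified illustration, not the software used in the study: it assumes that the two structures list the same atoms in the same order and are already in the same reference frame (no superposition is performed), and the record fields `dg` and `rmsd` and all example values are hypothetical.

```python
import math


def heavy_atom_rmsd(reference, model):
    """RMSD over non-hydrogen atoms.

    Each structure is a list of (element, x, y, z) tuples in matching order.
    Assumption: both structures share the same frame, so no fitting is done.
    """
    squared = [
        (xr - xm) ** 2 + (yr - ym) ** 2 + (zr - zm) ** 2
        for (el, xr, yr, zr), (_, xm, ym, zm) in zip(reference, model)
        if el != "H"
    ]
    return math.sqrt(sum(squared) / len(squared))


def select_hits(records, n_energy, n_rmsd):
    """Keep the n_energy best energies, then the n_rmsd largest RMSD values."""
    by_energy = sorted(records, key=lambda r: r["dg"])[:n_energy]
    return sorted(by_energy, key=lambda r: r["rmsd"], reverse=True)[:n_rmsd]


# Toy coordinates: heavy atoms shifted by 1 Å along z, hydrogen ignored.
crystal = [("C", 0.0, 0.0, 0.0), ("N", 1.0, 0.0, 0.0), ("H", 5.0, 5.0, 5.0)]
relaxed = [("C", 0.0, 0.0, 1.0), ("N", 1.0, 0.0, 1.0), ("H", 9.0, 9.0, 9.0)]
print(heavy_atom_rmsd(crystal, relaxed))

# Hypothetical compounds with made-up energies (kcal/mol) and RMSD values (Å).
records = [
    {"name": "cpd_a", "dg": -70.0, "rmsd": 0.5},
    {"name": "cpd_b", "dg": -65.0, "rmsd": 1.2},
    {"name": "cpd_c", "dg": -40.0, "rmsd": 3.0},
    {"name": "cpd_d", "dg": -60.0, "rmsd": 0.9},
]
selected = select_hits(records, n_energy=3, n_rmsd=2)
print([r["name"] for r in selected])
```

Applying the energy cut first means that a compound with a large RMSD but a poor energy (like `cpd_c` here) is discarded before the geometric ranking, which mirrors the order of the steps in the text.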

| UV-thermal melting studies
The ability of the selected compounds to bind to MALAT1 triple helix was experimentally evaluated by performing UV-thermal melting studies. Thermal melting is widely used to determine the stability of nucleic acid secondary structures and to evaluate the effects of putative ligands on their stability. [33] Raising the temperature of a solution containing a structured nucleic acid leads to dissociation into single strand(s), which gives rise to a change in absorbance properties. The temperature at the midpoint of this transition, commonly defined as the melting temperature (T m ), provides information on the structure stability, and changes in this value (ΔT m ) can be used to compare the effects of drug binding. An increase in the T m of nucleic acid structure in the presence of a ligand denotes the binding of the ligand to the folded form of nucleic acid and an overall stabilization of its secondary structure. Conversely, a decrease in the T m indicates that the compound is shifting the equilibrium between the folded and unfolded forms toward the latter, that is, it induces a destabilization of the structure. UV-thermal melting experiments are usually performed by recording absorbance at 260 nm as a function of temperature. As the nucleic acid structure unfolds, the bases become unstacked and exposed to solvent, thereby producing an increase in absorbance.
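As an illustration of how a T m and a ΔT m can be extracted from an absorbance-versus-temperature profile, the sketch below takes the temperature of maximum slope (first-derivative maximum) of a single transition. The curve is synthetic, the function names are hypothetical, and the ligand-bound T m used in the example is a made-up number, not a measured value from this work.

```python
import math


def melting_temperature(temps, absorbances):
    """Estimate T_m as the temperature of maximum dA/dT (central differences)."""
    best_index, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = (absorbances[i + 1] - absorbances[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_index, best_slope = i, slope
    return temps[best_index]


def delta_tm(tm_with_ligand, tm_free):
    """Negative values indicate destabilization, positive values stabilization."""
    return tm_with_ligand - tm_free


# Synthetic single-transition curve: sigmoid centered at 58.5 °C (assumed shape).
temps = [20.0 + 0.5 * i for i in range(151)]  # 20 to 95 °C in 0.5 °C steps
curve = [0.5 + 0.2 / (1.0 + math.exp(-(t - 58.5) / 2.0)) for t in temps]

tm_free = melting_temperature(temps, curve)
print(tm_free)
print(delta_tm(57.0, tm_free))  # 57.0 °C is a hypothetical ligand-bound T_m
```

A biphasic profile such as that of a triplex would need the same analysis applied separately to each transition, for example by restricting the temperature window.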
As expected for a triplex-forming oligonucleotide, [34] the melting curve of MALAT1 showed two sigmoidal transitions corresponding to dissociation of the third strand (T m1 = 58.5°C) followed by melting of the duplex (T m2 = 78.5°C). UV-melting curves for MALAT1 in the presence of the selected compounds indicated that they exert different effects on the RNA structural stability (Supporting Information: Figure S1 and Table 2). Interestingly, compound M5 slightly decreased triplex stability, thus suggesting that its interaction with the triplex could likely induce local structural rearrangements resulting in a structural destabilization.
Conversely, compounds M3 and M4 slightly increased the T m of the triplex, thus indicating that their binding to the RNA molecule stabilizes the triple helix structure. The remaining compounds M1 and M2 did not produce any significant change in the triplex stability (ΔT m < 1.0°C), suggesting that they may weakly and/or nonspecifically bind to the triplex-forming region of RNA or may not bind it at all. Regarding the effects on the duplex-to-single-strand transition, most of the compounds showed the ability to slightly increase the thermal stability of the duplex. The highest effects were observed for M3 and M5.
FIGURE 1 Virtual screening workflow. RO5 and SBVS indicate, respectively, Lipinski's rule of five and the structure-based virtual screening approach.

| Fluorescence intercalator displacement (FID) assay
To gain insights into the affinity of M5 toward MALAT1, FID experiments were performed. Basically, this experiment is based on the competitive displacement of a light-up fluorescent probe, the thiazole orange (TO), from the folded nucleic acid upon the addition of increasing amounts of a candidate ligand. [35] TO is almost nonfluorescent when free in solution, while it displays an increased fluorescence when bound to double-, triple-, or quadruple-stranded structures of DNA and RNA. [36][37][38] Indeed, TO can intercalate or stack between base pairs and inside grooves of nucleic acids, leading to constricted torsional motion within the dye and

| Binding mode analysis of the best hit
Based on the obtained biophysical results, we analyzed the binding mode of M5, the only compound capable of destabilizing the MALAT1
TABLE 2 Melting temperature values of the MALAT1 triple helix RNA in the absence and presence of the investigated compounds (1 molar equiv).
TABLE 1 Vendor codes, 2D chemical structures, Glide XP scores (XP G-score), binding free energy (ΔG(calc)), and RMSd values of the five best hits complexed to MALAT1.

| Biology
We  tion. [39] Indeed, nucleic acids are very flexible and often lack spatial structure information from X-ray crystallization, nuclear magnetic resonance (NMR), or other structural methods. Thus, small molecules targeting these structures have become a major and challenging topic in recent years.
In contrast to the previously reported virtual screening work, [30] we defined the C-G doublet between the two triplexes as the binding site to investigate in our simulations, and we discovered a new small molecule, compound M5, able to bind MALAT1 and interfere with its oncogenic activity. Compared with the ligands already reported in the literature as MALAT1 binders, [27][28][29][30] compound M5 possesses a diazaindene scaffold never reported before. In addition, its binding mode seems to differ from that of other known MALAT1 ligands. Indeed, although it binds Triplex I, like the ligand reported by Le Grice and coworkers, [29] the main interactions of our compound involve residues U7 and U8. Moreover, its potential drug-like profile makes this compound an excellent starting point for designing new derivatives with higher affinity for the triple helix. Finally, we investigated the antiproliferative activity of compound M5 on in vitro models of MM, which is still an incurable PC neoplasia. Specifically, the crucial role of MALAT1, as well as the anticancer activity obtained through its inhibition via oligonucleotide-based therapies, has been established, [21] but to our knowledge no studies on small-molecule-based targeting are available.

We found that M5 reduces the proliferation of MM cells in a dose- and time-dependent manner. Importantly, we also observed a significant impairment of cell viability in a cellular model of resistance to Bortezomib, which is one of the main drugs currently included in clinical regimens. Therefore, treatment with the selected compound could potentially be effective in both sensitive and resistant/refractory MM.
The targeting of lncRNAs in human cancers provides an advantage in terms of selectivity and toxicity, owing to their tissue specificity and low abundance with respect to microRNA (miRNA) and messenger RNA (mRNA), as well as to the possibility of modulating target gene expression more specifically than with other approaches, such as epigenetic therapies. Moreover, small molecules can be proposed as a promising approach on par with ASOs, but with the possibility of more easily overcoming some limits of oligonucleotide-based therapeutics. In particular, for small molecules it is possible to improve absorption, distribution, oral bioavailability, and stability in body fluids, which play a relevant role in terms of interindividual variability. [40,41] In light of our findings, we can conclude that the applied workflow was successful in retrieving new promising drug-like scaffolds, different from the known MALAT1 binders and worthy of further investigation. In particular, starting from M5, future lead optimization and SAR studies will be carried out to enhance its binding affinity and MALAT1 destabilization.

| Computational methods
In this study, we used a multiconformational database from the Asinex vendor. The LigPrep module, [42] as implemented in the Schrödinger Suite, [43] was employed to calculate the physiologically relevant ionization state (pH = 7.4) of all compounds and to determine the stereochemistry of chiral centers. All molecules were energy minimized using OPLS2005 as the force field. [44] Afterwards, the database was reduced in size by applying different physical and chemical filters. First, the popular empirical Lipinski's rule of five [45] was applied to select only molecules with drug-like properties. All molecules were then characterized for their ADME profile by means of QikProp, [31] and those without appropriate pharmacokinetic properties were removed. Subsequently, we applied the ZINC15 algorithm [46] to remove all Pan-Assay Interference Compounds (PAINS) from the screening libraries. [47] The final database was used for the subsequent docking simulations.
So far, the only known three-dimensional structure of the lncRNA MALAT1 is the crystallized triple helical domain (PDB ID: 4PLX). [48] Thus, the 3.1-Å-resolution crystal structure of the human MALAT1 ENE and A-rich tract was prepared and refined using the Protein Preparation Wizard tool implemented in Maestro ver. 9.7, [49] to correct common structural problems and to create a reliable all-atom triplex model. Moreover, hydrogen atoms were added and the geometry of all the hetero groups was corrected.
The screening was performed by means of a structure-based approach by applying docking simulations. [50] The energy grid was built by centering the docking box on the central C+·G-C triplet and C-G doublet in the MALAT1 ENE+A and setting its outer box size to 35 × 35 × 35 Å.
The scaling factor of the van der Waals radii was set to 1.0. Thus, the Glide software ver. 7.8 was used [51] and the binding affinity for each filtered compound of the Asinex database against MALAT1 was predicted using Glide Extra Precision (XP) protocol, generating 10 poses for each ligand. Specifically, docking calculations were conducted by considering ligands as flexible structures and the receptor as rigid.
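The geometry of such a docking box can be pictured with a short sketch that centers a cubic box on the centroid of the binding-site atoms. This is not the Glide grid-generation interface: the function names and the coordinates are hypothetical, and only the 35 Å edge length is taken from the text.

```python
def centroid(coords):
    """Arithmetic mean position of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def cubic_box(center, edge):
    """Return ((xmin, xmax), (ymin, ymax), (zmin, zmax)) for a cube of given edge."""
    half = edge / 2.0
    return tuple((c - half, c + half) for c in center)


def inside(point, box):
    """True if the point lies within the box on all three axes."""
    return all(lo <= p <= hi for p, (lo, hi) in zip(point, box))


# Made-up binding-site atom coordinates in Å (not taken from 4PLX).
site_atoms = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]
center = centroid(site_atoms)
box = cubic_box(center, edge=35.0)  # outer box edge from the Methods

print(center)
print(box)
print(inside((10.0, 10.0, 10.0), box))
```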
For each compound, the best docking pose was selected on the basis of its G-score value, which expresses the theoretical interaction affinity between the compound and the receptor. Finally, a thermodynamic analysis was performed by applying the Multi-Ligand Bimolecular Association with Energetics (eMBrAcE) tool, [52] thus calculating the binding energy (ΔE) of the compounds with the best theoretical affinity. For the best complex of each ligand, we therefore obtained the molecular mechanics minimization and the corresponding ΔE value, which is expressed in the following equation:
Scientific) at 37°C in a 5% CO2 atmosphere as previously described. [53,54]

| Cell viability assay
Cell viability was evaluated as previously described [55] by CellTiter-Glo Luminescent Cell Viability Assay (Promega), according to the manufacturer's instructions. Briefly, cells were seeded at 250,000/ mL in a 24-well plate and treated for 24-48 h with DMSO (vehicle) or increasing concentrations of the compound M5 (2.5, 5, and 10 µM). The luminescence was recorded by using a Glomax multidetection system (Promega). PBMCs were isolated from healthy donors who provided informed consent according to our institutional bioethical requirements and then cultured as previously reported. [55]
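Luminescence readouts of this kind are usually expressed relative to the vehicle control. The sketch below shows one common normalization, assuming that viability is reported as a percentage of the mean DMSO signal; the source does not state the exact normalization used, and the function name and readings are hypothetical.

```python
from statistics import mean


def percent_viability(treated_signals, vehicle_signals):
    """Express each treated-well signal as a percentage of the mean vehicle signal."""
    reference = mean(vehicle_signals)
    return [100.0 * signal / reference for signal in treated_signals]


# Made-up luminescence readings (arbitrary units), for illustration only.
vehicle = [1000, 1100, 900]   # DMSO-treated wells
treated = [500, 250]          # wells treated with the compound

viability = percent_viability(treated, vehicle)
print(viability)
```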

| Statistical analysis
Each experiment was performed at least three times and values are reported as means ± SD. Statistical evaluations were performed using Student's t test in GraphPad software (www.graphpad.com).
GraphPad Prism version 6.0 was used to obtain graphs. p-values <0.05 were accepted as statistically significant.
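To make the reported statistics concrete, the following sketch computes a mean ± SD and the t statistic of an unpaired, equal-variance Student's t test using only the Python standard library. The unpaired two-sample form is an assumption (the text does not specify the variant), the data are made up, and the p-value is left out because it requires the t distribution from a statistics package.

```python
import math
from statistics import mean, stdev, variance


def mean_sd(values):
    """Return (mean, sample standard deviation)."""
    return mean(values), stdev(values)


def student_t(sample_a, sample_b):
    """Unpaired, equal-variance Student's t statistic and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Made-up triplicate measurements, for illustration only.
control = [4, 5, 6]
treated = [1, 2, 3]

print(mean_sd(treated))
t_value, dof = student_t(treated, control)
print(t_value, dof)
```

The resulting t statistic and degrees of freedom can then be compared against the t distribution to decide whether p < 0.05.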

This research was funded by the Italian Association for Cancer Research (AIRC) research project "Small molecule-based targeting of lncRNAs 3D structure: a translational platform for the treatment of multiple myeloma" (Code 21588). The authors also acknowledge the PRIN 2017 research project "Novel anticancer agents endowed with multitargeting mechanism of action", grant number 201744BN5T.

CONFLICTS OF INTEREST STATEMENT
The authors declare no conflicts of interest.