Machine learning in drug design: Use of artificial intelligence to explore the chemical structure–biological activity relationship

The paper presents a comprehensive overview of the use of artificial intelligence (AI) systems in drug design. Neural networks, which are one of the systems employed in AI, are used to identify chemical structures that can have medical relevance. Successful training of neural networks must be preceded by the acquisition of relevant information about chemical compounds, functional groups, and their possible biological activity. In general, a neural network requires a large set of training data, which must contain information about the chemical structure–biological activity relationship. The data can come from experimental measurements, but can also be generated using appropriate quantum models. In many of the studies presented below, authors showed a significant potential of neural networks to produce generalizations based on even relatively narrow training data. Despite the fact that neural network systems have been known for more than 40 years, it is only recently that they have seen rapid development due to the wider availability of computing power. In recent years, there has been a growing interest in deep learning techniques, bringing network modeling to a new level of abstraction. Deep learning allows combining what seems to be causally distant phenomena and effects, and to associate facts in a way resembling the human mind.


| INTRODUCTION
SIEVE-Score, QML) are open-source applications implemented in Python. Therefore, it is up to the user/researcher to decide whether to run the code using an interpreter that performs calculations in the traditional way on the CPU or on the GPU. GPU vendor Nvidia has pioneered the use of GPU-based interpreters to run SciPy or NumPy packages included in drug design tools.
The current rapid development of ML algorithms, such as deep learning (DL), which allows building complex and flexible models based on data, and the success of these techniques in numerous and sometimes very distant areas, have further contributed to a huge increase in interest in ML in pharmaceutical companies over the last few years. 30 In general, AI is the broadest concept, also applied in health care, 31 and it includes problems related to the training of neural networks. ML is a slightly narrower area, usually considered to encompass algorithms using large, extensive neural networks, but designed for selected problems. Currently, the narrowest classified area of AI systems is DL. The main difference between ML and DL is the level of abstraction of the problems that these techniques cover. For example, an ML algorithm can be used to study the relationship between the structure and a selected physicochemical property of a substance. In turn, a DL algorithm can be used to answer questions about the potential relationship between disease symptoms and the structure of a therapeutically active compound required for treatment, which requires a much more complex neural network to cover a higher level of abstraction. DL algorithms are used to combine causally distant events and effects, a task that has so far only been possible for humans. Although ML and DL are in some respects similar AI approaches, the important distinction between them is the scope and complexity of the problems that they can operate on. As a term, ML is broader and includes DL. DL, however, can handle more complex problems than classical ML because it uses larger and more complex network structures. In general, ML can build classification models only when appropriate features are provided, whereas DL can generate classification features on its own. 
DL is used to solve much more complex problems where the datasets are huge, characterized by high diversity and the data is less structured. A significant advantage of DL over ML is that as learning progresses, the network learns to extract features independently eliminating the need for manual feature extraction. However, it should be noted that DL requires significantly more hardware resources. This review focuses on recent developments in the field of drug design based on AI.

| STAGES OF DRUG DESIGN
Hughes et al. 32 present the typical stages of drug discovery (Figure 2). Exploration of available biomedical data has significantly intensified target identification. Data mining refers to the use of a bioinformatic approach to improve identification, and also to indicate and prioritize potential targets for diseases. 33 Another effective method is the search for potential genetic relationships, for example, between genetic polymorphism and the risk of disease or disease development, or establishing whether a polymorphism is functional. 34 A further method is the use of phenotypic screening to determine disease-relevant targets. Phage display is one of the most potent and extensively used laboratory techniques for studying protein-protein, protein-peptide, and protein-DNA interactions. This approach is based on displaying proteins on the surface of phages, and is used to investigate purpose-built libraries containing millions or even billions of displayed bacteriophages. 35 Validation methods range from in vitro approaches, through the use of whole animal models, to modulation of the desired target in patients. Confidence in the observed results is considerably increased by a multivalidation approach. After the target validation process, complex screening tests are developed during the "hit" identification and lead discovery phase of the drug discovery process. A "hit" molecule is defined as a chemical compound which has the desired activity in a complex screening test and whose activity is confirmed upon retesting. A method called high throughput screening (HTS) involves screening the whole compound library directly against the drug target. Alternatively, a more complex assay arrangement is used, such as a cell-based assay, in which the activity is target-dependent, but which consequently would also need secondary assays to verify the mechanism of action of the investigated substance. 
36 Besides HTS, compound libraries, such as datasets based on the rule of five, 37 are used to determine the "hit" set of molecules for further investigation. Analysis of the "hit" list with computational chemistry algorithms permits refining and selecting hits for further progression based on chemical clusters, understood as ensembles of molecules, and on factors such as ligand performance, which give an idea of how well a compound produces an effect of the required or expected magnitude. This is followed by the "hit to lead" phase, in which an effort is made to obtain more effective and selective compounds from the hit series, such that they exhibit properties sufficient to test their efficacy in any in vivo models available. Normally, this task involves carrying out intensive structure-activity relationship (SAR) studies of each of the main complex structures, with measurements to determine the activity and selectivity of each individual compound. Quantitative structure-activity relationship (QSAR) and SAR models 38-42 are mathematical modeling techniques that can be used to predict physicochemical properties and biological activities of the analyzed chemical compounds based on their known chemical structure. These models are available free of charge or as commercial computer programs. QSAR models must be scientifically valid, and the substance must fall within the applicability domain of the model. The aim of the final phase of drug discovery, lead optimization, is to preserve the promising features and characteristics of lead compounds while improving on flaws in the lead structure. All molecule data collected at this stage will allow the development of the final candidate profile which, along with the toxicological and chemical production and control conditions, will provide the basis for a regulatory application to begin administration to humans.
In the case of drug design, AI is used primarily to assess the potential properties of active substances and, to a lesser extent, to discover new drugs or new uses for already existing drugs (drug repurposing) and synthesis routes. At each of these steps it is necessary to know the structure of the compound, and its interactions. 43

| DRUG DESIGN IN PRACTICE
It was shown by Lipinski, 37 who introduced the rule of five defining molecular properties essential for a drug's pharmacokinetics in the human body, that the chemical space might contain as many as 10^60 compounds when only basic structural rules are taken into consideration. 44 In light of the above, researchers have been creating databases of drug-like chemical structures. The biggest databases are GDB-13, 45 containing approximately 970 million compounds, and GDB-17, 46 containing 166 billion organic small molecules, both freely available to researchers. There are also databases created purely on an ab initio basis using quantum calculations. Maho 47 derived a database containing 1.52 million substances using a density-functional theory (DFT) approach with the B3LYP exchange-correlation functional and the 6-31+G* basis set, which is able to represent electronic wave functions of chemical elements up to argon. Such databases create the potential to research possible pathways for drug design.
One of the key problems that occurs when comparing chemical compounds for selected structural features is the relatively high complexity of searching for and identifying selected chemical substructures. It is assumed 48 that searching for chemical structures belongs to the class of NP-complete (nondeterministic polynomial time complete) computational problems, with a worst-case cost of O(k^N), where N is the number of atoms. This means, in a worst-case scenario, an exponential increase in the duration of calculations with each successive atom added to the investigated structure. In the traditional approach, similarities between substances were described by capturing the structure in topological indices, for example the Wiener, Balaban, or Hosoya index. For demanding applications in drug design, however, topological indices carried too little information about the indexed compound. The solution proposed was then to use structural keys, in which appropriate information about the structure was encoded as bit strings. The disadvantage of structural keys, however, was the requirement to employ a definite and unique agreement on how to code chemical structures, which limited their level of generalization. To overcome this problem, a higher level of abstraction was proposed in the form of molecular fingerprints, which eliminate the need for predefined patterns and consequently generalize better. Similar to cryptographic fingerprints, a given chemical substructure is represented by a numerical hash, that is, a sequence of bits. The specific binary representation of a given substructure is irrelevant, but it is important that each substructure is always represented in the same way. The molecular fingerprint is encoded by means of a specific randomization function, typically proposed by researchers or software developers and also called the hash function. 
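The hashing idea described above can be illustrated with a minimal sketch. It is not the algorithm of any particular cheminformatics package: the molecule is assumed to be given as a list of heavy-atom symbols and a list of bonds, bond orders are ignored, only linear paths of up to three atoms are enumerated, and the helper names (`linear_paths`, `hashed_fingerprint`, `tanimoto`) and the 1024-bit length are hypothetical choices made for illustration.

```python
import hashlib


def linear_paths(atoms, bonds, max_atoms=3):
    """Enumerate linear atom-label paths of up to max_atoms atoms."""
    neighbors = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    paths = set()

    def walk(path):
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        # A path and its reverse are the same fragment: keep one canonical form.
        paths.add(min(forward, backward))
        if len(path) < max_atoms:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return paths


def hashed_fingerprint(atoms, bonds, n_bits=1024, max_atoms=3):
    """Map every fragment to a bit position with a hash function."""
    bits = set()
    for fragment in linear_paths(atoms, bonds, max_atoms):
        digest = hashlib.md5(fragment.encode("utf-8")).digest()
        bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return bits


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints stored as sets of 'on' bits."""
    union = bits_a | bits_b
    return len(bits_a & bits_b) / len(union) if union else 1.0


# Heavy-atom graphs (hydrogens omitted): ethanol C-C-O and 1-propanol C-C-C-O.
ETHANOL = (["C", "C", "O"], [(0, 1), (1, 2)])
PROPANOL = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])

fp_ethanol = hashed_fingerprint(*ETHANOL)
fp_propanol = hashed_fingerprint(*PROPANOL)
similarity = tanimoto(fp_ethanol, fp_propanol)
```

Because every fragment of ethanol also occurs in 1-propanol, all bits set for ethanol are also set for 1-propanol. The same fragment always maps to the same bit, while the bit position itself carries no chemical meaning.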
Hashed fingerprints are a type of black box encoding a structure, which at the same time ensures that similar structures receive a similar set of bits representing them.
Convolutional neural networks (CNNs) are used in many areas of medical expertise. 49,50 CNNs can progressively filter different portions of training data and refine important features in the discrimination process used to recognize or classify patterns. A typical artificial neural network using neural connections on an any-to-any basis can easily be overtrained. However, in the CNN's convolutional substructure, each neuron is connected only to a local input region. Local areas are defined by width and height, while depth extends through the entire input image layer. Such a limited area of connections is called a receptive field. Convolutions allow extracting simple features in the initial layers of the network, for example, during image processing they recognize edges with different orientations or areas with different colors, and then shapes and geometric objects in the subsequent layers. Convolutional layers perform mathematical convolution operations on the input data and pass their results on to the next layer. This is similar to the reaction of a neuron in the visual cortex to a specific stimulus. For example, processing of data describing a chemical structure involves recognizing its fragments by the convolution layer in individual iterative steps (Figure 3), thereby identifying individual characteristics of the structure. Such a network can be trained by applying input data encoded with SMILES (simplified molecular-input line-entry system) notation 51 or even a bitmap image of a classical chemical structure. Eventually, the convolutional network is able to learn the relationship between the structure and the target parameter of biological, chemical, or physicochemical activity.
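As a rough illustration of how a convolutional filter responds to a fragment of a SMILES string, the sketch below slides a single hand-set filter over a one-hot encoded string. It makes several simplifying assumptions: the six-symbol alphabet is invented for this example, the filter weights are written by hand rather than learned, and a real CNN would stack many trained filters with nonlinearities and pooling.

```python
def one_hot(smiles, alphabet):
    """Encode each character as a 0/1 vector over the alphabet."""
    return [[1.0 if ch == sym else 0.0 for sym in alphabet] for ch in smiles]


def conv1d(encoded, kernel):
    """Slide the kernel along the sequence ('valid' mode, stride 1)."""
    width = len(kernel)
    outputs = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for k in range(width):
            for c in range(len(kernel[k])):
                total += encoded[start + k][c] * kernel[k][c]
        outputs.append(total)
    return outputs


# Hypothetical toy alphabet; real SMILES needs many more symbols.
ALPHABET = ["C", "O", "N", "=", "(", ")"]

# Hand-set filter that responds to the two-character pattern "=O".
carbonyl_filter = one_hot("=O", ALPHABET)

# Acetone, CC(=O)C: the filter sees two characters at a time.
activations = conv1d(one_hot("CC(=O)C", ALPHABET), carbonyl_filter)
strongest_position = activations.index(max(activations))
```

Each output value depends only on a two-character window of the input, which is the local connectivity described above. The activation peaks at the window that contains the carbonyl pattern.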
FIGURE 3 Illustration of identification of chemical substructures by the convolutional layer of a neural network
Various neural network algorithms have been proposed in the drug design literature, ranging from very simple to extremely complex. 52 Due to the enormous breadth of the topic, only a very brief summary of the models discussed in this paper is given below. Some of the simple ones include the multilayer perceptron (MLP) based on McCulloch-Pitts neurons, or more complex regression models and classifiers such as logistic regression, naive Bayes, shallow neural networks, ridge, lasso, or support vector machines (SVM). Logistic regression is used when the dependent variable is measured on a dichotomous scale, 53 that is, when it follows a two-point distribution. Naive Bayes is a linear method in which statistical analysis is carried out using Bayesian inference, 54 and the classifiers are based on the assumption of mutual independence of the explanatory variables. Shallow network models usually have up to two layers of neurons and require properly prepared features to perform the learning process. Such models are relatively easy to overtrain, that is, they adhere too faithfully to the specific data that the network has already observed. Ridge and lasso regression are types of regularization, which consists in introducing additional information into an ill-conditioned problem to improve the quality of the solution, and they are used to increase the generalization of the trained model. Lasso (least absolute shrinkage and selection operator, L1), originally proposed by Santosa et al., 55 is able to reduce variability and improve the quality of linear regression methods. Ridge (Tikhonov regularization, L2) regression 56 is a method used when the explanatory variables are strongly correlated with each other. The standard errors of ridge regression are reduced by the addition of a certain amount of bias to the regression estimates. 
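The difference between the two penalties can be made explicit by writing both estimators as penalized least-squares problems. The notation is the standard textbook one and is an assumption here, not taken from the cited works: X is the descriptor matrix, y the vector of measured activities, β the coefficient vector, and λ ≥ 0 the regularization strength.

```latex
\hat{\beta}^{\mathrm{ridge}}
  = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
  = \left( X^{\top} X + \lambda I \right)^{-1} X^{\top} y ,
\qquad
\hat{\beta}^{\mathrm{lasso}}
  = \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 .
```

The L2 penalty shrinks all coefficients and keeps the problem well conditioned even when the columns of X are strongly correlated. The L1 penalty can set some coefficients exactly to zero and therefore also acts as a descriptor selection step.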
An SVM belongs to the class of kernel algorithms and is an abstract concept of a machine that acts as a classifier, whose training consists in determining a hyperplane that separates examples belonging to two classes with a maximum margin. 57 Given a pair of objects, the kernel function evaluates a certain measure of their similarity. The generalized regression neural network (GRNN) is a neural network that combines the advantages of radial networks and MLP. The first hidden layer utilizes radial neurons, performing clustering of the input data. The second layer consists of only two summation neurons and is called the regression layer. Another interesting solution is the use of hierarchical linear models, which make it possible to take into account the structure of relationships between the variables that group observations.
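The two summation neurons of the GRNN regression layer can be sketched as follows. This is a minimal version under stated assumptions: every training case is used directly as a radial neuron center (no prior clustering), the kernel is Gaussian with a single width `sigma`, and the function name `grnn_predict` and the toy data are hypothetical.

```python
import math


def grnn_predict(x, train_x, train_y, sigma=1.0):
    """Kernel-weighted average of training targets (GRNN output)."""
    numerator = 0.0    # first summation neuron: weighted sum of targets
    denominator = 0.0  # second summation neuron: sum of the weights
    for xi, yi in zip(train_x, train_y):
        dist2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        weight = math.exp(-dist2 / (2.0 * sigma ** 2))  # radial neuron
        numerator += weight * yi
        denominator += weight
    return numerator / denominator


# Toy one-descriptor data set with two training compounds.
train_x = [[0.0], [2.0]]
train_y = [1.0, 3.0]

midpoint_prediction = grnn_predict([1.0], train_x, train_y)
near_first_prediction = grnn_predict([0.0], train_x, train_y, sigma=0.1)
```

Halfway between the two training points, both radial neurons fire equally and the prediction is the mean of the two targets. With a narrow kernel, the prediction at a training point is dominated by that point's own target.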
Among the techniques used in drug design, tree-based models also show satisfactory results. Decision trees include many learning algorithms to express given hypotheses. 58 Decision trees are widely used in problems concerning classification and prediction of ideas and concepts, among others in medical diagnostics. Random forest 59 involves constructing multiple decision trees during training and outputting either the class that is the mode of the individual trees' predictions, that is, the most frequently occurring value, or the average of the individual trees' predictions. 60 Random forests are a way of averaging many deep decision trees, trained on different parts of the same training set, to reduce variance.
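Only the aggregation step of a random forest is sketched below. The per-tree predictions are assumed to be already available, and the function names and example values are hypothetical. Growing the individual trees on bootstrap samples is deliberately left out.

```python
from collections import Counter
from statistics import mean


def forest_classify(tree_votes):
    """Classification: return the mode of the individual tree votes."""
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_outputs):
    """Regression: return the average of the individual tree outputs."""
    return mean(tree_outputs)


# Hypothetical outputs of three trees for one compound.
predicted_class = forest_classify(["toxic", "nontoxic", "toxic"])
predicted_value = forest_regress([1.0, 2.0, 3.0])
```

Averaging or voting over trees that were trained on different parts of the training set is what reduces the variance of the combined prediction relative to a single deep tree.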
The relatively simple radial networks and their more elaborate successors are a different way of solving learning problems. A radial network is a type of unidirectional neural network built from radial neurons, which use radial basis functions (RBF). RBFs are real functions whose value depends only on the distance from a certain point. A representative radial network contains an input layer, a hidden layer consisting of radial neurons, and an output layer, which produces the network response. Radial neurons are used to recognize repetitive and characteristic features of clusters of input data. More complex models that utilize radial networks are probabilistic neural networks (PNN), in which the number of neurons in the hidden layer is equal to the number of training cases. The main feature of probabilistic networks is that the output signals are normalized so that their sum over all outputs of the network equals one. It can then be assumed that the values on the individual outputs of the network represent the probabilities of the categories assigned to those outputs.
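A probabilistic network of this kind can be sketched in a few lines. The version below rests on assumptions that are not taken from the cited studies: a Gaussian radial function with one shared width `sigma`, equal prior probabilities for all classes, and a hypothetical function name and toy data set.

```python
import math


def pnn_predict(x, train_x, train_y, sigma=1.0):
    """Return a {class: probability} mapping whose values sum to one."""
    sums, counts = {}, {}
    # Hidden layer: one radial neuron per training case.
    for xi, label in zip(train_x, train_y):
        dist2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        activation = math.exp(-dist2 / (2.0 * sigma ** 2))
        sums[label] = sums.get(label, 0.0) + activation
        counts[label] = counts.get(label, 0) + 1
    # Summation layer: average activation per class (density estimate).
    densities = {label: sums[label] / counts[label] for label in sums}
    # Output layer: normalize so that the outputs sum to one.
    total = sum(densities.values())
    return {label: d / total for label, d in densities.items()}


# Toy one-descriptor data set with two classes.
train_x = [[0.0], [0.2], [3.0], [3.2]]
train_y = ["inactive", "inactive", "active", "active"]

probabilities = pnn_predict([0.1], train_x, train_y)
```

The query point lies inside the cluster of inactive compounds, so most of the probability mass is assigned to that class, while the two outputs still sum to one.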
The most complex DL network models include GANs, CNNs, and capsule networks. GANs are a type of network used to create very realistic content. A GAN consists of two parts, a generator and a discriminator, which compete with each other during training. The generator creates artificial content and the discriminator tries to distinguish it from real-world data. The network is trained to the point where the discriminator cannot differentiate the artificial data from the real data.
Capsule networks, proposed by Geoffrey Hinton, 61 improve generalization to new points of view, which means that after training in handling rotation, they learn that an object can be viewed from several different sides. The single computing unit in capsule networks is the capsule, a generalized type of neuron whose output is a vector. The vector carries information about the strength of activation through its length and about the context of activation through its direction.

| Determining drug properties
The great potential of AI was recognized by the pharmaceutical industry and medical community several years ago. Large programs have even been launched that bring scientists together and have made it possible to create the large databases that form the basis for ML. An excellent example of this is the Tox21 Data Challenge, 62 which contains data on 12,000 environmental chemicals and drugs for 12 different toxic effects, comprising stress response effects and nuclear receptor effects. The stress response panel consisted of the nuclear factor (erythroid-derived 2)-like 2 antioxidant responsive element, heat shock factor response element, genotoxicity indicated by ATAD5, mitochondrial membrane potential, and DNA damage p53 pathway. The nuclear receptor panel (biomolecular targets) contained the following elements: estrogen receptor alpha; androgen receptor; estrogen receptor alpha, luciferase; androgen receptor, luciferase; aryl hydrocarbon receptor; peroxisome proliferator-activated receptor gamma; aromatase. Based on 12,000 compounds as training data, ML was proposed for the evaluation of 647 compounds with excellent accuracy. It should be noted that the success of learning methods largely depends on the training datasets, which are, therefore, the first prerequisite for obtaining reliable models.
In silico methods in drug properties prediction are based on several techniques. Artificial neural networks are the main method proposed for QSAR models (Figure 4). 15 This technique is widely used by the pharmaceutical industry in the drug discovery process. As early as the beginning of the century, scientists noted that increasing computer power can support decision making in this area. For example, one study 63 compared an SVM with other ML methods (RBF kernel and C5.0 decision tree) in predicting inhibition of dihydrofolate reductase by pyrimidines. The authors showed that SVM is an effective deterministic learning algorithm with reproducible results, with the lowest model error, as well as the shortest calculation time compared with the RBF ML methods. Based on this methodology, it is possible to predict the properties of drugs in the context of their toxicity.
Chemical carcinogenesis prediction is very important in drug discovery because of the crucial impact of drugs on human health. 64 In this case, two main mechanisms are considered: genotoxicity (by the mutagenicity of DNA-damaging chemicals) and non-genotoxic carcinogenic action. Distinction between both mechanisms is very important for risk assessment. It is crucial for non-genotoxic carcinogens which are classified as promoters for tumor development. However, genotoxicity is a risk factor at different concentrations and may result in mutations causing tumor growth initiation. Many recent studies have shown that environmental factors, including various chemicals, play a key role in cancer development. 65 Therefore, it is extremely important to identify substances with such activity and to prevent exposure to such carcinogens. Traditionally, animal assays were used to indicate substances with the carcinogenic potential. However, this method is not only costly and time-consuming, but also complicated by regulatory policies demanding changes in protocols of examination of toxicological effects. Singh et al. 66 showed a possibility to use the PNN and GRNN modeling approaches in prediction of carcinogenicity of diverse chemicals (by determining the tumorigenic dose, −logTD50). Authors employed the dataset from Carcinogenic Potency Database 67 including: for rats, 834 compounds (466 positive and 368 non-positive carcinogens), for mice, 632 (292 positive), for hamsters, 57 (38 positive). Of the various molecular descriptors, 12 non-quantum mechanical molecular descriptors were used. 
They could be divided into four categories: (i) physicochemical (octanol-water partition coefficient as Log P, density, melting point, half-life in water or in air, persistence time), calculated by molecular structures; (ii) constitutional (hydrogen-bond acceptor or donor, and carbon or hydrogen percentage); (iii) geometrical (maximum Z-length); and (iv) topological (Balaban index), computed based on 2D structures of the molecules (in the form of SMILES). It should be noted here that the authors employed relatively simple descriptors based mainly on physical and chemical properties for the evaluation of complex final estimators, prediction of carcinogenicity. Both models proposed differ in architecture, 5 or 9 input, and hidden layer for PNN and GRNN models, respectively. Moreover, PNNs are based on the Bayesian classification and classical estimators for probability density function, 68 while GRNNs are trained by a K-means clustering algorithm. The authors showed that the optimum PNN exhibited a high ability to predict and differentiate substances between positive and non-positive carcinogens and may be treated as a preliminary stage for the possible exclusion of new substances with a carcinogenic potential. The GRNN model, on the other hand, allowed predicting the tumorigenic dose with high accuracy.
FIGURE 4 Machine learning methods in drug design
The relatively simple models presented in study 66 initiated further research in this area. For example, various ML models of in vitro and in vivo bioassays for rat carcinogenicity prediction were presented in reference 69. The first advantage over the previously described models is that here the authors used a much larger set of training data, including GreenScreen with genotoxicity results (in vitro GADD-45a-GFP assay) for 1415 compounds, 70 Syrian Hamster Embryonic with in vitro Syrian Hamster Embryonic (pH 7+) cell transformation assay results (356 compounds), 71 the Hansen Toxicity Benchmark dataset with Ames bacterial mutagenicity results (6512 compounds), 72 ISSCAN (in vivo rat carcinogenicity, 854 compounds), 73 and in vivo rodent pharmaceutical carcinogenicity results (374 compounds). 74 Moreover, a larger number of ML algorithms were compared to predict the assay results in this case: J.48 Decision Tree, Random Forest, MLP, k-nearest neighbor, and Adaboost, 75 with 10-fold cross validation. In addition, descriptors associated with physicochemical properties were used to describe the compounds. For this purpose, the authors used: (i) representations of chemical structures by ChemAxon Standardizer with redrawn 3D coordinates, the explicit representation of hydrogens and reconfigured aromaticity; (ii) physicochemical properties (octanol-water partition coefficient as Log P), number of hydrogen-bond acceptors or donors, as well as rotatable bonds, polarizability, polar surface area, and molecular weight. It should be noted that the assessment of properties of substances (potential drugs), here in the context of carcinogenicity, is always supported by the assessment of chemical properties. 
This confirms the important role of coordination of chemistry in drug design with the use of ML techniques. The authors concluded that k-nearest neighbors model was the best one of all considered for in vivo rodent carcinogenicity prediction, and that the results obtained can contribute to future development of new drugs and determination of their properties with AI methods.
From the point of view of drug design, acute toxicity analysis is important as well. This parameter indicates unequivocally whether it is worth considering a given substance as a potential drug or whether, in view of the strong hazard to human health, any further stages of research in this area should be abandoned. It could also predict the side effects of overdosage and should support all phase III clinical trials of drugs. Evaluation of acute toxicity could help in the identification of patients at higher risk for overdosing, for example, those suffering from depression or dementia. Of acute oral, dermal, and inhalation toxicity, the most common studies reported in the literature are related to oral toxicity assessment (mainly as the median lethal dose, LD50). In these studies, authors mainly use the database created by Zhu et al. 76 for 7385 compounds with their most conservative lethal dose. For example, in one study, the authors predicted acute oral toxicity using molecular graph encoding convolutional neural networks (MGE-CNN) with a regression model, a multi-classification model, and a multi-task model for deep fingerprints. 77 Analysis of the data allowed extracting structural fragments of molecules responsible for toxicity: nitriles, alyl (thio)phosphates and thiocarbonyl. The presented DL architecture for acute oral toxicity could be used for prediction and exploration of other toxicity or property endpoints of chemical compounds. Moreover, by exploiting the ability of DL to learn automatically, it was also possible to create fragments from information about atoms and bonds and then identify their potential toxicity. Researchers from Peking University, Center of Quantitative Biology and Molecular Design Laboratory, made these DL models available at a website. 78 The issue of oral toxicity prediction was also discussed in. 
79 The authors showed the superiority of dual-layer hierarchical models (integrating regression and classification QSAR models) over classical base models in the prediction of categorical (binary toxic/nontoxic and four hazard categories under the U.S. Environmental Protection Agency [EPA] classification system) and continuous (LD50) endpoints for rat acute oral toxicity. The first layer of the proposed model was based on regression, binary and multiclass ML techniques, and molecular descriptors and fingerprints, while the second one was based on the collected outputs of the base models. In order to confirm the validity of the adopted learning model, the authors presented calculations for two substances: Fusarenon-X ([(1S,2R,3S,7R,9R,10R,11S,12S)-3,10-dihydroxy-2-(hydroxymethyl)-1,5-dimethyl-4-oxospiro[8-oxatricyclo[7.2.1.0^{2,7}]dodec-5-ene-12,2'-oxirane]-11-yl] acetate) and VX (ethyl ({2-[bis(propan-2-yl)amino]ethyl}sulfanyl)(methyl)phosphinate). Fusarenon-X belongs to the class of trichothecene mycotoxins. It causes disruption of DNA synthesis by inhibiting protein synthesis, 80 with logLD50 at the level of −1.95 mmol/kg (EPA, class I). The base regression model predicted that this compound is nontoxic (EPA, class III, logLD50 = −0.39 mmol/kg), while the hierarchical classification model identified it as toxic (EPA, class I, logLD50 = −1.14 mmol/kg). VX is an extremely toxic organophosphorus compound belonging to the thiophosphonates (EPA, class I, logLD50 = −4.34 mmol/kg), which potentially blocks the function of acetylcholinesterase. As a consequence, flaccid paralysis of all muscles in the body occurs. The immediate cause of death is asphyxiation caused by paralysis of the diaphragm muscle. 81 In this case, neither of the compared models was very accurate, although both correctly predicted toxicity (EPA, class I). 
It should be noted that the low accuracy of prediction was due to the small amount of training data in such a high toxicity range. However, a smaller error could also be seen here for the hierarchical model (logLD50 = −1.19 mmol/kg) in comparison with the base model (logLD50 = −1.08 mmol/kg). It is worth pointing out that an artificial neural network should have the ability to generalize, just like humans, and the low accuracy of prediction obtained indicates that the training data did not cover the range of the test data. Thus, the main problem here is the size of the database. Alberga et al. 82 proposed prediction of toxicology endpoints related to acute oral systemic toxicity as binary classification, nontoxic (LD50 > 2000 mg/kg) and very toxic (LD50 < 50 mg/kg), as well as classification according to EPA and GHS (Globally Harmonized System of Classification and Labeling of Chemicals), based on k-nearest neighbors techniques and 19 different fingerprints (Table 1). The authors concluded that increasingly accurate methods of predicting acute oral toxicity may replace the animal tests that are currently necessary.
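The two binary endpoints quoted above can be written as simple threshold rules, as in the following sketch. Two points are assumptions rather than statements from the cited papers: the reported logLD50 values are read as base-10 logarithms of LD50 in mmol/kg, and the molecular weight used for VX (about 267.4 g/mol) is an approximate value supplied for illustration. The function names are hypothetical.

```python
def ld50_mg_per_kg(log_ld50_mmol_per_kg, mol_weight_g_per_mol):
    """Convert log10(LD50 [mmol/kg]) to LD50 [mg/kg].

    mmol/kg multiplied by g/mol (equivalently mg/mmol) gives mg/kg.
    """
    return 10.0 ** log_ld50_mmol_per_kg * mol_weight_g_per_mol


def acute_oral_flags(ld50):
    """Apply the two binary thresholds quoted in the text (LD50 in mg/kg)."""
    return {"very_toxic": ld50 < 50.0, "nontoxic": ld50 > 2000.0}


# Experimental value quoted for VX; the molecular weight is an assumption.
vx_ld50 = ld50_mg_per_kg(-4.34, 267.4)
vx_flags = acute_oral_flags(vx_ld50)

# A hypothetical compound between the two thresholds gets neither flag.
intermediate_flags = acute_oral_flags(500.0)
```

Treating the two endpoints as separate flags leaves compounds with LD50 between 50 and 2000 mg/kg unflagged, which is why multi-class schemes such as EPA or GHS are used alongside the binary ones.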
TABLE 1 Targets, selected descriptors, and statistics (classification accuracy, sensitivity, specificity) for selected models
Cardiotoxicity is often described with regard to blockade of the human ether-à-go-go-related gene (hERG) cardiac potassium channel. 89 Cardiovascular toxicity comprises heart failure due to toxin-induced abnormalities with injury of the muscles, and therefore may reduce blood flow and circulation. It is also the main reason for the withdrawal of many drugs from markets globally. Lengthening of the QT interval, associated with lethal ventricular arrhythmia, is responsible for such situations. Since this aspect is very important in drug design (mainly in the safety evaluation of drug candidates), in silico methods are described in the literature. 90 Zhang et al. 84 proposed prediction of hERG activity by deep neural networks (optimal form of calculation with three hidden layers) based on data for 697 molecules taken from the literature. 85,91 Based on the results, the authors concluded that the proposed DL could offer effective prediction of hERG toxicity and, as a consequence, has great potential to aid the development of novel drug candidates. Similar observations were described in another paper 86 looking at ML and DL algorithms using fingerprints and principal component analysis (including partition coefficient, molecular weight, H bond acceptors and donors, number of rotatable bonds, rings and aromatic rings, as well as molecular fractional polar surface area) as descriptors and a training set of 3991 compounds. The authors compared SVM methods (linear, polynomial, radial) with random forest and an artificial neural network (layer size 100, 200, and 400) for DL. Based on the results, it can be seen that the accuracy of hERG-blocker prediction depends on the selection of fingerprints. Better results for ML models were obtained with the use of integer-type fingerprints, while binary-type fingerprints are appropriate for DL. Ryu et al. 
87 proposed a step further-model that predicts both hERG-blockers and non-blockers for input compounds (DeepHIT). The criterion indicating the blocking or non-blocking properties of hERG was the value of the half maximal inhibitory concentration (IC 50 ): hERG-blockers had IC 50 < 10 μM, hERG nonblockers had IC 50 ≥ 10 μM. 92 The calculations required a preliminary standardization of the compounds by selection of the largest fragment, removal of explicit hydrogens, ionization, and calculation of stereochemistry. The authors compared six traditional ML algorithms (i.e., k-nearest neighbors, logistic regression, naive Bayes, shallow neural network [simpler configuration, less neural layers], random forest, and SVM) with deep multilayered neural network with molecular descriptor-, molecular fingerprint-, and graph-based feature datasets (Table 1). Their proposals are available at websites. 93, 94 In the case of this cited study, it is worth emphasizing that the authors, based on the trained network, indicated a new novel urotensin II receptor antagonists without hERG-blocking activity obtained from a seed compound of a previously reported UT antagonist (KR-36676) with a strong hERG-blocking activity. Capsule networks also showed excellent performance in the classification of hERG-blockers and non-blockers with prediction accuracies of approximately 92%. 95 This is the first example of using such a technique in drug discovery-related studies. Furthermore, work 88 presented an interesting comparison of ML prediction (linear regression, ridge regression, logistic regression, naïve Bayes, neural network, and random forest) for results regarding 10 drug compounds ( Table 2). The presented model correctly predicted 8 out of 10 compounds with 80% accuracy, 60% sensitivity and 100% specificity, which indicated that this model could be used for virtual screening in drug discovery.
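The three statistics quoted for the 10-compound test can be related to a confusion matrix. The sketch below assumes one set of counts that is consistent with the reported figures (five toxic compounds, of which three are predicted correctly, and five nontoxic compounds, all predicted correctly); these counts are inferred from the percentages and are not stated in the source.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # share of all compounds predicted correctly
        "sensitivity": tp / (tp + fn),      # share of toxic compounds found
        "specificity": tn / (tn + fp),      # share of nontoxic compounds found
    }


# Assumed counts, chosen to reproduce 8 correct predictions out of 10.
metrics = classification_metrics(tp=3, fn=2, tn=5, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

With only 10 compounds, each misclassified compound changes the accuracy by 10 percentage points, so such figures should be read as indicative rather than precise.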
It should be noted that in all of these cardiotoxicity predictions, the chemical structure of the compounds played an important role. The analysis indicates that most hERG channel blockers have a tertiary amine group and aromatic rings in their structure. The former can be protonated at physiological pH and plays a significant role in the binding of the channel blocker to the hERG channel. Aromatic rings are associated with π-stacking or hydrophobic interactions with the aromatic rings of amino acids within the hERG channel cavity. 96
Known for their ability to be creative, generative adversarial networks (GANs) have also found application in de novo drug design. Based on compound databases such as ChEMBL or the ZINC database, the application of these networks allows the generation of new structures, that is, drug-like compounds which can be treated as potential new drugs with desired properties. 52,97,98 For example, Prykhodko et al. 99 successfully proposed a latent vector-based generative adversarial network (LatentGAN), a combination of an autoencoder and a Wasserstein GAN, for the generation of drug-like compounds (set limited to SMILES containing only [H, C, N, O, S, Cl, Br] atoms and a total of 50 heavy atoms or fewer) and target-biased compounds (EGFR, HTR1A and S1PR1 targets, based on ExCAPE-DB). The authors of this paper indicated that the proposed model allows the prediction of compounds according to the planned target, while also indicating that a significant portion of the compounds are new with respect to the training set. Another example of adapting the GAN for drug design is its combination with reinforcement learning (RL), known as Objective-Reinforced Generative Adversarial Networks (ORGAN) 100 or its implementation for inverse-design chemistry (ORGANIC).
101 Based on two drug-likeness indicators, chemical beauty 102 and Lipinski's rule-of-five, 37 the Aspuru-Guzik group 101 showed that ORGANIC can generate molecules (in the SMILES sequence format) that are consistent with a comparable list of FDA-approved drugs, in the amount of 148 and 207 for the two indicators, respectively. Among the substances proposed by the model were very well-known compounds, for example, paracetamol and salicylic acid. There are also proposals in the literature for drug design software based on neural networks, including GANs. For example, MolAICal software can be successfully used to generate 3D structural ligands in the 3D pocket of protein targets. 103 The software is based on two modules: fragments of FDA-approved drugs or from the ZINC drug database are used to train the WGAN model, and then the generated fragments are used to grow 3D ligands in the protein pocket. In this approach, molecular docking is used to check the affinities between the generated molecules and proteins. It is worth noting that the software supplies filter rules, for example Lipinski's rule-of-five, synthetic accessibility (SA) and pan-assay interference compounds (PAINS). Moreover, other user-defined rules can be added. The authors indicate that the proposed software can create ligands with 3D structural similarity to the crystalline ligand of GCGR or SARS-CoV-2 Mpro and could become a useful tool for drug design.
Although in silico techniques are still relatively new, they are becoming increasingly important in drug design. On the one hand, they enable the reduction of animal experiments, which is in line with general scientific trends. On the other hand, these techniques enable an initial assessment of the broadly defined toxicity of a compound before it is synthesized, thus at a very early stage of drug design. The above examples of the use of such computational techniques for analyzing quantitative structure-activity relationships, ML and DL, undoubtedly justify the use of these methods in the determination of toxicity in silico, especially when there are no experimental results and the ability of AI to generalize from data is very helpful.
TABLE 2 Prediction results of 10 drug compounds 88
Columns: Drug; In vivo results; Model results. Legible rows: Chlorpromazine (toxic; toxic), Sotalol (toxic; nontoxic); Cimetidine is listed without a legible result.
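The rule-of-five filter mentioned above for ORGANIC and MolAICal can be illustrated with a short Python sketch. The thresholds are the commonly cited Lipinski criteria; the descriptor values are hypothetical and would normally be computed by a cheminformatics toolkit, and tolerating at most one violation is a common convention assumed here, not a setting taken from either tool.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical values below)."""
    mol_weight: float        # molecular weight in Da
    logp: float              # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d):
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d, max_violations=1):
    """Assumed convention: a molecule passes with at most one violation."""
    return lipinski_violations(d) <= max_violations


# Hypothetical generated molecules, not taken from the cited studies.
candidates = {
    "small_polar": Descriptors(350.0, 2.1, 2, 5),
    "heavy_lipophilic": Descriptors(650.0, 6.2, 2, 5),
}
kept = [name for name, d in candidates.items() if passes_rule_of_five(d)]
print(kept)
```

Filters such as SA or PAINS would be applied in the same way, as additional predicates over each generated molecule.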

| Drug mechanism
Another aspect to consider in drug design procedures is predicting the interaction between the drug and the target (enzymes [E], ion channels [IC], nuclear receptors [NR], and G protein-coupled receptors [GPCRs], known as the gold standard according to Yamanishi et al.). 104 At the same time, such a procedure may facilitate understanding of the drug mechanism of action, the pathology of the disease and possible side effects of the drug. 105 In a simplified form, it can be said that the drug binds to the target molecule by formation of temporary bonds and reacts with the target to inhibit its functioning and to prevent certain catalyzed reactions from occurring in the body in order to treat diseases. Depending on the type of drug, its molecule interacts directly with the active site of the target to inhibit the reaction (competitive inhibitors) or with an allosteric site on the target to change the reaction (allosteric inhibitors). 106 Regardless of the mechanism, assessment of the drug-target interaction (DTI) potential should take into account the structure of both the drug and the target under consideration, together with the possibility of bond formation and reaction. 107 Identification of DTIs is a crucial step in drug discovery. AI techniques are often proposed for the prediction of DTIs thanks to, as mentioned above, the opportunity to use increasingly large databases and the ability of neural networks to generalize. 108 For example, Rayhan et al. 109 proposed combining two deep CNNs for DTI prediction: FRnet-Encode and FRnet-Predict. The first was used to generate 4096 features for the dataset, and the second to classify and estimate the probability of interaction, with an accuracy of over 97%. This approach of analyzing data using two models is very effective.
The authors also proposed new pairs of compounds with a high probability of interaction in all four gold standard datasets (i.e., [E] protein ID hsa:10825/drug: threonine with score: 0.8351, [IC] protein ID hsa:285242/drug: diazoxide with score: 0.9823, [NR] protein ID hsa:2099/drug: tazarotene with score: 0.9912, [GPCR] protein ID hsa:9052/drug: isoetharine with score: 0.9013). However, these results were not verified by the authors and are only a hypothesis. An interesting approach was also presented in another study. 110 The authors used DL with convolution on protein sequences to predict DTIs. The model proposed by them employed raw protein sequences for different target protein classes as well as protein lengths. The model, similarly to other cited works, was validated by predicting DTIs from bioassays such as PubChem BioAssays and KinaseSARfari with a high accuracy. Pliakos and Vens 111 presented heterogeneous networks with biclustering trees. They used descriptors based on chemical structure for drugs and descriptors based on the alignment of protein sequences for proteins. This is the most frequently used method, and differences between authors are mainly related to the amount of training data, the databases, and the network topology. As was suggested by the authors, the use of tree-ensemble learning models with output space reconstruction yielded better prediction results than traditional models. Moreover, such a solution is known for its scalability, interpretability and inductive setting, which is very important in prediction. Li et al. 112 showed the usefulness of combining a position-specific scoring matrix (including protein secondary structure, protein binding site and prediction of disordered regions) with local phase quantization methods and a rotation forest classifier in the prediction of DTIs, with average accuracies equal to 89.15%, 86.01%, 71.67%, and 82.20% for the four targets, respectively.
To confirm the validity of the predictive model developed, the authors tested the algorithm on a commercial drug, sulfasalazine (…sulfamoyl]phenyl}diazen-1-yl]benzoic acid), and two target protein sequences: arachidonate 12-lipoxygenase, 12S-type (ALOX12) and lipoprotein lipase (LPL). According to the prediction results, sulfasalazine interacts with ALOX12 with a possibility score of 0.844, and does not interact with LPL (possibility score = 0.3200). Another approach is the method named DTiGEMS+, based on graph embedding, graph mining, and similarity-based techniques. 113 The authors proposed a heterogeneous network by connecting the known DTI graph with two complementary graphs based on drug-drug and target-target similarities. An interesting approach was to validate the model on unknown data for a group of drugs and targets (enzymes [E], ion channels [IC], nuclear receptors [NR], GPCRs) and to assess their interactions on the basis of experimental data (e.g., from literature referenced by PubMed identifier, PMID, or from a drug database, DB) not included in the training databases (Table 3). The authors did not mention which descriptors were used in the model. It can be assumed that they were similar to those described in their previous work. 114 In that work, a random forest model using a heterogeneous graph, containing known DTIs with multiple similarities between drugs and multiple similarities between target proteins, was proposed. Chemical structure fingerprints, Gaussian interaction profiles and side-effect profiles were considered as descriptors for drugs, while amino acid sequence profiles of proteins, parameterizations of the mismatch and the spectrum kernels, proximity within the protein-protein interaction network and the Gaussian interaction profile were descriptors for target proteins. The authors also verified the correctness of the proposed model by analyzing new drugs, with 22 out of 25 being correctly identified.
In addition to the models based on heterogeneous graphs, a new DL model, a multimodal deep autoencoder with a similarity network with drugs as nodes and drug-drug similarity values as the edge weights, was proposed by Wang et al. 115 The presented literature review indicates the possibility of predicting the interaction between the drug and the target, which allows the conclusion that AI, just as in the prediction of drug properties, is an interesting alternative in the process of drug design, as well as in drug ranking. In the latter case, Geres et al. 116 demonstrated the applicability of ML, based on proteomics and phosphoproteomics data derived from 48 cell lines, for predicting therapy in cancer treatment, by evaluating >400 drugs for their antiproliferative efficacy in tumor cells.
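The Gaussian interaction profile mentioned above as a similarity descriptor can be sketched as follows. The example uses the definition commonly given in the DTI literature, a Gaussian kernel on binary interaction profiles with the bandwidth normalized by the mean squared profile norm; whether the cited works use exactly this normalization is an assumption, and the interaction matrix is hypothetical.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile similarity between all pairs of drugs.

    profiles: one binary vector per drug, with 1 where the drug is known
    to interact with the corresponding target. At least one profile must
    contain an interaction, otherwise the bandwidth is undefined.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm  # assumed bandwidth normalization
    kernel = []
    for p in profiles:
        row = []
        for q in profiles:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            row.append(math.exp(-gamma * sq_dist))
        kernel.append(row)
    return kernel


# Hypothetical known interactions: 3 drugs (rows) x 4 targets (columns).
interactions = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
for row in gip_kernel(interactions):
    print([round(value, 3) for value in row])
```

Drugs with identical interaction profiles receive a similarity of 1, and the similarity decays as the profiles diverge; the same construction applied to the columns gives a target-target similarity.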

| Drug repurposing
Drug repositioning allows finding new uses for existing drugs. 117–119 It is also considered a suitable method for finding drugs for orphan and rare diseases. This procedure reduces the time needed to place a new drug on the market, while also reducing the risk of failure, because preclinical development and optimization issues can largely be omitted. There are three stages of the drug repurposing strategy: (i) identification of a potential molecule; (ii) preclinical tests: mechanistic assessment of the drug effect; (iii) phase II clinical trials: evaluation of efficacy. 118 In the first of these steps, computer-based calculations can be used successfully. The above-discussed DTI prediction is also beneficial for searching for novel uses of existing drugs. Based on the increasing access to medical databases, 120–125 it is possible to analyze drugs in the context of their new uses by means of, for example, ligand-based approaches. An excellent review of drug databases that could be helpful in DTI prediction, with their advantages and disadvantages, has also been published. 126 The basis of the ligand-based method is the assumption that similar compounds have similar biological properties; thus, in the case of drug design, it could be concluded that similar ligands have similar activities with respect to similar targets. 127 The importance of this issue is underlined by the increasing volume of publications in this area. Patrick et al. 128 proposed a word-embedding-based ML approach for drug repurposing for nine cutaneous diseases (including psoriasis, atopic dermatitis, and alopecia areata) and eight other immune-mediated diseases.
Based on the validation results, the authors concluded that the model could predict new drugs for psoriasis, with the highest prediction scores for budesonide (a corticosteroid, currently used to treat asthma and inflammatory bowel disease) and hydroxychloroquine (an antimalarial drug that is also used to treat lupus and rheumatoid arthritis). Anderson et al. 129 used Bayesian ML models for drug repurposing in chordoma. Of the available data, the mTOR inhibitor AZD2014 was indicated as the most potent against chordoma cell lines (IC50 0.35 μM for U-CH1 and 0.61 μM for U-CH2). Moreover, two currently FDA-approved drugs, afatinib and palbociclib (EGFR and CDK4/6 inhibitors, respectively), demonstrated synergy in vitro (CI50 = 0.43), as did AZD2014 and afatinib (CI50 = 0.41) against chordoma cells in vitro. As shown in the paper, 130 ML could be applied for prediction of a new therapeutic system for drugs. The authors proposed a model based on decision trees (Bayesian tree-structured) with several molecular features as descriptors. They showed that both 2D and 3D chemical similarity should be used. It is worth emphasizing here that authors often use only one type of molecular similarity, but according to the conclusions in the cited work, it is better to use both at the same time, because each of them conveys different information to the model. Moreover, the authors used new types of features: drug-gene phenotype similarity and gene-gene expression profile similarity across different tissues. Such a solution allowed them to obtain 78% precision in the case of the top 50 predictions, and 48.2% for 500 predictions. Using the constructed model, the authors concluded that the antipsychotic drug fluphenazine is a highly probable drug targeting the PRKDC gene, which is a potential target for treatment of ATM-deficient cancer. 131 Thus, fluphenazine, which is used for schizophrenia treatment, could be considered in cancer treatment.
ML and DL approaches for cancer drug repurposing are also discussed in detail in another paper. 132 It is worth noting that the cases discussed here do not exhaust the uses of AI in the analysis of drugs. For example, there is the aspect related to drug metabolite prediction. 133 Any additional information about a drug undoubtedly provides a basis for an even better understanding of the potential ingredients of therapeutics and provides excellent support for decisions regarding the continuation of research into the development of new active compounds for medications.
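The precision figures quoted above for the top-ranked repurposing predictions can be computed as sketched below. The ranking is hypothetical and was constructed only so that it reproduces the two reported values, under the assumption that both refer to the highest-ranked predictions (39 confirmed among the top 50 and 241 among the top 500).

```python
def precision_at_k(ranked_hits, k):
    """Fraction of confirmed predictions among the k highest-ranked ones.

    ranked_hits: booleans ordered from the highest to the lowest score,
    True where the predicted drug-target pair is confirmed.
    """
    top = ranked_hits[:k]
    return sum(top) / len(top)


# Hypothetical ranking of 500 predictions.
ranked_hits = [True] * 39 + [False] * 11 + [True] * 202 + [False] * 248

print(f"precision@50  = {precision_at_k(ranked_hits, 50):.1%}")
print(f"precision@500 = {precision_at_k(ranked_hits, 500):.1%}")
```

The drop in precision as k grows is typical of ranked predictions, since lower-scored candidates are less likely to be confirmed.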

| CONCLUSION
In 2019, an article was published in Nature Machine Intelligence in which the author stated that "Now AI is back – this time, apparently, for good." 134 The review presented in this article confirms this hypothesis. Progress in computers and computational algorithms has become an opportunity to support medicine. In the area of drug design, the widest applications, as indicated in the literature, are found not only for networks with a very basic architecture such as MLP or RBF, but also, and especially, for networks with a very complex design such as CNNs, capsule networks or GANs. There is no definite indication of which network is the best tool for such design purposes. However, DL solutions are currently the most popular, as they are becoming an increasingly faithful reflection of the complex ways of thinking characteristic of the human mind. DL not only allows data to be analyzed, but also finds and determines the characteristics of the observed sets on its own, while becoming an increasingly versatile tool to support the course of drug design. And although the role of a computer cannot be overestimated, in the end there is always a human who makes the final decision. The promise of AI for the future is better drugs, discovered and delivered faster. It should also be noted that in the case of drug design, basic properties of the molecules, for example, bonding, quantum and physicochemical properties, are not the only aspects to be taken into account. Medicines may have multiple biological targets and effects, and their efficiency depends on several factors such as bioavailability, the effect of formulation and administration, as well as the individual genetic profiles of patients.