MFAE: Multilevel Feature Aggregation Enhanced Drug‐Target Affinity Prediction for Drug Repurposing Against Colorectal Cancer

Colorectal cancer (CRC), a leading cause of cancer‐related deaths globally, demands innovative therapeutic strategies to improve patient outcomes. Drug repurposing, identifying new uses for existing drugs, provides a cost‐effective solution. To this end, this study constructs the first drug‐target affinity dataset specifically for two novel therapeutic targets for CRC, P2X4 and mTOR, and designs a new deep learning‐based multilevel feature aggregation enhanced (MFAE) model. The model implements hierarchical feature extraction and multilevel feature aggregation and enhancement to simulate complex drug‐target interactions. Evaluations using fivefold cross‐validation on the collected CRC dataset showcase MFAE's superior predictive accuracy. Fine‐tuning of the model on external experimental data further enhances its performance, with a concordance index of 0.930, a determination coefficient of 0.782, and a mean squared error of 0.191. Ablation studies further highlight the key role of the group‐wise feature enhancement mechanism and ensemble learning strategy in enhancing the model's performance. Virtual screening of the Food and Drug Administration‐approved drugs identifies Ponatinib and Talazoparib as potential repurposing candidates. Despite limitations in experimental validation, this study establishes an innovative computational framework designed for CRC drug discovery. Overall, this research offers a valuable perspective on leveraging computational approaches for precision oncology.

epithelia more sensitive to mTOR inhibition. The research further showed that the combined inhibition of P2X4 and mTOR can result in significant tumor regression, highlighting the potential of these two targets in CRC treatment. [14] However, only a few drugs are known for these targets. Quickly finding candidates for drug repurposing (DR) against these two new targets is therefore a new challenge.
In recent years, computational biology and artificial intelligence technology have provided new ideas for drug development. [15,16] DR has become a popular research direction. It predicts potential interactions between known drugs and newly discovered targets through computer models, thereby quickly and cost-effectively finding drugs that can be repurposed for new targets. [17,18] However, the accuracy of predicting the affinity between drugs and targets directly affects the success of DR research. [19] At present, relying only on traditional molecular docking and quantitative structure-activity relationship models to predict drug-target affinity (DTA) has many limitations. These methods cannot fully utilize the hidden information in the chemical structure of drugs and the sequence of targets, making it difficult to make reliable affinity predictions for new targets. [20,21] Moreover, deep learning technology has made significant breakthroughs in predicting DTA. [20] Deep learning models can automatically learn the features of drugs and targets from complex molecular structures and protein sequences and perform interactive modeling, thereby improving prediction accuracy. [22,23] Compared with traditional methods, deep learning strategies have stronger scalability and predictive power. Various methods based on graph neural networks have been proposed and applied to DTA prediction. [24,25]
GraphDTA [26] represents drugs as graphs and uses graph neural networks to predict DTA; it also tests the predictive performance of its four variants GCNNet, [27] GATNet, [28] GINConvNet, [29] and GAT-GCN. MgraphDTA [30] is a deep multiscale graph neural network based on chemical intuition. It introduces dense connections into the graph neural network (GNN) and constructs a super-deep GNN with 27 graph convolutional layers to capture the local and global structures of compounds simultaneously. FusionDTA [31] applies a new multihead linear attention mechanism to replace the coarse pooling method. This method uses attention weights to aggregate global information and, at the same time, transfers learnable information from the teacher model to the student model through knowledge distillation, reducing the number of model parameters. NHGNN-DTA [32] uses a hybrid graph containing amino acid and atomic nodes to utilize the structure of drugs and proteins, and constructs a feature generator to achieve adaptive updating of node features, increasing information interaction between drugs and proteins at the graph level.
Although these methods [26-32] have achieved remarkable results on some standard datasets, they still have limitations. First, most methods are trained on general drug-target databases, which means they may not handle uncommon or newly discovered tumor-related proteins well. In addition, some methods may encounter computational efficiency problems when dealing with large-scale or complex datasets. Some methods may also rely too heavily on specific feature representation methods, limiting the generalization ability of the model. To improve the model's performance and generalization ability, we seek improvements in three aspects: encoding, model structure, and training strategy. The research of Onan et al. in the field of text processing provides us with valuable insights. They successfully adopted word embedding techniques [33,34] and bidirectional long short-term memory (BiLSTM) [35,36] in text classification to capture the context and sequence information of the text. They also proposed a hierarchical framework with contextual node embedding and dynamic fusion, emphasizing the importance of hierarchical feature learning and fusion, which resonates with our approach to handling complex sequence data. [37] These studies highlight the critical role of context and sequence information, providing direction for our processing of sequence data for drugs and targets. In our DTA prediction task, the drug information is usually represented by a simplified molecular input line entry system (SMILES) string, while the target information is represented by its amino acid sequence. These works inspired us to utilize similar sequence encoding techniques and model architectures for DTA prediction. In addition, the group-wise feature enhancement (GFE) mechanism [35] provides a useful reference for the design of the feature enhancement module. The ensemble techniques [38,39] adopted by Onan et al. in various studies emphasize the importance of feature fusion and model generalization, and provide a reference for the prediction strategy of DTA.
To this end, this study constructed the first DTA dataset specifically for P2X4 and mTOR and designed a new deep learning-based multilevel feature aggregation enhanced (MFAE) model. As illustrated in Figure 1a, our experimental workflow commences with the identification of CRC targets. This is followed by the construction and training of the MFAE model. Once fine-tuned, the model is used for virtual screening of the Food and Drug Administration (FDA)-approved drugs, ultimately identifying potential lead compounds. This approach not only speeds up drug discovery but also provides a computational basis for subsequent experimental validation and in-depth pharmacological exploration. By simulating the complex interactions between drugs and targets, MFAE can more accurately learn the affinity relationship between them. In addition, MFAE adopts an ensemble learning (EL) training strategy, allowing it to better handle uncommon or newly discovered tumor-related proteins. The computational model constructed in this study provides efficient support for DR by quickly predicting potential targeted therapeutic drugs. This provides a valuable direction for subsequent experimental verification and pharmacological research.

Design of the MFAE Model
As illustrated in Figure 1b-d, we propose a multilevel feature deep neural network model aimed at end-to-end prediction of drug-target binding affinity. MFAE not only hierarchically extracts features of the drug and target but also simulates their multilevel feature dependencies, thereby deeply exploring the relationship between the drug and the target.
First, the model transforms drug and target data, represented as SMILES strings and amino acid sequences, respectively, into numerical feature matrices using word embedding technology. This transformation captures the semantic information of the sequences and sets the stage for subsequent feature extraction. To accentuate relevant features and diminish background noise, the MFAE model introduces a GFE mechanism. This mechanism segments the feature map into groups, normalizes them, and emphasizes features with high correlation to overall statistical features. This selective emphasis directs the model toward the most relevant information, thereby improving its predictive accuracy.
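The grouping, normalization, and emphasis steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the MFAE implementation: it assumes a dot-product similarity between each sequence position and the group's mean feature vector, followed by a sigmoid gate, and the names `group_feature_enhance`, `gamma`, `beta`, and `eps` are hypothetical.

```python
import math

def group_feature_enhance(features, groups, gamma=1.0, beta=0.0, eps=1e-5):
    """features: list of positions, each a list of C channel values.
    Returns a same-shaped list in which every position is rescaled per group.
    Assumption: similarity to the group mean, normalized over positions,
    drives a sigmoid gate (gamma and beta would be learned in a real model)."""
    n_pos, n_channels = len(features), len(features[0])
    assert n_channels % groups == 0, "channels must divide evenly into groups"
    size = n_channels // groups
    out = [list(row) for row in features]
    for g in range(groups):
        lo = g * size
        # Overall statistic of the group: per-channel mean over all positions.
        desc = [sum(row[lo + j] for row in features) / n_pos for j in range(size)]
        # Similarity of each position to that statistic (dot product).
        sims = [sum(row[lo + j] * desc[j] for j in range(size)) for row in features]
        # Normalize the similarities across positions.
        mean = sum(sims) / n_pos
        std = math.sqrt(sum((s - mean) ** 2 for s in sims) / n_pos)
        for i, s in enumerate(sims):
            gate = 1.0 / (1.0 + math.exp(-(gamma * (s - mean) / (std + eps) + beta)))
            for j in range(size):
                out[i][lo + j] = features[i][lo + j] * gate
    return out

# Toy feature map: 3 positions, 4 channels, 2 groups.
toy = [[1.0, 1.0, 0.0, 0.0],
       [3.0, 3.0, 0.0, 0.0],
       [1.0, 1.0, 0.0, 0.0]]
enhanced = group_feature_enhance(toy, groups=2)
```

In this toy input, the middle position agrees most strongly with the first group's mean vector, so its gate exceeds 0.5 while the other two positions are attenuated.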
The model exploits the contextual richness of sequence data by blending BiLSTM and convolutional neural network (CNN) architectures. While the BiLSTM captures sequential context from both directions, the subsequent multilayer CNN refines these features, giving a more complete view of the drug and target interactions. As a final layer of robustness, the model adopts an EL approach. By training multiple MFAE instances on varied data subsets and averaging their predictions, the model achieves enhanced reliability in its outputs. Specifically, the model integrates BiLSTM with a multilayer CNN, forming a multilevel feature aggregation framework. During feature extraction, the model aggregates the current features and also retains the previous hidden features, ensuring the continuity and integrity of the features. This design strategy allows the model to better capture the interaction information between the drug and the target, thereby improving prediction accuracy.
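The ensemble step (several instances trained on different subsets, predictions averaged) can be illustrated independently of the network architecture. The sketch below is an assumption-laden stand-in: a least-squares line replaces each MFAE instance, and each member is trained with one fold of the data held out; `fit_line`, `train_ensemble`, and `predict_ensemble` are hypothetical names.

```python
def fit_line(xs, ys):
    """Least-squares line y = a*x + b; a stand-in for one trained model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # degenerate subset: fall back to the mean
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def train_ensemble(xs, ys, n_models=3):
    """Train one member per fold, each on the data with that fold held out."""
    models = []
    for k in range(n_models):
        keep = [i for i in range(len(xs)) if i % n_models != k]
        models.append(fit_line([xs[i] for i in keep], [ys[i] for i in keep]))
    return models

def predict_ensemble(models, x):
    """Average the predictions of all ensemble members."""
    return sum(a * x + b for a, b in models) / len(models)

# Toy data following y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x + 1 for x in xs]
models = train_ensemble(xs, ys, n_models=3)
pred = predict_ensemble(models, 10.0)
```

Averaging reduces the variance contributed by any single member, which is the usual motivation for this kind of ensemble.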
In summary, the MFAE model, with its complex design and innovative methodologies, presents a state-of-the-art solution for DTA prediction, ensuring precision and adaptability in its predictions. In the Experimental Section, we study each module in depth, revealing the underlying technical principles and design ideas in detail.

Dataset Construction
CRC presents as a complex malignancy orchestrated by multiple signaling pathways. [40] Schmitt et al. [12] demonstrated that combined inhibition of P2X4 and mTOR can lead to significant tumor regression and that the two can serve as key targets in CRC treatment.
We further analyzed the mechanisms involved: the P2X4 complex regulates apoptosis and tissue repair pathways, [41,42] whereas mTOR emerges as a key factor in the balance of cell death and proliferation [43] and in the intertwined reactive oxygen species production-DNA damage pathways. [44,45] These pathways not only govern cellular growth, proliferation, migration, and metabolism but also underscore the critical role of P2X4 and mTOR in CRC's pathogenesis. To illustrate the interplay between CRC, its associated pathways, and the potential therapeutic targets, we constructed a Sankey diagram, depicted in Figure 2a. This diagram reveals the interconnections and shows that P2X4 and mTOR are critical in the CRC domain.
Given their important role in CRC, we selected P2X4 and mTOR as our primary targets. To examine the DTA for these targets, we set out to collect a specialized dataset. Amino acid sequences for P2X4 and mTOR were retrieved from the UniProt database, [46] with accession numbers Q99571 and P42345, respectively. Leveraging the ChEMBL database, [47] we then compiled the small molecules specifically targeting P2X4 and mTOR, together with their affinity data. This work resulted in a dataset of 226 compounds for P2X4 and 5573 compounds for mTOR. These datasets were then merged, resulting in a comprehensive dataset of 5799 drug-target pairs with affinity data.
In-depth examination of the CRC dataset revealed varying molecular complexities and interaction strengths. As shown in Figure 2b, the SMILES string length mainly fluctuates in a specific range, indicating that the molecular structures of the compounds have a certain consistency. The mean length is 57.77 and the standard deviation is 17.41. Most lengths range from 25 to 75, with a density peak around 70. In contrast, the distribution of affinity values, as shown in Figure 2c, provides insight into the strength of interactions between a drug and its corresponding target. Most molecules exhibit moderate binding affinity. The mean affinity is 7.01 with a standard deviation of 1.35. Most values are distributed between 4 and 10, with a density peak around 7.5. The distribution exhibits a range of affinity values, with specific concentrations suggesting a typical affinity range for many compounds in the dataset. The richness of the dataset is indicated by the broadly distributed affinity landscape.
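Summary statistics of this kind (mean and standard deviation of SMILES length and affinity) can be reproduced with the Python standard library. The sketch below is illustrative only: the three records are made-up placeholders rather than entries from the CRC dataset, `summarize` is a hypothetical helper, and whether the reported values use the sample or the population standard deviation is not stated in the text, so the sample form is assumed.

```python
from statistics import mean, stdev

def summarize(values):
    """Basic descriptive statistics (sample standard deviation assumed)."""
    return {"n": len(values), "mean": mean(values), "std": stdev(values),
            "min": min(values), "max": max(values)}

# Hypothetical (SMILES, target, affinity) records, for illustration only.
records = [
    ("CCO", "mTOR", 5.2),
    ("c1ccccc1", "mTOR", 6.8),
    ("CC(=O)Oc1ccccc1C(=O)O", "P2X4", 7.4),
]

length_stats = summarize([len(smiles) for smiles, _, _ in records])
affinity_stats = summarize([affinity for _, _, affinity in records])
```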

Performance Comparison
We trained and compared the proposed model with baseline methods on the collected CRC-DTA dataset. We employed a set of evaluation metrics, including mean squared error (MSE), concordance index (CI), determination coefficient ($R_m^2$), and Pearson and Spearman correlation coefficients, to comprehensively evaluate the MFAE model's predictive accuracy and ranking capability. For a detailed introduction to these metrics, see the Experimental Section. To ensure the fairness and accuracy of the experiments, all models were run with the same experimental configuration, and their performances were compared using fivefold cross-validation. The comprehensive performance evaluation is shown in Figure 3a, where the bars show the performance of the MFAE model and the baseline models on the CRC-DTA dataset together with the standard deviation of each metric. Among them, GCNNet, GATNet, GAT-GCN, and GINConvNet are the four variants of GraphDTA. The MFAE model shows advantages over the other methods on all metrics, obtaining better performance values and smaller deviations. Specifically, our model achieves 0.470 on MSE, 0.727 on $R_m^2$, and 0.838 on CI. Compared with the second-best method, our model improves MSE by 6% and $R_m^2$ by 6.4%. These results demonstrate that our model has higher predictive power than the other models. Furthermore, we observed that graph-based models such as the GraphDTA variants and MgraphDTA could not make accurate predictions. Their MSE exceeded 1.7, indicating a significant deviation between predicted and actual values. Additionally, their Pearson and Spearman correlation coefficients were very low, with the Spearman metric even showing negative values. This might be because these graph-based models could not effectively utilize the interaction information between drugs and targets. Moreover, observing the standard deviations of all these metrics in the fivefold cross-validation, we found that although all models have relatively large standard deviations, our model's standard deviation is smaller. The large deviations might be due to the uneven distribution of the data, which might also be one of the reasons for the poor performance of the other models. Overall, our feature fusion mechanism can better simulate the interactions between CRC drugs and targets, and thus our model has better DTA prediction ability in this task.
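For readers unfamiliar with the ranking and error metrics named above, the following sketch computes MSE, CI, and the Pearson and Spearman coefficients in plain Python. It uses the textbook definitions; the exact implementations used in the study are described in its Experimental Section and may differ in details such as tie handling. $R_m^2$ is omitted here because its precise form is not given in this part of the text.

```python
import math

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def concordance_index(y_true, y_pred):
    """Share of pairs with different true values that are ranked in the
    same order by the predictions; tied predictions count as 0.5."""
    num = den = 0.0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue
            den += 1
            d = (y_true[i] - y_true[j]) * (y_pred[i] - y_pred[j])
            num += 1.0 if d > 0 else (0.5 if d == 0 else 0.0)
    return num / den if den else float("nan")

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

# Toy affinities: one of the six comparable pairs is ranked in the wrong order.
y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.5, 5.9, 7.2, 7.1]
scores = (mse(y_true, y_pred), concordance_index(y_true, y_pred),
          spearman(y_true, y_pred))
```

The pairwise loop in `concordance_index` is quadratic in the number of samples, which is acceptable for a few thousand pairs but would need a faster formulation for much larger sets.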

Visual Analysis of Predictions
As illustrated in Figure 3b-d, we provide a detailed visual assessment of the prediction capabilities of the FusionDTA, NHGNN-DTA, and MFAE models. In Figure 3b, it is evident that for the MFAE model, most of the data points closely align with the red dashed line, which represents perfect prediction. This indicates that the MFAE model's predictions are generally accurate. In contrast, the FusionDTA and NHGNN-DTA models exhibit a more scattered distribution, suggesting a wider variance in their predictions relative to the actual values. In particular, some regions in the FusionDTA and NHGNN-DTA scatter plots show significant deviations from the ideal line, highlighting areas where these models may struggle to provide accurate predictions. In addition, the $R^2$ values and fitted slopes of the regression lines for the three scatter plots both show that the MFAE model has the best prediction accuracy. The MFAE model obtained a higher $R^2$ of 0.762 and a fitted slope of 0.775, indicating a stronger positive linear relationship between its predictions and the actual values, while the other two models had slightly lower slopes. Moving on to Figure 3c, the MFAE model exhibits a sharp and narrow peak, suggesting that most of its predictions are concentrated around a certain value, which aligns closely with the actual values. In contrast, the FusionDTA and NHGNN-DTA models display broader peaks, indicating a more spread-out distribution of predictions. This scattered distribution can indicate less consistency in the predictions made by these models. Figure 3d provides insights into the difference between the predicted and actual values. For the MFAE model, most residuals are clustered close to the zero line, signifying that the errors in its predictions are small. The FusionDTA and NHGNN-DTA models, however, have residuals that spread out more widely, with some even exhibiting considerable deviations from zero. This suggests that these models, at times, either significantly overestimate or underestimate the values. In summary, while all three models exhibit their strengths and weaknesses, the MFAE model consistently outperforms the other two across all visual assessments.
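The quantities read off these plots (fitted slope, $R^2$ of the regression line, and residuals) can be computed directly from paired predictions. The sketch below assumes the line is fitted by ordinary least squares with predicted values regressed on actual values, which the text does not state explicitly; `fit_summary` is a hypothetical helper.

```python
def fit_summary(y_true, y_pred):
    """Least-squares fit of predicted on actual values, plus residuals."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    sxx = sum((t - mt) ** 2 for t in y_true)
    syy = sum((p - mp) ** 2 for p in y_pred)
    sxy = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    slope = sxy / sxx
    return {
        "slope": slope,
        "intercept": mp - slope * mt,
        # For a simple linear fit, R^2 equals the squared Pearson correlation.
        "r2": sxy ** 2 / (sxx * syy),
        # Residual = predicted minus actual; values near zero mean small error.
        "residuals": [p - t for t, p in zip(y_true, y_pred)],
    }

# Toy case: every prediction is 0.5 too high.
summary = fit_summary([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

Note that a slope of 1 and an $R^2$ of 1 do not by themselves imply accurate predictions, as the constant offset in the toy case shows; the residuals carry that information.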

Ablation Study
To understand the significance of the different components of our proposed model, we carried out an ablation study. Specifically, we examined the impact of the GFE module and the EL approach, in addition to our base multilevel feature aggregation modules, on DTA prediction performance. The GFE and EL components were independently removed while retaining the baseline feature aggregation model. Fivefold cross-validation was conducted on these modified architectures, which were evaluated on the test set using metrics including MSE. As visualized in Figure 4a, ablation of either the GFE or the EL resulted in performance declines, with ensemble removal incurring greater losses. This signifies that both components impart notable gains. Nonetheless, the base fusion model still outperforms the state-of-the-art method on metrics such as MSE, indicating that our feature fusion approach alone can capture latent information to predict drug-target interactions.

Fine-Tuning the MFAE Model with External Data
To enhance our model's drug screening performance for CRC, we used an external dataset to fine-tune the pretrained complete MFAE model. This dataset contains in vitro binding experimental data of prescription drugs and experimental drugs related to the P2X4 and mTOR targets, sourced from the IUPHAR/BPS Guide to Pharmacology (GtoPdb) database. [48] The GtoPdb database is renowned for its high-quality pharmacological and medicinal chemistry data, capturing detailed ligand-activity-target relationships, including drug targets and their associated prescription and experimental drugs. We ensured that the drug-target pairs from this external dataset were distinct from those in our model's initial training data, to maintain the integrity and fairness of the fine-tuning process.
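A check of this kind, keeping only external pairs absent from the training data, can be sketched as a set lookup on (SMILES, target) keys. This is a simplified illustration with made-up records and a hypothetical helper `drop_seen_pairs`; it compares raw SMILES strings, so the same molecule written in two different ways would not be detected unless the strings are canonicalized first with a cheminformatics toolkit.

```python
def drop_seen_pairs(external, training):
    """Keep external (smiles, target, affinity) records whose
    (smiles, target) pair does not occur in the training records."""
    seen = {(smiles, target) for smiles, target, _ in training}
    return [rec for rec in external if (rec[0], rec[1]) not in seen]

# Hypothetical records, for illustration only.
training = [("CCO", "mTOR", 6.1)]
external = [
    ("CCO", "mTOR", 6.3),   # pair already in training: dropped
    ("CCO", "P2X4", 5.0),   # same drug, different target: kept
    ("CCN", "mTOR", 7.2),   # new drug: kept
]
fine_tune_set = drop_seen_pairs(external, training)
```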
As shown in Figure 4b, the distribution of SMILES string lengths indicates that most molecules have compact structural representations, mostly clustered below a length of 50. This indicates a fairly uniform molecular complexity in the fine-tuning dataset. The affinity value distribution, as shown in Figure 4c, shows that the majority of the molecules exhibit moderate binding affinities, concentrated around values of 7-8, which is a typical affinity profile for many drugs.
Upon inputting the SMILES strings of these molecules into the fine-tuned model, the affinity values for the corresponding targets were predicted. A comparative analysis between the model's predictions and the actual experimental values, as shown in Figure 4d, confirmed the model's capability to predict the affinity trends of most molecules and their targets. The data points, densely packed around the diagonal, indicate a robust correlation between the predicted and actual values. This correlation is further evidenced by a Pearson coefficient of 0.958 and a Spearman rank correlation of 0.954. Figure 4e shows a substantial overlap of the predicted and actual value distributions, both of which peak at similar affinity intervals. This overlap further demonstrates the model's prediction fidelity and its consistency with the actual experimental values after fine-tuning. Moreover, the model's performance metrics after fine-tuning on the external dataset were a CI of 0.930, an $R_m^2$ of 0.782, and an MSE of 0.191, indicating small discrepancies between the predicted and actual affinity values. These findings not only demonstrate the enhanced predictive power of our model after fine-tuning but also underscore its potential for practical deployment and broader applicability.

Case Study: Screening Analysis of FDA-Approved Drugs
Using the analytical pipeline depicted in Figure 5a, we integrated a suite of computational tools to identify potential drugs for P2X4 and mTOR in a rigorous and systematic way. Commencing with the prediction of the DTAs of FDA-approved drugs through the fine-tuned MFAE model, the workflow progressed through drug-target network visualizations, molecular docking simulations, and absorption, distribution, metabolism, excretion, and toxicity (ADMET) property predictions, and concluded with lead compound optimizations. Each phase was designed to provide insights into the therapeutic potential of the evaluated candidates.
Initially, we turned to the DrugBank database, extracting the SMILES information of 2509 FDA-approved small molecule drugs. This dataset served as our virtual screening library for DR work. To predict the potential binding affinities of these drugs with P2X4 and mTOR, we employed the fine-tuned MFAE model. The subsequent predictions allowed us to select the top 20 drugs based on their DTA values. The complex relationships between these top candidates and their respective targets were illustrated through a Sankey diagram, as shown in Figure 5b. Notably, Ponatinib (DB08901) and Talazoparib (DB11760) emerged as frontrunners, displaying the highest binding affinities. These two drugs were then designated for more in-depth analysis, positioning them as our lead molecules.
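The ranking step of such a screen reduces to scoring every drug-target pair and keeping the highest predictions. The following sketch shows that step only, under stated assumptions: `toy_predict` is a placeholder scoring function standing in for the fine-tuned model, and the library entries and identifiers are hypothetical.

```python
import heapq

def screen(library, targets, predict, k=20):
    """Score every (drug, target) pair and return the k highest as
    (predicted_affinity, drug_id, target) tuples, best first."""
    scored = [(predict(smiles, target), drug_id, target)
              for drug_id, smiles in library.items()
              for target in targets]
    return heapq.nlargest(k, scored)

def toy_predict(smiles, target):
    """Placeholder for a trained affinity model; not a real predictor."""
    return len(smiles) + (0.5 if target == "mTOR" else 0.0)

# Hypothetical three-compound library keyed by made-up identifiers.
library = {"D1": "CCO", "D2": "CCCCCC", "D3": "C"}
top = screen(library, ["P2X4", "mTOR"], toy_predict, k=2)
```

With a library of a few thousand compounds and two targets, scoring all pairs exhaustively is inexpensive, so no pre-filtering is needed before ranking.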
For a detailed understanding of the molecular interactions between our lead molecules and the targets, we employed the CB-Dock2 tool. [49] Renowned for its capabilities in protein-ligand blind docking, CB-Dock2 integrates cavity detection, docking, and homologous template fitting. The results of this analysis, as shown in Figure 5c,d, reveal the key residues supporting the drug-target interactions. Ponatinib and Talazoparib docked in the cavities of the mTOR and P2X4 targets and formed 8 and 9 interactions with mTOR and 8 and 5 interactions with P2X4, respectively, which jointly contribute to drug-target binding.
Given the significance of ADMET properties in drug discovery, we employed the SwissADME tool [50] for a comprehensive assessment. Recognized for its ability to predict pharmacokinetics, drug-likeness, and medicinal chemistry parameters, the platform plays a key role in the review of lead compounds. As illustrated in Figure 5e, we present a comparative display of the two candidate compounds across five ADMET-related attributes: lipophilicity (influencing drug absorption and distribution), molecular weight (related to the in vivo absorption characteristics of the drug), polarity (determining the solubility of the drug), degree of unsaturation (revealing metabolic stability), and rigidity (reflecting the conformational adaptability of the drug). Points farther from the center in the plot indicate superior performance in that attribute. The analysis largely validated the safety profiles of Ponatinib and Talazoparib. However, Ponatinib's molecular weight and Talazoparib's unsaturation metrics showed minor deviations from the ideal ranges, though the majority of metrics remained within acceptable bounds. To further refine our lead compounds, we turned to the ADMETopt2 tool, [51] which specializes in optimizing ADMET properties using scaffold hopping and transformation rules. The optimization strategies, as shown in Figure 5f, mainly revolved around substituting functional groups, specifically those containing the F atom. This change aimed to enhance the drugs' pharmacological properties while maintaining their core structure.
In summary, this approach, based on a range of computational tools, narrowed our initial extensive dataset down to potential drugs with therapeutic efficacy against P2X4 and mTOR. This study underscores the potential of computational biology in DR, thus providing a computational strategy for future experimental verification.

Discussion
CRC remains a significant clinical challenge, requiring innovative treatment strategies. In this context, our study introduces a DR approach targeting the novel CRC targets P2X4 and mTOR. The development of a specialized DTA prediction model for these targets aims to offer a more targeted treatment option. Our model demonstrated superior performance on the CRC dataset compared to other models. This might be attributed to its capability to capture complex relationships in DTA prediction. EL, employed by our model, combines the predictions from multiple deep learning models, enhancing prediction accuracy. Deep learning models, in contrast to traditional methods such as molecular docking, can effectively exploit the complex inherent features of drugs and targets, showing robust learning capabilities for new targets. [52,53] Our model's multilevel feature fusion mechanism offers a detailed understanding of drug-target dependencies, potentially outperforming other popular models. The deep learning approach, with its ability to discern complex interaction patterns and perform deep feature fusion, provides a novel perspective on drug-target interaction mechanisms. [17,22]
Despite the encouraging outcomes, our model has certain limitations. The dataset's size, while sufficient for supervised feature extraction, remains relatively limited, directly affecting the model's performance. Future computational efforts should prioritize dataset expansion and the incorporation of more comprehensive biological knowledge, such as drug chemical properties and protein-protein interaction information. [54,55] The model's validation, currently limited to the P2X4 and mTOR targets, should be expanded to more tumor-related targets to better assess its generalization capability. It is also critical to integrate the ADME properties of drugs into the model training, especially considering their role in determining drug efficacy and safety. [56,57] Enhancing model interpretability [25] by identifying key patterns and features that drive predictions remains a valuable direction for future work.
It is worth noting that the current study focuses on computational modeling without performing experimental validation. This is a common approach in computational biology, where models are first constructed and evaluated on available datasets before labor- and resource-intensive experimental validation. [58,59] Nonetheless, we recognize that experimental validation is an integral future direction to truly assess the model's predictive power. Follow-up studies are required to experimentally validate the top candidate drugs predicted by the model. Useful experimental techniques include high-throughput in vitro binding assays to determine a compound's affinity for the targets, and preclinical animal studies to evaluate the in vivo efficacy and toxicity profiles of promising candidates before clinical trials. [60,61] Only candidates demonstrating activity in experiments should proceed to clinical trials. Despite the inherent constraints of a purely computational approach, we believe this study offers a reasonable computational strategy and reference value for subsequent experimental studies.
This study introduces a novel computational strategy for precision medicine, specifically targeting CRC, a high-incidence malignancy. Given the clinical prevalence of CRC, the importance of developing effective CRC treatment strategies cannot be overemphasized. Our research not only presents innovative methodologies for its treatment but also serves as a foundational reference for future drug development and clinical applications. The model we proposed has identified a series of potential drug candidates, setting a clear direction for subsequent experimental work. However, the predictions made by our model require careful pharmacological validation before clinical translation. We advocate for enhanced interdisciplinary collaboration between the medical and pharmaceutical sectors to accelerate the transition of these potential therapeutics into clinical trials. Our research offers a fresh computational perspective on CRC treatment, providing valuable insights and methodologies for CRC therapeutics.
In conclusion, the novelty of our work lies not in proposing new machine learning algorithms, but in addressing the specific clinical need of CRC drug discovery by combining and optimizing existing techniques. Through constructing tailored datasets, designing customized models, and providing end-to-end pipelines, we make DR research more targeted and efficient for this disease.

Conclusion
In this study, targeting the emerging therapeutic CRC targets, P2X4 and mTOR, we constructed a specific DTA dataset and designed a multilevel feature deep learning model to predict the binding affinity of drugs to these two targets, aiding DR research. By designing multilevel fusions of drug and target representations to simulate their intricate interactions, our model demonstrated superior affinity prediction performance compared to traditional methods and graph neural networks. This offers the possibility of rapidly and cost-effectively identifying potential known drugs targeting P2X4 and mTOR. Our computational model provides crucial support for the development of new drugs for CRC. Despite certain advancements, the predicted drug-target combinations still require further experimental validation. The current model also needs evaluation on more tumor-related targets to systematically examine its generalization performance. In the future, more biological prior knowledge can be introduced, or inverse design strategies can be employed to design superior models. We look forward to interdisciplinary collaborations with experimental teams, advancing precision oncology research on both computational and experimental fronts. Moreover, our research underscores the potential value of EL and deep learning in drug discovery and repurposing. This study serves as a paradigm for DR and establishes a computational framework for developing targeted treatments for other diseases. We believe that interdisciplinary collaboration and the power of computational science will have a profound impact on precision medicine. In the future, we hope to further refine and expand our model to address more challenges and bring more innovations to the medical field.

Experimental Section
Encoding and Aggregation Module: In DTA prediction, the representation of data is crucial for model performance. For the DTA dataset, the chemical structure of a drug is usually represented by a SMILES string, which uses a series of specific characters to describe the atoms in the molecule and the chemical bonds connecting them. The structure of the target protein is represented by its amino acid sequence. To convert these symbolic sequences into a numerical form suitable for machine learning models, we adopted word embedding technology, which has proven effective in various representation learning tasks and has been widely applied. [33,34,37] In this model, we set up a word embedding layer specifically for digitizing drug and target sequences. The encoding process can be represented as

$$\mathrm{Emb}_p = \mathrm{Embedding}(\mathrm{Tokenizer}(\mathrm{Sequence}))$$

where $\mathrm{Emb}_p \in \mathbb{R}^{l_p \times n}$ is the word vector representation of the protein, $l_p$ is the length of the amino acid sequence, and $n$ is the dimension of the word vector. Similarly, $\mathrm{Emb}_d \in \mathbb{R}^{l_d \times n}$ is the word embedding representation of the drug, where $l_d$ is the length of the SMILES sequence. By learning on a pretrained corpus, word embeddings can capture deep semantic information in symbolic sequences. After this step, the original symbolic input is converted into a numerical feature matrix, providing a data foundation for subsequent deep network learning.

After obtaining the word vector representations of the drug and target, we concatenate them to form a comprehensive feature vector. Subsequently, to further extract and optimize features, we set up independent fully connected (FC) layers for the aforementioned embedding vectors. The main purpose of these FC layers is to map the features obtained from the embedding layer to the specific space of the next layer, preparing for subsequent sequence feature extraction; this step can be represented as a single FC mapping. It is worth noting that the three embedding FC layers are independent and do not share weights; the single formula combines them only for simplicity.
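The encoding step can be illustrated with a minimal sketch, assuming character-level tokenization and a randomly initialized lookup table. The helper names (`build_vocab`, `tokenize`, `make_embedding`, `embed`), the padded lengths, the embedding dimension, the example protein fragment, and concatenation along the sequence axis are all illustrative assumptions and are not taken from the MFAE implementation.

```python
import random

def build_vocab(sequences):
    """Map each distinct character to an integer id; id 0 is reserved for padding."""
    chars = sorted({ch for seq in sequences for ch in seq})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def tokenize(sequence, vocab, max_len):
    """Convert a sequence to max_len ids: truncate, then right-pad with 0.
    Characters missing from the vocabulary also map to 0 in this sketch."""
    ids = [vocab.get(ch, 0) for ch in sequence[:max_len]]
    return ids + [0] * (max_len - len(ids))

def make_embedding(vocab_size, dim, seed=0):
    """Randomly initialized lookup table with one dim-sized row per id."""
    rng = random.Random(seed)
    table = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size + 1)]
    table[0] = [0.0] * dim  # the padding row stays at zero
    return table

def embed(ids, table):
    """Look up one row per token, giving an l x n matrix (list of rows)."""
    return [table[i] for i in ids]

smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin, used only as an example input
protein = "MKTAYIAKQR"            # made-up amino acid fragment

drug_vocab = build_vocab([smiles])
prot_vocab = build_vocab([protein])
emb_d = embed(tokenize(smiles, drug_vocab, 32), make_embedding(len(drug_vocab), 8))
emb_p = embed(tokenize(protein, prot_vocab, 16), make_embedding(len(prot_vocab), 8, seed=1))
emb_t = emb_d + emb_p  # assumed concatenation along the sequence axis
```

Character-level tokenization splits multi-character atoms such as `Cl`, so a production tokenizer would likely need SMILES-aware rules; in a trained model the table entries would be learned rather than sampled.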
Feature Enhancement Mechanism: When dealing with the complex DTA prediction task, background noise and irrelevant features may negatively impact the prediction accuracy of the model. To more effectively extract and reinforce semantic features related to the task, we introduced a GFE mechanism, inspired by the GFE mechanism used by Aytug Onan [35] in text classification tasks. We adjusted it to handle the one-dimensional feature sequences of drugs and targets. The core idea is to reinforce relevant features using global and local statistical information at each spatial position within the feature group. Specifically, this mechanism identifies and emphasizes local features that are highly correlated with overall statistical features.
First, we divide the feature maps into G groups along the channel dimension and then normalize the input feature maps within each group, where G denotes the number of groups, C represents the number of channels, and P is the number of spatial positions.

To obtain a global statistical representation for each group, we apply average pooling across the spatial dimensions of the scaled feature maps within each group. This results in a summarized vector for each channel within a group. We then compute the summation across the channel dimension of each group, aggregating the information into a single vector. This vector represents the overall average activation for each semantic feature group, where $T \in \mathbb{R}^{G \times P}$ is the vector of group statistics, and $t_{g,p}$ is an element of $T$. Next, we normalize the input feature maps using the group statistics $T$

$$\hat{t}_{g,p} = \frac{t_{g,p} - \mu_g}{\sigma_g} \qquad (7)$$

where $\mu_g$ and $\sigma_g$ are the mean and standard deviation of features in group $g$ across spatial positions, respectively. The normalized value $\hat{t}_{g,p}$ is then scaled and shifted element-wise by learned parameters $w_g$ and $b_g$

$$s_{g,p} = w_g \hat{t}_{g,p} + b_g \qquad (8)$$

where $w_g$ and $b_g$ are the weight and bias for group $g$, respectively.

A sigmoid function $\sigma$ is applied to $s_{g,p}$ to obtain attention scores between 0 and 1

$$a_{g,p} = \sigma(s_{g,p}) \qquad (9)$$

Finally, the input features $x_{g,p}$ are enhanced with the attention scores through element-wise multiplication

$$\hat{x}_{g,p} = a_{g,p} \times x_{g,p} \qquad (10)$$

This process allows each semantic group to adaptively focus on more relevant features and suppress less useful ones. The enhanced outputs $\hat{x}_{g,p}$ can then be used in subsequent layers for improved representation learning.
To adapt to the DTA prediction task, we transformed the original spatial group features into one-dimensional semantic features. Then, we applied the aforementioned enhancement mechanism to optimize the information representation of each semantic feature group, where $F'_d$, $F'_p$, and $F'_t$ are the outputs of the GFE module. The details of the GFE module are elaborated in the following subsection.
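The enhancement steps (grouping, pooling, normalization, scale and shift, sigmoid gating) can be sketched in plain Python for a single one-dimensional feature map. This is a hedged illustration, not the MFAE code: the function name `gfe`, the small `eps` added to the standard deviation, and in particular the way the group statistic is formed (each position's features weighted by the channel-wise pooled averages and summed over channels, as in spatial group-wise enhancement) are assumptions about details the text does not fully spell out.

```python
import math

def gfe(x, groups, w=None, b=None, eps=1e-5):
    """Group-wise feature enhancement for a C x P feature map (list of C rows of length P).

    w and b hold one scale and one shift per group; here they default to 1 and 0,
    whereas in a trained model they would be learned.
    """
    C, P = len(x), len(x[0])
    if C % groups != 0:
        raise ValueError("number of channels must be divisible by the number of groups")
    size = C // groups
    w = [1.0] * groups if w is None else w
    b = [0.0] * groups if b is None else b
    out = []
    for g in range(groups):
        rows = x[g * size:(g + 1) * size]
        # Global descriptor: average of each channel over all positions.
        pooled = [sum(r) / P for r in rows]
        # Group statistic t[p]: sum over channels of (feature * pooled average).
        t = [sum(r[p] * m for r, m in zip(rows, pooled)) for p in range(P)]
        # Normalize the statistic across positions, as in Equation (7).
        mu = sum(t) / P
        sigma = math.sqrt(sum((v - mu) ** 2 for v in t) / P)
        t_hat = [(v - mu) / (sigma + eps) for v in t]
        # Scale and shift, then squash to (0, 1) with a sigmoid.
        a = [1.0 / (1.0 + math.exp(-(w[g] * v + b[g]))) for v in t_hat]
        # Gate the original features position by position, as in Equation (10).
        out.extend([[a[p] * r[p] for p in range(P)] for r in rows])
    return out

x = [[1.0, 2.0, 3.0, 4.0],
     [0.0, 1.0, 0.0, 1.0],
     [2.0, 2.0, 2.0, 2.0],
     [1.0, 0.0, 1.0, 0.0]]
enhanced = gfe(x, groups=2)
```

Because the attention scores lie strictly between 0 and 1, the gated features never exceed the inputs in magnitude; positions whose statistic is above the group mean are attenuated less than those below it.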
BiLSTM-CNN Block: In the field of deep learning, CNN [62] and BiLSTM [35,36] have been proven to be effective structures for processing sequence data. The sequence information of proteins and drugs is highly context dependent (for example, an amino acid is connected by peptide bonds to the amino acids before and after it), so this contextual information is crucial for feature extraction. To capture it, we introduced a multilayer bidirectional LSTM for feature extraction in our model. The design idea of BiLSTM is to use two LSTM networks to propagate information along the sequence in two directions (i.e., forward and backward), which is highly consistent with the structural characteristics of proteins and drugs. After processing with BiLSTM, we obtained three output feature representations, $H_d$, $H_p$, and $H_t$. To simulate the interaction of drugs and targets on the output features of the BiLSTM layer, we used a concatenation operation to further integrate these features, resulting in a comprehensive feature vector. Specifically, we concatenated the output features of BiLSTM and, to ensure the stability of the features, applied a LayerNorm layer to $H_d$ and $H_p$ before concatenation. The merged feature representation is denoted $H_{fusion}$.

Through the above BiLSTM module, we have achieved two feature aggregations. To further extract internal contextual features, we introduced CNN. Since both targets and drugs are represented by sequence information, the contextual features within the neighborhood window strongly influence the central feature. Therefore, we designed a multilayer CNN network and gradually increased its kernel size to ensure that the receptive field of each layer can capture sufficient contextual information. Specifically, we applied a one-dimensional CNN to the four features $H_d$, $H_p$, $H_t$, and $H_{fusion}$ for further feature extraction.

EL Prediction Module: In the design of deep learning models, the effective fusion of features and the generalization ability of the model are key factors. To achieve these two goals, we introduced an EL strategy in the final stage of the model. We first concatenated the aggregated features from the CNN output with the independent features. This connection strategy ensures that the model captures both global and local information. The concatenated features are passed to an FC layer, which transforms them into a single DTA prediction value. EL offers a strategy to improve the generalization ability of the model, [38,39] with advantages such as reduced prediction variance and increased computational efficiency. To enhance the generalization ability of the model and reduce variance, we adopted an EL strategy based on the Bagging sampling idea. We performed bootstrap sampling with replacement from the original DTA training set, generating three Bootstrap sample sets $D_1$, $D_2$, and $D_3$. A multilevel feature fusion model is trained independently on each Bootstrap dataset, producing three prediction models $M_1$, $M_2$, and $M_3$. For a new data sample $x$, each model generates an affinity prediction value, denoted as $M_i(x)$, where $i = 1, 2, 3$. The average of the predictions from the three models gives the final ensemble result, $\frac{1}{3}\sum_{i=1}^{3} M_i(x)$.

Training Settings: In the process of training our deep learning model for DTA prediction, we adopted a specific set of configurations to ensure optimal performance and convergence.
We chose the PyTorch deep learning framework as the foundation for implementing our model due to its flexibility and efficiency. Throughout the model's construction, we extensively utilized modules provided by PyTorch, including optimizers, loss functions, and sequence modeling tools such as CNN and recurrent neural network (RNN). To ensure the model's convergence and generalization capabilities, we employed the Adam optimizer combined with the MSELoss loss function for training. The learning rate was set to 1e-3, and the batch size for each training iteration was 128. We anticipated that the model would achieve satisfactory performance after 200 epochs of training. To prevent overfitting and select the optimal model, we saved the model parameters based on the performance on the validation set at the end of each epoch.
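The Bagging-based EL strategy (three bootstrap sample sets, one model per set, averaged predictions) can be sketched as follows. This is a minimal illustration under stated assumptions: `train_mean_model` is a hypothetical stand-in that simply predicts the mean affinity of its sample, used here in place of training an MFAE network, and the toy dataset and function names are invented for the example.

```python
import random

def bootstrap_sample(dataset, rng):
    """Draw len(dataset) items with replacement."""
    n = len(dataset)
    return [dataset[rng.randrange(n)] for _ in range(n)]

def train_mean_model(sample):
    """Hypothetical stand-in for model training: predicts the sample's mean affinity."""
    mean = sum(y for _, y in sample) / len(sample)
    return lambda x: mean

def bagging_fit(dataset, n_models=3, seed=0, train=train_mean_model):
    """Train one model per bootstrap sample set (D_1, D_2, D_3 for n_models=3)."""
    rng = random.Random(seed)
    return [train(bootstrap_sample(dataset, rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    """Average the individual predictions M_i(x)."""
    return sum(m(x) for m in models) / len(models)

# Toy (input, affinity) pairs; the identifiers and values are made up.
data = [("drug_a", 5.0), ("drug_b", 6.5), ("drug_c", 7.2), ("drug_d", 8.1)]
models = bagging_fit(data)
prediction = bagging_predict(models, "drug_e")
```

Passing a different `train` callable would allow any regressor to be bagged in the same way; the averaging step itself does not depend on the model type.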
Considering the EL training strategy, we trained three separate models for each fold of the fivefold cross-validation. Given the scale and distribution of the dataset, we adopted a fivefold cross-validation strategy for evaluation, which implies that models had to be trained separately for each subset. This method randomly divides all samples into five subsets, using one subset as the test set and the rest as the training set in each iteration. This process is repeated five times, and we then compute the average performance over the five tests, providing a more robust evaluation result. Furthermore, the dataset was partitioned into training, validation, and test sets in a 7:1:2 ratio.

Baseline Methods: To evaluate the performance of our model, we selected several state-of-the-art models related to the DTA prediction task as baselines. Here is a brief description of these models:
1) GraphDTA: [26] A DTA prediction model based on graph neural networks, exploring four GNN variants including GCNNet, [27] GATNet, [28] GINNet, [29] and GAT-GCN.
2) MGraphDTA: [30] A deep multiscale graph neural network model with 27 graph convolution layers, aiming to capture multiscale features of compounds.
3) FusionDTA: [31] A model employing multihead linear attention mechanisms and knowledge distillation techniques, aiming to enhance the accuracy of information aggregation and reduce parameters.
4) NHGNN-DTA: [32] A model capable of handling both drug and protein structural information, using a hybrid graph and feature generator to enhance information interaction.
These baseline models provide us with a reference standard, enabling us to evaluate our model's performance on the DTA prediction task more objectively. In the subsequent experimental section, we will compare the performance of our model with these baseline methods in detail, offering readers a comprehensive performance evaluation.
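The fivefold cross-validation described under Training Settings can be sketched with index lists alone. The sketch assumes that the 7:1:2 partition is realized inside each fold, with the held-out fold (20%) serving as the test set and one eighth of the remaining samples (10% of the total) held out for validation; the source does not state how the two schemes are combined, so this reading and the function name `five_fold_splits` are assumptions.

```python
import random

def five_fold_splits(n_samples, k=5, val_fraction=0.125, seed=0):
    """Yield (train, val, test) index lists for each of k folds.

    With k=5 and val_fraction=0.125, each split is 70% / 10% / 20% of the data.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal, disjoint subsets
    for i in range(k):
        test = folds[i]
        rest = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        n_val = round(len(rest) * val_fraction)
        yield rest[n_val:], rest[:n_val], test

splits = list(five_fold_splits(100))
```

Each sample appears in exactly one test set across the five folds, so averaging the per-fold metrics covers the whole dataset once.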
Performance Metrics: Evaluation metrics are indispensable tools for assessing the performance of computational models, ensuring that their predictions are in line with empirical observations. In this context, we utilized a comprehensive set of metrics to evaluate the MFAE model's performance. MSE offers a straightforward measure of the average squared difference between predicted and observed affinities, thereby determining the model's precision. The CI quantifies the relative ranking of predicted and actual binding affinities, revealing the model's capability to accurately rank drug-target interactions. The $R_m^2$ captures the proportion of variance in the observed data that the model can explain, reflecting its explanatory power. Furthermore, the Pearson correlation coefficient evaluates the linear relationship between predicted and observed values, while the Spearman's rank correlation coefficient measures the strength and direction of the monotonic relationship between them. Collectively, these metrics provide an overall assessment of the model's predictive ability across multiple dimensions.
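The equations for these metrics are as follows

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (17)$$

where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, and $n$ is the number of observations.

$$\mathrm{CI} = \frac{1}{n_{\mathrm{concordant}} - n_{\mathrm{discordant}}} \qquad (18)$$

where $n_{\mathrm{concordant}}$ and $n_{\mathrm{discordant}}$ are the number of concordant and discordant pairs, respectively.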
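$$R_m^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (19)$$

where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of observed values.

$$\mathrm{Pearson} = \frac{\mathrm{cov}(y, \hat{y})}{\sigma_y \sigma_{\hat{y}}} \qquad (20)$$

where $\mathrm{cov}(y, \hat{y})$ is the covariance between observed and predicted values, and $\sigma_y$ and $\sigma_{\hat{y}}$ are the standard deviations of observed and predicted values, respectively.

$$\mathrm{Spearman} = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \qquad (21)$$

where $d_i$ is the difference between the ranks of observed and predicted values for the $i$th observation.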
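For readers who want to reproduce these metrics, the following plain-Python sketch computes all five. It is illustrative rather than the evaluation code used in this study. Two choices are assumptions: the CI function uses the widely used pairwise form (the share of comparable pairs whose predicted order matches the observed order, with prediction ties counted as 0.5) rather than a transcription of Equation (18), and the Spearman function applies the rank-difference formula, which presumes no tied values.

```python
import math

def mse(y, y_hat):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def concordance_index(y, y_hat):
    """Share of comparable pairs ranked in the same order (prediction ties count 0.5)."""
    score, pairs = 0.0, 0
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            if y[i] == y[j]:
                continue  # equal observed values are not comparable
            pairs += 1
            d = (y[i] - y[j]) * (y_hat[i] - y_hat[j])
            score += 1.0 if d > 0 else 0.5 if d == 0 else 0.0
    return score / pairs

def r_squared(y, y_hat):
    """One minus residual sum of squares over total sum of squares."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def pearson(y, y_hat):
    """Covariance divided by the product of the standard deviations."""
    n = len(y)
    my, mp = sum(y) / n, sum(y_hat) / n
    cov = sum((a - my) * (b - mp) for a, b in zip(y, y_hat)) / n
    sy = math.sqrt(sum((a - my) ** 2 for a in y) / n)
    sp = math.sqrt(sum((b - mp) ** 2 for b in y_hat) / n)
    return cov / (sy * sp)

def spearman(y, y_hat):
    """Rank-difference formula; assumes no ties in either list."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            result[i] = rank
        return result
    n = len(y)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(y), ranks(y_hat)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

observed = [1.0, 2.0, 3.0, 4.0]
reversed_pred = [4.0, 3.0, 2.0, 1.0]
```

A perfectly reversed prediction, as in `reversed_pred`, is a convenient sanity check because every ranking metric should reach its minimum there.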

Figure 1. Overview of the proposed experimental flowchart and MFAE model architecture. a) Flowchart of DR against CRC. Based on the CRC targets mTOR and P2X4, the target inhibitors were collected from the ChEMBL database as a CRC dataset, and then the MFAE model was constructed and trained. The MFAE model, fine-tuned using in vitro experimental data, was then used to conduct virtual screening of FDA-approved drugs obtained from the DrugBank database to obtain lead compounds. b) Architecture of the MFAE model. The SMILES of the drug and the sequence of the target are input, encoded with the tokenizer and aggregated, enhanced through the group feature enhancement mechanism, and then passed through the BiLSTM-CNN block, which includes the second feature aggregation, to obtain the DTA output. The model's architecture emphasizes the hierarchical extraction of drug and target features, while also modeling their multilevel feature dependencies. c) Flow of the GFE mechanism. The original features are divided into several groups, and each group obtains weight scores through average pooling, normalization, and sigmoid processes to obtain enhanced features. d) Prediction module based on Bagging EL. Following the Bagging sampling strategy, three sample sets are generated by random sampling with replacement from the original training set, and the predicted values of the MFAE models trained on each sample set are averaged to give the integrated DTA prediction.

Figure 2. Drug-target dataset construction for CRC. a) Sankey diagram of disease-pathway-target interactions in CRC. CRC mainly contains three pathways, among which mTOR acts on two of them, while P2X4 only acts on one of them. The protein structures of P2X4 and mTOR obtained from the AlphaFold2 database are on the right side of the Sankey diagram. b) SMILES string length distribution in the CRC dataset, which shows the diversity of molecular complexity. Most of the lengths are distributed between 25 and 75, and a density peak appears around 70. c) Distribution of affinity values in the CRC dataset, with most molecules exhibiting moderate binding affinities. Most are distributed between 4 and 10, with a density peak around 7.5.

Figure 3. Performance comparison of MFAE with baseline models. a) Histogram of the performance of the MFAE and baseline models. Each model's bar is labeled with its metric value and standard deviation. Stars are marked above the best-performing bars. b) Scatterplot of the predicted and actual values of the better-performing models. The red dashed line represents the perfect prediction, while the solid red line represents the fitted regression line. c) Density plot of the distribution of predicted and actual values for the better-performing models. d) Residual plots of the predicted and actual values of the better-performing models. The red dashed line indicates that the residual is zero.

Figure 4. Results of ablation studies and fine-tuning of the full model using external data. a) Histogram of the performance of the ablation setup. It shows the effect of different components on the DTA prediction performance of the MFAE model. Each bar representing an ablation setting is labeled with its metric value and standard deviation. Stars are marked above the best-performing bars. b) Distribution of SMILES lengths of the external dataset. Most of the SMILES strings are concentrated below length 75. c) Distribution of affinity values for the external dataset. Mostly concentrated on values between −10 and 0, with more values of 5 and 8. d) Scatterplot of predicted and actual values after fine-tuning. The dashed red line represents perfect predictions, showing the accuracy of the model in predicting affinity trends. e) Density plot of the distribution of predicted and actual values. It shows an overlap in the distribution of predicted and actual affinity values, indicating that the model's predictions after fine-tuning are accurate and consistent.

Figure 5. A case study on screening analysis of FDA-approved drugs against CRC. a) Workflow of the case study analysis. b) Sankey diagram of the drug-target network, with dashed lines highlighting the selected candidates. c) 3D poses from molecular docking. d) 2D docking details, showcasing the interactions of Ponatinib and Talazoparib with mTOR and P2X4 residues, respectively. e) ADMET prediction by SwissADME for the two candidate leads. The colored zone is the suitable physicochemical space for oral bioavailability. f) Lead optimization suggestions from ADMETopt2. Both drug optimization strategies involve substituting the functional group containing the F atom with other functional groups.
Using the 7:1:2 partition into training, validation, and test sets, we conducted the model's training and testing on a Linux operating system, with an NVIDIA GeForce RTX A4000 GPU and an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40 GHz. With this setup, the training time for each epoch was approximately 2 min. To enhance efficiency, we wrote code that supports GPU parallel computation, allowing different models to be trained simultaneously on the GPU during the same epoch. Taking all factors into account, the entire process from the start of training to completion took approximately 8.5 h.

Ablation Experiments: To validate the effectiveness of our model design, especially the role of the group enhancement module and the EL module, we designed a series of ablation experiments. The purpose of ablation experiments is to evaluate the contribution of each component to the overall model performance by sequentially removing key components. We considered the following four model variants for the ablation experiments:
1) Removing the GFE module: In this variant, we retained all other parts of the model and only removed the group statistical information enhancement process.
2) Removing the EL module: This variant directly uses the prediction of a single model without averaging.
3) Removing both the GFE and EL modules: This is a baseline model that includes neither the GFE module nor the EL module.
4) Complete model: This is the full model we proposed, including both the GFE and EL modules.

We trained each of the aforementioned four model variants and compared their performance under fivefold cross-validation on the DTA dataset. To ensure fairness in the experiments, all ablation models used the same hyperparameter configuration and training process. By observing the change in model performance after removing a module, we can quantitatively analyze the contribution of that module to the overall model effect. For instance, if removing a module leads to a significant decline in model performance, that module plays a crucial role in the model, and vice versa.