Artificial intelligence for dementia drug discovery and trials optimization

Drug discovery and clinical trial design for dementia have historically been challenging. In part these challenges have arisen from patient heterogeneity, length of disease course, and the tractability of a target for the brain. Applying big data analytics and machine learning tools for drug discovery and utilizing them to inform successful clinical trial design has the potential to accelerate progress. Opportunities arise at multiple stages in the therapy pipeline and the growing availability of large medical data sets opens possibilities for big data analyses to answer key questions in clinical and therapeutic challenges. However, before this goal is reached, several challenges need to be overcome and only a multi‐disciplinary approach can promote data‐driven decision‐making to its full potential. Herein we review the current state of machine learning applications to clinical trial design and drug discovery, while presenting opportunities and recommendations that can break down the barriers to implementation.


INTRODUCTION
There is no cure for any form of dementia, and existing treatment options do little to curb clinical progression. Drug discovery and development processes involve many complex and lengthy steps, from identification and validation of a target in which to test a drug, through identification of drug hits from a screening campaign and optimization of the lead hit, to phase I-IV clinical trials.[4] The National Institutes of Health estimates that 80%-90% of research projects fail before ever reaching human trials, whereas 95% of drugs entering human trials fail.[5] Drug trials for dementia have been no exception. Over the past few decades, trials looking to modify disease processes within Alzheimer's disease (AD) have shown an even higher attrition rate and are littered with costly failures.[6] This has occurred across all stages of the drug-development process, from safety signals seen at phase I, II, and III to an absence of positive primary endpoints seen in all but two recent pivotal trials.[7,8] Nevertheless, these clinical trial data sets are incredibly rich and highly characterized and have the potential to inform future trial designs.
There are many potential causes of the high attrition rate in these clinical trials, but the likely primary contributor is cohort heterogeneity. The heterogeneity of neurodegenerative diseases (NDs) within clinical trials encompasses differences in an individual's drug response, underlying disease pathology, length of disease course, and the trajectory of the decline, among other factors. When contemplating trial design and patient selection, a patient's stage of disease along with anticipated future decline are key factors for success.[1,6,9] With trials lasting only around 18 months in diseases that typically span decades, matching the right subjects to the right compound at the right time is fundamental to clinical trial outcomes and can account for past failures.[9] Refining patient selection and homogenizing disease proteinopathies across a cohort increases the likelihood of an efficacious outcome and subsequent regulatory approval, all of which is achieved through a reduction in erroneous patient-level responses to a given therapy.
Traditionally, drug hunters have favored targeted chemical libraries and hypotheses built around refined supportive evidence. The growing availability of large multi-omics data sets comprising tissue samples from a broad range of patients, in combination with clinical data, provides a unique opportunity to generate biological hypotheses in the human population for individual- and group-level targeted therapies for all types of dementia.[10] To aid these desired outcomes, there is an increasing realization and growing consensus of the potential in using big data analytics and machine learning (ML) tools for drug discovery and optimized clinical trials.[10,11] There has been near-exponential growth in the application of these methods within this field, as shown in Figure 1.
As part of a series that covers a broad range of aspects on artificial intelligence (AI) for NDs, this review discusses the challenges that need to be overcome before this powerful approach can be fully harnessed in drug discovery and clinical trial optimization, including data standardization, reproducibility between groups, and translatability between experimental models and humans. We provide a multidisciplinary view of current developments in the use of data science and AI at each stage of the drug discovery process: target discovery and drug design, development, and repurposing. We identify three key areas in which the use of data science and AI shows promise: predicting success, patient stratification, and informing clinical trial design. Finally, we provide recommendations on how a coordinated effort from biotechnology companies, academia, regulators, and health care professionals can drive progress. International research communities such as the Deep Dementia Phenotyping (DEMON) Network (www.demondementia.com) provide a platform for innovation to help bridge these scientific fields and utilize focused working groups that best exploit what each area can offer to better inform both clinical trials and drug discovery.

BIG DATA AND MACHINE LEARNING IN TARGET DISCOVERY AND PRIORITIZATION
The challenges in developing novel therapeutics for dementia and other NDs result from the paucity of novel, valid targets. This in turn results from etiological heterogeneity, the multifaceted, often polygenic nature of genetic risk, and the complexity of the human brain.[12,13] A recent study found that genetic support for a drug target doubled the likelihood of approval for a new drug candidate at early-stage clinical trials.[14] The discovery of genetic variants associated with the risk for neurological and neuropsychiatric illness provides the opportunity to significantly enhance hypothesis-led drug discovery. Outside of NDs, this has been exemplified by high-risk mutations in PCSK9, leading to a specific target for lowering cholesterol being identified. Subsequent identification of individuals with knockout mutations and benign lowering of low-density lipoprotein (LDL) cholesterol has led to successful results in pivotal trials in the development of both evolocumab and alirocumab as treatment for hyperlipidemia.[15,16] A recent study in schizophrenia utilized network analyses on genome-wide association study (GWAS) loci and rare variation data to identify biological pathways and mechanisms in NDs, via a combination of statistical genetic analysis approaches with chemoinformatic databases, and successfully identified potential new drugs and drug targets.[17]
FIGURE 1 PubMed citations by year across Alzheimer's disease and dementias utilizing machine learning or artificial intelligence methodologies.
With recent advances in the data science and ML field in the last decade, both academia and pharmaceutical companies have begun implementing these approaches within their early drug discovery process. The advances in genomic, transcriptomic, and proteomic technologies, combined with the accumulation of patient and disease-relevant big data sets, have also promoted analytics and ML approaches for target identification. In an early study, Zhang and colleagues[18] used integrative network-based analysis to interrogate gene expression data from post-mortem brain tissue samples. By rank-ordering network structures by relevance to late-onset AD pathology, the authors highlighted an immune and microglia-specific module dominated by genes involved in pathogen phagocytosis. The transmembrane immune signalling adaptor TYROBP was also found to be a key regulator of several microglial-specific genes highly expressed in AD brains and a potential therapeutic target for late-onset AD. A genome-wide AD-associated gene was recently identified using a support vector machine (SVM)-based approach that integrated gene expression data with human brain-specific gene network data.[19] Further to this, network analysis with peptides and/or mRNA transcripts has already been used to identify additional potential AD targets.[20,21] The combination of a high-dimensional genome-wide protein-protein interaction network with a deep learning-based computational framework has also allowed drug-target prioritization and identification of potential repositionable drugs for AD.[22] Although still in its infancy, feature extraction from published literature using semantic searching or natural language processing (NLP) has started to be explored in target discovery. IBM's Watson was initially used to help identify additional RNA-binding proteins that were subsequently validated by wet lab experiments and has provided some potential novel targets for NDs.[23]

BIG DATA AND MACHINE LEARNING IN DRUG DESIGN AND DEVELOPMENT
Virtual screening using ML is an important tool in the drug-development process. It has the capacity to increase the yields of potential drugs by conducting in silico searches over millions of compounds.[24] The main ML algorithms that can be used in this context are Bayesian, SVM, supervised learning, dimensionality reduction, artificial neural networks, and ensemble algorithms. The Bayesian learning algorithms include naive Bayes, semi-naive Bayes, as well as Bayesian networks (e.g., hidden Markov modeling) and represent input data as feature vectors to plot them in space with the same dimensionality. SVM algorithms construct an optimal hyperplane that dichotomizes the data points. Supervised learning algorithms include instance-based methods, decision tree algorithms, and distributed networks. Dimensionality reduction algorithms, such as principal component analysis and linear discriminant analysis, are used to reduce the number of variables by mapping data into a lower dimensional space and can be split into feature selection and feature extraction. Artificial neural networks are composed of nodes layered together to process input data in a process known as forward propagation. Typically, forward propagation applies linear transformations followed by activation functions to process a given input. Deep neural networks are artificial neural networks with many layers and include recurrent neural networks, variational autoencoders, and generative adversarial networks.
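Forward propagation as described above can be illustrated with a minimal sketch. The network below is a hypothetical toy (three input descriptors, two hidden nodes, one output score); the weights, biases, and input values are arbitrary assumptions chosen for illustration and do not come from any trained model or published screen.

```python
import math


def relu(values):
    """Activation function: keep positive values, zero out negative ones."""
    return [max(0.0, v) for v in values]


def sigmoid(value):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def dense(inputs, weights, biases):
    """Linear transformation: one weighted sum plus bias per output node."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Forward propagation through one hidden layer to a single score."""
    hidden = relu(dense(inputs, hidden_w, hidden_b))
    logit = dense(hidden, out_w, out_b)[0]
    return sigmoid(logit)


# Hypothetical parameters: 3 descriptors -> 2 hidden nodes -> 1 score.
HIDDEN_W = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
HIDDEN_B = [0.0, 0.1]
OUT_W = [[1.0, -1.0]]
OUT_B = [0.0]

score = forward([1.0, 2.0, 3.0], HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)
print(f"predicted activity score: {score:.4f}")
```

In a real virtual screen the parameters would be learned from labeled compounds, and the score would be used to rank a library rather than read in isolation.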
Bayesian learning algorithms are some of the most common methodologies utilized in ML analyses.[28] A Bayesian ML virtual screen of a large library of U.S. Food and Drug Administration (FDA)-approved drugs and clinical candidates also successfully identified several glycogen synthase kinase-3β (GSK3β) inhibitors.[29] Most recently, Google DeepMind has revolutionized the field with AlphaFold, enabling the prediction of three-dimensional (3D) protein structures from their amino acid sequences, allowing for easier and faster determination of target protein 3D structures, a critical step in drug design.[30]
The application of ML is not limited only to small molecule-based drug design and development. With monoclonal antibody therapeutics becoming an increasingly popular approach to tackle traditionally difficult targets, ML techniques have also started to be applied to antibody design and optimization. 2][33] Recently, a deep neural network model was trained on a relatively small library (about 1 × 10^4 variants) that was based on the sequence alone of the therapeutic antibody trastuzumab to accurately predict antigen specificity.[32] Such ML approaches could allow in silico testing of a large number of antibody sequences and significantly reduce both the financial and time costs of therapeutic antibody lead optimization.

BIG DATA AND MACHINE LEARNING IN DRUG REPURPOSING
Big data and ML have also started to play a useful role in drug-repurposing opportunities. A network-based ML algorithm has been used to identify several existing drugs of potential use for vascular dementia,[34] and a disease network model has been proposed to repurpose drugs for use across dementias.[35] A study that explored various ML approaches/algorithms indicated that random forest was the best model for the prediction of AD drugs and targets.[36] More recently, Rodriguez et al.[37] developed an ML framework to evaluate the association between AD severity in neuropathology (determined by Braak stage) and a molecular mechanism based on gene expression data from human neuronal cell culture models. Although still requiring further validation within in vitro and in vivo efficacy models, the authors suggest that this offers a potential method for nominating candidates for drug repurposing. The potential utilization of big data analytics and ML tools with real world data in drug discovery and development also comes with many challenges, such as nonstandard data, reproducibility between labs, and translatability between experimental models and humans.

MACHINE LEARNING IN PREDICTING SUCCESS THROUGH DEVELOPMENT
The termination or withdrawal of a drug candidate has been due primarily to two factors: lack of efficacy and unfavorable toxicity properties. Although some drug-likeness measures, such as Lipinski's Rule of 5,[38] Veber's Rule,[39] and Ghose's Rule,[40] have proved a useful guide for filtering out toxic compounds, their predictions are very conservative. Using an ML-based approach to analyze existing clinical trial data may prove to be a more data-driven approach to solve this problem and help design and progress drugs with less toxicity. Using a set of FDA-approved drugs and drugs that failed in clinical trials due to toxicity, a random forest model has been developed to combine both chemical properties and drug-likeness measurements with target properties (such as tissue-specific expression levels and network connectivity) to predict the likelihood of toxic events in independent test sets.[41] Currently prediction models focus primarily on small-molecule drugs, but with the development and curation of both experimental and clinical data from other modalities, such as antibodies and biological therapeutics, the same big data and ML-driven approaches could contribute to their preclinical development.
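To make the rule-based filtering mentioned above concrete, the following is a minimal sketch of Lipinski's Rule of 5 using its commonly stated thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors), with the widely used convention of tolerating one violation. The `Descriptors` class, the function names, and the two example compounds are hypothetical; descriptor values would normally come from a cheminformatics toolkit, which is assumed rather than shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (illustrative values only)."""
    molecular_weight: float  # daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of 5 thresholds are exceeded."""
    checks = [
        d.molecular_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Common convention: accept compounds with at most one violation."""
    return lipinski_violations(d) <= max_violations


# Hypothetical compounds, not real measurements.
small_polar = Descriptors(320.4, 2.1, 2, 5)
large_greasy = Descriptors(712.9, 6.3, 3, 9)

for name, compound in [("small_polar", small_polar),
                       ("large_greasy", large_greasy)]:
    print(name, lipinski_violations(compound), passes_rule_of_five(compound))
```

A fixed-threshold filter of this kind is transparent but coarse, which is the conservatism the text describes; the random forest approach cited above instead learns how such properties combine with target-level features.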
In terms of clinical trial failures due to lack of efficacy, poor translation between preclinical models and humans is one plausible cause. Some studies have begun using ML approaches to bridge the translational gap across species and are attempting to generate humanized computational models from animal models.[42] Most studies use genomic and transcriptomic data from population studies and focus on cross-species pairs as well as genotype-phenotype relationships.
One may hypothesize that using ML-based approaches to interrogate clinical trial data (especially individual-level clinical data) along with preclinical experimental data could shine a light on the translatability of experimental models and support the development of computational models to predict the likelihood of success in clinical trials from animal experiment data.

DATA SCIENCE IN PATIENT STRATIFICATION
NDs have distinct clinical signs and symptoms that define their diagnostic criteria. NDs, however, share a few crucial aspects. All NDs are characterized by aggregation of a specific protein product, the selective degeneration of a population of neurons, usual adult onset of disease, and heterogeneous etiology.[43,44] Although 10%-20% of patients have a relevant family history, in most cases the condition appears sporadic. The estimated heritability of genetically complex NDs is between 40% and 80%.[12,45]
Current classification of most NDs is based on clinical phenotypes, which often does not consider either underlying disease heterogeneity or overlapping disease mechanisms, thus hindering therapy tailoring due to the unknown links between mechanism and phenotype. Perhaps NDs should be considered as several different conditions that may look the same. For example, AD pathology causes dementia but is influenced by many variations in an individual's genetic profile, which can be argued to precipitate disease trajectory and result in subtle pathological variations. This leads to the argument that patient selection and stratification should take these considerations into account to find the patients at the right pathological stage/trajectory of AD. This is in line with the probabilistic hypothetical model of AD recently suggested by Frisoni and colleagues.[46]
These shared phenomena raise the possibility that NDs share pathogenic mechanisms. 8][49] This has limited the success of effective drug development and clinical trials. There is now an urgent need to move beyond clinical phenotyping to recognize the presence of shared biological pathways across NDs. Stratifying patients into subgroups based on biological factors might help in predicting a patient's response. For example, in amyotrophic lateral sclerosis (ALS) more than 15% of patients have features of frontotemporal dementia (FTD) and detailed testing reveals that ≈50% have cognitive and behavioral changes consistent with FTD.[20] Despite these commonalities, clinical phenotypes manifest via divergent pathology through selective neuronal vulnerability. Furthermore, there is a strong shared genetic component. 50,51P, ERRB4, and C9orf72 are known to be associated with AD, ALS, and FTD; yet not all individuals who are at genetic risk of AD or FTD-ALS go on to develop the disease.
Personalized medicine approaches are more likely to result in effective treatments if they target the underlying causes directly. Such personalized approaches are exemplified by gene therapy. The benefit of using genetic information has been demonstrated clearly in oncology with breast cancer, that is, Herceptin for women with HER-2-positive breast cancer, ovarian cancer, and colorectal cancer.[52,53] For example, AD pathology causes dementia but is influenced by many variations in an individual's genetic profile, which might be useful in predicting disease trajectories and understanding the subtle disease variations.[54]
Many genetic and multi-omic data sets of different NDs have been generated, and the availability of many new analytical tools is now, for the first time, allowing the combination of these resources to inform clinical trial design.[55,56] The identification of biologically meaningful clusters using genetic or multi-omic data might allow for better stratification and patient selection with more targeted treatments due to a greater understanding of disease mechanisms.[57]
Designing trials aimed at those expected to respond has the potential to be beneficial for all stakeholders. However, any ML-based approaches to screening/trial design would only be one step in the larger screening process to enable better clinical trials and should be implemented only while in discussion with the relevant regulatory authorities. As with all trial design, there is a balance to be found between the studied population (inclusion/exclusion criteria) and the target label designation. In relation to when a smaller population may be a barrier for FDA approval, the recently approved QALSODY™ (tofersen)[58] for ALS patients carrying a superoxide dismutase 1 mutation provides a good example of successfully targeting a selected subpopulation (in this case ≈2% of the ALS population).[59]
Specifically, across the AD continuum the progressive pathology and differential rates of decline in biomarkers[60] necessitate a more nuanced approach to clinical trial design across all phases of drug development. Adaptive design is one avenue that has shown real promise when applied in phase II clinical trials. A recent application of ML methodologies for optimum dose selection allowed a quicker decision-making process in the development of lecanemab. Subjects in the phase II study were assigned using a Bayesian adaptive approach, resulting in more subjects being exposed to the most likely beneficial dose.[61] This shortened timelines and allowed for a greater sample size in the most efficacious drug group. Deploying similar methodologies within concordant scenarios has the power to benefit subjects and researchers alike.
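The general idea behind Bayesian response-adaptive allocation can be sketched as follows. This is an illustrative simplification and not the design of the lecanemab phase II study: it assumes a binary responder endpoint, independent Beta(1, 1) priors per dose arm, and allocation in proportion to the Monte Carlo estimate of each arm being best. The function name and the interim counts are hypothetical.

```python
import random


def allocation_probabilities(responders, enrolled, draws=5000, seed=0):
    """Estimate P(arm has the highest response rate) for each dose arm.

    Each arm gets a Beta(1 + responders, 1 + non-responders) posterior;
    the returned probabilities can be used as randomization weights.
    """
    rng = random.Random(seed)  # seeded for reproducibility of the sketch
    wins = [0] * len(enrolled)
    for _ in range(draws):
        samples = [
            rng.betavariate(1 + r, 1 + n - r)
            for r, n in zip(responders, enrolled)
        ]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]


# Hypothetical interim data for three dose arms (20 subjects each).
interim_responders = [2, 5, 12]
interim_enrolled = [20, 20, 20]

weights = allocation_probabilities(interim_responders, interim_enrolled)
for arm, weight in enumerate(weights, start=1):
    print(f"dose arm {arm}: allocate next subjects with probability {weight:.3f}")
```

As interim data accumulate, the weights shift toward the arm that currently looks most beneficial, which is how such designs concentrate sample size where it is most informative. Real designs typically add safeguards such as burn-in periods, minimum allocation to control, and prespecified stopping rules.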

CLINICAL TRIAL DESIGN AND MACHINE LEARNING
As described earlier, disease and pathology comorbidities are common within the general elderly population.[62] Image analysis of these comorbidities lends itself to ML methodologies, and one example of this is the Subtype and Stage Inference (SuStaIn) algorithm.[63] This data-driven (unsupervised) approach was designed to identify potential subgroups or progression patterns within cross-sectional databases across dementia types. It has been used within AD in the analysis of tau positron emission tomography (PET) images to uncover four distinct trajectories of deposition.[64] A recent study has also shown three optimal distinct amyloid accumulation subtypes, with each showing separate risk factors known to influence AD progression.[65] This further emphasizes the heterogeneity of AD, something that is common even within clinical trial cohorts aimed at a singular disease or process. Indeed, trials for diseases such as AD have populations that often display heterogeneous biomarker profiles, which suggests the presence of multiple stages or subtypes of disease.
Disentangling the heterogeneity of NDs such as AD is critical to get the right disease-modifying drugs to the right patients at the right time, including during clinical trials. Recent evidence has shown indications of AD heterogeneity through a conceptual framework referred to as the ATX(N) continuum, which categorizes individuals using biomarkers that chart core AD pathophysiological features, namely, amyloid beta (Aβ; A), tau (T), and neurodegeneration (N), and where X represents additional candidate biomarkers such as neuroimmune dysregulation, synaptic dysfunction, and blood-brain barrier alterations.[66,67] Recent work from multiple groups proposes distinct subtypes of AD, derived by applying ML methodologies to RNA data.[68,69] Further evidence comes from a recent review of biological heterogeneity in AD,[62] which concluded that there are three distinct drivers of heterogeneity including risk factors, protective factors, and concomitant non-AD pathology, all of which need to be addressed within clinical trial populations to better address the underlying disease process.
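As a minimal sketch of how a biomarker-based categorization of this kind could be operationalized for stratification, the snippet below assigns each participant a binary A/T/N profile and groups participants by profile. The cutoff values, their directions, and the participant measurements are hypothetical placeholders, not validated thresholds, and the X component is omitted for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoff:
    """A dichotomizing threshold for one biomarker (hypothetical values)."""
    threshold: float
    abnormal_if_above: bool = True

    def is_abnormal(self, value: float) -> bool:
        if self.abnormal_if_above:
            return value > self.threshold
        return value < self.threshold


# Assumed cutoffs: A and T abnormal when high, N abnormal when low
# (for example, a volumetric measure). Illustrative only.
CUTOFFS = {
    "A": Cutoff(20.0),
    "T": Cutoff(1.3),
    "N": Cutoff(3.0, abnormal_if_above=False),
}


def atn_profile(measurements, cutoffs=CUTOFFS):
    """Return a profile string such as 'A+T-N-' for one participant."""
    parts = []
    for key in ("A", "T", "N"):
        flag = "+" if cutoffs[key].is_abnormal(measurements[key]) else "-"
        parts.append(key + flag)
    return "".join(parts)


def stratify(cohort):
    """Group participant identifiers by their A/T/N profile."""
    strata = defaultdict(list)
    for participant_id, measurements in cohort.items():
        strata[atn_profile(measurements)].append(participant_id)
    return dict(strata)


# Hypothetical cohort.
cohort = {
    "p01": {"A": 35.0, "T": 1.1, "N": 3.4},
    "p02": {"A": 48.0, "T": 1.6, "N": 2.7},
    "p03": {"A": 5.0, "T": 1.0, "N": 3.6},
}
print(stratify(cohort))
```

Binary cutoffs are the simplest possible operationalization; data-driven subtyping methods such as those cited above work on continuous measurements and do not require fixed thresholds.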
Further to this, prior phase trials can yield important information that informs a go or no-go decision. As outlined in Wessels et al.,[70] prior cognitive data as well as biomarker outputs can fundamentally drive future development. However, endpoints are sometimes underutilized or hypotheses poorly designed, leading to a discordance in findings across each phase of development. ML can play a key role in improving these decisions and trial design and in boosting the power of trials. 2][73] Enriching cohorts with subjects who are at the correct stage for each individual compound can be achieved by fundamentally categorizing based upon biomarkers that relate to homogeneous staging and prospective endpoint declines. Many markers now exist for accurate classification of subjects within the ATX(N) criteria for AD, nearly all of which are already captured within clinical trials. Similar criteria are now being implemented with other NDs, with stratification by neurofilament light (NfL) allowing for clinical subtype dichotomization for Parkinson's disease (PD),[74] which also holds promise for FTD, ALS, and dementia with Lewy bodies (DLB).
With regulatory approaches aligning with this narrative, part of the recent recommendations from the FDA[75] looked at three key areas to improve and enrich clinical trials: strategies to decrease variability, prognostic enrichment strategies, and predictive enrichment strategies. It is important to note that minimizing variability by excluding subjects who had large changes to patient-reported outcome measures at baseline, in essence placebo responders, is fundamental to this. However, instead of incorporating long lead times as well as additional endpoints and analyses, modeling progression based upon disease-specific biomarkers and sub-setting the analysis a priori can alleviate these operational constraints. Further to this, as biofluid markers become more accessible and cost-effective in larger trials, algorithms that can predict biomarker status become particularly useful in increasing screening success rates and could lead to quicker trials and speedier decisions on compound efficacy.
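The effect of a biomarker-status prediction algorithm on screening success can be illustrated with simple arithmetic. The sketch below uses the standard definition of positive predictive value; the prevalence, sensitivity, and specificity figures are hypothetical and chosen only to show the calculation.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of algorithm-positive candidates who are truly biomarker positive."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


def confirmatory_tests_per_eligible(positive_rate):
    """Expected confirmatory tests needed to find one eligible participant."""
    return 1.0 / positive_rate


# Hypothetical scenario: 30% of screened candidates are biomarker positive,
# and a prescreening algorithm has 90% sensitivity and 80% specificity.
PREVALENCE = 0.30
ppv = positive_predictive_value(PREVALENCE, sensitivity=0.90, specificity=0.80)

without_prescreen = confirmatory_tests_per_eligible(PREVALENCE)
with_prescreen = confirmatory_tests_per_eligible(ppv)

print(f"PPV after prescreening: {ppv:.3f}")
print(f"confirmatory tests per eligible participant: "
      f"{without_prescreen:.2f} without vs {with_prescreen:.2f} with prescreening")
```

Under these assumed figures the number of costly confirmatory tests per eligible participant roughly halves, at the price of missing the 10% of true positives that the algorithm rejects.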
Given that these problems also exist outside of the common NDs, considering these other heterogeneous areas for ways to improve clinical trial designs can also suggest possible benefits. The rare autosomal recessive mitochondrial disease Friedreich's ataxia has a number of concordant issues with which to contend and thus utilizing a single endpoint for trial outcomes is not ideal. It has instead been suggested to stratify by subtype to reduce the number of participants needed, match a subgroup to candidate compounds for yielding better results, and promote international collaboration as well as remote assessments and screenings to increase patient-selection speed.[76]
Another example of enrichment comes from recent clinical trials with PD. When targeting people with early motor dysfunction, participants without a deficit in dopamine levels (as measured by single-photon emission computed tomography [SPECT]) can be excluded to enrich clinical trial populations with patients with idiopathic PD. This can improve the statistical power by excluding subjects who are unlikely to progress clinically. Reduced binding on this imaging modality has been shown to predict faster decline on the Unified Parkinson's Disease Rating Scale (UPDRS) parts II (activities of daily living) and III (motor examination).[13] The European Medicines Agency issued a full qualification opinion for the use of dopamine transporter as an enrichment biomarker in PD trials targeting subjects with early motor symptoms, indicating the applicability and utilization of these methodologies to clinical trials.[77]
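The link between enrichment and statistical power can be made concrete with the standard normal-approximation sample size formula for comparing two means, n per arm = 2(z_{1-α/2} + z_{1-β})²σ²/δ². The sketch below assumes, for illustration only, that non-progressors contribute no decline, that a drug slows decline by a fixed relative amount, and that the outcome standard deviation is unchanged by enrichment; all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Participants per arm for a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)


def expected_effect(progressor_fraction, decline_in_progressors,
                    relative_slowing):
    """Average treatment effect when only progressors can show benefit."""
    return progressor_fraction * decline_in_progressors * relative_slowing


# Hypothetical trial: progressors decline 4 points, the drug slows decline
# by 25%, and the outcome standard deviation is 5 points.
unenriched = n_per_arm(expected_effect(0.70, 4.0, 0.25), sd=5.0)
enriched = n_per_arm(expected_effect(1.00, 4.0, 0.25), sd=5.0)

print(f"per-arm sample size with 70% progressors: {unenriched}")
print(f"per-arm sample size with 100% progressors: {enriched}")
```

Because the required sample size scales with the inverse square of the effect, even a modest share of non-progressing participants inflates trial size substantially under these assumptions.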

LIMITATIONS OF ARTIFICIAL INTELLIGENCE FOR DRUG DISCOVERY
Several limitations are hampering progress in the application of AI to drug discovery. ML approaches are all dependent on the availability of high-quality, representative, unbiased, and rich labeled data that adequately capture the problem parameters required. For many drug targets there is now a huge amount of relevant chemical and biological information available, although many of these data are not machine readable or meaningfully combined. Expert curation, data wrangling, and domain expertise are needed to integrate data from multiple sources in an optimal way. Commercial considerations have historically limited the sharing of carefully guarded proprietary drug development data by pharmaceutical companies and publishers,[78] thus restricting data integration and progress. However, open science approaches to drug development have been trialed, although they remain rare, for example, the development of antimalarials by Eli Lilly using a crowdsourcing platform for compound screening.[79]
Not all ML methods are equally explainable or practical to implement. Although ML methods such as SVMs and random forest have long been used in drug discovery, deep neural networks have recently risen to prominence due to their flexible architectures and ability to model complex interactions. However, they have also drawn criticism for their need for huge amounts of high-quality training data, lack of explainability, and hidden "black box" structure, raising concerns about their appropriateness in medicine development.[80] Furthermore, standard ML methods provide predictions and model associations without necessarily giving biologically grounded causal insights. Causal ML approaches are, therefore, being developed in an attempt to bridge the gap between prediction and explanation, although they are not yet widely adopted.[81] Finally, the more complicated the ML model, the greater the degree of expertise required for its development and optimization. Therefore, the requirement for expert users remains another important limitation. In short, ML is not the magical panacea that it is sometimes portrayed to be. Taken together, these limitations remind us that the effective application of ML to drug discovery remains heavily reliant upon human insight and expertise.

A biotech industry perspective
One of the main bottlenecks in applying big data analytics and ML to dementia drug discovery, or any drug discovery program, is the paucity of high-quality, clinically relevant, and well-annotated data sets. In order to maximize the impact big data and ML can bring to drug discovery and eventually the patient, there is a need for sharing data in an open, non-competitive manner with multiple parties in the health care ecosystem. Some initiatives have started to make progress on this. TransCelerate BioPharma Inc. enables the sharing of de-identified clinical data collected historically in the control arms of clinical trials with the aim of providing a better understanding of disease, improved patient stratification, and the generation of synthetic control arms.[82]

An academic perspective
Given their complexity and portrayal in popular culture, AI applications are particularly susceptible to public mistrust and misinterpretation.
Academia offers an independent, rigorous, and objective pathway to assess the validity and reliability of AI in health care and clinical research. We argue that this is particularly important when applied to drug discovery and clinical trials in dementia, where the stakes of finding novel therapeutics are considerable and successful uptake and trust in breakthroughs are essential.
Primary activities in the academic sphere relate to development initiatives including the latest biomarkers and analysis methods.[83,84] When combined, these can result in instrumental studies that redefine our understanding of disease progression and the underlying biology.[47,64,85] Such discoveries promise high impact on clinical trials through enrichment and screening processes, as well as patient management in health care. However, they rely on collaborative efforts with biotech (for trials data access), regulators (compliance for implementation in clinical trials), and health care (for data and patient access). Large collaborative networks like the DEMON Network are vital to this, as they span all three sectors.
Taken literally, part of academia's primary purpose is to be an "academy" to train the future workforce to meet the needs of society. Training a data science literate workforce is urgently needed to meet the increasing move toward data-driven translational precision medicine.[86,87] Academia also has the flexibility to take greater risks, due to having fewer constraints or less influence from external market forces. Once breakthroughs are made, academic-industry partnerships are crucial to implement and translate the innovation into real world settings.

A health care perspective on utilization of these methodologies
The intersection of data science and health care has two key roles in drug development for dementia. First, data science can contribute to data-driven clinical decision support systems (CDSSs), which are essential because the multitude of data available to health care practitioners defies qualitative assessment.[88] Second, dementia trials are moving increasingly from symptomatic trials toward disease-modifying secondary prevention,[88] and in the future, perhaps even within primary prevention trials such as AHEAD 3-45.[89] Both will ultimately require linking with health care data to facilitate/support model training (e.g., CDSSs) and trial recruitment, which may be done primarily via electronic medical records (eMRs). This should be a focus in the near future for assisting with maximizing the information obtained in clinical studies as well as trial accuracy and success. AI will help clarify and harmonize data entry, analysis, and interpretation, which is required for analysis of eMRs outside of health care settings.
Finally, because many in the workforce are employed in both health care and academia, a close relationship between the two will be of benefit. Clinical academics are crucial to ensuring that innovations are not only evidence-based but also have a realistic chance of being successfully adopted in practice, and they are essential in providing a perspective throughout the clinical trials pipeline.
Current regulatory papers lag behind the wider usage of ML techniques, leading to regulatory uncertainty with risks for stakeholders and companies looking to implement these methodologies. 5][96] However, without regulatory guidelines in place there will be inherent hesitancy from industry partners to implement these paradigms. This is largely because, without prior knowledge of the sensitivity and specificity of these new ML techniques, and with GMLP and data governance principles (among others) still requiring approval, no drug development program would take on the risk of including them. Nevertheless, the regulatory outlook from both the FDA and EMA is positive and forward-looking. The application of AI/ML to clinical trials methodology has already shown promise, but compliance guidance from these agencies is needed before ML techniques can be deployed in drug development.

CONCLUSION AND FUTURE DIRECTIONS
There is ample scope for ML techniques to yield significant contributions across drug discovery and clinical trial landscapes. Key to progressing as a field is collaboration between all stakeholders. The prior disappointing outcomes from the BACEi trials (e.g., [97][98][99][100]) have shown the benefits of working across companies, academia, and health care. This collaboration furthered our understanding of the compounds, albeit in this instance with the aim of uncovering the adverse events occurring across them. 98,99,101 This framework of collaboration was made possible by outside efforts from the Alzheimer's Association, among other stakeholders, and demonstrates the benefits of larger companies working together and sharing data across compound development. In conclusion, there are vast and varied opportunities for data science and ML to assist drug identification, improve trial design, and increase chances of compound success, all of which are yet to be fully utilized.

RESEARCH IN CONTEXT
1. Narrative Review: The authors reviewed the literature using traditional sources (e.g., PubMed, Google Scholar), meeting abstracts, regulatory guidance documents, and online articles. The authors sought a broad review across many neurodegenerative disorders, with a particular focus on dementias, especially Alzheimer's disease (AD).
2. Interpretation: Our review suggests that although artificial intelligence (AI) and machine learning (ML) methods are increasingly used, with positive outcomes and findings, their application to drug development and clinical trials can be improved through more widespread knowledge of how to apply them and through strong consensus guidelines on their implementation, thus reducing the risk associated with new methodologies within clinical development.
3. Future Directions: This article proposes suggestions on how greater adoption may be achieved across biotechnology companies, the wider pharmaceutical industry, and academia, as well as general health care settings.

Figure 2 summarizes the opportunities, types of data, and processes through which each partner can assist across the landscape of drug development. Although the number and complexity of big data sets are increasing, they remain underutilized, and all stakeholder groups should work to exploit them more fully to further our understanding of NDs. With large untapped resources within academia, health care, and industry, collaborative efforts and bottom-up expertise are necessary to further advance and apply ML within drug discovery and compound development. Expertise should be engaged earlier and more frequently in the drug-development process. Teams and individual experts can contribute critical knowledge to improve discovery, selection, and development of new molecular entities. Engaging experts and research groups at the right stage of drug development can also increase the speed of decision-making, enabling quicker progression of compounds, reducing cost, and mitigating overall risk. 93,[90][91][92][93]
Key to successful implementation of these techniques into real-world clinical trials is the alignment and guidance of regulatory agencies. Both EU and US stakeholders and regulators have issued guidance and ethical considerations around ML and AI paradigms, as well as position papers on clinical trial designs implementing ML methodologies. 76,[90][91][92][93] These papers have set out approaches but do not provide clear guidelines for wider applicability to clinical trials across all therapeutic areas. The European Medicines Agency (EMA) 2025 strategic document 93 outlines plans to establish similar guidelines as well as acceptability metrics and success factors for approval of these paradigms. It is notable that the agency has also pledged to standardize with other regulatory agencies such as the FDA.