Artificial intelligence for neurodegenerative experimental models

INTRODUCTION: Experimental models are essential tools in neurodegenerative disease research. However, the translation of insights and drugs discovered in model systems has proven immensely challenging, marred by high failure rates in human clinical trials. METHODS: Here we review the application of artificial intelligence (AI) and machine learning (ML) in experimental medicine for dementia research. RESULTS: Considering the specific challenges of reproducibility and translation between other species or model systems and human biology in preclinical dementia


INTRODUCTION
The past decades have seen a steep rise in the availability of quantitative biological data within the context of experimental medicine. 1 Preclinical experimental models for dementia research are no exception, with large amounts of genomic, cellular, and functional phenotyping data generated and released in relation to neurodegenerative diseases. 2 As a testing ground for biological hypotheses and novel drugs, these models are of crucial importance to the field. A multitude of studies are published each year, with in vitro work spanning cell culture, [3][4][5] induced pluripotent stem cell (iPSC)-derived cultures, [6][7][8] organotypic slice cultures, 9,10,11 and organoids, [12][13][14][15] while most preclinical research using in vivo model systems focuses on rodents, including transgenic animals, 16 knock-ins, 17,18 exposure models, 19,20 and, more recently, multi-species models such as human-mouse chimeras. 21 Other model species such as non-human primates offer some advantages over murine models due to their phylogenetic similarity with humans, longer lifespans, and natural presentation of histological, neuroanatomical, or cognitive features of disease pathology. 22 Yet, despite this plethora of models, what stands out is the failure rate of clinical trials for neurodegenerative disease treatments, particularly in Alzheimer's disease (AD). 23 This raises questions not only about the biological hypotheses underpinning drug discovery, but also about the appropriateness of existing animal models and whether the methods used to translate insights from model to human biology are up to the task. 24
Improvements to clinical translation will require high-quality experimental work in robust and valid model systems, in which both experimental screening and validation 25 and improved prediction of clinical effectiveness using artificial intelligence (AI) and machine learning (ML) approaches will be important. In this position paper we discuss AI approaches used in experimental medicine, focusing specifically on approaches used to translate between model systems and human disease biology. Any advanced data analytical approach, including ML, requires robust and reproducible data as input, but can equally contribute to improving reproducibility in experimental research. This review discusses the key challenges of reproducibility, cross-species translation, data curation, and interpretability of AI and ML approaches. We provide recommendations and future directions for driving forward progress in this relatively new field of application.
This review is one of a series of eight articles in a Special Issue on "Artificial Intelligence for Alzheimer's Disease and Related Dementias" published in Alzheimer's & Dementia. Together, this series provides a comprehensive overview of current applications of AI to dementia, and future opportunities for innovation to accelerate research. Each review focuses on a different area of dementia research, including experimental models (this article), drug discovery and trials optimization, 26 genetics and omics, 27 biomarkers, 28 neuroimaging, 29 prevention, 30 applied models and digital health, 31 and methods optimization. 32

Defining reproducibility
Reproducibility is the extent to which the results of a study can be recreated by applying the same analysis code and data used in the original study, 33 and can be stratified as computational, empirical (data), and statistical. 34 Complementing this, replicability is often discussed in concert with reproducibility and refers to the degree to which a future study employing the same method produces the same scientific conclusions with independent analysis of new data. Both are essential to generate findings that are robust and generalizable and will be discussed collectively as reproducibility. Any advanced data analytical approach, including ML, will only yield accurate and meaningful results if the underlying data are high quality and reproducible.
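One practical way to check computational reproducibility in the sense defined above is to record, alongside each result, a fingerprint of the exact input data, the software environment, and any random seed, so that a rerun can be compared field by field. The sketch below is a minimal illustration under stated assumptions: `run_analysis` is a hypothetical stand-in for a real stochastic analysis step, and the manifest fields are illustrative, not an established standard.

```python
import hashlib
import json
import platform
import random


def sha256_of_bytes(data: bytes) -> str:
    """Content hash: identical input data always yield the same digest."""
    return hashlib.sha256(data).hexdigest()


def run_analysis(data: bytes, seed: int) -> float:
    """Hypothetical stochastic analysis step: mean of one seeded bootstrap resample."""
    rng = random.Random(seed)
    values = [float(v) for v in data.decode().split(",")]
    resample = [rng.choice(values) for _ in values]
    return sum(resample) / len(resample)


def provenance_manifest(data: bytes, seed: int, result: float) -> dict:
    """Fields a rerun must match for the result to count as computationally reproduced."""
    return {
        "data_sha256": sha256_of_bytes(data),
        "python_version": platform.python_version(),
        "seed": seed,
        "result": result,
    }


data = b"1.0,2.5,3.1,4.7,5.2"  # hypothetical input file contents
first = provenance_manifest(data, 42, run_analysis(data, 42))
rerun = provenance_manifest(data, 42, run_analysis(data, 42))
print(json.dumps(first, indent=2))
print("reproduced:", first == rerun)
```

Hashing the data rather than recording a file name means a silently edited input file is detected, even if its name is unchanged.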
Conversely, ML approaches can be used to improve computational aspects of reproducibility. Robust datasets should provide qualitatively similar outputs across analytical approaches and should be generalizable across studies. 35 Publishing reproducible experiments will be critical for meta-analysis of datasets across laboratories, will accelerate advancements, and will maximize research investments. To address this, the Turing Way project (the-turing-way.netlify.app) has provided open access guidance on project design, communication, collaboration, and ethics of reproducibility in data science. In addition, the FAIR Guiding Principles for scientific data management and stewardship have provided actionable recommendations to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets. 36 Reproducibility issues can also be inherent to the data type, for example, issues associated with technical noise and bias in single-cell genomics data. Tools such as single-cell variational inference (scVI) use deep neural networks and stochastic optimization to account for batch and sensitivity effects when approximating gene expression distributions across cell types. 37 Deep learning strategies can also map single-cell datasets onto existing references: single-cell architectural surgery (scArches) 38 allows integration across experimental models (including mapping of disease-affected tissue onto control references).
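To make the notion of a batch effect concrete, the sketch below applies a deliberately simple linear correction: subtracting each batch's per-gene mean from log-scale expression values. This is not how scVI works (scVI fits a deep generative model with batch as a covariate); the toy matrix and the function name `center_by_batch` are hypothetical and serve only to show how a lab-specific technical offset can be removed.

```python
import numpy as np


def center_by_batch(expr: np.ndarray, batches: np.ndarray) -> np.ndarray:
    """Subtract each batch's per-gene mean (rows = cells, columns = genes)."""
    corrected = expr.astype(float).copy()
    for batch in np.unique(batches):
        mask = batches == batch
        corrected[mask] -= corrected[mask].mean(axis=0)
    return corrected


# Toy log-expression: the same two cell states measured in two labs,
# with lab "B" adding a constant technical offset of +2 to every gene.
states = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
expr = np.vstack([states, states + 2.0])
batches = np.array(["A", "A", "B", "B"])

corrected = center_by_batch(expr, batches)
print(corrected)  # rows from lab A and lab B now coincide
```

A design caveat: mean-centering also removes genuine biological differences whenever cell-type composition differs between batches, which is one reason model-based tools such as scVI are preferred for real data.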

Reproducibility issues in stem cell technologies
Acquiring high-quality ex vivo neural tissue samples directly from human patients is usually either ethically infeasible or logistically intractable. However, in recent years stem cell technologies such as human iPSCs have allowed brain cell types to be derived from patient biopsies, opening a new era for modeling neurodegenerative diseases. iPSC models have allowed researchers to study disease mechanisms and genotype-phenotype associations in cell type-specific, physiologically relevant human models. iPSCs are as genetically diverse as the donors they are derived from, enabling researchers to study sporadic disease and polygenic risk factors while simultaneously presenting major challenges for reproducibility. The Human Induced Pluripotent Stem Cells Initiative (HipSci) reported that 5% to 46% of phenotype variation is due to individual genetic background. 39 Somatic mutations, arising from environmental factors such as UV exposure and through the reprogramming process, are another source of heterogeneity in iPSC models. 40 Non-genetic contributions to heterogeneity include differentiation protocols, as well as cell culture and storage conditions. A study across five laboratories using standardized protocols on identical iPSC lines found that laboratory-based sources of variation can overpower genotypic effects. 41 The development of co-culture systems combining multiple iPSC-derived cell types, and of 3D organoids, has allowed modeling of neurodegeneration at the tissue and organ level. However, heterogeneity again remains a major challenge, owing to the genetic variability of iPSC lines and to the complex protocols required for multi-cellular models, which introduce further layers of non-genetic variability. Encouragingly, studies have reproducibly generated human brain organoids exhibiting consistent cell type diversity and developmental trajectories. 42 However, consistency remains to be demonstrated for disease-relevant readouts. Challenges to reproducibility may also arise depending on whether an iPSC line is patient-derived and allogeneic or genetically modified from a control line.
Finally, iPSCs fail to capture gene regulatory signatures caused by environmental factors during a person's lifetime. 43
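Statements such as the HipSci estimate that 5% to 46% of phenotype variation is due to individual genetic background are variance-partitioning results. A minimal sketch, assuming a balanced design and a fixed-effects view (published analyses typically use linear mixed models), computes the fraction of total variance explained by each factor's group means on hypothetical data in which the laboratory effect is deliberately larger than the donor effect.

```python
from statistics import mean


def fraction_explained(values, groups):
    """Between-group sum of squares divided by total sum of squares (eta squared)."""
    grand = mean(values)
    total_ss = sum((v - grand) ** 2 for v in values)
    between_ss = 0.0
    for g in set(groups):
        members = [v for v, label in zip(values, groups) if label == g]
        between_ss += len(members) * (mean(members) - grand) ** 2
    return between_ss / total_ss


# Hypothetical readout (eg, neurite length) for 2 donors x 2 labs, 2 replicates each.
# Lab 2 adds +4 units; donor 2 adds +1 unit.
donor = ["d1", "d1", "d2", "d2"] * 2
lab = ["lab1"] * 4 + ["lab2"] * 4
readout = [10.0, 10.2, 11.0, 11.2, 14.0, 14.2, 15.0, 15.2]

print(f"donor: {fraction_explained(readout, donor):.2f}")
print(f"lab:   {fraction_explained(readout, lab):.2f}")
```

In this toy design the laboratory factor explains most of the variance, mirroring the observation that laboratory-based variation can overpower genotypic effects.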

Challenges to reproducibility in mouse models
Mice are the most commonly used animal model in neurodegenerative disease research, complementing human and in vitro studies with a relatively short reproduction time, inbred strains that minimize genetic heterogeneity, and easy commercial availability across multiple strains. For mouse studies, the genetic background of the line can have a major influence on reproducibility. 44 For example, mice expressing human amyloid precursor protein (hAPP), when backcrossed to different strains, exhibit differences in viability, 45 disease course, 46 and neuronal excitability. 47 Specific genetic loci have since been identified that modify net amyloid-β (Aβ) accumulation in these lines. 48,49 The Jackson Laboratory recently crossed the 5xFAD mouse model of AD onto diverse genetic backgrounds to explore the contribution of genetic variation in AD. This approach more closely mirrored variation within human disease and identified marked effects of background-specific genetic variation on the molecular and behavioral phenotypes of the AD model mice. 50,51 Such efforts are beyond the scope of most laboratories and may be prohibitively costly. However, we recommend whole-genome sequencing (WGS) of new genetic mouse lines by the host laboratory. To improve reproducibility, and to allow the identification and further investigation of key background genetic modifiers of the disease phenotype, sequencing should ideally be repeated after extensive backcrossing by academic laboratories, and when mice are bred to congenic lines by commercial suppliers.

15525279, 0, Downloaded from https://alz-journals.onlinelibrary.wiley.com/doi/10.1002/alz.13479 by University College London UCL Library Services, Wiley Online Library on [02/10/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License

Induced and genetically engineered models for Parkinson's disease (PD), AD, frontotemporal dementia (FTD), and amyotrophic lateral sclerosis (ALS) recreate initial biochemical events, such as misfolded protein aggregates, RNA toxicity, or repeat expansion mutations, [53][54][55][56][57][58][59] but often fail to reproduce the whole breadth of downstream cellular and phenotypic responses. For AD, the effects of drug treatment are poorly predicted by current models, as highlighted by the divergent outcomes observed when different models are treated with the same drug; efficacy varies by type of intervention and species, with the best results achieved for cholinergic/glutamatergic drugs. 60 For example, use of transgenic mice 16 and macaques 61 enabled prediction of cognitive and behavioral improvements from administration of donepezil. 62 [65] To quantify the extent to which AD models reflect human AD pathologies, integrated omics platforms for studying the molecular signatures of neurodegenerative diseases in preclinical models and post mortem human brains have proven useful, 66 and have led to increased understanding of disease-specific cellular responses. [68][69][70] Drugs targeting N-methyl-D-aspartate (NMDA) and cholinergic receptors provide only symptomatic treatment for patients, 71 and phase II/III clinical failures of anti-Aβ antibodies 72,73 have led to a reevaluation of the Aβ cascade hypothesis. 74 However, promising recent results in phase III clinical trials of lecanemab 75 and donanemab 76 have put monoclonal Aβ antibodies back at center stage. At the same time, concerns remain regarding the efficacy and costs of anti-amyloid immunotherapy, as well as adverse side effects, most commonly in the form of amyloid-related imaging abnormalities (ARIA). 77-79

Quantifying translatability of model species

Challenges to cross-species translation
Translating a drug or treatment from the bench to the bedside is a considerable challenge. Lack of reproducibility within the same model, or across models, is a major impediment to translatability. As discussed above, the adoption of rigorous standards should be a priority to make drug discovery more efficient and avoid wasted time and resources. 80 Similarly, negative results and replication failures are often not reported, even though they would help raise red flags early on and prevent other researchers from treading down futile paths. 81 Perhaps the greatest stumbling block in developing an effective drug for dementia remains our imperfect knowledge of dementia biology. This is compounded by models that do not faithfully reproduce all aspects of a pathology, prompting over-interpretation and over-extrapolation of experimental results. The appropriate choice of an experimental model for the question at hand should involve not only consideration of whether the aspects of biology under investigation are captured (eg, is the neuronal circuitry conserved? To what extent is the gene of interest conserved, expressed, and part of the same network?), but also practical and ethical considerations.
Model choice will also be affected by the type of research question: a basic science biological experiment, for example, discovery of the mechanism of action of a gene or protein, may require a different setup than a pharmacological analysis, such as the quantification of a drug's bioavailability in the brain. A frequently neglected aspect is the role of sex-related differences in dementia biology and incidence. Given that women are more likely to be affected by AD, while the prevalence of PD is substantially higher in men, sex-related biological differences are clearly relevant and need to be investigated rigorously. 82,83 However, many rodent experiments are conducted in animals of one sex only (often males) for practical reasons, potentially biasing findings and often overlooking sex-specific drug efficacy. 84
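A small worked example shows how single-sex or pooled designs can hide sex-specific efficacy. The scores below are hypothetical and constructed so that the drug improves the outcome in males only.

```python
from statistics import mean

# Hypothetical behavioral scores (higher = better) in treated vs vehicle mice.
scores = {
    ("male", "vehicle"): [10, 11, 9, 10],
    ("male", "drug"): [14, 15, 13, 14],
    ("female", "vehicle"): [10, 9, 11, 10],
    ("female", "drug"): [10, 11, 9, 10],
}


def effect(sex):
    """Treatment effect within one sex: mean(drug) - mean(vehicle)."""
    return mean(scores[(sex, "drug")]) - mean(scores[(sex, "vehicle")])


male_effect = effect("male")        # what a males-only study would estimate
female_effect = effect("female")
pooled_effect = (mean(scores[("male", "drug")] + scores[("female", "drug")])
                 - mean(scores[("male", "vehicle")] + scores[("female", "vehicle")]))
interaction = male_effect - female_effect  # sex x treatment interaction contrast

print(f"male {male_effect}, female {female_effect}, "
      f"pooled {pooled_effect}, interaction {interaction}")
```

Here a males-only study would report an effect of 4 points, a pooled analysis would report 2, and only a design including both sexes can estimate the interaction that reveals the absence of any effect in females.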

Strategies and positive examples of translation
In thinking about how insights from models of neurodegeneration can be translated more effectively, and considering improvements over the years, it stands out that we still lack a model that recapitulates any given neurodegenerative disease in its entirety. To translate effectively between models of disease, experimental design must account for disease-relevant factors such as developmental age, biomarker choice, and appropriateness of model species. There is a critical need to identify dementia biomarkers present in both preclinical models and patients. An informative and quantifiable biomarker would enable earlier diagnosis, assessment of disease progression, and evaluation of treatment efficacy. Advances have been made in peripheral biomarkers. For example, serum tau protein levels correlate with cognitive impairment in AD and with progression of pathology in a transgenic mouse model of AD, 91 although they lack the accuracy and consistency of imaging-based primary biomarkers. Positron emission tomography (PET) imaging studies using 18F-fluorodeoxyglucose (FDG) have illustrated consistent patterns in AD patients and various mouse models of AD, 92,93 although there are concerns that rodent brains are too small for imaging findings to be translatable. Another class of systems that may prove vital in advancing preclinical models for dementia research is humanized animal models, in which human genes are introduced into the mouse genome or human cells are grafted into mouse tissue. A humanized mouse model of ALS expressing human fused in sarcoma (FUS) protein has already proved to better recapitulate the disease, exhibiting midlife-onset progression of motor-neuron degeneration not seen in previous models. The FUS Delta14 mouse has also yielded novel insights into ALS development, demonstrating that neurodegeneration occurs even in the absence of FUS protein accumulation. 94

How evolution impedes (and promotes) translation
Compounding many of the above challenges is the fact that animal models are highly evolutionarily divergent from humans. Humans last shared a common ancestor with macaques ∼30 million years ago (MYA), with rodents ∼90 MYA, with zebrafish ∼430 MYA, and with fruit flies ∼800 MYA. 95 The degree of homology (conserved features) varies across anatomical regions, cell types, pathways, and genes. 96,97 For example, while it is well established that the nervous system is homologous across vertebrates and invertebrates, even rodents and macaques do not possess all brain regions, cell types, and connections present in humans. 98 Furthermore, brain structures central to dementia pathology, such as the hippocampus, 99,100 striatum, 101 and prefrontal cortex, 102,103 have undergone extensive anatomical and molecular reorganization, strongly suggesting equally drastic changes in function. 98 At the level of genes, mice share identifiable 1:1 orthologs with humans for only a subset of genes (around 75%), 103,104 a problem that is further exacerbated in more distant species (Figure 1). This means that biomarkers and cell type markers discovered in one species may not be readily applicable to another. It also implies that transgenically humanized animal models may only partly alleviate this issue, as the broader genetic background in which the humanized genes act is still drastically different. The degree of evolutionary conservation varies across molecular pathways 105 and cell types (eg, neurons, 106 astrocytes, 107 microglia, 108,109 oligodendrocytes 110).
Therefore, systematic and quantitative investigation is first needed to assess a feature's degree of conservation, and thus validity, when using a given species to model human biology, 111 including in the development of novel dementia therapeutics.
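Figures such as "around 75% 1:1 orthologs" depend on how orthology relationships between gene pairs are classified. The sketch below, assuming a simple table of human-mouse gene pairs such as one exported from an orthology database, assigns each human gene an orthology type. The gene identifiers are hypothetical placeholders, not real annotations.

```python
from collections import defaultdict


def orthology_types(pairs):
    """Classify each human gene as one-to-one, one-to-many, many-to-one, or many-to-many."""
    h2m, m2h = defaultdict(set), defaultdict(set)
    for human, mouse in pairs:
        h2m[human].add(mouse)
        m2h[mouse].add(human)
    types = {}
    for human, mice in h2m.items():
        many_mouse = len(mice) > 1                       # human gene maps to several mouse genes
        many_human = any(len(m2h[m]) > 1 for m in mice)  # a mouse partner maps to several human genes
        if not many_mouse and not many_human:
            types[human] = "one-to-one"
        elif many_mouse and not many_human:
            types[human] = "one-to-many"
        elif many_human and not many_mouse:
            types[human] = "many-to-one"
        else:
            types[human] = "many-to-many"
    return types


# Hypothetical identifiers; real analyses would use an orthology database export.
pairs = [("hA", "mA"), ("hB", "mB1"), ("hB", "mB2"), ("hC", "mC"), ("hD", "mC"), ("hE", "mE")]
human_genes = ["hA", "hB", "hC", "hD", "hE", "hF"]  # hF has no mouse ortholog

types = orthology_types(pairs)
one_to_one = sum(types.get(g) == "one-to-one" for g in human_genes)
print(f"1:1 orthologs: {one_to_one}/{len(human_genes)}")
```

Genes in one-to-many or many-to-one relationships are precisely the cases where a marker or drug target identified in one species may not map cleanly onto the other.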
At the same time, interspecies variation can be a powerful source of information to fuel ML models, because each species reflects billions of years of natural experiments. State-of-the-art protein folding models such as AlphaFold2 112 and ESMFold 113 use multi-species protein sequences to help accurately predict 3D protein structure. Another application of ML models is to predict the effect of genetic variants by learning the mapping between DNA sequence and functional genomic annotations. Relatedly, PrimateAI 114 and its successor PrimateAI-3D 115 predict whether genetic variants observed in humans are likely to be deleterious or benign based on whether the variant is common in non-human primate populations. The underlying premise is that if a variant is tolerated in species closely related to humans, it is more likely to be benign in humans as well. In this way, data from non-human species can be a valuable source of information for human clinical research. Other genetic variant-to-function prediction models including Enformer, 116 Basenji, 117
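The premise behind PrimateAI can be illustrated by how training labels are derived, in greatly simplified form: human missense variants that are also observed as common variants in non-human primate populations are treated as likely benign. The real models train deep networks on protein sequence and structure using such labels; the sketch below shows only the labeling idea, and all variants are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    position: int  # residue position in the aligned protein
    ref: str       # reference amino acid
    alt: str       # alternate amino acid


def label_training_variants(human_variants, primate_common_variants):
    """Weak labels in the spirit of the PrimateAI premise (simplified):
    a human variant also seen as common in a non-human primate population is
    labeled 'benign'; all others stay 'unlabeled'."""
    tolerated = set(primate_common_variants)
    return {v: ("benign" if v in tolerated else "unlabeled") for v in human_variants}


# Hypothetical variants on one protein (positions and residues are illustrative only).
human = [Variant(101, "R", "H"), Variant(205, "G", "D"), Variant(318, "A", "V")]
primate_common = [Variant(101, "R", "H"), Variant(318, "A", "T")]

labels = label_training_variants(human, primate_common)
for variant, label in labels.items():
    print(variant, label)
```

Variants without primate evidence are left "unlabeled" rather than called pathogenic, because absence from primate populations is not by itself evidence of harm.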

Existing major initiatives
In order to facilitate systematic and quantitative analyses of cross-model and cross-species translatability, we describe a variety of resources that can be applied to experimental modeling of dementia (Table 1).
Available databases include atlases of gene expression and genetic variation in humans and some animal models, as well as phylogenetic resources.However, the rapid increase in knowledge relating to somatic genomic and gene regulatory variation requires an additional level of detail and curation to identify disease-specific determinants.
Limited proteomic information is available for proteins identified as being involved in the pathogenesis of dementia, although this is supported in part by detailed compilations of types of protein post-translational modifications, many of which depend on manual curation. Since protein conformation is directly related to protein function, structural biology databases and prediction tools play a critically important role in characterizing the secondary and tertiary structures of disease-associated proteins and their interactors. Recent breakthroughs by models such as AlphaFold2 112 and ESMFold (Evolutionary Scale Modeling) 113 have enabled highly accurate 3D protein structure prediction at scale across many species. Precomputed predictions for millions of proteins have already been deposited in public databases, such as the AlphaFold Protein Structure Database 112,121 and the ESM Metagenomic Atlas, 113,122 rapidly accelerating a wide variety of biomedical research fields. 123 Combining verified structures of biologically relevant proteins with information available in pharmacological and pathway databases is essential for the identification of new, potentially druggable targets. Harmonizing these databases will promote more sophisticated design of small molecules and other therapies targeting specific proteins or pathways, and thereby expedite drug development.
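Predicted structures come with per-residue confidence estimates, which matter when deciding whether a region of a disease-associated protein is reliable enough for target assessment. AlphaFold model files in PDB format store the per-residue pLDDT score (0 to 100) in the B-factor column. The sketch below assumes such a file has already been downloaded and read into a string; `atom_line` only builds a tiny synthetic example for demonstration, and the cut-off of 70 follows the AlphaFold database convention of treating lower scores as low or very low confidence.

```python
def plddt_per_residue(pdb_text: str) -> dict:
    """Map residue number -> pLDDT from the B-factor field of C-alpha ATOM records."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def low_confidence_residues(scores: dict, threshold: float = 70.0) -> list:
    """Residues scoring below the threshold (assumed cut-off for low confidence)."""
    return sorted(res for res, s in scores.items() if s < threshold)


def atom_line(serial, name, res_name, res_seq, bfactor):
    """Build one fixed-width ATOM record with zeroed coordinates (demonstration only)."""
    return (f"ATOM  {serial:5d} {name:^4s} {res_name:3s} A{res_seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}           C")


demo = "\n".join([
    atom_line(1, "N", "MET", 1, 45.2),
    atom_line(2, "CA", "MET", 1, 45.2),
    atom_line(3, "CA", "ALA", 2, 88.9),
    atom_line(4, "CA", "GLY", 3, 62.5),
])
scores = plddt_per_residue(demo)
print(scores)                           # {1: 45.2, 2: 88.9, 3: 62.5}
print(low_confidence_residues(scores))  # [1, 3]
```

Reading only C-alpha records gives one value per residue, since AlphaFold writes the same pLDDT on every atom of a residue.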
Validating biomarkers for dementia, whether identified by imaging or biochemical means, brings with it the challenge of determining which are the most accurate and informative predictors of disease progression and/or response to therapeutic intervention.It will thus become increasingly important to establish detailed and accurate databases that can be linked to each other as well as to electronic health records of individuals to advance the prospect of personalized medicine for dementia.
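Comparing candidate biomarkers by how well they discriminate outcomes is one concrete starting point. A minimal sketch, assuming binary outcome labels (eg, progressed vs did not progress) and hypothetical biomarker values, ranks candidates by ROC AUC computed with the Mann-Whitney pairwise formulation. Real validation would additionally require independent cohorts and, for progression or treatment response, longitudinal designs.

```python
def auc(positive_scores, negative_scores):
    """ROC AUC as the probability that a random case scores higher than a random
    control (Mann-Whitney formulation; ties count one half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical biomarker levels: (people who progressed, people who did not).
candidates = {
    "marker_1": ([3.1, 2.8, 3.5, 4.0], [1.2, 2.9, 1.8, 2.0]),
    "marker_2": ([1.0, 2.2, 1.5, 1.9], [1.4, 2.0, 1.1, 2.5]),
}
ranked = sorted(candidates, key=lambda m: auc(*candidates[m]), reverse=True)
for name in ranked:
    print(name, round(auc(*candidates[name]), 3))
```

An AUC near 0.5 (as for the second hypothetical marker) indicates no better than chance discrimination.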

Resource gaps and how to fill them
Validating new models and biomarkers is an essential part of the diagnostic and therapeutic discovery process. However, benchmarks for validation are not always readily identified. When human data are available, were they collected at the same time points, both in terms of age and disease progression? Are they reproducible across labs, and across populations? When using animal and in vitro models, how do you match developmental and aging stages with human progression? 147 Are benchmarks reproducible across models, genetic backgrounds, and different laboratories? How can data from in vitro experiments best be linked with model organisms and with human disease? To answer some of these questions, closer integration of the available data on disease pathogenesis needs to be pursued, including metadata on timelines, genetic background, and other relevant variables pertaining to models and experiments, which are not always systematically and accurately reported. Better patient stratification and quality control of genetic and functional metadata would improve the selection of appropriate benchmarks. It would also better inform the choice of a specific model for a given hypothesis. For example, whether the model accurately recapitulates the neural circuit or the signaling pathway in question will considerably affect the choice of model. Integration of the various existing datasets, coupled with a user-friendly database query mechanism, would be ideal to facilitate the design of high-quality experimental studies relevant to human disease.
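Systematic metadata reporting can be enforced at the point of data entry. The sketch below defines a hypothetical minimal record for a model-system experiment and flags empty fields before a dataset is shared; the field names are illustrative assumptions, not an existing community standard.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ExperimentMetadata:
    """Hypothetical minimal metadata record for a model-system experiment."""
    model_system: str          # eg, "mouse", "iPSC-derived neurons"
    genetic_background: str    # eg, strain or donor line identifier
    sex: str                   # "male", "female", or "mixed"
    age_or_timepoint: str      # age of animals or days in culture, with units
    laboratory: str
    protocol_version: Optional[str] = None

    def missing_fields(self) -> list:
        """Names of fields left empty, so incomplete records can be flagged before sharing."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, "")]


record = ExperimentMetadata(
    model_system="mouse",
    genetic_background="",     # not reported: a common gap
    sex="female",
    age_or_timepoint="6 months",
    laboratory="lab_1",
)
print(record.missing_fields())  # ['genetic_background', 'protocol_version']
```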

Structural equation modeling
Despite the widespread adoption and necessity of experimental models in preclinical research, they present significant limitations both in terms of reproducibility of results within models and translation from models to human patients. 148,149 Various computational approaches, including ML, have been implemented to address each of these issues. Here we highlight several approaches that have been used to enhance reproducibility and translation, including those that have yet to be adopted in dementia research. A more comprehensive overview of ML/AI methodology can be found in the methods optimization paper from the same series. 32 Mathematical and statistical modeling of experimental models based on prior domain knowledge (eg, structural equation modeling [SEM]) is an approach that can be used to support hypothesis generation and testing, provide insight into biological mechanisms, and predict the effects of interventions. 150 In contrast to conventional deterministic approaches in SEM, in which fixed, predefined parameters determine predicted outcomes, probabilistic simulation-based modeling allows researchers to operationalize and test the effects of uncertainty at multiple levels within the model system. [152][153] Simulation has been used within preclinical mouse models of metabolic disease to predict disease onset and progression and to more accurately estimate the impact of pharmacological interventions, 154,155
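The difference between a deterministic and a probabilistic, simulation-based treatment of a path model can be sketched with a toy mediation chain, treatment dose → Aβ load → cognitive score. All coefficients and their uncertainties below are hypothetical. The deterministic version returns one predicted effect; the Monte Carlo version samples the path coefficients and returns a distribution, from which an interval can be read.

```python
import random
from statistics import mean, quantiles


def indirect_effect(dose, a, b):
    """Predicted change in cognition via dose -> amyloid (slope a) -> cognition (slope b)."""
    return dose * a * b


# Hypothetical path coefficients: treatment lowers amyloid (a < 0),
# and higher amyloid lowers the cognitive score (b < 0).
A_HAT, B_HAT = -0.8, -1.5
A_SD, B_SD = 0.3, 0.5  # hypothetical uncertainty in each coefficient

# Deterministic model: one fixed prediction.
point = indirect_effect(1.0, A_HAT, B_HAT)

# Probabilistic model: sample coefficients and propagate uncertainty (Monte Carlo).
rng = random.Random(0)
draws = [indirect_effect(1.0, rng.gauss(A_HAT, A_SD), rng.gauss(B_HAT, B_SD))
         for _ in range(10_000)]
cuts = quantiles(draws, n=40)  # 39 cut points at 2.5% steps
low, high = cuts[0], cuts[-1]  # central 95% interval

print(f"deterministic: {point:.2f}")
print(f"simulated mean: {mean(draws):.2f}, 95% interval: [{low:.2f}, {high:.2f}]")
```

The interval, rather than the single point estimate, is what carries uncertainty in each link of the model through to the predicted outcome.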

(Semi-)supervised ML approaches
Relative to both deterministic SEM and probabilistic simulations, ML approaches deviate even further from reliance on predefined model assumptions and manual parameter assignment. 156 Instead, ML generates an in silico model of the experimental system learned primarily, if not entirely, from data. Regarding the issue of reproducibility, supervised and semi-supervised ML has been applied to learn more robust generalizations when mapping experimental inputs to outputs.
Read-across structure activity relationships (RASAR) was trained on hundreds of thousands of animal toxicology experiments to learn the relationship between binarized chemical fingerprints and safety outcome metrics, achieving an average prediction accuracy greater than that of any single animal experiment. 157 Other ML approaches have been used to address the issues of translatability by predicting experiment outcomes in humans from matched in vivo or in vitro model data. Efforts such as the systems biology verification (sbv) project's IMPROVER Species Translation Challenge aimed to advance methods for cross-species translation but were met with limited success, in part due to insufficient training data. 158 [160][161] However, this paired interspecies case-control approach is limited by the time it takes to manually curate such datasets. It also presupposes the a priori validity of the animal model when learning model-to-human mappings.
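Read-across predicts an untested chemical's outcome from structurally similar chemicals that have already been tested. The sketch below is a minimal nearest-neighbor version on binary fingerprints using Tanimoto (Jaccard) similarity. RASAR goes further, deriving similarity-based features and training a supervised model on them, so this illustrates only the underlying idea; fingerprints and outcomes are hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto/Jaccard similarity between sets of 'on' fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def read_across(query, tested, k=3):
    """Similarity-weighted vote of the k most similar tested chemicals (probability of toxicity)."""
    neighbors = sorted(tested, key=lambda item: tanimoto(query, item[0]), reverse=True)[:k]
    weight_toxic = sum(tanimoto(query, fp) for fp, toxic in neighbors if toxic)
    weight_total = sum(tanimoto(query, fp) for fp, _ in neighbors)
    return weight_toxic / weight_total if weight_total else 0.5  # no similar chemicals: uninformative


# Hypothetical fingerprints (indices of set bits) with animal-study outcomes (True = toxic).
tested = [
    (frozenset({1, 2, 3, 8}), True),
    (frozenset({1, 2, 3, 9}), True),
    (frozenset({4, 5, 6}), False),
    (frozenset({4, 5, 7}), False),
]
query = frozenset({1, 2, 3})
print(f"predicted probability toxic: {read_across(query, tested):.2f}")
```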
High-quality data with large sample sizes are not always available in human cohorts. Another variant of supervised ML uses transfer learning, a general strategy in which a model is first pretrained on a larger dataset that is less specific to the task at hand (eg, histological images from a large cohort of animal models) to learn basic features common to all data of that modality (eg, anatomical borders, cell contours, subcellular features), and then fine-tuned with a smaller but more task-specific dataset (eg, disease-associated pathologies in post mortem histological samples). [163][164] Transfer learning is also increasingly being applied to omics data. For example, Stumpf et al. (2020) employed this strategy to first train a cell type classifier using single-cell transcriptomic profiles from mouse bone marrow, and then accurately predict human bone marrow cell types. 165
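The pretrain-then-fine-tune pattern can be sketched with a plain logistic regression in NumPy: weights learned on a large source task are used as the starting point for a few gradient steps on a small, related target task. The synthetic tasks below are hypothetical stand-ins (eg, for abundant animal data and scarce human data); deep transfer learning uses neural networks, but the warm-start logic is the same.

```python
import numpy as np


def add_bias(X):
    """Append a constant column so the last weight acts as an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_logistic(X, y, w_init=None, lr=0.1, epochs=200):
    """Full-batch gradient descent on the logistic loss; w_init allows a warm start."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1]) if w_init is None else w_init.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def accuracy(w, X, y):
    return float(np.mean((add_bias(X) @ w > 0) == (y == 1)))


rng = np.random.default_rng(0)


def make_task(n, true_w):
    """Hypothetical synthetic task: binary labels from a noisy linear rule."""
    X = rng.normal(size=(n, len(true_w)))
    y = (X @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(float)
    return X, y


w_source = np.array([2.0, -1.0, 1.0, 0.0, 0.5])             # large source task
w_target = w_source + np.array([0.3, 0.0, -0.2, 0.1, 0.0])  # related, shifted target task

X_src, y_src = make_task(2000, w_source)
X_tgt, y_tgt = make_task(20, w_target)     # only 20 labeled target samples
X_test, y_test = make_task(1000, w_target)

pretrained = fit_logistic(X_src, y_src)                                         # pretraining
fine_tuned = fit_logistic(X_tgt, y_tgt, w_init=pretrained, lr=0.05, epochs=50)  # fine-tuning
from_scratch = fit_logistic(X_tgt, y_tgt, lr=0.05, epochs=50)                   # no transfer

print(f"fine-tuned:   {accuracy(fine_tuned, X_test, y_test):.2f}")
print(f"from scratch: {accuracy(from_scratch, X_test, y_test):.2f}")
```

The exact accuracies printed depend on the synthetic data; the point is that the fine-tuned model starts from weights that already encode structure shared with the source task.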

Unsupervised ML approaches
Unlike supervised learning, unsupervised ML aims to learn an in silico model of the data using only the data itself (without the need for labels).
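A linear illustration of learning structure from unlabeled data, and of reusing it for new data: principal component analysis (PCA) is fitted on a large reference matrix, and a small query dataset is then projected into the same space without refitting. This is only an analogy for reference-mapping tools, which use deep generative models rather than PCA; the simulated data are hypothetical.

```python
import numpy as np


def fit_pca(reference: np.ndarray, n_components: int):
    """Learn a low-dimensional embedding from unlabeled reference data (rows = samples)."""
    center = reference.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(reference - center, full_matrices=False)
    return center, vt[:n_components]


def project(data: np.ndarray, center: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Embed new samples into the reference space without refitting."""
    return (data - center) @ axes.T


rng = np.random.default_rng(1)
direction = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2.0)  # hidden axis of variation


def simulate(n):
    """Hypothetical samples varying mainly along `direction`, plus small noise."""
    latent = rng.normal(scale=3.0, size=(n, 1))
    return latent @ direction[None, :] + rng.normal(scale=0.1, size=(n, 4))


reference = simulate(500)  # large reference dataset; no labels are used
query = simulate(5)        # small new dataset mapped into the reference space

center, axes = fit_pca(reference, n_components=1)
embedded = project(query, center, axes)
print(embedded.shape)
print(abs(axes[0] @ direction))  # close to 1: the learned axis recovers the hidden one
```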
Within the domain of single-cell omics (eg, genomics, transcriptomics, epigenomics, proteomics, and multi-omics) there has been an explosion of such methods. [168][169] Unsupervised ML methods can also be applied to the problem of reproducibility. Despite the large number of cells, the number of donors in a given single-cell dataset is usually quite low (often just a few individuals per study). [171][172] Here too, ML can be of great utility. Resources like the scArches database 38 store models previously trained on one or more datasets (eg, an unsupervised dimensionality reduction model trained on a large single-cell dataset of cell lines from controls). Other users can then download these pretrained models and apply them to their own smaller datasets (eg, embedding single-cell data from AD and control cell lines into the same low-dimensional space as the large dataset). In this example of unsupervised transfer learning, models can learn patterns across many datasets, increasing the effective sample size and the likelihood that the results will generalize to new, unseen datasets. It also obviates the need for direct access to all of the pretraining datasets, which can be non-trivial to acquire and reprocess. Similar transfer learning approaches have been used to successfully predict the effect of novel drug-induced perturbations in cancer cell lines from previously learned latent embeddings. 173,174 Recently, several models trained on a large corpus of scRNA-seq data (eg, single-cell Generative Pre-Trained Transformer [scGPT], 175 single-cell bidirectional encoder representations from transformers [scBERT], 176 Geneformer 177) have been put forward as generalist base models to be fine-tuned by users with smaller, more targeted datasets. The goal of such resources is to make transfer learning both more robust and more easily accessible to the research community. 38
Finally, unsupervised learning methods can be flexibly combined with supervised ML, simulations, SEM, and/or traditional statistical approaches to form innovative solutions to problems inadequately addressed by any one method. Graph neural networks (GNNs) are particularly adept at utilizing supervised and/or unsupervised ML architectures and can efficiently handle hierarchical or semi-structured data. 178 Specifically, they have been used to predict pathway-specific disease mechanisms, 179 protein function across multiple species, 180 disease-associated disruptions in brain connectivity, 181 AD status, 182,183 rare disease gene targets, 184 and drug response, 185,186 as well as to integrate multi-modal data. 187,188 In this way, ML can be used to aid in the design, reproduction, interpretation, and translation of studies in experimental models even prior to the investment of extra time and resources in new experiments or clinical trials.
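Graph neural networks operate by repeatedly passing information between connected nodes, for example genes linked in an interaction network. A minimal sketch of one graph-convolution layer in the style of Kipf and Welling, with self-loops and symmetric degree normalization, is shown below; the graph, node features, and random weights are hypothetical (in practice the weights are learned).

```python
import numpy as np


def gcn_layer(adjacency: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One graph-convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # inverse square-root node degrees
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)


# Hypothetical graph: two separate two-node modules (eg, interacting gene pairs).
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.0]])  # per-node inputs
rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 3))  # learned in practice; random here

h = gcn_layer(adjacency, features, weights)
print(h.shape)
```

Because each layer mixes features only along edges, a node's output depends on its neighborhood; stacking k layers extends this to k-hop neighborhoods.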

Interpretability and trust in ML approaches
Despite their many advantages, a major hurdle for the widespread adoption of cutting-edge ML approaches is the lack of trust in black-box predictions, 189 particularly in healthcare environments where there are concerns about patient safety and privacy. 190,191 This lack of trust is not entirely unfounded, as ML algorithms exploit patterns in data even if they are not relevant to the problem of interest. 192 To minimize this risk, there is increasing focus on making models more easily interpretable, 193 less biased, 194 and less susceptible to adversarial attacks. 195 Interpretability is a particularly difficult challenge, as improvement in this domain is often (though certainly not always) accompanied by decreased predictive performance. 196 Nevertheless, advances continue to be made by way of text-based explanations, visualizations, explanations by example or simplification, and feature relevance. 194,197 These techniques have increasingly been applied to biomedical sciences and healthcare, 178,198,199 such as for drug discovery 200,201 and prediction of drug-drug interactions. 202 Advances have particularly been reported in the domain of medical imaging analysis, 203 employing techniques such as visual attention, 204 saliency maps, 205 and SHapley Additive exPlanations (SHAP) 206 for dementia diagnosis based on neuroimaging data. Overall, however, the adoption of explainable and interpretable AI for dementia-related applications remains scarce to date, leaving ample opportunities for progress.
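Feature-relevance methods such as SHAP are based on Shapley values, which split a prediction among input features. For a handful of features they can be computed exactly by enumerating all feature subsets, as in the sketch below. It assumes one particular value function (absent features are set to a baseline value); the SHAP library uses approximations and several alternative value functions, and the risk model and feature names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating all feature subsets (feasible for few features).

    Value function: features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


def risk(z):
    """Hypothetical risk score with one interaction term."""
    atrophy, ventricles, thinning = z
    return 2.0 * atrophy + 0.5 * ventricles + atrophy * thinning


x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(risk, x, baseline)
print([round(p, 6) for p in phi])  # [2.5, 1.0, 0.5]
```

The values sum to the difference between the prediction and the baseline prediction, and the interaction term is split equally between the two features involved.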

Future applications
Recent advances in AI, driven by composite deep learning models with near human-like intelligence, have the potential to change the landscape of neurodegeneration research in the future. The success of these approaches often relies on large-scale, high-dimensional, uniform datasets, which are required for training complex algorithms. For experimental researchers, generating such datasets is both costly and time consuming. 37,38 Focusing on developing methods that do not rely on high-dimensional uniform data will help ensure that experimental research into neurodegenerative disease advances alongside AI.
Large-scale population cohorts are likely to facilitate the development of massive uniform datasets that lend themselves to the application of AI approaches. UK Biobank has collected genetic information and deep phenotyping data on half a million individuals in the UK. 207 In the context of neurodegeneration, there is a strong argument for facilitating brain donation from UK Biobank participants in the future, so that phenotypic data can be linked to neuropathological measures and genetic variation. Complementing this, WGS data collected to screen for genetic disorders as part of the Newborn Genomes Programme may be instrumental in accelerating the diagnosis of infants born with rare genetic conditions. 208 Automated AI analysis pipelines could streamline the detection of rare genetic variants or phenotypic associations. For example, pathogenic variants could be filtered and ranked by integrating deep phenotype information derived from natural language processing of the medical literature. In the context of dementia, the detection of variants known to increase the risk of AD could be used as a proxy for testing family members. In the long run, ML could be used to evaluate genotype-phenotype correlations, 209 identify biomarkers, 210 and predict individual disease risk and gene function. 211 This improved knowledge of disease biology could then be experimentally validated in model systems to develop better diagnostics and therapeutics.
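One way such phenotype-driven filtering could work is to combine a per-variant pathogenicity score with the overlap between a patient's phenotype terms (for example HPO terms, which might themselves be extracted from clinical notes or the literature by natural language processing) and the terms annotated to each gene. The sketch below is hypothetical throughout: the equal weighting, the use of Jaccard similarity, and all names are illustrative assumptions rather than a published tool, and real prioritization tools typically use ontology-aware semantic similarity instead of raw term overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    pathogenicity: float  # hypothetical 0-1 score from any upstream predictor


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two phenotype-term sets (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_variants(
    variants: list[Variant],
    patient_terms: set[str],
    gene_terms: dict[str, set[str]],
    phenotype_weight: float = 0.5,
) -> list[tuple[Variant, float]]:
    """Rank variants by a weighted mix of pathogenicity and phenotype match."""
    scored = []
    for v in variants:
        # Genes with no phenotype annotation get a match score of 0.
        match = jaccard(patient_terms, gene_terms.get(v.gene, set()))
        score = (1 - phenotype_weight) * v.pathogenicity + phenotype_weight * match
        scored.append((v, score))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank_variants(
    [Variant("GENE_A", 0.6), Variant("GENE_B", 0.9)],
    patient_terms={"term_memory_loss", "term_apraxia"},
    gene_terms={"GENE_A": {"term_memory_loss", "term_apraxia"}, "GENE_B": {"term_ataxia"}},
)
print([(v.gene, round(s, 2)) for v, s in ranked])
```

In this toy input, the variant with the lower pathogenicity score is ranked first because its gene's annotated phenotypes match the patient's; setting `phenotype_weight=0` reduces the ranking to pathogenicity alone.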
Generative large language models (LLMs) are generalist AI models trained on a massive corpus of text to achieve convincing natural language capabilities with an extensive breadth of knowledge.
Publicly available LLMs, including OpenAI's ChatGPT, Google's Bard, Microsoft's Bing Chat, and Meta's LLaMA, have recently attracted much public interest. Iterations of these models are evolving rapidly, even as they continue to be applied to a wide variety of real-world problems, including biology 212 and medicine. 213,214 Models that have been specifically trained to synthesize, mine, or infer biomedical knowledge include Flan-PaLM/Med-PaLM (instruction fine-tuned/medical Pathways Language Models), 215 BiomedGPT (Biomedical Generative Pretrained Transformer), 216 PubMedGPT/BioMedLM, 217 GeneGPT, 218 BioGPT, 219 PubMedBERT, 220 BioLinkBERT, 221 Galactica, 222 and BioMegatron. 223 These approaches have even been adapted for non-language biological data (eg, scRNA-seq). 175,176 While static versions of LLMs trained on a snapshot of data from a particular time point are prone to hallucinations (ie, providing real-sounding but objectively false answers), this can be partly ameliorated by adding internet-search capabilities. Open-source projects like AutoGPT 224 seek to extend this even further by prompting the model to query itself in order to identify what information it currently lacks to answer the user's question, enabling a semi-automated loop of knowledge gathering and synthesis. While plenty of challenges remain, LLMs are uniquely positioned to offer human-understandable justifications for their reasoning, because they can be queried in natural language, just as one would query another human. For example, one may ask an LLM to predict whether a particular drug will cause motor impairment as a side effect in mice, whether this side effect will also occur in humans, and to provide well-cited justifications for its reasoning. In combination with proper validation, human oversight, and ethical implementation, LLMs are likely to open entirely new avenues of biomedical research and healthcare at scale.
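The semi-automated loop of knowledge gathering and synthesis described above can be sketched as follows. This is not AutoGPT's implementation; it is a minimal illustration assuming a hypothetical `ask_model` function that either returns an answer or names the information it is missing, and a hypothetical `search` function standing in for internet or literature search. Both are plain callables, so the loop can be exercised with stubs.

```python
from typing import Callable, Optional

# Hypothetical interfaces: ask_model(question, evidence) returns
# (answer, missing_query); search(query) returns a retrieved text snippet.
AskModel = Callable[[str, list[str]], tuple[Optional[str], Optional[str]]]
Search = Callable[[str], str]


def gather_and_answer(
    question: str,
    ask_model: AskModel,
    search: Search,
    max_rounds: int = 5,
) -> tuple[Optional[str], list[str]]:
    """Alternate between asking the model and fetching what it says it lacks."""
    evidence: list[str] = []
    for _ in range(max_rounds):
        answer, missing_query = ask_model(question, evidence)
        if answer is not None:
            return answer, evidence  # evidence doubles as a citation trail
        if missing_query is None:
            break  # the model neither answered nor asked for anything
        evidence.append(search(missing_query))
    return None, evidence  # no answer within budget; flag for human review


def stub_model(question: str, evidence: list[str]) -> tuple[Optional[str], Optional[str]]:
    # Pretends it needs two pieces of evidence before answering.
    if len(evidence) < 2:
        return None, f"query {len(evidence)}"
    return "stub answer", None


print(gather_and_answer("Does drug X impair motor function?", stub_model, lambda q: "doc for " + q))
```

Capping the number of rounds and returning the gathered evidence alongside the answer keeps the loop bounded and auditable, which fits the need for validation and human oversight.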

Key recommendations
To improve the quality and scope of the application of AI to experimental models of neurodegenerative diseases and overcome major existing challenges (Figure 2), we make four key recommendations:

Accounting for species divergence through evolution: Inherent differences in biology between species, some driven by millions of years of evolution, complicate the translation of biological insights from animal models to human disease. We recommend using information on evolutionary distances in combination with transfer learning or autoencoder approaches to improve cross-species translation.
Enhancing interpretability and transparency of AI/ML approaches: As with applications of AI and ML more generally, there is a risk of opacity and distrust in the methods, especially where clinical data are concerned. Addressing these issues by adapting existing approaches, together with continued research advances in this domain, is needed to increase trust and model interpretability.
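For the species-divergence recommendation, one simple way to feed evolutionary distance into multi-species training, offered here as an assumption rather than an established method, is to weight each species' contribution to the training loss by a factor that decays with its estimated divergence time from humans. The exponential form, the decay constant `tau_mya`, and the function names are hypothetical choices, and the divergence times in the usage example are rough, commonly cited approximations.

```python
import math


def divergence_weights(
    divergence_mya: dict[str, float], tau_mya: float = 100.0
) -> dict[str, float]:
    """Down-weight training data from species more distant from humans.

    weight = exp(-d / tau), where d is the estimated divergence time from
    humans in millions of years. Weights are normalized to sum to 1.
    """
    raw = {sp: math.exp(-d / tau_mya) for sp, d in divergence_mya.items()}
    total = sum(raw.values())
    return {sp: w / total for sp, w in raw.items()}


def weighted_mean_loss(
    losses: dict[str, list[float]], weights: dict[str, float]
) -> float:
    """Combine per-species mean losses using the divergence weights."""
    return sum(weights[sp] * sum(v) / len(v) for sp, v in losses.items())


# Rough, illustrative divergence times from humans (million years ago).
weights = divergence_weights({"human": 0.0, "macaque": 29.0, "mouse": 90.0, "zebrafish": 435.0})
print({sp: round(w, 3) for sp, w in weights.items()})
```

A transfer-learning variant could apply such weights only while pretraining on pooled species data and then fine-tune on human data alone; how sharply to down-weight distant species (the value of `tau_mya`) would need to be tuned empirically.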

CONCLUSIONS
Animal models are an important tool for assessing mechanisms of neurodegenerative disease in complex in vivo settings and for prioritizing therapeutic approaches. However, drugs that appeared promising in animal models have repeatedly shown high failure rates in human clinical trials. Here

RESEARCH IN CONTEXT

1. Systematic review: Experimental models in dementia research are important tools for fundamental medical research and drug discovery. Here we reviewed challenges in preclinical experimental neurodegenerative disease modeling and translation to the clinic, highlighting machine learning (ML) and artificial intelligence (AI) approaches used to overcome these issues.
2. Interpretation: We identified four key challenges: a lack of reproducibility, poor data curation, species divergence, and insufficient interpretability. We offer recommendations and examples of how to address these challenges, using careful experimental design and targeted ML/AI approaches.
3. Future directions: While only recently adopted in preclinical dementia research, AI and ML models have great potential to improve prediction, diagnostics, and biological understanding of neurodegenerative diseases. With high-quality, well-curated data and the specific adaptation of approaches including transfer learning, structural equation modeling (SEM), simulations, and neural networks, both reproducibility and cross-species translation could be improved, while continued efforts should address the interpretability of these models.
and a semi-supervised deep learning approach proposed by Mourad, 118 implementing a convolutional neural network within a graph neural network (CNN-GNN), all observed significantly boosted performance when trained on data from multiple species rather than a single species. The developers of Nvwa, 119 a deep learning model designed to learn DNA sequence motifs controlling cell type-specific gene expression, also observed a boost in performance when the model was trained on multiple closely related species at once. Others, like the melanoma enhancer prediction model DeepMEL, 120 were trained on cancer cell lines from one species (human) and successfully applied to predict in another species (dog), which can be considered a form of transfer learning.

15525279, 0, Downloaded from https://alz-journals.onlinelibrary.wiley.com/doi/10.1002/alz.13479 by University College London UCL Library Services, Wiley Online Library on [02/10/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License

FIGURE 1 Genes shared between humans and non-human species. Phylogenetic tree annotated with the percentage of human genes that have 1:1 orthologs in each species (shown numerically and as the filled proportion of each circle). The absolute number of 1:1 orthologs shared with humans is plotted as the color of each circle. Constructed using the orthogene R package. 92 Key: Anolis carolinensis, green anole; Bos taurus, cattle; Caenorhabditis elegans, roundworm; Canis lupus familiaris, dog; Danio rerio, zebrafish; Drosophila melanogaster, fruit fly; Equus caballus, horse; Felis catus, cat; Gallus gallus, chicken; Homo sapiens, human; Macaca mulatta, rhesus macaque; Monodelphis domestica, gray short-tailed opossum; Mus musculus, house mouse; Ornithorhynchus anatinus, platypus; Pan troglodytes, chimpanzee; Rattus norvegicus, brown rat; Saccharomyces cerevisiae, baker's yeast; Schizosaccharomyces pombe, fission yeast; Sus scrofa, pig; Xenopus tropicalis, western clawed frog.
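Multi-species training and cross-species transfer generally require data on a shared gene space, and Figure 1 counts genes with 1:1 human orthologs for exactly this reason. As a minimal sketch of that restriction, the code below keeps only mouse-human gene pairs that are one-to-one and renames mouse genes to their human counterparts, dropping the rest. It does not use the orthogene package; the function names are hypothetical, and the ortholog table in the usage example is a toy input (App/APP and Mapt/MAPT plus made-up gene names), not real ortholog calls.

```python
from collections import defaultdict


def one_to_one_orthologs(pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Keep only mouse->human pairs where each gene maps to exactly one partner
    on both sides; one-to-many and many-to-one relationships are dropped."""
    mouse_to_human: dict[str, set[str]] = defaultdict(set)
    human_to_mouse: dict[str, set[str]] = defaultdict(set)
    for mouse, human in pairs:
        mouse_to_human[mouse].add(human)
        human_to_mouse[human].add(mouse)
    result = {}
    for mouse, humans in mouse_to_human.items():
        if len(humans) == 1:
            human = next(iter(humans))
            if len(human_to_mouse[human]) == 1:
                result[mouse] = human
    return result


def to_human_space(
    expression: dict[str, list[float]], orthologs: dict[str, str]
) -> dict[str, list[float]]:
    """Rename mouse genes to human symbols, dropping genes without a 1:1 ortholog."""
    return {orthologs[g]: v for g, v in expression.items() if g in orthologs}


# Toy ortholog table: mGeneX is one-to-many, mGeneY/mGeneZ are many-to-one.
toy_pairs = [("App", "APP"), ("Mapt", "MAPT"), ("mGeneX", "HGENEX1"),
             ("mGeneX", "HGENEX2"), ("mGeneY", "HGENEY"), ("mGeneZ", "HGENEY")]
orth = one_to_one_orthologs(toy_pairs)
print(to_human_space({"App": [1.0, 2.0], "mGeneX": [3.0], "Unmapped": [4.0]}, orth))
```

Discarding one-to-many and many-to-one relationships is the strictest option and sacrifices genes; alternatives such as aggregating paralogs trade that loss against ambiguity in the mapping.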
Enhancing reproducibility across model systems and experiments: To enhance applications of AI and ML approaches in model systems, reproducibility should become a priority, driven by sufficiently large, well-controlled experiments that allow biases and artifacts to be studied statistically and resolved. Conversely, ML approaches, including simulations, can improve model reproducibility in experimental research, as can pretrained unsupervised clustering methods in the context of single-cell genomics.

Improving upon small and disjointed datasets: AI and ML methods often require large, high-dimensional training datasets to yield robust and appropriately fitted models. We recommend increasing experimental sample sizes and better integrating existing data resources with biological and clinical data to facilitate this. Numerous data resources spanning genomics, proteomics, phylogeny, and clinical databases are already openly available. These should be expanded and leveraged for ML analyses in experimental dementia research. Ultimately, we should aim to generate massive, uniform datasets, while continuing to develop methods to deal with heterogeneity in the meantime.
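The recommendation to increase sample sizes, and the use of simulations to support reproducibility, can be illustrated with a small Monte Carlo power estimate. This is a minimal sketch assuming normally distributed outcomes with unit variance in two groups whose means differ by a standardized effect size (Cohen's d); `simulated_power` and its defaults are hypothetical, and a real study would match the simulation to its actual design.

```python
import random
import statistics
from statistics import NormalDist


def simulated_power(
    n_per_group: int,
    effect_size: float,
    alpha: float = 0.05,
    n_sims: int = 2000,
    seed: int = 0,
) -> float:
    """Fraction of simulated experiments that detect a true group difference.

    Each simulated experiment draws n_per_group control and treated values from
    unit-variance normal distributions whose means differ by effect_size.
    A two-sided Welch statistic is compared with a normal critical value, an
    approximation that is slightly liberal for very small groups.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        se = (statistics.variance(control) / n_per_group
              + statistics.variance(treated) / n_per_group) ** 0.5
        t = (statistics.fmean(treated) - statistics.fmean(control)) / se
        hits += abs(t) > z_crit
    return hits / n_sims


for n in (5, 10, 20, 50):
    print(n, simulated_power(n, effect_size=0.8))
```

Running the loop shows how quickly the chance of detecting a genuine effect drops for the small group sizes common in animal studies, which is one concrete reason underpowered experiments fail to replicate.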

FIGURE 2 Key challenges which need to be overcome to enhance the application of machine learning (ML) and artificial intelligence (AI) approaches for experimental models in dementia research.

Further exacerbating these challenges is the lack of a controlled ontology for describing or annotating the results of a model experiment and for relating these terms across species. This includes, for example, the mouse phenotypes that correspond to specific symptoms of AD or PD in humans. The Human Phenotype Ontology (HPO) 85,86 and its counterparts in non-human species aim to catalogue the full breadth

It is therefore important to assess sex-balanced cohorts, both at preclinical and clinical drug development stages. The correct use and development of better preclinical models should help avoid several pitfalls of the clinical phases of drug development, for example by enabling proper evaluation of pharmacokinetics and pharmacodynamics. 70

though these approaches have not yet been extended to experimental models of dementia. 146

TABLE 1 Major initiatives to share datasets applicable to experimental modeling of dementia.

166,167 Specific unsupervised ML frameworks like autoencoders and generative adversarial networks (GANs) have been used extensively for dimensionality reduction, data denoising (eg, dropout correction), artifact removal (eg, batch, species), feature selection (eg, differential gene expression), data labelling (eg, cell type, disease state), data integration (eg, across datasets and/or omics modalities), clustering, data visualization, and other downstream

we reviewed challenges to translation from model to human, including issues surrounding reproducibility, with the aim of making recommendations to enhance reproducible research and translatability via the adoption of AI approaches. Successful applications of AI and ML in the domain of experimental dementia research are limited; however, other biomedical research fields have witnessed promising advances. Such methodological developments and applications can be adapted to research questions in neurodegeneration, building on existing and novel high-dimensional datasets, including single-cell and spatial omics, proteomics, metabolomics, and biomarker profiles. With the projected growth of quantitative data on preclinical models for dementia research, we are optimistic that increased translational efficiency and improved model reproducibility can be achieved through appropriate and careful application of AI and ML approaches in the field.