Artificial intelligence for biomarker discovery in Alzheimer's disease and dementia

With the increase in large multimodal cohorts and high‐throughput technologies, the potential for discovering novel biomarkers is no longer limited by data set size. Artificial intelligence (AI) and machine learning approaches have been developed to detect novel biomarkers and interactions in complex data sets. We discuss exemplar uses and evaluate current applications and limitations of AI to discover novel biomarkers. Remaining challenges include a lack of diversity in the data sets available, the sheer complexity of investigating interactions, the invasiveness and cost of some biomarkers, and poor reporting in some studies. Overcoming these challenges will involve collecting data from underrepresented populations, developing more powerful AI approaches, validating the use of noninvasive biomarkers, and adhering to reporting guidelines. By harnessing rich multimodal data through AI approaches and international collaborative innovation, we are well positioned to identify clinically useful biomarkers that are accurate, generalizable, unbiased, and acceptable in clinical practice.


INTRODUCTION
Biomarkers are defined as measurable characteristics that are indicators of biological processes, pathogenic processes, or responses to interventions. 1 These can be subtyped by their application, and for this review we focus on indications of susceptibility or risk, diagnostic, and prognostic biomarkers as defined in Table 1. Improving the clinical management of dementia requires reliable biomarkers that can aid in identifying high-risk populations, early diagnosis, accurate subtyping, and predicting prognosis, drug response, or adverse events. 2 Biomarkers from different biological scales have been investigated in people with dementia, including neuroimaging, 3 electrophysiological, 4 genetic, 5 gene expression, 6 protein, 7 metabolic, 8 gut microbial, 9 sleep, 10 gait, 11 and digital 12 biomarkers. Biomarkers from cerebrospinal fluid (CSF), 13 as well as minimally invasive collectable biological fluids including blood, 14 saliva, 15 tear, 16 and urine 17 have shown promise in improving dementia diagnosis. CSF biomarkers such as CSF amyloid beta (Aβ), total tau (t-tau), and phosphorylated tau (p-tau) have been introduced in some research-based clinical centers and amyloid positron emission tomography (PET) can be used to estimate plaque density using florbetapir, flutemetamol, or florbetaben tracers. 18 However, routine clinical use of dementia biomarkers has not reached most clinical settings, 19 and dementia continues to be diagnosed on the basis of clinical diagnostic criteria. 20 Advancing biomarker research using an interdisciplinary approach is essential for accelerating the discovery of more reliable and clinically adaptable biomarkers for dementia. 21

TABLE 1 Focused biomarker subtypes.

Subtype | Definition
Susceptibility biomarkers | Indicate the risk of developing specific diseases in future in those without clinically apparent disease
Diagnostic biomarkers | Detect or confirm diseases or their subtypes
Prognostic biomarkers | Indicate the likelihood of disease progression in those who have the disease

Alzheimer's disease (AD), vascular dementia (VaD), dementia with Lewy bodies (DLB), and frontotemporal dementia (FTD) are the four most common subtypes of dementia. They differ in their rate of progression, 22 mortality rate, 23 medication response, 24 and susceptibility to medication-related adverse events. Early diagnosis and accurate subtyping are important for the safe and effective management of dementia, 24 but there are currently few biomarkers available to serve this important clinical need. Moreover, mild neurocognitive disorder or mild cognitive impairment (MCI) is characterized by objective evidence of cognitive impairment that is not sufficiently severe to interfere with independent daily living. 25 It is important to be able to predict conversion to dementia in people with MCI in clinical settings, but we do not have any biomarker that can be used routinely to address this need.
Several methodological challenges impede the discovery and clinical validation of dementia biomarkers, and traditional statistical approaches are insufficient for discovering novel dementia biomarkers from exponentially growing multi-omics and multimodal data.
Artificial intelligence (AI) approaches have already demonstrated their potential for discovering novel dementia biomarkers. 27,28

AI and data science methods used in biomarker discovery
AI approaches and machine learning (ML), in particular, have been used successfully in the analyses of many modalities of data for dementia-related diseases and for exploring different biomarkers.
These approaches are key for robust interrogation of complex and multimodal data sets to identify novel patterns and potential biomarkers. 29 The application of AI methods varies according to the biomarker type, and these methods are traditionally grouped by input data and algorithm learning style (Figure 1). Supervised learning uses input data that have a known classification; in the case of biomarker discovery data, this is often disease status or a related endophenotype.

RESEARCH IN CONTEXT
1. Systematic Review: Artificial Intelligence (AI) and machine learning are making a unique contribution to dementia biomarker research and discovery.
2. Interpretation: By summarizing publications on the advanced methods used in dementia biomarker discovery and identifying exemplar studies, we identify gaps, issues, and challenges, and we suggest potential future applications of AI to biomarker discovery. Although there have been comprehensive reviews on current Alzheimer's disease (AD) and non-AD dementia biomarkers, robust clinically useful biomarkers have yet to be identified.

3. Future Directions: The key to progress in biomarker discovery using AI is the support of funders to allow the generation of suitable data sets and collaboration across cohorts and studies to promote both sharing and scaling up of sample sizes.

FIGURE 1
Comparison of machine learning approaches. Two ML approaches are used in biomarker discovery, depending on the required outcome. To define biomarkers based on a known patient phenotype, supervised learning can be implemented. In contrast, during data exploration, unsupervised methods can be used to identify patient subtypes.
Supervised learning includes regression, support vector machines (SVMs), random forest, and advanced deep learning methodology.
Unsupervised learning methods are often used to explore data and understand structure. These methods include clustering algorithms or dimensionality reduction approaches to reduce data set complexity or to stratify a data set by feature similarity. 29 Targeted fluid-based biomarker discovery often incorporates a simple classification method, using a receiver-operating characteristic (ROC) curve to assess the accuracy and performance of novel biomarkers in validation stages. 30
Key to developments in discovery approaches is understanding the performance of AI/ML in increasing the sensitivity and specificity to identify biomarkers for dementia-related diseases. More advanced segmentation and ML techniques including thresholding, supervised and unsupervised learning, probabilistic techniques, atlas-based approaches, fusion of different image modalities, and enhanced probabilistic neural networks have been applied to neuroimaging biomarkers. 31
Among the barriers in current discovery studies, one of the main issues is the lack of adequate sample sizes, which has been addressed recently by the rise of large openly available data sets. Furthermore, with the difficulty of collecting labeled data over time (e.g., due to insufficient follow-up studies), assessing the predictive power of certain biomarkers is a challenge, which can be addressed by applying semisupervised/unsupervised learning techniques and deep learning (DL) approaches. 32 The Deep and Frequent Phenotyping cohort captures data from fluid biomarkers, digital wearables, imaging, and clinical tests assessing cognition repeatedly over a period of time. 34
Approaches required to interrogate these diverse resources are complex and often require methods capable of accounting for noise and confounders.
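As a minimal illustration of the ROC-based assessment mentioned above, the following Python sketch computes the area under the ROC curve (AUC) for a single candidate biomarker as the probability that a randomly chosen case has a higher value than a randomly chosen control. The function name `roc_auc`, the biomarker values, and the case/control labels are hypothetical and chosen only for illustration; they are not taken from any cited study, and the sketch assumes that higher values indicate disease.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(case score > control score), counting ties as 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    # Compare every case with every control (fine for small validation sets).
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical biomarker levels (arbitrary units); 1 = case, 0 = control.
levels = [2.1, 3.4, 1.8, 4.0, 2.9, 1.2, 3.8, 2.5]
status = [0, 1, 0, 1, 0, 0, 1, 1]
auc = roc_auc(levels, status)
print(f"AUC = {auc:.4f}")
```

In practice such an estimate would normally be reported with a confidence interval and confirmed in an independent validation cohort.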

State of the science
The key genetic risk biomarkers can be divided into early- and late-onset forms of AD. 35 Family inheritance and mutations in key genes (amyloid precursor protein, presenilin 1, and presenilin 2) can be used to identify the rarer early-onset subtype of the disease. The apolipoprotein E (APOE) gene is the prominent risk factor for late-onset AD, although carriers of the pathogenic ε4 allele have only an increased risk of disease rather than a clear positive diagnosis. Genome-wide association studies (GWASs) continue to identify additional single nucleotide polymorphisms (SNPs) linked to late-onset AD. However, each SNP individually provides only a very small contribution toward AD susceptibility. 5 By combining these SNPs, we can create more powerful polygenic risk scores (PRS) that act as better predictors of disease susceptibility. 36 PRS are generated by summing the weighted effect sizes from relevant SNPs in previous GWASs, which are linked to genes from multiple mechanistic pathways. Prediction accuracy is optimized by including two main components: the APOE risk alleles and a combination of supporting SNPs from other risk genes. 37,38
Other analyses use common biomarkers as a starting point to understand more about the data set. Palmqvist et al. 39 combined a series of known biomarkers using cross-validation to understand which biomarkers gave the highest accuracy of prediction for conversion from MCI to AD.
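The weighted sum that defines a PRS, as described above, can be sketched in a few lines of Python. This is an illustrative sketch only: the SNP identifiers, effect sizes, and genotype dosages are hypothetical, and the function name `polygenic_risk_score` is our own. For simplicity the sketch treats a missing genotype as a dosage of zero, whereas real pipelines typically impute missing genotypes.

```python
from typing import Mapping


def polygenic_risk_score(
    dosages: Mapping[str, float], weights: Mapping[str, float]
) -> float:
    """Sum over SNPs of effect-allele dosage (0, 1, or 2) times GWAS effect size."""
    # Simplifying assumption: SNPs absent from `dosages` contribute nothing.
    return sum(weights[snp] * dosages.get(snp, 0.0) for snp in weights)


# Hypothetical effect sizes (e.g., log odds ratios from a previous GWAS).
gwas_weights = {"snp_a": 0.12, "snp_b": -0.05, "snp_c": 0.30}
# Hypothetical effect-allele counts for one individual.
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_risk_score(person, gwas_weights)
print(f"PRS = {score:.2f}")
```

Because the score is a plain sum, it encodes the additive assumption directly; an APOE component could be added as one more weighted term or modeled separately.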
Discovery of novel biomarkers, which is key to advancing the field, is more likely in data sets where there is a larger selection of assays, for example, in GWAS or in imaging studies, where multiple measures can be derived from scans. Here, it is often necessary to use more advanced methods. For example, where RNA sequencing was used for risk biomarker discovery, the authors used network analysis to prioritize relevant genes from the differentially expressed results in a larger data set containing MCI, controls, and AD cases. When hub genes were tested in a prospective cohort, an overall accuracy of 0.727 was achieved. 40
An important step in the prediction of AD is the conversion from MCI classification and pre-clinical stages of disease to AD. Biomarkers used in diagnosis can also be used as markers of change. Serum neurofilament light (NfL) chain was used in multivariate linear mixed-effects models alongside estimated years to symptom onset as an early biomarker. 41 Using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), researchers performed a comparison between genomic and imaging variables using four different ML approaches to understand the potential of cost-effective, minimally invasive transition biomarkers. The researchers found that by combining relatively easily obtained measurements (i.e., plasma biomarkers, genetic risk, and cognitive scores) the prediction results were competitive with imaging data. When random forest and gradient boosting approaches were applied, plasma analytes measuring APOE and C-reactive protein were ranked among the most important features. 42 The multimodal data set of ADNI was also used by Gupta et al. for the classification of AD and MCI subjects. They integrated structural magnetic resonance imaging (MRI), PET, CSF (proteins), and APOE genotype using a multiclass SVM classifier with stratified cross-validation to show the improvement gained by combining modalities. 43 Proteins measured in the CSF were Aβ42, t-tau, and p-tau181, and although the CSF features outperformed other modalities in individual comparisons, the best predictions were acquired using a combination of the modalities. These results show that using advanced methods and multimodal data can generate integrated markers to improve current susceptibility biomarkers.
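Stratified cross-validation, as used in the multiclass example above, keeps the class mix of each fold close to that of the whole data set, which matters when diagnostic groups are of unequal size. The sketch below is a simplified, standard-library illustration of the idea and is not the procedure of any cited study; the function name `stratified_folds`, the group sizes, and the seed are assumptions made for the example.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[str], k: int, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to k folds so each fold keeps roughly the overall class mix."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the members of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Hypothetical cohort: 6 AD, 9 MCI, and 12 control participants.
labels = ["AD"] * 6 + ["MCI"] * 9 + ["control"] * 12
folds = stratified_folds(labels, k=3)
for number, fold in enumerate(folds):
    counts = {c: sum(labels[i] == c for i in fold) for c in ("AD", "MCI", "control")}
    print(number, counts)
```

Each fold then serves once as the held-out test set while the remaining folds are used for training.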

Remaining challenges
Some of the major remaining challenges faced by risk biomarker studies are basic in nature: many studies focus exclusively on one or a few measurements or modalities. This limits the scope for the application of AI and ML approaches and, in line with that, currently applied methods tend to be more traditional and regression-based, 44 such as the Cox proportional-hazards model for time-to-event predictions. 45,46 A variety of potential biomarkers are studied in the context of disease risk predictions, spanning neuropsychiatric and clinical measurements, socioeconomic and lifestyle factors, neuroimaging, genetics and omics, as well as peripheral blood biomarkers. Some of these modalities, such as omics 40 and neuroimaging, 47 may lend themselves better to AI approaches, due to their high-dimensional nature. Within omics biomarkers, genetic risk factors are possibly the best-studied subset, with predictions based on individual variants as well as combined polygenic scores. However, these approaches are generally limited to basic summing of risk and assume uniform additive genetic effects.
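For readers less familiar with it, the Cox proportional-hazards model referred to above is conventionally written as shown below. The notation is ours rather than that of the cited studies: h_0(t) is an unspecified baseline hazard, x is the vector of covariates (for example, biomarker levels), and β is the vector of log hazard ratios.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right),
\qquad
\frac{h(t \mid \mathbf{x}_1)}{h(t \mid \mathbf{x}_2)}
  = \exp\!\left(\boldsymbol{\beta}^{\top}(\mathbf{x}_1 - \mathbf{x}_2)\right)
```

The second expression makes the proportional-hazards assumption explicit: the hazard ratio between two individuals does not depend on time, and each covariate contributes log-linearly.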
Genetic risk prediction has great potential to expand with novel AI and ML approaches, leveraging large training data sets and allowing for more complex genetic relationships. Delivering on this promise will pose several challenges to ensure that the predictions are robust and do not suffer from overfitting. In addition, AI approaches to time-to-event prediction in dementia face specific methodological challenges including missing data and heterogeneous phenotypes. 48 Dementia is a highly heterogeneous syndrome, encompassing multiple diseases, and characterized by substantial variability in neuropathological profiles. Current risk-prediction tools focus on modifiable risk factors 49 and although the Cardiovascular Risk Factors, Aging and Dementia (CAIDE) score incorporates APOE ε4 status, the accuracy of these predictions may be increased by including other key biomarkers. In order to improve risk prediction generally, a focus on the prediction of neuropathological signatures and deep phenotyping rather than disease status or clinical symptoms may prove more effective. 50

State of the science
Numerous studies have sought plasma biomarkers relevant to AD diagnosis, and the past decades have seen several analytes being tested for this purpose. 51,53,54 Apart from AD hallmarks in blood, an increasing number of studies have discovered a range of other proteins and metabolites in plasma that might also act as diagnostic biomarkers. 55,56 ML has proven very useful in the past decade for extracting reliable predictors for the development of diagnostic biomarkers of AD. These techniques have been applied to traditional biomarkers, such as neuroimaging, 31 nuclear medicine, 57 and electroencephalography, 58 as well as promising ones, for example, neuropsychological measures 59 and speech-based digital biomarkers. 60
An example is the use of ML in the construction of metabolite panels from top discriminant metabolites in biofluids. 61 To date, studies have included assays run in CSF, plasma, saliva, and brain tissue, detecting associations with AD outcomes. 62 Stamate et al. 63

Remaining challenges
ML and more conventional statistical models have been applied previously to clinical phenotypes in dementia for the prediction of clinical outcomes. However, reaching an adequate performance or prediction accuracy to allow for the use of such models in clinical practice remains challenging. 65 The poor performance of such models highlights the limited informative contribution of clinical phenotypes without biological data. 66 Therefore, using supervised ML techniques based on clinical phenotypes and the integration of known and newly discovered biomarkers to build a more informed predictive model for prognostic and diagnostic purposes might be a solution. 67 For example, Ashton et al. 67 used data from multiple cohorts to explore the diagnostic performance of a plasma NfL assay across diseases. They found that plasma NfL is clinically useful in identifying atypical parkinsonian disorders in patients with parkinsonism, dementia in individuals with Down syndrome, dementia among psychiatric disorders, and frontotemporal dementia in patients with cognitive impairment.
Heterogeneity within dementia is compounded by the overlap between dementia and other clinically defined neurodegenerative diseases, which complicates the identification of relevant subgroups for personalized treatment. 50 Understanding the relationship between dementia phenotypes and underlying causes is, therefore, key to the development of targeted approaches to therapy beyond clearly defined genetic subtypes. In response to this impasse, several international consortia are collecting large data sets with multi-layered genomic, environmental, and clinical data. 29 Supervised and unsupervised ML methods such as DL are valuable and promising tools to exploit these complex data sets, but their use involves dealing with petabytes of data and millions of data features, only easily manipulated on secure high-performance computing networks. 68 This is a nontrivial problem. Challenges for storage, computing, and data analysis make e-infrastructure and expertise mandatory.

State of the science
Neuroimaging with computerized tomography (CT) or MRI has become a standard tool in the diagnosis of neurodegenerative disease and shows prognostic potential. 69 PET can also provide objective measures to indicate the presence and progression of dementia. By improving the specificity of PET tracers, we are gaining insights into disease stages in vivo that were only possible in the past using pathological examination. 70 However, PET imaging is expensive and not yet widely available.
One of the first biomarker changes in AD is a 50% decrease in the levels of CSF Aβ42; however, CSF Aβ42 is a marker of amyloid pathology rather than cognitive decline. 71 CSF t-tau and p-tau 72 are increased in AD but not in other tauopathies. CSF tau levels correlate with rapid disease progression. 73 As such, CSF Aβ42 combined with CSF p-tau accurately predicts future AD dementia in patients with MCI. 74 Certified reference materials and methods are now available, making it possible to standardize CSF Aβ42 thresholds for prognosis globally. 75 Although CSF and imaging markers have a good prognostic accuracy, their high cost and invasive nature create a need for the development of more accessible markers that may be found in blood, urine, faeces, saliva, or tear fluid. 15,76 Despite accessibility concerns, data sets generated from CSF are ideal for discovery of novel markers due to decreased experimental noise. New markers can then be tested in other modalities such as blood plasma. 67 Repeated, non-invasive measures will also be possible using eye imaging, where deposition of Aβ and tau, atrophy of neuronal layers, and vascular changes could all be potentially monitored. 34,77
Most of the current prognostic biomarkers for AD were discovered through hypothesis-driven research using conventional statistical approaches to predict disease outcomes. However, AI algorithms have been gaining traction as an alternative approach for prediction, and combining biomarkers from multiple modalities, such as CSF, MRI, and cognitive performance biomarkers, has been shown in recent studies to improve model performance. 78,79 A recent prime example showed four distinct trajectories of tau deposition in AD. 80 This study highlighted the power of pooled images from diverse studies to identify disease trajectories and individual variability that would be lost when using the Braak staging system, which is more suitable for population-level description. Complex data from a single modality can also benefit from advanced methods; for example, multi-level autoencoder models were used for interpretation of longitudinal methylation data. 81 In a detailed comparison, Chen et al. showed that for this data set, convolutional neural networks were outperformed by other autoencoders and that the informative methylation sites were enriched for expected gene ontologies.

Remaining challenges
Early diagnosis of AD is important for prevention and planning therapeutic strategies. To develop screening tests with greater specificity, further research is needed to identify additional biomarkers beyond the main CSF biomarkers currently in use for AD (Aβ42, t-tau, and p-tau), such as the Aβ42/Aβ40 ratio and α-synuclein. 82,83
Blood-based biomarkers hold immense promise for evaluating AD prognosis due to their high accessibility and low cost. However, standardized assays and procedures need to be developed to facilitate the comparison of biomarker measurements across different batches and laboratories. 44 In addition, the introduction of specified cutoffs would facilitate the use of plasma biomarkers in a clinical setting. 44
Metabolites may also serve as useful prognostic biomarkers for dementia. A metabolomics study that used an SVM classifier and random forests identified distinct sphingolipids and glycerophospholipids associated with AD pathology and preclinical disease progression. 84 The metabolites were implicated in several biologically relevant pathways in AD, including tau phosphorylation, Aβ metabolism, calcium homeostasis, acetylcholine biosynthesis, and apoptosis.
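To make the idea of a specified cutoff for a plasma biomarker concrete, the sketch below scans candidate thresholds and selects the one that maximizes Youden's J statistic (sensitivity + specificity - 1). This is one common convention among several, not a method prescribed by the cited work. The function names, the measurements, and the outcome labels are hypothetical, and the sketch assumes that values at or above the cutoff are called positive.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    values: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when values >= cutoff are called positive."""
    tp = fn = tn = fp = 0
    for value, label in zip(values, labels):
        positive = value >= cutoff
        if label == 1:
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff_by_youden(
    values: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float]:
    """Return (cutoff, J) for the observed value that maximizes Youden's J."""
    best_cutoff, best_j = float("nan"), -1.0
    for cutoff in sorted(set(values)):
        sens, spec = sensitivity_specificity(values, labels, cutoff)
        j = sens + spec - 1.0
        if j > best_j:  # ties keep the lowest cutoff
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical plasma measurements (arbitrary units); 1 = progressed, 0 = stable.
plasma = [10.0, 12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 30.0]
outcome = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j_stat = best_cutoff_by_youden(plasma, outcome)
print(f"cutoff = {cutoff}, Youden J = {j_stat:.2f}")
```

A cutoff derived this way is specific to the assay and cohort used, which is why the standardization across batches and laboratories noted above is a prerequisite for clinical use.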
However, there are particular challenges when it comes to employing ML for defining progression from MCI to AD. In addition to the lack of large enough data sets to train ML models effectively and the variability of data, the key challenge with feature selection is due to the complex and heterogeneous nature of MCI. This, in combination with the myriad of factors that contribute to disease progression and the limited understanding of underlying disease mechanisms, means that prognostic biomarker discovery will require both clinical and AI knowledge for optimal analysis.

Improvements in methods
Our field is some way from making full use of this approach for specific management pathways or targeted disease-modifying therapies, which would enable individualized precision medicine. 85 With the advent of drugs targeting specific proteinopathies, the role of biomarkers specific to underlying pathologies will be increasingly important.
The majority of AI studies in dementia use the ADNI data set, which has advantages in its size and availability, but has drawbacks in the limited recruitment of non-Alzheimer's dementias and biases within the data set. 86 We call for wider recruitment of people attending memory clinics with longitudinal data to permit the development and real-world validation of biomarkers, such as that collected by the National Alzheimer's Coordinating Center. 87 In these real-world settings, biomarkers will need to demonstrate clinically relevant benefits. These include cost savings for health systems and quality of life measures for patients. 88
The dementia field may be able to learn from other fields of medicine. AI and big data approaches for developing cancer biomarkers have had a significant impact on cancer care, 89 for example, the use of RNA profiling for treatment. Molecular biomarkers, for example, mutations in the estrogen receptor 1 in breast cancer, are used to predict both treatment outcomes and prognosis. Discovery studies of both prediction and prognosis markers in different cancer types have applied advanced approaches including DL and decision trees. 90 Oncology also leads the field on the use of multi-omics data for patient stratification and personalization with clinically validated biomarkers; however, a gene expression panel is also used in tests for coronary artery disease and a test for cardiac allograft rejection. 91
In amyotrophic lateral sclerosis, open access data cohorts and data challenges have led to the development and comparison of DL models to predict disease progression. 92 Pancotti et al. have ranked model features to better understand the predictive power of different measures and markers, but key to interpretability and clinical translation will be the replication and explainability of these models. Appropriate method selection is also a key issue for replicable biomarkers. Accessible coding libraries allow users to apply many new methods; however, overfitting due to limited sample size can be a limiting factor with these approaches and, in many cases, decision tree-based approaches (e.g., random forest or gradient boosting) will still outperform advanced neural network models. 93
Application of complex models on non-standard data sets, such as those found in primary care or brain health clinics, also raises issues. Few ML algorithms have been tested in prospective replication studies and discrepancies between training data and real-world clinical data make biomarker translation more challenging. 94 Adding higher-order features from different modalities to these models makes the assessment of performance and ranking of features more complex. Additional considerations for the successful application of ML models include the balance of classes within the data set (e.g., case and control) and the use of a suitable training set. 95 Careful testing of optimized models to take forward for further study is necessary where the order of feature ranking can change between ML methods or even model iterations. Successful features must be robust to these changes. Feature selection itself can be used to improve model performance and a comparison of methods to find the most appropriate model for the data set is recommended to promote feature stability. 96
Sustained and strategic funding will be required to realize the vision of biomarkers in clinical practice. Given its social and economic impact, government funding is likely to play a big role in facilitating this work. Therefore, we welcome initiatives such as the U.S.

Improvements in data sets
As well as improving the speed and power of algorithms, 97 to move toward biomarkers that have clinical applicability, much of the future development in the field lies in data sets. Many studies rely on single data sets for discovery, with cross-validation to estimate algorithm performance. Because models generalize poorly to unseen data, accuracy drops when they are tested on other research data sets, and drops substantially when they are tested on clinical data. 98 Generalizability is particularly problematic for AD, a heterogeneous condition, but is crucial for successful translation into clinical practice. It is also symptomatic of a larger issue, which is that current data sets used for biomarker discovery lack diversity of participants. 99
Comparisons of known biomarkers between populations have shown a discrepancy in prediction with CSF biomarkers p-tau and t-tau, confirming the need for more representative cohorts. 100 In direct response to this bias, the ADNI 4 will be focused on enrolling 50% to 60% of new participants from underrepresented populations. 101 However, in addition to the collection of representative data, consideration of the analysis approach is important and care should be taken not to simply use a variable to delineate subgroups but rather to consider the complexities in a full analysis. 102
Without addressing this crucial need to better represent the populations for which biomarkers should ultimately improve health outcomes, improving performance metrics using increasingly complex algorithms built on homogeneous input data is likely to be unproductive. This is reflected in recent policy updates, including a European Commission White Paper, which places requirements on training data such that it does not lead to discrimination. 103
An advantage of ML is that algorithms can be adaptive, continually learning in response to new data even after deployment. This could help overcome the lack of diversity in current training sets but means that published performance metrics hold less meaning. Thus another future direction for the field is to move away from single summary metrics such as accuracy, instead evaluating health outcomes and impact, such as in clinical trials of Software as a Medical Device (SaMD), proposed by the U.S. Food and Drug Administration (FDA) to regulate AI. 104 Improving risk detection will only have a positive impact if results are followed up appropriately. 104,105 This relies on a number of factors, including health service infrastructure and resources, as well as trust from stakeholders in AI. Ensuring interpretability of models can help foster trust, as can reporting an algorithm's confidence in its decision. 27
Ultimately, policy and regulation must be based on outcomes for patients. 105 While we move toward this goal, a consensus on reporting performance metrics would aid in comparing results of different biomarkers and ML approaches, which lack standardization. 21
The capabilities of ML to handle high-dimensional data sets mean that it is tempting to continue extending algorithms with multiple biomarkers to try and further improve accuracy. This should be weighed against the clinical feasibility of requiring multiple, often expensive, tests that may not be available equitably. Rather, the power of harnessing ML to analyze multi-omics data sets lies in the ability to develop biomarkers that are sensitive to the multifactorial nature of neurodegenerative diseases, while ruling out those that are redundant. Using both complementary and unique methods across biological scales of information, and modalities, is essential. 61 Similarly, although larger data sets seem attractive in ML, they do not necessarily help the issue of bias 106; rather, more diverse data sets are needed. 107
We suggest that the future of biomarker discovery lies in testing of biomarkers in other data sets to ensure against cohort effects, their ability to distinguish different types of dementia to improve differential diagnosis, and building infrastructure to improve diversity in data. 63,97,108
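One simple way to surface the population-level discrepancies discussed in this section is to report performance separately for each cohort or demographic group rather than as a single pooled figure. The sketch below is a hedged illustration under our own assumptions: the function name `metrics_by_group`, the group labels, and the predictions are hypothetical, and a fuller analysis would also consider the interactions cautioned about above rather than a single grouping variable.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence


def metrics_by_group(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[str]
) -> Dict[str, Dict[str, Optional[float]]]:
    """Sensitivity, specificity, and sample size computed separately per group."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "tn" if pred == 0 else "fp"
        counts[group][key] += 1
    report: Dict[str, Dict[str, Optional[float]]] = {}
    for group, c in counts.items():
        cases = c["tp"] + c["fn"]
        controls = c["tn"] + c["fp"]
        report[group] = {
            # None marks a metric that cannot be estimated in this group.
            "sensitivity": c["tp"] / cases if cases else None,
            "specificity": c["tn"] / controls if controls else None,
            "n": float(cases + controls),
        }
    return report


# Hypothetical labels and model predictions from two cohorts.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["cohort_a"] * 4 + ["cohort_b"] * 4
report = metrics_by_group(y_true, y_pred, groups)
for name, metrics in report.items():
    print(name, metrics)
```

In this toy example the pooled accuracy of 0.75 hides the fact that the model performs perfectly in one cohort and at chance level in the other.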

FUTURE DIRECTIONS OF AD BIOMARKER DISCOVERY
The search for novel biomarkers to further characterize dementia and AD phenotypes, in addition to disease pathogenicity, is key. Clinical biomarkers extracted from electronic health records have been used to identify coronavirus disease 2019 (COVID-19) patients and, when data become available, unsupervised approaches can be used to profile AD cases. 109 For dementia biomarkers, the first step would be to understand the translation of CSF markers to blood (plasma or serum) assays, followed by that in other biofluids (e.g., saliva, urine). The discovery of novel biomarkers and replicating their application requires large data sets, and therefore the collection of new cohorts continues as well as the need for meta-analysis. Replication in multimodal data sets is key in the pathway to translation to a clinically applicable biomarker. It is likely that fluid biomarkers will be used as part of a battery of tests in a clinical setting alongside clinical assessment of psychiatric and neurological features, cognitive testing, and structural brain imaging. 94
Therefore, the key to the success of a novel biomarker is its performance alongside other measures and whether it can increase the accuracy of the test panel. AI methods are used increasingly in the discovery of imaging-based biomarkers, for example, the use of measures of structural complexity, specifically a mathematical measure called fractal dimensionality, which can be derived from conventional 3D segmented T1 MRI scans. 110 In addition, MR elastography can be used as a biomarker by examining regional brain stiffness 111 to understand tissue biomechanical properties. Using this approach, stiffness in affected regions has been found to be lower for dementia patients than for healthy controls. 112 More mainstream use of these example measures, which involve specific data-acquisition methods, will allow for richer multimodal data to be integrated within AI models that can be used to classify dementia type and severity.
Both brain and eye imaging can provide a significant resource for analysis. AI techniques, especially DL, are widely used in image processing of the brain. In the eye and brain, DL-based solutions have already created opportunities for clinical and research use of automated approaches. 113 The emergence of combined molecular imaging, in combination with brain and eye phenotyping, has recently yielded promising preliminary data providing evidence of how multimodal imaging could generate information for AI approaches. 114 State-of-the-art AI algorithms show an exceptional capability to learn from complex imaging data. Despite these successes, there is a continuous need for improved image segmentation and classification, and for noise or artifact reduction, to improve image interpretation, which still requires human decision-makers for prognosis, diagnosis, and monitoring of response to treatment. 115,116 However, with improved methodologies and widening biomarker platforms, cross-modality translation and synthesis are on the near horizon.
The evaluation of AI devices by regulatory agencies will be critical to their implementation in real-world settings, and like other modalities these biomarkers will need to demonstrate clinically relevant benefits.
The FDA maintains a current list of AI applications at different stages of approval. 117 These are assessed for both safety and effectiveness of the tool. In Europe, there is no AI-specific list, although recommendations have been made and a literature review by Muehlematter et al. includes dementia-specific devices. 118 Further streamlining and demystifying of regulatory processes will be essential for translating this growing academic field for patient benefit.

RECOMMENDATIONS AND CONCLUSIONS
Once identified and validated, novel biomarkers can be considered for clinical translation. Robust analytical pipelines are critical to the long-term use of AI approaches. Guidelines for generating these pipelines are available (e.g., the Findable, Accessible, Interoperable and Reusable (FAIR) Cookbook; https://fairplus.github.io/the-fair-cookbook/content/recipes/introduction/FAIR-cookbook-audience.html), but a commitment to their use is a key recommendation for incorporation of new biomarkers. In addition, reporting guidelines to support publication of prediction studies are under development by the EQUATOR network. TRIPOD-AI will provide guidelines for diagnostic and prognostic model publication. 119 For new biomarker discovery, however, collaboration and an interdisciplinary approach will be necessary, which is increasingly recognized by funders. The EU Joint Programme - Neurodegenerative Disease Research seeks multinational consortium grants. In the United Kingdom, the UK Dementia Research Institute was recently launched with the single biggest investment in dementia research, with collaboration at the heart of its approach. In North America, both the National Institutes of Health Bridge2AI program and Canadian Institutes of Health Research Training Platform grants have an interdisciplinary and intersectoral focus. These programs also recognize the need to improve diversity in the workforce, as well as in participants, so that studies using AI are also representative of society. International initiatives are underway to improve research infrastructure and sharing of data, such as the Deep Dementia Phenotyping (DEMON) Network (demondementia.com) and the Canadian Consortium on Neurodegeneration in Aging. The inclusion of any newly discovered biomarkers will be reliant on sufficient replication and performance in multiple cohorts. Here, data sharing is key, and steps made by these communities will be vital in the following stages of developing reproducible markers.
To overcome the remaining challenges in dementia biomarker research, it will be essential to collect additional data from underrepresented populations to reduce bias, develop even more powerful AI approaches to enhance accuracy, validate the use of noninvasive biomarkers to improve practicality, and improve adherence to reporting guidelines, which are coproduced by multiple stakeholders, to improve reproducibility. By harnessing rich multimodal data through AI approaches and international collaborative innovation, we are well positioned to identify biomarkers that are accurate, generalizable, unbiased, and ready for translation to clinical practice.
showed that plasma metabolites have the potential to match the area under the curve (AUC) of well-established AD CSF biomarkers. Comparing the accuracy of a number of ML methods (decision trees and DL), they reported aspartate and dodecanedioate as the top-ranked features. Prompt and precise diagnosis of dementia subtype is of clinical relevance and might be of paramount importance for studies aimed at assessing neuroprotection or disease-modifying approaches to early stages of the disease. Despite the growing amount of data, ML has not yet been incorporated into a clinically available diagnostic tool for AD or other types of dementia. Toschi et al. 64 used a clustering approach on CSF biomarkers to address the heterogeneous AD pathology. Five distinct clusters of samples were generated with unique cellular and molecular profiles. Of these, two clusters showed biomarker profiles linked to neurodegenerative processes not associated with classical AD-related pathophysiology. One cluster was characterized by the neuroinflammation biomarker YKL-40. However, the clinical relevance of these clusters has yet to be established.