Machine learning‐based classification of Alzheimer's disease and its at‐risk states using personality traits, anxiety, and depression

Alzheimer's disease (AD) is often preceded by stages of cognitive impairment, namely subjective cognitive decline (SCD) and mild cognitive impairment (MCI). While cerebrospinal fluid (CSF) biomarkers are established predictors of AD, other non‐invasive candidate predictors include personality traits, anxiety, and depression, among others. These predictors offer non‐invasive assessment and exhibit changes during AD development and preclinical stages.


| INTRODUCTION
Alzheimer's disease (AD) is commonly preceded by cognitive impairment states, namely subjective cognitive decline (SCD) and mild cognitive impairment (MCI). While MCI requires a measurable deviation from normal cognitive performance as assessed by neuropsychological testing, SCD does not. [4][5][6][7][8] Established biomarkers for the diagnosis of AD and associated risk stages are altered levels of amyloid beta (Aβ1-42), total tau (tTau), and phosphorylated tau (pTau181) in cerebrospinal fluid (CSF; 3,7,9). Obtaining CSF samples requires an invasive lumbar puncture and is typically only performed in cases of clinical suspicion.
Hence, less invasive measures have been proposed. This study undertook a comparative assessment of the predictive value of voxel-wise resting-state functional magnetic resonance imaging (fMRI) activity of the default mode network (DMN), personality traits, depression, anxiety, apolipoprotein E (ApoE) genotype, and CSF biomarkers. These predictors were employed in a machine-learning classification framework to distinguish between different groups of participants positioned along the trajectory of Alzheimer's disease or those in a cognitively healthy state (Figure 1). At an intra-individual level, personality traits 10 change in premorbid cognitive states and in AD itself. [12][13][14] Similarly, at an inter-individual level, individuals with AD display higher neuroticism and lower scores in agreeableness, extraversion, conscientiousness, and openness compared to healthy controls in both self- and informant ratings. 15,16 In general, a linear trend reflecting the severity of cognitive decline is apparent in personality trait scores, indicating that alterations are more pronounced in AD than in its preceding stages.
Personality traits are considered rather stable throughout life, while anxiety and depression are transient states. However, anxiety and depression are widely reported to correlate with personality traits [17][18][19] and may be regarded as proxies for neuroticism. 20,21 Higher levels of depression and anxiety are consistently associated with SCD, 22 amnestic MCI (aMCI), 23,24 and AD 25 and may be used as predictors for these cognitive states. Comparisons of affective symptoms between SCD/MCI and SCD/AD have yielded inconsistent results, but a higher prevalence of depressive symptoms is observed compared to healthy controls. 22 Higher anxiety and depression levels increase the risk of converting from (a)MCI to AD [26][27][28][29] and treatment of these conditions might reduce the conversion rate. 30 Additionally, the rate of cognitive decline is reported to be influenced by the age of depression onset. 31 [34] Activity of the DMN 35 can be assessed with resting-state fMRI 36 and metrics like the percent amplitude of fluctuation (PerAF), 37 which quantify BOLD signal fluctuations. Patterns of AD-typical Aβ plaque deposition and disturbances in functional connectivity of the DMN show considerable overlap. 38 DMN functional alterations have been described in individuals with aMCI and AD for a range of measures, including amplitude of low frequency fluctuations, and therefore hold potential diagnostic value for identifying AD and its at-risk states. 3,7,39,40 [42][43][44] The ApoE genotype is proposed as a risk marker in individuals with SCD. 2
Previous research has mostly tested the aforementioned predictors individually in discriminating cognitively healthy individuals from those at risk for or with AD. Here, we assessed their diagnostic value in a cross-sectional multi-class classification approach, 45 including all four participant groups simultaneously. Our primary focus was to evaluate the role of personality traits, both individually and in combination with depression and anxiety. Furthermore, we aimed to compare the performance of all assessed feature sets in terms of their respective predictive accuracies, that is, class and decoding accuracies. In this study, the term "predictive" refers to support vector classification performance of feature sets differentiating participant groups in a cross-sectional design, not the prediction of a longitudinal diagnostic outcome.
Our hypotheses were as follows: (1) Measures of personality traits would yield significant predictive accuracies above chance across all participant groups.
(2) Combining personality traits, depression, and anxiety scores would improve predictive accuracies compared to personality traits alone.
(3) A feature set comprising non-invasive predictors (voxel-wise resting-state activity of the DMN, personality traits, depression and anxiety scores, and ApoE genotype) would yield equal or higher predictive accuracies across all groups compared to a feature set consisting of CSF biomarkers (tTau, pTau181, and Aβ42/40 ratio).

| Participants
For our cross-sectional study, we used baseline data from participants recruited through the DELCODE study. For detailed information on the DELCODE study, see Jessen et al. 8 We included a large cohort of 733 participants who were assigned to four different groups based on their entry diagnosis: healthy controls (HC), SCD, aMCI, and mild AD.
All participants were aged 60 years or older, fluent in German, able to give informed consent, and had a study partner present. Please see Table 1 for details.

FIGURE 1 Study design. In a cross-sectional design, predictor variables were combined into feature sets that were used in the SVM classification to predict participant groups. The feature set "confounding variables" was included in all other feature sets and also served as the base model.
Participants for the study were recruited either through local newspaper advertisements or from memory clinics. Healthy controls self-identified as cognitively healthy and passed a telephone screening for SCD. These individuals were included as HC if their memory test performance was within 1.5 standard deviations (SD) of the age-, gender-, and education-adjusted normal performance on all Consortium to Establish a Registry for Alzheimer's Disease (CERAD) subtests and if they did not meet the SCD criteria. 2 Conversely, individuals expressing cognitive decline concerns to the memory center physician were categorized as either SCD or aMCI, based on a comprehensive semi-structured interview following the SCD-plus criteria 2 and their CERAD performance. SCD participants performed better than 1.5 SD below normal, while aMCI participants performed more than 1.5 SD below normal on the "recall word list" subtest, thus excluding non-amnestic MCI participants. They did not meet the criteria for dementia, and their inclusion was based on the memory clinic diagnoses, which adhered to the current research criteria for MCI as defined by the National Institute on Aging-Alzheimer's Association. 1,46 Assignment to the AD group was based on both the clinical diagnosis and the Mini-Mental State Examination (MMSE). Only participants with mild AD (>18 points and <26 points on the MMSE) were included. Aside from HC, all participant groups (SCD, aMCI, AD) were memory clinic referrals and underwent clinical assessments at their respective memory centers. These assessments consisted of a medical history review, psychiatric and neurological examinations, neuropsychological testing, blood laboratory analysis, and routine MRI scans. Cognitive function was measured using the CERAD neuropsychological test battery, which was administered at all memory centers.
were acquired in odd-even interleaved-ascending slice order.

| MRI data acquisition
Participants were instructed to lie inside the scanner with eyes closed, but without falling asleep. Directly after, phase and magnitude fieldmap images were acquired to improve correction for artifacts resulting from magnetic field inhomogeneities via unwarping. This was followed by brief co-planar T1-weighted inversion recovery EPIs.
PerAF is a voxel-wise, scale-independent measure of low-frequency (0.01-0.08 Hz) BOLD signal fluctuations relative to the mean BOLD signal intensity for each time point, averaged across the whole time series. 37 The global-mean-adjusted PerAF (mPerAF) was computed from rs-fMRI using an adapted version of the RESTplus toolbox. 48 A DMN mask 50 was applied, representing a composite of functionally defined regions of interest (ROIs), and the resulting mPerAF maps served as voxel-wise mean-centered predictor variables.
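The PerAF definition above can be illustrated with a minimal sketch. It assumes the commonly published form PerAF = (100/n) Σ |X_i − μ| / μ, where X_i is the signal at time point i and μ is the mean of the time series, and it assumes the input has already been band-pass filtered to 0.01-0.08 Hz with its mean intensity retained. The function names `per_af` and `m_per_af` are hypothetical; this is not the RESTplus implementation used in the study.

```python
from statistics import fmean


def per_af(series):
    """Percent amplitude of fluctuation for one voxel's time series.

    Assumption: `series` is already band-pass filtered (0.01-0.08 Hz)
    and still carries its mean intensity; filtering is not shown here.
    """
    mu = fmean(series)
    if mu == 0:
        raise ValueError("mean signal intensity must be non-zero")
    # Absolute deviation from the mean, relative to the mean, per time point,
    # averaged over the whole series and expressed in percent.
    return 100.0 * fmean(abs(x - mu) / mu for x in series)


def m_per_af(voxel_series):
    """Global-mean-adjusted PerAF: each voxel's PerAF divided by the
    mean PerAF over all supplied voxels (assumed to be the brain mask)."""
    values = [per_af(s) for s in voxel_series]
    global_mean = fmean(values)
    return [v / global_mean for v in values]


# Hypothetical toy data: two voxels, four time points each.
voxel_a = [101, 99, 102, 98]
voxel_b = [102, 98, 104, 96]
print(per_af(voxel_a), m_per_af([voxel_a, voxel_b]))
```

Because every deviation is divided by the voxel's own mean, multiplying a time series by a constant leaves PerAF unchanged, which is what "scale-independent" refers to.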

| Clinical and risk factor assessments
Trained study physicians administered the baseline clinical assessments in the DELCODE study. These assessments followed a fixed order and were completed within a single day. Caregivers of participants with AD were allowed to help complete the questionnaires.
Clinical assessments included the Geriatric Depression Scale short form (GDS; 51), the Geriatric Anxiety Inventory short form (GAI-SF; 52), and the Big Five Inventory short form (BFI-10; 53,54). Scores on the five personality scales (each calculated as the mean of the two respective items) were included as five standardized predictors. The sum scores of the GDS and GAI-SF were each included as a standardized predictor.

| Assessment of confounding features
Chronological age was included as a standardized predictor (mean = 0, SD = 1). The acquisition site predictor used in the DELCODE study included 10 distinct sites across Germany, which were represented as dummy-coded predictors using 10 binary variables.
Gender was included as a dummy-coded predictor with two binary predictors.
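As a rough illustration of how such a confound matrix could be assembled, the sketch below z-scores age and dummy-codes site and gender into binary columns. All names (`z_score`, `dummy_code`, `base_model_features`), the site and gender labels, and the use of the population standard deviation are assumptions made for the example, not details taken from the study's scripts.

```python
from statistics import fmean, pstdev


def z_score(values):
    """Standardize to mean 0 and SD 1 (population SD is an assumption)."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def dummy_code(labels, categories):
    """One binary column per category, as in 10 columns for 10 sites."""
    return [[1 if lab == cat else 0 for cat in categories] for lab in labels]


def base_model_features(ages, sites, genders, site_levels,
                        gender_levels=("female", "male")):
    """Rows of [age_z, site columns..., gender columns...]."""
    age_z = z_score(ages)
    site_cols = dummy_code(sites, site_levels)
    gender_cols = dummy_code(genders, gender_levels)
    return [[a] + s + g for a, s, g in zip(age_z, site_cols, gender_cols)]


# Hypothetical example with 10 site labels and three participants.
site_levels = ["site%d" % i for i in range(10)]
rows = base_model_features(
    ages=[60, 70, 80],
    sites=["site0", "site3", "site9"],
    genders=["female", "male", "female"],
    site_levels=site_levels,
)
print(len(rows[0]))  # 1 age column + 10 site columns + 2 gender columns
```

With one column per category, each participant has exactly one 1 among the site columns and one among the gender columns.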

| Prediction of outcome from predictor variables and performance assessment
Predictor variables were combined into eight feature sets (Figure 1).
In this study, we use the terms "predictor(s)" and "feature(s)" interchangeably, as well as "group(s)" and "class(es)". Classification performance was quantified as decoding accuracy (DA), that is, the proportion of correctly classified participants across all groups, and class accuracy (CA), that is, the same proportion, separately for each group, each ranging between 0 and 1.
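The two accuracy measures can be made concrete with a small sketch. It assumes DA is the overall proportion of correctly classified participants and CA is the same proportion computed within each true group; the function names and the toy labels are hypothetical.

```python
def decoding_accuracy(y_true, y_pred):
    """Proportion of all participants whose group was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_accuracies(y_true, y_pred, classes):
    """Proportion of correct predictions within each true group."""
    result = {}
    for c in classes:
        members = [i for i, t in enumerate(y_true) if t == c]
        result[c] = sum(y_pred[i] == c for i in members) / len(members)
    return result


# Hypothetical balanced toy example with two participants per group.
groups = ["HC", "SCD", "aMCI", "AD"]
y_true = ["HC", "HC", "SCD", "SCD", "aMCI", "aMCI", "AD", "AD"]
y_pred = ["HC", "HC", "SCD", "aMCI", "SCD", "aMCI", "AD", "HC"]

da = decoding_accuracy(y_true, y_pred)
ca = class_accuracies(y_true, y_pred, groups)
print(da, ca)
```

When all groups have the same size, as in this toy example, DA equals the mean of the four CAs, and guessing among four groups gives an expected accuracy of 0.25.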
For each feature set, DA and CAs were tested for statistically significant differences from chance-level prediction, and pairwise comparisons of each feature set against the base model were performed. These comparisons used one-tailed paired t-tests, with each pair consisting of a subsample evaluated using both feature sets.
Bonferroni-Holm correction was applied for multiple testing. Additionally, a subsample-by-subsample correlation matrix of DAs across all permutations was computed and incorporated into a general linear model of the pairwise accuracy differences across all subsamples. All scripts used to perform the analyses are available at https://github.com/jmkizilirmak/DELCODE162.
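For readers unfamiliar with the Bonferroni-Holm procedure, the following is a minimal sketch of the standard step-down adjustment. It is a generic implementation, not the code from the study's repository, and the p-values in the example are made up for illustration.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, and results are
    capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values from three pairwise comparisons.
print(holm_adjust([0.01, 0.04, 0.03]))
```

A comparison is then considered significant if its adjusted p-value falls below the chosen alpha level.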

| Handling of missing values and unbalanced class sizes
Because CSF biomarkers were available for only a subset of participants, the "CSF" feature set was excluded from inferential comparisons to maintain statistical power. Supplementary information provides an alternative analysis with equal sample sizes (N = 311; Table S4) across all feature sets, as well as an analysis with SCD and aMCI groups merged into an "at-risk for AD" group (Table S2).
Subsampling was used to ensure equal numbers of participants in each group when performing SVC. 55 The size of each subsample was based on the smallest group (rounded off to the nearest 10). A total of 30 subsamples were created, and each subsample was subjected to 1000 permutations of group membership to establish a null distribution. Permutations were performed to calculate the p-value of the prediction accuracy.
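The subsampling and permutation logic can be sketched as follows. Several details are assumptions made for illustration: "rounded off" is read as rounding down to a multiple of 10, the permutation p-value uses the common (count + 1) / (n + 1) form, and the group sizes in the example are invented. The classifier itself is left out; `null_accuracies` stands for accuracies obtained after retraining on shuffled group labels.

```python
import random
from collections import defaultdict


def balanced_subsample(labels, rng):
    """Draw the same number of participant indices from every group."""
    by_group = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_group[lab].append(idx)
    smallest = min(len(members) for members in by_group.values())
    size = (smallest // 10) * 10  # assumption: round down to a multiple of 10
    if size == 0:
        raise ValueError("smallest group has fewer than 10 members")
    chosen = []
    for lab in sorted(by_group):
        chosen.extend(rng.sample(by_group[lab], size))
    return sorted(chosen)


def permuted_labels(labels, rng):
    """Shuffle group membership to break any feature-label association."""
    return rng.sample(labels, len(labels))


def permutation_p_value(observed, null_accuracies):
    """Share of null accuracies at least as large as the observed one."""
    exceed = sum(a >= observed for a in null_accuracies)
    return (exceed + 1) / (len(null_accuracies) + 1)


# Hypothetical group sizes; the real sizes are given in Table 1.
labels = ["HC"] * 57 + ["SCD"] * 43 + ["aMCI"] * 38 + ["AD"] * 25
rng = random.Random(0)
subsamples = [balanced_subsample(labels, rng) for _ in range(30)]
print(len(subsamples), len(subsamples[0]))
```

Each of the 30 subsamples would then be classified once with the true labels and 1000 times with permuted labels, and the resulting p-values averaged across subsamples.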

| RESULTS
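Classification results are reported in Table 2 and inferential statistical comparisons are reported in Table 3. DAs are visualized in Figure 2 and CAs in Figure 3. The four best performing feature sets sorted by decoding accuracy are depicted as a confusion matrix in Figure 4.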

| Base model: Low predictive value of combining age, gender, and site
The "base model" produced the lowest overall DA (DA = 0.345, p = 0.047) and no CA was significantly different from chance for any group (Figure 3).

| Personality trait and affective state scores: Highest prediction accuracies for HC and across groups
Feature set "Personality" was consistently outperformed by "Personality extended", which produced the overall highest DA (DA = 0.414, p = 0.001). The combination of geriatric depression and anxiety scores yielded the overall highest class accuracy for healthy controls (CA = 0.628, p = 0.003) and the overall third-highest DA (0.392, p = 0.003).

| Relatively poor performance of combined predictors without CSF biomarkers
Across all groups and in terms of DA, prediction accuracies of feature set "All w/o CSF" were consistently lower than those of "Personality" and "Personality extended" and it was not in the top three CAs for any participant group.

| DISCUSSION
In this cross-sectional study, we aimed to evaluate the diagnostic value of several feature sets for Alzheimer's disease, associated at-risk states (SCD, aMCI), and healthy controls using support vector machine classification. We focused on the performance of combining personality traits with scores of depression and anxiety, as well as examining the predictive ability of DMN BOLD amplitude fluctuation measured through resting-state fMRI, ApoE genotype, and CSF biomarkers. All feature sets demonstrated decoding accuracy significantly above chance (Table 2).
The highest decoding accuracy was observed in feature sets: (i) "Personality extended," which combined personality traits with anxiety and depression scores; (ii) "CSF", consisting of tTau, pTau181, and Aβ42/40 ratio; (iii) "ApoE," including the ApoE genotype; and (iv) "Depression, anxiety," comprising depression and anxiety scores.The only feature sets not achieving significant above-chance classification performance for HC were "Base model" and "CSF", with the latter showing the lowest overall accuracy for the aMCI group.

| Inferiority of the combined predictor and poor prediction accuracy of resting-state activity of the DMN
Our hypothesis that combining non-invasive predictors (feature set "All w/o CSF") would outperform CSF biomarkers in prediction accuracy was not supported by our data.The classification accuracies of the "All w/o CSF" feature set were comparably low and similar to the "mPerAF" feature set, suggesting that the inclusion of mPerAF paradoxically reduced classification performance.While DMN resting-state mPerAF performed above chance, its performance did not significantly differ from the "Base model".
Findings on the predictive ability of resting-state fMRI of the DMN for AD have been inconsistent. While certain studies have reported consistent alterations in DMN activity and connectivity in AD 39 and the added value of combining different MRI modalities to classify AD, 56 other research suggests that neuropsychiatric measures may have higher predictive ability. 57 It is important to note that most DMN studies have focused on functional connectivity rather than voxel-wise amplitude measures like mPerAF. The divergent results could be attributed to our approach of evaluating all groups simultaneously, resembling a fully automated diagnostic process, as opposed to making binary decisions between distinct groups. Furthermore, unequal sample sizes can introduce bias in classification, and various approaches have been proposed to address this issue. 58

| A combination of personality, anxiety, and depression scores yields a relatively high overall prediction accuracy
Personality alone demonstrated class accuracies significantly above chance for the groups of HC and AD, but not for SCD and aMCI, partially confirming our hypothesis. "Personality" was surpassed by the feature set "Personality extended". However, the accuracy of correctly classifying the aMCI group was equally high, while class accuracies for the SCD and aMCI groups remained nonsignificant, partially supporting our hypothesis. These results indicate that depression and anxiety contribute additional predictive value to the decoding accuracy of the BFI-10. The highest class accuracy for HC, however, was achieved by a feature set containing scores of depression and anxiety, and adding personality traits did not improve class accuracy. Previous studies have indicated that depressive episodes can be prodromal manifestations of neurodegeneration in AD. 32,33,59 Possibly, alterations in levels of depression within the SCD and aMCI groups surpass changes in personality traits when contrasted with shifts seen in healthy controls. The predictive ability of the feature set "Depression, anxiety" for HC may be primarily attributed to the GDS, as some of the GAI-SF items overlap with those of the BFI-10 neuroticism scale, suggesting that depression scores are well suited to distinguishing between healthy individuals and participants with cognitive impairment. AD participants were best classified using a combination of CSF biomarkers, consistent with previous findings. 9,60,61 The predictive value of combining CSF biomarkers, personality traits, and scores of depression and anxiety should be investigated further.

FIGURE 2 Decoding accuracies of the evaluated feature sets. The 90% confidence intervals were obtained by averaging the confidence intervals of the 30 subsamples (single dots) on which SVCs were performed.

| Poor classification accuracy for SCD and aMCI with any feature set
Predictions for participant groups with SCD or aMCI were mostly above chance level but not statistically significant (Table 2).This trend persisted after merging SCD and aMCI into an "at-risk for AD" group (Table S2) class accuracies for the groups of SCD, aMCI, and "at-risk for AD". 62-64

| Limitations
Our study has several limitations. CSF biomarkers were only measured in a portion of the sample, resulting in different sample sizes for feature sets and exclusion of the "CSF" feature set from inferential analysis. Anosognosia is known to be a common occurrence in the early stages of AD [65][66][67] and may also have confounded the assessments of the GDS, the GAI-SF, 68 and the BFI-10. 69 Additionally, caregiver influence on self-reports may have affected the accuracy of assessments in the aMCI and AD groups. Another important limitation relates to the demographics of the groups.
Despite being composed of confounding variables only, the "Base model" performed above chance. This can be attributed to the association between age and dementia risk. 70 On average, AD participants were older than HC or those with SCD (Table 1). However, because age was included in all feature sets, its predictive value was consistently accounted for. Finally, the cross-sectional design is a limitation, as it precludes the use of longitudinal data to track personality change and assess the validity of the markers over the natural progression of the participants. This underscores the need for future research to complement our findings with longitudinal data.


FIGURE 3 Class accuracies of the evaluated feature sets. The dotted line represents the chance level. Error bars represent the average 90% confidence interval across all 30 subsamples.

... features achieved consistently superior class accuracies for all assessed participant groups. The combination of depression and anxiety scores was most effective in classifying healthy controls, supporting previous findings that regard late-life depression as a prodrome of Alzheimer's disease, while CSF biomarkers were most effective in classifying participants with mild Alzheimer's disease. The highest overall prediction accuracies across all participant groups were achieved by a combination of personality traits with scores of depression and anxiety, closely followed by CSF biomarkers and the ApoE genotype. These findings indicate that a combination of CSF biomarkers, personality, depression and anxiety scores, and the ApoE genotype may have complementary value for classification of AD and associated at-risk states. Further investigation is needed, particularly regarding the predictive value of personality traits and associated affective states as low-cost and easily assessable screening tools. Moreover, our findings highlight the challenge of accurately classifying SCD and aMCI groups using machine learning approaches when the underlying conditions of these cognitive impairments are unknown. Addressing this challenge requires adhering to consensus on terminology and conceptual frameworks.

FIGURE 4 Confusion matrices of best performing feature sets by decoding accuracy.

WASCHKIES ET AL.

AFFILIATIONS
1 German Center for Neurodegenerative Diseases (DZNE), Göttingen, Germany
2 Department of Psychiatry and Psychotherapy, University Medical Center Göttingen, Göttingen, Germany

fMRI data preprocessing and analysis
Descriptive statistics of predictor variables.
TABLE 2 SVM classification results.
Because four groups were included, chance performance was at 0.25. Mean accuracy, 90% CI and mean p correspond to the average across 30 subsamples. The p-value of each subsample was obtained by comparing the accuracy value to the null distribution generated from 1000 permutations.