Predicting progression to Alzheimer's disease dementia using cognitive measures

It is important to determine if cognitive measures identified as being prognostic in dementia research cohorts also have utility in memory clinics. We aimed to identify measures with the greatest power to predict future Alzheimer's disease (AD) dementia in a clinical setting where expensive biomarkers are not widely available.

• We evaluated the prognostic utility of cognitive tests to predict incident AD dementia in a combined memory clinic sample of patients with subjective cognitive decline (SCD) or mild cognitive impairment (MCI).
• A statistical model including change in cognition (from the first to the second assessment) alongside baseline scores predicted AD more accurately than baseline scores alone.

| INTRODUCTION
Dementia is increasingly recognized as a worldwide healthcare challenge. Whilst there is still no cure for Alzheimer's disease (AD), the most common cause of dementia, two recent phase 3 trials of antiamyloid immunotherapies delayed clinical progression in patients with mild AD. 1,2 It is thus imperative to identify patients in the prodromal stages of AD dementia, who can be prioritized for further investigations with a view to initiating interventions. 3 The two most intensively studied at-risk states for sporadic AD dementia are subjective cognitive decline (SCD) 4 and mild cognitive impairment (MCI). 5 Briefly, SCD is defined as perceived cognitive decline without impairment on objective cognitive testing, whereas MCI is characterized by objective cognitive impairment in the absence of substantially impaired activities of daily living (ADL).
The relative risk of all-cause dementia in SCD versus age-matched healthy controls is 2.1, 6 while the equivalent risk for patients with MCI is considerably larger, at 15.9. 7 Biomarkers are increasingly used to help predict the clinical trajectory of patients with SCD and MCI. 8 However, many biomarkers (e.g., positron emission tomography or cerebrospinal fluid measures of beta-amyloid) are invasive, expensive and/or have limited availability outside of more developed countries. Moreover, while there is considerable interest in blood-based biomarkers of AD, these are not yet clinically validated or widely available. 9 Cognitive tests, by contrast, are already routinely used for the assessment of dementia, including at memory clinics with fewer resources.
Whilst the number of cognitive tests used for neuropsychological assessment varies between memory clinics, it is common for a range of measures to be administered. It would be advantageous to identify those measures with the greatest power for predicting which patients will progress to AD dementia, as this could contribute to a more efficient clinical service and reduce the burden on patients.
Additionally, when individuals present with SCD or MCI and do not convert to AD or another type of dementia shortly afterwards, it is unclear which patients should be offered further clinical follow-up. 10 Any additional indicators regarding who should be discharged would therefore be beneficial for clinical decision making. Moreover, the advent of disease-modifying therapies means that identifying individuals on the pathway to AD dementia is increasingly important.
This study made use of 18 years of clinical data from the Essex Memory Clinic, at which patients were assessed annually using a comprehensive battery of cognitive measures. The objectives of this study were to: identify the baseline cognitive measures with the greatest utility for predicting progression to AD dementia at the third visit (and beyond); and to establish the additional predictive power gained by incorporating data from the first follow-up after SCD/MCI diagnosis.

| Population selection
Data from the clinical database of the Essex Memory Clinic were used, spanning the period from 2002 to 2019. Only patients with a minimum of three assessments were eligible for inclusion in this study. This enabled the change between the first two assessments to be used to predict progression to AD dementia (hereafter, 'AD') at later visits, and its predictive accuracy to be compared with that of baseline measures alone. Had the outcome instead been measured at visit two, the association between change scores and AD could have been uninterpretable, because scores from that visit were also used for diagnosis (i.e., giving rise to dependency between the predictors and the outcome).

| Assessments
The available cognitive measures are described below:

| Mini-mental state examination
The mini-mental state examination (MMSE) is a brief screening test for cognitive impairment. 11 The MMSE includes items measuring the following: orientation to time/place; registration; attention and calculation; recall; and language (maximum score 30).

| Cambridge cognitive examination-revised
The Cambridge Cognitive Examination-Revised (CAMCOG-R) is a criterion test for dementia with an optimum cut-off score of 80 (maximum score 105). 12 Various subscales are available, but this study used the total score only.

| Wechsler memory scale logical memory test
The administration of the Wechsler Memory Scale Logical Memory Test (LMT) involves the assessor reading two short story paragraphs to the patient. 13 Immediate recall is assessed after the reading of each story, while delayed recall is assessed approximately 30 min later (scores were converted to age-adjusted percentiles for the present study).

| Trail-making test
The trail-making test (TMT) comprises two parts, each of which requires patients to link up 24 consecutive circles arranged on a page by drawing a line through them. 14 Part A (TMT-A) can be considered a test of simple visual attention and psychomotor speed, whereas part B (TMT-B) assesses executive task switching. For the present study, completion time in seconds was analyzed.

| Verbal fluency
Two types of verbal fluency were measured (category and letter); both measures evaluate executive control and verbal ability. For category fluency, individuals are required to generate as many animal words as possible within 1 min. 15 For letter fluency, individuals are required to generate as many words as possible beginning with the letters "F", "A" and "S" (with 1 min allocated per letter). 15 For both tests, the total score (number of unique eligible responses) was analyzed.

| Alzheimer's disease assessment scale-cognitive (naming objects and fingers)
The Naming Objects and Fingers subtest from the AD Assessment Scale-Cognitive (ADAS-Cog) was used for the assessment of semantic memory (maximum score 17). 16 Anxiety symptoms were assessed using the Rating Anxiety in Dementia (RAID) 17 scale or the anxiety subscale from the Hospital Anxiety and Depression Scale (HADS). 18 Depressive symptoms were measured using the 15-item Geriatric Depression Scale (GDS-15) 19 or the depression subscale from the HADS. 18 For patients assessed using the HADS, we used its subscales as measures of anxiety and depression, respectively. For patients who did not complete the HADS, we used the RAID score for anxiety symptoms and the GDS-15 score for depressive symptoms.
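The text does not say exactly how anxiety and depression scores from two different scales were put on a common footing (the Table 2 caption only notes that they were standardized). A minimal Python sketch, assuming a within-scale z-score computed across the cohort, could look like the following; the scale labels and the helpers `anxiety_measure` and `standardize_within_scale` are hypothetical.

```python
from statistics import mean, stdev


def anxiety_measure(hads_anxiety, raid):
    """Prefer the HADS anxiety subscale; fall back to RAID when HADS is missing.

    (A depression equivalent would fall back from the HADS depression
    subscale to the GDS-15 in the same way.)
    """
    if hads_anxiety is not None:
        return ("HADS-A", hads_anxiety)
    return ("RAID", raid)


def standardize_within_scale(records):
    """Map patient id -> z-score, computed separately within each scale.

    records: iterable of (patient_id, scale_label, raw_score) tuples.
    Assumes at least two patients per scale (stdev needs n >= 2).
    """
    by_scale = {}
    for pid, scale, score in records:
        by_scale.setdefault(scale, []).append((pid, score))
    z = {}
    for items in by_scale.values():
        scores = [s for _, s in items]
        mu, sd = mean(scores), stdev(scores)
        for pid, s in items:
            z[pid] = (s - mu) / sd
    return z


# Illustrative data only: two patients scored on the HADS, two on the RAID.
records = [("a", *anxiety_measure(4, None)), ("b", *anxiety_measure(8, None)),
           ("c", *anxiety_measure(None, 10)), ("d", *anxiety_measure(None, 20))]
z_scores = standardize_within_scale(records)
```

A z-score of 0 then means "cohort average on whichever scale the patient completed"; treating the two scales as interchangeable after standardization is itself an assumption.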
In addition, the data collected for each patient at visit one included age, sex, years of education, medical and psychiatric history, mental state examination, and physical examination (including full neurological examination). Both analyses used a variable selection method, the least absolute shrinkage and selection operator (Lasso), to produce a subset of the cognitive measures most strongly predictive of conversion to AD.
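The paper fits its Lasso models with the R package glmnet. Purely to illustrate what the L1 penalty does, here is a from-scratch Python sketch of Lasso-penalized logistic regression fitted by proximal gradient descent (a different algorithm from glmnet's coordinate descent). The function names, step size and iteration count are illustrative assumptions. Predictors whose coefficients are shrunk exactly to zero are "not selected".

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrink x toward zero by t, clipping at zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_logistic(X, y, lam, step=0.1, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient descent.

    Minimizes mean log-loss + lam * sum(|beta_j|); the intercept is unpenalized.
    X: list of rows of standardized predictors; y: list of 0/1 outcomes.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual: predicted probability minus observed outcome.
            r = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        b0 -= step * g0
        # Gradient step on the smooth loss, then soft-threshold (the L1 "prox").
        beta = [soft_threshold(b - step * gj, step * lam) for b, gj in zip(beta, g)]
    return b0, beta


def selected(beta, tol=1e-10):
    """Indices of predictors with non-zero coefficients."""
    return [j for j, b in enumerate(beta) if abs(b) > tol]
```

With a large `lam` every coefficient stays at zero; as `lam` decreases, the most informative predictors enter first, which is why the tuning parameter has to be chosen carefully.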

| Statistical modeling
Cross-validation was used to select the Lasso tuning parameter lambda: the value of lambda was chosen that maximized the area under the receiver operating characteristic (ROC) curve for the logistic regression models, and Harrell's C-index for the Cox proportional hazards models. 20 Both the Lasso logistic regression and proportional hazards models were fitted in two stages. The first included only demographics and cognitive measures from the first visit. The second additionally included change in the cognitive measures between the first and second visits. These change scores were derived by subtracting visit 1 scores from visit 2 scores.
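The tuning step can be sketched in the same hedged way. The paper does not report the number of folds or the lambda grid, so `k=5` and any candidate values are assumptions; in glmnet itself this would correspond to something like `cv.glmnet(..., type.measure = "auc")` for the logistic models (and `type.measure = "C"` for the Cox models). The sketch takes any `fit`/`predict` pair so that it stays self-contained, and stratifies the folds by outcome so that every held-out fold contains both cases and non-cases (provided each class has at least `k` patients) and the AUC is always defined.

```python
from statistics import mean


def auc(scores, labels):
    """Mann-Whitney estimate of the ROC AUC (ties count 1/2); None if one class is absent."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k):
    """Assign each patient a fold index, cycling separately through cases and non-cases."""
    counters = {0: 0, 1: 0}
    folds = []
    for label in y:
        folds.append(counters[label] % k)
        counters[label] += 1
    return folds


def select_lambda(X, y, lambdas, fit, predict, k=5):
    """Return (lambda, mean held-out AUC) for the lambda with the highest mean AUC."""
    folds = stratified_folds(y, k)
    best = (None, float("-inf"))
    for lam in lambdas:
        fold_aucs = []
        for f in range(k):
            train = [i for i, fi in enumerate(folds) if fi != f]
            test = [i for i, fi in enumerate(folds) if fi == f]
            model = fit([X[i] for i in train], [y[i] for i in train], lam)
            a = auc(predict(model, [X[i] for i in test]), [y[i] for i in test])
            if a is not None:
                fold_aucs.append(a)
        if fold_aucs and mean(fold_aucs) > best[1]:
            best = (lam, mean(fold_aucs))
    return best
```

In practice `fit` would wrap a Lasso fitter and `predict` would return predicted probabilities (or linear predictors) for the held-out patients.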
The Lasso method precludes the derivation of conventional confidence intervals or p-values. There is also limited value in interpreting the coefficients arising from a Lasso model, as these coefficients are shrunk toward zero and are therefore biased. Instead, emphasis is placed on which measures were selected, rather than on the size of the association between cognitive measures and the outcome. The predictive value of the selected logistic regression model was assessed using a ROC curve and the area under the curve (AUC). Harrell's C-index was used to assess the selected Cox model. 20 All analyses were carried out using the statistical software R. 21 The package "glmnet" was used for the Lasso. 22
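Harrell's C-index, used above to assess the Cox models, is the proportion of usable patient pairs in which the patient with the higher predicted risk is diagnosed earlier. A minimal Python sketch follows; it is a simplified version that skips pairs with tied follow-up times, the function name is illustrative, and established implementations exist (e.g., in R's survival package).

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index for right-censored time-to-event data.

    times: follow-up time per patient; events: 1 if AD was diagnosed, 0 if censored;
    risk: predicted risk (e.g., the Cox linear predictor), higher = worse prognosis.
    A pair is usable when the patient with the shorter time had the event;
    pairs with tied times are skipped in this simplified version.
    """
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable
```

A value of 0.5 corresponds to chance ordering and 1.0 to perfect ranking; for a binary outcome, the analogous concordance probability is the AUC.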

| Predictors: Cognitive measures
All available cognitive measures were included as candidate predictors in all models. For analyses, all visit 1 cognitive measures were scaled to have a mean of zero and a standard deviation (SD) of one. All change scores were also scaled using the SD of the scores from visit 1. For ease of interpretation, unscaled cognitive measures are displayed in the descriptive tables. Median (interquartile range) is shown for all visit 1 scores, as most showed evidence of skew; mean (SD) is shown for change scores, since these were not skewed.
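The scaling rules above can be made concrete with a short Python sketch. It standardizes visit 1 scores to mean 0 and SD 1 and divides change scores (visit 2 minus visit 1) by the visit 1 SD. It assumes the sample SD (as R's `scale()` uses) and, because the text does not say whether change scores were also centered, leaves them uncentered so that 0 still means "no change"; both choices are assumptions, and the function name is hypothetical.

```python
from statistics import mean, stdev


def scale_visit1_and_change(v1, v2):
    """Return (z_visit1, scaled_change) for one cognitive measure.

    v1, v2: scores for the same patients (same order) at visits 1 and 2.
    z_visit1      = (v1 - mean(v1)) / sd(v1)
    scaled_change = (v2 - v1) / sd(v1)   # visit 1 SD, not the SD of the changes
    """
    mu, sd = mean(v1), stdev(v1)
    z_visit1 = [(a - mu) / sd for a in v1]
    scaled_change = [(b - a) / sd for a, b in zip(v1, v2)]
    return z_visit1, scaled_change


z1, dz = scale_visit1_and_change([10, 12, 14], [11, 12, 12])
```

Scaling change by the visit 1 SD keeps a one-unit change comparable in size to a one-SD baseline difference across measures.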
We report completion times (in seconds) for TMT-A. Instead of completion times for TMT-B, we report the ratio of TMT-A to TMT-B completion times. This approach adjusts individuals' scores on TMT-B for their motor and visual scanning speed (indexed by TMT-A). It has been suggested that the ratio of TMT scores is a purer measure of executive functions than TMT-B alone. 23 In this study, the ratio was derived by dividing the completion time for TMT-A by the time for TMT-B. Thus, in keeping with the other cognitive measures, lower scores indicate worse cognitive function (because these represent a relative increase in the time taken to complete TMT-B vs. TMT-A). For patients whose TMT-B was discontinued by the clinician (due to making ≥2 errors), the ratio was set to zero.
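A minimal sketch of the TMT-A/B ratio as described, including the rule that a discontinued TMT-B is scored zero (the worst possible value, since the ratio is otherwise positive). The function name is hypothetical.

```python
def tmt_ratio(tmt_a_seconds, tmt_b_seconds, b_discontinued=False):
    """TMT-A completion time divided by TMT-B completion time.

    Lower values mean part B took relatively longer than part A (worse executive
    function). A discontinued TMT-B is scored 0, as described in the text.
    """
    if b_discontinued:
        return 0.0
    return tmt_a_seconds / tmt_b_seconds
```

For instance, 40 s on TMT-A and 100 s on TMT-B gives a ratio of 0.4, whereas the same TMT-A time with 160 s on TMT-B gives 0.25.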

| Predictors: Demographics
A limited set of demographic variables was included in the models due to the modest sample size: age at first visit, year of first visit and family history of AD (defined as a first- or second-degree relative with a diagnosis of AD, coded as a binary variable). Year of first visit was included to control for possible secular trends, given the relatively long study period.

| Population
Table 1 summarizes the demographics of the patients included in this study. A total of 68 (21%) of 328 individuals were diagnosed with AD at some point during follow-up after the second visit; of these, 30 (44%) were diagnosed at the third visit. The median number of assessments (including visit 1) was 4. Most patients who were diagnosed with AD during follow-up initially presented with MCI rather than SCD (97% of those who progressed to AD had MCI at baseline).
The unstandardized visit 1 (baseline) scores and change scores between the first and second visits are shown in Table 2. The average interval between successive visits was approximately one and a quarter years. Mean visit 1 scores were worse overall for those who were (vs. were not) diagnosed with AD during follow-up. The greatest standardized mean differences were for the TMT-A, LMT delayed recall and MMSE, each of which showed a difference of ≥0.4 SD between those who did and did not convert to AD.
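The text does not state which SD the standardized mean differences use. A common choice, assumed in this Python sketch, is Cohen's d with a pooled sample SD; the ≥0.4 SD criterion then becomes a check on the absolute value of d. Function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Cohen's d: (mean_a - mean_b) / pooled sample SD (assumed definition)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def large_difference(group_a, group_b, threshold=0.4):
    """True when the groups differ by at least `threshold` pooled SDs."""
    return abs(standardized_mean_difference(group_a, group_b)) >= threshold
```

Because d is expressed in SD units, it allows differences on measures with very different raw scales (e.g., MMSE points vs. TMT-A seconds) to be compared directly.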
The change scores for the never-AD group were either close to zero or indicative of improvement for most of the cognitive measures. This most likely reflects neuropsychological retest (practice) effects. 24 In contrast, unstandardized change scores for those who converted to AD were mostly indicative of deterioration. This pattern was observed for the CAMCOG-R (−1.10 for the AD group, 0.28 for the non-AD group) and LMT immediate recall (−0.24 AD, 6.17 non-AD).
Anxiety and depression appeared to improve for both groups between visits one and two. The reduction in scores was greatest for the AD group, with mean change scores of −0.10 for anxiety and −0.20 for depression (compared with −0.07 and −0.09 in the non-AD group).

| Statistical models
The results of the Lasso logistic regression of diagnosis at the third visit and the Lasso Cox proportional hazards regression of time to diagnosis of AD are shown in Table 3. For both analyses, "model 1" included baseline demographics (age at first visit, year of first visit and family history of AD) and measures from the first visit, while "model 2" additionally included change scores between visits one and two. A brief note on terminology: the statistical measures derived from these approaches differ, in that logistic regression yields odds ratios, whilst Cox models yield hazard ratios. The key difference between these measures of association is that hazard ratios take the time "at risk" of AD into account, while odds ratios do not (although, in this study, the odds ratios reflected diagnosis of AD at a relatively fixed timepoint, that is, visit 3).
According to the ROC curves for the Lasso logistic regressions (Figure 2), if the aim was to correctly identify over 90% of cases, both models would incur a high false positive rate (FPR) of over 30%. Similarly, if the aim was to minimize the FPR, to under 10% for example, both models would only identify around 30% of cases.
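For readers less familiar with the two effect measures, the standard model forms are given below; these are textbook definitions, not equations reported in the paper. Here x is a patient's vector of (scaled) predictors, p the probability of an AD diagnosis at visit 3, and h0(t) an unspecified baseline hazard. In the Lasso versions, the same L1 penalty is added to each model's fitting criterion.

```latex
% Logistic regression (diagnosis of AD at visit 3)
\log\frac{p}{1-p} = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x},
\qquad \mathrm{OR}_j = e^{\beta_j}

% Cox proportional hazards (time to AD diagnosis)
h(t \mid \mathbf{x}) = h_0(t)\, e^{\boldsymbol{\beta}^{\top}\mathbf{x}},
\qquad \mathrm{HR}_j = e^{\beta_j}

% Lasso: the negative (partial) log-likelihood is penalized by
\lambda \sum_{j} \lvert \beta_j \rvert
```

Because predictors were scaled by their visit 1 SD, each OR or HR would be per 1 SD; because the penalty shrinks coefficients toward zero, the Lasso estimates are biased, which is why the emphasis is on which measures are selected.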

In unplanned sensitivity analyses, patients with a history of stroke (n = 23) were excluded and the analyses repeated, to establish whether the same results would be obtained in a more homogeneous sample. Excluding these patients resulted in one fewer patient progressing to AD. The re-analysis yielded identical results for the logistic regression models (see Table 3). However, in the Cox model using only visit 1 scores, only a single cognitive test was selected (TMT-A). In contrast, the Cox model using visit 1 and 2 scores was identical to that in the main analyses, save for the omission of baseline MMSE score.

| DISCUSSION
This study set out to identify the baseline cognitive measures with the greatest utility for predicting progression from SCD/MCI to AD dementia, and to establish the additional predictive power gained by incorporating scores from the first follow-up after SCD/MCI diagnosis.
TABLE 2 Visit 1 and visit 1 to 2 change scores, unstandardized for ease of interpretation. Anxiety and depression scores were standardized, given these were each measured using two different scales.

Lower scores on the LMT (delayed recall) at baseline were consistently associated with conversion to AD at the third visit and beyond. Furthermore, deterioration in LMT delayed recall scores was associated with progression to AD. Given the prognostic value of LMT delayed recall in the present study, memory clinics are encouraged to consider including this measure in their cognitive batteries. Nevertheless, the LMT is a commercial product and may thus not be a financially viable measure in all clinical settings. Unfortunately, we are not aware of a free alternative.
We also showed that the TMT-A at the first visit was a useful predictor of future conversion to AD.The widely used CAMCOG-R and MMSE were also good predictors of conversion.CAMCOG-R was predictive both at baseline and over 1 year of follow-up.
Notably, a second visit to the memory clinic after diagnosis with SCD/MCI increased the predictive power of the models as measured by the AUC and Harrell's C-index.
The finding that including change scores improved the predictive accuracy of models suggests that longitudinal cognitive data in patients with SCD/MCI captures prognostically useful information "over and above" initial scores alone.For example, having a high starting score for a given test may be less benign in individuals who subsequently show a decline; similarly, a low starting score may be less predictive if this score remains stable at follow-up.In this study, the cognitive scores of the "never AD" patients tended to improve between visit 1 and 2, whereas the "ever AD" group tended to decline.Whilst the former finding likely reflects practice effects, which can be considered a nuisance variable, a recent review found that practice effects are prognostically informative. 24That is, individuals showing practice effects at follow-up had a better prognosis versus those who did not.The likely existence of practice effects in the current study may thus have contributed to the improved prediction when including change scores.
The increased predictive accuracy of an additional visit to the clinic must be weighed against the increased cost and burden on patients.
The model that only included baseline scores performed reasonably well as measured by the AUC and C-index. As discussed in the results section, the optimal strategy may depend on whether the aim is to identify the most cases, keep the FPR low, or strike a balance between the two. In the latter scenario, two visits appear most informative according to the data available for this study.
In sensitivity analyses, patients with SCD/MCI and a history of stroke were excluded and the models re-estimated. Only one fewer patient progressed to AD than in the main analyses (likely because patients with a history of stroke were more likely to be diagnosed with vascular dementia). The inferences from the sensitivity models were largely in keeping with the main analyses, save for the Cox model using visit 1 scores only, which included only TMT-A (perhaps reflecting that patients with prodromal AD had a different baseline cognitive profile from those with a history of stroke). This caveat notwithstanding, the results were largely robust to the inclusion or exclusion of patients with a history of stroke.
The meta-analysis by Belleville et al. did, however, find an association between poorer TMT-B performance and progression from MCI to AD. 25 In the present study, the ratio of TMT-A/B scores (rather than TMT-B) was evaluated as a predictor of decline, as this may be a purer measure of executive function, 23 but no association was observed between ratio scores and AD. Also in contrast to the present study, the meta-analysis found that tests of category fluency showed relatively high accuracy for predicting progression. 25 One explanation for these divergent findings is that, when the review authors excluded studies with a high risk of bias, the sensitivity of category fluency tests fell from 0.71 to 0.60, implying that the association between this measure and progression may not be reliable. Consistent with the present results, a study of SCD prognosis using multiple cohorts found that a lower MMSE score predicted progression to dementia. 26 Fewer studies have investigated the utility of change in cognitive scores for the prediction of future AD in patients with SCD/MCI.
Of the studies that have investigated change scores, most evaluated these in combination with biomarkers.
The Lasso method was selected to choose a subset of variables most predictive of conversion to AD. There are known issues with the interpretability of the coefficients produced by such models.
We did not estimate the size of the associations between each cognitive measure and conversion to AD.We also did not aim to produce a clinical prediction tool.Instead, the aim was to provide a list of cognitive measures which may be most useful in clinical settings to identify patients at the highest risk of decline who might then be prioritized for follow-up.Whilst our use of the Lasso precludes direct comparisons with other studies, or direct translation of the findings to clinical practice, the measures presently found to be prognostic (or analogs thereof) are used in both research and clinical settings throughout the world.Nevertheless, in considering the implications of our findings for other memory clinics, two factors appear particularly relevant: language and cost.
Whilst the MMSE and LMT are available in a wide range of languages, 29,30 efforts to translate the CAMCOG-R have primarily focused on European languages. 31 Moreover, TMT-B requires familiarity with the English alphabet sequence, putting speakers of other languages at a disadvantage. Nevertheless, culturally fair alternatives exist, including the Color Trails Test 32 and the Shape Trail Test. 33 It is also important to note that assessment materials for both the MMSE and LMT must be purchased from their respective publishers; this may not be financially viable in all settings.

| Strengths
The longitudinal dataset used for this study was collected over 18 years from an unselected, local population.These patients were from a clinical non-academic setting, serving both rural and a wide range of urban populations (including affluent suburbia, market towns and deprived inner-city areas), thus reflecting a wide range of socio-economic circumstances.
A broad range of cognitive measures was collected at each assessment. Clinical diagnosis was made by a consensus of three senior clinicians, which is more accurate than clinical diagnosis made by a single clinician. 36 We undertook analyses of both conversion to AD at

Figure 1
Figure 1 displays the data selection flowchart. Patients had to fulfill diagnostic criteria for SCD 4 or MCI 5 at visit 1. Assessments made before a diagnosis of SCD or MCI were excluded, so "visit 1" refers to the first assessment at which a diagnosis of SCD or MCI was made. Dementia diagnoses (including AD) were made according to ICD-10 criteria by a consensus of three senior clinicians. As part of the routine diagnostic workup, structural magnetic resonance imaging scans were performed within 6 months of the initial visit in the majority of cases and were available at the time of consensus diagnosis. Patients with a diagnosis of Parkinson's disease at any time were excluded. Patients who progressed to non-AD dementia were removed from the data at the time of diagnosis.
Two analyses were carried out: the first sought to predict conversion to AD at the third visit only, using logistic regression. The second model analyzed time to conversion to AD at any visit after the second assessment using a Cox proportional hazards model.
FIGURE 1 Data selection flowchart. AD, Alzheimer's disease; MCI, mild cognitive impairment; SCD, subjective cognitive decline.
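The selection rules and the two outcome definitions described in the two paragraphs above can be summarized as a short Python sketch. The diagnosis labels ("SCD", "MCI", "AD", "other_dementia"), the `Patient` structure and the use of visit index rather than calendar time are all hypothetical simplifications; the paper does not describe its data structures, and the sketch does not handle cases such as an AD diagnosis at visit 2, which the text does not discuss.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    diagnoses: list            # one consensus diagnosis label per assessment, in order
    parkinsons: bool = False   # Parkinson's disease diagnosed at any time


def analysable_visits(patient):
    """Apply the inclusion rules; return the retained visits, or None if excluded."""
    if patient.parkinsons:
        return None
    # Visit 1 is the first assessment with an SCD or MCI diagnosis.
    start = next((i for i, d in enumerate(patient.diagnoses) if d in ("SCD", "MCI")), None)
    if start is None:
        return None
    visits = patient.diagnoses[start:]
    # Progression to non-AD dementia: drop data from the time of that diagnosis.
    if "other_dementia" in visits:
        visits = visits[: visits.index("other_dementia")]
    return visits if len(visits) >= 3 else None


def outcomes(visits):
    """Return (AD at visit 3, AD event observed, 0-based index of event or censoring)."""
    ad_at_visit3 = visits[2] == "AD"
    for i in range(2, len(visits)):
        if visits[i] == "AD":
            return ad_at_visit3, True, i
    return ad_at_visit3, False, len(visits) - 1
```

The first element of `outcomes` corresponds to the logistic regression outcome and the remaining two to the Cox model's event indicator and (visit-indexed) follow-up time.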
Change scores from visit 1 to visit 2 also differed between groups for LMT delayed recall (−2.26 AD, 7.39 non-AD), category fluency (−0.40 AD, 0.10 non-AD) and the TMT-A/B ratio (−0.10 AD, −0.03 non-AD; the larger decrease in the AD group indicates a greater relative lengthening of TMT-B versus TMT-A completion times). Again, there was a standardized mean difference of over 0.4 for LMT delayed recall.
Considering the results from the models which only included demographics and visit 1 scores, lower MMSE and LMT delayed recall scores and longer TMT-A time were associated with higher odds or hazard of AD diagnosis. In addition, in the Cox model only, lower CAMCOG-R score was associated with a greater hazard of AD. Turning to the models which also included change scores, worsening CAMCOG-R and LMT (delayed recall) scores were independently associated with both the odds of AD at visit 3 and the hazard of AD diagnosis from visit 3 onwards. Using observations from two visits increased the predictive power of the models: the AUC for the logistic regression model including only visit 1 scores was 0.81, compared with 0.87 for the model including two visits. The predictive power of the Cox model for time to AD diagnosis was also improved by the addition of change scores: the C-index was 0.76 for model 1 and 0.81 for model 2. The ROC curves for the Lasso logistic regressions are shown in Figure 2.

These indicate that the difference between the two models is larger around a false positive rate (FPR) of 0.2, and smaller below an FPR of 0.1 or above 0.3. The model including change scores could correctly identify approximately 80% of cases of AD at the third visit, while misclassifying approximately 20% of the non-AD cases. The model including visit 1 only could correctly identify approximately 50%-60% of cases at a similar FPR.
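The trade-offs described above (sensitivity at a fixed FPR, or the FPR needed to reach a target sensitivity) can be read off an empirical ROC curve. A minimal Python sketch with illustrative function names; any scores passed to it would be model-predicted probabilities, and nothing here reproduces the paper's data.

```python
def roc_points(scores, labels):
    """Empirical ROC curve as (FPR, TPR) points, one per distinct score threshold.

    A patient is called positive when score >= threshold; labels are 1 for AD, 0 otherwise.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def max_tpr_at_fpr(points, max_fpr):
    """Highest sensitivity achievable while keeping the FPR at or below max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


def min_fpr_at_tpr(points, min_tpr):
    """Lowest FPR at which sensitivity reaches at least min_tpr."""
    return min(fpr for fpr, tpr in points if tpr >= min_tpr)
```

For example, `max_tpr_at_fpr(points, 0.1)` answers "how many cases can we find while keeping false positives under 10%?", and `min_fpr_at_tpr(points, 0.9)` answers "what FPR must we accept to find 90% of cases?".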

Note:
Abbreviations: AD, Alzheimer's Disease; CAMCOG-R, Cambridge Cognitive Examination-Revised; LMT, Wechsler Memory Scale Logical Memory Test; MMSE, Mini-Mental State Examination; TMT, Trail-Making Test. a For TMT-A, higher scores indicate worse cognition; for all other measures a higher score indicates better cognition.
Whilst free and widely translated alternatives to the MMSE exist (e.g., Addenbrooke's Cognitive Examination III, 34 Montreal Cognitive Assessment 35), we are not aware of such alternatives to the LMT. Nevertheless, measuring cognitive changes within individual patients remains a promising strategy to improve the prediction of AD.

TABLE 3 Results of Lasso logistic regressions for conversion to AD at the third visit and Cox proportional hazards models of time to conversion to AD after the second visit. Columns: logistic regression with Lasso (Model 1: visit 1 only; Model 2: visits 1 and 2) and Cox proportional hazards with Lasso (Model 1: visit 1 only; Model 2: visits 1 and 2).
Our results were in keeping with previous studies from research cohorts showing that tests of specific cognitive functions have utility for predicting incident AD dementia. A meta-analysis by Belleville et al. 25 concluded that baseline scores from verbal memory measures (e.g., LMT) and broad cognitive batteries (e.g., CAMCOG-R, MMSE) had high accuracy for predicting progression from MCI to AD. It is difficult to directly compare the present results for the TMT with those of the meta-analysis. Due to an insufficient number of studies, Belleville et al. did not evaluate TMT-A, which here was selected as a predictor of progression to AD.
Another study 28 evaluated the prognostic value of plasma neurofilament light (pNfL) and MMSE to predict incident AD in ADNI patients with MCI. The addition of MMSE change scores to a model including baseline pNfL and MMSE significantly improved the prediction of AD. 28 Whilst, in the present study, MMSE change scores were not predictive of incident AD, change scores on a more comprehensive measure of global cognition (CAMCOG-R) were.
Gomar et al. 27 examined change scores for cognitive measures, ADLs, and CSF/volumetric biomarkers, and selected the best combination of predictors of future AD. Change scores for TMT-B and ADLs were the only metrics to predict AD in a multivariable model. 27