Genetic risk for Alzheimer disease is distinct from genetic risk for amyloid deposition

Objective: Alzheimer disease (AD) is the most common form of dementia and is responsible for a huge and growing health care burden in the developed and developing world. The polygenic risk score (PRS) approach has shown 75 to 84% prediction accuracy in identifying individuals at risk of AD. Methods: In this study, we tested the prediction accuracy of AD, mild cognitive impairment (MCI), and amyloid deposition risks with PRS, including and excluding APOE genotypes, in a large publicly available dataset with extensive phenotypic data, the Alzheimer's Disease Neuroimaging Initiative cohort. Among MCI individuals with amyloid‐positive status, we examined PRS prediction accuracy in those who converted to AD. In addition, we divided the polygenic risk score by biological pathways and tested them independently for distinguishing between AD, MCI, and amyloid deposition. Results: We found that AD and MCI are predicted by both APOE genotype and PRS (area under the curve [AUC] = 82% and 68%, respectively). Amyloid deposition is predicted by APOE only (AUC = 79%). Further progression to AD of individuals with MCI and amyloid‐positive status is predicted by PRS over and above APOE (AUC = 67%). In pathway‐specific PRS analyses, the protein–lipid complex has the strongest association with AD and amyloid deposition even when genes in the APOE region were removed (p = 0.0055 and p = 0.0079, respectively). Interpretation: The results showed a different pattern of APOE contribution in PRS risk predictions of AD/MCI and amyloid deposition. Our study suggests that APOE mostly contributes to amyloid accumulation and the PRS affects risk of further conversion to AD. ANN NEUROL 2019;86:427–435

Alzheimer disease (AD) is the most common form of dementia in elderly people and is a major health problem worldwide. 1 The clinical diagnosis is typically characterized by progressive loss of memory and cognitive function. In the past decade, numerous relevant susceptibility loci, genes, and pathways have been identified 2-6 that have improved the understanding of this complex disease. However, the risk for developing AD involves multiple genetic and environmental components, with the APOE genotype 7 having the strongest genetic effect. 2 Amyloid-beta (Aβ) plays a key role in the pathogenesis of AD, but little is known about the process of its formation in the brain. Identification of the earliest pathological signature of AD requires longitudinal measurements of Aβ deposition in the brain by positron emission tomography (PET) or by measurements of Aβ reduction in cerebrospinal fluid (CSF). Although Aβ is necessary for the pathologic diagnosis of AD, it is not sufficient in itself to cause cognitive dysfunction and clinical AD. It has been shown that amyloid deposition has low specificity for predicting development of AD. 8,9 The preclinical stage of AD starts with mild impairment in cognitive domains (MCI) and includes a syndrome featuring relatively isolated memory deficits. 10 In 2011, the National Institute on Aging and Alzheimer's Association created separate sets of diagnostic guidelines for the symptomatic or "clinical" stages of AD, 11,12 where AD represents the "disease" and "dementia" represents the clinical syndrome. Thus, a person may progress from MCI to dementia (due to AD), but both MCI and dementia cases may or may not be AD.
Studying individuals who develop MCI and then further progress to AD requires detailed longitudinal datasets. The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a multicenter study designed to assess the utility of various biomarkers for detecting early changes associated with MCI and AD. It includes collection of neuroimaging data, clinical and cognitive assessments, and information on demographics and individual genetic profiles.
The polygenic risk score (PRS) approach aggregates the effects of multiple genetic markers identified through genome-wide association studies (GWASs) 2 and has shown great potential in identifying an individual's risk of developing AD. 13,14 A few studies have recently used AD PRS to predict mild cognitive functions and clinical MCI 15 ; however, only one has suggested that PRS could identify MCI in middle aged adults 16 more effectively than the APOE locus alone. The PRS approach has also been applied to biological pathways related to AD but was not more predictive than APOE alone. 17 The implementation of polygenic hazard score (PHS; closely related to PRS 18 ) analysis in the ADNI data showed that PHS is associated with AD biomarkers (CSF and PET) in individuals without AD, 19 and that higher PHS was associated with greater rates of cognitive and clinical decline, even after controlling for APOE status 20 ; however, its predictive value was not quantified.
In this study, we estimate the predictive accuracy of PRS differentiating (1) AD cases versus controls, (2) MCI cases versus controls, and (3) amyloid-positive versus amyloid-negative individuals. We also investigate whether (4) the AD PRS can predict individuals with MCI who will progress to AD and those who will remain with MCI, with positive amyloid deposition.
Recently, GWASs and exome/genome sequencing have implicated, with varying degrees of confidence, lipid metabolism, the innate immune system, and endosomal vesicle recycling in late onset AD pathogenesis. 21,22 Therefore, we also examined the pathway-specific PRS association using these recently identified pathways 6 related to AD risk.

ADNI: Setting/Clinical Description
Data used in the preparation of this article were obtained from ADNI, a publicly available database (https://adni.loni.usc.edu). The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. The data were collected for about 900 individuals between ages 55 and 90 years. The initial 5-year study (ADNI1) followed participants for 2 to 3 years, with repeated imaging scans and psychometric measurements every 6 or 12 months. All ADNI participants provided written informed consent. The ADNI project was extended as the ADNI-GO and ADNI2 studies, with a proportion of new and original ADNI1 participants.
Clinical diagnosis and genetic information were available for 770 individuals from the ADNI1, ADNI-GO, and ADNI2 studies. Longitudinal data contained information about clinical assessments from the first visit (baseline) to the latest available visit, with mean follow-up time of approximately 5 years. Details of the ADNI design, participant recruitment, clinical testing, and additional methods have been previously reported elsewhere. 23,24 Table 1 shows the classification of diagnosis and number of individuals whose diagnosis remained stable during the study. It also presents the diagnostic categories and the numbers of individuals within those diagnostic categories at the latest assessment, which were used for the analyses.
To assess amyloid deposition, the latest MRI PET scans from 663 participants were used in the analysis (AV45 ligand threshold of 1.11). In this study, we used the individuals' diagnosis at the latest point of assessment. We then tested whether AD PRSs were associated with AD, MCI, and amyloid status in 3 main analyses: (1) AD versus controls, (2) MCI versus controls, and (3) amyloid-positive versus amyloid-negative status (Table 2).

ADNI: Genotyping and Quality Control
A total of 770 samples from the ADNI1/GO/2 set were whole-genome sequenced (WGS) and genotyped using the Illumina (San Diego, CA) Omni 2.5M BeadChip (42,732,452 variants). WGS calls were made using the Broad Institute best practices (BWA & GATK HaplotypeCaller).
Basic quality control checks were performed using standard procedure. 25 Single nucleotide polymorphisms (SNPs) were excluded where genotype missingness was >0.02, Hardy-Weinberg equilibrium p value was <1e-6, and SNP minor allele frequency was <0.01. This retained 7,808,548 SNPs for the analyses. Matching those SNPs with the latest publicly available GWAS AD summary statistics 2 reduced that number to 5,771,686.
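The three exclusion criteria above can be sketched as a per-SNP filter. This is a minimal illustration in Python, not the pipeline used in the study: the function name `passes_qc`, the genotype coding (allele counts 0/1/2, with `None` for a missing call), and the assumption that the Hardy-Weinberg p value has already been computed by another tool are all hypothetical.

```python
def passes_qc(genotypes, hwe_p, max_missing=0.02, min_hwe_p=1e-6, min_maf=0.01):
    """Return True if a SNP survives the three filters described in the text.

    genotypes: allele counts (0, 1, 2) per individual; None marks a missing call.
    hwe_p: Hardy-Weinberg equilibrium p value, assumed to be precomputed elsewhere.
    """
    n = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if n == 0 or not called:
        return False
    # Fraction of individuals with no genotype call at this SNP.
    missingness = 1 - len(called) / n
    # Frequency of the counted allele; the minor allele is the rarer of the two.
    freq = sum(called) / (2 * len(called))
    maf = min(freq, 1 - freq)
    # A SNP is excluded if missingness > 0.02, HWE p < 1e-6, or MAF < 0.01.
    return missingness <= max_missing and hwe_p >= min_hwe_p and maf >= min_maf
```

In practice such filters are usually applied with dedicated genetics software; the sketch only makes the direction of each inequality explicit.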

Generating PRS
Generation of PRS requires 2 independent datasets: summary statistics of association with AD in a discovery sample; and a test sample, which is independent of the discovery sample and contains genotypes for each individual. 26 As the discovery sample, we used summary statistics from the powerful GWAS (17,008 AD cases and 37,154 controls) of the International Genomics of Alzheimer's Project (IGAP Stage 1). 2 PRSs were generated using SNPs with AD association p ≤ 0.5 in the IGAP dataset, as it has been reported as having the best prediction accuracy. 13 The SNPs were then linkage disequilibrium (LD) pruned (r² = 0.1 and 1,000kb window), keeping the SNPs most associated with AD. The number of SNPs after the LD pruning was 162,957. We included APOE ε2 and ε4 allele genotypes directly into the PRS with effect sizes B = −1.04 and B = 1.55 for ε2 and ε4, respectively, calculated in the ADNI data, while excluding the APOE region (chromosome 19:44,400-19:46,500kb). 13 Prior to all analyses, the PRSs were adjusted for the 8 principal components and then standardized. A total of 441 ADNI participants were part of original IGAP summary statistics. 2 To overcome a potential bias in PRS analysis due to overlapping samples, we used a simulation approach we previously described. 14 In brief, first we assessed the variation in the SNPs' effect sizes using 1,000 simulations when randomly excluding 266 cases and 173 controls (matching the numbers of overlapping samples). The variation in the IGAP effect sizes due to the overlap was estimated in terms of standard deviation (SD_IGAP = 0.053) from the mean (ie, the original IGAP SNP beta-coefficient [Beta_IGAP]). Then, new IGAP genome-wide summary statistics were simulated 10,000 times with adjusted effect sizes (Beta_adjusted) and p values for each SNP. Beta_adjusted was sampled from a normal distribution with mean = Beta_IGAP and SD = 0.053 × SE_IGAP; p values adjusted were

Genome-wide and Pathway-Specific PRS Predictions
Initially, we tested whether PRSs are associated with AD risk (AD cases vs controls) in the ADNI dataset. Then we assessed whether the AD PRS can distinguish individuals with MCI from cognitively normal controls and amyloid-positive from amyloid-negative individuals. Finally, we assessed whether PRS can predict AD risk over and above APOE in MCI individuals who have had positive amyloid deposition (to be precise, the MCI individuals who converted to AD between the baseline and final time of assessment vs nonconverters). All analyses were performed using logistic regression models with the following predictors: (1) APOE alone, (2) PRS without APOE, and (3) full PRS model (predictors 1 and 2 together). Gender and age were used as covariates in all analyses. We tested whether the PRS significantly improves the model fit over and above APOE alone with the anova() function in R. We report the accuracy of the models in terms of AUC. In addition, we calculated PRS prediction accuracies (AUCs) in the extremes of the PRS distribution, for individuals whose PRS was more than 1.5 SD above or below the PRS mean (±1.5 SD).
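The model comparison described above can be illustrated with a small self-contained sketch. It is written in Python rather than R, and it assumes a likelihood-ratio (chi-square) comparison of two nested logistic models differing by one predictor; the text does not state which test anova() was asked to perform, so this choice is an assumption. The function names `solve`, `fit_logistic`, `lrt_p_value_1df`, and `auc` are hypothetical, and covariates such as age and gender would simply be added as extra columns.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, iters=25):
    """Fit logistic regression by Newton-Raphson; returns (coefficients, log-likelihood).

    X: list of predictor rows (an intercept column is added here); y: 0/1 labels.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for r, yi in zip(rows, y):
            p = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    info[i][j] += p * (1 - p) * r[i] * r[j]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    loglik = 0.0
    for r, yi in zip(rows, y):
        z = sum(b * v for b, v in zip(beta, r))
        loglik += yi * z - math.log1p(math.exp(z))
    return beta, loglik

def lrt_p_value_1df(loglik_reduced, loglik_full):
    """Likelihood-ratio test p value when the full model adds exactly one predictor."""
    stat = max(2 * (loglik_full - loglik_reduced), 0.0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))

def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Fitting `fit_logistic` once with the APOE column alone and once with APOE plus PRS, then passing the two log-likelihoods to `lrt_p_value_1df`, mirrors the "over and above APOE" comparison; `auc` applied to the fitted probabilities gives the reported accuracy measure.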
For the pathway-specific analyses, we chose the latest published 9 pathways that have been reported as playing a role in AD pathogenesis, namely (1) protein-lipid complex assembly, (2) regulation of beta-amyloid formation, (3) protein-lipid complex, (4) regulation of amyloid precursor protein catabolic process, (5) reverse cholesterol transport, (6) protein-lipid complex subunit organization, (7) plasma lipoprotein particle assembly, (8) tau protein binding, and (9) activation of immune response. 6 Finally, to quantify the proportion of variance that remains unexplained by the pathways together, we calculated and tested PRS for the whole genome excluding these 9 pathways.
Pathway-specific PRSs were generated in the ADNI dataset for each individual as described above with and without the APOE region. The PRSs in this case were adjusted not only for 8 principal components but also for age and gender and then standardized.
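The score construction that the pathway-specific analysis reuses on a restricted SNP set can be sketched as follows. This is a simplified illustration in Python under stated assumptions: the names `raw_prs` and `standardize`, the dictionary-based inputs, and the optional `snps` argument for restricting to a pathway are hypothetical, and the adjustment for principal components, age, and gender (a regression step) is deliberately omitted. Only the two APOE effect sizes are taken from the text.

```python
import statistics

APOE_E2_BETA = -1.04  # effect size for APOE ε2 quoted in the text
APOE_E4_BETA = 1.55   # effect size for APOE ε4 quoted in the text

def raw_prs(dosages, betas, snps=None, n_e2=0, n_e4=0):
    """Weighted allele count for one individual.

    dosages: dict SNP id -> risk-allele count (0, 1, 2), assumed complete.
    betas: dict SNP id -> effect size from the discovery GWAS.
    snps: optional set of SNP ids (e.g. one pathway); None uses every SNP in betas.
    n_e2, n_e4: APOE allele counts, included only when the APOE term is wanted.
    """
    keep = [s for s in betas if snps is None or s in snps]
    score = sum(dosages[s] * betas[s] for s in keep)
    return score + APOE_E2_BETA * n_e2 + APOE_E4_BETA * n_e4

def standardize(scores):
    """Scale scores across the sample to mean 0 and SD 1."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]
```

Leaving `n_e2` and `n_e4` at zero corresponds to a score without the directly coded APOE alleles; passing a pathway's SNP ids as `snps` gives a pathway-restricted score.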
The results were considered significant if the resulting p value was ≤1.85 × 10⁻³ = 0.05/(3 scenarios × 9 pathways), corresponding to the Bonferroni correction for multiple comparisons.
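As a quick check of the arithmetic behind this threshold, the following Python lines divide the nominal significance level by the number of tests. The variable names are illustrative only.

```python
alpha = 0.05      # nominal significance level
n_scenarios = 3   # AD, MCI, and amyloid deposition analyses
n_pathways = 9    # pathways tested in each scenario

# Bonferroni correction: divide alpha by the total number of tests (27).
threshold = alpha / (n_scenarios * n_pathways)
print(f"{threshold:.2e}")  # prints 1.85e-03
```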

Results
The prediction accuracy of AD cases (n = 174) versus controls (n = 224) at the last assessment point was AUC APOE = 76% and AUC PRS = 75%, for APOE alone and for PRS without APOE, respectively (Table 3, first row). The best prediction accuracy (AUC FULL = 82%) was achieved with the full model, which includes both APOE and PRS. An analysis of variance test (last column of Table 3) confirmed that PRS significantly improves the prediction accuracy of the model over and above APOE (p = 1.7 × 10⁻¹³). A similar pattern of results was observed when we compared MCI individuals at the last point of assessment (n = 344; see Table 1 for details) with controls; however, the accuracy was reduced (AUC APOE = 62%, AUC FULL = 68%). Again, PRS significantly improves the prediction accuracy of MCI risk over and above APOE (p = 2.5 × 10⁻¹¹). Figure 1 shows standardized density plots of polygenic risk scores in AD cases (red line), controls (blue line), and MCI (orange line), where the mean of the PRS for the latter is between the means of the PRS for AD cases and controls. Interestingly, the results for prediction of amyloid deposition by PRS follow a different pattern; APOE alone significantly predicted amyloid deposition with AUC APOE = 76%, and PRS did not improve the prediction accuracy further.
When we tested the full PRS model for prediction of individuals at the extremes of polygenic score distribution (±1.5 SD from the PRS mean), the prediction accuracy as expected increased (AUC = 94% for AD vs controls and AUC = 91% for MCI vs controls).
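The restriction to the extremes of the distribution amounts to a simple subsetting step before accuracy is recomputed. A minimal Python sketch, assuming the scores are already standardized and using the hypothetical name `extreme_subset`:

```python
def extreme_subset(z_scores, labels, cutoff=1.5):
    """Keep individuals whose standardized PRS lies more than `cutoff` SD from the mean.

    z_scores: standardized PRS values (mean 0, SD 1 across the sample).
    labels: matching case/control labels (1/0).
    """
    kept = [(z, y) for z, y in zip(z_scores, labels) if abs(z) > cutoff]
    return [z for z, _ in kept], [y for _, y in kept]
```

Individuals with mid-range scores, who are the hardest to classify, are dropped by this step, which is consistent with the higher AUC reported for the extremes.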
We tested whether the PRS can predict progression to AD in individuals with MCI. Of 459 individuals with MCI at the baseline assessment, 441 had known amyloid deposition status (270 were amyloid-positive and 171 were amyloid-negative). The prediction accuracy of amyloid deposition in this subsample was AUC APOE = 79% by APOE alone and PRS did not improve the prediction accuracy (p = 0.48; Fig 2). Of 270 amyloid-positive individuals, 112 have progressed to AD and 150 individuals remained MCI as of the last point of assessment. In this case, PRS did predict AD progression (AUC APOE = 63% and AUC FULL = 69%), significantly improving the prediction over and above APOE (p = 0.0002; see Fig 2).
Finally, we calculated pathway-specific PRSs and tested them for association with risk for AD, MCI, and amyloid deposition. The results are presented in Table 4. The majority of pathways were significantly associated with AD risk; however, this association was mostly driven by the APOE region. Two pathways (protein-lipid complex, protein-lipid complex subunit organization) remained significant after removing genes in the APOE region. When we excluded all pathways from the whole genome PRS, we observed that a substantial part of variance still remained unexplained (p = 2.2 × 10⁻¹⁴; last row of Table 4). Comparing amyloid-positive versus amyloid-negative individuals, the same 2 protein-lipid-related pathways and additionally reverse cholesterol transport were significant after removing genes in the APOE region. The association results of the 9 pathways' PRS with MCI risk were nominally significant for all pathways, and the association was mostly attributed to APOE. This clearly demonstrates that the pathways that contain the APOE region are strong predictors of amyloid deposition. Protein-lipid complex has shown the strongest association with AD and amyloid deposition risk in all the analyses. The overlap of genes in the 3 pathways above is presented in Figure 3. 27 Finally, we tested these pathways' PRS for association with amyloid deposition in individuals with MCI and with their further progression to AD when their amyloid deposition status was positive. We found that protein-lipid complex, protein-lipid complex subunit organization, and reverse cholesterol transport pathways are also associated with amyloid deposition even after exclusion of the APOE region (Table 5, 4th column).

Discussion
The pathological process related to AD starts long before clinical onset and lasts approximately 15 to 20 years. 28 It is widely believed that identifying individuals who have high risk of AD earlier is essential for therapeutic strategies for AD prevention and intervention. 29 Due to the diagnostic heterogeneity of MCI and different length of follow-up assessments, the conversion rate to AD or other types of […] to AD, it is notable that an increase in amyloid plaques that starts many years before clinical symptoms appear plays an important role in brain degenerative changes. A reasonable prediction accuracy can be achieved with a PRS approach that uses genetic profile information and relates it to AD risk. 13,14 The PRS and its modifications have been assessed for association with AD and AD-related phenotypes in a number of studies; however, the reported prediction accuracies have not been entirely consistent. In this study, we examined prediction accuracy that can be achieved with APOE alone and with the full PRS model differentiating between AD, MCI, controls, and amyloid status.
We have shown that the best prediction accuracy can be achieved with the PRS that includes APOE for both AD versus controls and MCI versus controls analyses (AUC = 82% and AUC = 68%, respectively). In both analyses, the PRS improves the prediction accuracy by about 8 to 9% compared to APOE alone, which replicates the analyses in independent datasets published elsewhere. 13,14,16 Of course, GWASs indicate that APOE is the strongest risk factor and other common genetic variants have smaller effect sizes. However, the APOE region explains ~5% of SNP heritability, whereas the whole genome explains ~24%. 32 In addition, PRS prediction accuracy shows a substantial increase in AUC, which makes the PRS potentially clinically useful for disease risk prediction. Furthermore, AD GWAS risk loci have greatly expanded our understanding of the disease mechanisms.
As expected, the accuracy of MCI prediction is lower than that of AD prediction, which can be explained by the inclusion of a subset of MCI individuals who will not develop AD. For individuals with extreme PRS, the AUC reaches 90% and above for both AD and MCI. The prediction of amyloid deposition showed a different pattern. In the whole sample, the prediction accuracy with APOE alone was 76% and the PRS did not improve the accuracy any further (AUC remained 76%). Similar results were obtained when we tested the prediction accuracy of amyloid deposition in individuals with MCI. However, when we looked at individuals who already had positive amyloid deposition and attempted to predict their progression to AD, the best accuracy was observed with the full PRS model; that is, APOE alone was not sufficient, and the PRS component was also required.
Note that for all the models used, the best prediction accuracy was achieved with a p value threshold of 0.5 for AD-associated SNPs. The same threshold was previously reported in studies that were done on different genotyping arrays. 13,14 For the best prediction accuracy in clinical practice, PRS should be generated on a set of SNPs in a way that captures genetic liability of the whole genome.
The potential implication of these findings is that the APOE gene affects amyloid deposition but that much of the rest of the risk of disease is involved in the rate at which amyloid deposition causes a neurodegenerative response. Clinical trials have previously shown that there is little correlation between AD progression and accumulation of amyloid plaques, supporting a hypothesis that AD development may have 2 separated stages: amyloid dependent and amyloid independent. 33 It is also known that the APOE gene influences the deposition of amyloid in the brain 34 and that this is necessary but not sufficient for development of clinical AD. Moreover, it has been shown that neuronal loss and tangle numbers increase as AD progresses, 35 unlike the number of amyloid plaques, which reaches its maximum 36 with the onset of clinical symptoms.
While analysis of early onset AD firmly implicated amyloid precursor protein metabolism and Aβ production in the etiology of the disease, GWASs and exome and genome sequencing have implicated with varying degrees of confidence a number of potentially biologically relevant pathways in late onset AD pathogenesis. 21,22 Of course, pathway construction is an imperfect art both because of the knowledge base used in the generation of the pathways and because proteins may have more than one function in more than one cell type. Nevertheless, it is valuable to divide polygenic risk by pathways both in terms of modeling the disease through induced pluripotent stem cell technologies (one might like to assign high or low risk by pathway) and in terms of eventually tailoring therapies to pathway deficits. To dissect AD PRS by biologically relevant gene sets, we tested pathways enriched in AD 6 identified by IGAP. All pathways except "activation of immune response" were highly significantly associated with AD risk and amyloid deposition risk; however, most of the signal was attributed to the APOE region alone. Protein-lipid complex showed the strongest association with AD and amyloid deposition risk in all the analyses.
In conclusion, our results imply that APOE contributes to disease risk in a manner that is mechanistically different from the other genetic contributors to disease risk. We speculate that APOE affects amyloid deposition and that the PRS affects conversion from amyloid positivity to AD. Therefore, in the context of the amyloid cascade hypothesis, APOE acts prior to amyloid deposition and the remaining genetic risk factors identified through GWASs act between amyloid deposition and clinical onset of AD.