Large‐scale sequencing studies expand the known genetic architecture of Alzheimer's disease

Abstract
Introduction: Genes implicated by genome‐wide association studies and family‐based studies of Alzheimer's disease (AD) are largely discordant. We hypothesized that genes identified by sequencing studies such as the Alzheimer's Disease Sequencing Project (ADSP) may bridge this gap and highlight shared biological mechanisms.
Methods: We performed a structured literature review of genes prioritized by ADSP studies, genes underlying familial dementias, and genes nominated by genome‐wide association studies. Gene set enrichment analyses of each list identified enriched pathways.
Results: The genes prioritized by the ADSP, familial dementia studies, and genome‐wide association studies overlapped minimally. Each gene set identified dozens of enriched pathways, several of which were shared (e.g., regulation of amyloid beta clearance).
Discussion: Alternative study designs provide unique insights into AD genetics. Shared pathways enriched by different genes highlight their relevance to AD pathogenesis, while the patterns of pathway enrichment unique to each gene set provide additional targets for functional studies.


BACKGROUND
Alzheimer's disease (AD) is the leading cause of dementia in the United States, estimated to affect 5.8 million Americans in 2020. 1 AD is a complex and highly heritable trait 2 for which there is no efficacious treatment. Drug targets supported by human genetic evidence are much more likely to be approved by the Food and Drug Administration for therapeutic use, 3 demonstrating the need for continued genetics research into AD and an improved understanding of the biological processes underlying the disease.
The known genetic architecture of AD implicates causal and risk variants at dozens of loci. 4 Family studies have shown that rare early-onset autosomal dominant AD (ADAD) can be caused by highly penetrant variants in APP, 5 PSEN1, 6 and PSEN2. 7 Although these autosomal dominant variants explain < 1% of AD cases, 8 their discovery provided a direct link between AD genetics and pathogenesis through rare coding changes 9 in genes underlying the generation of amyloid beta (Aβ), a neuropathological hallmark of AD. 10 The apolipoprotein E (APOE) ε2 and ε4 alleles, defined by two missense variants, were first associated with AD in family studies and underlie the strongest signal across genome-wide association studies (GWAS) of AD. [11][12][13][14] Rare variant association studies have also identified protein-coding changes associated with AD, 15 though many of these studies were restricted to analyses of known variants (e.g., ABI3, PLCG2 16 ) or small samples of whole exome sequencing (WES) data (e.g., AKAP9, 17 TREM2 18 ).
Large GWAS of common variants have implicated dozens of loci but do not implicate the ADAD genes. 13,19 Many AD GWAS loci are intergenic, and the specific genes influencing AD risk and pathogenesis within those loci remain mostly unresolved. 19 The genes implicated by family studies and by GWAS are largely discordant, in part because of their study designs: family-based studies have better power to detect rare variants with large effects, whereas GWAS are better powered to identify common variants with modest effects but typically represent a single ancestry. Large-scale sequencing efforts such as the Alzheimer's Disease Sequencing Project (ADSP 20 ) may resolve the link between GWAS loci and functional variation by directly testing sequence variation rather than genetic markers or imputed genotypes.
We hypothesize that the genes implicated in AD risk by these different analytical strategies may represent shared biological pathways.
Rather than focusing on individual genes, pathway analyses test for enrichment of biological functions among the members of a gene set. 21 These approaches have connected genes near GWAS loci to biological processes that may influence AD pathogenesis. 12,13 However, pathway analyses are frequently restricted to the genes or loci implicated by a single study rather than by the field as a whole, and may miss connections with genes implicated by alternative study designs. If the support for a given pathway is strong, therapeutic interventions could target that pathway rather than a single gene. 20 Here, we summarize the genes implicated by the ADSP Discovery Phase publications and place them into the larger context of AD genetics. We compare the genes implicated by the ADSP with genes underlying familial dementias and with genes prioritized by a recent meta-analysis of AD GWAS representing > 90,000 subjects (35,274 cases and 59,163 controls) 13 or by an AD genetics literature review. 22 We used gene set enrichment analyses to identify the biological processes implicated by these three avenues of AD genetics research. We hypothesize that the genes implicated by the ADSP will provide greater resolution within established AD pathways and may implicate new pathways relevant to disease.

2.3 The AD sequencing project gene set
Genes with evidence for a relationship with AD risk were extracted from ADSP Discovery Phase publications using permissive filters.
Genes from the family-based whole genome sequencing (WGS) studies were extracted if they met one or more of the following conditions: (1) variation in a gene from the familial dementia gene set that was either previously reported as pathogenic or co-segregated with AD in at least one ADSP family, (2) variation within a gene from the AD GWAS gene set with either evidence of association with AD or co-segregation in 2+ families, or (3) variation co-segregating with AD in 2+ families within a multi-family linkage region. Genes from the ADSP WES studies were extracted if they met at least one of the following conditions: (1) variation with exome-wide significant evidence of association at the variant or gene level, or (2) rare coding variants observed in 10+ cases and no controls. All gene names were verified using the multi-symbol checker developed by the HUGO Gene Nomenclature Committee (HGNC).
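The permissive extraction criteria above can be expressed as a single decision rule. The sketch below is a minimal illustration, assuming a hypothetical per-gene evidence record (`GeneEvidence`) whose field names and design labels are our own and not part of any ADSP data release; it encodes the three family-based WGS conditions and the two WES conditions as listed, and nothing more.

```python
from dataclasses import dataclass


@dataclass
class GeneEvidence:
    """Hypothetical summary of the evidence one ADSP study reports for one gene."""
    symbol: str
    design: str                           # hypothetical labels: "family_wgs" or "case_control_wes"
    in_familial_set: bool = False         # gene belongs to the familial dementia gene set
    in_gwas_set: bool = False             # gene belongs to the AD GWAS gene set
    reported_pathogenic: bool = False     # variation previously reported as pathogenic
    cosegregating_families: int = 0       # families in which variation co-segregates with AD
    associated_with_ad: bool = False      # association evidence for a GWAS-set gene
    in_linkage_region: bool = False       # gene lies in a multi-family linkage region
    exome_wide_significant: bool = False  # variant- or gene-level exome-wide significance
    carrier_cases: int = 0                # cases carrying rare coding variants
    carrier_controls: int = 0             # controls carrying rare coding variants


def meets_permissive_criteria(g: GeneEvidence) -> bool:
    """Return True if the gene passes any of the permissive extraction conditions."""
    if g.design == "family_wgs":
        # (1) familial dementia gene: known pathogenic or co-segregating in >= 1 family
        c1 = g.in_familial_set and (g.reported_pathogenic or g.cosegregating_families >= 1)
        # (2) AD GWAS gene: association evidence or co-segregation in 2+ families
        c2 = g.in_gwas_set and (g.associated_with_ad or g.cosegregating_families >= 2)
        # (3) co-segregation in 2+ families within a multi-family linkage region
        c3 = g.in_linkage_region and g.cosegregating_families >= 2
        return c1 or c2 or c3
    if g.design == "case_control_wes":
        # (1) exome-wide significance, or (2) rare coding variants in 10+ cases and 0 controls
        return g.exome_wide_significant or (g.carrier_cases >= 10 and g.carrier_controls == 0)
    raise ValueError(f"unknown study design: {g.design!r}")


# "GENE_X" is a placeholder symbol, not a gene from the study.
print(meets_permissive_criteria(GeneEvidence("GENE_X", "case_control_wes", carrier_cases=12)))  # prints True
```

Keeping each condition as a separately named boolean makes it easy to record which rule admitted a gene, which is useful when the criteria are deliberately permissive and later filtered by literature review.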
Genes meeting these permissive criteria underwent structured literature reviews by two investigators, and the two earliest references supporting a link between the gene and AD were recorded where available. First, we searched PubMed for "gene" AND "Alzheimer" and reviewed the entries from oldest to newest. We then reviewed each gene's entry in Online Mendelian Inheritance in Man (OMIM 32 ) for a connection to AD. Finally, we searched Google Scholar (https://scholar.google.com; last accessed March 22, 2021) for '"gene" and "Alzheimer"' and reviewed the first two pages of results for references supporting a link between the gene and AD. Papers were included as evidence of a connection between the gene and AD if the gene was associated, at a study-wide statistical significance level, with AD-specific changes in genotype or gene expression, or with AD-specific endophenotypes, pathology, or biomarkers, in humans or animal models. References were excluded if the research was a conference abstract, part of a dissertation, not published in English, or linked only to an AD risk factor (e.g., aging).
Genes with at least one external publication supporting a link to AD were included in the ADSP-derived gene set (ADSP+) used for pathway analysis.
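The review-based inclusion step can be sketched as a filter over candidate references. In this hypothetical sketch, each `Reference` carries a publication type, a flag for study-wide significance, and a flag marking ADSP publications; the type labels, the `from_adsp` flag, and the function name `build_adsp_plus` are illustrative assumptions rather than tooling from the original protocol. References are scanned oldest to newest, mirroring the PubMed review order, so the two earliest eligible references are the ones recorded.

```python
from dataclasses import dataclass

# Exclusion categories from the review protocol (the label strings are hypothetical).
EXCLUDED_KINDS = {"conference_abstract", "dissertation", "non_english", "risk_factor_only"}


@dataclass(frozen=True)
class Reference:
    gene: str
    year: int
    kind: str                     # e.g. "journal_article" or one of EXCLUDED_KINDS
    study_wide_significant: bool  # AD-specific finding at study-wide significance
    from_adsp: bool = False       # ADSP publications do not count as external support


def build_adsp_plus(candidates, references):
    """Keep candidates with >= 1 eligible external reference; record the two earliest years."""
    eligible_years = {}
    for ref in sorted(references, key=lambda r: r.year):  # review oldest to newest
        if ref.from_adsp or ref.kind in EXCLUDED_KINDS or not ref.study_wide_significant:
            continue
        eligible_years.setdefault(ref.gene, []).append(ref.year)
    adsp_plus = [gene for gene in candidates if gene in eligible_years]
    earliest_two = {gene: eligible_years[gene][:2] for gene in adsp_plus}
    return adsp_plus, earliest_two
```

With the counts reported for the ADSP Discovery Phase (43 of 64 candidates with external support), the retained fraction is 43/64 ≈ 0.672, or 67%.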

Gene set enrichment analysis
Gene sets were provided to STRING-db (v11.0 33 ).
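STRING reports functional enrichment for a submitted gene set against a background of annotated genes. The sketch below illustrates the standard over-representation logic behind such analyses: a one-sided hypergeometric test per term, followed by Benjamini-Hochberg false discovery rate (FDR) correction. It is a minimal sketch under stated assumptions (every submitted and annotated gene belongs to a background of `background_size` genes; toy term annotations), not STRING's internal implementation, and it ignores the GO hierarchy.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated to the term, n drawn)."""
    # math.comb(a, b) returns 0 when b > a, so impossible overlaps contribute nothing.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down to enforce monotonicity
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrichment(gene_set: set[str], annotations: dict[str, set[str]], background_size: int):
    """Return {term: (p-value, FDR)} for over-representation of each term in gene_set."""
    n = len(gene_set)
    terms = sorted(annotations)
    pvals = [
        hypergeom_sf(len(gene_set & annotations[t]), background_size, len(annotations[t]), n)
        for t in terms
    ]
    return dict(zip(terms, zip(pvals, benjamini_hochberg(pvals))))
```

For example, a 43-gene list would be passed as `gene_set`, with `annotations` mapping GO Biological Process IDs to their annotated genes; exact integer binomials keep the tail probabilities precise even for small p-values.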

ADSP+, AD GWAS, and familial dementia gene sets
Across the eight ADSP Discovery Phase studies, 9,25-31 a total of 64 genes met our permissive criteria (Table S2 in supporting information). Independent support for a link to AD was identified for most of these genes (43/64, 67%), defining the ADSP+ gene set (Table 1). Most of these genes were reported in a single ADSP Discovery Phase study, though TREM2 appeared in four studies. 27,28,30,31 Much of the literature support for the ADSP+ genes comes from functional studies rather than statistical associations (Figure 1, Table S2). Studies identifying genes differentially expressed in AD supported the largest number of genes (15 genes), followed closely by studies relating genes to changes in AD pathology (12 genes) or to animal models (12 genes), then GWAS or single nucleotide polymorphism (SNP) association studies (9 genes), linkage analyses (5 genes), and WES/WGS studies (3 genes).
The sparse support from WES/WGS studies most likely reflects the scarcity of large sequencing studies of AD before the ADSP.

Gene set enrichment analysis
The genes within the ADSP+ gene set show significant evidence of interaction and span many biological pathways. The ADSP+ genes exhibit significant protein-protein interaction (PPI) enrichment (P = 8.36E-03): seven PPI edges were observed among the 43 nodes, whereas two were expected.
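The PPI enrichment statistic compares the number of interaction edges observed among the submitted proteins with the number expected for a random protein set of similar size. A rough way to turn such counts into a one-sided p-value is a Poisson upper tail; this is our assumption for illustration only. STRING's own test draws its expectation from random protein sets in the genome and may use a different model, and the expected count quoted above is rounded, so the sketch is not expected to reproduce P = 8.36E-03 exactly.

```python
from math import exp, factorial


def poisson_upper_tail(observed: int, expected: float) -> float:
    """One-sided P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(exp(-expected) * expected**k / factorial(k) for k in range(observed))
    return 1.0 - below


# Rounded counts from the text: 7 edges observed among 43 nodes, about 2 expected.
print(f"{poisson_upper_tail(7, 2.0):.4f}")  # prints 0.0045
```

With these rounded inputs the approximation gives P ≈ 0.0045, the same order of magnitude as the reported value; a small shift in the unrounded expected count moves the tail probability noticeably, which is why exact reproduction needs the software's own expectation.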

DISCUSSION
While the genetic architecture and etiology of AD remain only partially understood, our structured literature review and gene set enrichment analyses suggest that WGS and WES studies may fill in some of these gaps while also supporting pathways previously implicated in AD. Although each gene set contained a long list of candidate genes and the sets shared few genes, the ADSP+ gene set was enriched in biological processes also implicated by the familial dementia genes, the AD GWAS genes, or both. This suggests that the alternative strategies used to associate these genes with AD point to shared mechanisms of disease.
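Gene-level overlap between the three gene sets can be summarized with pairwise intersections and the Jaccard index (shared genes divided by the union). The sketch below is a minimal illustration; the function name and the placeholder gene symbols are hypothetical and do not reproduce the study's gene lists.

```python
from itertools import combinations


def pairwise_overlap(gene_sets: dict[str, set[str]]):
    """For every pair of named gene sets, return (sorted shared genes, Jaccard index)."""
    result = {}
    for (name_a, a), (name_b, b) in combinations(sorted(gene_sets.items()), 2):
        shared = a & b
        union = a | b
        result[(name_a, name_b)] = (sorted(shared), len(shared) / len(union) if union else 0.0)
    return result


toy_sets = {  # hypothetical placeholder genes, not the study's gene sets
    "ADSP+": {"G1", "G2", "G3"},
    "familial": {"G3", "G4"},
    "GWAS": {"G5", "G6"},
}
for pair, (shared, jaccard) in pairwise_overlap(toy_sets).items():
    print(pair, shared, round(jaccard, 2))
```

A Jaccard index near zero for every pair corresponds to the minimal gene-level overlap described above, even when the same sets converge on shared pathways.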
The presence of pathways associated with regulation of Aβ clearance, endocytosis, regulation of phosphorus metabolic process, immune system process, and regulation of MAPK cascade in all three gene sets supports candidate genes and pathways nominated by AD GWAS. [36][37][38] The relationships of the regulation of Aβ clearance (GO:1900221) and cholesterol efflux (GO:0033344) pathways with AD are well established. 39,40 The regulation of Aβ clearance is directly related to the hallmark pathologic features of AD and offers a connection between the genes implicated in late-onset AD 41 and those implicated in ADAD.
Similarly, the relationship between cholesterol efflux and AD has been of interest since the association between APOE and AD was first reported. 11 The ADSP+ gene set also contributes unique genes to these commonly implicated pathways, further elucidating the mechanisms by which these pathways contribute to the progression of AD.
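Finding pathways enriched in all three gene sets, or only in ADSP+, is a set comparison over each gene set's significant terms. The sketch below assumes each gene set's enrichment results are available as a mapping from term ID to FDR and uses FDR < 0.05 as a hypothetical significance cutoff; the function names and term IDs are placeholders, not results from this study.

```python
def enriched_in(fdr_by_set: dict[str, dict[str, float]], alpha: float = 0.05) -> dict[str, frozenset[str]]:
    """Map each term significant in any gene set to the gene sets in which FDR < alpha."""
    hits: dict[str, set[str]] = {}
    for set_name, fdrs in fdr_by_set.items():
        for term, fdr in fdrs.items():
            if fdr < alpha:
                hits.setdefault(term, set()).add(set_name)
    return {term: frozenset(names) for term, names in hits.items()}


def terms_in_exactly(classes: dict[str, frozenset[str]], set_names: set[str]) -> list[str]:
    """Terms enriched in exactly the given gene sets and in no others."""
    return sorted(term for term, names in classes.items() if names == frozenset(set_names))


toy_fdr = {  # hypothetical FDR values for placeholder terms
    "ADSP+": {"T1": 0.01, "T2": 0.002, "T3": 0.2},
    "familial": {"T1": 0.03},
    "GWAS": {"T1": 0.001, "T3": 0.01},
}
classes = enriched_in(toy_fdr)
print(terms_in_exactly(classes, {"ADSP+", "familial", "GWAS"}))  # prints ['T1']
print(terms_in_exactly(classes, {"ADSP+"}))  # prints ['T2']
```

Classifying by the exact set of gene sets keeps "shared by all three" and "ADSP+ only" mutually exclusive, and terms enriched in two of the three sets can be queried the same way.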
Among the pathways significantly enriched only by the ADSP+ gene set, one of the most strongly associated processes is positive regulation of microtubule polymerization (GO:0031116; FDR = 0.0026; AKAP9 and MAPT; Table 2). Microtubule polymerization plays important roles in synaptic plasticity and function, 42 biological processes highlighted by a recent family-based WGS study of AD. 43 Tau stabilizes microtubules and promotes microtubule assembly, 44 and neurofibrillary tangles of tau are another hallmark of AD pathology. 1 Post-translational modifications of tau are known to contribute to neurodegenerative aggregation and to affect the ability of tau to promote microtubule polymerization. 45 Microtubule deficiencies in brain tissue are significantly associated with clinical AD status, 46 and variation at the MAPT locus has been associated with AD among APOE ε4-negative subjects. 47 Although AKAP9 is specific to the ADSP+ gene set in this study, the ADSP evaluated it as a candidate gene with prior evidence of association with AD. 26 Other AD sequencing studies have identified rare variants with large effect sizes in AKAP9, 17,48 and variants in AKAP9 were nominally associated with AD in a recent GWAS of African American samples. 14 AKAP9 mutations enhance phosphorylation of tau, 49 and a previous study used Bayesian networks to model relationships between epigenomic and transcriptomic data and identify AD networks, in which protein phosphorylation and synaptic signaling emerged as differential subnetworks associated with AD. 50
We have shown that large-scale sequencing studies such as the ADSP bring attention to new genes and biological processes implicated in AD while providing support for biological processes previously nominated by GWAS and family studies.
Furthermore, AKAP9's frequent contribution to both new and established AD pathways, together with functional evidence linking it to tau-mediated AD pathology, strengthens the evidence that it may play a role in AD risk and pathogenesis.
Our study has several limitations. The ADSP used a complicated ascertainment strategy, favoring families with many cases and few APOE ε4 alleles, while age, sex, and APOE genotype were used to select cases and controls with reduced risk of developing AD. 20 The sample size of the ADSP Discovery Phase was much smaller than those of the large-scale GWAS conducted in recent years. 12,13 The WGS data in the
While gene set enrichment analysis is a useful tool for providing biological context for genes, there is no single gold-standard approach. This study focused on Gene Ontology (GO) Biological Process terms because our approach accounted for the ontological relationships between processes and because this approach has been widely used in AD genetics studies (e.g., Jansen et al. 12 ).