Apolipoprotein E region molecular signatures of Alzheimer's disease

Summary Although the APOE region is the strongest genetic risk factor for Alzheimer's diseases (ADs), its pathogenic role remains poorly understood. Elucidating genetic predisposition to ADs, a subset of age‐related diseases characteristic of the postreproductive period, is hampered by the undefined role of evolution in establishing molecular mechanisms of such diseases. This uncertainty is an inevitable source of natural‐selection–free genetic heterogeneity in predisposition to ADs. We performed the first large‐scale analysis of linkage disequilibrium (LD) structures characterized by 30 polymorphisms from five genes in the APOE 19q13.3 region (BCAM, NECTIN2, TOMM40, APOE, and APOC1) in 2,673 AD‐affected and 16,246 unaffected individuals from five cohorts. Consistent with the undefined role of evolution in age‐related diseases, we found that these structures, being highly heterogeneous, are significantly different in subjects with and without ADs. The pattern of the difference represents a molecular signature of AD comprising single nucleotide polymorphisms (SNPs) from all five genes in the APOE region. Significant differences in LD in subjects with and without ADs indicate that SNPs from different genes are likely involved in AD pathogenesis. Significant and highly heterogeneous molecular signatures of ADs provide unprecedented insight into complex polygenetic predisposition to ADs in the APOE region. These findings are more consistent with a complex haplotype than with a single genetic variant origin of ADs in this region.

disease have been reported for genetic variants on chromosome 19 in the APOE gene region (19q13.3). The APOE e4 allele is associated with increased risk of ADs (Corder et al., 1993) and remains the most notable genetic risk factor for AD development in various populations (Raichlen & Alexander, 2014).
APOE encodes a protein involved in lipid homeostasis. In the brain, ApoE plays a role in astrocyte-mediated amyloid-beta degradation (Koistinaho et al., 2004), supporting the amyloid cascade hypothesis of AD development (Hardy & Higgins, 1992). However, some researchers contend that variants of other genes in the APOE region play a role in AD development. For example, the mitochondrial cascade hypothesis holds that TOMM40 plays a role in AD development through the regulation of mitochondrial biogenesis (Roses et al., 2010; Swerdlow, Burns & Khan, 2014). NECTIN2, which flanks TOMM40, encodes a plasma membrane component of adherens junctions that also serves as an entry mediator for certain mutant strains of herpes simplex virus. The pathogen hypothesis thus holds that NECTIN2 is a causative factor in the development of ADs (Harris & Harris, 2015; Martin et al., 2000).
Despite considerable progress in research into genetic predisposition to ADs, with the greatest advances involving APOE research, progress in the development of therapeutic interventions has been slow, with a success rate of only 0.4% in clinical trials conducted between 2002 and 2012 (Cummings, Morstorf & Zhong, 2014). The corresponding 99.6% failure rate indicates that the mechanisms underlying the development of ADs remain poorly understood. A fundamental difficulty in elucidating the genetics of AD and other complex age-related diseases characteristic of postreproductive life is the undefined role of evolution in establishing the disease mechanisms (Nesse, Ganten, Gregory & Omenn, 2012). This problem is complicated by recent changes in human life expectancy (Oeppen & Vaupel, 2002) and the fitness landscape (Corella & Ordovas, 2014; Crespi, Stead & Elliot, 2010; Kulminski, 2013; Vijg & Suh, 2005).
Evolution-related factors are inevitable sources of genetic heterogeneity in determining predisposition to ADs. Heterogeneity in the strongest genetic risk factor for ADs, the APOE e4 allele, is evidenced by differences between geographic gradients in the frequency of the e4 allele among AD-affected and general populations.
Indeed, for Caucasians, the AD gradient ranges from 40.5% in Southern Europe to 61.3% in Northern Europe, whereas in the general population, the gradient is much wider, ranging from 10% to 15% in Southern Europe to 40%-50% in Northern Europe (Gerdes, 2003). This heterogeneity suggests that individuals carrying the e4 allele might not develop an AD. These observations are supported by genetic studies, which show that even homozygous e4 carriers might not develop an AD (Corder et al., 1993).
Here, we examined the complex molecular landscape of the APOE region, harboring five genes (BCAM, NECTIN2, TOMM40, APOE, and APOC1) and represented by 30 single nucleotide polymorphisms (SNPs) available from common genotyping arrays, by performing the first reported large-scale analysis of linkage disequilibrium (LD) structures in five cohorts comprising 2,673 AD-affected and 16,246 unaffected individuals. We also examined heterogeneity in cross talk between these genes, characterized by complexity of the LD structures. Consistent with the undefined role of evolution in establishing mechanisms of age-related traits, our results show that the heterogeneous molecular landscape of the APOE region in AD-affected individuals differs from that in unaffected individuals.
As unprecedented insight into the human nature of ADs, this difference demonstrates that ADs are associated with highly heterogeneous molecular signatures spanning the entire region of all five genes, more consistent with a complex haplotype than with a single genetic variant origin of ADs.

| Study overview
Data were obtained from the Late-Onset Alzheimer Disease Family Study (LOADFS), Health and Retirement Study (HRS), Cardiovascular Health Study (CHS), and the Framingham Heart Study original (FHS) and offspring (FHSO) cohorts (Tables 1 and S1). The analyses focused on 30 SNPs in Hardy-Weinberg equilibrium (HWE), $p_{HW} > 10^{-3}$, that do not exhibit strong LD ($r^2 < 0.8$), representing the BCAM, NECTIN2, TOMM40, APOE, and APOC1 genes in region 19q13.3 (Table S2). For cross-platform comparisons, we used directly genotyped and imputed SNPs. Sensitivity analyses were performed using directly genotyped SNPs only. The primary analysis focused on the LD structure of the 19q13.3 region, as represented by the 30 selected SNPs, and on contrasting the LD patterns between AD-affected and unaffected individuals of Caucasian ancestry, men and women combined. Affliction status (cases) was characterized as the presence of an AD, defined as a dementia of Alzheimer type (n = 2,673). Individuals without an AD (n = 16,246) were classified as noncases. As expected, cases were mainly from the LOADFS (designed as a case-control study), and they were typically from earlier birth cohorts (Table S1).
Unless explicitly stated, the results of LD analyses are presented using a haplotype-based method (details in Section 4).
Consistent with previous studies (Deelen et al., 2011; Fortney et al., 2015), our association analyses showed that minor alleles of rs2075650 and rs157580 (TOMM40) were associated with higher and lower risk of AD development, respectively (Table S2). The effect directions were consistent in all studies, but the effect sizes varied markedly, ranging from 0.380 ($p = 0.014$) in the FHS cohort to 1.45 ($p = 1.65 \times 10^{-78}$) in the LOADFS cohort for rs2075650 and from $-0.052$ ($p = 0.612$) in the FHS cohort to $-0.705$ ($p = 5.98 \times 10^{-23}$) in the LOADFS cohort for rs157580 (Figure S1 and Table S2).

| LD structure of the APOE region
We first examined the LD structure of the selected region in the large pooled sample of all cohorts (cases and noncases combined). It was represented by three heterogeneous clusters mapped to the BCAM and NECTIN2 genes and the TOMM40-APOE-APOC1 locus (Figure 1). Each cohort exhibited the same structure (Table S3). SNPs from the BCAM-NECTIN2 locus were in low-to-moderate LD with SNPs from the TOMM40-APOE-APOC1 locus.

Table notes: The large proportion of AD cases in LOADFS is due to the case-control design. The large proportion of AD cases in FHS is due to the older age of participants of this cohort at the end of follow-up (mean age is 91.4 years) and the larger proportion of women (66.7%), who are at higher risk of AD. (a) Information on age at onset of AD in LOADFS was not known for all cases.

FIGURE 1 LD structure of the APOE region. LD ($r^2$, %) is shown in the pooled sample of all studies, cases and noncases combined, for 30 SNPs from the BCAM, NECTIN2, TOMM40, APOE, and APOC1 genes. All $r^2 > 0$ were significant after conservative (because most SNP pairs could not be considered independent) Bonferroni correction, $p < p_{Bonf} = 0.05/435 = 1.2 \times 10^{-4}$, where $435 = 30 \times 29/2$ (Table S3). Numbers 1-3 to the right show three patterns of LD between SNPs from the TOMM40-APOE-APOC1 and BCAM-NECTIN2 loci. Pattern 1 is defined by a stronger LD for the BCAM-NECTIN2 SNPs with rs157580 than rs2075650. Pattern 2 is defined by about the same modest LD for the BCAM-NECTIN2 SNPs with rs2075650 and rs157580. Pattern 3 is defined by weak LD for the BCAM and NECTIN2 SNPs with rs2075650 and rs157580. LD structure for each cohort separately is presented in Table S3. Functional annotation of the 30 selected SNPs is given in Table S5A-C.

| Molecular signatures of ADs
Then, we evaluated the LD structure for SNPs in the APOE region for cases and noncases separately and contrasted LD patterns between these groups using haplotype- and genotype-based methods. We used these two methods because differences in the LD estimates from them are informative of deviation from HWE. This information is important because HWE in the entire sample does not guarantee HWE in subsamples, and thus, the observed deviation from HWE may be biologically plausible. Although such deviation can be readily identified by estimating HWE in subsamples separately (e.g., $p_{HW} = 5.94 \times 10^{-7}$ in cases, whereas $p_{HW} = 0.665$ in noncases for rs11668536; Table S2), it can also occur regardless of HWE in subsamples at the haplotype level, that is, when $\Delta_{AB} \neq D_{AB}$, which is difficult to detect in stratification analyses (see Section 4).
We found that the LD patterns estimated using the haplotype-based method differed significantly between cases and noncases in the pooled sample of all cohorts ($p < 2 \times 10^{-4}$) and in four of the five individual cohorts: LOADFS ($p < 2 \times 10^{-4}$), HRS ($p = 1.8 \times 10^{-2}$), CHS ($p = 1.4 \times 10^{-3}$), and FHSO ($p = 1.5 \times 10^{-2}$). In the FHS cohort, the difference was not significant ($p = 0.908$). The patterns of the differences represent molecular signatures of ADs in this genetic region. The molecular signatures in the pooled sample are illustrated by heat maps for $\Delta r$ (Figure 2) and $\Delta r^2$ (Figure S3).

FIGURE 2 LD structure in AD-unaffected individuals and the difference in LD in subjects with and without ADs. Upper-left triangle: LD pattern ($r$, %) in the pooled sample of all studies, noncases, for 30 SNPs. Lower-right triangle: heat map for $\Delta r$ representing the molecular signature of ADs as the difference in LD in subjects with and without AD. The difference $\Delta r$ was defined as $\Delta r = r_{cases} - r_{noncases}$ if the LD coefficients $r$ were of opposite signs in cases and noncases (yellow and purple); otherwise, $\Delta r$ was defined as $\Delta r = |r_{cases}| - |r_{noncases}|$. Red denotes $r_{cases} > r_{noncases}$, and blue denotes $r_{cases} < r_{noncases}$. Numbers 1-3 after SNP IDs indicate patterns shown in Figure 1. Legend on the right shows color-coded p-values of the difference $\Delta r$. We used $r$ rather than $r^2$ here to emphasize that $r$ can be of opposite sign in cases and noncases. The heat map shows that LD in cases changes for the vast majority of SNPs in the entire region spanning all five genes. Figure S3 shows the heat map for $r^2$. Numerical estimates are shown in Table S3.
Our analysis identified 173 of 435 ($= 30 \times 29/2$) SNP pairs (39.8%) with $\Delta r$ values significant at the Bonferroni-adjusted level: $p \le p_{Bonf} = 1.2 \times 10^{-4}$. For 27 additional SNP pairs, we observed suggestive significance: $p_{Bonf} < p < 10^{-3}$. Of these 200 SNP pairs (46.0%), the correlation coefficients $r$ for 17 SNP pairs with significant $\Delta r$ ($p \le p_{Bonf}$) were in opposite directions for AD cases and noncases. Such significant differences could be missed when using $r^2$ statistics (Figure S3). Figure 2 illustrates the complex rearrangement of LD in cases compared with noncases spanning the entire region.
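The sign-aware difference in LD defined in the Figure 2 caption can be written out as a short sketch. The function names `delta_r` and `delta_r2` and the numeric inputs are hypothetical and chosen only for illustration; the example shows why a pair whose correlation flips sign between groups can be invisible to a statistic based on $r^2$.

```python
def delta_r(r_cases: float, r_noncases: float) -> float:
    """Difference in LD between cases and noncases (Figure 2 definition)."""
    if r_cases * r_noncases < 0:
        # Opposite signs in the two groups: use the signed difference.
        return r_cases - r_noncases
    # Same sign (or a zero coefficient): compare magnitudes.
    return abs(r_cases) - abs(r_noncases)


def delta_r2(r_cases: float, r_noncases: float) -> float:
    """Difference in squared LD coefficients."""
    return r_cases ** 2 - r_noncases ** 2


# Hypothetical SNP pair whose LD flips sign between groups.
print(delta_r(0.20, -0.20))   # 0.4
print(delta_r2(0.20, -0.20))  # 0.0
```

In this toy case the change in $r$ is as large as the coefficients allow for their magnitude, yet the change in $r^2$ is zero.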
Molecular signatures of ADs estimated using the genotype-based method (Figure S4 and Table S4) were qualitatively the same as those estimated using the haplotype-based method, with significant differences observed between cases and noncases in the pooled sample of all cohorts ($p < 2 \times 10^{-4}$) and in four of the five individual cohorts: LOADFS ($p < 2 \times 10^{-4}$), HRS ($p = 0.022$), CHS ($p = 0.014$), and FHSO ($p = 0.024$), but not in the FHS ($p = 0.926$). The genotype-based method provided 140 SNP pairs significant at $p < p_{Bonf}$, of which 135 SNP pairs attained $p < p_{Bonf}$, and five SNP pairs were of suggestive significance ($p_{Bonf} < p < 10^{-3}$) according to the haplotype-based method. For 28 additional SNP pairs, we observed suggestive significance ($p_{Bonf} < p < 10^{-3}$) according to the genotype-based method.
Notably, Figure 3 shows that the molecular signatures of ADs for seven SNPs in the TOMM40-APOE-APOC1 locus were mostly consistent in all cohorts. Accordingly, all pairwise estimates of $\Delta r$ in the pooled sample (except the rs8106922 [TOMM40] and rs405509 [APOE] pair) were significant at $p < p_{Bonf}$. Figure 3a shows that the molecular signature of AD in this locus was associated with increasing LD for some SNP pairs and decreasing LD for others, in cases as compared with noncases. For example, LD of the rs2075650 SNP (TOMM40), whose minor allele exhibited a strong detrimental association with ADs (Table S2), increased with all of the other six SNPs in this locus, whereas LD of rs157580 (protective association) increased with rs2075650, rs440446, rs439401, and rs12721046 and decreased with rs8106922 (TOMM40) and rs405509 (APOE). As shown in Figure S5, the observed patterns in the TOMM40-APOE-APOC1 locus were not altered by imputation.

Lower-right triangle: heat maps for $\Delta r^2 = r^2_{cases} - r^2_{noncases}$ representing the molecular signature of ADs in this locus. Color in (a) codes p-values; color in (b-f) codes $\Delta r^2$ (see legend). Numerical estimates are shown in Table S3. See Figure S5 for heat maps for directly genotyped SNPs.

is observed within these ad hoc LD patterns, as they held for the BCAM and NECTIN2 SNPs regardless of LD. For example, pattern 1 was the same for rs12610605 and rs11673139 (both NECTIN2), despite the low LD between these SNPs ($r^2 = 2\%$). The same pattern was observed for SNPs #12 and #19, which exhibited higher LD ($r^2 = 37\%$), and for SNPs #9 and #18, which exhibited even higher LD ($r^2 = 72\%$) (Figure 1). Pattern 2 also held regardless of LD between the BCAM and NECTIN2 SNPs (e.g., LD between SNPs #7 and #10 was <1%). Despite the apparently modest LD for SNPs between the BCAM-NECTIN2 and TOMM40-APOE-APOC1 loci ($r^2 < 20\%$), these patterns were consistent in all cohorts of cases and noncases combined and noncases only (Figure 4a,b).

BCAM-NECTIN2 and TOMM40-APOE-APOC1 loci in AD-affected and unaffected individuals
As shown in Figure 2, there were significant changes in LD between cases and noncases for 23 SNPs from the BCAM-NECTIN2 locus with seven SNPs from the TOMM40-APOE-APOC1 locus. This rearrangement of the LD structure in cases compared with noncases is indicative of AD-specific cross talk between these genes. Figure 4 shows that these three ad hoc LD patterns in cases substantially differ from those in noncases. Visual analysis of Figures 2 and S3 also suggests that changes in LD between cases and noncases differ between these ad hoc LD patterns. To quantify this visual insight, we analyzed the correlation (Pearson two-tailed test) of the change in LD magnitude between cases and noncases (characterized by $\Delta r^2 = r^2_{cases} - r^2_{noncases}$) with the LD in noncases ($r^2_{noncases}$) for SNP pairs formed by pairing the 23 SNPs from the BCAM-NECTIN2 locus with the seven SNPs from the TOMM40-APOE-APOC1 locus. In this analysis, we used 58 SNP pairs for which $\Delta r$ was significant at $p < p_{Bonf} = 1.2 \times 10^{-4}$ (Figure 2). A significant inverse correlation was observed between $\Delta r^2$ and $r^2_{noncases}$ for these pairs ($r_{Pearson} = -0.58$, $p = 2.0 \times 10^{-6}$), which was driven by SNPs from pattern 1 ($r_{Pearson} = -0.82$, $p = 6.8 \times 10^{-10}$) ($n_S = 37$). For SNPs from patterns 2 and 3, we observed a significant direct correlation ($r_{Pearson} = 0.72$, $p = 2.1 \times 10^{-4}$) ($n_S = 21$) that was consistent for each pattern (i.e., $r_{Pearson} = 0.81$, $p = 2.4 \times 10^{-3}$ [$n_S = 11$] for pattern 2 and $r_{Pearson} = 0.64$, $p = 4.4 \times 10^{-2}$ [$n_S = 10$] for pattern 3).

| LD and minor allele frequency (MAF)
We also examined whether differences in MAF between cases and noncases create the molecular signatures of ADs (Figure 2), even though the complex heterogeneous structure of the AD molecular signatures suggested that this is unlikely. To examine this question quantitatively, we evaluated the correlation between $\Delta r$ and the difference between two SNPs in their MAF changes between cases and noncases (i.e., $\Delta MAF = [SNP1_{cases} - SNP1_{noncases}] - [SNP2_{cases} - SNP2_{noncases}]$) for the same SNP pairs. There was no significant correlation for the SNP

(Table S5A). Another study also reported that rs405509 regulates APOE promoter activity (Artiga et al., 1998). Rs387976 (NECTIN2), located in an open chromatin region, seems to mediate transcription factor binding (Song et al., 2011). Ten functional variants were found to exist in active expression states in multiple cell lines, ranging from 1 (rs387976) to 63 (rs157580 [TOMM40]) of 68 cell types with data available from Ensembl. All of these variants can be in a poised state in one or more cell types (i.e., they can be epigenetically activated at a later stage in development or in response to exogenous stimuli) (Creyghton et al., 2010; Murao, Noguchi & Nakashima, 2016; Puri, Gala, Mishra & Dhawan, 2015). Two variants, rs1871046 (NECTIN2) and rs440446 (APOE), were characterized as being in a poised expression state in many cell types (20 and 32 epigenomes, respectively).

FIGURE 4 LD between selected SNPs from the BCAM-NECTIN2 and TOMM40-APOE-APOC1 loci. SNPs rs1871046, rs377702, and rs8104483 are representative of patterns 1 (light/dark blue), 2 (gray/black), and 3 (light/dark red), respectively, as shown in Figure 1. Symbols denote samples. The 95% confidence intervals are shown in the pooled sample of all cohorts ("All"). (a) A sample of cases and noncases combined. (b) Noncases only. (c) Cases only. LD for the other SNPs from these patterns in the pooled sample of all cohorts is illustrated in Figure S2.
In addition, seven of 10 regulatory variants exhibited a poised epigenetic signature in normal human astrocytes (NHAs), and one of the variants (rs439401) was found to be active in these cells.  (Table S5B). The analysis showed that 26 variants affect regulatory motifs, 11 variants bind regulatory protein, and most of the SNPs were identified as eQTL for the five genes in the APOE region in at least one cell type. Variants rs387976, rs12610605, and rs8105340 (all NECTIN2) were identified as possible eQTL for the RELB, GEMIN7, and PVR genes, respectively, in specific cell types. Six variants (rs1871046, rs157580, rs2075650, rs405509, rs440446, and rs439401) were found to have multiple regulatory features (from 7 to 23) in a variety of tissues.
Most SNPs were also identified as eQTL using GTEx (GTEx Consortium, 2015) for the same protein-coding genes in which they are located (Table S5C). Variant rs11667640 (NECTIN2) was also predicted to be an eQTL for the nearby APOC2 gene. Variant rs283813 (NECTIN2) was predicted as an eQTL for the nearby BCAM gene, which is expressed in skin, and for ZNF155, which is expressed in the putamen region of the brain. The ZNF155 gene maps to a zinc finger gene cluster located on 19q13. Highly significant expression of all of these protein-coding genes was detected in multiple tissues.

FIGURE 5 Heat map for the difference of MAF between AD cases and noncases. The difference was defined as $\Delta MAF = (SNP1_{cases} - SNP1_{noncases}) - (SNP2_{cases} - SNP2_{noncases})$. Numbers 1-3 after SNP IDs indicate patterns shown in Figure 1. Legend in the inset shows color-coded difference in MAF.

| DISCUSSION
Our results provide compelling evidence that AD in humans is associated, in the APOE region, with a highly heterogeneous molecular signature represented by the pattern of the differences in LD structures between AD-affected and unaffected individuals (Figure 2). This signature includes SNPs from all five genes (i.e., it spans the entire region). Remarkably consistent AD signatures were observed in the locus consisting of the TOMM40, APOE, and APOC1 genes (Figure 3). Our results show that the molecular signatures of ADs cannot be explained by differences in the MAF for AD-affected and unaffected individuals. Accordingly, the molecular signatures of ADs are consistent with a cis (haplotype) rather than a single-allele origin of ADs (Jazwinski et al., 2010; Lescai et al., 2011). Whether the molecular signatures of ADs include the APOE e4 allele remains unclear. However, because rs2075650 in Caucasians is typically in modest LD with rs429358 ($r^2 \approx 0.5$), which defines the APOE e4 allele, the signature likely includes the e4 allele. Assuming an evolutionary origin of LD structures in unaffected individuals, these results are consistent with the uniquely human nature of ADs, which are sensitive to the modern environment (Finch, 2012).
Finally, the results of our bioinformatics analysis show that 10 of 30 SNPs in the APOE region are regulatory variants in active expression states in 1 to 63 of the 68 cell types available for analysis, spanning a variety of tissues. Variant rs439401 in the APOE-APOC1 intergenic region, which includes a specific astrocyte enhancer for the APOE gene (Grehan, Tse & Taylor, 2001), is the only variant among the 10 active expression SNPs that is active in NHAs. However, seven other variants exhibit a poised epigenetic signature in NHAs. Astrocytes have important functions in brain development, physiology, and health. They also serve as neural stem cells in the adult brain and have been implicated in various pathologic processes, including ADs (Pekny et al., 2016). Recent data indicate that the function of astrocytes can change from potent pro-inflammatory to potent anti-inflammatory in response to regulatory signals (Sofroniew, 2015). Genes in poised expression states in relevant cell types can be activated by changes in the epigenome later in development or by environmental cues (Creyghton et al., 2010; Murao et al., 2016; Puri et al., 2015). These observations suggest that AD development could be the result of a complex transcriptional regulatory structure modulating regional gene expression (Fitzsimons et al., 2014), a view supported by the clustering of alleles in the molecular signatures identified in the APOE region.
Given the functional role of BCAM, NECTIN2, TOMM40, APOE, and APOC1 genes and the regulatory activity of variants in these genes, the molecular signatures elucidated in the present study could be associated with increased risk of developing an AD by increasing susceptibility to brain infections (Itzhaki et al., 2016;Porcellini, Carbone, Ianni & Licastro, 2010).
Despite the rigor of this study and reliability of our results, there are potential limitations. First, the available data did not allow us to investigate the role of the APOE e2 and e4 alleles in the identified molecular signatures of ADs. Second, we did not compare the molecular signatures of ADs in males and females because of the limited sample size. Third, despite validation of our findings in four independent studies, further replication in larger samples would improve the characterization of the molecular signatures of ADs.
In conclusion, significant and highly heterogeneous molecular signatures of ADs provide unprecedented insight into complex polygenetic predisposition to ADs in the APOE region. These findings are more consistent with a complex haplotype than with a single genetic variant origin of ADs in this region.

| Data availability
This manuscript was prepared using limited-access datasets obtained through dbGaP and the University of Michigan. Phenotypic HRS data are available publicly and through restricted access from http://hrsonline.isr.umich.edu/index.php?p=data.

| Experimental design
We used data from five cohorts (described below) to examine linkage disequilibrium (LD) structure of the APOE region spanning five genes (BCAM, NECTIN2, TOMM40, APOE, and APOC1), using 30 single nucleotide polymorphisms (SNPs). The selected SNPs did not exhibit strong LD ($r^2 < 0.8$) and were directly genotyped in at least two cohorts. We focused on analysis of the LD structure in the entire sample of individuals of Caucasian ancestry (men and women combined) and on comparative analyses of the LD structures in individuals affected and unaffected by Alzheimer's disease (AD).
We excluded individuals with >5% missingness.
To facilitate cross-platform comparisons, we selected directly genotyped target SNPs or their proxies ($r^2 > 0.8$ in the 1000 Genomes Project, CEU population) using all available arrays for each study.

| Association analysis
Associations between ADs and each of the 30 selected SNPs were evaluated using an additive genetic model with the minor allele as an effect allele. Given limited information on AD age at onset in the LOADFS, the associations in this study were characterized using a logistic model with AD as a binary outcome and random effects to adjust for potential familial clustering (gee package in R). Associations in the other studies were evaluated using the Cox proportional haz-

| LD analysis
LD was characterized by the correlation coefficient $r$ using haplotype-based (Weir, 1979) and genotype-based (Zaykin, Meng & Ehm, 2006) methods. Specifically, the haplotype-based method evaluates $r$ as

$$r = \frac{D_{AB}}{\sqrt{p_A (1 - p_A)\, p_B (1 - p_B)}},$$

where $p_i$ ($i = A, B$) are allele frequencies in two SNPs, $D_{AB} = h_1 - p_A p_B$ is the LD coefficient, and $h_1$ is the frequency of a haplotype AB. This method assumes Hardy-Weinberg equilibrium (HWE), which may or may not hold in subsamples and/or at the haplotypic level, even when SNPs in a sample are in HWE (Nielsen, Ehm & Weir, 1998).
Haplotype frequencies were evaluated using an expectation-maximization algorithm (haplo.stats package in R).
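As a minimal sketch of the haplotype-based coefficient, the snippet below computes $r = D_{AB} / \sqrt{p_A(1-p_A)p_B(1-p_B)}$ with $D_{AB} = h_1 - p_A p_B$, the standard form of this estimator. It assumes the haplotype frequency $h_1$ has already been estimated (in the study this was done by expectation-maximization); the function name `haplotype_r` and the numeric inputs are hypothetical.

```python
import math


def haplotype_r(h_ab: float, p_a: float, p_b: float) -> float:
    """Haplotype-based LD correlation from the AB haplotype frequency
    and the two allele frequencies (assumes h_ab is already estimated)."""
    d_ab = h_ab - p_a * p_b  # LD coefficient D_AB
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    if denom == 0:
        raise ValueError("r is undefined for a monomorphic SNP")
    return d_ab / denom


# Hypothetical frequencies: both alleles at 50%, haplotype AB at 40%.
r = haplotype_r(h_ab=0.40, p_a=0.50, p_b=0.50)
print(round(r, 3), round(r ** 2, 3))
```

The sign of $r$ depends on which allele is labeled A or B at each SNP, which is why the labeling must be kept fixed when comparing groups.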
The genotype-based method evaluates $r$ without assuming HWE,

$$r = \frac{\Delta_{AB}}{\sqrt{\left(p_A (1 - p_A) + D_A\right)\left(p_B (1 - p_B) + D_B\right)}},$$

where $\Delta_{AB}$ is the composite LD coefficient, defined as $\Delta_{AB} = h_1 + h'_1 - 2 p_A p_B$, and $h'_1$ is the joint frequency of alleles A and B at two different gametes (Weir & Cockerham, 1979). $D_A$ and $D_B$ are HW disequilibrium coefficients at these SNPs. In the case of HWE, $h'_1 = p_A p_B$, so $\Delta_{AB}$ is an unbiased estimate of the LD parameter $D_{AB}$. Therefore, inequality $\Delta_{AB} \neq D_{AB}$ characterizes deviation from the HWE at the haplotypic level, which otherwise could be difficult to detect (Nielsen et al., 1998).
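The genotype-based (composite) coefficient can be sketched directly from unphased genotypes. The snippet assumes genotypes are coded as counts of the reference allele (0, 1, or 2) and uses the standard composite form, in which the composite LD coefficient is half the covariance of the allele counts and each HW disequilibrium coefficient is the homozygote frequency minus the squared allele frequency. The function name `composite_r` and the toy genotypes are hypothetical.

```python
import math


def composite_r(geno_a, geno_b):
    """Composite LD correlation from allele counts (0/1/2) at two SNPs."""
    n = len(geno_a)
    if n == 0 or n != len(geno_b):
        raise ValueError("genotype vectors must be non-empty and equal length")
    p_a = sum(geno_a) / (2 * n)  # allele frequency at SNP A
    p_b = sum(geno_b) / (2 * n)  # allele frequency at SNP B
    # Composite LD coefficient: half the covariance of allele counts.
    delta_ab = sum(a * b for a, b in zip(geno_a, geno_b)) / (2 * n) - 2 * p_a * p_b
    # HW disequilibrium coefficients: homozygote frequency minus p squared.
    d_a = sum(1 for a in geno_a if a == 2) / n - p_a ** 2
    d_b = sum(1 for b in geno_b if b == 2) / n - p_b ** 2
    denom = math.sqrt((p_a * (1 - p_a) + d_a) * (p_b * (1 - p_b) + d_b))
    if denom == 0:
        raise ValueError("r is undefined for a monomorphic SNP")
    return delta_ab / denom


# Hypothetical genotypes for eight individuals at two SNPs.
snp_a = [0, 1, 2, 1, 0, 2, 1, 1]
snp_b = [0, 1, 2, 0, 0, 2, 1, 2]
print(round(composite_r(snp_a, snp_b), 3))
```

Under this coding the composite coefficient coincides with the Pearson correlation of the two allele-count vectors, which gives a quick way to check an implementation.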
Significance of the $r^2$ estimates was characterized using chi-square statistics, defined as $\chi^2 = r^2 N$, where $N = 2n$ is the number of gametes and $n$ is the sample size (Lewontin, 1988). Given potential loss of power due to inferring haplotypes from genotypes (Wellek & Ziegler, 2009), we used a more conservative estimate, with $n$ instead of $N$.
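The significance test for a single LD estimate can be sketched as follows. The snippet assumes the statistic is referred to a chi-square distribution with one degree of freedom (the usual choice for two biallelic SNPs, not stated explicitly in the text) and uses the identity that the upper tail of that distribution equals `erfc(sqrt(x / 2))`. The function name `ld_p_value` and the inputs are hypothetical.

```python
import math


def ld_p_value(r: float, n_individuals: int, conservative: bool = True) -> float:
    """P-value for H0: r = 0 from chi2 = r^2 * units, assuming 1 df.

    conservative=True uses n individuals; False uses N = 2n gametes.
    """
    units = n_individuals if conservative else 2 * n_individuals
    chi2 = r * r * units
    # Upper tail of a chi-square distribution with one degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical estimate: r = 0.05 in a sample of 5,000 individuals.
print(ld_p_value(0.05, 5000))                      # conservative, uses n
print(ld_p_value(0.05, 5000, conservative=False))  # uses N = 2n
```

Using $n$ rather than $N$ halves the test statistic, so the conservative p-value is always the larger of the two.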
We employed a LD contrast test (Zaykin et al., 2006) to compare the LD patterns between the AD-affected and unaffected groups.
Given a set of $K$ SNPs, we adopted the $Z^2$ statistic, $Z^2 = \mathrm{trace}\left((r_1 - r_0)^T (r_1 - r_0)\right)$, where $r_1$ and $r_0$ are the matrices of the LD correlation coefficients for AD-affected and unaffected individuals, respectively. This statistic was used to characterize the significance of the overall difference in LD patterns between these two groups and the differences in pairwise estimates of LD between the groups.
In the latter case, considering a pair of SNPs, $r_1$ and $r_0$ are simplified into $2 \times 2$ matrices with two off-diagonal coefficients representing the LD coefficient. Then, we have $Z^2 = 2 (r_1 - r_0)^2$. To contrast LD between the AD-affected and unaffected groups, we used a permutation procedure by shuffling the case and noncase labels (Krzanowski, 1993) to obtain an empirical distribution of $Z^2$ under the null hypothesis $r_1 = r_0$, from which a p-value was computed.
Tests contrasting the entire LD patterns of the AD-affected and unaffected groups provide statistics for the association of the pattern of differences in LD between these two groups with ADs, called the molecular signature of ADs. The statistic for a given pair of groups does not require multiple testing correction. Significance of the $r^2$ estimates and the differences in the pairwise estimates of LD should be corrected for multiple testing. In the case of the 30 SNPs examined, this represented 435 ($= 30 \times 29/2$) tests. We adopted a conservative Bonferroni correction for significance, $p < 1.2 \times 10^{-4}$, despite some correlation between these SNPs.
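The permutation procedure for the LD contrast test can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: it uses the genotype-based correlation (Pearson correlation of allele counts) rather than EM-based haplotype frequencies, treats a monomorphic SNP as uncorrelated, and adds one to the numerator and denominator of the empirical p-value. All function names and the toy data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a monomorphic SNP is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def ld_matrix(genotypes):
    """K x K matrix of LD correlations from rows of K allele counts."""
    k = len(genotypes[0])
    cols = [[person[j] for person in genotypes] for j in range(k)]
    return [[1.0 if i == j else pearson(cols[i], cols[j]) for j in range(k)]
            for i in range(k)]


def z2(r1, r0):
    """trace((r1 - r0)^T (r1 - r0)), i.e., the sum of squared differences."""
    return sum((a - b) ** 2
               for row1, row0 in zip(r1, r0) for a, b in zip(row1, row0))


def ld_contrast_test(genotypes, is_case, n_perm=1000, seed=0):
    """Observed Z^2 and its permutation p-value under shuffled labels."""
    rng = random.Random(seed)

    def statistic(labels):
        cases = [g for g, flag in zip(genotypes, labels) if flag]
        noncases = [g for g, flag in zip(genotypes, labels) if not flag]
        return z2(ld_matrix(cases), ld_matrix(noncases))

    observed = statistic(is_case)
    labels = list(is_case)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between labels and genotypes
        if statistic(labels) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data: two SNPs in perfect positive LD in cases, negative in noncases.
cases = [[0, 0], [1, 1], [2, 2], [1, 1]] * 5
noncases = [[0, 2], [1, 1], [2, 0], [1, 1]] * 5
obs, p = ld_contrast_test(cases + noncases, [True] * 20 + [False] * 20, n_perm=200)
print(obs, p)
```

With this add-one convention the smallest attainable p-value is $1/(n_{perm} + 1)$, so the number of permutations sets the resolution of the test.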
Asymptotically valid confidence intervals were constructed using the asymptotic variance adapted from Wellek and Ziegler (2009). This asymptotic variance closely coincided with the exact variance in a sample of n ≥ 60 individuals.

| Functional annotation
Functional features and activity levels of the selected SNPs were annotated using the Ensembl variant effect predictor (McLaren et al., 2010) for 68 cell types. Information on expression quantitative trait loci was obtained from the GTEx pilot analysis, v6 (GTEx Consortium, 2015). Chromatin state and protein binding annotation (Roadmap Epigenomics and ENCODE projects) and the effects of SNPs on regulatory motifs were annotated using HaploReg (Ward & Kellis, 2012) v.4.1 (http://archive.broadinstitute.org/mammals/haploreg/haploreg.php).

ACKNOWLEDGMENTS
This research was supported by Grants No P01 AG043352 and R01 AG047310 from the National Institute on Aging. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The authors thank Arseniy P. Yashkin for help in preparation of phenotypes in HRS.

AUTHOR'S CONTRIBUTION
A.M.K. conceived and designed the experiment and wrote the paper. J.H., J.W., L.H., and Y.L. prepared data, coded statistical tests, and performed statistical analyses. I.C. performed bioinformatics analysis.

CONFLICT OF INTEREST
The authors declare that they have no conflict of interest.