Sex‐ and APOE‐specific genetic risk factors for late‐onset Alzheimer's disease: Evidence from gene–gene interaction of longevity‐related loci

Abstract Advanced age is the largest risk factor for late‐onset Alzheimer's disease (LOAD), a disease in which susceptibility correlates to almost all hallmarks of aging. Shared genetic signatures between LOAD and longevity were frequently hypothesized, likely characterized by distinctive epistatic and pleiotropic interactions. Here, we applied a multidimensional reduction approach to detect gene–gene interactions affecting LOAD in a large dataset of genomic variants harbored by genes in the insulin/IGF1 signaling, DNA repair, and oxidative stress pathways, previously investigated in human longevity. The dataset was generated from a collection of publicly available Genome Wide Association Studies, comprising a total of 2,469 gene variants genotyped in 20,766 subjects of Northwestern European ancestry (11,038 LOAD cases and 9,728 controls). The stratified analysis according to APOE*4 status and sex corroborated evidence that pathways leading to longevity also contribute to LOAD. Among the significantly interacting genes, PTPN1, TXNRD1, and IGF1R were already found enriched in gene–gene interactions affecting survival to old age. Furthermore, interacting variants associated with LOAD in a sex‐ and APOE‐specific way. Indeed, while in APOE*4 female carriers we found several inter‐pathway interactions, no significant epistasis was found in APOE*4 negative females; conversely, in males, significant intra‐ and inter‐pathways epistasis emerged according to APOE*4 status. These findings suggest that interactions of risk factors may drive different trajectories of cognitive aging. Beyond helping to disentangle the genetic architecture of LOAD, such knowledge may improve precision in predicting the risk of dementia and enable effective sex‐ and APOE‐stratified preventive and therapeutic interventions for LOAD.


| INTRODUC TI ON
Alzheimer's disease (AD) is the most prevalent disease among people over 85 years of age in western countries, posing a significant challenge to public health systems around the world. Most AD cases are sporadic or late onset (LOAD; >65 years of age), with biological measures of disease being detectable as early as 20 years before the first cognitive symptoms are observed (Jagust, 2018). Decades of research have shed light on the neuropathological changes happening in the AD brain, and its complex etiology (Long & Holtzman, 2019), characterized by sex differences in several aspects of the disease, including its onset and progression, and the effects of APOE*4 genotype, the strongest common genetic risk factor for LOAD (Nebel et al., 2018). A current challenge is to clarify the contribution of genetic, epigenetic, and environmental factors in the multifactorial nature of LOAD, which shows a heritability of 58%-79%, with a large fraction attributable to the APOE locus. Genetic studies over the last few years, particularly coming from genome-wide association studies (GWAS) and large sequencing projects, have changed the perception of LOAD, highlighting its polygenic nature with multiple susceptibility genetic loci (Andrews et al., 2023). Also, functional genomic analyses pointed out that common LOAD risk variants operate in complex networks of genetic and metabolic interactions, regulated by "hub" genes and "peripheral master regulators" (Gui et al., 2021). In the interactome network of the cell, each variant may show different effects (either in magnitude or in direction) on disease onset, in relation to alleles at other loci (Ridge et al., 2016).
These epistatic effects may contribute substantially to the variation in disease susceptibility; that is, people carrying risk factors for LOAD but resilient to the disease, as well as people carrying the risk allele APOE*4 who live into their 90s without developing dementia.
While advances in technologies and increased availability of multi-omics data have expanded our knowledge about LOAD genetic architecture, many risk variants remain to be identified (Andrews et al., 2023). Such variants may be found in the underlying processes leading to LOAD, such as inflammation (Akiyama et al., 2000), apoptosis (Behl, 2000), stress response (Iatrou et al., 2021), and mitochondrial decay (Kwong et al., 2006). Interestingly, many of these mechanisms are common to human longevity, suggesting that the search of susceptibility factors could be enhanced by testing genes belonging to pathways influencing longevity and survival. As a proof-of-concept alongside APOE, which is the locus consistently associated with longevity across different populations (Abondio et al., 2019), several other LOAD-associated loci were found to affect the human lifespan (Tesi et al., 2021). Moreover, several studies reported LOAD risk variants that are associated with both AD and longevity (Bacalini et al., 2022;Dato et al., 2021;Tesi et al., 2021), emphasizing the importance of studying the pleiotropic effects and epistatic interactions of genetic variants.
Thus, searching for epistatic interactions between variants in genomic regions selected for their association with longevity may help to unravel some of the missing genetic variance of LOAD. To test this hypothesis, we leveraged a large collection of publicly available LOAD GWAS data and conducted gene-gene interaction analysis to find meaningful associations. Genomic regions were prioritized based on previous work by Dato and coworkers (Dato et al., 2018), who analyzed the joint effect on longevity of SNPs belonging to three candidate pathways, the insulin/insulin-like growth factor signaling (IIS), DNA repair, and stress response, respectively. These pathways were chosen as they regulate fundamental biological processes consistently recognized among the most relevant hallmarks of aging from model organisms to humans (Kuningas et al., 2008) and confirmed to have an important role in human longevity in large cohort studies (Deelen et al., 2019).
The analyzed dataset comprised genotypes for a total of 2469 gene variants from 20,766 subjects (11,038 LOAD cases and 9728 controls), belonging to several LOAD cohorts of Northwestern European ancestry. As we stated above growing evidence indicates sex-specific patterns of disease manifestation and sex-dependent effects of APOE on LOAD risk. Yet, beyond APOE*4, other genetic risk factors have been found that display sex-specific effects on LOAD, and the interplay between sex and the APOE allele has been also explored (Fan et al., 2020). In addition, sex-specific DNA methylation changes in LOAD pathology have been observed (Zhang et al., 2021). On the other hand, also the sex differences in genetic associations with longevity are remarkable (Zeng et al., 2018). Based on this evidence, we reasoned that stratifying genetic association analysis by sex and APOE status may facilitate the identification of sex-specific genetic risk loci and ultimately contribute to the understanding of disease heterogeneity between men and women.
Genotyping was performed using various high-density singlenucleotide variant microarrays across cohorts. Participants or their caregivers provided written informed consent in the original studies. The current study protocol was granted an exemption by the Stanford University institutional review board because the analyses were carried out on deidentified, off-the-shelf data.

K E Y W O R D S
Alzheimer's Disease, epistasis, gene-gene interaction, IGF1, longevity, polymorphism TA B L E 1 Demographic information for individuals in the analysis dataset.

| Genotyping data: Harmonization & imputation
The entire dataset includes 35,110 participants. Adopted SNP-array Quality Control (QC) procedures were like the ones previously reported (Napolioni et al., 2021). Subjects with autosomal missingness >5% and/or X-chromosome missingness >5% (compared to other subjects in the same dataset), age below 60 years, age information missing, or phenotype inconsistency [missing phenotype, diagnosis of mild cognitive impairment, or a neurodegenerative phenotype other than LOAD] were excluded from the analysis.
Since most of the samples belonged to the European population, we also determined their ancestry percentage in the discovery sample according to three genetically distinct European sub-ancestries, Northwestern, Southeastern, and Ashkenazi Jewish, using reference populations available from SNPweights v.2.1. European subjects were stratified into the above-mentioned ancestries when their ancestry percentage was > = 50% for any of the three sub-ancestries.
Subjects with genetic ancestry estimates discordant from selfreported ancestry, as well as for subjects showing sex-inconsistency, were excluded from the analyses.
After keeping only the EUR subjects, each GWAS dataset was QCed to remove the SNPs with a call rate ≤95%; Minor Allele Frequency (MAF) ≤1%; SNPs with MAF deviating more than 10% from the MAF reported in 1000 Genomes for the EUR population; SNPs with differential missingness between cases and controls (p < 0.05); SNPs deviating from Hardy-Weinberg Equilibrium (HWE) in controls (p < 5 × 10 −5 ); tri-allelic SNPs; and SNPs where the alleles are mismatched compared to the 1000 Genomes reference sequence. A/T and C/G SNPs were removed prior to imputation.
All the datasets were phased and imputed using the TopMed Imputation Server (Das et al., 2016). After imputation, variants with a r 2 info score ≤0.75 were excluded. For the statistical analyses, inter-dataset duplicates (IBD >0.95) were removed from the dataset having the lowest SNP coverage, while, in case of relatedness (IBD >0.0625) the affected or older subjects were kept, independently of SNP coverage.
For association testing analyses, we selected only the Northwestern European subjects since they represented most of the EUR population (approx. 80%) available across the collected GWAS.

| Selection of candidate genomic regions
Genomic regions harboring the genes selected for longevity in previous work (Dato et al., 2018) were queried. Genetic variants from the candidate regions were extracted from the full dataset (approx. 12 million variants), considering their hg19 genomic coordinates (+/− 10 kilobases from the gene boundaries), and further filtered by applying a MAF cut-off of 0.05 and genotyping rate of 0.95. The final list of variants was generated by Linkage disequilibrium pruning (r 2 > 0.75) to reduce the computational burden of the analysis through the removal of highly correlated variants. Finally, the final dataset comprised a total of 2,469 variants (2,360 SNPs and 109 Ins/ dels), as reported in Table S1.

| Statistical analyses
For all the analyses, the whole sample was split based on sex and the presence/absence of APOE*4, defining four groups: female APOE*4 carriers, male APOE*4 carriers, female APOE*4 non-carriers, and male APOE*4 non-carriers.
After the QC phase, a logistic regression analysis, with an additive model of association, was performed using PLINK 2.0 on the filtered dataset composed of 20,766 subjects (11,038 LOAD cases and 9,728 controls) and 2,469 variants, to estimate single-marker effects on the predisposition to LOAD. Age, three principal components from the ancestry analysis, APOE*2 dosage, and study site were used as covariates in the regression models. Results from the univariate analysis of the four study groups were meta-analyzed using GWAMA (Mägi & Morris, 2010). To determine the statistical significance of all the univariate analysis results, a Bonferroni's multiple testing correction was applied [(0.05/2469 variants*4 study groups)], yielding a p-value threshold of 5 × 10 −6 . A nominal p-value < 0.05 was used as a filter for the main effect estimation and for selecting the variants to include in the gene-gene interaction analysis carried out using Multifactor dimensionality reduction (MDR) (Ritchie et al., 2001).

p-value distribution enrichment analysis was performed using
Pearson's chi-square on two-way contingency tables testing the observed number of variants passing the nominal level of statistical significance (p < 0.05) versus the expected one, deeming as statistically significant a Bonferroni's multiple testing correction of p < 0.013 (0.05/4 study groups) for individual study group or p < 0.05 for the meta-analysis.
To plot single and common variants between groups of samples, Venn diagrams were created with VennDiagram R Package version 1.7.1 (https://cran.r-proje ct.org/web/packa ges/VennD iagra m/index.html).
Gene-based association test was performed by Versatile Genebased Association Study-2 version 2 (VEGAS2, https://vegas2.qimrb ergho fer.edu.au/) (Mishra & Macgregor, 2015) particularly useful for analyzing GWAS summary statistics. Based on the association p-values of the individual variants, VEGAS2 sums the effects of all the variants within a gene and generates a gene-based test statistic by doing simulations of the multivariate normal distribution.
The epistatic interaction of up to four bi-allelic variants was tested using MDR (Ritchie et al., 2001). This methodology estimates high-order interactions among variants, with respect to a given phenotype also when their individual effect is small to moderate, allowing the discovery of multi-loci genotype combinations associated with high or low disease risk. An entropy-based clustering algorithm sets a contingency table for k gene variants and calculates case-control ratios for each of the possible multi-loci genotypes; a genotype combination is considered high-level if it is more present in cases compared to controls. For each factor, the MDR interaction model describes percentage of entropy (information gain or IG) and plots a network of two-way interactions, with positive values of entropy indicating synergistic or non-additive interaction while negative entropy values indicate redundancy between the markers or lack of any synergistic interaction between them. In the network, red and orange connections indicate non-linear interactions, green and brown connections indicate independence or additivity, and blue connections indicate redundancy. For all 2-order, 3-order, or 4-order combinations, the best model is considered the one found more consistent in different replicates (expressed as CV, consistency values and accuracy, i.e., training balanced accuracy level).
To calculate significance, permutation testing was applied, dividing the data set into 10 portions, and using nine portions as a training data set, and the remaining as a testing data set. Ten thousand permutations were performed, to determine a cutoff threshold for an alpha = 0.05 significance level. For each order of interaction tested, an odds ratio (OR) is outputted, referring to the best combination of variants (best model), while determining the multi-locus highrisk combinations.

| Single-variant and gene-based analysis
The flowchart in Figure 1 details the study design. After quality control and filtering, we performed logistic regression analysis on the sample divided in four sub-groups according to sex-and APOE*4 F I G U R E 1 Flowchart describing the steps of the analysis after quality control and filtering. status, and meta-analyzing them thereafter. In Table S2 we reported the variants associated with the disease risk at a nominal p-value, along with their chromosomal location, the assigned gene and the main functional pathway associated with each coded protein. A graphical representation of all the obtained p-values is presented as a Manhattan plot ( Figure S1). These associations, however, did not stand upon correction for multiple comparisons (p < 5 x 10 −6 ) either when considering the single study group or when meta-analyzed across the four study groups. Nonetheless, we observed a statistically significant 45% enrichment (p = 0.002) in the distribution of nominally significant variants (p < 0.05) when the four study groups were meta-analyzed (Table 2), even though no statistically significant enrichment was found when analyzing individually the four study groups. Interestingly, we also detected a statistically significant enrichment from the meta-analysis Cochran's heterogeneity statistic's p-values (p = 0.020, Table 2), supporting the existence of APOE*4 and sex-specific effects.
Within the four study groups, the nominally associated variants belong to 126 different genes, of which 20 are in the IIS, 31 in DNA repair, and 23 in stress response pathways, whereas the remaining 46 are related to other pathways, such as immunity and membrane trafficking (see Table S2). These last genes emerged probably because of a high gene density and/or overlapping genes in some of the genomic regions considered, which extend 10 kb upstream and downstream of the candidate gene boundaries.
More specifically, the variants significantly associated in the sub-groups were: 148 in APOE*4 + females, 157 in APOE*4 + males, 115 in APOE*4 − females, and 136 in APOE*4 − males. The topvariants (p < 0.001) were: rs17810889-C8orf49 and rs5742665-IGF1 in APOE*4 + females, rs56190996-IGF1R and rs8113762-IRGQ in APOE*4 + males, rs28362737-AQP1 and rs35519594-XDH in APOE*4 − females, rs3729587-XPC and rs142270994-CTD-3094 K11.1 in APOE*4 − males. The Venn diagram for the gene variants listed in Table S2 shows the number of variants in each sub-group and those shared ( Figure 2). As it is shown in Table 3, 12 variants were associated with LOAD risk in females, independently from APOE*4 status, five of which showed an opposite direction of effect in the two sub-groups (Table 3a). In males, eight variants were associated with LOAD independently from APOE*4 status, five of them showing a divergent effect in the two sub-groups (  (Table 3d).
For testing the enrichment of associated variants in the same gene, we then performed a gene-based analysis using VEGAS2.

| Analysis of epistatic interactions
With the aim of finding gene-gene epistatic interactions between 2or among 3-and 4-markers associated with the disease risk, we used the MDR approach. Table 4 shows the gene-gene interactions for LOAD resulting from the analysis, while Figure 4 depicts the interaction networks between variants in each sub-group.
In APOE*4 + females, two 2-order epistatic interactions  Table S2) and seems to be a LOAD specific marker of APOE*4 + females, not being associated with disease in the other sub-groups.
No significant epistatic interactions were observed in the group of APOE*4 − females (Figure 4b). The variant rs28362737-AQP1 present in the network is the most associated with LOAD in this group, but it seems to have a univariate effect.
In APOE*4 + males (Figure 4c To determine whether the gene-gene interactions found were exclusive between the four sub-groups, we ran MDR using those selected interactions across all the groups. Notably, we observed that the rs62491484-NEIL2/rs35435718-PGPEP1L epistatic interaction found in APOE*4 + females occurred also in APOE*4 − males (p = 0.028) (Table 4). However, the high-risk genotypic combinations were not comparable ( Figure S2). Similarly, the rs11573680-RAD32B/rs4983559-ZBTB42 interaction found in APOE*4 − males occurred also in the APOE*4 + males (p = 0.038) (Table 4), although with a different pattern of high-risk genotypic combinations ( Figure S2).

| Functional annotation analysis
Next, we performed functional annotation to ascertain biological significance of the variants identified in the single variant and interaction analyses. To this end, several databases and tools F I G U R E 3 Schematic representation of gene-based analysis, reporting the top-genes (p < 0.01) associated with the disease in each different sub-group and those shared. Colors represent the three analyzed pathways.

TA B L E 4
Significant results of gene-gene interaction associations, resulting from the MDR analysis.

0.0003
Note: 2-3-and 4-loci interactions are shown, with respective pathways, selected for CV (10/10 replicates) and training balanced accuracy (>0.5). The OR value of the best combination of variants (best model) is reported. p-values reported in bold are considered statistically significant.
were considered, as reported in the Materials & Methods section. Table S4 reports the most relevant results of eQTL analysis of the associated variants and their proxies in high LD (r 2 > =0.8) According to this analysis some of the risk variants act as cis-eQTL regulatory elements which modulate the expression of the corresponding gene or of nearby genes in specific brain regions. Moreover, evidence of a regulatory role (1f or 1b score in RegulomeDb and association with regulatory elements in SNPnexus) was found for other variants associated with the disease phenotype.
Some variants reported a relevant number of proxies in LD (r 2 ≥ 0.8), although none showed evidence of stronger regulatory potential than the lead variant. As reported in Table S4, the significant variants we identified were not previously implicated in LOAD, yet some of them have been reported to affect some age-related disorders.

| DISCUSS ION
The genetic architecture of LOAD has been widely studied in recent years, and so far, 100 of risk genes and related rare and common genetic variants have been identified, but many remain to be uncovered (Andrews et al., 2023). As advanced age is the greatest risk factor for LOAD, shared genetic pathways between LOAD and longevity are expected, although their connections are still not fully F I G U R E 4 Interaction graphs, reporting the significant markers from MDR analysis, in the group of APOE*4 + females (a), in APOE*4 − females, in APOE*4 + males, and in APOE*4 − males (d). For each variant we reported the value of information gain (IG) in per cent, while numbers in the connections indicate the entropy-based IG for the variant pairs. Red bar and orange bar indicate the high-level synergies on the phenotype, while the brown indicate a medium-level interaction, green and blue connections with negative IG values indicate redundancy or lack of synergistic interactions between the markers.
understood (Tesi et al., 2021). Phenotypic heterogeneity (i.e., distinct groups of subjects present with different clinical syndromes) and/ or temporal heterogeneity (i.e., substantial inter-subject variance in age-at-onset and rate of decline) (Young et al., 2018)  In this study, we applied a gene-gene interaction approach to a set of genetic variants selected for being on or near genes involved in the IIS, the DNA repair, and the oxidative stress response pathways, previously linked to human longevity (Dato et al., 2018 (Misawa et al., 2008). In APOE*4 + males, the top-variants were rs56190996, in an intron of IGF1R (insulin-like growth factor receptor), and rs8113762 located at the 3'-UTR of IRGQ (immunityrelated GTPase Q), while in APOE*4 − subjects, the most significant variant was rs3729587 in the DNA repair gene XPC (Xeroderma pigmentosum, complementation group C). Many of these markers appear to be functionally relevant, being cis-eQTL or trans-eQTL, and specifically in brain regions, as reported in Table S4. However, we cannot exclude that one of their proxies in LD (r 2 ≥ 0.8) could be the actual causal variant, although none showed a potential regulatory effect higher than the associated variants.
It is also worth noting that some of the associated variants showed an opposite effect in the two sexes. This is line with several studies demonstrating that some variants, the so-called "sexually antagonistic variants", have a beneficial effect in one sex but deleterious (or null) effects in the other, thus having a role in shaping differences between males and females in age-related disease outcomes as well as in survival (Iannuzzi et al., 2023;Lagou et al., 2021).
Results from gene-based analysis further highlighted genes associated with LOAD in a sex-and APOE-specific manner. INSR, is the only one gene shared among the four sub-groups of patients, which encodes the insulin receptor, an important component of the insulin pathway. Through its binding with insulin, INSR controls the glucose metabolism in the brain helping to maintain neuronal functioning (de la Monte & Wands, 2005). Decline in glucose metabolism is indeed one of the earliest and most common anomalies observed in patients with LOAD (Akhtar & Sah, 2020). A recent study by Leclerc et al. (2023) reported that, in association with β-amyloid pathology, defects in the activation of INSR at the blood-brain barrier strongly contribute to brain insulin resistance in LOAD. Genetic variants of this gene were found enriched in centenarians, thus indicating INSR a key mediator of human longevity (Barbieri et al., 2003). Overall, these findings suggest that in LOAD as in longevity TXNRD1 and even more IGF1R, may represent hubs interconnecting multiple signaling pathways. The different sub-processes may instead explain the sex-and APOE-specific associations we found.
Taken together, these results suggest that longevity loci may also drive LOAD neuropathology through APOE-and sex-related specific gene-gene interactions. The complex interplay among sex, APOE, and age may influence the severity and the temporal trajectory of LOAD progression, creating a risk profile for LOAD that could serve to identify high-risk individuals (Riedel et al., 2016). Mechanisms underlying the sex differences are unknown; the literature largely supports the claim that it may be due to the known differences in longevity between men and women (Hossin, 2021), while some propose that sex dimorphisms in stress responses can contribute to the increased prevalence of LOAD in women (Yan et al., 2018). Sex divergence in biochemical responses to stress were reported along the hypothalamic-pituitary-adrenal axis and in the activation of the cortical corticotrophin-releasing factor receptor 1 signaling pathway, leading to distinct female-biased increases in molecules associated with LOAD pathogenesis (Yan et al., 2018). In our analysis, effectors of stress response, such as TXNRD1 and GATA4, have been found in all the sub-groups of patients, suggesting that an impaired ability to induce a stress response represents an underlying risk factor for LOAD. Interestingly, in APOE*4 + females, potentially experiencing higher levels of hydroxyl radicals and reduced levels of mitochondrial antioxidants compared to non-APOE*4 carriers (Ihara et al., 2000), all significant epistatic interactions were among genes involved in oxidative stress and those belonging to the other pathways.

| CON CLUS IONS
Collectively, the present results indicate that several loci repeatedly implicated in aging and longevity also contribute to late-onset Alzheimer's disease (LOAD) risk. Most of the top genes associated are assigned to pathways related to metabolism, highlighting their relevance both in the aging process and the pathological events leading to LOAD. Interestingly, this study bolsters the evidence of specific interactions among established risk factors for LOAD, that is APOE genotype and sex, and these genes. This suggests that different trajectories of cognitive aging may be the result of specific epistatic effects between genetic and non-genetic risk factors.
Although our conclusions are based on the evidence of "statistical epistasis", which magnitude and contribution to the variance of complex traits is a highly debated topic (Hivert et al., 2021), in accordance with other authors (Singhal et al., 2023) we think that the potential for "functional epistasis" to drive expressivity and explain clinical heterogeneity in complex diseases is mounting. In the complex scenario of LOAD, understanding the functional changes associated with different combinations of interacting entities (SNPs, genes, pathways, etc.) may help to disentangle the genetic architecture underlying disease development, and interindividual differences that underpin disease, finally leading to precision medicine approaches for early detection of individuals at higher risk for cognitive decline or dementia.