Genetic signature of human longevity in PKC and NF‐κB signaling

Abstract Gene variants associated with longevity are also associated with protection against cognitive decline, dementia and Alzheimer's disease, suggesting that common physiologic pathways act at the interface of longevity and cognitive function. To test the hypothesis that variants in genes implicated in cognitive function may promote exceptional longevity, we performed a comprehensive 3‐stage study to identify functional longevity‐associated variants in ~700 candidate genes in up to 450 centenarians and 500 controls by target capture sequencing analysis. We found an enrichment of longevity‐associated genes in the nPKC and NF‐κB signaling pathways by gene‐based association analyses. Functional analysis of the top three gene variants (NFKBIA, CLU, PRKCH) suggests that non‐coding variants modulate the expression of cognate genes, thereby reducing signaling through the nPKC and NF‐κB. This matches genetic studies in multiple model organisms, suggesting that the evolutionary conservation of reduced PKC and NF‐κB signaling pathways in exceptional longevity may include humans.


1 | INTRODUCTION
The heritability of life expectancy in the general population is estimated to be ~25% (Herskind et al., 1996; McGue et al., 1993), or lower than 10% when inherited sociocultural factors are taken into account (Ruby et al., 2018), but it becomes more substantial after ages 65 and 85 years, at 36% and 40%, respectively (Murabito et al., 2012). Family studies suggest that the genetic component of life expectancy is especially strong in the oldest old such as centenarians, who live 100 years or more (Adams et al., 2008; Barzilai et al., 2001). These studies support the utility of centenarians as a human model system of exceptional longevity and of "decelerated" or "healthy" aging.
Being a centenarian is rare (only 1 in 5000 people lives to 100 years in the United States) despite the recent increase in the life expectancy of the general population. In addition to an extended lifespan, centenarians have an extended healthspan via delaying, surviving, or escaping age-associated diseases including cardiovascular diseases, cancer, and neurodegenerative diseases (Ailshire et al., 2015; Hitt et al., 1999). Thus, exceptional longevity is coupled with exceptional resistance to diseases that lead to earlier mortality in humans.

Individuals with exceptional longevity manifest delayed onset of Alzheimer's disease (AD) and dementia (Kliegel et al., 2004; Perls, 2004). Genetic studies indicate that longevity-associated genes may be protective against cognitive decline, dementia, and AD (Barzilai et al., 2006; Christensen et al., 2006; Dubal et al., 2014; Sanders et al., 2010; Sebastiani et al., 2012). The APOE gene, encoding apolipoprotein E, is a prime example shown to be associated with both AD and longevity (Christensen et al., 2006). Centenarians are depleted of the AD-predisposing APOE ε4 allele, while they are enriched with the AD-protective APOE ε2 allele. A functional longevity-associated allele in the cholesteryl ester transfer protein (CETP) gene, I405V, is also significantly associated with slower memory decline and lower risk for dementia and AD (Barzilai et al., 2006; Sanders et al., 2010).

Longevity-associated KL-VS variants (haplotype including F352V and C370S) in the Klotho gene (KLOTHO) are also associated with protection against cognitive decline (Dubal et al., 2014). A genome-wide association study (GWAS) revealed that variants associated with longevity are most significantly enriched in genes related to AD and dementia. These results suggest that there are common underlying pathways between longevity and resilience to AD and related cognitive decline. In addition, as shown in the APOE gene, different alleles in the same gene can be associated with opposing outcomes to either accelerate cognitive decline and increase neurodegenerative disease risk or promote longevity and healthy aging in the brain. A rare variant (A673T) discovered in the well-known familial AD causal gene, amyloid precursor protein (APP), protects against AD and related cognitive decline (Jonsson et al., 2012). Taken together, genes associated with cognitive function have emerged as interesting candidate genes that might contribute to human longevity.
In this study, we tested the hypothesis that variants in genes implicated in cognitive function may promote exceptional longevity in humans. We performed a comprehensive 3-stage study to identify functional longevity-associated variants in a total of 701 candidate genes by capture sequencing analysis in up to 450 long-lived individuals (95 years of age or older; defined as centenarians in this study) and 500 controls of Ashkenazi Jewish descent, followed by functional studies to ascertain the biological significance of the variants (Figure 1). Here, we report an enrichment of longevity-associated genes in the novel PKC (nPKC) and NF-κB signaling pathways among our candidate genes. Functional analysis of the top 3 longevity-associated gene variants (PRKCH, NFKBIA, CLU) in vitro suggests that non-coding variants modulate the expression of cognate genes, thereby reducing signaling through the nPKC and NF-κB pathways.
Importantly, our hierarchical, multidisciplinary approach led to a genetic signature of human longevity: the tightly connected and highly conserved PKC and NF-κB signaling pathways.
Consistent with our data, decreased PKC and NF-κB signaling promote longevity in model organisms (Curran & Ruvkun, 2007; Spindler et al., 2012; Zhang et al., 2013). The directional parallels between the effects of reduction in PKC and NF-κB signaling on longevity of model organisms and those produced by the human longevity-associated PRKCH, NFKBIA, and CLU regulatory variants are quite compelling. These results suggest that reductions in PKC and NF-κB signaling are evolutionarily conserved longevity mechanisms in humans and thus represent therapeutic targets for extending healthspan and lifespan.
KEYWORDS
centenarian, genetic variant, longevity, NF-κB, PKC, rare variant

2 | RESULTS
2.1 | Discovery of candidate longevity-associated genes and variants by target capture sequencing
GWAS involving long-lived individuals have predicted the presence of rare protective variants with strong effects in centenarians (Deelen et al., 2011, 2014; Nebel et al., 2011; Sebastiani et al., 2012). Indeed, a comprehensive candidate approach has demonstrated that rare functional variants with protective effects are enriched in centenarians (Suh et al., 2008; Tazearslan et al., 2011), suggesting that centenarians may harbor individually rare, but collectively more common genetic variations in candidate longevity genes. Identifying such rare variants requires extensive sequencing analysis of a large centenarian cohort. To ensure identification of all possible longevity-associated variants, including rare variants, we performed a hierarchical cost-effective sequencing analysis of Ashkenazi Jewish centenarians and controls (Figure 1).
In stage 1, we selected 568 candidate genes (Table S1) implicated in cognitive function. These candidate genes included: (a) genes implicated in neurodegeneration such as AD and Parkinson's disease (PD); (b) AD-related genes such as APP and APOE and interacting pathways; (c) genes implicated in cognitive function such as memory formation mechanism and neuronal receptors; and (d) genes implicated in lipid metabolism such as cholesterol synthesis, transport, and metabolism, e.g. APOE and CETP.
We then performed target capture sequencing (Capture-seq) of the 568 candidate genes in 51 centenarians and 51 controls. Target regions of candidate genes included exons, exon-intron junctions, and 2 kb proximal upstream regions to identify both coding and noncoding variants. The Capture-seq analysis identified a total of 13,574 variants in our candidate genes. The variants showed a similar distribution between centenarians and controls across all genomic regions (Table S2). The vast majority of the variants (74.9%, 10,161 out of 13,574) occurred in non-coding regions with about 20% being previously unreported novel variants.
To identify candidate longevity-associated genes enriched with rare variants, we performed a gene-based association study that considered aggregates of rare variants in a gene region using SKAT analysis (Wu et al., 2011). We identified candidate longevity-associated genes that surpass the threshold of nominal p-value < 0.05 (Table S3). We further performed Ingenuity Pathway Analysis (IPA) to identify pathways that were enriched in significant genes (p < 0.05) from the SKAT analysis. We found that almost all of the top enriched pathways (p < 0.05) contained three protein kinase C (PKC) family genes (PRKCB, PRKCH, PRKCI) (Table S4).

2.2 | Prioritization of candidate longevity-associated genes by genotyping
We performed Fisher's exact test to identify variants associated with longevity. A total of 457 variants were significantly associated with longevity (p < 0.05), among which the most significant variant was rs753381 in the PLCG1 gene (p-value: 0.0003, Table S5). The majority of variants were rare, with 58.8% (7,984 out of 13,574 variants) having a minor allele frequency (MAF) of less than 0.05. We then selected variants for genotyping in a larger number of individuals, i.e. 474 centenarians and 551 controls, to prioritize longevity-associated genes, based on the following criteria: (a) association p-values (p < 0.05); (b) rare variants (MAF < 0.05) enriched in either centenarians or controls; and (c) potentially functional variants including non-synonymous, non-sense and frameshift variants with MAF difference between the two groups.
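As an illustration of the single-variant test described above, the following minimal Python sketch computes a two-sided Fisher's exact p-value from a 2 × 2 table of allele counts, together with the minor allele frequency used for the rare-variant cutoff (MAF < 0.05). The allele counts are hypothetical and are not taken from Table S5, the function names are illustrative, and testing on allele counts rather than genotype counts is an assumption, since the text does not specify how the tables were built.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def minor_allele_frequency(alt_alleles, total_alleles):
    """Frequency of the less common allele at a biallelic site."""
    freq = alt_alleles / total_alleles
    return min(freq, 1.0 - freq)


# Hypothetical counts for one variant in 51 centenarians and 51 controls
# (2 alleles per person, so 102 alleles per group).
cent_alt, ctrl_alt, alleles_per_group = 9, 1, 102
p_value = fisher_exact_two_sided(
    cent_alt, alleles_per_group - cent_alt,
    ctrl_alt, alleles_per_group - ctrl_alt,
)
maf = minor_allele_frequency(cent_alt + ctrl_alt, 2 * alleles_per_group)
print(f"p = {p_value:.4g}, MAF = {maf:.3f}, rare = {maf < 0.05}")
```

The same function applies unchanged to the larger genotyping cohort; only the counts in the table change.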
We successfully genotyped a total of 222 variants using Sequenom MassARRAY iPLEX assays (Section 4). A total of 23 variants were significantly associated with longevity (nominal p-value < 0.05), and 12 variants were enriched in centenarians as compared to controls (Tables S6 and S7). Notably, among the 12 variants, 8 variants were in genes involved in the PKC signaling pathway, including the phospholipase C (PLC) family, Ca2+/calmodulin-dependent protein kinase (CaMK), and EGF Receptor (EGFR). Together with the results from the SKAT analysis and pathway analysis (Tables S3 and S4), this result led us to focus on PKC family genes and genes involved in the pathways that interact with PKC such as PLC and CaMK signaling as top longevity-associated genes among our initial 568 candidate genes implicated in cognitive function.
FIGURE 1 Workflow of the genetic study to discover longevity-associated genes with functional variants in candidate genes. A 3-stage study design was used for genetic discovery. Stage 1, with 568 genes, identified candidate genes in PKC and interacting pathways for Stage 2. Stage 2 included 217 genes and identified 22 longevity-associated genes by Capture-seq in a larger population. In Stage 3, the top functionally important pathways enriched with longevity-associated genes were selected. The impact of non-coding variants was studied in silico and in vitro to predict the impact on gene expression and therefore the impact on signaling. The mean age with standard deviation of each group is indicated.

2.3 | A second stage Capture-seq analysis focused on PKC and PKC-interacting pathway genes
In Stage 2, we selected a comprehensive list of 217 genes (Table S8) acting in the PKC and PKC-interacting pathways as candidate genes and (e) AD- and PD-associated genes from GWAS. We also included significant genes (p < 0.05) from the Stage 1 Capture-seq study that were not in the PKC-interacting pathways.
We performed the Stage 2 Capture-seq analysis of the 217 genes in 450 centenarians and 500 controls. A total of 23,625 variants were discovered, among which 564 variants were significantly associated with longevity (p < 0.05) and the most significant variant was rs1092331 in the PRKCH gene (p-value: 0.0001, Table S9).
To determine if any sub-pathways of our PKC pathway-centric candidate genes were enriched with longevity-associated genes, we first sub-categorized all candidate genes and then analyzed the enrichment ratio of longevity-associated genes detected by SKAT analysis in each category (Figure 2a,b and Table S11). We found significant enrichment of longevity-associated genes in two subpathways (p < 0.05), the novel PKC family (p = 0.024) and the NF-κB complex (p = 0.024) (Figure 2a,b).
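The sub-pathway enrichment step can be sketched as a hypergeometric tail probability, which is the one-sided form of Fisher's exact test. In the sketch below, the totals of 217 candidate genes and 22 longevity-associated genes come from the text, whereas the category sizes and hit counts are hypothetical placeholders and the one-sided form is an assumption; the actual counts per sub-pathway are those reported in Table S11.

```python
from math import comb, log10


def enrichment_p(total_genes, total_hits, category_size, category_hits):
    """P(X >= category_hits) when category_size genes are drawn from
    total_genes, of which total_hits are longevity-associated."""
    denom = comb(total_genes, category_size)
    upper = min(category_size, total_hits)
    tail = sum(
        comb(total_hits, k) * comb(total_genes - total_hits, category_size - k)
        for k in range(category_hits, upper + 1)
    )
    return tail / denom


# 217 candidate genes and 22 longevity-associated genes are from the text;
# the per-category sizes and hit counts are hypothetical placeholders.
TOTAL_GENES, TOTAL_HITS = 217, 22
categories = {"category_A": (5, 3), "category_B": (30, 3)}

for name, (size, hits) in categories.items():
    p = enrichment_p(TOTAL_GENES, TOTAL_HITS, size, hits)
    print(f"{name}: p = {p:.4g}, -log10(p) = {-log10(p):.2f}")
```

With the same number of hits, a small category yields a much smaller p-value than a large one, which is why compact sub-pathways can stand out in this kind of analysis.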

2.4 | In silico analysis to identify potentially functional variants in longevity-associated genes
In Stage 3, we examined the functional relevance of the variants in the longevity-associated genes in the PKC and PKC-interacting pathways. For this purpose, we first conducted in silico analysis to prioritize potentially functional variants. For coding variants, we selected variants predicted to affect protein function and/or structure, including non-synonymous variants, stop-gain or stop-loss variants, frameshift variants, and splicing variants. For non-coding variants, we used RegulomeDB (http://regulomedb.org/) to identify potentially regulatory variants. For 3′ UTR variants, we took advantage of the starBase database (http://starbase.sysu.edu.cn/), which provides experimental information on the binding of RNA-binding proteins such as Argonaute (Ago) from CLIP-seq analysis. Thus, the genomic locations of RNA-binding proteins in the 3′ UTR were used to predict microRNA-binding sites based on experimental results (Section 4).
Using the predicted functional variants, we performed multiple SKAT analyses of all variants, coding variants, regulatory variants, or combined putative functional variants of the 22 longevity-associated genes (Table 1). Along with the SKAT analysis, which gives more weight to rare variants and considers both directions of enrichment in the two groups, we also performed the SKAT-O and SKAT-C analyses, which consider the directionality of rare variant enrichment in either group and the effects of both common and rare variants, respectively.
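To make the rare-variant weighting and the direction-insensitivity of SKAT concrete, the sketch below computes the SKAT score statistic Q = Σ_j w_j S_j², with S_j = Σ_i g_ij (y_i − ȳ), for a binary phenotype without covariates, following the formulation of Wu et al. (2011). It assumes the Beta(1, 25) weights that are the default of the SKAT R package; the text does not state which weights were used. The p-value, which requires the null distribution of Q (a mixture of chi-square distributions), is deliberately left to the R package, and the genotypes are toy data.

```python
def beta_1_25_density(maf):
    """Beta(1, 25) density, 25 * (1 - maf)**24, which up-weights rare variants."""
    return 25.0 * (1.0 - maf) ** 24


def skat_q(genotypes, phenotypes):
    """SKAT score statistic for one gene.

    genotypes: one row per individual, one minor-allele count (0/1/2) per variant.
    phenotypes: 1 for centenarian, 0 for control.
    """
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    q = 0.0
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        maf = sum(column) / (2 * n)
        weight = beta_1_25_density(maf) ** 2  # w_j = Beta(MAF_j; 1, 25)^2
        score = sum(g * (y - mean_y) for g, y in zip(column, phenotypes))
        q += weight * score ** 2  # squaring removes the direction of effect
    return q


# Toy data: 3 centenarians, 3 controls, 2 variants.
phenotypes = [1, 1, 1, 0, 0, 0]
enriched_in_centenarians = [[1, 0], [1, 0], [0, 0], [0, 0], [0, 1], [0, 0]]
enriched_in_controls = [[0, 0], [0, 1], [0, 0], [1, 0], [1, 0], [0, 0]]
print(skat_q(enriched_in_centenarians, phenotypes))
print(skat_q(enriched_in_controls, phenotypes))
```

The two toy genes mirror each other between groups and give the same Q, which illustrates why a significant SKAT result alone does not indicate whether variants are enriched in centenarians or in controls.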
Most of the 22 longevity-associated genes detected by SKAT (SKAT-all) also showed significant association with longevity when only predicted functional variants were analyzed by SKAT analysis and p50 (NFKB1) and the repressor IκBα (NFKBIA). In addition, the CLU gene that encodes clusterin/ApoJ protein showed overlapping interacting proteins with ApoE protein, alleles of which have been shown to associate with longevity and AD.
We also performed IPA canonical pathway enrichment analysis to determine which functional pathways are enriched in longevity-associated genes using a total of 217 candidate genes that we used for Stage 2 Capture-seq analyses (Table S8). The most sig-  (Table S12).
TABLE 1 Longevity-associated genes from the Stage 2 Capture-seq in 450 centenarians and 500 controls (p < 0.05)

2.5 | Functional analysis of regulatory variants in longevity-associated genes in cell culture models
Recent studies have shown that the vast majority (>95%) of the variants detected by GWAS are non-coding variants and 65% of non-coding GWAS-associated variants occur in enhancer regions (Hnisz et al., 2013), suggesting that gene regulatory changes contribute to inter-individual differences in genetic risk. Similarly, most of the variants detected by the Stage 2 Capture-seq analyses as well as those in the longevity-associated genes (SKAT p < 0.05) were located in non-coding regions of the genome (Table S10).
FIGURE 2 Sub-pathways of PKC and PKC-interacting pathways enriched with longevity-associated genes from the Stage 2 Capture-seq analyses. (a) The 217 sub-pathway genes included in the Stage 2 Capture-seq were categorized, and the number of longevity-associated genes (SKAT p < 0.05) (red bar) and non-significant genes (blue bar) in each sub-pathway is shown. Enrichment analysis of longevity-associated genes in each sub-pathway was performed by Fisher's exact test and -log (p-value) is indicated (black dot) on the graph. The dotted line indicates the threshold of significance of the enrichment analysis (p = 0.05). (b) The pathway map indicates the PKC and PKC-interacting genes and sub-pathways. Longevity-associated genes (red boxes, SKAT p < 0.05) are indicated below the sub-pathways (black boxes). Blue circles indicate the top sub-pathways enriched with longevity-associated genes (Figure 2a). Gray boxes indicate sub-pathways without longevity-associated genes.
To ascertain the biological significance of the longevity-associated gene variants, we performed functional analysis of non-coding variants with predicted regulatory potential using immune cell lines that may be functionally relevant based on the pathway analysis (Table S12). We selected non-coding variants in the predicted regulatory regions of the two longevity-associated genes PRKCH and NFKBIA (Table 2). We also included non-coding variants in the longevity-associated CLU gene (Tables 1 and 2) because of its known function in regulation of the PKC pathway through inhibition of the NF-κB pathway (Santilli et al., 2003).
Due to extended linkage disequilibrium (LD) in the genome, multiple variants in an associated region can be identified as significant, and hence identifying truly causal variants is highly challenging, especially for non-coding variants. For example, while an intronic variant, rs1092331, was the most significant variant (p = 0.0001) in the PRKCH gene by single variant association analysis (Table 2), it has multiple variants in LD (R² = 1, Figure 4a). We selected 5 candidate genomic regions (E1-E5) harboring longevity-associated variants (V2-V6) or variants (V1, V7, and V8) in high LD that map to predicted enhancers based on well-known histone marks (H3K4me1, H3K27ac) as well as a DNase I hypersensitive site (DHS) in the UCSC genome browser (Figure 4a).

FIGURE 3 Gene-based association studies with predicted functional variants and protein-protein interaction networks for the 22 longevity-associated genes. (a) The heatmap represents the significance of each gene-based association analysis that includes SKAT, SKAT-O, and SKAT-C. Each analysis used all variants (-all), predicted functional coding variants (-coding), predicted functional regulatory variants (-regulatory), or combined predicted functional coding and regulatory variants (-combined). The genes were from the Stage 2 Capture-seq analyses found to be longevity-associated (SKAT p < 0.05, Table 1). Black dotted lines divide each SKAT, SKAT-O, and SKAT-C analysis. Gray indicates the absence of variants. (b) The network was generated with the 22 longevity-associated genes (SKAT p < 0.05) from the Stage 2 Capture-seq analyses. The color of circles indicates the significance from SKAT analysis and the size of circles indicates the significance of SKAT analysis with predicted functional variants. The direct interactions between proteins are indicated by the thick green lines. The red circles represent the top enriched pathways including NF-κB and immune response (left) and PLC, PKC, EGF receptor, and downstream factors (right). Light blue medium-sized circles indicate the proteins included in our candidate gene list without SKAT significance. Green circles indicate the ApoE protein, alleles of which have been associated with longevity and AD in multiple human genetic

3 | DISCUSSION
The ultimate goal of this study was to establish if genes implicated in maintaining cognitive function into old age also potentially impact human longevity. From the outset, this study focused on identification of rare, functional variants based on the results from previous GWAS that clearly suggested the presence of rare, protective variants enriched in long-lived individuals (Deelen et al., 2011, 2014; Nebel et al., 2011; Sebastiani et al., 2012). Our main hypothesis was that rare functional variants in genes implicated in cognitive function may contribute to human longevity. To test this hypothesis, we conducted a 3-stage systematic study (Figure 1). In Stage 1 and 2 Capture-seq analysis, we identified 22 longevity-associated genes (Table 1), which were found to be enriched in the nPKC family and the NF-κB complex by sub-pathway analysis (Figure 2a,b and Table S11). By integrating in silico analysis and in vitro reporter assays in Stage 3, we identified functional regulatory variants in three longevity-associated genes, PRKCH, NFKBIA, and CLU, that modulated the expression of the cognate genes. The directionality of the effects on gene expression conferred by the longevity-associated variants suggested that these variants decrease signaling through the nPKC and NF-κB pathways, thereby contributing to the longevity phenotype.
TABLE 2 Candidate regulatory variants in top prioritized longevity-associated pathway genes for in vitro functional study
Our study represents by far the largest comprehensive genetic analysis of candidate genes, involving 474 centenarians and 551 controls, in search of longevity-associated genes and variants. It should be noted that defining and selecting control individuals pose a unique challenge in any case-control study of human longevity (Sebastiani et al., 2017). This issue becomes more challenging in the identification of rare variants associated with longevity. We have been tackling this critical issue through a strategic study design. First, we have been studying Ashkenazi Jewish centenarians and controls, a genetically isolated population that offers more power for identifying rare causal variants, not only for Mendelian disease genes but also for complex traits such as Type 2 Diabetes (T2D) (Steinthorsdottir et al., 2014). Second, we have performed functional analyses to confirm the genetic and biological significance of top variants.
FIGURE 4 Functional analysis of the longevity-associated regulatory variants in the PRKCH gene. (a) An overview of the PRKCH gene region displayed in the UCSC genome browser. Red arrows indicate the longevity-associated variants and green arrows indicate the variants in high LD. E1 to E5 indicate the putative enhancer regions harboring the variants to be investigated in our reporter assays. (b) A basic design of enhancer reporter constructs. (c) Enhancer reporter assays using E1 to E5 constructs along with a vector as a baseline control in the U-87 human glioblastoma cell line. The y-axis indicates relative fold changes in reporter activity of the E1 to E5 constructs harboring either wild-type (WT) or the longevity-associated variants (n = 3). * indicates a p-value less than 0.05 by t-test.
In the Stage 1 discovery genetic study, single variant association and gene-based association studies with a moderate sample size could detect some genetic signatures at the pathway level. However, it was critical to validate the association in a large number of individuals because only a small portion of significant variants and genes were validated: 6.35% (4 out of 63) of significant variants and 8.7% (2 out of 23) of genes in the Stage 1 Capture-seq remained significant in genotyping and Stage 2 Capture-seq analysis, respectively (Table S7 and Table 1). The Stage 2 Capture-seq analysis identified many non-coding region variants, which were mainly intronic (50.96%) and in the upstream region (13.36%), as well as exonic region variants (10.38%). Compared to all variants, a higher proportion of longevity-associated variants were identified in intronic (53.90%) and upstream (16.13%) regions, and a lower proportion in the exonic region (9.04%) (Table S10).
FIGURE 5 Functional analysis of the longevity-associated regulatory variants in the CLU gene. (a) An overview of the CLU gene region displayed in the UCSC genome browser. Red arrows indicate the longevity-associated variants and blue arrowheads indicate the variants associated with risk of AD from GWAS. E1 and E2 indicate the putative enhancer regions harboring the variants to be investigated in reporter assays. (b) The design of the enhancer reporter constructs. (c) Enhancer reporter assays using E1 and E2 constructs along with a vector as a baseline control in the U-87 human glioblastoma cell line. The y-axis indicates relative fold changes in reporter activity of the E1 and E2 constructs harboring either wild-type (WT) or the longevity-associated variants (n = 3). * indicates a p-value less than 0.05 by t-test.
This suggests that variants in non-coding regions such as upstream and intronic regions may play an important role in longevity by regulating the expression of their cognate genes, as was implicated in GWAS (Hnisz et al., 2013).
RegulomeDB analysis identified that 4 out of 10 variants had scores equal to or less than 5, meeting the criterion used for the SKAT analysis with functional regulatory variants. In studies using cell models, all 4 predicted functional variants affected luciferase reporter activity while the other variants did not change the activity (Figures 4c, 5c, and 6c), suggesting that functional regulatory variants were predicted well. The longevity-associated CLU variants are close to variants associated with decreased risk of AD detected by multiple GWAS (Harold et al., 2009; Lambert et al., 2009) (Figure 5a).
An AD-associated variant, rs11136000, in the CLU gene was shown to be correlated with reduced incidence of AD as well as increased expression of the CLU1 isoform in the human brain (Ling et al., 2012).
Thus, we hypothesize that the longevity-associated intronic variant contributes to longevity and healthy cognitive aging by increasing CLU gene expression.
In this study, we identified a genetic signature of decreased signaling of the PKC and NF-κB pathways that may contribute to human longevity. Interestingly, animal model studies showed that decreased signaling of the PKC and NF-κB pathways increased lifespan. Decreased activity of the PKC pathway (Spindler et al., 2012) and of the novel PKC pathway, including PKCη (PRKCH) (Monje et al., 2011), PKCδ (PRKCD) (Curran & Ruvkun, 2007), and PKD3 (PRKD3) (Feng et al., 2007), has been implicated in longevity in model organisms. Negatively affecting expression of PLC family genes, which are directly upstream of PKC, is also reported to increase lifespan, especially PLCβ (Kawli et al., 2010) and PLCγ with EGF receptor (Iwasa et al., 2010). In agreement with the animal studies, PLCB1, PLCG2, and EGFR were found to be associated with longevity in our study. NF-κB complex genes act downstream of PKC. In our study, we identified the 3 major components of the NF-κB complex, NFKBIA (IκBα), NFKB1 (p50), and RELA (p65), among the top 22 longevity-associated genes. In model organisms, inhibition of NF-κB activity delayed aging and increased lifespan, while enhanced activity accelerated aging (Kawahara et al., 2009; Tilstra et al., 2012; Zhang et al., 2013). Deficiency of the CLU gene product in aged mice increases the severity of immune response-mediated myocarditis in the heart (McLaughlin et al., 2000) and glomerulopathy (Rosenberg et al., 2002), two age-related diseases. Indeed, in Drosophila, overexpression of human clusterin (CLU) increased stress resistance and extended lifespan (Lee et al., 2012). This suggests that this conserved pathway may influence longevity in humans, analogous to what has been demonstrated in model organisms.
It is possible that the genes and signals we identified are population-specific and may need validation in different human populations (Franceschi et al., 2020). However, our data suggest that functional variants with the same directional impact on the evolutionarily conserved genes as the variants found in our study may play a similar role in different human populations, providing evidence for therapeutic modulation.
With the identification of rare variants enriched in long-lived individuals, there is now a pressing need to functionally validate the variants to understand the mechanisms by which these variants contribute to longevity in humans. An integrated analysis using experimental and computational approaches in parallel will help elucidate the molecular mechanisms of functional variants, which can then be further tested in cell and animal models (Zhang et al., 2020). The longevity-associated variants in the context of evolutionary conservation provide unique opportunities to translate genetic discoveries to therapeutic modulation.
In summary, by taking a hierarchical, comprehensive candidate approach, we found a genetic signature of human longevity in the hyperconnected PKC and NF-κB pathways among genes implicated in cognitive function. Our results suggest that reduction in PKC and NF-κB signaling may promote longevity in humans as has been observed in model organisms. Further studies will ultimately reveal the novel role of PKC and NF-κB signaling in longevity and maintenance of cognitive function in human populations and provide important mechanistic insights into the molecular basis of aging.

| Study subjects and sample collection
Our study group consisted of 474 Ashkenazi Jewish (AJ) centenarians and 551 AJ controls that were collected as part of a Longevity

| Rare variation association analysis
We used an R package called "SKAT" for SKAT, SKAT-O, and SKAT-C analysis as methods for rare variant association from both the Stage

| Selection of predicted functional regulatory variants
As the criteria for selecting predicted functional regulatory variants, we selected variants that have a RegulomeDB score equal to or less than 5, which overlap with at least one transcription factor binding site or DHS. In addition, if at least one experiment reported Ago protein binding in a region overlapping a variant, we selected that variant as predicted to be functional in the 3′ UTR region.
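The two selection criteria above can be expressed as a simple filter. The sketch below is only an illustration: the record layout, field names, and variant identifiers are hypothetical, and the score strings assume the RegulomeDB convention of a leading integer category optionally followed by a letter (for example "2b").

```python
def regulome_rank(score):
    """Leading integer of a RegulomeDB score such as '1f', '2b', or '5'."""
    return int(score[0])


def is_predicted_functional(variant):
    """Apply the two selection criteria described in the text."""
    score = variant.get("regulome_score")
    if score is not None and regulome_rank(score) <= 5:
        return True
    # 3' UTR variants: at least one CLIP-seq experiment with Ago binding
    return variant["region"] == "3UTR" and variant.get("ago_experiments", 0) >= 1


# Hypothetical records; identifiers and annotations are placeholders.
variants = [
    {"id": "var1", "region": "intron", "regulome_score": "2b"},
    {"id": "var2", "region": "upstream", "regulome_score": "6"},
    {"id": "var3", "region": "3UTR", "regulome_score": "7", "ago_experiments": 2},
    {"id": "var4", "region": "3UTR", "regulome_score": None, "ago_experiments": 0},
]
selected = [v["id"] for v in variants if is_predicted_functional(v)]
print(selected)
```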

| Cell culture condition
Human glioblastoma cell line U-87 MG and human monocyte cell line THP-1 were purchased from American Type Culture Collection (ATCC). U-87 MG cells were maintained in Dulbecco's modified Eagle medium (DMEM), and THP-1 cells were maintained in RPMI. Both media were supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin, and cells were kept at 37°C in a humidified 5% CO2 atmosphere.

| Generation of luciferase reporter constructs and assay
Genomic DNA from individuals who harbored heterozygous PRKCH or CLU intronic variants, as well as reference sequences, was used to amplify the intronic regions by PCR. PCR was performed using Phusion high fidelity DNA polymerase (New England BioLabs, Inc., Massachusetts).
The amplified products were cloned into the pGL3 vector with the SV40 promoter (Promega), and the sequence of each construct was verified. For NFKBIA upstream variants, genomic DNA of an individual harboring the reference sequence was used for PCR amplification and cloned into the promoter-less pGL4 vector (Promega). Mutagenesis was performed to generate constructs with the desired variants using the QuikChange II XL Site-Directed Mutagenesis Kit (Agilent).
For functional studies of the variants in the PRKCH or CLU genes, 1 × 10⁵ U-87 MG cells were cultured on 24-well plates 24 hours prior to transfection, and transfected with 300 ng of plasmids consisting of luciferase reporter plasmids and pRL-TK (Promega) in a ratio of 5:1.
For functional studies of NFKBIA gene variants, 2 × 10⁴ THP-1 cells were cultured on 96-well plates 24 hours prior to transfection, and transfected with 110 ng of plasmids consisting of luciferase reporter plasmids and pRL-CMV (Promega) in a ratio of 10:1. Transfections were performed using X-tremeGENE HP transfection reagents (Roche) according to the manufacturer's instructions. Twenty-four hours after transfection, cells were harvested and the luciferase activities were measured using a Dual-Luciferase Reporter Assay System (Promega) on a microplate reader (BioTek). Enhancer and promoter activities were normalized with Renilla luciferase activities.
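The normalization step can be illustrated with a short sketch: each firefly reading is divided by the Renilla reading from the same well, and the result is expressed as a fold change relative to the mean of the baseline vector wells, as plotted in the reporter assay figures. All readings below are hypothetical, the function names are illustrative, and the t-test used for significance is not reproduced here.

```python
from statistics import mean


def normalize(firefly, renilla):
    """Firefly signal divided by the co-transfected Renilla signal, per well."""
    return [f / r for f, r in zip(firefly, renilla)]


def fold_change(sample, baseline):
    """Normalized activity relative to the mean of the baseline wells."""
    base = mean(baseline)
    return [x / base for x in sample]


# Hypothetical raw luminescence readings for three replicate wells (n = 3).
vector = normalize([1000, 1100, 900], [500, 550, 450])
wild_type = normalize([4200, 3900, 4100], [520, 480, 500])
variant = normalize([2500, 2700, 2600], [510, 490, 530])

for name, values in [("WT", wild_type), ("variant", variant)]:
    print(name, [round(x, 2) for x in fold_change(values, vector)])
```

Dividing by the Renilla signal corrects for well-to-well differences in transfection efficiency and cell number before wild-type and variant constructs are compared.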

| Protein-protein interaction network generation
We used the Human Protein Reference Database (HPRD, release 9), and High-quality INTeractomes (HINT) as the background network (with all self-interactions ignored) and our selected top genes as the "seeds". To generate the sub-network using Cytoscape program, we searched for all shortest paths between every pair of the "seeds". The resultant shortest paths together constitute the subnetwork. To make the sub-network simple and informative, we ignored any shortest paths longer than the smallest shortest path length with which all the selected top genes would be included in the sub-network.
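The sub-network construction described above can be sketched with a breadth-first search over an undirected background network. This is one reading of the procedure, not the Cytoscape workflow itself: all shortest paths between every pair of seeds are collected, and the path-length cutoff is raised until every seed appears in the retained paths. The toy network uses placeholder node names and does not represent real interactions from HPRD or HINT.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets; self-interactions are ignored."""
    graph = {}
    for a, b in edges:
        if a != b:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def all_shortest_paths(graph, source, target):
    """Every shortest path from source to target, found by breadth-first search."""
    dist, parents = {source: 0}, {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                parents[nb] = [node]
                queue.append(nb)
            elif dist[nb] == dist[node] + 1:
                parents[nb].append(node)
    if target not in dist:
        return []
    paths = []

    def walk(node, suffix):
        if node == source:
            paths.append([source] + suffix)
        else:
            for parent in parents[node]:
                walk(parent, [node] + suffix)

    walk(target, [])
    return paths


def seed_subnetwork(graph, seeds):
    """Smallest path-length cutoff that connects all seeds, and the kept edges."""
    paths = [p for a, b in combinations(seeds, 2)
             for p in all_shortest_paths(graph, a, b)]
    for cutoff in sorted({len(p) - 1 for p in paths}):
        kept = [p for p in paths if len(p) - 1 <= cutoff]
        if set(seeds) <= {n for p in kept for n in p}:
            return cutoff, {frozenset(e) for p in kept for e in zip(p, p[1:])}
    return None, set()


# Toy background network with placeholder node names (not real interactions).
graph = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "D")])
cutoff, edges = seed_subnetwork(graph, ["A", "C", "D"])
print(cutoff, sorted(tuple(sorted(e)) for e in edges))
```

In the toy example the seed pair joined by the longest shortest path is dropped once a shorter cutoff already connects all seeds, which keeps the resulting sub-network compact.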

ACKNOWLEDGEMENTS
We would like to thank Genomics Shared Facility at Albert Einstein

CONFLICT OF INTEREST
The authors declare that they have no competing interests.

AUTHOR CONTRIBUTIONS
NJS and YS conceived and designed the experiments. SR and JH performed the experiments. SR, JH, TNK, QZ, SL, ZZ analyzed the data.
GA and NB collected and provided AJ samples. SR, LJN, PDR, and YS contributed to the writing of the manuscript. All authors read and approved the final manuscript.

DATA AVAILABILITY STATEMENT
Sequencing data that support the findings of this study have been deposited in NCBI Sequence Read Archive (PRJNA669033, PRJNA669034, and PRJNA669037).