Current allele distribution of the human longevity gene APOE in Europe can mainly be explained by ancient admixture

Abstract Variation in apolipoprotein E (APOE) has been shown to have the strongest genetic effect on human longevity. The aim of this study was to unravel the evolutionary history of the three major APOE alleles in Europe by analysing ancient samples up to 12,000 years old. We detected significant allele frequency shifts between populations and over time. Our analyses indicated that selection led to large frequency differences between the earliest European populations (i.e., hunter‐gatherers vs. first farmers), possibly due to changes in diet/lifestyle. In contrast, the allele distributions in populations from ~4000 BCE onward can mainly be explained by admixture, suggesting that it also played an important role in shaping current APOE variation. In any case, the resulting allele frequencies strongly influence the predisposition for longevity today, likely as a consequence of past adaptations and demographic processes.


| INTRODUCTION
Apolipoprotein E (APOE) was the first locus shown to be involved in human longevity and is still the genetic factor with the strongest effect on the phenotype (Deelen et al., 2019; Nebel et al., 2011).
Two coding polymorphisms (rs429358-T/C and rs7412-C/T) define the three haplotypes ε2, ε3, and ε4, which are commonly referred to as alleles. These alleles strongly influence the development of age-related neurodegenerative and cardiovascular disorders. For both disease classes, ε4 is deleterious compared with ε3 (Mahley, 2016). It is responsible for most of the genetic susceptibility to sporadic late-onset Alzheimer's disease (LOAD) (Cruts & van Broeckhoven, 1998).
The detrimental effect of ε4 is reflected in a low odds ratio of becoming long-lived (OR = 0.60 and OR = 0.52 for 90th and 99th percentile survivors, respectively) (Deelen et al., 2019). In contrast, ε2 is associated with a reduced risk of LOAD and is deemed a pro-longevity allele (OR = 1.28 and OR = 1.47 for 90th and 99th percentile survivors, respectively) (Deelen et al., 2019). In addition, ε2 has recently been reported to increase lifespan independently of Alzheimer's disease (AD) (Shinohara et al., 2020).
Although ε4 has been described as the ancestral allele (Fullerton et al., 2000), ε3 is the most common allele in most parts of the world. Globally, the alleles are unevenly distributed, especially ε2 and ε4 (Singh et al., 2006). Even within Europe, frequency ranges are wide (roughly 0.74-0.88 for ε3, 0.06-0.22 for ε4, and 0.04-0.12 for ε2), and ε4 shows a decreasing north-to-south cline (Lucotte et al., 1997). The differences in allele frequencies, at both the European and the global level, have so far mainly been attributed to natural selection (Eisenberg et al., 2010; Finch, 2010; Huebbe et al., 2011, 2015; Singh et al., 2006). However, previous analyses of the evolutionary history of APOE considered only modern-day sequences or genotypes (Eisenberg et al., 2010; Finch, 2010; Fullerton et al., 2000). Therefore, conclusions about the place and time of origin of the three alleles, their later dispersal, or the possible influence of demographic processes had to remain vague.
Thanks to recent advances in ancient DNA (aDNA) analysis, a more detailed picture of APOE history can now be drawn by incorporating allele calls from archaeologically well-dated and well-defined human remains. In this study, we assembled 358 diploid APOE genotypes from 3521 publicly available aDNA data sets spanning more than 12,000 years. Our aim was to investigate the past events that led to the current APOE allele distribution in Europe. For this purpose, we took into account that the present-day European gene pool derives largely from three prehistoric populations: the western hunter-gatherers (WHG), the Anatolian Neolithic (AN) farmers, and the western steppe herders (Haak et al., 2015; Mathieson et al., 2015). By analysing various ancient, medieval, and modern groups of sufficient sample size, we obtained reliable frequency estimates and established spatiotemporal allele trajectories (Marciniak & Perry, 2017). We concluded that past admixture events played a greater role in shaping European APOE variation than previously assumed.
Data were preferentially obtained in BAM format; where this format was unavailable, the raw FASTQ files were used instead. The processing of samples in FASTQ format was described previously by Immel et al. (2021). To minimize downstream genotyping errors, read-terminal positions with deamination rates >0.05 were trimmed with bamUtil v1.0.15.
BAM files from samples that passed quality control were piled up with SAMtools v1.15 (Li et al., 2009), filtering for base and mapping quality >20. The APOE polymorphic sites (rs429358-T/C and rs7412-C/T) were called with bcftools v1.15 (Li, 2011), and genotypes with a depth of coverage <3 were removed. Samples lacking a diploid genotype at both polymorphic sites were excluded from further analysis. Unlike a previous study (Abondio et al., 2019), we avoided pseudo-haploid calls entirely. This was necessary to obtain more reliable allele frequencies and to distinguish between homozygous and heterozygous samples.
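The mapping from the two SNP genotypes to APOE alleles can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes unphased diploid genotype strings (e.g. `"TC"`) plus a read depth per site, applies the depth ≥3 filter described above, and, as is common practice, resolves the double heterozygote as ε2/ε4 by assuming the rare ε1 haplotype (rs429358-C with rs7412-T) is absent. The function name `call_apoe` and the constant `MIN_DEPTH` are hypothetical.

```python
from typing import Optional

# Hypothetical threshold mirroring the "depth of coverage <3 removed" filter.
MIN_DEPTH = 3


def call_apoe(rs429358: Optional[str], dp_429358: int,
              rs7412: Optional[str], dp_7412: int) -> Optional[str]:
    """Return an unphased APOE diplotype such as 'e3/e4', or None.

    rs429358-C defines e4 and rs7412-T defines e2; e3 carries rs429358-T/rs7412-C.
    Genotypes are two-character strings like 'TC' (allele order ignored).
    """
    for gt, dp in ((rs429358, dp_429358), (rs7412, dp_7412)):
        # Require a diploid call that passes the depth filter at both sites.
        if gt is None or len(gt) != 2 or dp < MIN_DEPTH:
            return None
    n_e4 = rs429358.upper().count("C")  # copies of the e4-defining allele
    n_e2 = rs7412.upper().count("T")    # copies of the e2-defining allele
    if n_e4 + n_e2 > 2:
        # Only explainable with the rare e1 haplotype; leave unresolved.
        return None
    n_e3 = 2 - n_e4 - n_e2
    return "/".join(["e2"] * n_e2 + ["e3"] * n_e3 + ["e4"] * n_e4)


if __name__ == "__main__":
    print(call_apoe("TC", 5, "CC", 7))  # e3/e4
    print(call_apoe("TC", 5, "CT", 4))  # double heterozygote -> e2/e4
    print(call_apoe("TT", 2, "CC", 9))  # None: depth below threshold
```

Samples called at only one of the two sites return `None` here; in the study such samples still contribute to the ε4 or ε2 counts at the allele-count level.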

| Population analysis
As described in a previous study (Immel et al., 2021), a principal component analysis was conducted on the merged genotyping data of ancient samples and selected modern reference populations from West Eurasia. The grouping of the samples into different ancient subpopulations (see Figure 1b) was based on this projection as well as archaeological data (taken from AADR annotation and Immel et al., 2021).
The proportions of parental populations in more recent, admixed populations were calculated with supervised ADMIXTURE v1.3.0 (Alexander et al., 2009) analyses (100 bootstraps) on a pruned version of the genotyping data set mentioned above. Only samples with genotype calls for at least one of the two APOE variants were included. Information on the parental samples used in the admixture analyses can be found in Tables S1a and S1b.

| Allele frequencies
The ε3 frequency was calculated from the counts of the rs429358-T/rs7412-C haplotype within a target population. To maximize sample size, ε4 and ε2 frequencies were calculated from the counts of the allele-defining variant alone, namely rs429358-C for ε4 and rs7412-T for ε2 (Table S2). As a reference, APOE allele frequencies were also calculated for the 1000 Genomes (The 1000 Genomes Project Consortium et al., 2015) populations CEU, GBR, IBS, and TSI (Table S3). Differences in allele frequency between populations were tested for significance with Fisher's exact test (Virtanen et al., 2020) (Table S4).
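The counting rule above can be sketched in Python with SciPy, whose `scipy.stats.fisher_exact` implements Fisher's exact test on a 2×2 table. This is an illustration under stated assumptions, not the authors' code: samples are given as pairs of unphased genotype strings with `None` for an uncalled site, the rare ε1 haplotype is assumed absent when deriving ε3 counts, and the helper names and toy data are hypothetical.

```python
from scipy.stats import fisher_exact


def allele_counts(samples):
    """Return {allele: (allele_count, total_chromosomes)} for one population.

    samples: iterable of (rs429358_gt, rs7412_gt) tuples, e.g. ("TC", "CC"),
    with None for a site that was not called.
    """
    e2 = n2 = e3 = n3 = e4 = n4 = 0
    for g429, g7412 in samples:
        if g429 is not None:        # e4 counted from rs429358-C alone
            e4 += g429.count("C")
            n4 += 2
        if g7412 is not None:       # e2 counted from rs7412-T alone
            e2 += g7412.count("T")
            n2 += 2
        if g429 is not None and g7412 is not None:
            # e3 = rs429358-T/rs7412-C haplotype; needs both sites called
            # (assumes no e1 haplotype is present).
            e3 += 2 - g429.count("C") - g7412.count("T")
            n3 += 2
    return {"e2": (e2, n2), "e3": (e3, n3), "e4": (e4, n4)}


def fisher_pvalue(counts_a, counts_b, allele):
    """Two-sided Fisher's exact test on a 2x2 table of allele counts."""
    k1, n1 = counts_a[allele]
    k2, n2 = counts_b[allele]
    _, p = fisher_exact([[k1, n1 - k1], [k2, n2 - k2]])
    return p


if __name__ == "__main__":
    # Hypothetical toy populations, not data from this study.
    pop_a = [("TC", "CC"), ("TT", "CT"), ("TT", None), (None, "CC")]
    pop_b = [("TT", "CC")] * 10
    ca, cb = allele_counts(pop_a), allele_counts(pop_b)
    print(ca)  # {'e2': (1, 6), 'e3': (2, 4), 'e4': (1, 6)}
    print(fisher_pvalue(ca, cb, "e4"))
```

Note that the three denominators can differ, which is the price of maximizing sample size for ε4 and ε2.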
The p-values of all pair-wise comparisons (n = 105) were corrected for multiple testing using a false discovery rate of 0.05 (Benjamini & Hochberg, 1995).
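The Benjamini-Hochberg step-up procedure cited above can be written in a few lines of plain Python. This sketch (the function name is hypothetical) returns BH-adjusted p-values and flags the comparisons that remain significant at a false discovery rate of 0.05; recent SciPy releases also offer `scipy.stats.false_discovery_control` for the adjustment step.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return (adjusted_pvalues, reject_flags) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    reject = [q <= fdr for q in adjusted]
    return adjusted, reject


if __name__ == "__main__":
    # Hypothetical p-values from four pairwise Fisher tests.
    adj, rej = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
    print([round(q, 4) for q in adj])  # [0.04, 0.0533, 0.0533, 0.2]
    print(rej)                          # [True, False, False, False]
```

With 105 comparisons, a raw p-value must be small relative to its rank to survive: the smallest one is multiplied by 105, the largest by 1.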

| Testing for selection
A two-sided binomial test was used to estimate whether allele fre[…] Then, a binomial test (Virtanen et al., 2020) […]
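The test for deviation from admixture-based expectations, as summarized in this paper's results (expected frequencies built from observed parental allele frequencies and admixture proportions in two steps: WHG and EF form LF; LF and steppe form later European groups), can be sketched as below. This is a hedged illustration: it assumes that the expected frequency is the admixture-weighted sum of the observed parental frequencies, and it uses SciPy's `binomtest` (SciPy ≥ 1.7) for the two-sided test. The function names and all numbers are hypothetical.

```python
from scipy.stats import binomtest


def expected_frequency(parent_freqs, proportions):
    """Admixture-informed expectation: sum of parental frequency x proportion."""
    if abs(sum(proportions) - 1.0) > 1e-6:
        raise ValueError("admixture proportions must sum to 1")
    return sum(f * q for f, q in zip(parent_freqs, proportions))


def deviation_pvalue(k_observed, n_chromosomes, p_expected):
    """Two-sided exact binomial test of an observed allele count vs. expectation."""
    return binomtest(k_observed, n_chromosomes, p_expected,
                     alternative="two-sided").pvalue


if __name__ == "__main__":
    # Step 1 (hypothetical numbers): LF modeled as a WHG + EF mixture.
    p_lf_expected = expected_frequency([0.25, 0.05], [0.2, 0.8])
    print(round(p_lf_expected, 4))  # 0.09
    # Step 2 would reuse the observed LF frequency with the steppe frequency
    # and the target group's LF/steppe proportions in the same way.
    print(deviation_pvalue(18, 120, p_lf_expected))
```

Each step uses observed, not expected, parental frequencies, so an error in one step does not propagate into the next.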

| RESULTS
We scanned the two APOE allelic sites (rs429358-T/C and rs7412-C/T) […] The second significant admixture event occurred around 2700-2500 BCE, when a large group of western steppe herders associated with the Yamnaya culture moved into Europe and mixed with the LF (Figure 1d). This last episode resulted in an uneven distribution of admixture components across Europe, with a higher steppe component in the north and a higher farmer component in the south, a pattern still visible in present-day populations (Haak et al., 2015).
We noted interesting trends in the ancient APOE frequencies.
As in modern Europeans, ε3 was the most common allele in all ancient groups. However, the distribution of the three APOE alleles differed across ancient and modern populations. This was most apparent in the earliest European groups, the WHG and the EF, whose observed APOE allele frequencies differed substantially. The WHG had a significantly higher ε4 frequency than the EF and all descendant European populations, and they lacked the ε2 allele entirely. The EF, on the other hand, had the highest ε3 and lowest ε4 frequencies of all examined populations, but an ε2 frequency more comparable with that of present-day Europeans. The admixture-informed expected frequencies were calculated from the observed allele frequencies of parental populations in conjunction with the mixture proportions (Figure 1c,d) […] which have a stronger LF component and, as a result, a lower ε4 frequency. The only ε4 frequency that deviated significantly from its admixture-informed expected frequency was found in CEU + GBR, where it was higher than expected (binomial test, p = 0.019). We noticed a slightly different trend for ε2 frequencies.
While the northern Bronze/Iron Age (BA/IA_N) group had an ε2 frequency comparable with that of the steppe, the southern Bronze/Iron Age (BA/IA_S) ε2 frequency was higher than those of both the LF and the steppe, but it did not deviate significantly from the admixture-informed expected frequency, likely owing to small sample sizes. Interestingly, the frequencies of ε2 […]

Figure caption (partial): (e, f) ε2 frequencies. Expected frequencies were calculated as a product of parental allele frequencies and parental admixture components in two demographic steps (WHG and EF form LF; LF and steppe form the modern European gene pool: BA/IA_S, BA/IA_N, Viking, IBS + TSI, CEU + GBR). A binomial test was used to assess significant deviation between observed and expected frequencies (*p ≤ 0.05; **p ≤ 0.01; ***p ≤ 0.001). Error bars signify 90% confidence intervals (Clopper-Pearson, beta distribution) of observed allele frequencies. For population abbreviations, see Figure 1.
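The 90% Clopper-Pearson interval named in the figure caption follows from quantiles of the beta distribution. A minimal sketch with `scipy.stats.beta.ppf`, assuming an allele observed k times among n sampled chromosomes (the function name and numbers are hypothetical):

```python
from scipy.stats import beta


def clopper_pearson(k, n, level=0.90):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion k/n."""
    alpha = 1.0 - level
    # Lower bound: alpha/2 quantile of Beta(k, n - k + 1); 0 when k == 0.
    lower = 0.0 if k == 0 else beta.ppf(alpha / 2, k, n - k + 1)
    # Upper bound: 1 - alpha/2 quantile of Beta(k + 1, n - k); 1 when k == n.
    upper = 1.0 if k == n else beta.ppf(1 - alpha / 2, k + 1, n - k)
    return float(lower), float(upper)


if __name__ == "__main__":
    # An allele absent from 40 sampled chromosomes still has a non-zero upper bound.
    print(clopper_pearson(0, 40))
    print(clopper_pearson(6, 40))
```

The interval is conservative (coverage at least the nominal level), which suits the small ancient sample sizes; a zero count, as for ε2 in the WHG, therefore never implies a frequency of exactly zero.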

| DISCUSSION
In this study, we compiled the largest set to date of diploid calls for the two most important APOE polymorphisms from published Eurasian ancient DNA samples. We found that the ε4 frequency was significantly higher in the WHG than in the first farming populations or in modern Europeans. Although the ε4 allele is considered detrimental to late-life health, it also has beneficial effects on early-life fitness today (i.e., antagonistic pleiotropy) (Finch, 2010; Huebbe et al., 2011; van Exel et al., 2017). In addition, it has been shown to be the ancestral allele (Fullerton et al., 2000) and likely benefited our ancestors. Hence, ε4 probably represented an adaptation to the hunter-gatherer lifestyle, the only form of subsistence of all human ancestors until the introduction of agriculture.
Modern indigenous populations with a long history as hunter-gatherers show ε4 frequencies as high as those of the WHG (Singh et al., 2006) and seem to be exempt from its detrimental effects (Vasunilashorn et al., 2011). The generally heightened inflammatory response associated with ε4 (Dose et al., 2016; Gale et al., 2014) may have protected the WHG from certain pathogens, as has been shown, for example, for malaria and hepatitis C in modern populations (Dose et al., 2016; Fujioka et al., 2014; Wozniak et al., 2002). Additionally, recent research has revealed that ε4 is associated with improved cognitive development and increased fertility in high-pathogen environments (Oriá et al., 2010; van Exel et al., 2017). The ε4 allele may also have helped to maintain a stable vitamin D metabolism in the dark- […] do not know if ε4 was already associated with age-dependent detrimental effects in the WHG, their high levels of aerobic exercise (compared with the average modern European) could have relaxed these constraints (Raichlen & Alexander, 2014). Furthermore, ε3/ε4, the most common genotype among the WHG presented in this study, has been significantly associated with increased VO2max (a measure of cardiovascular fitness) after exercise relative to the other APOE genotypes (Huebbe et al., 2015). This may have benefited the WHG, who routinely engaged in long periods of endurance running or walking.
In contrast to the WHG, the EF had the lowest ε4 frequency of all ancient populations and modern European references. Our FST analysis showed that the frequency difference at the ε4-defining SNP (rs429358) between WHG and EF was more extreme than at over 95% of all other tested sites on chromosome 19, suggesting that selection acted on the APOE alleles. The results of our Tajima's D analysis hint at possible selective sweeps in the APOE region of EF/LF, which may explain their low ε4 and high ε3 frequencies.
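The empirical outlier logic above (a focal site ranked against all other tested sites on chromosome 19) can be sketched as follows. The FST estimator used in the study is not specified in this section; purely as an assumption, the sketch uses Hudson's per-site estimator for two populations, with allele frequencies p and numbers of sampled chromosomes n. Function names and numbers are hypothetical.

```python
import math


def hudson_fst(p1, n1, p2, n2):
    """Hudson's FST at one biallelic site (p = allele frequency, n = sampled chromosomes)."""
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else math.nan  # undefined at monomorphic sites


def empirical_percentile(focal, background):
    """Fraction of (non-NaN) background sites with a lower FST than the focal site."""
    valid = [x for x in background if not math.isnan(x)]
    return sum(x < focal for x in valid) / len(valid)


if __name__ == "__main__":
    # Hypothetical frequencies at a focal SNP and a handful of background SNPs.
    focal = hudson_fst(0.30, 60, 0.05, 200)
    background = [hudson_fst(p, 60, q, 200)
                  for p, q in [(0.2, 0.25), (0.5, 0.45), (0.1, 0.1), (0.0, 0.0)]]
    pct = empirical_percentile(focal, background)
    print(round(focal, 3), pct, pct > 0.95)  # outlier if above the 95th percentile
```

An empirical percentile like this flags unusual differentiation but does not by itself separate selection from drift, which is why it is paired with other signals such as Tajima's D.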
The transition from hunting and gathering to farming was one of the most drastic shifts in subsistence in human history. Food became monotonous and its supply susceptible to bad harvests, while physical activity shifted from aerobic exercise (endurance running) to more strenuous labour, which negatively affected overall health. Exposure to non-infectious inflammagens also increased, from sources such as indoor smoke from domestic fires, the intake of saturated fats, milk and gluten, and the cooking of food in general (Finch, 2012). This may have created a mismatch between a highly pro-inflammatory environment (Finch, 2012) and the pro-inflammatory ε4 allele (Gale et al., 2014), possibly further increasing the risk of inflammatory disorders such as cardiovascular disease (CVD). Although there are signs that CVD was already present in the late Neolithic/Copper Age (Gostner et al., 2011), we cannot say whether the ε4 allele was also associated with CVD in past populations. Furthermore, recent aDNA studies have indicated that infectious agents may have played a lesser role in the Neolithic than previously assumed (Fuchs et al., 2019), which would reduce the relevance of any protective effect of ε4 against pathogens in this period.
In our data, the observed ε4 frequency of the LF can be explained by demographic processes, namely the admixture between its parental populations, the WHG and EF. The same holds for the populations descended from the steppe and the LF. Many studies have already described the north-to-south cline of ε4 in modern Europe and attributed it largely to selection (Eisenberg et al., 2010 […]

Compared with any other ancient or modern European population, the EF had the highest ε3 frequency. We hypothesize that in early Neolithic populations, ε3 carriers were more resistant than ε4 carriers to famines (i.e., those caused by bad harvests). This is in line with findings that the ε3 allele is associated with the potential to use dietary energy more efficiently and to deposit fat in adipose tissue (Huebbe et al., 2015).
Interestingly, we did not detect ε2 in either the WHG or a smaller population of eastern hunter-gatherers (EHG; N = 13, Table S2), suggesting that ε2 may have been lost entirely or reduced to negligible frequencies in the hunter-gatherer populations of Mesolithic Europe. Strikingly, the same observation can be made for many modern indigenous hunter-gatherers from South and Central America (Marin et al., 1997; Vasunilashorn et al., 2011) and for Aboriginal Australians (Kamboh et al., 1991). Although this decrease in allele frequency (or rather allele loss) could have resulted from genetic drift, it is unlikely that the WHG, the EHG, and modern hunter-gatherers were all independently affected in the same manner. Rather, it seems that the evolutionarily older ε3 and ε4 alleles (Fullerton et al., 2000) were more advantageous for the WHG. […] shown that especially Ruminococcaceae may help with their digestion (Umu et al., 2015). It is plausible that the EF handled food in this manner and that their diet had a higher overall starch content than that of the WHG (Ollivier et al., 2016).
Following the two major demographic events, the observed ε2 frequency did not differ significantly from the admixture-informed expected frequency in populations before the common era. However, the observed ε2 frequency was significantly higher than expected in all common-era populations (i.e., Viking, IBS + TSI, CEU + GBR). This was especially noticeable in northern Europeans, whose ε2 frequencies were higher than those of any ancestral population, suggesting that admixture alone was not responsible for this increase. Nevertheless, the overall increase was rather modest, which may explain why we did not detect any signatures of selection.
We conclude that the APOE allele frequencies of modern Europeans, which strongly influence the likelihood of becoming long-lived today, are most probably the consequence of the past demographic processes and adaptations outlined in this study. The different APOE allele frequencies of the WHG and EF likely resulted from adaptations to diet, physical activity and inflammatory load (Finch, 2012). We have also shown that the north-to-south cline of ε4 in modern Europe was the result of a higher steppe admixture rather than of selection/environmental differences. The persistence of the ε4 allele since the Bronze Age may be attributed to a rise in infectious diseases. It remains uncertain whether the increase in ε2 frequency in modern populations was the result of selection and, if so, which selective pressures may have been involved. Furthermore, it is unclear if and to what extent the APOE alleles already influenced longevity in past populations. Future association studies using ancient samples would be needed, which in turn would require accurate anthropological documentation of age at death. Longevity is a post-reproductive trait and therefore unlikely to have been the main driver of selection, unless we assume that a grandmother/grandfather effect played a positive role in past populations.

| LIMITATIONS
Unfortunately, the overall quality of aDNA data is much lower than that of modern genomic data. Although we used only higher-quality data for our analyses, the number of samples that passed quality control was limited, which in turn restricted statistical power.

AUTHOR CONTRIBUTIONS
Idea/concept: AN and BKK. Data curation/processing: DK and NdS.

ACKNOWLEDGMENTS
We acknowledge financial support from the Deutsche

CONFLICT OF INTEREST STATEMENT
The authors declare no competing interests.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.