Age‐dependent DNA methylation patterns on the Y chromosome in elderly males

Abstract The Y chromosome, a sex chromosome that only exists in males, has been ignored in traditional epigenetic association studies for multiple reasons. However, sex differences in aging‐related phenotypes and mortality could suggest a critical role of the sex chromosomes in the aging process. We obtained blood‐based DNA methylation data on the Y chromosome for 624 men from four cohorts and performed a chromosome‐wide epigenetic association analysis to detect Y‐linked CpGs differentially methylated over age and cross‐validated the significant CpGs in the four cohorts. We identified 40–219 significant CpG sites (false discovery rate <0.05) with >82% of them hypermethylated with increasing age, which is in strong contrast to the patterns reported on the autosomal chromosomes. Comparing the rate of change in the Y‐linked DNA methylation across cohorts that represent different age intervals revealed a trend of acceleration in DNA methylation with increasing age. The age‐dependent DNA methylation patterns on the Y chromosome were further examined for their association with all‐cause mortality with results suggesting that the predominant pattern of age‐related hypermethylation on the Y chromosome is associated with reduced risk of death.


| INTRODUCTION
The Y chromosome, the sex-determining chromosome found only in males, contains circa 57 million DNA base pairs (BP; million BP: MP; H. sapiens build: hg38/GRCh38) and is often neglected in (epi)genetic studies. With its "small" size, it is larger in terms of BP only than chromosome 21 (circa 47 MP) and chromosome 22 (circa 51 MP). The Y chromosome used to be considered mostly meaningless, with a high percentage of repetitive and noncoding regions (sometimes referred to as "genomic deserts"). With improvements in analytical methods and technologies (Jobling & Tyler-Smith, 2003, 2017), the chromosome has become a more popular analysis target. The Y chromosome is most commonly studied with regard to Y-linked haplogroups (Consortium YC, 2002; Knijff, 2000; Zerjal et al., 1997), especially in phylogenetic studies, owing to its low recombination and paternal inheritance. Moreover, loss of the Y chromosome (LOY) has been observed with age (Forsberg, 2017), and Y-chromosomal deletions in specific regions (Yq11) that are associated with oligozoospermia and azoospermia phenotypes can cause infertility of different degrees (Vog et al., 1996).
Despite the recent upsurge in omics studies focusing on autosomal chromosomes, the sex chromosomes are often not included in analyses, especially in genome-wide association studies (GWAS), epigenome-wide association studies (EWAS), and transcriptome-wide association studies (TWAS) of complex diseases and traits (Wise, Gyi, & Manolio, 2013). This is unfortunate because the sex chromosomes could influence (directly or indirectly) certain diseases that show sex differences (Khramtsova et al., 2019). In the field of aging research in particular, sex differences have been found to affect the trajectory of aging phenotypes (Dowling, 2014), aging-related diseases such as Alzheimer disease and other dementias (Mazure & Swendsen, 2016), and mortality (Austad & Fischer, 2016; Case & Paxson, 2005). Although multiple EWASs have been performed to study the dynamic regulatory patterns of the aging methylome, the current literature on associations between the Y chromosome and aging mainly describes LOY and copy number variants (CNV) (Zhou et al., 2016), which have been reported as far back as 1972 (Pierre & Hoagland, 1972).
Making use of multiple existing datasets on genome-wide DNA methylation in older male subjects, we performed an exploratory Y chromosome-wide association study of aging-related methylation changes on the Y chromosome and compared them with those on the autosomal chromosomes. We replicate findings across datasets, correlate age-related methylation changes with all-cause mortality, and discuss potential implications for the epigenetics of aging.

| Cross-sample/population replication of significant CpGs
An overall view of the above four sets of CpGs revealed a total of 282 CpGs significantly hypermethylated with FDR <0.05 in at least one cohort (Supporting Information Table S1), where 139 of the sites were also significantly hypermethylated in at least one other cohort with FDR <0.05. For the hypomethylated sites, 48 were significant in at least one of the cohorts with FDR <0.05 (Supporting Information Table S2) but with little validation (N = 3) in any of the other cohorts with FDR <0.05.
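The replication tally described above can be sketched as a simple count over per-cohort sets of significant CpGs. This is a minimal illustration only: the function name `replication_counts` and the CpG identifiers are hypothetical and do not come from the study data.

```python
from typing import Dict, Set


def replication_counts(significant: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count CpGs significant in at least one cohort and in at least two cohorts."""
    hits: Dict[str, int] = {}
    for sites in significant.values():
        for cpg in sites:
            hits[cpg] = hits.get(cpg, 0) + 1
    return {
        "any_cohort": len(hits),  # significant in >= 1 cohort
        "replicated": sum(1 for n in hits.values() if n >= 2),  # also in >= 1 other cohort
    }


# Hypothetical CpG identifiers, for illustration only.
example = {
    "MADT": {"cg_a", "cg_b"},
    "LSADT1": {"cg_b", "cg_c"},
    "LSADT2": {"cg_b"},
    "LBC1921": {"cg_c", "cg_d"},
}
result = replication_counts(example)
```

With real data, each set would hold the CpGs passing FDR <0.05 in one direction (hyper- or hypomethylated) for one cohort.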

| Accelerated rate of hypermethylation in high age-groups
By plotting the coefficients of the significant hypermethylated CpGs over Y chromosome position (Figure 1), we see a tendency of increased hypermethylation profiles with increased age. We smoothed the regression coefficients for age (the rate of change) using locally weighted scatterplot smoother (Jacoby, 2000) (LOESS, α = 0.5), with residual standard errors of 0.51, 1.57, 0.917, and 1.772, for MADT, LSADT1, LSADT2, and LBC1921, respectively.
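The smoothing step can be sketched as follows. This is a minimal pure-Python illustration, assuming local linear fits with tricube weights and a span of 0.5. The degree of the local polynomial and the software used for the published curves are not stated here, so `loess_fit` and its defaults are hypothetical, and the toy data are not study values.

```python
import math
from typing import List, Sequence


def loess_fit(x: Sequence[float], y: Sequence[float], span: float = 0.5) -> List[float]:
    """Local linear LOESS: fitted value at every x using tricube weights."""
    n = len(x)
    k = max(2, int(math.ceil(span * n)))  # number of neighbours in each window
    fitted = []
    for x0 in x:
        dist = [abs(xi - x0) for xi in x]
        h = sorted(dist)[k - 1]  # bandwidth: distance to the k-th nearest point
        if h == 0.0:
            w = [1.0 if d == 0.0 else 0.0 for d in dist]
        else:
            w = [(1.0 - min(d / h, 1.0) ** 3) ** 3 for d in dist]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0.0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted


# Toy data (not from the study): positions and coefficients on a straight line.
positions = [float(i) for i in range(10)]
coefs = [2.0 * p + 1.0 for p in positions]
smoothed = loess_fit(positions, coefs, span=0.5)
```

Because the toy data lie exactly on a line, the local linear fit reproduces them; real coefficients would yield a smoothed curve over chromosome position.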
The smoothed lines show a trend toward larger regression coefficients for hypermethylation in cohorts of higher mean age. More specifically, LBC1921 (mean age: 88.51) has higher coefficient values than MADT (mean age: 66.73), LSADT1 (mean age: 79.34), and LSADT2 (mean age: 81.69). The lines for LSADT1 and LSADT2 are intertwined but still notably higher than the line for MADT. Figure 2 displays box plots of cohort age (a) and of the regression coefficients of significantly hypermethylated CpGs by dataset (b). Cohorts of higher age tend to show higher rates of hypermethylation, a tendency that is also apparent in Figure 2c. Here, an increase in mean age by a factor of 1.33 (about a 33% increase, from 66.73 to 88.51 years) corresponds to an increase in the mean coefficient by a factor of 2.60 (about a 160% increase, from 1.31 to 3.40).
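The fold changes behind Figure 2c can be rechecked from the reported cohort means. The short sketch below uses only the four means quoted in the text; the helper name `fold_change` is introduced here for illustration.

```python
def fold_change(start: float, end: float) -> float:
    """Ratio of the end value to the start value."""
    return end / start


# Reported means for MADT (youngest cohort) and LBC1921 (oldest cohort).
age_fold = fold_change(66.73, 88.51)   # mean age in years
coef_fold = fold_change(1.31, 3.40)    # mean regression coefficient

# The percentage increase is the fold change minus one, times 100.
age_pct_increase = (age_fold - 1.0) * 100.0
coef_pct_increase = (coef_fold - 1.0) * 100.0
```

A factor of 1.33 is therefore an increase of roughly a third, and a factor of 2.60 is an increase of roughly 160%.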
Among the significant CpGs (FDR <0.05), 7 were present in all four cohorts (Table 3). We checked how the coefficients of these sites corresponded to the findings above. For each site, we ranked the coefficients of the four cohorts from 1 to 4, where 1 was the highest value and 4 was the lowest. For all but one site, LBC1921 had the highest value (score: 1). For MADT, all sites had the smallest value (score: 4). For LSADT1 and LSADT2, the scores were all either 2 or 3, except for a single site with a score of 1.
Here again, higher cohort ages correspond to higher coefficients for the 7 CpGs.
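The per-site ranking can be sketched as below. The coefficient values are hypothetical placeholders rather than entries from Table 3, the function name `rank_cohorts` is an assumption, and ties are not handled specially.

```python
from typing import Dict


def rank_cohorts(coefs: Dict[str, float]) -> Dict[str, int]:
    """Rank cohorts for one CpG: 1 = largest coefficient, 4 = smallest."""
    ordered = sorted(coefs, key=lambda cohort: coefs[cohort], reverse=True)
    return {cohort: rank for rank, cohort in enumerate(ordered, start=1)}


# Hypothetical coefficients for a single CpG, for illustration only.
site_coefs = {"MADT": 1.2, "LSADT1": 2.4, "LSADT2": 2.1, "LBC1921": 3.5}
site_ranks = rank_cohorts(site_coefs)
```

Repeating this for each of the 7 shared CpGs gives the score table summarized in the text.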
A Wilcoxon rank-sum test (also known as the Mann-Whitney-Wilcoxon (MWW) or U test) on all sites whose coefficients were positive in every cohort (N = 125) led to the same conclusion. The 125 sites were selected as those significantly associated with age (FDR <0.05) in at least one cohort and with positive regression coefficients (i.e., increased methylation with age) in all four datasets. The null hypothesis (H0) was that there is no difference in coefficients between the cohorts (where the main difference between the cohorts is their age); the test results are presented in Table 4. For all tests except the comparison between LSADT1 and LSADT2 (p = 0.80), a significant difference was observed, with p = 2.7832e-06 for the comparison of LBC1921 with MADT. We can conclude that there are significant differences in the coefficients (i.e., the rate of change) between older and younger cohorts, as suggested by [...] Figure S1, where a PMS based on a list of the top 30 CpGs best fits the mortality data, with a p-value of 9.59e-03 and an adjusted R² of 0.035. Interestingly, the regression coefficients for all PMSs are negative (Supporting Information Table S4), indicating a negative correlation with mortality.
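The Wilcoxon rank-sum comparison of coefficients between two cohorts can be sketched in pure Python as below. This is an illustrative implementation using midranks for ties and a normal approximation without continuity correction; it is not the software used for Table 4, and the two input vectors are hypothetical values, not study coefficients.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-sided p-value from the normal approximation)."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    n1, n2 = len(x), len(y)
    n = n1 + n2
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0.0:
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u1, p_value


# Hypothetical coefficients for an older and a younger cohort.
older = [3.1, 3.6, 2.9, 4.0, 3.3]
younger = [1.2, 1.5, 0.9, 1.8, 1.1]
u_stat, p_value = rank_sum_test(older, younger)
```

For small samples an exact test is preferable to the normal approximation; with 125 sites per cohort the approximation is the usual choice.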

| DISCUSSION
By focusing on male-only samples, we were able to analyze the age pattern of Y-linked DNA methylation in older people. We identified significant CpG sites that change their methylation levels across ages. In contrast to the reported age-related methylation patterns, which are dominated by decreased methylation with increasing age (Johansson, Enroth, & Gyllensten, 2013; Li et al., 2017; Marttila et al., 2015), Y-linked DNA methylation is characterized by increased methylation with increasing age, accounting for over 80% or 90% of all the significant age-associated CpGs. As shown by Figure 3, the

FIGURE 2 Plots displaying the accelerated increase in DNA methylation for significantly hypermethylated CpGs (FDR <0.05) in the four cohorts, ordered by increasing mean age. The box plots in 2a and 2b show the distributions of sample ages and of regression coefficients of hypermethylated CpGs in each cohort. Acceleration in DNA hypermethylation with increasing age is illustrated by plotting, for each cohort, the mean age and the mean regression coefficient normalized by MADT (2c).

The detected age-associated hypermethylation on the Y chromosome is further consolidated by our cross-cohort replication, as shown in Table 2. High replication rates were observed across the datasets for hypermethylated CpGs, whereas those for the hypomethylated CpGs were mostly low. This contrast suggests that the observed overwhelming pattern of Y-linked hypermethylation could represent a striking difference in age-related epigenetic control between sex and autosomal chromosomes, as the age-associated methylation patterns on autosomal chromosomes are dominated by reduced methylation with increasing age (Li et al., 2017). We have recently compared the age-associated CpGs with mortality-associated CpGs on autosomal chromosomes found in the LBC1921 birth cohort and found very limited overlap between them (about 10% of the age-associated CpGs), although the overlap is significantly different from random (unpublished results). Most importantly, the overlapping CpGs are dominated by those whose age-related methylation patterns help to reduce mortality. In contrast to the autosomal chromosomes, the high overlap (44%) between age- and mortality-associated CpGs on the Y chromosome (Figure 3) highlights its importance for successful aging in males.
Among the 5 annotated genes in Table 3, increased expression of NLGN4Y has been associated with autism (Ross, Tartaglia, Merry, Dalva, & Zinn, 2015), and expression of DDX3Y may modulate neuronal differentiation (Vakilian et al., 2015). A recent study reported that the expression of the TBL1Y gene plays an important role in cardiac differentiation (Meyfour et al., 2017). The other gene in Table 3 (TTTY23, TTTY20 [...]

This presents an obvious limitation of our study, and as such, our results should be interpreted with caution. We hope that future studies using high-capacity designs or methylation sequencing techniques will help to validate our findings and uncover the impact of the Y chromosome on male aging.

FIGURE 3 Scatterplot displaying the relationship between the age-related rate of change in DNA methylation and mortality. The empty dots are CpGs significantly methylated with age (FDR <0.05; the larger the dot, the higher the statistical significance). The red dots are CpGs associated with mortality with p < 0.05 in the Cox regression model. The figure shows that age-associated CpGs predominantly contribute to a reduced risk of death.

| Study populations and samples
We analyzed Y chromosome data from four cohorts of middle- and older-aged subjects: Middle-Aged Danish Twins (MADT) (Gaist et al., 2000), LSADT1, LSADT2, and LBC1921. Table 1 outlines the basic cohort characteristics.

| Preprocessing and quality control (QC)
Before

| CpG-based association tests
The CpG-based age association tests were modeled using linear regression models. For the Danish twin samples, twin pairing was included as a random factor in a mixed effect model. The regression analysis adjusted for blood cell-type composition (CD8T, CD4T, natural killer cell (NK), B cell, monocyte, and granulocyte) estimated using Houseman's method (Houseman et al., 2012), using the R package minfi for the Danish twin data and the R package celltype-s450 for the LBC1921 data.
The model for the twin cohorts was defined as: while for LBC1921, the model was defined as:
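As a hedged sketch only, the two model forms can be written out from the verbal description in this section. The outcome is assumed to be the methylation level of a single CpG, age is the predictor of interest, the estimated cell-type proportions enter as covariates, and twin pair enters as a random intercept in the twin cohorts. All symbols below are assumptions introduced for illustration, and any further covariates are not shown.

For individual j in twin pair k (twin cohorts):

$$\mathrm{DNAm}_{jk} = \beta_0 + \beta_{\mathrm{age}}\,\mathrm{Age}_{jk} + \sum_{c} \gamma_c\,\mathrm{Cell}_{c,jk} + u_k + \varepsilon_{jk}, \qquad u_k \sim N(0, \sigma_u^2)$$

For individual j (LBC1921):

$$\mathrm{DNAm}_{j} = \beta_0 + \beta_{\mathrm{age}}\,\mathrm{Age}_{j} + \sum_{c} \gamma_c\,\mathrm{Cell}_{c,j} + \varepsilon_{j}$$

Under this reading, the coefficient for age is the per-CpG rate of change in methylation that is compared across cohorts.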

| Polygenic methylation score (PMS)
To summarize the effect of age-associated Y-linked CpGs on mortality, we use PMS as introduced by Linnér et al. (2017). For a list of q age-associated CpGs selected using a significance cutoff, the PMS for a sample j is calculated as the sum of their coefficients for age (β) multiplied by the methylation levels of the corresponding CpGs (DNAm), $\mathrm{PMS}_j = \sum_{i=1}^{q} \beta(i) \times \mathrm{DNAm}(i, j)$. The effect of the calculated PMS on mortality is assessed by including it as a variable in the Cox regression model together with individual age as a covariate for adjustment.
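The PMS definition can be illustrated with a short sketch: for one sample, each selected CpG's age coefficient is multiplied by that sample's methylation level and the products are summed. The function name, CpG identifiers, and numeric values below are hypothetical.

```python
from typing import Mapping


def polygenic_methylation_score(
    betas: Mapping[str, float], dnam: Mapping[str, float]
) -> float:
    """PMS for one sample: sum over selected CpGs of beta(i) * DNAm(i, j)."""
    return sum(beta * dnam[cpg] for cpg, beta in betas.items())


# Hypothetical age coefficients for q = 2 selected CpGs.
age_betas = {"cg_a": 2.0, "cg_b": -1.0}
# Hypothetical methylation levels for one sample; unselected CpGs are ignored.
sample_dnam = {"cg_a": 0.5, "cg_b": 0.25, "cg_c": 0.9}
pms = polygenic_methylation_score(age_betas, sample_dnam)
```

The resulting score per sample would then enter the Cox model as a covariate alongside age.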
The predictive power of PMS is evaluated by the incremental coefficient of determination (incremental R²), calculated as the difference in R² (pseudo R²) between the Cox model fitted with PMS and age and the model with age only.

ETHICS APPROVAL
The MADT study was approved by the Regional Committees on Health Research Ethics for Southern Denmark (S-VF-19980072

CONSENT FOR PUBLICATION
Written consent was obtained from all authors.

CONFLICT OF INTEREST
None declared.

ACKNOWLEDGMENTS
We thank the cohort participants and team members who contributed to these studies.

The bold indicates nonsignificant differences in the coefficients of compared significantly hypermethylated CpGs.