Cell‐free DNA as a biomarker of aging

Abstract Cell‐free DNA (cfDNA) is present in the circulating plasma and other body fluids and is known to originate mainly from apoptotic cells. Here, we provide the first in vivo evidence of global and local chromatin changes in human aging by analyzing cfDNA from the blood of individuals of different age groups. Our results show that nucleosome signals inferred from cfDNA are consistent with the redistribution of heterochromatin observed in cellular senescence and aging in other model systems. In addition, we detected a relative cfDNA loss with age at several genomic locations, such as transcription start and termination sites, the 5′UTR of L1HS retrotransposons, and dimeric AluY elements. Our results also revealed that age and deteriorating health status correlate with increased enrichment of signals from cells in different tissues. In conclusion, our results show that the sequencing of circulating cfDNA from human blood plasma can be used as a noninvasive methodology to study age‐associated changes to the epigenome in vivo.

Several aging biomarkers such as C-reactive protein and insulin-like growth factor-1 have been identified as predictive for mortality (Castagne et al., 2018). The identification of circulating biomarkers is of increasing interest in the study of human aging, especially when these biomarkers are applied to the measurement of biological age (Capri et al., 2015). Recent data showed that subjects of the same chronological age, including centenarians, can have younger or older biological ages that, in turn, are associated with morbidity and mortality (Chen et al., 2016). Among the different biomarkers that have been proposed, which include DNA methylation and N-glycans (Horvath, 2013; Miura & Endo, 2016), cell-free DNA (cfDNA) appears particularly promising due to the ease of collecting specimens and the ever-decreasing costs of genomic sequencing. However, little is known about how cfDNA changes with age.
High levels of cfDNA were first reported in 1966 in the circulating serum of patients with systemic lupus erythematosus (Tan, Schur, Carr, & Kunkel, 1966) and were later discovered in the plasma of cancer patients (Stroun, Anker, Lyautey, Lederrey, & Maurice, 1987).
cfDNA originates primarily from cell death through apoptosis or necrosis (van der Vaart & Pretorius, 2007), and recently, new methods have been developed to trace the tissues of origin of cfDNA through nucleosome positioning and methylation footprints (Lehmann-Werman et al., 2016;Snyder, Kircher, Hill, Daza, & Shendure, 2016). These methodologies allow the detection of tissue-specific damage or disease through liquid biopsies.
The aging process is associated with cellular stress and is accompanied by alterations to the number of apoptotic cells and DNA release (Jylhava et al., 2013; Pollack, Phaneuf, Dirks, & Leeuwenburgh, 2002). Older individuals were reported to exhibit higher levels of circulating cfDNA (Jylhava et al., 2011), including mitochondrial DNA (cf-mtDNA) (Pinti et al., 2014). Aging is also associated with chronic systemic inflammation, or inflammaging. The cause of this phenomenon in older individuals may come from different sources, one of which is the increased number of senescent cells that can secrete senescence-associated secretory phenotype factors to drive inflammation (Franceschi & Campisi, 2014). Other factors such as age-associated accumulation of metabolites or cell debris, including self and non-self nucleic acids (Franceschi, Garagnani, Vitale, Capri, & Salvioli, 2017), can act as damage-associated molecular patterns (DAMPs) that trigger an immune response and subsequent inflammation (Franceschi & Campisi, 2014). For instance, a high level of total cfDNA in nonagenarians is associated with systemic inflammation and frailty (Jylhava et al., 2013). cfDNA has also been shown to be one of the triggers of adipocyte inflammation in obese mice due to the increased cell death in fat tissues (Nishimoto et al., 2016).
In this study, we used cfDNA to characterize the nucleosome landscape and the contributing tissues of age-associated cell death.
We performed whole-genome sequencing on cfDNA collected from the plasma of individuals in four groups composed of young, old, and two cohorts of centenarians (divided into healthy and unhealthy populations).

| Profiling of cfDNA at different ages and in extreme longevity
Aging is known to be associated with increased cell death, which may contribute to a change in the cfDNA released into circulation.
Hence, we profiled cfDNA extracted from a total of 12 individuals of different ages and health conditions. Specifically, we performed whole-genome sequencing of cfDNA from three subjects aged 25 ± 0.5 (SD) years (referred to as young), three subjects aged 71 ± 1.6 years (referred to as old), and six centenarians aged 101.8 ± 1.1 years (Supporting Information Tables S1 and S2). The centenarian cohort was further divided into two groups: three healthy and three unhealthy individuals. Healthy centenarians were characterized as having good cognitive performance, that is, an SMMSE (Standardized Mini-Mental State Examination) score > 24, retaining the ability to walk, and having a high ADL (Activities of Daily Living) score (Supporting Information Table S1). In contrast, unhealthy centenarians had dementia, were not able to perform the SMMSE, and were bedridden. In addition, the two centenarian cohorts showed significant differences in 5 out of 32 hemato-biochemical parameters tested: RBC, HGB, HCT, ALB, and HDL (Supporting Information Table S5).

| Cell-free DNA reveals in vivo nucleosome landscape changes with age
The analysis of the fragment length of cfDNA derived from blood plasma showed an enrichment of 166- to 175-bp fragments (Figure 1a, Supporting Information Table S3), which corresponds to the length of a chromatosome. In healthy individuals, most of these cfDNA fragments originate from apoptotic cells of hematopoietic origin (Lo et al., 2010). Several studies have demonstrated that the fragmentation patterns of cfDNA can reveal the nucleosome landscape of dead cells from which the cfDNA is derived. This is because DNA wrapped around the histone octamers and linker histones is protected from digestion during apoptosis (Ivanov, Baranova, Butler, Spellman, & Mileyko, 2015; Snyder et al., 2016).
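As an illustration of the fragment-length enrichment described above, the following minimal Python sketch tabulates a length histogram and the fraction of fragments in the 166- to 175-bp window. It assumes fragment lengths have already been extracted into a list; the function names and toy values are hypothetical and are not the pipeline used in this study, which relied on picard-tools.

```python
from collections import Counter

def length_histogram(lengths):
    """Count how many fragments have each length (in bp)."""
    return Counter(lengths)

def fraction_in_window(lengths, low=166, high=175):
    """Fraction of fragments whose length falls in [low, high] bp."""
    if not lengths:
        return 0.0
    in_window = sum(1 for n in lengths if low <= n <= high)
    return in_window / len(lengths)

# Toy fragment lengths (hypothetical, for illustration only).
toy_lengths = [150, 166, 167, 167, 170, 175, 180, 320]
hist = length_histogram(toy_lengths)
modal_length = max(hist, key=hist.get)  # most frequent fragment length
window_fraction = fraction_in_window(toy_lengths)
```

With real data, the modal length and the window fraction give a quick check that a library is dominated by chromatosome-sized fragments.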
We used DANPOS2 to identify the cfDNA signals and to define the nucleosome landscape in these samples (Chen et al., 2013). We compared the nucleosome positioning patterns to micrococcal nuclease-seq (MNase-seq) nucleosome signals of GM12878, a lymphoblastoid cell line, obtained from ENCODE (Consortium, 2012). The signals from cfDNA aligned well with the nucleosome signals from GM12878 (Pearson's r = 0.77, p < 2.2 × 10⁻¹⁶), suggesting similarity between the fragmentation patterns of cfDNA and MNase-treated
To quantify and characterize this age-dependent change on a global scale, we analyzed the average cfDNA signals within 100-kb regions across the whole genome and annotated the signals with subcompartments identified from the GM12878 cell line (Rao et al., 2014; Figure 2a-c). These subcompartments were identified using Hi-C data and were associated with distinct histone modifications. Subcompartments A1 and A2 consist of euchromatic regions that are gene rich, subcompartment B1 consists of facultative heterochromatic regions, subcompartment B2 is enriched at the nuclear lamina and at nucleolus-associated domains (NADs), whereas subcompartment B3 is also enriched at the nuclear lamina but not at NADs. We excluded subcompartment B4 because it is only present on chromosome 19 (Rao et al., 2014). In young individuals, cfDNA signals are the highest in subcompartment B1, followed by compartment A and subcompartments B2 and B3 (Rao et al., 2014) (p < 2 × 10⁻¹⁶; Kruskal-Wallis test and post hoc Dunn's test) (Figure 2a-d). We also applied the same method to GM12878 MNase-seq and observed the same pattern of signal enrichment across the subcompartments as for the cfDNA signals, further supporting the similarity between MNase-treated and cfDNA samples (p < 2 × 10⁻¹⁶; Kruskal-Wallis test and post hoc Dunn's test) (Supporting Information Figure S2a,b).
In contrast, we analyzed ATAC-seq data from GM12878 in the different subcompartments and observed higher signals in compartment A and lower signals in compartment B, including B1, suggesting that the highest signals observed previously in subcompartment B1 are a unique feature of both MNase-seq and cfDNA-seq (Supporting Information Figure S2c).
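The per-subcompartment summary described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that per-bin signals and subcompartment labels are already available as dictionaries keyed by (chromosome, 100-kb bin index); the function name, data layout, and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def mean_signal_by_subcompartment(bin_signal, bin_label, exclude=("B4",)):
    """Average the signal of all bins sharing a subcompartment label.

    Bins without an annotation, or with an excluded label, are skipped.
    """
    grouped = defaultdict(list)
    for key, signal in bin_signal.items():
        label = bin_label.get(key)
        if label is None or label in exclude:
            continue
        grouped[label].append(signal)
    return {label: mean(values) for label, values in grouped.items()}

# Toy inputs (hypothetical): keys are (chromosome, 100-kb bin index).
toy_signal = {("chr1", 0): 10.0, ("chr1", 1): 20.0, ("chr1", 2): 4.0,
              ("chr19", 5): 99.0, ("chr2", 0): 7.0}
toy_label = {("chr1", 0): "B1", ("chr1", 1): "B1", ("chr1", 2): "B3",
             ("chr19", 5): "B4"}
toy_means = mean_signal_by_subcompartment(toy_signal, toy_label)
```

Excluding B4 by default mirrors the choice stated in the text; the group comparisons themselves (Kruskal-Wallis and Dunn's tests) are not reproduced here.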
The variance of cfDNA signals across compartments significantly decreased with age (Levene's test, all pairwise comparisons p < 0.05) and is the lowest in the unhealthy centenarians, further supporting the redistribution of signals from heterochromatin regions to euchromatic regions in old age and deteriorating health condition (Figure 2d). Overall, we observed a significant increase in cfDNA signals in subcompartments B2 and B3 and a decrease in subcompartments A1, A2, and B1 in the old group, healthy centenarians (except for A2), and unhealthy centenarians compared to the young group (Figure 3a-d, Table 1). Significantly lower cfDNA signals were also observed in subcompartments A1 and A2, and significantly higher cfDNA signals were observed in subcompartments B2 and B3, in unhealthy centenarians and in the old group compared to healthy centenarians. This trend was similar to that seen in the comparison with the young group (Supporting Information Figure S3a-d, Table 1). The comparison between unhealthy centenarians and the old group showed increased signals in subcompartments A2, B2, and B3 and decreased signals in subcompartments A1 and B1 in unhealthy centenarians.
The global cfDNA signals of unhealthy centenarians were highly correlated with those of the old group and differed the most from those of the young group (Supporting Information Figure S3e,f). Among the age groups, healthy centenarians displayed the highest correlation of cfDNA signals with the young group (Pearson's correlation = 0.603).

| Local cfDNA profiles with age
To study local nucleosome profile changes with age in gene regions, we computed the average cfDNA signals across all genes relative to transcription start sites (TSS) and transcription termination sites (TTS) of genes in autosomal chromosomes (Figure 4a, and TTS, which are also commonly observed in MNase assays (Schones et al., 2008; Valouev et al., 2011). Notably, within the ±1,500 bp range from TSS, we observed a relative loss of cfDNA signals with age. We detected the highest cfDNA signals in young, followed by healthy centenarian, old, and, lastly, unhealthy centenarians (Figure 4a, Supporting Information Figure S4a) (Kruskal-Wallis To identify the variability within age groups, we also calculated the coefficient of variation (CV) of cfDNA signals at the TSS. Young replicates showed a 6.2% CV, old replicates showed a 5.1% CV, healthy centenarian replicates showed a 4% CV, and unhealthy centenarian replicates showed a 5% CV. In addition, we also observed a similar decrease in signals within the ±1,500 bp range from TTS with age (Figure 4b, Supporting Information Figure S4b) CV, and unhealthy centenarian replicates showed a 1.5% CV. Global nucleosome loss has been reported in aging yeast, and this phenomenon leads to the deregulation of transcriptional activity with age (Hu et al., 2014).
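For reference, the coefficient of variation reported for replicates can be computed as the standard deviation divided by the mean, expressed as a percentage. The sketch below is a minimal Python illustration; it assumes the sample (n − 1) standard deviation, which the text does not specify, and the replicate values are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return the CV as a percentage: 100 * sample SD / mean.

    Assumes the sample (n - 1) standard deviation; needs at least two values.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m

# Hypothetical signal values at the TSS for three replicates of one group.
toy_replicates = [9.0, 10.0, 11.0]
toy_cv = coefficient_of_variation(toy_replicates)
```

A low CV across the three replicates of a group indicates that the group-level profile is not driven by a single individual.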
In addition to TSS and TTS, we also assessed cfDNA signals around CCCTC-binding factor (CTCF)-binding sites, which were obtained from GM12878 ENCODE ChIP-Seq data. CTCF has been shown to play roles in several biological processes, including the regulation of old and unhealthy centenarian, p = 0.13; healthy centenarian and unhealthy centenarian, p = 7.8 × 10⁻⁷; healthy centenarian and old, p = 5.1 × 10⁻⁴). Young replicates showed a 3.7% CV, old replicates showed a 2.7% CV, healthy centenarian replicates showed a 1.4% CV, and unhealthy centenarian replicates showed a 3.6% CV.
To assess the contribution of TSS and TTS to the global change of cfDNA distribution and local nucleosome profile, we masked ±1,500 bp of all TSS and TTS from the genome and reanalyzed the changes of cfDNA signals in subcompartments. We still observed a loss of signals in compartment A and subcompartment B1 and a gain of signals in subcompartments B2 and B3 in comparison with young (Supporting Information Figure S4d, Table S4). Therefore, the local

| cfDNA signals at transposable elements
We observed decreased cfDNA signals with age within the first 668 bp of the 5′UTR of a transposable element, L1HS, in which the promoter and enhancer were found (Speek, 2001; Swergold, 1990), Figure S5b,c) at the 5′UTR of L1HS. We also observed the highest signals at AluY in the healthy individual, followed by liver cancer and prostate cancer (Supporting Information Figure S5d).

| Increased cfDNA from tissues in old individuals and unhealthy centenarians
A significant portion of cfDNA is derived from the hematopoietic lineage. However, deregulations of apoptosis have been implicated in aging as its rate increases in some cell types (Ciccocioppo et al., 2002;Tower, 2015;Vazquez-Padron et al., 2004). To identify the tissues that give rise to this cfDNA in young, old, and centenarians, we used the method developed by Snyder et al. (2016) Figure S6a).
We applied this method to the liver cancer dataset from Snyder et al. (2016). We detected that liver tissues from GTEx significantly increased in rank (p < 0.05) in the liver cancer sample compared to the two healthy controls, suggesting an increased contribution of cfDNA from the liver (Supporting Information Figure S6c) and verifying the method using the GTEx dataset.
By analyzing the aging cfDNA data, we detected the highest cor-  Figure S6b). We observed that healthy centenarians did not show any detectable increased tissues' cell death in comparison with the young group, whereas old individuals and unhealthy centenarians showed increased contributing tissues compared to young individuals (Figure 6a,b).

| DISCUSSION
Cell-free DNA has gained much attention in recent years for its translational potential as a biomarker for cancer (Jung, Fleischhacker, & Rabien, 2010), acute organ transplant rejection (De Vlaminck et al., 2015), and maternal aneuploidy screening tests for genetic disorders like Down syndrome (Ke, Zhao, & Wang, 2015). Here, we studied cfDNA via DNA sequencing in individuals of different ages and health conditions. cfDNA has been reported to increase with age and in nonagenarians (Jylhava et al., 2013; Pinti et al., 2014).
However, we did not observe any significant difference in the concentration of cfDNA in our samples across groups, due to the high interindividual variability within groups (1-way ANOVA, p = 0.8; Supporting Information Table S1). It is important to note that interindividual variation in biological age markers, such as DNA methylation and potentially age-associated proteins or transcript biomarkers, is usually high (Franceschi et al., 2018; Horvath, Garagnani, et al., 2015; Kooman et al., 2017; Kooman, Kotanko, Schols, Shiels, & Stenvinkel, 2014). This is in accordance with the concept of "immunobiography," which stipulates that high heterogeneity is expected particularly in old populations, as individuals may have accelerated or decelerated aging processes. Therefore, it is important to study cfDNA in the plasma of individuals in very different age groups, including centenarians, a group that has reached the extreme limit of human lifespan.

[Figure: tissue ranking in the young, old, healthy centenarian, and unhealthy centenarian groups (panel a), with rank differences shown for Old - Young, Healthy centenarian - Young, and Unhealthy centenarian - Young; color scale from decreased rank (47) to increased rank (57); recovered tissue label: esophagus squamous epithelium.]
aging is associated with the loss of nucleosomes (Hu et al., 2014) and replicative senescence is associated with increased accessibility of heterochromatic regions (Criscione, Teo, & Neretti, 2016; De Cecco, Criscione, Peckham, et al., 2013). So far, these genome-wide chromatin reorganizations have only been observed in model organisms and cell cultures.
Here, we provide comprehensive in vivo evidence of global and local chromatin changes in human aging. We observed the highest cfDNA signal in subcompartment B1, followed by A1, A2, B2, and,
TABLE 1 Change of cell-free DNA signals across different age groups. The change in direction is only identified for significant comparisons in which zero is excluded from the credible interval.
(Li & Zhou, 2013). However, when we analyzed ATAC-seq signals in different subcompartments, all B subcompartments, including B1, showed lower signals than any of the A subcompartments. One way to explain this unique feature of B1 is a different accessibility of facultative heterochromatin to the enzymes used in the three assays. MNase is a 17-kDa protein (Taniuchi, Anfinsen, & Sodja, 1967) and caspase-activated DNase (CAD) is a 40-kDa protein (Liu et al., 1998). Both are smaller in size than the Tn5 transposase (53.3 kDa) (Naumann & Reznikoff, 2002) and the adapters that it carries. Therefore, it is possible that MNase and CAD can more easily access and cut DNA in the facultative heterochromatin than the larger transposase. On the other hand, B2 and B3, which consist of constitutive heterochromatin, are the least accessible for all three enzymes and hence display the lowest signals among all subcompartments. This is consistent with the result of a recent study showing that different MNase titrations are required to explain chromatin accessibility because nucleosome signals vary with the level of digestion (Mieczkowski et al., 2016).
We observed increased cfDNA signals with age in subcompartments B2 and B3, which are enriched in lamina-associated domains (LADs). Specifically, cfDNA levels from subcompartment B2 in old individuals and centenarians resemble those of subcompartment A1 in young individuals, suggesting that B2 has become more euchromatic.
In addition, subcompartment A1 in old becomes more similar to subcompartment B2 in young. This indicates that there is a switch between subcompartments A1 and B2 with age. The decrease in cfDNA signals in subcompartment A1 in unhealthy centenarians leads to the subcompartment having lower signals compared to subcompartments A2, B2, and B3, indicating that it has become the most heterochromatic subcompartment. Although signals from sub- suggest a crucial hypothesis on aging and aging-related diseases as a continuum during lifespan trajectories (Franceschi et al., 2018).
The activation of L1HS retrotransposon has also been implicated in aging (De Cecco, Criscione, Peckham, et al., 2013;De Cecco, Criscione, Peterson, et al., 2013). Unlike genes, L1HS has an internal promoter at the 5′UTR (Speek, 2001). We detected a relative loss of cfDNA signals in these locations with age, with unhealthy centenarian and old group showing the lowest cfDNA coverage compared to young individuals. We propose that cfDNA can be used as a method to identify L1HS activation in vivo. AluY also showed a similar trend of cfDNA level changes with age as L1HS, suggesting a derepression of this repetitive element in aging and cancer.
We also noticed that the fraction of reads mapping to the mitochondrial genome increased slightly with age, although not significantly (data not shown). The key limitation that restricts our ability to investigate this feature is that cell-free mtDNA (cf-mtDNA) fragments are short (Zhang, Nakahira, Guo, Choi, & Gu, 2016), whereas the protocol that we used to extract cfDNA from the plasma does not specifically enrich for short DNA. Therefore, the signals from our samples might not represent the whole cf-mtDNA population in the plasma.
Cell-free DNA has been recently used to identify the tissue of origin of apoptotic cells (Feng, Jin, & Wu, 2018;Guo et al., 2017;Snyder et al., 2016;Sun et al., 2015). We applied one of these methods, originally developed for cancer studies (Snyder et al., 2016), to our cfDNA data and observed that the magnitude of the shifts in tissues ranking with age was lower compared to what has been reported in cancer patients, suggesting a moderate-to-low increase in cell death with age as compared to cancer. We did not detect any changes in tissue ranking in healthy centenarian compared to young, but we observed a few tissues increased in ranking in old and unhealthy centenarians compared to young. Because the methodology we used was developed in the context of conditions displaying large shifts in cfDNA tissue composition, we cannot exclude that the age-dependent shifts might be below its detection limit. For example, one of the limitations of this method is the low correlation between nucleosome profiles at genes and gene expression (Supporting Information Figure S6b), which reduces its sensitivity and specificity when a large number of tissue types are queried. Alternative approaches that use tissue-specific nucleosome profiles as opposed to gene expression might improve our ability to detect more subtle changes in cell death across multiple tissues.
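To make the notion of a shift in tissue ranking concrete, the following Python sketch compares tissue ranks between a reference group and a sample. It is only an illustration under stated assumptions: each tissue is assumed to have one score per group, rank 1 is given to the highest score, and ties are broken alphabetically. The scoring itself (the method of Snyder et al., 2016) is not reproduced, and all names and values are hypothetical.

```python
def rank_tissues(scores):
    """Rank tissues from 1 (highest score) downward; ties broken by name."""
    ordered = sorted(scores, key=lambda tissue: (-scores[tissue], tissue))
    return {tissue: position + 1 for position, tissue in enumerate(ordered)}

def rank_increase(reference_scores, sample_scores):
    """Rank change per tissue shared by both inputs.

    Positive values mean the tissue ranks higher (closer to 1) in the
    sample than in the reference.
    """
    ref = rank_tissues(reference_scores)
    smp = rank_tissues(sample_scores)
    return {tissue: ref[tissue] - smp[tissue] for tissue in ref if tissue in smp}

# Hypothetical per-tissue scores for a reference group and a comparison group.
toy_young = {"liver": 0.1, "lung": 0.3, "blood": 0.9}
toy_old = {"liver": 0.5, "lung": 0.2, "blood": 0.9}
toy_shift = rank_increase(toy_young, toy_old)
```

In this toy case the liver moves up one position and the lung moves down one, while blood stays at the top; small shifts of this kind are what a method tuned for large compositional changes may fail to detect.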
Consistently across our study, we noted more similarity in the cfDNA profiles, both globally (compartments) and locally (e.g., TTS and TSS), between young and healthy centenarians, as opposed to old and unhealthy centenarians. Hence, our study suggests cfDNA profiling could be used not only as a biomarker of age but also as a predictor of health status. However, due to the small number of subjects used, our power to translate these findings into a real predictor is very limited. Larger cohorts, more age groups, and health conditions would be needed to properly take into account the interindividual variability, as well as any potential association with lifestyle and nutritional differences.

| Sample processing
Cell-free DNA was isolated from 3 to 4 ml of plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen, Hilden, Germany). At least 20 ng of total DNA from each sample was extracted and quantified using Qubit fluorometer (Thermo Fisher Scientific, Waltham, MA, USA).

| Sequencing library preparation and sequencing
Sequencing libraries were prepared from the purified DNA using the Ovation® Ultralow Library Systems V2 library preparation kit (NuGen Technologies, San Carlos, CA, USA). Sequencing was performed on the Illumina HiSeq X Ten platform at BGI (Beijing Genomics Institute), Hong Kong.

| cfDNA sequencing reads alignment and processing
Paired-end reads were aligned to the hs37d5 genome using BWA-MEM 0.7.12 (Li, 2013). Reads mapping to the mitochondrial, X, and Y chromosomes and reads with MAPQ 0 were removed. Duplicate reads were discarded using picard-tools-1.88. Reads that were soft-clipped by <75 bp were retained. DANPOS2 (Chen et al., 2013) was used to assess the cfDNA signals at TSS, TTS, and CTCF-binding sites. BAM files of triplicates from each age group were merged using SAMTools (Li et al., 2009) and used to identify fragment sizes using picard-tools-1.88 (http://broadinstitute.github.io/picard). Subcompartment information was obtained from the Hi-C data of GM12878 (Rao et al., 2014). cfDNA signals across subcompartments were identified by the number of reads mapping to each 100-kb region after downsampling all samples to 46,301,323 reads. The reads were averaged between replicates to generate the average cfDNA signals. To reconstruct the 3D organization of chr 11 at 100-kb resolution, the Hi-C matrix of GM12878 was obtained using juicer (Durand et al., 2016) and the xyz coordinates were obtained using ShRec3D (Lesne, Riposo, Roger, Cournac, & Mozziconacci, 2014).
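The downsampling, 100-kb binning, and replicate averaging described above can be sketched as follows. This minimal Python illustration assumes that each read has been reduced to a (chromosome, start position) pair; the function names, the random seed, and the toy reads are hypothetical, and the actual analysis operated on BAM files rather than in-memory lists.

```python
import random
from collections import Counter

BIN_SIZE = 100_000  # 100-kb bins, as in the text

def downsample(reads, target, seed=0):
    """Randomly keep `target` reads without replacement (all reads if fewer)."""
    if len(reads) <= target:
        return list(reads)
    rng = random.Random(seed)
    return rng.sample(reads, target)

def count_per_bin(reads, bin_size=BIN_SIZE):
    """Count reads per (chromosome, bin index); each read is (chrom, start)."""
    return Counter((chrom, start // bin_size) for chrom, start in reads)

def average_replicates(replicate_counts):
    """Average per-bin counts across replicates, treating missing bins as 0."""
    all_bins = set().union(*replicate_counts)
    n = len(replicate_counts)
    return {b: sum(c.get(b, 0) for c in replicate_counts) / n for b in all_bins}

# Toy replicates (hypothetical read positions).
rep1 = [("chr1", 50), ("chr1", 99_999), ("chr1", 100_000), ("chr2", 250_000)]
rep2 = [("chr1", 10), ("chr2", 200_001)]
toy_average = average_replicates([count_per_bin(rep1), count_per_bin(rep2)])
```

Downsampling every sample to the same read count before binning keeps per-bin counts comparable across samples with different sequencing depths.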

| Changes of cfDNA signals in subcompartments
DESeq2 1.20.0 (Love, Huber, & Anders, 2014) was used to identify the log fold change of each 100 kb region between age groups.
MCMCglmm (Hadfield, 2010) with default parameters was used to identify significant changes of cfDNA signals in different subcompartments.
Age group was used as the fixed-effect predictor, and the 100-kb bins in each subcompartment were set as the random effect.

| Identification of cfDNA contributing tissues
We used the method developed by Snyder et al. (2016) (Snyder et al., 2016) were downloaded and processed the same way as our cfDNA samples to identify the tissues of origin. Prostate cancer cfDNA samples, IC26 (SRR2130025), IC13 (SRR2130012), IC40 (SRR2130038), a healthy sample, IH02 (SRR2130051), and a liver cancer sample, IC17 (SRR2130016) under accession number GSE71378 were downloaded and processed the same way as our cfDNA samples to identify the signals at L1HS and AluY.

| GM12878 MNase-seq data processing
GM12878 MNase-seq data deposited under accession number GSM920558 were downloaded and aligned to hs37d5 using bowtie (Langmead, Trapnell, Pop, & Salzberg, 2009). Reads mapping to the mitochondrial, X, and Y chromosomes were removed. Duplicate reads were discarded using picard-tools-1.88, and 100-kb MNase-seq signals across subcompartments were identified by the number of reads mapping to each 100-kb region.

| GM12878 ATAC-seq data processing
GM12878 ATAC-seq data deposited under accession number GSE47753 were downloaded and aligned to hs37d5 using BWA-MEM 0.7.12 (Li, 2013). Duplicates were removed, and peaks were called using MACS 2.1.1 (Zhang et al., 2008). The number of reads in each 100 kb of the subcompartments was identified.

| cfDNA coverage at repetitive elements
Paired-end reads were aligned to hs37d5 using bowtie2 (Langmead & Salzberg, 2012) and separated into uniquely mapped and multi-mapped reads. Uniquely mapped reads were counted at each repetitive element's location, and their genomic positions in hs37d5 were converted to the positions in the consensus. The counts were then averaged at each position in the consensus. Multi-mapped reads were aligned to the repeat assemblies representing each repetitive element subfamily. Reads that mapped to AluY or L1HS were extracted and subsequently mapped to the respective repetitive element's consensus sequence. Each read's count was divided by the number of repetitive element subfamilies that it mapped to, and the compiled counts at each consensus position were averaged.
The final counts were defined as the sum of the counts from uniquely mapped reads and those from multi-mapped reads and were normalized by the library size.
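The combination of unique and multi-mapped counts can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the averaging across element copies is omitted, multi-mapped reads are given as (consensus position, number of subfamilies) pairs, and normalization is expressed per million reads, a scale the text does not specify. All names and values are hypothetical.

```python
from collections import defaultdict

def fractional_counts(multi_reads):
    """Weight each multi-mapped read by 1 / (number of subfamilies it maps to).

    `multi_reads` is a list of (consensus_position, n_subfamilies) pairs.
    """
    counts = defaultdict(float)
    for position, n_subfamilies in multi_reads:
        counts[position] += 1.0 / n_subfamilies
    return dict(counts)

def combine_and_normalize(unique_counts, multi_counts, library_size,
                          scale=1_000_000):
    """Sum unique and fractional counts per position, scaled by library size.

    The per-million `scale` is an assumption made for this sketch.
    """
    positions = set(unique_counts) | set(multi_counts)
    return {
        p: (unique_counts.get(p, 0.0) + multi_counts.get(p, 0.0))
        * scale / library_size
        for p in positions
    }

# Toy inputs (hypothetical consensus positions and counts).
toy_unique = {10: 3.0, 12: 2.0}
toy_multi = fractional_counts([(10, 2), (10, 4), (11, 1)])
toy_final = combine_and_normalize(toy_unique, toy_multi, library_size=2_000_000)
```

Fractional weighting prevents reads that match several subfamilies from being counted more than once in total.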

| Data availability
Cell-free DNA sequenced reads have been deposited in the NCBI GEO database with accession number GSE114511.

This work was supported by a Brown-Brazil Collaborative Research Fund (GFT640009) and NIH/NIA R01AG050582-01A1 to NN, and by FARB2-2014 (Alma Mater Studiorum, University of Bologna) to MC. Part of this research was conducted using computational resources at the Center for Computation and Visualization, Brown University.

CONFLICT OF INTERESTS
None declared.

AUTHOR CONTRIBUTIONS
NN, CF, MC, and AMCF conceived and designed the study; MC, CM, and GP collected samples and extracted the cell-free DNA; YVT and NN designed the data analysis; YVT performed the data analysis; YVT, NN, MC, and CF wrote the manuscript.