Data mining of human plasma proteins generates a multitude of highly predictive aging clocks that reflect different aspects of aging

ABSTRACT We previously identified 529 proteins that had been reported by multiple different studies to change their expression level with age in human plasma. In the present study, we measured the q‐value and age coefficient of these proteins in a plasma proteomic dataset derived from 4263 individuals. A bioinformatics enrichment analysis of proteins that significantly trend toward increased expression with age strongly implicated diverse inflammatory processes. A literature search revealed that at least 64 of these 529 proteins are capable of regulating life span in an animal model. Nine of these proteins (AKT2, GDF11, GDF15, GHR, NAMPT, PAPPA, PLAU, PTEN, and SHC1) significantly extend life span when manipulated in mice or fish. By performing machine‐learning modeling in a plasma proteomic dataset derived from 3301 individuals, we discover an ultra‐predictive aging clock comprised of 491 protein entries. The Pearson correlation for this clock was 0.98 in the learning set and 0.96 in the test set while the median absolute error was 1.84 years in the learning set and 2.44 years in the test set. Using this clock, we demonstrate that aerobic exercise‐trained individuals have a younger predicted age than physically sedentary subjects. By testing clocks associated with 1565 different Reactome pathways, we also show that proteins associated with signal transduction or the immune system are especially capable of predicting human age. We additionally generate a multitude of age predictors that reflect different aspects of aging. For example, a clock comprised of proteins that regulate life span in animal models accurately predicts age.


| INTRODUCTION
A panel of molecules capable of predicting chronological age when modeled is referred to as an aging clock (Galkin et al., 2020a). Existing examples of human aging clocks include those comprised of methylated DNA (Hannum et al., 2013; Horvath, 2013), RNA (Mamoshina et al., 2018), proteins, metabolites (Rist et al., 2017; Robinson et al., 2020), biochemical markers (Putin et al., 2016; Sagers et al., 2020), or microbiota (Galkin et al., 2020b). For a more detailed discussion of different types of aging clocks, we recommend a comprehensive review by Galkin et al. (2020a). A recent proteomic aging clock found that individuals with a lower predicted age than their chronological age performed better on cognitive and physical tests (Lehallier et al., 2019). An RNA clock demonstrated that the difference between predicted and actual age was associated with body mass index, blood pressure, fasting glucose, and cholesterol levels (Peters et al., 2015). A much larger body of work using DNA methylation clocks has shown that patients with age-related disease often have a higher predicted age than their chronological age (Horvath & Raj, 2018). These data suggest that aging clocks have the ability to measure biological age, which can be conceptualized as a composite measure that correlates with various health outcomes.
Given that it is not realistic to perform life span studies in humans, a prominent appeal of aging clocks is their potential ability to accelerate anti-aging clinical trials (Horvath & Raj, 2018). Prior to and after testing of an anti-aging intervention, biological age could be measured in a patient cohort. Theoretically, a therapy that successfully combats aging would be one where biological age is reduced compared to controls at the end of the treatment period.
Repeated measurements of biological age also have the potential to be highly informative on an individual level. They could, for example, suggest whether or not someone ought to more aggressively pursue health-promoting interventions to slow down their rate of aging. The requirement for repeat sampling necessitates a sample type that can be measured safely and easily, such as blood or saliva. Since the aging clock field is nascent, much work remains to be done to confidently determine if these theoretical applications are feasible.
In addition to existing drugs whose promising anti-aging potential should be safely tested in humans, designing novel therapies capable of improving human health span will require well-considered molecular targets. A wide variety of approaches have been historically utilized to identify aging-relevant targets and therapeutics, including RNAi screening in worms (Hansen et al., 2005), computational screening of the protein-drug interactome (Fuentealba et al., 2019), and omics-level expression screening in mice (Villeda et al., 2011). As an example of the latter, young mice exposed to the blood of old mice via heterochronic parabiosis exhibit decreased synaptic plasticity as well as impairments in memory and learning. A proteomics expression screen identified that the chemokine Ccl11 was the most significantly altered protein in these heterochronic parabionts. Subsequently, treating young mice with Ccl11 was found to induce various deleterious effects in the brain (Villeda et al., 2011).
With the ultimate objective of improving human health span in mind, we sought to better understand proteomic aging clocks and to identify high-quality protein targets that exhibit anti-aging clinical potential. Since systemic factors are powerful regulators of aging (Pluvinage & Wyss-Coray, 2020), we aimed to achieve these goals by comprehensively data mining human plasma proteins.

| Analysis of all 529 common plasma aging proteins in a large proteomics dataset
Our recent systematic review identified 529 proteins that were reported to change their expression level with age in human plasma by two or more different studies. In the present study, we began analyzing these proteins by measuring their q-value and age coefficient in a plasma proteomic dataset derived from 4263 healthy individuals with an age range of 18-95 years.
Proteomic measurements were previously performed using the SOMAscan assay, which utilizes individual SOMAmers to measure different proteins (Lehallier et al., 2019). Our 529 proteins (Table S1) were condensed into 523 protein entries (Table S2) in this dataset due to some measurements containing multiple different proteins. For example, the heterotrimeric enzyme AMPK was measured using the single SOMAmer "PRKAA1.PRKAB1.PRKAG1." Twenty-seven proteins were not available for measurement and, of the 496 protein measurements, 476 (95.97%) significantly (q < 0.05) changed their expression level with age. Of these 476 significant protein entries, 115 (24.16%) trended toward a decreased expression level with age while 361 (75.84%) trended toward an increased expression level with age. These and other statistics are summarized in Table S3. The six protein measurements with the lowest q-values are shown in Figure 1.

| Many common aging plasma proteins have highly intriguing links to aging and/or health
Timp2 enhances cognition and synaptic plasticity (Castellano et al., 2017). Ablating Cdon in satellite cells hinders muscle regeneration in mice (Bae et al., 2020), mice lacking Il6 exhibit impaired liver regeneration (Cressman et al., 1996), and the myeloid cell-specific ablation of Plxnb2 in mice impairs motor recovery following spinal cord injury (Zhou et al., 2020). Cardiac hypoplasia is caused by the deletion of tmem87b in zebrafish (Russell et al., 2014) while mice overexpressing Nab1 are resistant to cardiac hypertrophy (Buitrago et al., 2005). Diabetes in mouse models of insulin resistance, insulin deficiency, and obesity can be reversed by the overexpression of Igfbp2 (Hedbacker et al., 2010) and, in contrast, mice harboring a mutation in Lep become obese and diabetic (Zhang et al., 1994).
More broadly, connections pertinent to age-related disease, the canonical insulin/IGF1, AMPK, and TOR aging pathways (Singh et al., 2019), and lipid metabolic pathways that directly regulate aging (Johnson & Stolzing, 2019) were identified. We selected the following 20 proteins to highlight that prominently impact longevity and/or age-related disease when manipulated in an animal model: Interesting literature connections for these proteins are listed in Table 1 and graphs visualizing how the expression level of these proteins changes with age are shown in Figure S1.

| A large proportion of common aging plasma proteins affect animal life span
Among the literature connections identified for all of our common aging plasma proteins (Table S2) Table 2 and graphs visualizing how the expression level of these proteins changes with age are shown in Figure S2.

FIGURE 1 529 proteins that were previously identified to change their expression level with age in human plasma were analyzed in a large, proteomic dataset derived from 4263 healthy individuals with an age range of 18-95 years. The six proteins that exhibited the most significant change in plasma expression level with age were CGA.FSHB (a), SOST (b), GDF15 (c), MLN (d), RET (e), and PTN (f). The expression trend over time is visually shown for each protein. RFU = relative fluorescent unit

TABLE 1 20 examples of common aging plasma proteins with highly intriguing links to aging and/or disease
(Glasson et al., 2005)
• ADAMTS5 is overexpressed in osteoarthritic cartilage from mice and humans (Lin et al., 2009)
Chandrashekar et al., 2007)
• Bone loss is mitigated in ovariectomized mice treated with an antibody specific to the β-subunit of follicle-stimulating hormone (Zhu et al., 2012)
• An antibody specific to the β-subunit of follicle-stimulating hormone decreases body fat, stimulates brown adipose tissue, and promotes thermogenesis in mice (Liu et al., 2017)
FGA.FGB.
IL6 4.13E−05, 7.16E−04
• The ability to ward off bacterial or viral infection is impaired in Il6 knockout mice (Kopf et al., 1994)
• Genetically disrupting Il6 in mice impairs liver regeneration and causes liver failure (Cressman et al., 1996)

| Well-known anti-aging drugs and interventions are implicated by our common aging plasma proteins
Many of our 529 common aging plasma proteins were also implicated by established anti-aging drugs and interventions (Table S2). Among the enriched terms for all 35 vertebrate longevity proteins was the immunosuppressant "sirolimus," which is another name for rapamycin (Figure S3B). Other aging-relevant enriched drug terms included "cardiovascular system" as well as the anti-cancer drugs "doxorubicin" and "erlotinib" (Figure S3B).

| Diverse processes pertinent to the immune system are strongly implicated by plasma proteins that trend toward an increased expression level with age
We next performed enrichment analyses in the Gene Ontology Biological Process (GO BP) database (The Gene Ontology, 2019) for different sets of proteins. For the proteins that significantly trend toward increased expression with age, a very prominent theme of the immune system was apparent. Among the top 30 GO BP terms (Figure 2), the following six terms relevant to the immune system were identified: "leukocyte migration," "response to molecule of bacterial origin," "response to interleukin-1," "granulocyte activation," "leukocyte cell-cell adhesion," and "viral life cycle." The proteins that significantly trend toward decreased expression with age were associated with the following enriched terms: "positive regulation of response to external stimulus," "protein activation cascade," "protein kinase B signaling," "extracellular structure organization," and "neutrophil mediated immunity" (Figure S4A).
For the plasma proteins that can impact longevity in normal animals, the enriched terms were quite diverse (Figure S4B). Themes of nutrient intake and metabolism (i.e., "response to nutrient levels," "regulation of carbohydrate metabolic process," and "response to ketone") and the immune system (i.e., "response to transforming growth factor beta" and "neutrophil mediated immunity") were present. Terms relevant to protein homeostasis (i.e., "positive regulation of proteolysis") and stress resistance (i.e., "response to oxidative stress") were also identified (Figure S4B). For the larger list of proteins that can impact longevity in any animal model, we collated the top 30 GO BP terms (Figure S5). Prominent themes pertinent to cell movement, cell growth and proliferation, the immune system, and the circulatory system were identified (Figure S5).

(Duan et al., 2016)
• A homozygous mutation in UFM1 causes early-onset encephalopathy with progressive microcephaly in humans (Nahorski et al., 2018)
For each protein, the q-value and age coefficient (measured in a human proteomic dataset derived from 4263 individuals aged 18-95 years) as well as three relevant connections to aging and/or disease are provided.

| Machine-learning analyses uncover numerous aging clocks reflecting different aspects of aging
proteins (Table S4). We additionally tested the following five clocks based on the top weighted set cover enrichment result (for all 529 proteins) in the Reactome (Jassal et al., 2020), Panther (Mi & Thomas, 2009), KEGG (Kanehisa & Goto, 2000), WikiPathways (Slenter et al., 2018), and GO BP (The Gene Ontology, 2019) databases: proteins associated with "peptide hormone biosynthesis" in Reactome, proteins associated with "plasminogen activating cascade" in Panther, proteins associated with "complement and coagulation cascades" in KEGG, proteins associated with "human complement system" in WikiPathways, and proteins associated with "leukocyte migration" in GO BP (Table S5).
TABLE 2 Examples of common aging plasma proteins that can significantly extend life span in a vertebrate animal model when manipulated
(Migliaccio et al., 1999)
For each protein, the q-value and age coefficient (measured in a human proteomic dataset derived from 4263 individuals aged 18-95 years) as well as the life span effect are included. Bolded words and numbers highlight the lifespan effect in response to a given intervention.
a A follow-up study assessed life span in Shc1 knockout mice at two different locations. At one location, Shc1 −/− mice on a 40% calorie restriction diet exhibited a survival benefit (median 70th percentile survival was increased by 8%). At the other site, no longevity benefit was observed in Shc1 knockout mice fed ad libitum (Ramsey et al., 2014).

FIGURE 2 An overrepresentation analysis in the Gene Ontology Biological Process database was performed for all proteins that significantly (q < 0.05) change their expression level with age in human plasma and have a positive age coefficient. The top 30 enrichment results are presented as -log10(fdr)

The Pearson correlation for predicted vs. actual age (Figure 3a) and the median absolute error (MAE) (Figure 3b)
these results to a clock comprised of all 2978 proteins available for measurement in our plasma proteomic dataset. Detailed information for each clock is provided in Table S6.
Of our 12 proposed plasma proteomic aging clocks (Tables S4 and S5)
We additionally provide the SOMAmer name, UniProt ID, gene name, and protein name for each component of our most predictive clock in Table S7. Intercept and coefficient information is provided in Table S8. The set of 491 protein entries that make up this ultra-accurate clock contains multiple common aging plasma proteins that are direct regulators of aging and health (Table S2)
(Johnson, 2020), and the insulin receptor protein (Blüher, 2003).
An enrichment analysis of the proteins in this clock heavily implicated various immune and inflammatory processes (Figure S8). This clock is predictive in both men and women (Table S9).
FIGURE 3 The ability of 13 different protein sets to predict age in a plasma proteomic dataset derived from 3301 human participants (age range of 18-76 years) was tested using machine learning. For each clock, the learning set utilized 2178 subjects and the test set utilized 1123 subjects. LASSO modeling was also performed for each clock to determine if a smaller set of proteins within the larger set could accurately predict human age. For each of these clocks, the Pearson correlation (a) and median absolute error (b) are reported. The two numbers in parenthesis for each clock indicate the number of available SOMAmers used for the subset of proteins identified by LASSO modeling or the full list of proteins

We additionally tested the ability of this ultra-predictive clock to measure age in two independent plasma proteomic datasets that were previously generated. The first dataset is comprised of 171 in-
Figure S9A). For the latter dataset, the Pearson correlation was 0.91 (Figure S9B). Thus, this clock is able to accurately predict age with a Pearson correlation ≥0.9 in three different human cohorts (Figure 4 and Figure S9). This patient cohort contained individuals that were sedentary as well as individuals that were aerobic exercise-trained. Using our most predictive clock (Figure 4), we demonstrate that the sedentary individuals from this cohort exhibit a higher predicted age than their chronological age (Figure 5). In contrast, those that are aerobic exercise-trained displayed a predicted age that was more similar to their chronological age (Figure 5). For sedentary individuals, the respective chronological and predicted ages were 37.54 ± 20.88 and 46.34 ± 26.48 years. For aerobic exercise-trained individuals, the respective chronological and predicted ages were 37.35 ± 19.82 and 40.91 ± 18.48 years. The delta between chronological and predicted age was significantly different between the sedentary and aerobic exercise-trained groups (p-value = 6.7E-5). The predicted age difference between aerobic exercise-trained and sedentary individuals was 5.43 years.
Interestingly, many of the proteins contained in our 491-entry clock were previously used by Williams et al. (2019) to generate plasma protein models that can accurately predict various health outcomes. We found that many of the proteins used to predict the following health outcomes were also present in our highly predictive clock: alcohol consumption, cardiopulmonary fitness, cardiovascular primary event risk, current cigarette smoking, diabetes diagnosis within 10 years, energy expenditure from physical activity, kidney filtration, lean body mass, liver steatosis, percent body fat, and visceral adipose tissue. The specific overlapped proteins for each health outcome predictor are listed in Table S10.

| Proteins associated with signal transduction or immune system pathways are especially adept at predicting human age
Our aging clock data (Figure 3) Figure 6a) and/or MAE (Figure 6b). Specifically, we show the 19 pathways with the highest Pearson correlations (Figure 6a) and the 19 pathways with the lowest MAEs (Figure 6b) in the LASSO test sets. The Reactome pathways with the five highest Pearson correlations were as follows: "signal transduction," "immune system," "metabolism of proteins," "innate immune system," and "extracellular matrix organization." Among the 19 Reactome pathways with the highest Pearson correlations (Figure 6a), the following five were all immune-related: "immune system," "innate immune system," "adaptive immune system," "cytokine signaling in immune system," and "neutrophil degranulation." The most predictive clock ("signal transduction") had a Pearson correlation of 0.94 in the learning set and 0.89 in the test set (Figure 6a) as well as an MAE of 3.27 years in the learning set and 4.14 years in the test set (Figure 6b). The "immune system" clock was a close second with a Pearson correlation of 0.93 in the learning set and 0.88 in the test set (Figure 6a) as well as an MAE of 3.59 years in the learning set and 4.44 years in the test set (Figure 6b). Plots of predicted age vs. chronological age for these two clocks are shown in Figure S10.
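The two accuracy metrics reported for each clock, the Pearson correlation between predicted and chronological age and the median absolute error, can be illustrated with a minimal sketch. The function names and the toy ages below are illustrative assumptions and are not taken from the study's data or code.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def median_absolute_error(actual, predicted):
    """Median of |actual - predicted|, in the same units as the inputs (years)."""
    return statistics.median(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical ages for four subjects (years); not data from the study.
chronological = [25.0, 40.0, 55.0, 70.0]
predicted = [27.0, 38.0, 58.0, 69.0]

r = pearson_r(chronological, predicted)
mae = median_absolute_error(chronological, predicted)
print(f"Pearson r = {r:.3f}, median absolute error = {mae:.1f} years")
```

Because the median is used rather than the mean, a few subjects with very large prediction errors have little influence on the reported error.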
Out of all 1565 Reactome clocks tested (Table S11) in the test set. For example, the "Degradation of the extracellular matrix" clock contained 54 SOMAmers and, in the test set, had a Pearson correlation of 0.76 and an MAE of 6.18 years. While less accurate, another interesting outlier was the "Negative regulation of TCF-dependent signaling by WNT ligand antagonists" clock, which contained 8 SOMAmers and had a Pearson correlation of 0.63 and an MAE of 8.07 years in the test set.

FIGURE 6 The ability of 1565 protein sets associated with different Reactome pathways to predict age in a plasma proteomic dataset derived from 3301 human participants (age range of 18-76 years) was tested using machine learning. For each clock, the learning set utilized 2178 subjects and the test set utilized 1123 subjects. LASSO modeling was also performed for each clock to determine if a smaller set of proteins within the larger set could more accurately predict human age. We visualize the Pearson correlation (a) for the 19 pathways with the highest Pearson correlation. We also visualize the median absolute error (b) for the 19 pathways with the lowest median absolute error. The two numbers in parenthesis for each clock indicate the number of available SOMAmers used for the subset of proteins identified by LASSO modeling or the full list of proteins. The full name of the pathway abbreviated with ellipses is "Regulation of insulin-like growth factor (IGF) transport and uptake by insulin-like growth factor binding proteins (IGFBPs)"

| DISCUSSION
In the present study, we discover a novel, ultra-predictive clock comprised of 491 SOMAmers. Compared to a much larger array of existing aging clocks recently collated by Galkin et al. (2020a), this protein clock is especially predictive. This clock was capable of accurately predicting human age in three different plasma proteomic datasets and was used to demonstrate that physically inactive patients have a much higher predicted age than their chronological age. In contrast, patients that engage in frequent aerobic exercise exhibited a predicted age that was more similar to their chronological age. Since exercise is one of the most effective anti-aging interventions (Garatachea et al., 2015), these data suggest that this plasma protein age predictor can capture aspects of patient health. Moreover, we unveiled a multitude of novel aging clocks that are made up of a smaller set of proteins.
Since proteomics screening can be quite costly (Graham et al., 2005), the ability to predict human age using a minimal set of proteins obviates a financial barrier to performing aging clock measurements. It also makes the prediction of patient age logistically much simpler and therefore more conducive to widespread use. We additionally demonstrate that proteins tangibly associated with different aspects of aging (e.g., proteins that impact animal longevity, proteins that change their expression level with age, or proteins with a listing in the HAGR database) are able to robustly predict human age.
In total, we tested 13 custom clocks and 1565 different Reactome pathway clocks. While our data make it clear that the accuracy of a given clock is correlated with the number of protein entries used, there were several notable exceptions. For example, a clock comprised of proteins that significantly change their expression level with age (which used 561 SOMAmers) had a higher Pearson correlation and a lower MAE than a clock comprised of all measured proteins (which used 3283 SOMAmers). Thus, while the availability of more proteins tends to increase the predictive power of a given clock, the proteins chosen also influence the overall accuracy.
We additionally found nine proteins that both significantly change their expression level with age in human plasma and extend life span in normal vertebrates when manipulated. More broadly, we were able to identify a tangible connection to aging, disease, and health for all 523 protein entries that were comprehensively analyzed. It is important to note that, while some of these connections demonstrated a direct role in regulating the aging process (e.g., a genetic manipulation which impacts longevity and health span), others were more tangential and loosely associated with aging (e.g., protein expression levels were altered in patients with a specific age-related disease).
Our enrichment analysis revealed that a diverse set of processes relevant to inflammation and the immune system were strongly implicated by proteins that increase their expression level with age in human plasma. Furthermore, we found that proteins associated with immune system enrichment terms are especially adept at predicting human age. These findings corroborate an ever-growing body of data that intimately link aging with immune system dysfunction (Nikolich-Zugich, 2018). Atypically long-lived animals exhibit unique gene changes relevant to inflammation, and genomic (Shen et al., 2020), transcriptomic (Peters et al., 2015), and proteomic (Tanaka et al., 2018) analyses in humans have all connected immunological changes with aging. Interestingly, our "innate immune system" Reactome clock was almost as predictive as our "immune system" clock, despite containing 438 fewer SOMAmers. This would suggest that the innate immune system is especially pertinent to human aging.
With these data in mind, it is quite intriguing that one of the most effective anti-aging drugs capable of extending life span and health span in mice is rapamycin (Bitto et al., 2016), which is clinically used as an immunosuppressant. Thus, clinical therapies that correct immune dysfunction may be particularly capable of improving human health span.
In summary, we propose and validate a plethora of novel aging clocks that are capable of predicting individual age in a large human cohort. Using the most predictive clock we identified, we show that sedentary subjects have a higher predicted age than their chronological age. We additionally discover that proteins which significantly change their expression level with age in human plasma are frequently direct regulators of age-related disease and/or life span in animal models. Thus, many of these proteins are worthy of further exploration as potential therapeutic targets for the extension of human health span. We also show that diverse processes relevant to inflammation and the immune system are strongly implicated by aging-relevant proteins. Future studies should build upon these data to help develop effective anti-aging therapies that can be safely utilized in the clinic.

| Statistical measurements for common aging plasma proteins
We previously identified 529 proteins that were reported to significantly change their expression level with age by two or more different studies. These common aging plasma proteins were analyzed in a plasma proteomic dataset derived from 4263 healthy individuals with an age range of 18-95 years (Lehallier et al., 2019). This 4263-person dataset reflects the combination of two different cohorts: 3301 individuals from the INTERVAL cohort and 962 individuals from the LonGenity cohort. All plasma proteomes were acquired using the SOMAscan assay. For each protein, the q-value and age coefficient were measured using an online software tool developed by Lehallier et al. (2019). Using this tool, a "Linear" regression line and an "All" subset were chosen to make graphs showing how the expression level of select proteins changes with age in human plasma. When multiple different SOMAmer measurements were available for a given protein entry, the first measurement listed was selected.
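For readers unfamiliar with the two statistics, the following minimal sketch shows one common way they can be obtained. It rests on two assumptions that the text does not confirm: that the age coefficient is the slope of a simple linear regression of protein level on age, and that q-values are Benjamini-Hochberg adjusted p-values. The online tool used here may differ, for example by including covariates. All function names and numbers are hypothetical.

```python
def age_coefficient(ages, levels):
    """Ordinary least-squares slope of protein level regressed on age."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_level = sum(levels) / n
    sxy = sum((a - mean_age) * (v - mean_level) for a, v in zip(ages, levels))
    sxx = sum((a - mean_age) ** 2 for a in ages)
    return sxy / sxx


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical per-protein p-values; not values from the study.
p_values = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]
print(q_values, significant)
```

A positive slope corresponds to a protein whose level trends upward with age, and a negative slope to one that trends downward.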

| Database and literature search for connections relevant to aging and health
For each of our common plasma aging proteins, we performed a comprehensive database and literature search to identify connections relevant to aging and health. This included searching for individual protein entries in the HAGR database (Tacutu et al., 2018). UniProt (UniProt, 2019) was utilized to identify default and alternative name recommendations and Alliance of Genome Resources (Alliance of Genome Resources, 2020) was used to find gene orthologs in different organisms. PubMed was employed to search for protein names in conjunction with the terms "lifespan" and "life span." Other search combinations included the protein name by itself or in combination with "aging," "disease," and/or "survival."

| Overrepresentation analyses
Overrepresentation analyses were performed similarly to before using WebGestalt (Liao et al., 2019). UniProt IDs were provided as the inputs, the background was set to all protein-coding genes, and the FDR significance level was set to 0.05.

| Proteomic aging clock generation
The creation of proteomic aging clocks was performed similarly to before (Lehallier et al., 2019). Sample selection, processing, and preparation were detailed previously (Sun et al., 2018).
To analyze the accuracy of the plasma proteome to predict chronological aging and the relative predictive power of specific signatures, we used glmnet
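The glmnet package fits penalized linear models such as the LASSO, which the figure legends name as the method used to find smaller protein subsets. The sketch below is a plain coordinate-descent LASSO written only to illustrate how an L1 penalty drives weak coefficients to exactly zero. It is not the glmnet implementation and not the study's code. It assumes each feature column is centered with mean square 1 and the response is centered, and all names and data are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; anything within [-lam, lam] becomes 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 by coordinate descent.

    Assumes every column of X is centered with mean square 1 and y is centered.
    X is a list of rows; the return value is the list of coefficients.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
            beta[j] = soft_threshold(rho / n, lam)
    return beta


# Toy data: two orthogonal standardized features; y = 3*x1 + 0.1*x2.
X = [[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
y = [3.0 * x1 + 0.1 * x2 for x1, x2 in X]

beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

In this toy case the strongly predictive feature is kept with a shrunken weight and the weak one is dropped, which mirrors the idea of a smaller protein subset within a larger set.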

| Validation of the ultra-sensitive proteomic clock in independent cohorts and functional relevance
To validate the ultra-sensitive plasma proteomic clock in independent cohorts, we used an aging proteomic dataset covering a large life span range (Lehallier et al., 2019) and a dataset investigating the effect of exercise in young and old individuals (Santos-Parker et al., 2018). In the data generated by Lehallier et al. (2019), the age ranged from 21 to 107 years with a median age of 70 years (first quartile = 58, third quartile = 89; 84 males and 87 females). The samples originated from four different cohorts from the United States and Europe (VASeattle, PRIN06, PRIN09, and GEHA, N = 171). RFUs for the 1305 proteins measured in these datasets were log10-transformed and z-scored.
In the data generated by Santos-Parker et al. (2018), 31 young subjects (aged 19-32 years, inactive n = 16, aerobic exercise-trained n = 15) and 16 healthy older subjects (aged 55-77 years, inactive n = 8, aerobic exercise-trained n = 8) were measured. Of the 47 healthy subjects, 15 were female and 32 were male. The version of the SOMAscan platform used in this study measured 1129 proteins and RFUs were similarly log10-transformed and z-scored.
Only a subset of the 491 proteins constituting the ultra-sensitive proteomic clock was measured in these cohorts: n = 150 for the study by Lehallier et al. (2019) and n = 115 for the study by Santos-Parker et al. (2018). No re-fitting of the model was performed but we applied a correction coefficient that was estimated as follows: First, we predicted chronological age in the learning dataset of the INTERVAL cohort using the coefficients of the 491-SOMAmer proteomic clock but with only available proteins measured in the independent cohorts. Then, we fitted a linear model between predicted age and chronological age and estimated the correction coefficient to correct for slope offset of each subclock, separately. This correction coefficient was 2.62 for the study by Lehallier et al. (2019) and 4.57 for the study by Santos-Parker et al. (2018).
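The two-step correction described above can be sketched as follows. The exact formula is not given in the text, so this sketch assumes that the correction coefficient is the slope obtained when chronological age is regressed on the subset-based prediction in the learning set, and that the fitted line is then used to rescale predictions. Protein names, weights, and ages are hypothetical.

```python
def partial_prediction(intercept, coefficients, sample):
    """Clock prediction using only the proteins present in `sample`.

    `coefficients` and `sample` are dicts keyed by protein name; proteins
    missing from `sample` are skipped rather than imputed.
    """
    return intercept + sum(
        weight * sample[name]
        for name, weight in coefficients.items()
        if name in sample
    )


def fit_line(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical three-protein clock; only protein "A" exists in the new cohort.
clock_intercept = 50.0
clock_coefficients = {"A": 2.0, "B": 3.0, "C": 1.0}
learning_samples = [{"A": -1.0}, {"A": 0.0}, {"A": 1.0}]  # z-scored levels
learning_ages = [40.0, 50.0, 60.0]

# Step 1: predict age in the learning set with the available proteins only.
raw = [partial_prediction(clock_intercept, clock_coefficients, s)
       for s in learning_samples]
# Step 2: fit chronological age against the raw prediction.
slope, offset = fit_line(raw, learning_ages)
corrected = [offset + slope * r for r in raw]
print(slope, corrected)
```

A slope larger than 1 indicates that dropping proteins compressed the spread of predicted ages, which is in line with the reported coefficients of 2.62 and 4.57 being greater than 1.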
To estimate whether aerobic exercise has an effect on aging, we calculated delta age, which corresponds to the difference between predicted age and chronological age, and tested statistical significance using the Wilcoxon signed-rank test. Finally, we compared the proteins constituting the ultra-predictive clock with protein predictors of 12 health traits such as smoking, percent body fat, and cardiopulmonary fitness according to a recent study from Williams et al. (2019). To do this, we mapped protein names to gene symbols and estimated the percentage of genes measured in our study that were involved in the aging clock and in the different, previously reported health outcome predictors.
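As a worked illustration of delta age, the sketch below applies the definition (predicted age minus chronological age) to the group means reported in the Results for the exercise cohort. Because the mean is linear, the mean delta of a group equals the difference of its mean predicted and mean chronological ages, assuming both means cover the same subjects. The variable names are illustrative, and the sketch does not reproduce the significance test, which requires per-subject values.

```python
def delta_age(predicted, chronological):
    """Predicted age minus chronological age; positive means 'older than expected'."""
    return predicted - chronological


# Group means (years) as reported in the Results for the exercise cohort.
sedentary = {"chronological": 37.54, "predicted": 46.34}
trained = {"chronological": 37.35, "predicted": 40.91}

sedentary_delta = round(
    delta_age(sedentary["predicted"], sedentary["chronological"]), 2
)
trained_delta = round(
    delta_age(trained["predicted"], trained["chronological"]), 2
)
# Difference in mean predicted age between the two groups.
predicted_gap = round(sedentary["predicted"] - trained["predicted"], 2)

print(sedentary_delta, trained_delta, predicted_gap)
```

The last value matches the 5.43-year predicted age difference stated in the Results, while the first two show that both groups have a positive mean delta, with the sedentary group's being larger.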

ACKNOWLEDGEMENTS
AAJ would like to thank Dr. Leili Rohani (University of Calgary, Calgary, Alberta, Canada) for helpful correspondence. AAJ and MNS are additionally grateful to JL, ES, BS, YS, and JM. Although this work did not receive any financial support, TW-C would like to express gratitude for funding from the NOMIS Foundation and Nan Fung Life Sciences. In addition, MNS is grateful for support from NIH R01 GM102491-07, NCI P30 CA014195-46, and NIA 1RF1AG064049-01.

CONFLICT OF INTEREST
The authors have no conflicts of interest to declare.

AUTHOR CONTRIBUTIONS
BL performed the proteomic aging clock analyses and measurements, contributed to study design, and contributed to manuscript writing. MNS performed enrichment analyses and edited the manuscript. TW-C provided mentoring and essential resources for BL as well as reviewed the manuscript. AAJ conceived and designed the study, performed the database and literature review for all common aging plasma proteins, wrote the manuscript, and performed enrichment analyses.