• Open Access

Human aging is characterized by focused changes in gene expression and deregulation of alternative splicing


Dr Lorna W. Harries, Peninsula College of Medicine and Dentistry, Barrack Rd, Exeter EX25DW, UK. Tel.: 44 1392 406749; fax: 44 1392 406767; e-mail: Lorna.Harries@pms.ac.uk


Aging is a major risk factor for chronic disease in the human population, but there are few human data on the gene expression alterations that accompany the process. We examined human peripheral blood leukocyte in-vivo RNA in a large-scale transcriptomic microarray study (subjects aged 30–104 years). We tested associations between probe expression intensity and advancing age (adjusting for confounding factors), initially in a discovery set (n = 458), and followed up findings in a replication set (n = 240). We confirmed expression of key results by real-time PCR. Of 16 571 expressed probes, only 295 (2%) were robustly associated with age. Just six probes were required for a highly efficient model for distinguishing between young and old (area under the curve in the replication set: 95%). The focused nature of age-related gene expression may therefore provide potential biomarkers of aging. Similarly, only 7 of 1065 biological or metabolic pathways were age-associated in gene set enrichment analysis, notably including the processing of messenger RNAs (mRNAs) [P < 0.002, false discovery rate (FDR) q < 0.05]. This is supported by our observation of age-associated disruption to the balance of alternatively expressed isoforms for selected genes, suggesting that modification of mRNA processing may be a feature of human aging.


Advancing age is a major risk factor for many common diseases, including type 2 diabetes, cardiovascular disease, and many cancers (Butler et al., 2008). However, aging is characterized by progressively rising heterogeneity, with some people becoming frail in their 70s while others remain fit into their 90s or even longer. Characterizing the changes underpinning the heterogeneity of aging processes at a molecular level has been a long-held goal.

One theory of aging is that random and widespread unrepaired damage to DNA (and other molecules) accumulated over a lifetime may cause cellular senescence (Gensler et al., 1981), but it has not been established whether such damage is associated with large-scale alterations of gene expression in the aged human population. Alteration to highly sequence-dependent processes such as mRNA processing (Cartegni et al., 2002) has been suggested in previous studies (Yannarell et al., 1977; Meshorer & Soreq, 2002), but to date, there are few human data to assess this empirically. Several age-related diseases are known to be caused by alterations in the splicing patterns of the mRNA transcripts, including the Hutchinson-Gilford progeria syndrome, where premature aging is caused by a synonymous mutation (G608G) in the Lamin A (LMNA) gene, which obstructs the normal post-translational processing of the protein product and leads to the premature aging phenotype (Eriksson et al., 2003). Similarly, alterations in the relative balance of alternatively expressed microtubule-associated protein tau (MAPT) isoforms are a feature of Alzheimer’s disease-related tauopathies (Chen et al., 2010).

Gene expression arrays provide a powerful technology for identifying age-related alteration to the levels of gene transcripts in a comprehensive genome-wide way. Identification of individual transcripts and functionally coherent gene sets that are under- or over-expressed with aging in humans would provide key insights into the mechanisms of aging processes and age-related disease (Zahn et al., 2006). This may provide a ‘biomarker signature’ for monitoring the effects of interventions to slow age-related changes, in an easily accessible tissue, peripheral blood leukocytes.

A variety of age-related expression analyses in cell lines or stored cell material have been reported, although results have had limited reproducibility (de Magalhaes et al., 2009). This is likely to be due to the small sample sizes in previous studies and to the sensitivity of mRNA transcripts to variation in aspects of storage and handling (Min et al., 2010). Identifying robust changes in age-related gene expression in humans will therefore depend on large numbers of samples collected with optimal sample handling, so that results reflect in-vivo mRNA expression. Blood-derived leukocytes are a relevant tissue for the study of in-vivo aging processes in humans, as ‘immunosenescence’ is well described (Gruver et al., 2007). Blood is likely to remain the principal accessible ‘live’ tissue for large-scale in-vivo expression studies and clinical analysis in humans. Blood-derived white cell transcriptome studies have already proved valuable in identifying signatures of major diseases and drug responses, some with promising clinical applications (Dumeaux et al., 2010).

We used a well-characterized, population-representative cohort, the InCHIANTI aging study (Ferrucci et al., 2000), to examine transcriptome-wide alterations in gene expression associated with chronological age in samples from 698 individuals by microarray analysis. We predicted widespread transcriptomic alterations with advancing age, and that inflammatory or immune function genes would be prominent in our results given our choice of target tissue (peripheral white blood cells). We aimed to identify both the most deregulated individual transcripts and the most deregulated gene sets grouped into biological pathways.

We found that although the largest single-transcript associations in human peripheral blood leukocytes do indeed include genes involved in inflammation or immune function, widespread alterations in gene expression levels were not apparent. In fact, only a very small proportion (295 of 16 571 transcripts; 2%) of transcripts demonstrated robust age-related differences in gene expression. Furthermore, a statistical model using just 6 of the top 25 transcripts was able to classify samples into ‘young’ or ‘old’ with high precision. These six transcripts may provide a biomarker set to monitor interventions aimed at slowing aging. Gene set enrichment analysis (GSEA) (a method to determine whether specific molecular or functional pathways are associated with a given trait) demonstrated that the pathways most disrupted by human aging include genes involved in messenger RNA splicing, polyadenylation, and other post-transcriptional events. This was accompanied by specific changes in the ratios of expression between isoform-specific transcripts for selected genes in our extended analysis. Deregulation of mRNA processing pathways may comprise a mechanism involved in human cellular aging.


A small proportion of transcripts demonstrate marked age-related expression differences

A total of 16 571 transcripts gave reliable signals above background (P ≤ 0.01) in > 5% of the sample population following QC of microarray data and were selected for analysis of gene expression alterations with age. The cohort was divided into discovery and test sets, based on analytical batch. Individual probes were only considered to be associated with chronological age (with adjustment for major confounders) if they reached a false discovery rate (FDR) q < 0.001 in the discovery set and also replicated in the test set with an FDR q < 0.05. Of 16 571 transcripts, we identified gene expression differences with age in only 2%: 360 probes in the discovery set, of which 295 replicated in our test set (see Table S1). Fifty (17%) of the differentially expressed transcripts were up-regulated and 245 were down-regulated. Analysis of the extremes of the age distribution only [< 50 years (n = 93) vs. > 80 years (n = 232)] yielded only 53 probes associated with age, supporting our observation that human aging is associated with large-scale alterations in the expression of only a small number of transcripts.
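The two-stage selection rule above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes the FDR q-values are Benjamini-Hochberg adjusted P-values (the text does not name the FDR procedure), and the function names `bh_qvalues` and `replicated_probes` are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P-values (q-values), returned in input order.

    Assumption: the FDR method is not named in the text; BH is used here
    only as a common default.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def replicated_probes(discovery_p, replication_p, q_disc=0.001, q_rep=0.05):
    """Probe IDs with q < q_disc in discovery and q < q_rep in replication.

    Both arguments are dicts mapping probe ID -> nominal P-value for the
    age term. Assumption: q-values are computed over all probes in each set.
    """
    disc_ids = list(discovery_p)
    q_d = dict(zip(disc_ids, bh_qvalues([discovery_p[i] for i in disc_ids])))
    rep_ids = list(replication_p)
    q_r = dict(zip(rep_ids, bh_qvalues([replication_p[i] for i in rep_ids])))
    return [i for i in disc_ids
            if q_d[i] < q_disc and i in q_r and q_r[i] < q_rep]
```

Requiring a strict threshold in discovery and a looser one in replication keeps the candidate list short while still demanding independent confirmation of each probe.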

The chemokine receptor 6 (CCR6), chemokine receptor 7 (CCR7), and CD27 genes, previously associated with age and immunosenescence (Yung et al., 2007), were also found to be associated with age in our study. Plots of microarray signal intensity against increasing age (Cloud plots) for these, and the top 3 up- and down-regulated genes in our analysis, namely leucine-rich repeat neuronal protein 3 (LRRN3), endosialin (CD248), lymphoid enhancer-binding factor 1 (LEF1), vesicle-associated membrane protein 5 (VAMP5), guanylate binding protein 5 (GBP5), and signal transducer and activator of transcription 1 (STAT1), are presented in Fig. 1. The 25 most up-regulated and the 25 most down-regulated transcripts are shown in Table 1. The most statistically significant association between age and gene expression was for the LRRN3 gene (P = 8.2 × 10⁻²⁸), thought to be involved in MAPK activity and endocytosis.

Figure 1.

 The expression of key genes with age. The figure shows the association between probe signal intensity (arbitrary units; Y-axis) and age (years; X-axis) for three genes known to be associated with age; CD27, chemokine receptor 6, and CCR7 (a–c), the top three down-regulated genes in our study; leucine-rich repeat neuronal protein 3, CD248, and lymphoid enhancer-binding factor 1 (d–f) and the top three up-regulated genes in our study; vesicle-associated membrane protein 5, GBP1, and signal transducer and activator of transcription 1 (g–i).

Table 1.   The most strongly (a) down-regulated (b) up-regulated genes with advancing age, by discovery and replication set
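| Number (by discovery P-value) | Probe | Probe ID | Discovery set: age coefficient | Discovery set: P-value | Discovery set: FDR q-value | Replication set: age coefficient | Replication set: P-value | Replication set: FDR q-value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
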
  1. The gene and probe identities are given in each case. The age coefficient, the significance P-value, and the false discovery rate (FDR) q-value are also given.

  2. ABLIM1, actin-binding LIM protein 1; CCR6, chemokine receptor 6; LEF1, lymphoid enhancer-binding factor 1; LRRN3, leucine-rich repeat neuronal protein 3; STAT1, signal transducer and activator of transcription 1; VAMP5, vesicle-associated membrane protein 5.


It is possible that some of the effects we note may result from age-related increases in inter-individual variation in expression levels. However, the Breusch-Pagan test of conditional heteroscedasticity did not provide evidence to support this hypothesis. The lowest P-value for the effect of age on residual expression variability failed to reach genome-wide significance before or after adjustment for age effects on mean expression (minimum P = 3.2 × 10⁻⁵, FDR q = 0.18). Therefore, these results provide no evidence for increased variability in inter-individual gene expression with age.
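For readers unfamiliar with the test, the following is a minimal sketch of a Breusch-Pagan test for a single probe. It makes several simplifying assumptions that are not taken from the study: age is the only regressor (the confounder adjustment is omitted), the studentized (Koenker) form of the statistic is used, and the function names are hypothetical.

```python
import math


def ols_fit(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def breusch_pagan(age, expression):
    """Studentized Breusch-Pagan test with age as the only regressor.

    Returns (LM statistic, P-value). Under the null hypothesis of constant
    residual variance, LM follows a chi-square distribution with 1 df.
    """
    n = len(age)
    intercept, slope = ols_fit(age, expression)
    sq_resid = [(y - (intercept + slope * x)) ** 2
                for x, y in zip(age, expression)]
    # Auxiliary regression: do the squared residuals depend on age?
    aux_intercept, aux_slope = ols_fit(age, sq_resid)
    mean_sq = sum(sq_resid) / n
    ss_total = sum((r - mean_sq) ** 2 for r in sq_resid)
    ss_resid = sum((r - (aux_intercept + aux_slope * x)) ** 2
                   for x, r in zip(age, sq_resid))
    r_squared = 1.0 - ss_resid / ss_total
    lm = max(0.0, n * r_squared)  # guard against tiny negative rounding error
    # Chi-square survival function for 1 df: P(X > lm) = erfc(sqrt(lm / 2)).
    return lm, math.erfc(math.sqrt(lm / 2.0))
```

In a transcriptome-wide setting the test would be run once per probe and the resulting P-values corrected for multiple testing, which is why a minimum P of 3.2 × 10⁻⁵ can still correspond to an FDR q of 0.18.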

Six of the most deregulated 25 probes provide good discrimination between young and old

We then identified which of the 25 most strongly age-associated probes provide the best discrimination between a younger group (age < 65; n = 107 in discovery set) and older subjects (age ≥ 75; n = 308 in discovery set). We used multivariable logistic regression modeling to select probes that add significantly to classification (see methods, statistical analysis). Our final model included six genes: LRRN3, CD248, CCR6, GRAP (growth factor receptor-bound protein 2 related adaptor protein), VAMP5, and CD27, which together explained 63% of the age-group associated variation (Table S2). The multivariable logistic regression model was used to generate an age-group classification for subjects in the independent replication set. The ROC (Receiver Operating Characteristic) curve (Fig. S1) shows that this small subset of genes is sufficient to achieve exceptionally strong separation between older and younger subjects [ROC area under the curve (AUC) = 96% in the discovery set and 95% in the replication set]. The AUC for alternative age-group comparisons within our replication set is 91% for age < 60 vs. age ≥ 60 and 84% for age 60–69 vs. age 70–79.
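To make the AUC figures concrete, the sketch below scores subjects with a logistic model and computes the AUC as the probability that a randomly chosen older subject receives a higher score than a randomly chosen younger one. The coefficient values are not reported in this section, so any coefficients passed to `logistic_score` are hypothetical, and both function names are illustrative.

```python
import math


def logistic_score(expression, coefficients, intercept):
    """Predicted probability of belonging to the 'old' group.

    expression and coefficients are dicts keyed by gene symbol. The
    coefficients are placeholders supplied by the caller, not the
    fitted values from the study.
    """
    linear = intercept + sum(coefficients[g] * expression[g]
                             for g in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))


def roc_auc(scores_old, scores_young):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    wins = 0.0
    for s_old in scores_old:
        for s_young in scores_young:
            if s_old > s_young:
                wins += 1.0
            elif s_old == s_young:
                wins += 0.5
    return wins / (len(scores_old) * len(scores_young))
```

An AUC of 0.95 therefore means that in about 95% of old-young pairs drawn from the replication set, the model ranks the older subject higher.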

Key expression changes replicate when assayed by an alternative method; TaqMan low-density array (TLDA) analysis

TaqMan low-density array quantitative real-time PCR provides a robust independent laboratory technique to validate array results and provides more accurate estimates of effect sizes. We validated 27 of the most differentially expressed genes identified by microarray analysis using TLDA quantitative real-time PCR. Of the 27 genes tested, 22 showed clear differences in gene expression with age (see Table S3). Representative box plots (comparing data for 49 respondents aged 30–44 years and 50 respondents aged 85–104) for one gene previously associated with aging (CCR7) (Yung et al., 2007) and two of the top associations found in our study, LRRN3 and LEF1, are given in Fig. 2. Of those that did not replicate, two had no detectable expression (AKTIP and IGLL1) and three (RPS5, E2F5 and VEGFB) showed evidence of a trend for altered expression, but were not statistically significant. These transcripts were at the lower limit of statistical significance following Bonferroni correction for multiple testing (P = 1 × 10⁻⁵) on the microarray data, indicating that some of the genes located at the limits of statistical significance may not represent genuine hits.

Figure 2.

 Box plots showing gene expression changes obtained by TaqMan low-density array (TLDA). This figure demonstrates the difference in expression levels between young (30–44 years; n = 49) and old (85–104 years; n = 50) individuals as assessed by TLDA analysis. Gene expression levels expressed relative to the endogenous controls are given on the Y-axis, and the subject group (young or old) is given on the X-axis.

For our top gene (LRRN3), the mean expression intensity in the young sample (aged 30–44 years) was 1.65 (95% CI for the mean 1.36–1.94), compared with 0.53 (95% CI 0.44–0.62) in the older sample (85–104 years), a highly significant difference (P < 0.0001) (Table S3). This LRRN3 change represented a very large reduction, i.e. the expression in the older sample is 32% of that in the young (0.53/1.65). In our TLDA-validated genes, the expression in the older group ranged from 32% to 175% of that in the young.

Gene set enrichment analysis (GSEA) reveals that the major pathways affected by age in humans relate to post-transcriptional processing of messenger RNA transcripts

We applied GSEA (Subramanian et al., 2005) to identify pathways (rather than individual transcripts) that are associated with age. We examined 1065 predefined gene sets, grouped by (mainly Gene Ontology) classification of molecular function or biochemical processes. Of the 1065 pathways examined, only seven pathways were significantly associated with age, after accounting for multiple statistical testing (FDR q-value < 0.05) (Table 2). These were total RNA binding (nominal P-value < 0.001), mRNA metabolic process (P < 0.001), mRNA binding (P < 0.001), RNA splicing (P < 0.001), mRNA processing (P < 0.001), ribonucleoprotein complex biogenesis and assembly (P < 0.001), and chromatin assembly or disassembly (P = 0.002). Figure 3 shows the enrichment plot for the RNA splicing pathway. Leading edge analysis (which identifies specific genes driving observed pathway associations) revealed that there was significant overlap between genes in these pathways, and that four pathways relate essentially to the same process: post-transcriptional processing of messenger RNA transcripts (Fig. S2). The remaining pathways involve genes responsible for opening or closing the chromatin structure to allow (or disallow) transcription, and genes involved in the production and action of ribosomes, permitting translation of the processed messenger RNA transcript. GSEA also revealed two biological function gene sets possibly up-regulated in older people (extracellular region genes P = 0.004 and ion transport genes P = 0.008), although neither was significant when accounting for multiple testing (FDR q-value = 0.197 and q-value = 0.207, respectively).
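The core of GSEA is a running-sum statistic over the list of genes ranked by their correlation with age. The sketch below follows the enrichment score definition of Subramanian et al. (2005) but is only an illustration: it omits the permutation-based P-values and the normalization that yields the NES, and the function name and argument layout are assumptions.

```python
def enrichment_score(ranked_genes, correlations, gene_set, weight=1.0):
    """Signed maximum deviation of the GSEA running sum from zero.

    ranked_genes: gene IDs ordered from most positive to most negative
                  correlation with age.
    correlations: ranking metric for each gene, in the same order.
    gene_set:     set of gene IDs defining the pathway.
    weight:       exponent applied to |correlation| for genes in the set.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_total = sum(abs(r) ** weight
                    for r, h in zip(correlations, hits) if h)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, genes")
    running, best = 0.0, 0.0
    for r, is_hit in zip(correlations, hits):
        if is_hit:
            running += abs(r) ** weight / hit_total  # step up at a set member
        else:
            running -= 1.0 / n_miss                  # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best
```

With this ordering, a negative score indicates a gene set concentrated among transcripts whose expression falls with age, which is the direction of every pathway listed in Table 2.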

Table 2.   Gene set enrichment analysis plots for potential age-associated gene sets, for molecular function or biochemical process pathways
| Significance rank with age | Pathway | Source of pathway definition | Size (number of genes in pathway) | NES | Nominal P-value | FDR q-value |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | RNA binding | GO | 200 | −2.00092 | < 0.001 | 0.023 |
| 2 | mRNA metabolic processes | GO | 73 | −1.92912 | < 0.001 | 0.027 |
| 3 | mRNA binding | GO | 19 | −1.93444 | < 0.001 | 0.029 |
| 4 | RNA splicing | GO | 75 | −1.95339 | < 0.001 | 0.029 |
| 5 | mRNA processing | GO | 63 | −1.8974 | < 0.001 | 0.03 |
| 6 | Ribonucleoprotein complex biogenesis/assembly | GO | 68 | −1.89765 | < 0.001 | 0.036 |
| 7 | Chromatin assembly or disassembly | GO | 20 | −1.95669 | 0.002 | 0.041 |
| 8 | RNA processing | GO | 136 | −1.98935 | < 0.001 | 0.057 |
| 9 | Protein RNA complex assembly | GO | 56 | −1.82204 | 0.004 | 0.057 |
| 10 | DNA replication | GO | 73 | −1.82487 | 0.002 | 0.063 |
| 11 | DNA-dependent DNA replication | GO | 37 | −1.77582 | 0.004 | 0.085 |
| 12 | Purine metabolism | KEGG | 101 | −1.95911 | < 0.001 | 0.089 |
| 13 | Organelle lumen | GO | 346 | −1.66622 | 0.002 | 0.091 |
| 14 | Negative regulation of programmed cell death | GO | 110 | −1.71244 | < 0.001 | 0.099 |

  1. Size refers to the number of genes in the pathway. The false discovery rate (FDR) is given by the q-value and the statistical significance by the P-value.

  2. GO, gene ontology subset; KEGG, Kyoto encyclopedia of genes and genomes; NES, normalized enrichment score for each gene.

Figure 3.

 Gene set enrichment analysis plot for the RNA splicing pathway. The Y-axis gives the enrichment score for an association with age in the top panel and the ranked list matrix in the bottom panel. The X-axis refers to the rank of each gene in the ordered dataset. Each vertical line in the central portion of the figure refers to one gene within the pathway. The position of each line relative to the central dashed line indicates whether the gene is positively or negatively correlated with age. Positive correlations with age (age-pos) locate to the left of the figure, and negative correlations with age (age-neg) locate to the right side of the figure. The dashed line represents the null point, where a gene demonstrating no positive or negative correlation would appear. RLM, Ranked List Metric; ES, Enrichment score.

Evidence of age-related changes in isoform ratios with advancing age

Given the prevalence of genes involved in mRNA processing in our GSEA results, we sought to determine whether the pattern of alternatively expressed isoforms might therefore be disrupted with age. Of the ten genes fulfilling our criteria for study, we found evidence of disruption to splicing patterns in seven of these [actin-binding LIM protein 1 (ABLIM1), STAT1, CD79a molecule, immunoglobulin-associated alpha (CD79A), heat shock 60kDa protein 1 (HSPD1), clusterin-associated protein 1 (CLUAP1), lysosomal-associated membrane protein 2 (LAMP2), and SON DNA-binding protein (SON)], and the eighth, caspase 8 (CASP8) was near significant at P = 0.097 (Fig. S3; Table 3). The ABLIM1 gene codes for four alternatively processed reference sequence isoforms, NM_002313.5, NM_001003407.1, NM_001003408.1, and NM_006720.3. Comparison of the amount of NM_006720.3 relative to the other three isoforms reveals a progressive increase with age (correlation coefficient 0.0009; P ≤ 0.0001; supplementary Fig. S3). The STAT1, CD79A, HSPD1, CLUAP1, LAMP2, and SON genes exhibit similar disturbances to isoform ratios with advancing age (Table 3).
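The isoform comparison can be illustrated with a short sketch that forms a per-subject ratio of one isoform to the others and estimates its trend with age. The exact ratio definition and regression model are not given in this section, so two assumptions are made: the denominator is the summed signal of the other isoforms, and the trend is estimated by ordinary least squares as a simplification (the footnote to Table 3 refers to logistic regression). Function names are hypothetical.

```python
def isoform_ratio(target_signal, other_signals):
    """Signal of one isoform relative to the summed signal of the others.

    Assumption: the denominator is a simple sum of the other isoforms.
    """
    denominator = sum(other_signals)
    if denominator <= 0:
        raise ValueError("other isoforms must have positive total signal")
    return target_signal / denominator


def slope_with_age(ages, ratios):
    """Ordinary least-squares slope of the isoform ratio on age (per year)."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_ratio = sum(ratios) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (r - mean_ratio)
              for a, r in zip(ages, ratios))
    return sxy / sxx
```

A positive slope corresponds to a progressive rise with age in the share of the target isoform, as described above for NM_006720.3 of ABLIM1.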

Table 3.   Analysis of the effect of age on the expression of alternatively spliced isoforms
| Gene | Row | ID | Isoform | Mean | SE | Comparison | Coeff | CI (lower) | CI (upper) | P-value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 2 | ilmn_2396672 | All | 115.61 | 10.49 | 1/2 | 0.0009 | 0.0006 | 0.0012 | < 0.0001 |
| | 3 | ilmn_1785424 | All | 209.54 | 53.73 | 1/3 | 0.0032 | 0.0027 | 0.0037 | < 0.0001 |
| | 11 | ilmn_2242491 | NM_024793.1 | 10361.48 | 1983.51 | 7/8 | 0.0000 | 0.0000 | 0.0000 | < 0.0001 |
| | 19 | ilmn_2247664 | NM_138927.1 | 200.96 | 45.44 | 18/19 | 0.0017 | 0.0011 | 0.0023 | < 0.0001 |

  1. The ratio of alternatively expressed isoforms of specific genes in relation to increasing age is given in the table above. Association with age is calculated by logistic regression. The analysis was carried out on 689 individuals.

  2. ABLIM1, actin-binding LIM protein 1; CASP8, caspase 8; CI, confidence intervals; coeff, correlation coefficient; HSPD1, heat shock 60 kDa protein 1; ID, probe identity; isoform, transcript identity; LAMP2, lysosomal-associated membrane protein 2; Mean, mean ratio; SE, standard error; comparison indicates which comparisons were made for the analysis; STAT1, signal transducer and activator of transcription 1.

In a separate analysis, we sought to provide extra evidence for disruption of splicing with increasing age using a more accurate quantitative technique, real-time PCR. We selected alternatively spliced genes from the top 250 genes most associated with age, for which commercial probes were available. Of eight genes tested in the fifty youngest and fifty oldest individuals, we found evidence for disruption to the ratio of alternatively expressed isoforms in 3 (EFNA1, GPR18, and VCAN; P = 0.05, 0.05, and 0.02–0.04, respectively, depending on the comparison), with the 4th (BCL11B) being very near significance at P = 0.053 (Fig. S4, Table 4).

Table 4.   Genes showing disruption to the balance of alternatively spliced isoforms by real-time PCR
| Gene | Row | ID | Isoform | Mean | SE | Comparison | Coeff | CI (lower) | CI (upper) | P-value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 2 | hs01112109_m1 | NM_022898.1 | 1.062152 | 0.36903 | 1 over 2 | 0.00125 | −0.0000175 | 0.0025182 | 0.053 |
| | 4 | hs01020895_m1 | NM_182685.1 | 1.315693 | 1.144735 | 3 over 4 | −0.00394 | −0.0078545 | −0.0000253 | 0.049 |
| | 6 | hs01649814_m1 | NM_005292.3 | 1.081336 | 0.446335 | 5 over 6 | 0.00189 | 3.60E-06 | 0.0037757 | 0.05 |
| | 8 | hs01007941_m1 | NM_001126336.2 | 1.086235 | 0.457961 | 7 over 8 | 0.002647 | 0.0001322 | 0.0051619 | 0.039 |
| | 9 | hs01007943_m1 | NM_001164098.1 | 1.23125 | 0.845863 | 9 over 7 | −0.01048 | −0.0193924 | −0.0015691 | 0.022 |
| | 10 | hs01007944_m1 | NM_004385.4 | 1.210006 | 0.92149 | 9 over 10 | −0.00778 | −0.0154129 | −0.0001444 | 0.046 |
| | 11 | hs00244504_m1 | NM_004034.2 | 1.066305 | 0.388382 | 12 over 11 | | | | |
| | 12 | hs00559416_m1 | NM_001156.3 | 1.929157 | 8.420153 | 11 over 12 | −0.00219 | −0.0057466 | 0.0013578 | 0.223 |
| | 13 | hs00540527_s1 | NM_001716.3 | 1.116763 | 0.540786 | 14 over 13 | | | | |
| | 14 | hs00540548_s1 | NM_032966.1 | 1.100177 | 0.497535 | 13 over 14 | −0.0011 | −0.0026285 | 0.0004256 | 0.156 |
| | 15 | hs01006741_m1 | NM_002184.3 | 1.068342 | 0.397245 | 16 over 15 | | | | |
| | 16 | hs01006742_m1 | NM_175767.2 | 1.898088 | 5.478007 | 15 over 16 | 0.004691 | −0.0045713 | 0.0139524 | 0.317 |
| | 18 | hs00985111_m1 | NM_014676.2 | 1.091318 | 0.995023 | 17 over 18 | 0.001596 | −0.0003754 | 0.0035682 | 0.111 |

  1. The ratio of alternatively expressed isoforms of specific genes as determined by real-time PCR is correlated with age in the table above. Association with age is calculated by logistic regression. The analysis was carried out on 100 individuals.

  2. CI, confidence intervals; coeff, correlation coefficient; ID, probe identity; isoform, transcript identity; Mean, mean ratio; SE, standard error; comparison indicates which comparisons were made for the analysis.


In this study, we present the first assessment of age-related alterations in gene expression in a large population-based cohort. Contrary to our expectations, we found relatively few age-associated transcripts, but these did include several genes associated with inflammation or immune senescence, as expected. Gene set enrichment analysis indicated that only a very small number of molecular or biological function pathways were robustly associated with advancing age, comprising mainly genes involved in the processing and maturation of messenger RNA transcripts. This finding was supported by subsequent observation of age-associated differences in the balance of alternatively expressed isoforms, as suggested by previous studies (Yannarell et al., 1977; Meshorer & Soreq, 2002). Our results suggest that modification of messenger RNA (mRNA) processing may comprise an important feature of human aging.

Very few genes (295 of 16 571; 2% of transcripts identified) demonstrated strong and robust associations with advancing age in our study. Power calculations based on the method of Hsieh et al. (1998) indicate that the relatively large sample available (n = 458 in discovery set) is sufficient to detect effect sizes for transcripts statistically ‘explaining’ as little as 5.3% of the variation in age. Thus, if we have not detected other genes because of ‘noisy’ data, individual effect sizes for such genes are likely to be of limited biological significance.

The transcripts with the largest effect sizes comprise a set of peripherally expressed biomarkers of the aging process, some substantially stronger than those currently known. Transcripts such as CCR6, CCR7, and CD27, previously reported to be involved in immune senescence (Yung et al., 2007), were also associated with age in our study (Fig. 1). As predicted, genes showing the largest differential in expression with age were involved in inflammatory responses, or immune function (de Magalhaes et al., 2009). Genes related to the human immune system, e.g. LRRN3, CD248, LEF1, CCR6, CCR7, CD27, and LTB (lymphotoxin beta – TNF superfamily, member 3), expressed lower levels of mRNA transcripts with increasing age in our cohort, which is in agreement with other studies in human peripheral lymphocytes (Hong et al., 2008). This is not unexpected given that our study tissue was white blood cells. Reduced expression of these genes with age implies that the mechanisms involved are likely to mark the process of ‘immunosenescence,’ characterized by reduced chemotactic migration of the immune mediator cells, lowered activation and differentiation of lymphocytes and macrophages with reduced synthesis of immunoglobulins and increased apoptosis of immune cells. Our finding of a strong association of lower CCR7 gene expression with advancing age validates the findings from Yung et al. (2007).

Our findings should also be considered in the context of age-related changes in lymphocyte composition. Some age-associated transcripts may be expressed in different white cell subtypes; the CD28, CCR7, and GZMH genes are differentially expressed in CD8+ CD28+ T-cells compared with CD8+ CD28− T-cells, the ratio of which is known to alter with age (Lazuardi et al., 2009). Some of the alterations in gene expression may therefore derive from differences in the composition of the lymphocyte population in older people. What remains to be seen is whether any of the alterations in lymphocyte composition are themselves attributable to the altered gene expression patterns we have demonstrated.

Several other factors have been implicated in aging or longevity, including alterations to nutrient sensing pathways, such as insulin or TOR signaling, oxidative stress/DNA repair, inhibition of respiration, reproductive system signaling, and telomere-related mechanisms (Kenyon, 2010). Accordingly, genes representing some of these processes are evident among the top 100 associations we have identified. These include the PASK, FOXO1, DKC1, and MYC transcripts, involved in energy sensing, insulin signaling, telomere maintenance, and ribosomal biogenesis, respectively (Gu et al., 2009; Schlafli et al., 2009; Dai et al., 2010; Smith et al., 2010).

The association of individual gene transcripts with age does not account for correlations between markers. We developed a bi-class discriminant model that identified a set of six transcripts within the top 25 strongest associations that form a potential predictive signature of chronological aging (Table S2; Fig. S1). This model was exceptionally good at classifying young and old [the area under ROC curve was 96% in our discovery set, and this was only slightly diminished (95%) in the independent replication set]. The six genes in the predictive group comprise genes involved in immunity or inflammation (LRRN3, CD27, CCR6) but also include some that are involved in the maintenance or development of muscle tissue (VAMP5; myobrevin) or in vascularization (CD248, endosialin). Future versions of this discriminant model might attempt to disentangle ‘aging’ from age-related disease effects, although doing so is conceptually and practically difficult. Nevertheless, the relative robustness of the model for different age-group comparisons suggests that the transcript set is not merely distinguishing disease-free from diseased individuals.

Gene set enrichment analysis revealed a striking restriction in the number of pathways that were associated with age in our dataset; only seven of 1065 molecular or biological function pathways are robustly associated with age. Four of the pathways identified are involved in mRNA processing (namely mRNA binding, mRNA processing, mRNA processing reactome, and RNA splicing). Three of the age-associated pathways we identified (RNA binding, RNA splicing, and ribosome biogenesis and assembly) were also noted in previous GSEA analyses in mice (Southworth et al., 2009). Differences between our results and the mouse studies may arise from differences between species in the aging process, or from differences in the tissue specificity of changes.

RNA processing is the mechanism by which the initial RNA products transcribed from genes are prepared for eventual translation. This includes removal of introns and addition of the poly-A tail and 5′ Cap structures (Keene, 2010). These processes occur simultaneously and ensure diversity of the mRNA transcriptome and determine stability and half-life of the mRNA transcripts (Fong & Bentley, 2001). In our GSEA analysis, transcripts responsible for all aspects of mRNA processing were present in the core-enriched fraction of the pathway (Table S4 online), with a surprising amount of overlap between the groups (Fig. S2). The remaining pathways relate to the accessibility of the chromatin to transcription factors (chromatin assembly and disassembly) and to the processes that surround the translation of the mRNA transcripts (RNA-binding and ribonucleoprotein complex biogenesis and assembly), which is in keeping with previous observations of a reduction in rates of protein synthesis with age in humans and animal species (Ballard & Read, 1985; Kennedy & Kaeberlein, 2009).

Our observation that disruption to the proteins involved in mRNA processing appears to occur without widespread alterations in gene expression levels may indicate that while aged leukocytes in-vivo may be expressing most genes at comparable levels to those found in younger cells, there may be differences in the relative balance of splice products produced or increases in the occurrence of aberrantly spliced transcripts. In seven of ten alternatively spliced genes we studied, we found disruptions to the patterns of isoform expression with increasing age, and a near-significant result in one further case (Table 3). We also noted variation in splicing patterns with age by real-time PCR, finding alteration to the balance of alternatively expressed isoforms in 3 of 8 (38%) alternatively spliced genes, with the 4th very near significance. Our observation of very modest effects even in a very limited cohort of 100 people (7–37% alteration in the ratio of isoforms) suggests that the true level of splicing disruption may be higher than we report and warrants a more detailed, targeted study.

Whether or not the processing of a particular transcript is disrupted will depend on many factors, not least how many different sequence elements are necessary to ensure the usage of a particular splice junction. Presumably, highly regulated transcripts with weaker splice sites that are very dependent on additional sequence factors [such as exon splicing enhancers (Cartegni et al., 2002)] may be more susceptible to age-accumulated DNA and RNA damage and thus more likely to show disruption of splicing and other related processes with age. Interestingly, two of the most deregulated splicing proteins in our microarray data, SFRS6 (SRp55) and SFRS1 (ASF/SF2) (P = 1.3 × 10⁻⁶ and 0.0001, respectively), were members of the SFRS (Splicing Factor, Arginine/Serine-Rich) family of splicing factors, which are key players in maintaining the plasticity of the transcriptome by regulation of alternative splicing (Valcarcel & Green, 1996). This is particularly interesting because these proteins are key ligands for the exon splicing enhancer (ESE) motifs that regulate splice site choice in development and differentiation (Cartegni et al., 2002). Their deregulated expression during the aging process may therefore manifest as a reduction in the adaptive capacity of the transcriptome.

Disruption to the mRNA processing machinery may also lead to an increase in the occurrence of unusual or aberrant splicing products, which of course would not be present on the microarray chip. This would not be an unexpected finding in an aging organism, given that such products may arise from mutations in the DNA sequence elements that control splice site usage, or from alterations in the RNA transcript itself, which is very susceptible to damage by oxidative and other insult (Kong & Lin, 2010). Some of the proteasomal components that intercept and neutralize aberrant proteins produced from such transcripts were up-regulated in our data (PSMB9: FDR q = 0.008, P = 6 × 10^−5; PSMB10: q = 0.005, P = 0.0008).

There is a growing body of evidence that results from peripheral blood white cells are also relevant to less accessible tissues (Tang et al., 2003; Twine et al., 2003; Achiron et al., 2004). In some reports, the majority of changes appeared to be in genes with little tissue-specific regulation, suggesting that most of the transcriptomic alterations with age might be generalized (Rodwell et al., 2004; de Magalhaes et al., 2009). This is supported by other evidence, where age-related expression differences were compared in brain, kidney, and muscle, in a population of 81 human subjects, and found to be conserved in each of these tissues (Zahn et al., 2007). The age-related transcriptomic signature is also relatively stable; of the 50 top associations reported by Hong et al. (2008), 32 of the 42 represented in our data also showed age-related differences, at least at P ≤ 0.05 (data available from authors), despite the differences expected comparing our in-vivo leukocyte mRNA to mRNA from stored isolated lymphocytes. Lymphocytes are also the most appropriate tissue for the study of immune senescence, which is key to many chronic disease processes, including inflammation, and autoimmune alterations with age (Desai et al., 2010).

Possible limitations of our analysis include the deliberate decision not to account for specific diseases. This is because an increased susceptibility to disease is intrinsic to any definition of aging, and thus accounting for disease would risk controlling out the associations of interest (and also poses major practical difficulties with undiagnosed disease being common at older ages). Similarly, our study does not account for variation in the transcriptome of different blood cell subtypes. However, as noted above, our results are consistent with those from isolated lymphocytes (Hong et al., 2008), and our approach avoids disruption to in-vivo expression patterns.

In conclusion, we present the first genome-wide assessment of age-related in-vivo leukocyte gene expression profiles in a large population-based study. As aging is associated with random damage to DNA (Gensler & Bernstein, 1981), we tested the hypothesis that this would result in widespread deregulation of gene expression. Instead, we found that human aging is associated with a small number of focused changes, mainly in individual genes associated with immune cell function. The major pathways associated with older age in humans were mainly involved with the processing of the primary RNA transcripts into mature mRNAs, an observation supported by our finding of age-related changes in the relative expression of alternatively expressed isoforms of example loci. We suggest that disruption to messenger RNA processing may comprise an important feature of aging in the human population.

Experimental procedures

Ethics statement

Ethical permission was granted by the Istituto Nazionale Riposo e Cura Anziani institutional review board in Italy. Participants gave informed consent to participate.

Cohort details

The InCHIANTI study (Ferrucci et al., 2000) is a population-based, prospective epidemiological study of factors affecting aging, in the Chianti area (Tuscany) of Italy. The participants were originally enrolled in 1998–2000 and were interviewed and examined every 3 years. The recent 9-year follow-up examination involved 733 participants. Characteristics of the study cohort are given in Table 5.

Table 5. Characteristics of the discovery and replication samples. Clinical characteristics and demographics of the study cohort are given.
                               Discovery sample      Replication sample    Pooled sample
                               n      %              n      %              n      %
Age (quartiles)
 30–68 years                   121    26.42          59     24.58          180    25.79
 69–78 years                   141    30.79          72     30             213    30.52
 79–82 years                   91     19.87          43     17.92          134    19.2
 83–104 years                  105    22.93          66     27.5           171    24.5
 Bagno a Ripoli                270    58.95          86     35.83          356    51
 High school                   62     13.54          25     10.42          87     12.46
Pack years smoked (lifetime)
 < 20                          97     21.18          56     23.33          153    21.92
Waist circumference (cm)
 Mean (SD)                     458    95.6 (12.4)    240    94.8 (11.5)    698    95.3 (12.1)

RNA collection and extraction

Peripheral blood specimens preserving in-vivo RNA expression were collected at the 9-year follow-up (2008/9), using the PAXgene technology to preserve levels of mRNA transcripts as they were at the point of collection (Debey-Pascher et al., 2009). RNA was extracted from peripheral blood samples using the PAXgene Blood mRNA kit (Qiagen, Crawley, UK) according to the manufacturer’s instructions.

Whole transcriptome scan

Whole genome expression profiling of the samples was conducted using the Illumina Human HT-12 microarray (Illumina, San Diego, California, USA) as previously described (Zeller et al., 2010). Preprocessing of microarray data is described in Data S1.

Statistical analysis

Our dataset was subdivided on the basis of hybridization batch: a discovery dataset containing approximately two-thirds of the full sample (n = 458 individuals), and an independent replication set (n = 240 individuals). This is a form of test-set cross-validation, a common and well-established approach when assessing performance of classification models (Lubomirski et al., 2007).

We calculated median-centered gene expression levels on the log2 scale to ensure maximum overlap of profiles without altering their variance, in line with methodology used in previous studies (Idaghdour et al., 2010). The relationship between gene expression and chronological age was tested using a linear regression model with (log-transformed) gene expression level as the dependent variable, chronological age (recorded at RNA extraction) as an explanatory variable, and with adjustment for potential confounders. Separate regression models were fitted for each of the full set of 16 571 probes which passed QC in the discovery dataset. We used the FDR to account for multiple testing, applying an FDR cut-off of q ≤ 0.001 to select probes differentially expressed with age. Transcripts selected at the screening stage in the discovery set were tested for association with age in the replication dataset using the same model specification. The development of a bi-class discriminant model to identify the transcripts explaining the majority of the association with age is described in Data S1.
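As an illustration of the multiple-testing step, the following minimal Python sketch converts per-probe P-values into FDR q-values and keeps probes with q ≤ 0.001. It assumes the Benjamini-Hochberg step-up procedure, which is not specified in the text above; the function names `bh_qvalues` and `select_probes` and any probe identifiers passed to them are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values (assumed FDR method), in input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def select_probes(probe_ids, pvalues, q_cutoff=0.001):
    """Return the probes whose q-value is at or below the cut-off."""
    qvalues = bh_qvalues(pvalues)
    return [p for p, q in zip(probe_ids, qvalues) if q <= q_cutoff]
```

In the setting described here, `pvalues` would hold the P-value of the age coefficient from each per-probe regression model, one entry per probe that passed QC.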

All our analyses were adjusted for the following potential confounding factors on gene expression: gender; lifetime pack years smoked (in five categories: none, < 20 years, 20–39 years, 40 plus years, and missing); waist circumference (as a continuous trait); highest level of education attained (in five categories: none, elementary, secondary, high school, and university/professional); and study site [individuals were drawn from a rural village (Greve) and an urban population (Bagno a Ripoli)]. We also controlled for potential hybridization and/or amplification batch effects in all our analyses (Table S5).

TaqMan low-density array (TLDA) validation of microarray results

A subset of the cohort comprising the oldest men and women (85–104 years, n = 50) was compared with one comprising the youngest men and women (30–44 years, n = 49) in this analysis. Total RNA (100 ng) was reverse transcribed in 20-μL reactions using the Superscript III VILO kit (Invitrogen, Paisley, UK), according to the manufacturer’s instructions. RNA samples were then used for TLDA analysis. A list of target genes is given in Table S6. Each 32-gene set included four endogenous control genes which had been empirically validated as being unaffected by age on the basis of the microarray results: 18S, GUSB, PPIA, and IDH3B. Reaction conditions are described in Data S1.

Pathway analysis (GSEA)

Gene set enrichment analysis was performed to assess pathways or predefined gene sets associated with chronological age according to the method of Subramanian et al. (2005). ‘Enrichment statistic’ and ‘Metric for ranking gene’ parameters were configured to ‘Weighted’ and ‘Pearson,’ respectively. One thousand random permutations of the phenotype label were used to calculate the empirical P-values of each pathway. The gene sets with a nominal P-value < 0.01 and FDR of ≤ 25% were considered as potentially associated gene sets as previously described (Subramanian et al., 2005). Gene symbols and the Illumina annotation file were used to collapse 16 571 probes to 12 357 genes by taking the median intensity of probes representing each gene.
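The probe-to-gene collapsing step can be sketched as follows for a single sample. This is a minimal illustration only: the function name `collapse_probes_to_genes`, the dictionary-based inputs, and the decision to skip probes without a gene annotation are assumptions, not details taken from the analysis pipeline.

```python
from collections import defaultdict
from statistics import median


def collapse_probes_to_genes(probe_intensity, probe_to_gene):
    """Collapse probe intensities to one value per gene using the median.

    probe_intensity: dict of probe ID -> intensity for one sample.
    probe_to_gene:   dict of probe ID -> gene symbol (annotation file).
    Probes without a gene symbol are skipped (an assumption of this sketch).
    """
    by_gene = defaultdict(list)
    for probe, value in probe_intensity.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            by_gene[gene].append(value)
    # One median intensity per gene across all probes mapped to it.
    return {gene: median(values) for gene, values in by_gene.items()}
```

Applying the function to each sample in turn yields a gene-level expression matrix suitable as GSEA input.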

Molecular or biological function pathways and Gene Ontology (GO) gene sets were selected from the molecular signature database (MSigDB) (http://www.broadinstitute.org/gsea/msigdb/index.jsp). After filtering gene sets to those with a minimum of 15 and a maximum of 500 gene set size, 294, 439, 209, and 123 gene sets from the Canonical pathways, Biological Process Ontology gene sets, Molecular Function Ontology gene sets, and Cellular Components Ontology gene sets were used in the analysis, respectively.
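The size filter on gene sets can be sketched in a few lines. The sketch assumes that set size is counted after restricting each set to genes measured in the expression data; the function name `filter_gene_sets` and its inputs are hypothetical.

```python
def filter_gene_sets(gene_sets, measured_genes, min_size=15, max_size=500):
    """Keep gene sets whose measured members number min_size..max_size.

    gene_sets:      dict of gene set name -> iterable of gene symbols.
    measured_genes: iterable of gene symbols present in the expression data.
    """
    measured = set(measured_genes)
    kept = {}
    for name, genes in gene_sets.items():
        # Count only genes that are actually present in the dataset.
        overlap = set(genes) & measured
        if min_size <= len(overlap) <= max_size:
            kept[name] = sorted(overlap)
    return kept
```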

Examination of changes in microarray isoform ratios with advancing age

We examined the relative balance of alternatively expressed isoforms of selected genes for evidence of disruption to splicing patterns. We first considered the top 50 genes robustly associated with age (Table S1) as candidates for study, but only four of them, ABLIM1, CD79A, STAT1, and HSPD1, showed evidence both of alternative splicing and of expression identified by suitable probes within the expression data available to us. We then selected further genes for study based on the following criteria: the presence of > 5 alternative probes for the same gene, and the presence of > 2 reference isoforms for the gene. The probe sequences were then mapped back onto the specific transcript sequences to determine which probe signals to use for analysis. By these methods, we identified ten genes for study: ABLIM1, CD79A, STAT1, HSPD1, CASP8, CLUAP1, LAMP2, MAX, SON, and WTAP. We then carried out a logistic regression of the ratio of signal deriving from isoform-specific probes according to age. Data were adjusted for gender.
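To make the idea of an age-related shift in isoform balance concrete, the sketch below computes a log2 ratio of two isoform-specific probe signals per subject and estimates its trend with age. It is deliberately simplified: it uses an unadjusted ordinary least-squares slope rather than the gender-adjusted regression model described above, it assumes untransformed (linear-scale) probe intensities, and the function names are hypothetical.

```python
import math


def log2_isoform_ratio(signal_a, signal_b):
    """log2 ratio of two isoform-specific probe signals (linear scale)."""
    return math.log2(signal_a / signal_b)


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (no covariate adjustment)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def isoform_ratio_trend(ages, isoform_a, isoform_b):
    """Change in log2 isoform ratio per year of age."""
    ratios = [log2_isoform_ratio(a, b) for a, b in zip(isoform_a, isoform_b)]
    return ols_slope(ages, ratios)
```

A positive slope indicates that the first isoform gains ground relative to the second with advancing age; a formal test would additionally need a standard error and the covariate adjustment omitted here.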

TaqMan low-density array (TLDA) assessment of disruption to mRNA splicing

A subset of the cohort comprising the oldest men and women (85–104 years, n = 50) was compared with one comprising the youngest men and women (30–44 years, n = 50) in this analysis. Total RNA (100 ng) was reverse transcribed in 20-μL reactions using the Superscript III VILO kit (Invitrogen), according to the manufacturer’s instructions. RNA samples were then used for TLDA analysis for transcripts of the ANXA7, BCL11B, CXCR5, EFNA1, GPR18, IL6ST, PUM1, and VCAN genes, which were selected on the basis that they were alternatively expressed isoforms of genes in the top 250 associations with age which produced a minimum of two isoforms. Endogenous control genes were IDH3B and GUSB, which were identified as the most stable controls by the GeNORM function of the StatMiner TLDA analysis software (Integromics, Granada, Spain). Reaction conditions are described in Data S1.


Acknowledgments

We thank the many people who contributed to the InCHIANTI study, including all of the anonymous participants. This study was supported in part by the Intramural Research Program, National Institute on Aging and the U.S. National Institutes of Health. We thank Luke Pilling for extra statistical analyses. William Henley completed this work while seconded to PenCLAHRC, which is funded by the National Institute for Health Research (NIHR). The views expressed in this publication are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

Author contributions

LWH oversaw the validation experiments, interpreted the data and cowrote the manuscript. DH carried out the microarray experiments. WH oversaw the statistical analysis, carried out the multivariable regression analysis, and contributed to the manuscript. AW carried out initial preprocessing of the data. AH carried out the analysis of differential splicing with age. RBS carried out the TLDA validation of the microarray results. HY carried out the GSEA. AD and TF contributed to the manuscript. AM oversaw the collection and extraction of the RNA samples. JMG aided in design of the InCHIANTI study and contributed to the manuscript. SB oversees the InCHIANTI study and contributed to the manuscript. AS oversaw the microarray experiments. LF organized the sample cohort and contributed to the manuscript. DM managed the project, interpreted the data, cowrote the manuscript, and contributed funding.