Smurfness‐based two‐phase model of ageing helps deconvolve the ageing transcriptional signature

Abstract
Ageing is characterised at the molecular level by six transcriptional ‘hallmarks of ageing’, which are commonly described as being progressively affected as time passes. By contrast, the ‘Smurf’ assay separates individuals with a high and constant mortality risk from healthy individuals with zero mortality risk, based on increased intestinal permeability. Using whole-body total RNA sequencing, we found that Smurfness distinguishes transcriptional changes associated with chronological age from those associated with biological age. We show that transcriptional heterogeneity increases with chronological age in non-Smurf individuals, preceding the other five hallmarks of ageing, which are specifically associated with the Smurf state. Using this approach, we also devise targeted pro-longevity genetic interventions that delay entry into the Smurf state. We anticipate that increased attention to the evolutionarily conserved Smurf phenotype will bring about significant advances in our understanding of the mechanisms of ageing.


S3. tSNE (perplexity = 10) on RNA-seq samples.
tSNE is computed on all genes. Colour indicates Smurf status and symbols indicate age (see legend). Similarly to the PCA results in Fig. 1a, Smurf and non-Smurf samples form two groups. Within the non-Smurf group, samples segregate by age, while the Smurf group appears more mixed. The 40-day samples show the same "mixed" behaviour observed in the PCA.

S4.
Unsupervised hierarchical clustering on sample-to-sample distance. The distance matrix (Euclidean distance) is computed using all genes. Three main clusters are identified, showing good separation between Smurfs and non-Smurfs except for the 40-day samples, which either cluster with one of the two groups or form a third cluster independently of Smurf status.
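A minimal sketch of the clustering step described in S4, assuming a samples × genes matrix of normalized (for example variance-stabilized) expression values. The caption states only that a Euclidean distance matrix on all genes is used; the average-linkage rule and the function names below are illustrative assumptions, not the authors' pipeline.

```python
import math
from itertools import combinations


def euclidean_distance_matrix(samples):
    """Pairwise Euclidean distances; `samples` is a list of per-gene expression vectors."""
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = math.dist(samples[i], samples[j])
    return dist


def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering (average linkage is an assumption here).

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance until `n_clusters` clusters remain; returns lists of sample indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
            d = sum(pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # a < b, so deleting b keeps a's index valid
        del clusters[b]
    return clusters


# Toy example: five samples described by two genes (hypothetical values).
samples = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0]]
clusters = average_linkage(euclidean_distance_matrix(samples), n_clusters=3)
print(sorted(sorted(c) for c in clusters))  # [[0, 1], [2, 3], [4]]
```

Cutting the tree at a fixed number of clusters mirrors the "three main clusters" reading of S4; in practice the full dendrogram would be inspected rather than a single cut.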
S5. edgeR pipeline validates DESeq2 analysis.
Each of the 2362 DEGs identified by both pipelines is plotted according to the fold changes estimated by the two pipelines. The estimated Pearson correlation between the two is 0.99.

S6. Enrichment analysis on differentially expressed proteins, Smurfs vs non-Smurfs.
Interconnected significant GO BP categories are represented here as a network. The colour indicates the level of deregulation (PANTHER fold change estimate, http://www.pantherdb.org/). The node size gives an approximate indication of the GO BP category size. Among the upregulated categories we mostly observe response to stress and proteins involved in metabolism (with a strong signal from the IMP biosynthetic process category, associated with purine metabolism, which is not observed in the transcriptome). The downregulated categories mostly map to ribosomal proteins, the mitochondrial respiratory chain (complex I) and metabolism (with the lipid catabolic process category confirming what is observed in the Smurf transcriptome). The gene expression categories include numerous ribosomal proteins and should therefore not be interpreted as a signal regarding transcriptional regulation.
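The concordance check in S5 reduces to intersecting the two DEG lists and correlating the paired log2FC estimates. A minimal sketch, assuming each pipeline's results are available as a dictionary from gene ID to log2FC (a hypothetical input format, not the actual DESeq2 or edgeR output objects):

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_pipelines(fc_deseq2, fc_edger):
    """Correlate log2FC estimates for genes called DE by both pipelines.

    Both arguments are hypothetical dicts mapping gene ID -> log2FC of that
    pipeline's DEGs. Returns (number of shared DEGs, Pearson r).
    """
    shared = sorted(fc_deseq2.keys() & fc_edger.keys())
    r = pearson([fc_deseq2[g] for g in shared], [fc_edger[g] for g in shared])
    return len(shared), r


n_shared, r = compare_pipelines(
    {"g1": 1.0, "g2": 2.0, "g3": -1.0, "g4": 0.5},
    {"g1": 1.1, "g2": 1.9, "g3": -1.2, "g5": 3.0},
)
print(n_shared, round(r, 3))
```

Only genes called by both pipelines enter the correlation, which is why S5 reports the shared DEG count alongside r.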

S7. PCA performed on metabolomic data.
Similarly to what occurs for the transcriptomic data, the PCA on the quantification of 202 metabolites clearly separates Smurf and non-Smurf samples. PCA was performed with the MetaboAnalyst online platform.

S8. Smurf DEGs and metabolite FCs on the KEGG fatty acid biosynthesis pathway (dme00061).
The pathview R package is used to map the genes identified as DEGs in the Smurf/non-Smurf comparison that belong to the KEGG fatty acid biosynthesis pathway (dme00061). The log2FC estimated by the DESeq2 analysis is represented by the colour scale. The detected metabolites are coloured according to their Smurf/non-Smurf log2FC and marked with * when significant in a Wilcoxon test (p-value < 0.05). The downregulation of biosynthesis-mediating enzymes is associated with a decreased abundance of the final fatty acid products in Smurfs, suggesting that the transcriptional signature is functional. Enzymatic complexes are annotated with a unique identification code, while genes are automatically annotated with the human symbol. To retrieve the Drosophila gene symbols from the pathway's nodes, go to the online version and hover the pointer over the gene of interest (https://www.genome.jp/pathway/dme00061).

S9. Lactic acid levels are significantly higher in Smurfs.
Smurfs show a significant increase in lactic acid levels compared to non-Smurfs (log2FC = 0.90, p-value < 0.001). This confirms that the transcriptional upregulation of Ldh in Smurfs is functional. Smurfs might rely more than non-Smurfs on fermentation after glycolysis, given the general mitochondrial impairment they experience.
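The per-metabolite comparison underlying S8 to S10 (a Smurf/non-Smurf log2FC plus a Wilcoxon test) can be sketched as follows. This assumes that log2FC is the log2 ratio of group means and uses the normal approximation of the rank-sum test with tie correction. Both choices are assumptions, and for small groups an exact test may be preferable.

```python
import math
from collections import Counter
from statistics import mean


def log2_fold_change(smurf, non_smurf):
    """log2 ratio of group means (assumed definition of the Smurf/non-Smurf log2FC)."""
    return math.log2(mean(smurf) / mean(non_smurf))


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation with tie correction."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    u = sum(midranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical metabolite levels in two groups of five samples.
u, p = rank_sum_test([1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0])
print(u, p)
```

As a worked reading of S9, a log2FC of 0.90 corresponds to 2^0.90 ≈ 1.87-fold higher lactic acid levels in Smurfs.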

S10. Smurf DEGs and metabolite FCs on the KEGG TCA cycle pathway (dme00020).
The pathview R package is used to map the genes identified as DEGs in the Smurf/non-Smurf comparison that belong to the KEGG TCA cycle pathway (dme00020). The log2FC estimated by the DESeq2 analysis is represented by the colour scale. The detected metabolites are coloured according to their Smurf/non-Smurf log2FC and marked with * when significant in a Wilcoxon test (p-value < 0.05). As already discussed in the characterization of the Smurf transcriptome, the TCA cycle displays widespread downregulation. At the metabolomic level, the pathway missed the significance threshold in the quantitative enrichment analysis (FDR = 0.13), and only succinate is significant in the Wilcoxon test (log2FC = 1.28, p-value < 0.05). However, given the general impairment of mitochondrial metabolism observed at the transcriptomic and proteomic levels, we believe the trend observed in the metabolomic data may still be of interest and could serve to generate hypotheses for further analysis. Enzymatic complexes are annotated with a unique identification code, while genes are automatically annotated with the human symbol. To retrieve the Drosophila gene symbols from the pathway's nodes, go to the online version and hover the pointer over the gene of interest (https://www.genome.jp/pathway/map00020).

S11. The number of DEGs in age-matched Smurf/non-Smurf comparisons decreases with chronological age.
When comparing age-matched Smurfs and non-Smurfs, different numbers of DEGs are retrieved (DEGs20 = 2190, DEGs30 = 1982, DEGs40 = 24). The dramatic drop in DEGs at 40 days suggests that the transcriptomes of old Smurfs and non-Smurfs are more similar than at younger ages. This was already suggested by the PCA (Fig. 1a) and may indicate that the old non-Smurf samples, collected from the old population, are enriched in pre-Smurfs compared to their younger counterparts.

S12. Higher relative standard deviation (RSD) in gene expression in our dataset is associated with lower counts.
We divided the RSD distributions in Fig. 4b into quartiles (x axis) and plotted the mean expression of the associated genes (y axis) for Smurfs and non-Smurfs at 20 and 40 days. Mean gene expression decreases across the four groups, as supported by the significant difference between the mean gene expression of the first and fourth quartiles for both Smurfs and non-Smurfs at 20 and 40 days (Wilcoxon test, p-value < 10e-16).
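The quartile analysis in S12 can be sketched as follows, assuming RSD is the per-gene standard deviation divided by the mean across the replicates of one group. The input format and function name are hypothetical, and each gene needs at least two replicates.

```python
from statistics import mean, quantiles, stdev


def mean_expression_by_rsd_quartile(expr):
    """Mean expression of the genes in each RSD quartile (Q1..Q4).

    `expr` maps gene ID -> normalized counts across the replicates of one
    group (e.g. 20-day non-Smurfs); genes with zero mean are skipped because
    their RSD is undefined.
    """
    rsd = {g: stdev(v) / mean(v) for g, v in expr.items() if mean(v) > 0}
    q1, q2, q3 = quantiles(list(rsd.values()), n=4)  # quartile cut points
    bins = [[], [], [], []]
    for g, r in rsd.items():
        idx = 0 if r <= q1 else 1 if r <= q2 else 2 if r <= q3 else 3
        bins[idx].append(mean(expr[g]))
    return [mean(b) if b else float("nan") for b in bins]


# Hypothetical counts for eight genes in two replicates.
expr = {
    "a": [100, 102], "b": [100, 104], "c": [50, 60], "d": [50, 70],
    "e": [10, 20], "f": [10, 25], "g": [1, 5], "h": [1, 9],
}
print(mean_expression_by_rsd_quartile(expr))
```

The toy data reproduce the qualitative pattern in S12: low-count genes land in the high-RSD quartiles.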
S13. Longevity screening results.
Summary of the results of the longevity screen carried out on the genes listed in Table 1. For each experiment, the four RU486 treatments and the two experimental settings ("adulthood only" and "development & adulthood") are listed. The controls are not represented, as they are the reference for the statistical test (log-rank) and for the computation of the mean lifespan change. The size of the point indicates the significance of the difference between the longevity curves (treatment compared to control), while the colour indicates the direction of the change (decrease or increase of mean lifespan). In most cases we detected a significant difference with a negative effect on the populations' lifespan (large blue points). Interestingly, most of the positive hits (large red points) map to the group of genes found by i-cisTarget as putative regulators of TFs upregulated in Smurfs.
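S13 relies on the log-rank test and on the mean lifespan change relative to the control. The following minimal sketch works on uncensored death ages in days. Treating every fly as an observed death, computing mean lifespan as the arithmetic mean of death ages, and the function names are all assumptions. A real analysis would handle censoring, typically with an established survival package.

```python
import math


def mean_lifespan(death_ages):
    """Arithmetic mean of individual ages at death (days)."""
    return sum(death_ages) / len(death_ages)


def mean_lifespan_change(control, treated):
    """Percent change of the treated mean lifespan relative to its control."""
    ml_c = mean_lifespan(control)
    return 100 * (mean_lifespan(treated) - ml_c) / ml_c


def logrank_test(control, treated):
    """Two-group log-rank test on uncensored death ages; returns (chi2, p)."""
    o_minus_e = 0.0  # observed minus expected deaths in the control group
    var = 0.0
    for t in sorted(set(control) | set(treated)):
        n1 = sum(1 for a in control if a >= t)  # control flies at risk at t
        n2 = sum(1 for a in treated if a >= t)  # treated flies at risk at t
        d1, d2 = control.count(t), treated.count(t)
        n, d = n1 + n2, d1 + d2
        if n < 2:
            continue
        o_minus_e += d1 - d * n1 / n
        var += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    # Survival function of a chi-square with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


control = [60] * 10 + [70] * 10   # hypothetical death ages (days)
treated = [70] * 10 + [80] * 10
print(round(mean_lifespan_change(control, treated), 1))  # 15.4
chi2, p = logrank_test(control, treated)
print(chi2, p)
```

The log-rank test compares whole survival curves, whereas the percentage summarizes only the shift in the mean. This explains how a small change can still be significant, as discussed in S16 and S18.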

S14. CG4360 KD (adulthood & development) validation.
The effect shown in Fig. 7 for the CG4360 KD (adulthood & development setting, RU 10 μg/mL) is confirmed by a third independent experiment. The effect is not observed in the "adulthood only" setting. The dotted line points to the median lifespan of the populations. The effect on the mean lifespan (ML) is +9.5% (MLRU0 = 71.5, MLRU10 = 78.5).

S16. RU486 treatment does not affect lifespan.
To confirm that RU486 treatment alone does not affect lifespan, we performed a GS longevity experiment using the daGS driver to induce a KD of white (w), which does not affect longevity. We induced the GS with RU 200 μg/mL, the highest dose used in our longevity experiments. No significant difference between the longevity curves is detected in the "adulthood only" setting (MLRU0 = 37.4, MLRU200 = 37.9, p-value in figure). A significant difference is detected in the "adulthood & development" setting (MLRU0 = 37.0, MLRU200 = 38.7, p-value in figure). However, the modest effect (+4.5%), together with the overlap of the confidence intervals of the curves, suggests that the effect is not biologically relevant.

S17. Two populations with non-significantly different lifespans experience the same increase in Smurf proportion over time: the example of CG4360 KD (adulthood only).
(i) Longevity experiment. CG4360 does not extend lifespan when knocked down during adulthood only (MLRU0 = 71.6, MLRU10 = 75.5, log-rank p-value = 0.21). (ii) Smurf proportion over time. The Smurf proportion significantly increases over time in both populations (slopeRU0 = 0.0036, p-valueRU0 = 1.50e-06; slopeRU10 = 0.0034, p-valueRU10 = 6.12e-04). However, no significant difference is detected between the slopes of the control and treated populations (p-value = 0.84), contrary to what is observed when the populations have significantly different lifespans (Fig. 6b).
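S17 compares the slope of the Smurf proportion over time between control and treated populations. The caption does not state how the slopes were compared. The sketch below fits one ordinary least-squares line per population and applies an approximate z test to the slope difference. This is an assumption; a single model with a time × treatment interaction term is a common alternative.

```python
import math


def ols_slope(x, y):
    """Least-squares slope of y on x and its standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    return b1, math.sqrt(rss / (n - 2) / sxx)


def compare_slopes(x1, y1, x2, y2):
    """Approximate two-sided z test for equal slopes in two independent fits (assumed method)."""
    b1, se1 = ols_slope(x1, y1)
    b2, se2 = ols_slope(x2, y2)
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return b1, b2, math.erfc(abs(z) / math.sqrt(2))


days = [10, 20, 30, 40, 50]                   # hypothetical sampling days
smurf_ctrl = [0.02, 0.05, 0.09, 0.13, 0.18]   # hypothetical Smurf proportions
smurf_kd = [0.01, 0.03, 0.06, 0.08, 0.10]
print(compare_slopes(days, smurf_ctrl, days, smurf_kd))
```

A large p-value, as in S17 (p = 0.84), means the two populations accumulate Smurfs at indistinguishable rates.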

S18. Longevity experiments on males.
To investigate whether the longevity effect found in females applies to males, we performed the experiment on males from the same GS line. Results are reported for the conditions that extended lifespan in females (RU 50 μg/mL, adulthood only, for Adf1 and Trl; RU 10 μg/mL, development & adulthood, for CG4360). No significant effect is detected for Adf1 and CG4360 (log-rank p-values reported in figure; Adf1: MLRU0 = 77.1, MLRU50 = 74.4; CG4360: MLRU0 = 68.7, MLRU10 = 65.8). A significant negative effect is detected for Trl KD (MLRU0 = 68.5, MLRU50 = 62.9, -8.1%). However, the longevity curves evolve similarly and the confidence intervals diverge only after the T50; this suggests that the results should be interpreted carefully, as statistical significance might not imply biological relevance.

S19. Aef1 KD negatively affects life expectancy following the treatment gradient.
Aef1 KD negatively affects lifespan at all doses (MLRU0 = 87.6, MLRU10 = 82.2, MLRU50 = 78.6, MLRU100 = 77.4, MLRU200 = 73.3; log-rank p-value < 0.00001, details in Table S14). Dashed lines in the figure indicate the median lifespan. The dose-dependent trend suggested by the ML values is confirmed when comparing the longevity curves of the treated populations, with only RU50 and RU100 showing no significant difference (RU10-RU50: p-value = 3e-10; RU50-RU100: p-value = 0.2; RU100-RU200: p-value = 2e-04). This trend suggests an effect of Aef1 on longevity rather than a toxic effect of the KD.

Table S1. GSEA results, Smurf/non-Smurf analysis. List of the 59 significantly deregulated GO BP categories (adjusted p-value < 0.05) from the GSEA on the list of Smurf DEGs. Results are illustrated in Fig. 2 of the main text. GO BP category: ID and description of the biological process category; size: number of genes annotated in the category; NES: normalized enrichment score; p.adjust: FDR-corrected p-value, * < 0.05, ** < 0.01.
Table S2. Quantitative enrichment analysis on the metabolite profile (S/NS), significant hits. Quantitative enrichment analysis on metabolite quantification (from MetaboAnalyst) yields 13 significant KEGG pathways. The TCA cycle missed the 5% significance threshold (FDR = 0.13), but most of the associated metabolites are present in the pyruvate metabolism pathway. Confirming the transcriptomic results, we find pathways associated with fatty acid metabolism. A signal from amino acid metabolism is also detected. Metabolite set: KEGG pathway; Total: number of metabolites in the pathway; FDR: adjusted p-value.
Table S3. GSEA analysis on old Smurfs/young Smurfs. List of the 125 deregulated GO BP categories (adjusted p-value < 0.05) from the GSEA on the list of old Smurf DEGs. Results are partially illustrated in Fig. 4 of the main text. GO BP category: ID and description of the biological process category; size: number of genes annotated in the category; NES: normalized enrichment score; p.adjust: FDR-corrected p-value, 0.01 < * < 0.05.
Table S4. GSEA analysis on old/young non-Smurfs. List of the 22 significantly deregulated GO BP categories (adjusted p-value < 0.05) from the GSEA on the list of old non-Smurf DEGs. Results are illustrated in Fig. 3a of the main text. GO BP category: ID and description of the biological process category; size: number of genes annotated in the category; NES: normalized enrichment score; p.adjust: FDR-corrected p-value, * < 0.05, ** < 0.01.
Table S5. Human genes from the Ageing Atlas mapping to Smurf DEGs. A total of 134 unique human genes are retrieved by overlapping the 500 human genes annotated in the Ageing Atlas with the Smurf DEGs. Note that some human genes appear more than once in the table because they map to more than one fly gene, and vice versa. In total, 121 unique fly genes are found. Human symbol: human gene name; Flybase: Drosophila gene, FlyBase ID; Ageing marker: ageing marker annotated to the human gene (12 defined in total); log2FC (DESeq2): log2FC estimated by DESeq2 in the Smurf/non-Smurf analysis; FDR (DESeq2): adjusted p-value, FDR method, *** FDR < 0.001, ** FDR < 0.01, * FDR < 0.05.
Table S6. Human genes from the Ageing Atlas mapping to non-Smurf DEGs. A total of 25 unique human genes are retrieved by overlapping the 500 human genes annotated in the Ageing Atlas with the old non-Smurf DEGs. Note that some human genes appear more than once in the table because they map to more than one fly gene, and vice versa. In total, 24 unique fly genes are found. Human symbol: human gene name; Flybase: Drosophila gene, FlyBase ID; Ageing marker: ageing marker annotated to the human gene (12 defined in total); log2FC (DESeq2): log2FC estimated by DESeq2 in the Smurf/non-Smurf analysis; FDR (DESeq2): adjusted p-value, FDR method, *** FDR < 0.001, ** FDR < 0.01, * FDR < 0.05.
Table S7. Drosophila longevity genes (GenAge) mapping to Smurf DEGs. Drosophila longevity genes (annotated in GenAge) mapping to Smurf DEGs. A total of 58 unique genes are identified. Note that the table contains duplicated gene symbols, as multiple experiments can be reported for one gene. Symbol: Drosophila gene symbol; log2FC: Smurf/non-Smurf log2FC estimated by DESeq2; effect: effect of the alteration on lifespan; % effect: change in mean lifespan, in %; method: type of experiment performed; reference: reference of the study.
Table S8. Drosophila longevity genes (GenAge) mapping to non-Smurf DEGs. Drosophila longevity genes (annotated in GenAge) mapping to old non-Smurf DEGs. A total of 11 unique genes are identified. Note that the table contains duplicated gene symbols, as multiple experiments can be reported for one gene. Symbol: Drosophila gene symbol; log2FC: 20 days non-Smurf/40 days non-Smurf log2FC estimated by DESeq2; effect: effect of the alteration on lifespan; % effect: change in mean lifespan, in %; method: type of experiment performed; reference: reference of the study.
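Tables S5 to S8 note that rows can repeat because the human-to-fly ortholog mapping is many-to-many. The following sketch shows how the row count and the unique gene counts can diverge. The input format, the function name and the gene names are hypothetical placeholders, not the Ageing Atlas or GenAge schemas.

```python
def ortholog_overlap(ortholog_pairs, annotated_human, fly_degs):
    """Rows (human symbol, fly gene) where the human gene is annotated and the fly gene is a DEG.

    The ortholog mapping may be many-to-many, so a human symbol can appear in
    several rows (and vice versa); unique counts are returned separately.
    """
    rows = sorted({(h, f) for h, f in ortholog_pairs
                   if h in annotated_human and f in fly_degs})
    n_human = len({h for h, _ in rows})
    n_fly = len({f for _, f in rows})
    return rows, n_human, n_fly


# Hypothetical ortholog pairs: H1 maps to two fly genes, fly_b to two human genes.
pairs = [("H1", "fly_a"), ("H1", "fly_b"), ("H2", "fly_b"), ("H3", "fly_c")]
rows, n_human, n_fly = ortholog_overlap(pairs, {"H1", "H2", "H3"}, {"fly_a", "fly_b"})
print(len(rows), n_human, n_fly)  # 3 2 2
```

This is why Table S5 can list 134 unique human genes but only 121 unique fly genes.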

Table S9. Linear regression on non-Smurfs gene expression (time dependence).
The 301 genes with a significant slope over time in non-Smurfs and r² > 0.5. Genes are ordered by descending slope value. Flybase: FlyBase ID; slope: β1 of the linear regression; p-value: F-statistic p-value; R squared: r² of the estimated linear regression; symbol: gene symbol; DEGs overlap: specifies whether the gene was detected as significantly deregulated in Smurfs, old non-Smurfs, both or neither.
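The per-gene regression behind Table S9 can be sketched as a simple linear fit of expression on age, followed by the r² > 0.5 filter and ordering by descending slope. The F-test p-value also reported in the table requires the F (or t) distribution, which the Python standard library lacks. This sketch therefore computes only the slope and r², and the significance filter would come from a statistics package. Input format and names are hypothetical.

```python
from statistics import mean


def fit_gene(ages, expr):
    """Simple linear regression expr = b0 + b1 * age; returns (b1, r_squared)."""
    ma, me = mean(ages), mean(expr)
    sxx = sum((a - ma) ** 2 for a in ages)
    sxy = sum((a - ma) * (e - me) for a, e in zip(ages, expr))
    syy = sum((e - me) ** 2 for e in expr)
    b1 = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy) if syy else 0.0
    return b1, r2


def time_dependent_genes(ages, expression, min_r2=0.5):
    """Genes whose fit has r^2 > min_r2, ordered by descending slope (as in Table S9)."""
    hits = []
    for gene, values in expression.items():
        b1, r2 = fit_gene(ages, values)
        if r2 > min_r2:
            hits.append((gene, b1, r2))
    return sorted(hits, key=lambda h: h[1], reverse=True)


ages = [20, 20, 30, 30, 40, 40]   # hypothetical sampling ages (days)
expression = {                     # hypothetical normalized values
    "up": [1.0, 1.2, 2.0, 2.1, 3.0, 2.9],
    "down": [3.0, 3.1, 2.0, 2.2, 1.0, 0.9],
    "flat": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
}
print([g for g, _, _ in time_dependent_genes(ages, expression)])  # ['up', 'down']
```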
Table S10. KEGG pathways affected by Smurfness. The 48 pathways identified, according to our expression dataset, as affected more by Smurfness than by chronological age. KEGG path: KEGG ID and pathway name; Avg age correlation: average correlation between gene expression and chronological age for the genes belonging to the pathway; Avg Smurf correlation: average correlation between gene expression and Smurf status for the genes belonging to the pathway; adjust pval (Fasano-Franceschini): adjusted p-value (FDR) from the Fasano-Franceschini test.
Table S11. KEGG pathways affected by chronological age. The 38 pathways identified, according to our expression dataset, as affected more by chronological age than by Smurfness. KEGG path: KEGG ID and pathway name; Avg age correlation: average correlation between gene expression and chronological age for the genes belonging to the pathway; Avg Smurf correlation: average correlation between gene expression and Smurf status for the genes belonging to the pathway; adjust pval (Fasano-Franceschini): adjusted p-value (FDR) from the Fasano-Franceschini test.
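The two summary columns of Tables S10 and S11 (average correlation with age and with Smurf status) can be sketched as follows. This assumes Pearson correlation and Smurf status coded as 0/1; both are assumptions, as are the input format and names. The Fasano-Franceschini test itself, a multidimensional Kolmogorov-Smirnov-type test, is not reproduced here.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def pathway_correlations(pathway_genes, expression, ages, smurf_status):
    """Average per-gene correlation with chronological age and with Smurf status (0/1)."""
    genes = [g for g in pathway_genes if g in expression]
    age_r = mean(pearson(expression[g], ages) for g in genes)
    smurf_r = mean(pearson(expression[g], smurf_status) for g in genes)
    return age_r, smurf_r


ages = [20, 20, 40, 40]            # hypothetical sample ages (days)
smurf_status = [0, 1, 0, 1]        # 1 = Smurf
expression = {"g1": [1, 3, 1, 3], "g2": [2, 5, 2, 5]}
print(pathway_correlations(["g1", "g2", "g_missing"], expression, ages, smurf_status))
```

In this toy pathway, expression tracks Smurf status but not age, so it would fall on the Smurfness side of the comparison (Table S10).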
Table S13. i-cisTarget results. The table reports the best hits provided by i-cisTarget when the queries are 1) TFs upregulated in Smurfs, 2) TFs downregulated in Smurfs, 3) genes upregulated in Smurfs (log2FC > 2), and 4) genes downregulated in Smurfs (log2FC < -2). In all cases, the gene symbol, score and putative detected targets are reported.
Table S14. Longevity screening results. Results are organized in groups according to how the genes were detected (DESeq2 for the first two groups, up- and downregulated in Smurfs, and i-cisTarget for the last two groups, putative regulators of Smurf TFs). Information about the gene and its alteration (KD or OX) is provided, together with the line