Phenotype correlations reveal the relationships of physiological systems underlying human ageing

Abstract Ageing is characterized by degeneration and loss of function across multiple physiological systems. To study the mechanisms and consequences of ageing, several metrics have been proposed in a hierarchical model, including biological, phenotypic and functional ageing. In particular, phenotypic ageing and interconnected changes in multiple physiological systems occur in all ageing individuals over time. Recently, phenotypic age, a new ageing measure, was proposed to capture morbidity and mortality risk across diverse subpopulations in US cohort studies. Although phenotypic age has been widely used, it may overlook the complex relationships among phenotypic biomarkers. Considering the correlation structure of these phenotypic biomarkers, we proposed a composite phenotype analysis (CPA) strategy to analyse 71 biomarkers from 2074 individuals in the Rugao Longitudinal Ageing Study. CPA grouped these biomarkers into 18 composite phenotypes according to their internal correlation, and these composite phenotypes were mostly consistent with prior findings. In addition, compared with prior findings, this strategy exhibited some different yet important implications. For example, the indicators of kidney and cardiovascular functions were tightly connected, implying internal interactions. The composite phenotypes were further verified through associations with functional metrics of ageing, including disability, depression, cognitive function and frailty. Compared to age alone, these composite phenotypes had better predictive performances for functional metrics of ageing. In summary, CPA could reveal the hidden relationships of physiological systems and identify the links between physiological systems and functional ageing metrics, thereby providing novel insights into potential mechanisms underlying human ageing.

| INTRODUCTION
Ageing is characterized by degeneration and loss of function across multiple physiological systems, and it has recently been investigated as a complex, multifactorial process (Maguire & Slater, 2013). To study the mechanisms and consequences of ageing, several metrics have been proposed in hierarchical models, namely biological, phenotypic and functional ageing. In particular, phenotypic ageing and interconnected changes in multiple physiological systems occur in all ageing individuals over time and may contribute to clinical diseases. For example, ageing is accompanied by a progressive decline in immune function, and toll-like receptor 5 (TLR5) may provide a critical mechanism to enhance immune responsiveness in older individuals (Qian et al., 2012). Recently, researchers have proposed a new ageing measure, phenotypic age, to capture morbidity and mortality risk across diverse subpopulations in a US cohort study. Although phenotypic age can facilitate the identification of individuals at risk for a number of health conditions and death, and may serve as a useful tool for evaluating intervention effectiveness (Belsky et al., 2015; Liu et al., 2018), it still has some drawbacks. Phenotypic age is calculated as a linear combination of chronological age and 9 multisystem clinical chemistry biomarkers, which may overlook the relationships among physiological systems (Zierer et al., 2015). Meanwhile, multiple physiological systems play important but different roles in the ageing process (López-Otín et al., 2013); thus, it is essential to explore the relationships between physiological systems.
Exploring these relationships requires systematically dissecting the correlation structure of the phenotypic biomarkers, which may be a promising strategy for studying human ageing (Freund, 2019). Generally, there are several methods of dissecting correlation structures from multiple phenotypic biomarkers, such as network-based methods (Newman et al., 2012; Zierer et al., 2015) and composite phenotype-based methods.
In network-based methods, phenotypic biomarkers are mostly grouped into several physiological modules based on prior knowledge (Freund, 2019; Newman et al., 2012). In composite phenotype-based methods, by contrast, phenotypic biomarkers are grouped into several composite phenotypes using data-driven approaches.
One of the important strengths of composite phenotypes is their ability to reduce the data dimensions and capture efficient information from multiple single phenotypes. In addition, the correlation between composite phenotypes can reflect the relationships of physiological systems. Therefore, composite phenotype analysis could be an efficient strategy to understand the relationships of physiological systems underlying human ageing.
Here, we proposed a new framework of composite phenotype analysis (CPA) and applied it to 71 phenotypic biomarkers in the Rugao Longitudinal Ageing Study (RLAS) to reveal the relationships of physiological systems underlying human ageing. First, we grouped phenotypic biomarkers into several composite phenotypes and examined the robustness of the clustering results. Second, we compared the correlation structure of composite phenotypes with prior findings to validate the reliability and investigate the potential mechanisms of correlations in these composite phenotypes. Finally, we linked composite phenotypes to functional metrics of human ageing, including disability, depression, cognitive function and frailty, and further validated the application of CPA in both cross-sectional and longitudinal datasets.

| Exploring the correlation structure of phenotypic biomarkers underlying human ageing
In this study, we collected 71 phenotypic biomarkers from 2074 individuals (44.94% males) from the fourth wave of the RLAS (Liu et al., 2016). The basic descriptive statistics are summarized in Table S1. Due to a lack of information on disability, cognitive impairment and frailty, 251 individuals were excluded from further analysis. The association of composite phenotypes with functional ageing metrics was analysed in 1823 individuals. Among them, 799 (43.83%) were males, and the mean age was 78.68 ± 4.79 years. According to the criteria, 169 (9.27%), 909 (49.86%) and 322 (17.66%) individuals were defined as having disability, cognitive impairment and frailty, respectively. The health status of the individuals is described in Table S2.
KEYWORDS
composite phenotype analysis, functional ageing, human ageing, phenotypic ageing, physiological systems

We used the maximal information coefficient (MIC) (Reshef et al., 2011) to measure both linear and nonlinear correlations between phenotypic biomarkers. To detect and filter spurious correlations in high-dimensional phenotypic datasets, we applied random matrix theory (RMT) (Luo et al., 2007) to identify the appropriate threshold of MIC. We found that the eigenvalue spacing distribution of the MIC correlation matrix transitioned from a Wigner-Dyson distribution to an exponential distribution when the candidate signal-noise separating threshold was set as 0.16 (Figure S1). The elements in the MIC correlation matrix that were less than the threshold were set to zero (Figure S2). The filtered MIC correlation matrix was then used as the adjacency matrix of the phenotypic network. Next, a sparse phenotypic network was built with each phenotypic biomarker as a vertex and filtered MIC as a weighted edge (Figure 1a). Furthermore, to detect communities of these biomarkers, spectral clustering was applied to the sparse phenotypic network.
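The pruning step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the matrix values are invented toy numbers, the function names `prune_network` and `connectance` are hypothetical, and only the 0.16 threshold and the biomarker labels come from the text.

```python
from itertools import combinations

def prune_network(corr, names, threshold=0.16):
    """Keep only pairs whose correlation is at least the threshold.

    Entries below the threshold are treated as zero (no edge); the
    surviving values become the weights of an undirected network.
    """
    edges = {}
    for i, j in combinations(range(len(names)), 2):
        if corr[i][j] >= threshold:
            edges[(names[i], names[j])] = corr[i][j]
    return edges

def connectance(edges, n_vertices):
    """Fraction of all possible undirected edges that are present."""
    possible = n_vertices * (n_vertices - 1) / 2
    return len(edges) / possible

# Toy symmetric MIC-like matrix (values are made up for illustration).
names = ["CREA", "eGFR", "Na", "Cl"]
mic = [
    [1.00, 0.62, 0.08, 0.05],
    [0.62, 1.00, 0.11, 0.07],
    [0.08, 0.11, 1.00, 0.41],
    [0.05, 0.07, 0.41, 1.00],
]

edges = prune_network(mic, names, threshold=0.16)
sparsity = connectance(edges, len(names))
print(edges)
print(round(sparsity, 3))
```

In this toy case only the two strong pairs survive, so the pruned network already separates into two groups before any clustering is applied.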
To examine the robustness of the clustering results, the samples were resampled 100 times, and the present clustering results were used as a reference. Then, normalized mutual information (NMI) was applied to quantify the concordance between the resampled results and the reference. The mean NMI was over 0.985, indicating that the clustering results were robust to resampling (Figure S3A). To further evaluate the impact of thresholds on clustering results, the threshold was varied from 0.1 to 0.5. When the threshold was set to approximately 0.16, the clustering results were stable compared with the reference (Figure S3B). Furthermore, we also examined the impact of the threshold on the topology of the phenotypic network (Figure S3C). The topological parameters included connectance, average path length, clustering coefficient and modularity. The distribution of connectance showed that the phenotypic network became sparser as the threshold became stricter. Meanwhile, the average path length, clustering coefficient and modularity reached their peak values. These results indicated that the phenotypic structure was highly modularized with a threshold set to approximately 0.16.
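As an illustration of how NMI compares a resampled clustering with a reference, the sketch below computes it from two label vectors. It assumes the geometric-mean normalisation, I(U;V)/sqrt(H(U)H(V)); the text does not say which NMI variant was used, and the label vectors are toy data.

```python
from collections import Counter
from math import log, sqrt

def entropy(labels):
    """Shannon entropy (natural log) of a cluster labelling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two labellings of the same items."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )

def nmi(a, b):
    """NMI with geometric-mean normalisation (assumed variant)."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        # Degenerate case: at least one labelling has a single cluster.
        return 1.0 if ha == hb else 0.0
    return mutual_information(a, b) / sqrt(ha * hb)

reference = [0, 0, 0, 1, 1, 1, 2, 2]
relabelled = [2, 2, 2, 0, 0, 0, 1, 1]  # same partition, different label names
perturbed = [0, 0, 1, 1, 1, 1, 2, 2]   # one item moved to another cluster

score_same = nmi(reference, relabelled)
score_perturbed = nmi(reference, perturbed)
print(score_same, score_perturbed)
```

NMI depends only on the partition, not on the label names, which is why it suits comparisons across resampled runs where cluster numbering is arbitrary.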
In summary, we used MIC to quantify the correlations among phenotypic biomarkers and filtered the spurious correlations with the threshold suggested by random matrix theory. The structures of phenotypic biomarkers were identified through spectral clustering. In addition, the clustering results were robust to resampling, and the threshold setting was reasonable. The relationships within clusters were much stronger than those between clusters (Figure S2).

| Defining composite phenotypes of phenotypic biomarkers underlying human ageing
The correlation heat map (Figure S2) showed the spectral clustering results of 71 phenotypic biomarkers, which were clustered into 18 groups. [...] (Parhofer, 2015). CP5 included electrolytes (Na, Cl) and blood gas (CO2CP). The transport of carbon dioxide is dependent on a chloride shift, which refers to the exchange of bicarbonate and chloride across the membrane of red blood cells (Crandall et al., 1981). CP7 included indicators of kidney function (CREA, eGFR, UA, β2.MG, Cys.C) and cardiovascular function (HCY, FOL, BNP). Several epidemiological studies have confirmed the relationships between chronic kidney disease and cardiovascular risk factors (Amann et al., 2006). CP14 grouped CRP with white blood cell phenotypes, both of which are indicators of inflammation. Therefore, these results indicated that CPA could reveal novel and meaningful classifications of these phenotypic biomarkers underlying human ageing.
The correlations of individual phenotypic biomarkers within the same composite phenotypes were primarily identified by the sparse phenotypic network (Figure 1a). In addition, the relationships among single phenotypic biomarkers across different composite phenotypes were investigated in the circular phenotypic network with a relatively low threshold (Figure 1b). The correlations between individual phenotypes with MICs greater than 0.1 are provided in Table S3. In particular, phenotypes of body shape (CP1), blood lipids (CP4) and white blood cells (CP14) were significantly connected.
Additionally, the phenotypes of kidney and cardiovascular functions (CP7), white blood cells (CP14) and red blood cell counts (CP17) were significantly connected.

| Linking composite phenotypes and functional ageing metrics
To investigate the ageing signatures of composite phenotypes, we linked them with functional ageing metrics (Figure 2), including disability (ADL) (Katz et al., 1970), depression (GDS) (Dennis et al., 2012), cognitive function (HDS) (Imai & Hasegawa, 1994) and frailty (FP) (Fried et al., 2001). First, we used multiple linear regression to investigate the associations between composite phenotypes and functional ageing metrics (Model 1). Then, these associations were compared with those between age and the functional ageing metrics (Model 2). Next, the combinations of composite phenotypes and age were linked to functional ageing metrics to explore the additional ageing signatures that composite phenotypes carry beyond age (Model 3). Finally, these associations between composite phenotypes and functional ageing metrics were adjusted for covariates including marital status and educational level (Model 4). These models were fitted in males and females separately (Table S4). Similar results were found in males and females.
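The comparison between Models 1 to 3 can be illustrated with a small ordinary least squares sketch. Everything here is an assumption made for illustration: the data are synthetic, the variable names (`cp_a`, `cp_b`, `score`) are hypothetical, R² is used as the summary of fit, and Model 4 (covariate adjustment) is omitted. The text does not specify how the models were compared numerically.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(columns, y):
    """Fit y ~ intercept + columns by least squares and return R^2."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Synthetic data: age, two biomarkers of one composite phenotype, one metric.
age = [70, 72, 75, 78, 80, 83, 85, 88]
cp_a = [1.2, 0.8, 1.5, 1.1, 1.9, 1.4, 2.2, 1.8]
cp_b = [0.3, 0.9, 0.4, 1.0, 0.6, 1.2, 0.5, 1.3]
score = [2.0, 1.8, 2.9, 2.6, 3.8, 3.4, 4.5, 4.1]

r2_model1 = r_squared([cp_a, cp_b], score)        # composite phenotype only
r2_model2 = r_squared([age], score)               # age only
r2_model3 = r_squared([age, cp_a, cp_b], score)   # age plus composite phenotype
print(r2_model1, r2_model2, r2_model3)
```

Because Model 3 nests both simpler models, its R² can never be lower than theirs; the size of the gain over Model 2 is what indicates information beyond age.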
For disability (ADL, Figure 2a), we found that CP7 was more informative than age in predicting disability. CP1, CP2, CP11, CP14, CP17 and CP18 had additional effects beyond age in their correlation with disability. For depression (GDS, Figure 2b), we found that CP4 and CP17 [...] For frailty (FP, Figure 2d), CP7 was more informative than age. In addition, CP1, CP14, CP15 and CP18 had additional effects beyond age in their correlation with frailty.
In summary, we found that body shape (CP1), kidney and cardiovascular functions (CP7), white blood cells (CP14) and red blood cell counts (CP17) and distribution (CP18) were primarily associated [...] were few improvements in the correlation between the composite phenotypes of body shape (CP1) and kidney and cardiovascular functions (CP7) and functional ageing metrics. In contrast, several composite phenotypes (CP14, CP17, CP18) had large additional effects beyond age in their correlation with functional ageing metrics (Figure 2f).
After adjusting for covariates in Model 4, the composite phenotypes were still significantly correlated with functional ageing metrics (Table S4).

| Replicating the associations between composite phenotypes and functional ageing metrics
To validate the applications of CPA, we first examined these as- [...] and cardiovascular functions) and CP17 (red blood cell counts) had remarkable AUCs for functional ageing metrics.

| Revealing relationships between kidney and cardiovascular functions in CP7
Considering the great correlation between CP7 and functional age- [...] Furthermore, we explored the interactions between the kidney and cardiovascular functions using these biomarkers in CP7. Notably, we found that kidney and cardiovascular functions (CP7) were associated with frailty and that the interactions between them on frailty were also significant (Figure 4c,d). Specifically, we conducted principal component analysis on these indicators (Figure 4b). The effect of kidney PC1 on frailty increased with higher cardiovascular PC1 levels (Figure 4c). Similarly, cardiovascular PC1 was also a risk factor for frailty, and its effects on frailty increased with higher kidney PC1 levels (Figure 4d). In summary, there were nonlinear additional effects of kidney and cardiovascular functions on frailty, suggesting synergistic effects between kidney and cardiovascular functions underlying human ageing.

| Evaluating the performance of composite phenotype analysis (CPA)
We proposed CPA as an integrated framework to systematically dissect the phenotype correlations in the RLAS. The workflow is summarized in Figure 5 and mainly consists of four steps: measuring the correlation between phenotypes (Step 1), pruning the phenotypic network (Step 2), extracting the composite phenotypes (Step 3) and linking the composite phenotypes to functional ageing metrics (Step 4). To investigate the performance of CPA, three other correlation measures, three filtering thresholds, five clustering algorithms and five dimensionality reduction methods were included for comparison. The benchmarking analysis was separated into four parts (Experimental procedures).
In summary, the advantages of CPA could be seen in several ways. First, both linear and nonlinear correlations between phenotypes were considered. Then, spectral clustering grouped the phenotypes based on Laplacian matrices of the phenotypic network.
The sparsity of the network contributed to the performance of spectral clustering, while other clustering algorithms were subject to the noise of spurious correlations between phenotypes. Finally, the composite phenotypes retained all the information of individual phenotypes. Linear regression and machine learning algorithms were applied to explore the connection between phenotypic and functional ageing. In CPA, we found that CP1, CP7, CP14, CP17 and CP18 had remarkable correlations and predictive abilities with functional ageing metrics.

| DISCUSSION
In this study, we applied CPA to 71 biomarkers underlying human ageing to elucidate their correlation structure and obtained 18 composite phenotypes in the RLAS. These composite phenotypes captured more ageing information than age in correlation with [...] Additionally, we found a significant correlation between blood lipids and blood glucose (CP4), a correlation between electrolytes and blood gas (CP5), and a correlation between kidney and cardiovascular functions (CP7). Furthermore, the effects of these phenotypic biomarkers on functional ageing metrics were not independent (as with CP7). In brief, there were interactions between these biomarkers, suggesting extensive relationships between these physiological systems underlying human ageing.
The potential mechanisms of correlations between composite phenotypes and functional ageing metrics were reasonable. Both CP1 (body shape) and CP14 (white blood cells) were associated with functional ageing metrics in our study. The potential mechanisms underlying this association may be that low-grade systemic inflammation was associated with CP1 (Visser et al., 1999) and CP14 (Leng et al., 2005). Inflammation may mediate the association of CP1 and CP14 with frailty (Leng et al., 2009), cognitive impairments (Zenaro et al., 2015) and disability (Nuesch et al., 2012) in older adults. CP17 (red blood cell counts) and CP18 (red blood cell distribution) were associated with cognitive impairments in our study. Previous studies observed similar results; mean cell haemoglobin (MCH) and red cell distribution width (RDW) were most strongly associated with cognitive function (Winchester et al., 2018) and iron deficiency anaemia (Goddard et al., 2011). Mendelian randomization studies demonstrated that increased iron reduces the risk of Parkinson's disease (Pichler et al., 2013). In addition, low-grade inflammation was associated with the development of anaemia (Nemeth & Ganz, 2014) and frailty (Soysal et al., 2016) in older adults; therefore, this could be one of the potential pathways underlying the association between anaemia and frailty.
The correlation structures between phenotypes in the RLAS, such as blood glucose and blood lipids (CP4), kidney and cardiovascular functions (CP7), are worthy of further investigation. The health of the population could contribute to the observed correlation structures. Therefore, we compared the correlations between phenotypes within healthy individuals and individuals with diseases (e.g. cardiovascular disease, chronic kidney disease and anaemia).
The correlations between indicators of kidney and cardiovascular functions were enhanced in subgroups with diseases (Figure S4). For example, the Spearman correlation coefficient (SCC) between β2.MG and BNP was 0.44 in the subgroups with diseases, whereas it was 0.27 in the healthy subgroups. The dysregulation of physiological systems could lead to distinctive phenotype correlations, which were more evident in individuals with diseases.
Relationships among physiological systems are common, such as cardiopulmonary and brain-heart systems (Kuh et al., 2019; Schefold et al., 2016). Here, we also explored the interactions between the kidney and cardiovascular functions. The mechanisms by which the kidney and cardiovascular system are associated with frailty are not entirely understood. A probable explanation of the interaction is that cardiac and renal disease share several common bidirectional pathways, such as haemodynamic, (neuro)hormonal and cardiovascular disease-associated mechanisms (Schefold et al., 2016). All three mechanisms are interconnected and could negatively affect both cardiac and renal function (Schefold et al., 2016), thereby causing frailty by influencing physical and cognitive function.
Statistical approaches to infer networks from biological data include Gaussian graphical models, Bayesian networks, correlation networks and information theory (Yu et al., 2013). It is essential to choose an appropriate method to quantify the similarity between the vertices of the network. We used the maximal information coefficient (MIC) to detect the correlations between phenotypes, which serves as a general tool in coexpression networks (Song et al., 2015).
With the increasing availability of biological data, filtering information in large complex networks of interactions is beneficial for the emergence of biological networks (Marcaccioli & Livan, 2019). In this study, we applied a global threshold developed from RMT (Luo et al., 2006). The thresholding methodology of RMT has been applied in gene and microbial networks (Deng et al., 2012; Luo et al., 2007). The suitability of the threshold was evaluated through its effects on the topology of the networks (Couto et al., 2017). RMT has been widely used in characterizing nonrandom phenomena in physical, material and social systems (Luo et al., 2007), and it has been well recognized in these systems that RMT analyses are efficient for distinguishing system-specific, nonrandom properties from random noise (Luo et al., 2006; Segal et al., 2003).
Network-based methods have been widely used in many fields, such as microbial communities, protein interactions and gene coexpression. In the framework of network-based methods, phenotypic biomarkers were mostly grouped into several physiological modules based on prior knowledge (Freund, 2019; Newman et al., 2012).
However, in CPA, phenotypic biomarkers were grouped into several composite phenotypes using data-driven methods. Therefore, we proposed using CPA to study ageing for the first time and revealed several relationships of physiological systems underlying human ageing. CPA could potentially be employed as a general strategy for studying complex traits, especially in the analysis of phenomics.
There were still some limitations in our study. First, we enrolled 71 markers to construct the phenotypic network and extract composite phenotypes. This may differ in other cohorts and change with different phenotypes or approaches. Second, the number of biomarkers in our cohort increased with each of the three waves, although some biomarkers were not measured in the previous waves. Finally, although the study cohort had been followed up for three years and found encouraging results, it was still essential to conduct longer-term follow-ups to validate our findings.
In summary, CPA provides a promising opportunity for researchers to understand the intrinsic correlation structure of phenotypic ageing biomarkers through a data-driven strategy. Furthermore, CPA could reveal the hidden relationships of physiological systems and identify the important links between physiological systems and functional ageing metrics, thereby providing novel insights into potential mechanisms underlying human ageing.

| Study population
We applied CPA to data from the fourth wave of the ageing arm of the Rugao Longitudinal Ageing Study (RLAS), a population-based observational two-arm cohort study conducted in Rugao, Jiangsu Province, China (Liu et al., 2016). The validation data were drawn from the second wave of the cohort. As previously described, the

| Biomarker datasets
Fasting blood samples of all participants were collected by trained nurses during the morning of the survey. Laboratory measurements (Table S1) included blood biochemistry (e.g. blood lipids), routine clinical examinations (e.g. blood pressure) and other blood biomarkers (e.g. homocysteine and B-type natriuretic peptide). Anthropometric characteristics were measured. Grip strength was assessed using a Hand Grip Dynamometer (Shanghai Wanqing Rlrctron Co. Ltd., Shanghai, China). Physical performance was assessed with the timed 'up and go' test (participants stand up from an armchair, walk 3 m, return and sit down again), the 5-metre walking test and the sit-to-stand from a chair test (Podsiadlo & Richardson, 1991). Electrocardiography was performed on each participant, and the ECG parameters, including heart rate, PR interval (the time elapsing between the beginning of the P wave and the beginning of the next QRS complex), QRS duration (a series of waveforms on an electrocardiogram that represents depolarization of ventricular muscle cells), S wave in V1 (SV1), R wave in V5 (RV5) and QTc, were determined via the interpretation programs of the ECG machine. All ECG parameters and abnormalities were identified by another cardiologist. To adjust the QT interval for heart rate, the Bazett formula (QTc = QT/√RR) was used in the present study.
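As a worked example of the Bazett correction, the sketch below derives the RR interval from heart rate and applies QTc = QT/√RR. It assumes the conventional units (QT in milliseconds, RR in seconds), which the text does not state; the function name and the input values are hypothetical.

```python
from math import sqrt

def qtc_bazett(qt_ms, heart_rate_bpm):
    """Heart-rate-corrected QT interval using the Bazett formula.

    Assumes QT in milliseconds and RR in seconds (RR = 60 / heart rate).
    """
    rr_s = 60.0 / heart_rate_bpm
    return qt_ms / sqrt(rr_s)

qtc_rest = qtc_bazett(400, 60)  # RR = 1 s, so QTc equals the measured QT
qtc_fast = qtc_bazett(360, 90)  # shorter RR, so QTc exceeds the measured QT
print(round(qtc_rest, 1), round(qtc_fast, 1))
```

At 60 beats per minute the correction leaves QT unchanged, which is a convenient sanity check on the units.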

| Functional metrics of ageing
Functional disability was assessed by the Katz scale (Katz et al., 1970). Each task had three response options: strongly independent, somewhat independent and strongly dependent. Participants who responded as somewhat independent or strongly dependent for any tasks were defined as having a functional disability. Depressive symptoms were measured using the 15-item Geriatric Depression Scale (GDS) (Yesavage, 1988), a validated self-report questionnaire commonly used for the assessment of depressive symptoms in older adults. The questionnaire contained 15 questions (yes or no) with a score of 0-15. In our study, a score of 6 or more was defined as having a depressive symptom (Dennis et al., 2012). Cognitive function was evaluated by the revised Hasegawa's dementia scale (HDS-R), which comprised orientation, memory, attention/calculation and verbal fluency (Imai & Hasegawa, 1994). HDS-R has been widely accepted in Asian populations in clinical and epidemiological surveys for the assessment of cognitive impairment (Sengchanh et al., 2019).
In our study, individuals who scored higher than 21.5 were defined as having normal cognitive function, while those who scored 21.5 or below were defined as having cognitive impairment. According to Fried et al., the frailty phenotype was defined in the following five domains: weight loss, exhaustion, low activity, weakness and slowness (Fried et al., 2001).
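The cut-offs stated above for disability, depressive symptoms and cognitive impairment translate directly into simple rules. The sketch below encodes only those three definitions; the function names and the example inputs are hypothetical, and frailty is left out because the text lists its five domains without giving a scoring threshold.

```python
def has_disability(katz_responses):
    """True if any Katz task is answered other than 'strongly independent'."""
    return any(r != "strongly independent" for r in katz_responses)

def has_depressive_symptoms(gds_score):
    """GDS-15 score of 6 or more, as defined in the text."""
    return gds_score >= 6

def has_cognitive_impairment(hdsr_score):
    """HDS-R score of 21.5 or below, as defined in the text."""
    return hdsr_score <= 21.5

# Hypothetical participant record.
participant = {
    "katz": ["strongly independent", "somewhat independent"],
    "gds": 4,
    "hdsr": 23,
}
status = {
    "disability": has_disability(participant["katz"]),
    "depressive_symptoms": has_depressive_symptoms(participant["gds"]),
    "cognitive_impairment": has_cognitive_impairment(participant["hdsr"]),
}
print(status)
```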

| Composite Phenotype Analysis (CPA)
The correlation between phenotypes was quantified by MIC (Reshef et al., 2011). The idea of MIC is that, if two variables are related, a grid can be drawn on their scatterplot that partitions the data so as to encapsulate the relationship. MIC explores all grids up to a maximal grid resolution to obtain the highest normalized mutual information between the two variables. The calculation of MIC was implemented with the R package 'minerva'.
We calculated the MIC in males and females separately and averaged the results. The thresholds of the MIC correlation matrix were obtained through RMT (Luo et al., 2007). In detail, the nearest-neighbour spacing distribution (NNSD) of the eigenvalues of a random symmetric matrix follows the Wigner-Dyson distribution. When there are strong correlations only along the (block) diagonal of the matrix, the NNSD instead follows an exponential distribution. Therefore, with increasing threshold, the NNSD of the matrix transitions from a Wigner-Dyson distribution to an exponential distribution.
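For reference, the two limiting spacing distributions have standard closed forms. The expressions below assume the Gaussian orthogonal ensemble form of the Wigner-Dyson distribution (the Wigner surmise) and eigenvalue spacings s rescaled to unit mean; the text does not state which form was fitted, so this is a sketch rather than the exact criterion used.

```latex
P_{\mathrm{WD}}(s) = \frac{\pi}{2}\, s \exp\!\left(-\frac{\pi s^{2}}{4}\right),
\qquad
P_{\mathrm{exp}}(s) = \exp(-s)
```

The Wigner-Dyson form vanishes at s = 0 (neighbouring eigenvalues repel), whereas the exponential form is largest at s = 0, which is what makes the transition detectable as the threshold increases.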
The R package 'RMThreshold' provided algorithms based on RMT that could be used to determine an objective threshold for signal-noise separation in large random matrices. Community detection of phenotypic networks used spectral clustering algorithms. The topological parameters of the network included the average path length, clustering coefficient, connectance and modularity (Zhao & Liu,).

| Statistical analysis of composite phenotypes and functional ageing metrics
The composite phenotypes were linked to functional ageing metrics using multiple linear regression. We constructed four mod-

| The Benchmarking of CPA
The evaluation of CPA was separated into 4 parts to compare its performance with other common algorithms.
At Step 1, we compared MIC with the Spearman correlation coefficient (SCC), Pearson correlation coefficient (PCC) and Kendall correlation coefficient (KCC) and found that MIC was broadly proportional to the other coefficients (Figure S5A). We also found outliers that were small on the other coefficients but large on MIC, showing that MIC quantified the nonlinear relationships of these phenotypic biomarkers effectively. For example, between the waist-to-hip ratio (WHR) and hip, the MIC was 0.477, whereas the SCC was 0.05, the PCC was 0.1 and the KCC was 0.027 (Figure S5A). These results indicated that MIC outperformed SCC, PCC and KCC in quantifying the nonlinear correlations between phenotypic biomarkers.
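To see why linear and rank coefficients can miss a nonlinear relationship, the sketch below computes the Pearson and Spearman coefficients for a toy U-shaped dependence. It does not compute MIC (the study used the R package for that); the data are synthetic and the helper names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# y is fully determined by x, yet the relationship is not monotonic.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]

pcc = pearson(x, y)
scc = spearman(x, y)
print(pcc, scc)
```

Both coefficients come out at zero here although y is a deterministic function of x, which is the kind of case a grid-based, information-theoretic measure is designed to pick up.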
At Step 2, the raw phenotypic network was pruned through thresholds to obtain the sparse network, in which the relationships of phenotypes were more distinct. We compared RMT with methods of multiple testing correction (Bonferroni adjustment and false discovery rate correction) and empirical thresholds. The distributions of the correlation coefficient were checked (Figure S5B). There were 1886 and 1222 remaining correlations for Bonferroni and false discovery rate (FDR), respectively, indicating that multiple testing correction of p values was not stringent enough to screen correlations. The top 5% and 10% of MIC were 0.139 and 0.215, respectively. According to the distribution of topology on the phenotypic network, these thresholds were less appropriate.
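The two multiple testing corrections mentioned above can be sketched as follows. The Benjamini-Hochberg step-up procedure is assumed for the FDR correction, since the text does not name the procedure; the p values and the 0.05 level are illustrative only.

```python
def bonferroni(p_values, alpha=0.05):
    """Keep a test if its p value is at most alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (assumed FDR method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that the k-th smallest p value is <= alpha * k / m.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank
    keep = [False] * m
    for rank, i in enumerate(order, start=1):
        keep[i] = rank <= cutoff_rank
    return keep

# Illustrative p values for six pairwise correlations.
p_values = [0.001, 0.008, 0.012, 0.03, 0.04, 0.2]

kept_bonferroni = bonferroni(p_values)
kept_fdr = benjamini_hochberg(p_values)
print(sum(kept_bonferroni), sum(kept_fdr))
```

Either rule screens correlations by statistical significance alone, so with a large sample many weak correlations can still pass, which is the limitation the comparison with RMT is meant to expose.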
At Step 3, a spectral clustering algorithm was used to extract composite phenotypes based on the sparse phenotypic network. Five other clustering methods, including K-means clustering, partitioning around medoids (PAM), hierarchical clustering (Hclust), clustering large applications (CLARA) and divisive analysis clustering (DIANA), were also applied. The classification of phenotypes by CPA was consistent with prior knowledge and was used as a reference to evaluate the performance of other clustering algorithms. The results of other methods were aligned with the reference (Figure S5C). Among them, CLARA performed best, with an NMI equal to 0.734. Although it successfully captured the partial characteristics of composite phenotypes such as CP7, it was laborious to group the phenotypes and obtain a reasonable classification.
At Step 4, the composite phenotypes were linked to functional ageing metrics through linear regression. Other dimensionality reduction methods, including principal component analysis (PCA), canonical correlation analysis (CCA), partial least squares regression (PLS), nonnegative matrix factorization (NMF) and locally linear embedding (LLE), were also applied to construct the correlation between the composite phenotype and functional ageing metrics (Figure S5D). The single phenotypes in CP7 were taken as an example in investigating the performance of these methods. CP7, which was defined as a set of single phenotypes rather than a fixed latent variable, had the strongest correlation with these metrics. The components extracted by PLS, CCA and PCA were significantly correlated with disability, cognitive impairments and frailty. However, dimension reduction impaired the correlation compared with CPA.

DATA AVAILABILITY STATEMENT
The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.