Metabolome‐Genome‐Wide Association Study (mGWAS) Reveals Novel Metabolites Associated with Future Type 2 Diabetes Risk and Susceptibility Loci in a Case‐Control Study in a Chinese Prospective Cohort

Abstract In a Chinese prospective cohort, 500 patients with new‐onset type 2 diabetes (T2D) within 4.61 years and 500 matched healthy participants are selected as case and control groups, and randomized into discovery and validation sets to discover the metabolite changes before T2D onset and the related diabetogenic loci. A serum metabolomics analysis reveals that 81 metabolites changed significantly before T2D onset. Based on binary logistic regression, eight metabolites are defined as a biomarker panel for T2D prediction. Pipecolinic acid, carnitine C14:0, epinephrine and phosphatidylethanolamine 34:2 are first found associated with future T2D. The addition of the biomarker panel to the clinical markers (BMI, triglycerides, and fasting glucose) significantly improves the predictive ability in the discovery and validation sets, respectively. By associating metabolomics with genomics, a significant correlation (p < 5.0 × 10−8) between eicosatetraenoic acid and the FADS1 (rs174559) gene is observed, and suggestive correlations (p < 5.0 × 10−6) between pipecolinic acid and CHRM3 (rs535514), and leucine/isoleucine and WWOX (rs72487966) are discovered. Elevated leucine/isoleucine levels increased the risk of T2D. In conclusion, multiple metabolic dysregulations are observed to occur before T2D onset, and the new biomarker panel can help to predict T2D risk.

lipids and eight clinical parameters improved the predictive ability of the clinical parameters from 0.710 to 0.781 in the discovery set and from 0.698 to 0.722 in the validation set. [5] T2D is a metabolic disease affected by multiple genes and has a strong genetic predisposition. Therefore, the discovery of genetic loci associated with T2D may help understand the pathogenesis. Recently, a metabolome-genome-wide association study (mGWAS) was used to verify known gene loci and identify new related gene loci. In a nested case-control study of a Korean population, nine metabolites were found to be significantly related to the occurrence of diabetes. [6] These T2D-related metabolites were used as intermediate molecular phenotypes in mGWAS to link genes and incident T2D. CPS1 (rs1047883), which is significantly associated with glycine, was shown to be related to the risk of T2D. The newly discovered genetic loci and their relationships with metabolites can help understand the complex pathogenesis of T2D. However, to the best of our knowledge, no mGWAS of T2D has been conducted in Chinese prospective nested case-control cohorts so far.
In this work, a prospective nested case-control study was conducted in the Dongfeng-Tongji cohort [7] of the Chinese population. The case group of this study consisted of 500 participants who developed T2D after an average follow-up of 4.61 ± 0.15 years, and each case was matched with a nondiabetic participant in the cohort to form the control group. Serum untargeted metabolomics analysis was performed to identify significantly differential metabolites before the onset of T2D, and a new combinational metabolite marker panel was established to predict T2D in advance. Then, metabolomics and genomics were integrated to reveal possible diabetogenic loci and causal relationships between metabolites and T2D.

Baseline Characteristics
Baseline characteristics were compared between the case group and control group in both the discovery and validation sets ( Table 1). As shown in Table 1, there was no significant difference in age, sex, smoking status, drinking status, or physical activity in the discovery set between the case and control groups. The case group had a significantly higher body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), triglyceride (TG) level, and fasting glucose (FG) level and a lower high-density lipoprotein (HDL) cholesterol level than the control group (p < 0.05, false positive rate (FDR) < 0.1). The odds ratios (ORs) per standard deviation (SD) increment in clinical parameters were calculated by Cox regression (Table S1, Supporting Information). For the significantly different clinical parameters, the risk of T2D was significantly positively correlated with baseline BMI, SBP, DBP, TG, and FG levels (the ORs per SD increment were greater than 1) and was significantly negatively correlated with HDL level (the OR per SD increment was less than 1). For the validation cohort, the BMI, TG, and FG levels were significantly higher in the participants in the case group than in those in the control group, and these parameters were significantly positively correlated with the risk of T2D.

Significantly Changed Metabolites and Pathways Related to T2D Onset
Serum samples of the discovery set were analyzed by a liquid chromatography-mass spectrometry (LC-MS)-based metabolomics approach. A total of 206 metabolites were identified according to our in-house database. [8] Among these identified metabolites, 94.7% of the metabolites had a relative standard deviation (RSD) value less than 30% among quality control (QC) samples in the discovery set ( Figure S1A, Supporting Information) and were used for the following data analysis. Nonparametric tests were conducted to define significantly changed metabolites. Figure 1A shows the changes of 81 significantly differential metabolites whose p values were below 0.05 and FDR values were below 0.1. To study the relationship between differential metabolites and future T2D risk, the ORs per SD increment were calculated ( Figure 1B). The multivariable-adjusted ORs were adjusted by matched factors (age, sex), obesity (BMI) and lifestyle factors (smoking status, drinking status and physical activity). As shown in Figure 1A,B, the levels of acylcarnitines, branched-chain amino acids, aromatic amino acids, free fatty acids (FFAs), phosphatidylcholines (PCs) and phosphatidylethanolamines (PEs) significantly increased before T2D occurred, and those metabolites were significantly positively correlated with the risk of future T2D (values of ORs greater than 1). Most acylcarnitines and fatty acids remained significant after multivariable adjustment. On the other hand, the levels of most lysophosphatidylethanolamines (LPEs) and lysophosphatidylcholines (LPCs) decreased, and they were significantly negatively correlated with the risk of future T2D (values of ORs less than 1). In addition, the results of pathway analysis based on significantly changed metabolites are shown in Figure 1C. Aminoacyl-tRNA biosynthesis (p < 0.001, FDR = 0.002) and biosynthesis of unsaturated fatty acids (p < 0.001, FDR = 0.002) were the most significantly altered pathways before T2D onset.

Future Risk-Related Metabolites and Metabolic Markers for T2D Prediction
To distinguish the control group from the case group, we performed binary logistic regression on the 81 metabolites that were significantly different between the case group and control group in the discovery set. A combinational metabolic biomarker panel containing eight metabolites was established, including pipecolinic acid, 1,5-anhydro-D-glucitol, LPC 18:2, carnitine C14:0, leucine/isoleucine, eicosatetraenoic acid (FFA 20:4), epinephrine and PE 34:2. For convenience, they are named combination metabolite biomarker (CMB). The change trends of these eight metabolites are shown in Figure 1A, and the distribution of ORs is shown in Figure 1B and In addition, binary logistic regression was also performed on significantly different clinical parameters in the discovery set. BMI, TG, and FG were defined as a combinational clinical biomarker panel (CCB). The ORs per SD increase in T2D were 3.028 (95% CI 2.332-3.933, p < 0.001) for CCB, 2.243 (95% CI 1.808-2.782, p < 0.001) for CMB and 3.445 (95% CI 2.604-4.558, p < 0.001) for the combination model (CMB + CCB) ( Figure 1D). The areas under the receiver operating characteristic curves (AUCs) of CMB and CCB were 0.708 and 0.755, respectively. Adding CMB to CCB significantly improved the predictive ability (AUC = 0.773, p = 0.031) ( Figure 1E).
To validate these three predictive models, 208 pairs of cases and matched control samples were used as the validation set. Serum samples in the validation set were analyzed by the same LC-MS-based metabolomics approach, and 89.8% of the 206 identified metabolites had an RSD value of less than 30% in the QC samples ( Figure S1B, Supporting Information). The ORs per SD increase in CMB, CCB and their combination were 1.832 (95% CI 1.475-2.276), 2.742 (95% CI 2.090-3.599) and 3.150 (95% CI 2.338-4.244), respectively, and all p values of the ORs were less than 0.001 ( Figure 1D). The AUCs of CMB and CCB were 0.687 and 0.772, respectively ( Figure 1F). Adding CMB to CCB significantly improved the predictive ability (AUC = 0.797, p = 0.017).

Susceptibility Loci and Causal Relationships
On the basis of the above study, we conducted mGWAS between significantly changed metabolites at baseline and SNPs to find the potential locus related to incident T2D. The GWAS results of FFA 20:4 are shown in Figure 2. Figure 2A is the Manhattan plot showing the correlation between loci on each chromosome and FFA 20:4. Detailed information on the p values is given in Table S3 (Supporting Information), and 13 loci were found to be related to FFA 20:4 at the suggestive or significant genome-wide thresholds. The Q-Q plots are shown in Figure S2   The multivariable-adjusted ORs were adjusted by age, sex, BMI, smoking status, drinking status, and physical activity. C) Pathway analysis based on significantly changed metabolites before diabetes onset. D) ORs per SD increment in predictive model scores of T2D. ROC curves of the discovery set E) and validation set F). CMB consisted of eight metabolites, and CCB consisted of BMI, TG and FG. The combination was composed of CMB and CCB.
The GWAS results of pipecolinic acid and leucine/isoleucine are also shown in Figure 2. Specific information on important SNPs related to pipecolinic acid and leucine/isoleucine at the suggestive or significant genome-wide thresholds is summarized in Table S3 (Supporting Information). The rs535514 SNP on chromosome 1 is in the region of Cholinergic Receptor Muscarinic 3 (CHRM3) gene, which is related to insulin secretion. [9] As shown in Figure 2D, allele C of rs535514 is related to an increased pipecolinic acid level (p = 5.40 × 10 −7 , β = 0.157, MAF = 0.42). Rs72487966 (SNP on chromosome 16) is located in the WW domain containing oxidoreductase (WWOX) gene region, which is a risk gene for T2D. [10] As shown in Figure 2F, allele G of rs72487966 is related to an increased leucine/ isoleucine level (p = 2.92 × 10 −6 , β = 0.411, MAF = 0.03). We also attempted to conduct two-sample Mendelian randomization (MR) analysis to evaluate whether there are causal relationships between pipecolinic acid or leucine/isoleucine and T2D. Based on the meta-analyses using summary statistics of the Dongfeng-Tongji cohort, and the publicly available results of the KORA cohort and the Twins UK cohort, [11] two-sample MR analysis of leucine/isoleucine levels was conducted, and the OR per SD genetically predicted difference was scaled using an inverse-variance-weighted method. As a result, the relationship between leucine/isoleucine and T2D was proven to be causal (OR = 0.004, 95% CI 1.001-1.006, p = 0.001).

Discussion
In this nested Chinese prospective cohort, an untargeted ultra-performance liquid chromatography-mass spectrometry (UPLC-MS)-based metabolic profiling of serum samples was performed and integrated with genomics. To the best of our knowledge, this is one of the most comprehensive metabolomics studies conducted on Chinese participants without diabetes at baseline to reveal the significant metabolic changes before the onset of T2D and predict the risk of future T2D. According to our results, 206 metabolites were identified. Among them, 81 metabolites, including carnitines, amino acids, fatty acids, organic acids, and lipids were significantly changed before T2D onset. Eight metabolites related to future T2D risk were defined as a CMB for T2D prediction. Among them, four metabolites (LPC 18:2, 1,5-anhydro-D-glucitol, FFA 20:4 and leucine/isoleucine) were reported in other cohort studies [12][13][14][15] and validated in our Chinese prospective cohort study. Four of them (pipecolinic acid, carnitine C14:0, epinephrine and PE 34:2) were first reported to be significantly associated with future T2D risk in a prospective cohort study. The AUCs of CMB were 0.708 and 0.687 in the discovery and validation sets, respectively, indicating that CMB had an acceptable predictive ability for the onset of T2D. The combination of CMB and CCB significantly improved the predictive ability of CCB from 0.755 to 0.773 in the discovery set and from 0.772 to 0.797 in the validation set.
In this study, before the onset of T2D, the levels of most LPEs and LPCs were decreased, while the levels of PCs and PEs were increased in the case group ( Figure 1A). LPCs and LPEs are primarily derived from PCs and PEs by the action of phospholipase A. [16] It was reported that the levels of phospholipids were significantly changed in diabetic patients. [17] The significant changes of those phospholipids in our study hint that abnormal phospholipid metabolism appears before the onset of T2D. It is worth mentioning that as a member of CMB, a lower level of LPC 18:2 was found to be associated with future T2D risk in a European cohort, [12] which was consistent with our results in the Chinese cohort. Furthermore, PE 34:0 was first reported to be associated with future T2D risk, and defined as a discriminant marker in our study.
FFAs are important energy substances that can be transferred into mitochondria for β-oxidation with the help of carnitine. In our results, the levels of FFAs and acylcarnitines were significantly increased before T2D occurred. Most of them were significantly positively correlated with the development of T2D (values of ORs greater than 1 and values of multivariable-adjusted p values less than 0.05). Among them, FFA 20:4 and carnitine C14:0 were also defined by binary logistic regression as predictors of T2D. According to published studies, the levels of FFAs and acylcarnitines are significantly increased in T2D patients. [18,19] FFA 20:4 was reported to be positively associated with future T2D in a French prospective cohort. [13] Baseline carnitine C14:0 was also analyzed in another Chinese prospective cohort of T2D, but neither significant differences between the case group and the control group nor a significant relationship with the risk of future T2D was found. [20] Furthermore, we calculated the ratio of carnitine C4:0/carnitine C3:0, which can indicate the activity of short-chain acyl-CoA dehydrogenase (SCAD). [21] SCAD is the rate-limiting enzyme of electron transport, which is the initial reaction in mitochondrial β-oxidation. [22] In our case group, the ratio of carnitine C4:0/carnitine C3:0 was significantly increased (p < 0.001), indicating the activity of SCAD was decreased. This demonstrated that before the occurrence of T2D, β-oxidation of fatty acids may have been impaired. Notably, it was reported that incomplete fatty acid β-oxidation can contribute to insulin resistance, which plays an important role in the pathogenesis of T2D. [23] Furthermore, according to our mGWAS results, the most relevant SNP of FFA 20:4 was rs174559 (SNP on chromosome 11), which is in the region of FADS1 gene (Figure 2A,B). Some other SNPs mapping to the FADS1 gene were reported to be significantly associated with FFA 20:4. [24] FADS1 encodes an important enzyme in the metabolism of unsaturated fatty acids. [25] According to the pathway analysis in this study ( Figure 1C), a significant disturbance in the biosynthesis of unsaturated fatty acids existed before the onset of T2D. FADS1 was also reported to be associated with an increased risk of T2D in a Japanese study. [26] Pipecolinic acid was decreased in the case group in our study and significantly negatively associated with future T2D risk. Pipecolinic acid was also analyzed in a prospective cohort of T2D patients in a Mediterranean population, but no significant association between the baseline level of pipecolinic acid and T2D risk was found (p > 0.05). [27] This is the first time that the baseline level of pipecolinic acid was found to be significantly related to future T2D risk. Furthermore, rs535514 was related to the pipecolinic acid level based on the mGWAS results, and rs535514 was in the regions of the CHRM3 gene ( Figure 2D). The CHRM3 gene encodes the cholinergic receptor muscarinic 3, which can interfere with the control of the hypothalamus to regulate appetite. Notably, an association between CHRM3 and the risk of early occurrence of T2D was found in a Pima Indian population. [28] CHRM3 is a gene related to the stimulation of insulin secretion. [9] Together with the cholinergic receptor muscarinic 3, acetylcholine is capable of regulating insulin secretion, [9] controlling islet cell proliferation and maintaining the glucose levels. [29] The levels of amino acids in serum were significantly increased and positively related to future T2D, including branched-chain amino acids (leucine/isoleucine), aromatic amino acids (phenylalanine and tyrosine), lysine and methionine. According to previous studies, [30,31] amino acid metabolism is closely related to T2D. For example, aromatic amino acids and branched-chain amino acids were reported to be correlated with insulin resistance. [30] Furthermore, high levels of branched amino acids were confirmed as risk factors for incident diabetes in a Japanese cohort. [14] Branched amino acids were also chosen as predictors of future diabetes. [31] In our work, leucine/isoleucine was significantly positively related to future T2D risk and selected as a marker for predicting the occurrence of diabetes, which is consistent with the previously mentioned article. According to the mGWAS results, a SNP (rs72487966) in the WWOX gene was related to leucine/isoleucine ( Figure 2F). The WWOX gene is related to HOMA-β, [32] can regulate glucose metabolism [33] and was found to be a susceptibility gene for diabetes. [10] Moreover, according to the results of two-sample MR analysis, the relationship between leucine/isoleucine and T2D was causal, and the elevated concentration of leucine/isoleucine increased the risk of T2D, which has been mutually confirmed with other published cohort studies. [11] This indicates that these metabolic pathways that produce leucine/isoleucine may contribute to the development of T2D.
To the best of our knowledge, this study reports correlations between pipecolinic acid and CHRM3, and between leucine/ isoleucine and WWOX for the first time. The new association might provide new insights into the pathogenesis of T2D. However, the correlations were suggestive, and more validation studies are needed.
Epinephrine and 1,5-anhydro-D-glucitol were two of the metabolites that made up the CMB in our study. Epinephrine can increase endogenous glucose production. [34] The lower 1,5-anhydro-D-glucitol level has been clinically used as a marker of the increase in blood glucose levels. [35] According to our results, a significant increase in epinephrine and a significant decrease in 1,5-anhydro-D-glucitol preexisted before the occurrence of T2D (Figure 1). A prospective study in Shanghai, China, reported that 1,5-anhydro-D-glucitol was negatively related to the risk of future T2D. [15] However, to the best of our knowledge, this is the first report of a positive association between adrenaline and future T2D risk.
However, some limitations existed in our study. The participants were all Chinese individuals from Wuhan and were relatively old, so more research is needed to verify whether our research results are applicable to other age groups and other ethnic groups. In addition, the results of this study should be independently replicated in cohorts from different centers for verification.
In summary, we discovered metabolic changes that occurred prior to the onset of T2D and eight risk-related metabolites that can be used to predict future T2D case in advance. Four metabolites among them were reported for the first time to be associated with future diabetes risk. The FADS1, CHRM3 and WWOX genes were found to be associated with T2D-related metabolites. Furthermore, the relationship between increased levels of leucine/isoleucine and an increased risk of T2D was proven to be causal. Although further biofunctional validation studies are needed, our findings complement the current understanding of diabetes onset and can provide a basis for further mechanistic studies.

Experimental Section
Study Design and Participants: The flowchart of the study is shown in Figure 3. Participants were from the Dongfeng-Tongji cohort. [7] All participants were retired employees of Dongfeng Motor Corporation with a mean age of 63 years. From September 2008 to June 2010, baseline serum samples were collected from 27009 individuals. All participants completed a lifestyle questionnaire and underwent a physical examination. In 2013, 25978 participants underwent a follow-up examination, which accounted for 96.2% of all participants. Written informed consent was provided by all participants involved in this study, and approval of the research protocol (No. 2012(10)) was provided by the Ethics and Human Subjects Committee of Tongji Medical College, Huazhong University of Science & Technology.
After 4.61 ± 0.15 years of follow-up, a total of 1039 eligible participants developed incidents in the Dongfeng-Tongji cohort, among whom 500 new-onset T2D subjects, excluding participants with cardiovascular disease or cancer, were selected as the case group. T2D was diagnosed if any of the following three criteria were met: [36] 1) fasting blood glucose level higher than 7.0 mmol L −1 ; 2) hemoglobin A1c level higher than 6.5%; and 3) self-reported diagnosis of T2D. Healthy subjects (n = 500) matched 1:1 by age and sex at baseline were selected as the control group. A total of 292 pairs of cases and matched control samples were randomly selected as the discovery set. The remaining 208 pairs of case and matched control samples were used as the validation set.
Untargeted UPLC-MS-based metabolic profiling of baseline fasting serum samples from the 1000 participants was performed. A Waters ACQUITY UPLC system (Waters Corp., Milford, MA, USA) was used for the separation of metabolites. For positive ion mode analysis, an ACQUITY UPLC BEH C8 (2.1 mm × 50 mm, 1.7 µm particle size, Waters Corp.) column and a TripleTOF 5600 mass spectrometer (AB SCIEX, Framingham, MA, USA) were used. For the negative ion mode analysis, an ACQUITY UPLC BEH T3 (2.1 mm × 50 mm, 1.8 µm particle size, Waters Corp.) column and a Q Exactive HF mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) were used. The UPLC-MS method was specifically designed for large-scale metabolomics studies to assure the robustness and repeatability. [37] Differential metabolites between the groups with or without the onset of diabetes were identified in the discovery set, and the metabolites used as the CMB were selected from them. The optimal clinical parameters were selected as the CCB and combined with the CMB to create as a combination predictive model. The predictive ability of each model for T2D onset was assessed by the AUC. In addition, the predictive abilities of the three models were further evaluated in the validation set. Genotyping of SNPs of the participants was performed. Furthermore, a metabolome-genome-wide association study was performed to find associations between T2D-related metabolites and gene loci. Mendelian randomization (MR) analysis was conducted to scale possible causal relationships.
Statistical Analysis: Detailed information on the preparation of serum samples and the LC-MS methods for metabolomics analysis are shown in the Supporting Information. According to the in-house database, [8] metabolites were identified by retention time, m/z value and MS/MS information. The peak areas of metabolites were adjusted by internal standards. The identified metabolites whose RSD below 30% in QCs were used for subsequent data analysis. Nonparametric tests were conducted by Multi-Experiment Viewer software (version MEV 4.7.4) [38] to identify significantly changed metabolites and clinical parameters (p < 0.05, FDR < 0.1) between the case and control groups. Pathway analysis of significantly changed metabolites was conducted by using MetaboAnalyst 4.0 [39] (Xia Lab, McGill University, Montreal, QC, Canada; https://www.metaboanalyst.ca/). All of the following analyses were conducted by IBM SPSS Statistics for Windows (version 25.0, IBM Corp, Armonk, New York, USA). Odds ratios and 95% confidence intervals were calculated by the Cox regression model. Binary logistic regressions were carried out to define clinical biomarkers and metabolic biomarkers. Receiver operating characteristic (ROC) www.global-challenges.com curves were plotted to show the ability to differentiate between the controls and cases. Detailed information on the genotyping of SNPs of participants and genotype-metabolite association analysis is shown in the Supporting Information. As a result, ≈81 million SNP markers were obtained. Genotype-metabolite association tests were performed by an SE-weighted meta-analysis with METAL (https://genome.sph.umich. edu/wiki/METAL_Documentation). The mendelian randomization-based website (http://www.mrbase.org/) was used to conduct two-sample MR analysis.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.