Integrated biomarker profiling of the metabolome associated with impaired fasting glucose and type 2 diabetes mellitus in large‐scale Chinese patients

Dear Editor, Type 2 diabetes mellitus (T2DM) is an important cause of diabetes complications and mortality.1 Prediabetes, including impaired fasting glucose (IFG), affects approximately one-third of the population in China,2 but comprehensive early risk evaluation remains weak. It is therefore important to identify diagnostic biomarkers of prediabetes and T2DM and to improve disease risk prediction. We created a validated integrated biomarker profiling (IBP) related to the development of IFG and T2DM. Moreover, the service website established for the IBP suggests potential clinical application.
To construct the IBPs of IFG (fasting blood glucose [FBG], 6.1 ≤ FBG < 7.0 mmol/L) and T2DM (FBG ≥ 7.0 mmol/L), 1705 participants (BMI < 30) were recruited from five centers in China and randomly assigned to the discovery (n = 153), test (n = 420), and validation phases (n = 1132, including 146 hyperlipidemia patients as an interference group; training set of 792 [69.96%], test set of 340 [30.04%]) (Figures 1A and B); the groups were homogeneous (Table 1).


Nontargeted metabolomics analysis was performed in the discovery and test phases to identify potential biomarkers of IFG and T2DM; in the validation phase, the potential biomarkers were quantified by targeted metabolomics. In the discovery phase, after peak pretreatment using the 80% rule (Figures S1A-S1G in Supporting Information),3 the quality control samples clustered together (Figures S2A-S2B), which indicated that the analysis was stable and reliable. Furthermore, there were significant differences in metabolites among the normal glucose tolerance (NGT), IFG, and T2DM groups (Figures S2A-S2F) without overfitting (Figure S1H). After screening and identification, 31 and 42 biomarker candidates were identified in the fasted serum of IFG and T2DM patients, respectively (Tables S1-S2 in Supporting Information). In the test phase, P < .05 was regarded as significant within similar retention time ranges. Based on the results of logistic regression (LR) and receiver operating characteristic (ROC) analysis (Tables S3-S4), 41 metabolites were regarded as potential biomarkers of IFG or T2DM across the multiple centers (Table S5).
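The peak pretreatment step can be illustrated with a minimal sketch. It assumes one common reading of the 80% rule: a peak is kept if it is detected in at least 80% of the samples of at least one group. The function name, the data layout, and the toy intensities are hypothetical and are not taken from the study.

```python
from collections import defaultdict

def eighty_percent_rule(intensities, groups, threshold=0.8):
    """Return indices of peaks detected in >= threshold of samples in any one group.

    intensities: one list of peak intensities per sample (0 or None = not detected).
    groups: one group label per sample.
    """
    by_group = defaultdict(list)
    for row, label in zip(intensities, groups):
        by_group[label].append(row)

    kept = []
    for j in range(len(intensities[0])):
        for rows in by_group.values():
            detected = sum(1 for row in rows if row[j] not in (None, 0))
            if detected / len(rows) >= threshold:
                kept.append(j)
                break  # one qualifying group is enough
    return kept

# Toy data (hypothetical): 5 NGT and 5 IFG samples, 3 peaks.
samples = [
    [10.0, 0, 3.1], [12.0, 0, 0], [9.5, 0, 0], [11.2, 4.0, 0], [10.8, 0, 0],
    [8.0, 5.5, 0], [7.9, 6.1, 0], [8.3, 0, 0], [9.0, 5.9, 2.0], [8.8, 6.4, 0],
]
labels = ["NGT"] * 5 + ["IFG"] * 5
kept_peaks = eighty_percent_rule(samples, labels)
print(kept_peaks)
```

In this toy example the third peak is detected in only one of five samples in each group and is therefore discarded, while the second peak survives because it is present in four of five IFG samples.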
In the validation phase, significant differences were found in the serum concentrations of the potential biomarkers (Figure S3). Risk analysis showed that l-valine, l-leucine, and l-isoleucine appeared to be risk factors for IFG and T2DM, whereas lysophosphatidylcholine (LPC, P-16:0) appeared to be associated with a lower risk of IFG and T2DM, consistent with previous studies.4-6 In contrast, the concentration of l-phenylalanine was lower in the serum of IFG and T2DM patients than in that of individuals with NGT, contrary to reported findings (Table S6).4 However, ROC analysis showed that each potential biomarker on its own performed poorly for the diagnosis of NGT, IFG, T2DM, and hyperlipidemia (Figure S4). Therefore, it is necessary to integrate multiple biomarkers to comprehensively reflect the occurrence and development of diabetes.
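As an illustration of how a single biomarker can be scored by ROC analysis, the sketch below computes the area under the curve (AUC) as the probability that a randomly chosen case has a higher concentration than a randomly chosen control, with ties counted as one half. The concentrations are hypothetical and the function name is an assumption; the letter does not describe its ROC software.

```python
def auc_from_values(case_values, control_values):
    """AUC = P(case > control) + 0.5 * P(case == control), over all pairs."""
    wins = 0.0
    for c in case_values:
        for n in control_values:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))

# Hypothetical serum concentrations (arbitrary units), not study data.
ifg = [2.1, 2.4, 1.9, 2.8]
ngt = [1.8, 2.0, 2.2, 1.7]
auc = auc_from_values(ifg, ngt)
print(round(auc, 4))
```

An AUC near 0.5 indicates that the biomarker alone barely separates the two groups, which is the situation the paragraph above describes for the individual candidates.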
The differences in metabolites might mediate the occurrence and development of diabetes associated with insulin resistance and dysfunction of pancreatic islet β-cells,7 which manifested as disorders of glycerophospholipid metabolism and amino acid metabolic pathways (Figure S5). With the Krebs cycle as the hub,4,5,7 the metabolic pathways were interrelated (Figure 2). Thus, the risk biomarkers of diabetes should be regarded as a whole biological event within a biological network. The construction of an IBP that assesses diabetes risk from multiple biomarkers related to the occurrence and development of IFG and T2DM therefore has a potential basis in biological mechanism.8 Simply treating multiple nonquantitative biomarkers as a single biomarker ignores the overlap between individuals with and without disease, which compromises discriminatory ability.9

FIGURE 1 Study design. (A) The three-step analysis strategy. Nontargeted metabolomics in the discovery and test phases was performed to identify and validate potential biomarkers. In the validation phase, the potential biomarkers were screened using Gini impurity to construct the integrated biomarker profilings of IFG and T2DM based on the eXtreme Gradient Boosting model, and individuals with hyperlipidemia were set as an interference group to evaluate the prediction accuracy of the integrated biomarker profiling. (B) The overview of study design. In the discovery phase, 153 subjects were enrolled to screen biomarker candidates of IFG and T2DM; 420 subjects were recruited to test the biomarker candidates in the test phase; in the validation phase, an independent training set of 792 subjects was used to construct the IBP prediction model for NGT, IFG, T2DM, and hyperlipidemia. Then, the IBP prediction models were evaluated with a test set of 340 subjects. Abbreviations: IFG, impaired fasting glucose; LLOQ, lower limit of quantification; LPC, lysophosphatidylcholine; NGT, normal glucose tolerance; ROC, receiver operating characteristic curve; T2DM, type 2 diabetes mellitus; UHPLC-Q-Orbitrap-HRMS, ultra-high performance liquid chromatography Q Exactive-Orbitrap high-resolution mass spectrometer; UHPLC-TSQ-Altis QQQ MS, ultra-high performance liquid chromatography TSQ Altis triple quadrupole mass spectrometer
[…] Figure 3A).10 Further, Gini impurity was used to select, from the 16 targeted potential biomarkers, the 10 with the best ability to predict IFG and T2DM risk, and these were used to construct the IBP (XGB AUC = 0.823, Figures 3B and C): LPC (P-16:0), l-isoleucine, l-arginine, l-carnitine, l-phenylalanine, l-glutamic acid, l-lysine, l-methionine, l-leucine, and acetyl-l-carnitine (Figure 3D). The predictive performance of the IBP was satisfactory.
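The letter does not detail how Gini impurity was applied to rank the candidates, so the following is only a sketch of the underlying idea: a feature is scored by the largest decrease in Gini impurity that a single threshold split on it can achieve, and features are ranked by that score. The marker names and values are hypothetical, and this is not the authors' XGBoost pipeline.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_gini_decrease(values, labels):
    """Largest impurity decrease over all single-threshold splits of one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:  # splitting at the maximum leaves one side empty
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best

# Hypothetical concentrations for six subjects, not study data.
labels = ["NGT", "NGT", "NGT", "T2DM", "T2DM", "T2DM"]
features = {
    "marker_a": [1.0, 1.2, 1.1, 2.0, 2.3, 2.1],  # separates the groups cleanly
    "marker_b": [1.0, 2.0, 1.5, 1.1, 2.1, 1.6],  # only weakly informative
}
ranking = sorted(features, key=lambda k: best_gini_decrease(features[k], labels), reverse=True)
print(ranking)
```

Keeping the top-ranked features under such a score is one way to reduce 16 candidates to a smaller panel; tree ensembles accumulate comparable split-quality scores across many trees.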
In the discovery phase, the prediction accuracy of the IBP for discriminating NGT, IFG, and T2DM was 96%, with high sensitivity and specificity. The AUC values of the IBP for discriminating IFG from NGT, T2DM from NGT, T2DM from IFG, and T2DM from hyperlipidemia were 0.804, 0.936, 0.823, and 0.937, respectively (Figure S6). Moreover, the AUC value of the IBP for discriminating NGT, IFG, and T2DM was 0.828 in the test phase (Table S7). The IBP showed that the concentrations of the 10 potential biomarkers differed among NGT, IFG, and T2DM (Figure 4). Based on the XGBoost model, an unknown sample is assigned to whichever of the NGT, IFG, T2DM, and hyperlipidemia groups has the highest predictive value. For example, the predictive values of unknown sample 1 for the NGT, IFG, T2DM, and hyperlipidemia groups were 0.092, 0.676, 0.139, and 0.093, respectively (Figure 4D), which implied that it might be a patient with IFG (Figure 4C). The predictions for representative samples in the other groups are shown in Figure 4. Moreover, we established, for the first time, a website for the IBPs of IFG and T2DM (http://pdm.lin-group.cn/), which may further improve the potential clinical and public service value of this study.
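The assignment rule described above amounts to taking the group with the largest predictive value. A minimal sketch using the four values reported for unknown sample 1 follows; the function name and the dictionary layout are assumptions made for illustration.

```python
def assign_group(predictive_values):
    """Return the group label with the highest predictive value."""
    return max(predictive_values, key=predictive_values.get)

# Predictive values reported for unknown sample 1 (Figure 4D).
sample_1 = {"NGT": 0.092, "IFG": 0.676, "T2DM": 0.139, "hyperlipidemia": 0.093}
predicted = assign_group(sample_1)
print(predicted)  # IFG
```

Because the rule returns only the top group, a sample whose two highest values are close would be better reported together with the values themselves.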
In conclusion, based on large-sample, real-world clinical data and using metabolomics and machine learning methods, we have established the IBPs of IFG and T2DM, related to the occurrence and development of prediabetes and T2DM, together with a public service website. The IBPs could avoid the confusion in interpretation that arises when multiple biomarkers are used separately, reduce the impact of fluctuations in single or isolated biomarkers on overall evaluation efficiency, and improve the auxiliary evaluation ability of potential biomarkers for clinical diseases.

ACKNOWLEDGMENTS
The authors are thankful to all the participants for their overwhelming support and dedication to this project. This work was supported by the National Natural Science

ETHICS APPROVAL AND CONSENT TO PARTICIPATE
The study was approved by the Scientific Research Ethics Committee of Capital Medical University Affiliated Beijing Shijitan Hospital (2017-035) and registered on the Chinese Clinical Trial Registry (ChiCTR1800014301). All the participants provided their written informed consent.

CONSENT FOR PUBLICATION
Consent for publication was obtained from all patients.

CONFLICT OF INTEREST STATEMENT
The authors declare that they have no competing interests.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.

AUTHOR CONTRIBUTIONS
DY conceived and proposed the research topic. JL and ZY designed the study protocol, which was edited by DY. JL and ZY collected samples. LK, HC, SD, QQ, NZ, XF, SY, JT, SC, YH, TJ, ZW, NQ, and KD contributed to sample collection. JL, QJ, LL, and ZS performed sample detection. HY and HL created the machine learning-based classification algorithm. JL, HY, ZS, and DY participated in statistical analysis. JL, ZY, HY, HL, and DY contributed to writing, revising, and proofreading the manuscript. All authors read and approved the final manuscript.