A comparative UPLC‐Q‐Orbitrap‐MS untargeted metabolomics investigation of different parts of Clausena lansium (Lour.) Skeels

Abstract In this study, non-targeted large-scale plant metabolomics (UPLC-Q-Orbitrap-MS) was used to compare the chemical profiles of the leaves, barks, flowers, peels, pulps, and seeds of Clausena lansium (Lour.) Skeels (called “wampee”). A total of 364 metabolites were identified, and 62 potential biomarkers were selected by multivariate statistical analysis. Hierarchical cluster analysis suggested that the selected biomarkers were significantly differential metabolites among the various parts of wampee. Metabolic pathway analysis showed significant enrichment of the “Flavone and flavonol synthesis” and “Isoquinoline alkaloid biosynthesis” pathways. This study provides important information for the isolation and identification of functional components from different tissues of wampee and for the detailed elucidation of their biosynthetic pathways.

investigated recently. For instance, the polyphenol extracts from the leaves of wampee were shown to have antidiabetic and lipid-lowering effects in streptozotocin-induced type 2 diabetic rats (Kong, Su, Guo, Zeng, & Bi, 2018). Chang, Ye, et al. (2018) also investigated the polyphenolic profile and antioxidant activity of wampee leaves during development.
Condensed tannins from the bark of wampee were reported to have anti-α-glucosidase and antityrosinase activities (Chai et al., 2019).
In recent years, a number of carbazole alkaloids and furanocoumarins with hepatoprotective and antiproliferative activities have been isolated from the stem bark and fruits of C. lansium (Adebajo et al., 2009; Du et al., 2015; Fu et al., 2018; Liu et al., 2019).
Notably, these alkaloids and coumarins exhibited impressive anticancer and neuroprotective effects (Huang et al., 2016; Huang, Feng, Wang, & Lin, 2017; Iqbal et al., 2017; Ittiyavirah & Hameed, 2014; Liu et al., 2019). The seeds of wampee have also been used in folk medicine for the treatment of acute and chronic gastrointestinal disorders (Shen et al., 2018). Amide alkaloids isolated from the seeds of C. lansium showed potent nematicidal activity against Panagrellus redivivus (Fan et al., 2018) and an antifungal effect against Sclerotinia sclerotiorum. A comparative assessment of phytochemical profiles and antioxidant activities in the fruits of five varieties of wampee demonstrated differences in total phenolics, flavonoids, and antioxidant activities among cultivars (Chang, Ye, et al., 2018). Therefore, different parts of C. lansium contain diverse bioactive ingredients and have various bio-functional values. However, to the best of our knowledge, systematic analysis of the metabolites in different parts of C. lansium is limited: only the volatile components in the leaf, pericarp, and seed of C. lansium have been characterized, by GC-MS (He et al., 2018). The lack of a comprehensive analysis of the phytochemical constituents of wampee might hinder its use as a food supplement or potential pharmaceutical in modern medicine.
Nowadays, with the development of liquid chromatography coupled to mass spectrometry, metabolomics has become an important technology for the phytochemical profiling of biological metabolites. To date, over 100,000 plant metabolites have been detected, which might be less than 10% of the total (Trethewey, 2004).
Even though there is no single analytical method which can extract and identify all metabolites at one time, untargeted metabolomics aims to gather as many metabolites as possible (De Vos et al., 2007).
The objective of the present study was to comprehensively analyze the metabolome of the leaves (CLL), barks (CLBa), flowers (CLF), peels (CLPe), pulps (CLPu), and seeds (CLS) of C. lansium using an untargeted metabolomics approach based on liquid chromatography tandem mass spectrometry (UPLC-Q-Orbitrap-MS). To compare the characteristic metabolites and identify potential biomarkers, multivariate statistical analysis was performed using bioinformatics tools. A preliminary metabolic pathway analysis was also conducted to identify the metabolic biosynthesis pathways that putatively account for the differences in chemical composition among these six parts of wampee.

| Plant materials
The fruits, leaves, stems, and flowers of C. lansium were collected from the Wampee Resources Nursery of the Institute of Fruit Tree Research, Guangdong Academy of Agricultural Sciences, in Guangzhou, China. The pulps, peels, and seeds of the fruits were carefully separated with a knife and frozen in liquid nitrogen immediately. The barks were obtained from tender stems of C. lansium.
All the plant materials were first frozen in liquid nitrogen and then stored at −80°C until extraction.

| Chemicals
HPLC-grade acetonitrile and methanol were purchased from Thermo Fisher Scientific Inc., and 2-chlorobenzalanine was obtained from Aladdin Reagent Co., Ltd. All other chemicals were analytical grade and used as received.

| Extraction
The extraction was conducted according to a previously reported method with minor modifications. Briefly, 200 mg of each sample was transferred into a 5 ml tube with five steel balls, placed in liquid nitrogen for 5 min, and then ground with a high-throughput tissue grinder at 70 Hz for 1 min. Subsequently, 600 μl methanol (pre-cooled at −20°C) was added and the mixture was vortexed for 30 s. The extraction was then carried out in an ultrasonicator for 30 min at room temperature. Afterward, 750 μl chloroform (pre-cooled at −20°C) and 800 μl deionized water (4°C) were added to the tubes, which were shaken for 60 s. After centrifugation at 13,523 g and 4°C for 10 min, the supernatant was collected and lyophilized. The freeze-dried samples were dissolved in 250 μl of a methanol-water solution (1:1, 4°C) containing 4 mg/ml 2-chlorobenzalanine. The dissolved samples were filtered before detection by LC-MS. For the quality control (QC) samples, 20 µl of each prepared sample extract was pooled.

| LC-MS conditions
The chromatographic separation was performed on a Thermo Ultimate 3000 system equipped with an ACQUITY UPLC HSS T3 column (150 × 2.1 mm, 1.8 µm, Waters), which was maintained at 40°C.
The temperature of the autosampler was set at 8°C. Gradient elu-

| Mass data processing and multivariate data analyses
Raw LC-MS data were converted into mzXML format files. For multivariate statistical analysis, the XCMS output was further processed using Microsoft Excel (Microsoft), and the normalized data were imported into the Simca-P software version 11.0 (Umetrics AB, www.umetrics.com/simca). All data were mean-centered and unit variance (UV)-scaled before PCA and PLS-DA were applied. A default 7-fold (leave-1/7th-of-samples-out) cross-validation procedure and 100 random permutation tests were carried out to guard against overfitting of the supervised PLS-DA models. Discriminating metabolites were obtained using a threshold on the variable influence on projection (VIP > 1.0). VIP values were obtained from the PLS-DA model and were further validated via Student's t test (p < .05). The metabolites with VIP values above 1.0 and p values below .05 were selected as discriminating metabolites between two classes of samples. Multivariate data analyses, including principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and orthogonal PLS-DA (OPLS-DA), were conducted using the ropls R (version 3.3.2) package with methods described previously (Thévenot, Roux, Xu, Ezan, & Junot, 2015). The ropls package is available from the Bioconductor repository (Gentleman et al., 2004).
Discriminating metabolites between two classes of samples were identified using a threshold on the Variable Importance in Projection (VIP) value (VIP ≥ 1), and further validated by one-way analysis of variance (ANOVA; p ≤ .05).
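
As a minimal sketch of two steps described in this section, unit variance scaling and the joint VIP and p value filter, the following Python fragment may be helpful. The function names, the example intensities, and the VIP and p values are hypothetical and are not data from this study; in the study the VIP values come from the fitted PLS-DA model, which is not reproduced here. The inclusive thresholds assume the VIP ≥ 1 and p ≤ .05 wording.

```python
from statistics import mean, stdev

def uv_scale(column):
    """Mean-center one feature and divide by its sample standard deviation."""
    m = mean(column)
    s = stdev(column)
    if s == 0:
        # A constant feature carries no variance; return zeros.
        return [0.0 for _ in column]
    return [(x - m) / s for x in column]

def select_discriminating(features, vip_min=1.0, p_max=0.05):
    """Keep features whose VIP and p value both pass the thresholds."""
    return [name for name, vip, p in features if vip >= vip_min and p <= p_max]

# Hypothetical intensities of one feature across six samples.
scaled = uv_scale([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])

# Hypothetical (name, VIP, p value) triples; not values from this study.
candidates = [
    ("feature_A", 1.8, 0.003),
    ("feature_B", 0.6, 0.010),
    ("feature_C", 1.2, 0.200),
    ("feature_D", 1.0, 0.050),
]
selected = select_discriminating(candidates)
```

After scaling, every feature has zero mean and unit standard deviation, so abundant and trace metabolites contribute on the same footing to PCA and PLS-DA.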

| Heat map and KEGG annotation
The heat map was constructed using Euclidean distances and complete linkage grouping with the pheatmap package in R (www.r-project.org), and the relative quantitative values of metabolites were normalized, transformed, and clustered through agglomerative hierarchical clustering. Metabolite correlation was assessed using the Pearson correlation coefficient and visualized with Cytoscape software (www.cytoscape.org). To further identify alternative metabolic pathways, differential metabolites were subjected to grouping and metabolic pathway enrichment using MetaboAnalyst 4.0 software (www.metaboanalyst.ca) and the KEGG database (www.kegg.jp). The identified differential metabolites were mapped to biochemical pathways according to their labeling in KEGG (http://www.kegg.jp/pathway).
Metabolic pathway enrichment and topological analysis were performed using the MetPA database (www.metaboanalyst.ca) to analyze the metabolic pathways related to the differential metabolites.
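
The clustering rule behind the heat map (Euclidean distance with complete linkage) can be illustrated with a small sketch. This is a naive pure-Python version written for clarity, not the pheatmap implementation; the function name and the two-dimensional example profiles are hypothetical.

```python
from math import dist

def complete_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    farthest members are closest (complete linkage, Euclidean distance)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest
                # pairwise distance between their members.
                d = max(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical normalized profiles (two metabolites) for five samples.
profiles = [
    [0.0, 0.1],   # sample 0
    [0.2, 0.0],   # sample 1
    [5.0, 5.1],   # sample 2
    [5.2, 4.9],   # sample 3
    [9.0, 0.0],   # sample 4
]
groups = complete_linkage(profiles, 3)
```

Complete linkage tends to give compact clusters, because a merge is only as cheap as the most distant pair it brings together.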

| Metabolites identification
To ensure the validity of the results obtained from untargeted large-scale metabolomics, quality control (QC) and quality assurance (QA) were performed first. The QC results are illustrated in Figure S1 A,B, which indicate that the extraction and detection of samples were stable (Dunn et al., 2011). About 70% of the characteristic peaks had a relative standard deviation (RSD) below 30% (Figure S1 C,D), indicating that the data are reliable (Want et al., 2010). After the m/z (mass-to-charge ratio), rt (retention time), and intensity information was obtained, 15,635 and 18,064 precursor molecules were detected in positive mode and negative mode, respectively. Batch normalization was applied to all the data. In the present study, a total of 364 metabolites were identified from all the samples of wampee, and detailed information including their retention time, exact mass, molecular formula, precursor type, match percentage, CAS number, and KEGG code is provided in Table S1. As shown in Table S1, there are 100 organic acids, 47 amino acids and derivatives, 45 flavonoids, 28 nucleotides and derivatives, 24 lipids, 19 alcohols, 17 amines, 15 carbohydrates, 13 vitamins and derivatives, 11 alkaloids, 10 peptides, 5 coumarins, 4 aldehydes, 3 ketones, 2 indole derivatives, 1 terpene, and 20 other metabolites identified from different parts of C. lansium. Organic acids are the main constituents in C. lansium, followed by amino acids and flavonoids. Flavonoids have been shown to be powerful radical quenchers in various systems. Most of the flavonoids, such as procyanidin B2, malvidin 3-glucoside, isoquercitrin, astragalin, quercetin 3-arabinoside, taxifolin, and sakuranetin, were identified in C. lansium for the first time. Procyanidin B2 and malvidin 3-glucoside, which are mainly derived from grape seed and red wine, have been shown to exert protective effects against cardiovascular diseases (Bub, Watzl, Heeb, Rechkemmer, & Briviba, 2001; Li & Zhu, 2019).
Isoquercitrin [...] (Liu, Laaksonen, Yang, Zhang, & Yang, 2020). Sinomenine, identified in this study, is a bioactive alkaloid that has been used to treat rheumatoid diseases (Zhang, Zhang, Zheng, & Tian, 2019). Vindoline, an indole alkaloid with anticancer activity that is derived from Catharanthus roseus, was also detected (Taher et al., 2019). Therefore, the global identification of metabolites in different parts of wampee may provide new insights into the bioactivities of C. lansium.
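
The QC stability criterion used at the start of this section (the share of peaks whose RSD across QC injections is below 30%) can be sketched as follows. The function names and the QC intensities are hypothetical and are not data from this study.

```python
from statistics import mean, stdev

def rsd_percent(values):
    """Relative standard deviation (sample SD / mean) as a percentage."""
    return 100.0 * stdev(values) / mean(values)

def fraction_stable(qc_table, max_rsd=30.0):
    """Share of features whose RSD across QC injections is below max_rsd."""
    stable = [f for f, vals in qc_table.items() if rsd_percent(vals) < max_rsd]
    return len(stable) / len(qc_table)

# Hypothetical peak intensities for four features in five QC injections.
qc = {
    "f1": [100, 102, 98, 101, 99],
    "f2": [50, 80, 20, 65, 35],
    "f3": [200, 210, 190, 205, 195],
    "f4": [10, 11, 9, 10, 10],
}
share = fraction_stable(qc)
```

In this toy table only f2 varies by more than 30% across the QC injections, so three of the four features count as stable.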

| Biomarker probe
As illustrated in the base peak ion (BPI) chromatograms of different parts of wampee (Figure 1a,b), the differences between samples were evident as observed in both negative and positive ion modes.
In the current study, multivariate statistical analysis was carried out. However, unsupervised analysis such as PCA cannot ignore intra-group errors or eliminate random errors that are irrelevant to the purpose of the study; it pays too much attention to details at the expense of the overall patterns in the data and, ultimately, may fail to reveal differences and differential compounds between groups. In this case, it is necessary to apply supervised analysis, such as partial least squares discriminant analysis (PLS-DA) and orthogonal projections to latent structures discriminant analysis (OPLS-DA), to further probe the biomarkers among different parts of wampee. In Figure 2a, the PLS-DA plot indicates significant differences among the groups and good repeatability of the data within individual groups. Nevertheless, overfitting often occurs during PLS-DA modeling. A permutation test was therefore employed to evaluate the statistical significance of the model. As shown in Figure 2b, R2 and Q2 are similar, indicating that each of the subjects contributes equally and uniformly to the observed group separation (Wheelock & Wheelock, 2013). Hence, the PLS-DA model used in this study is stable and reproducible.
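
The logic of a permutation test can be sketched in a few lines. Note the assumptions: the sketch below shuffles class labels and recomputes a simple class-separation statistic (the absolute difference between group means of a single score), whereas the permutation test described above refits the PLS-DA model and compares R2 and Q2. The function names, scores, and seed are hypothetical.

```python
import random

def separation(scores, labels):
    """Absolute difference between the mean scores of the two classes."""
    a = [s for s, l in zip(scores, labels) if l == 0]
    b = [s for s, l in zip(scores, labels) if l == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))

def permutation_p_value(scores, labels, n_perm=100, seed=0):
    """Fraction of label shuffles whose statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = separation(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if separation(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical scores for two well-separated classes of four samples each.
scores = [0.1, 0.3, 0.2, 0.4, 2.1, 2.3, 2.2, 2.4]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
p = permutation_p_value(scores, labels)
```

If the observed separation is rarely matched by randomly relabeled data, the grouping is unlikely to be an artifact of overfitting.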
OPLS-DA and S-plots were applied to the different compared groups to identify biomarkers among different parts of wampee. The OPLS-DA score scatter plot from the comparison of all the samples is presented in Figure 2c, and the detailed OPLS-DA parameters for the different compared groups are shown in Table 1. As can be seen from Figure 2c and Table 1, the differences among CLL, CLF, CLBa, CLPu, CLS, and CLPe were significant, and each model has high R2 and Q2 values, suggesting that the models are reliable. Therefore, the results of multivariate statistical analysis demonstrate significant differences in metabolites among different parts of wampee. Significantly changing potential biomarkers were filtered by ANOVA p value ≤ .05 and VIP ≥ 1. Information on the differential metabolites in the various compared groups is given in Table S2. In addition, S-plots were constructed for all the compared models (Figure S2) to further probe the significant marker compounds. In Figure S2, the variables with significant differences are plotted at the top right and the bottom left, and those with no significant differences appear in the middle of the plot. With the help of the multivariate statistical analysis, a total of 62 potential biomarkers of different parts of wampee were selected and are summarized in Table 2.
The selected differential metabolites include 16 organic acids, 10 flavonoids, 9 amino acids, 6 lipids, 4 alkaloids, 4 nucleotides, 2 carbohydrates, 2 vitamins, 2 peptides, 2 alcohols, 1 terpene, 1 coumarin, and 3 others.
FIGURE 2 Partial least squares discriminant analysis (PLS-DA) of different parts of wampee in negative mode (a). Permutation plot of the PLS-DA model for CLL versus CLF versus CLBa versus CLPu versus CLS versus CLPe (b). Orthogonal projections to latent structures discriminant analysis (OPLS-DA) of different parts of wampee in negative mode (c). (CLF, CLBa, CLL, CLPe, CLPu, and CLS represent the flowers, barks, leaves, peels, pulps, and seeds of wampee)

| Hierarchical cluster analysis
In order to visualize the relative contents of the differential me-

| KEGG annotation and metabolic pathway analysis
The metabolites in plants are usually synthesized through complex metabolic reactions with the help of different genes and proteins, which form complex pathways and networks. The secondary metabolites of plants are determined by both the genus and the [...] which contains information about chemicals, enzyme molecules, and enzyme reactions (Kanehisa & Goto, 2000). The KEGG database was employed to map and define the metabolic pathways of the differential metabolites in the various comparison groups using A. thaliana as the library.
Table 1 footnote (partial): R2Y is the model interpretability (for the Y variable dataset); Q2 is the percentage of model predictability; DM means differential metabolites selected by VIP > 1. Detailed information on the DM is provided in Table S2.
In this study, pathway analysis was carried out for all 11 compared groups, and the detailed results are provided in Table S3. A bubble plot with the most significant pathway marked is presented in Figure 4. The impact factor in Figure 4 was defined as the number of metabolites mapped to a certain pathway relative to the total number of metabolites mapped to this pathway. [...] and glutamate metabolism. Consequently, "Flavone and flavonol synthesis," "Isoquinoline alkaloid biosynthesis," "Nicotinate and nicotinamide metabolism," "Phenylalanine metabolism," and "Alanine, aspartate and glutamate metabolism" were identified as important metabolic pathways underlying the tissue differences in metabolites in C. lansium. Furthermore, the KEGG annotation and metabolic pathway analysis (Figure 4, Table S3) are consistent with the finding that the differential metabolites among different parts of wampee include 10 flavonoids, 9 amino acids, 4 alkaloids, and 4 nucleotides.
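
Pathway enrichment of this kind is commonly scored with a hypergeometric (over-representation) test; the sketch below assumes that test and is not taken from the MetaboAnalyst source. It also computes the negative natural logarithm of p, the quantity plotted on the ordinate of the bubble plot. The function name and all counts are hypothetical.

```python
from math import comb, log

def enrichment_p(total, in_pathway, selected, hits):
    """P(X >= hits) when `selected` metabolites are drawn without replacement
    from `total`, of which `in_pathway` belong to the pathway."""
    upper = min(in_pathway, selected)
    tail = sum(comb(in_pathway, k) * comb(total - in_pathway, selected - k)
               for k in range(hits, upper + 1))
    return tail / comb(total, selected)

# Hypothetical counts: 100 annotated metabolites, 10 in the pathway,
# 10 differential metabolites, 3 of which fall in the pathway.
p = enrichment_p(total=100, in_pathway=10, selected=10, hits=3)
neg_log_p = -log(p)  # natural logarithm, as in the bubble plot ordinate
```

A smaller p (larger −log p) means the pathway contains more differential metabolites than expected by chance.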

| CONCLUSIONS
A comparative non-targeted large-scale plant metabolomics approach was carried out to evaluate the biomarkers of different tissues of C. lansium. To the best of our knowledge, this is the first report on the metabolomics of wampee, and a total of 364 metabolites were identified. The results of PLS-DA, OPLS-DA, S-plot, and HCA suggested that the 62 chemicals selected as potential biomarkers can be used to differentiate the leaves, barks, flowers, pulps, peels, and seeds of wampee. The metabolic pathways "Flavone and flavonol synthesis," "Isoquinoline alkaloid biosynthesis," "Nicotinate and nicotinamide metabolism," "Phenylalanine metabolism," and "Alanine, aspartate and glutamate metabolism" are important for the synthesis of differential metabolites among the different tissues of C. lansium. Therefore, on the one hand, this investigation could provide a good basis for the isolation and identification of new constituents from C. lansium; on the other hand, further detailed elucidation of the biosynthesis of certain differential metabolites among the various tissues of wampee could be conducted accordingly in the future.
FIGURE 3 Heatmap of hierarchical clustering analysis of differential metabolites selected as the biomarkers in different parts of wampee. The abscissa indicates the different groups, labeled with different colors for the main groups and numerically marked for the subgroups. The ordinate indicates the differential metabolites selected as the potential biomarkers in different parts of wampee. The bar at the right of the heat map represents relative expression values

ACKNOWLEDGEMENT
This work was financially supported by National Modern Agricultural Center, and the Special fund for scientific innovation strategy-construction of high level Academy of Agriculture Science (R2018QD-021).
FIGURE 4 (caption, partial) Each bubble in the plot represents a metabolic pathway whose abscissa and bubble size jointly indicate the magnitude of the impact factors of the pathway in the topological analysis. The bubble ordinates and colors represent the p values (negative natural logarithm, i.e., −log p-value) of the enrichment analysis, with darker colors showing a higher degree of enrichment. The most significant pathway was labeled. (CLF, CLBa, CLL, CLPe, CLPu, and CLS represent the flowers, barks, leaves, peels, pulps, and seeds of wampee)