Discrimination for geographical origin of Panax quinquefolius L. using UPLC Q‐Orbitrap MS‐based metabolomics approach

Abstract American ginseng, Panax quinquefolius L., is an important medicinal plant with multiple pharmacological effects and high nutritional value. American ginseng from different geographical origins varies in quality and price. However, no approach for discriminating American ginseng from different geographical origins has been available to date. In this study, a metabolomic method based on the UPLC–Orbitrap fusion platform was established to comprehensively determine and analyze metabolites of American ginseng from America and Canada, and from Heilongjiang, Jilin, Liaoning, and Shandong provinces in China. A total of 382 metabolites were detected, including 230 saponins, 30 amino acids and derivatives, 27 organic acids and derivatives, 25 lipids, 17 carbohydrates and derivatives, 10 phenols, 8 nucleotides and derivatives, and 35 other metabolites. Metabolite differences between North American and Asian producing areas were more pronounced than those within Asia. Twenty metabolites that contributed most to the differentiation of producing areas were identified as potential markers, with prediction accuracy higher than 91%. The results provide new insights into the metabolite composition of American ginseng from different origins, which will help discriminate origins and promote quality control of American ginseng.


| INTRODUCTION
American ginseng (Panax quinquefolius L.), native to the eastern temperate forest areas of North America, was first discovered in Quebec, Canada. It has been cultivated in China since the 1980s (McGraw et al., 2013). After more than 40 years of development, China has become the third largest country for American ginseng cultivation, and the current major producing areas are Heilongjiang, Jilin, Liaoning, and Shandong Provinces (Huang et al., 2013).
As a well-known medicinal plant, American ginseng is consumed as a dietary supplement and functional food and has a large market demand. It is known for a wide range of pharmacological effects, such as anticancer, antioxidative, antiaging, and antifatigue effects and the enhancement of memory and immunity (Cheong et al., 2014; Hwang et al., 2014; Kim et al., 2018; Kwok, 2011; Qi et al., 2010, 2011; Riaz et al., 2019; Tan et al., 2013). Various chemical constituents, including ginsenosides, lipids, polysaccharides, organic acids, amino acids, phenolic acids, and vitamins, have been identified in American ginseng and are responsible for its multiple effects (Guo et al., 2015; Hou, 1977; Lin et al., 2019; Wang et al., 2015).

Medicinal plants of different origins vary significantly in their bioactive ingredients, which directly affects their quality (Liu et al., 2017). For example, Carthamus tinctorius L. from Hunan province had higher contents of hydroxysafflor yellow A and succinate and showed stronger antioxidant, anticoagulant, and cardiovascular protection effects than that from other provinces (Lu et al., 2019). Higher amounts of ophiopogonones were observed in the extract of Ophiopogon japonicus from Zhejiang, which showed stronger antioxidant and anti-inflammatory capacity than that from Sichuan (Zhao et al., 2017).
Meanwhile, differences in the quality and ingredients of P. quinquefolius L. from different origins have been reported (Chen et al., 2022). However, potential chemical markers to discriminate American ginseng from different geographical origins have not been reported to date. Moreover, American ginseng roots vary in price depending on their geographical origin. Therefore, it is necessary to discriminate American ginseng roots from different origins worldwide.
Metabolomics is an omics technology focused on the comprehensive profiling of small-molecule metabolites present in a biological system through identification and quantification (Dudzik et al., 2018). As the final recipients of biological information, these metabolites can be considered the chemical phenotype of plants under a specific environment (Gemperline et al., 2016). Therefore, metabolomics has been widely used to assess the quality and geographical origin of plant products (Gika et al., 2014).
To find potential origin-dependent markers of American ginseng from five different producing areas, including UC (America and Canada), HLJ (Heilongjiang province), JL (Jilin province), LN (Liaoning province), and SD (Shandong province), we established a metabolomics approach based on UPLC Q-Orbitrap MS to comprehensively analyze the chemical composition of American ginseng. Furthermore, multivariate statistical tools were used to investigate the metabolite differences associated with the geographical origin of American ginseng. The results will provide a foundation and reference for the evaluation and discrimination of American ginseng.

| Plant materials
The samples for the discrimination of geographical origin were taken from five producing areas that are the main cultivating regions of American ginseng (America and Canada, and Heilongjiang, Jilin, Liaoning, and Shandong in China). Three to six sampling points were selected in each producing area. The sampling points were scattered across each producing area so that they represent the main regions of American ginseng production. The sampling principle and design are described in detail in Supplementary Information Section 1 (Figure S1). Detailed information on each sample is given in Table S1.

| Sample preparation
The samples were washed and dried at 38°C until a constant weight was achieved; the main roots were then separated, pulverized into powder, and passed through a 60-mesh sieve. An accurately weighed 0.1 g portion of the fine powder was extracted with 10 mL of 70% (v/v) methanol in an ultrasonic water bath for 60 min.
The supernatant of the extract was filtered through a syringe filter (0.22 μm) before liquid chromatography-mass spectrometry analysis. In addition, a quality control (QC) sample was prepared by pooling 100-μL aliquots of all test solutions.

| UPLC Q-Orbitrap MS analysis
Chromatographic separation was performed with an Ultimate 3000 UHPLC system (Thermo Fisher Scientific) equipped with an ACQUITY Premier HSS T3 column (1.8 μm, 100 × 2.1 mm). The column oven was maintained at 35°C. The mobile phases consisted of 0.1% formic acid in water (A) and 0.1% formic acid in acetonitrile (B). The mobile phase system was run in the following gradient program:

High-resolution MS data were recorded on an Orbitrap Fusion mass spectrometer (Thermo Fisher Scientific). Spray voltages were set at 3.7 kV and 2.7 kV for positive and negative modes, respectively. Other source parameters were set as follows: ion transfer tube temperature, 320°C; vaporizer temperature, 320°C; sheath gas, 40 arbitrary units; auxiliary gas, 5 arbitrary units. The Orbitrap analyzer scanned over a mass range of m/z 85-1500 at a resolution of 60,000 for the MS1 scan and a resolution of 15,000 for the HCD-MS2 scan. The MS/MS product ions under mixed normalized collision energies (NCE) of 30%, 40%, and 55% were recorded to acquire more fragmentation information.

| Data processing and metabolite identification
The acquired mass data were imported to Compound Discoverer 3.1 (Thermo Fisher Scientific) for peak detection and alignment.
A data matrix containing the sample name, peak number (tR-m/z pair), and normalized peak area was finally obtained.
Then, the preprocessed data were imported into SIMCA-14.1 (Umetrics) for PCA and OPLS-DA analysis. Volcano plots were constructed on a cloud platform (https://cloud.metware.cn). Venn diagrams and heatmaps were generated with TBtools software (Chen et al., 2020). Statistical analyses were performed by one-way ANOVA and the LSD method using IBM SPSS Statistics 26 software (p < .01).
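As an illustration of the one-way ANOVA step, the following minimal Python sketch computes the F statistic for a single metabolite across producing areas. It is not the SPSS procedure used in the study: the function name `one_way_anova_f` and the peak-area values are hypothetical, and the p-value and the LSD post hoc comparison are omitted because they require the F and t distributions, which the Python standard library does not provide.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of samples around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical normalized peak areas of one metabolite in three areas
areas = {"UC": [1.0, 2.0, 3.0], "JL": [2.0, 3.0, 4.0], "SD": [5.0, 6.0, 7.0]}
f_stat, df_between, df_within = one_way_anova_f(list(areas.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F value relative to the critical value at the chosen significance level (p < .01 in this study) would indicate that the metabolite differs among areas.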

| Metabolite profiling of American ginseng collected from different origins
A total of 3572 and 4009 spectra were detected from American ginseng in the negative and positive ion modes, respectively.
Altogether, 382 metabolites were identified in all samples (Table S2).

OPLS-DA can effectively eliminate irrelevant influences and screen for differential metabolites among comparison groups (Worley & Powers, 2016).
Recent works showed that it was an effective tool to determine the geographical origins of medicinal plants (Mais et al., 2018; Wang et al., 2020). As shown in the OPLS-DA score plot (Figure 2c), samples from the five producing areas, including UC, HLJ, JL, LN, and SD, were easily classified. Data points from areas separated by a greater geographical distance were generally farther from each other in the OPLS-DA score plot, suggesting that geographical distance was positively related to the degree of variation among the five groups. Panax ginseng, a species closely related to P. quinquefolius L., from Gangwon, Gaeseong, Punggi, Chungbuk, Jeonbuk, and Anseong in Korea could also be differentiated by metabolites based on LC-MS data, further demonstrating the impact of geographical location on the metabolite constituents of Panax species. These geographical effects may be due to the varied climate and soil properties among different producing areas.

| Differential metabolites of American ginseng cultivated from five producing areas
The VIP value reflects the importance of each metabolite in the OPLS-DA model (Bylesjö et al., 2006). First, the top three VIP metabolites were used to examine metabolite differences among the five producing areas. Although the contents of these three metabolites differed to some extent, they did not clearly distinguish the five regions, making it difficult to differentiate American ginseng from these regions (Figure S8). Thus, OPLS-DA was further applied to each pairwise comparison. The R2Y and Q2Y values of the OPLS-DA models were greater than 0.95 in all pairwise comparisons, confirming that the models had good explanatory and predictive ability. In the permutation test (200 permutations), the original R2Y and Q2Y values were always substantially higher than the corresponding permutation values, indicating that the models were not overfitted (Figure 3). The results showed that samples from different producing areas were clearly separated in each pairwise comparison, so these models could be used to further identify differential metabolites.
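The logic of the permutation test described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the SIMCA implementation: OPLS-DA and its R2Y/Q2Y statistics are replaced by a stand-in score (leave-one-out accuracy of a nearest-centroid classifier), and the function names and toy data are hypothetical. The idea is the same: class labels are shuffled 200 times, the model is re-scored each time, and the original score is compared with the permuted ones.

```python
import random


def loo_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in score)."""
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        # Build class centroids from all samples except sample i
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(row))
            for d, value in enumerate(row):
                acc[d] += value
            counts[label] = counts.get(label, 0) + 1
        best_label, best_dist = None, float("inf")
        for label, acc in sums.items():
            dist = sum((X[i][d] - acc[d] / counts[label]) ** 2 for d in range(len(acc)))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)


def permutation_test(X, y, score_fn, n_perm=200, seed=0):
    """Compare the observed score with scores obtained after shuffling labels."""
    rng = random.Random(seed)
    observed = score_fn(X, y)
    labels = list(y)
    permuted = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        permuted.append(score_fn(X, labels))
    # Empirical p-value: share of permutations scoring at least as well
    p_value = (1 + sum(s >= observed for s in permuted)) / (n_perm + 1)
    return observed, permuted, p_value


# Hypothetical two-feature data for two well-separated groups
X = [[1.0, 0.9], [1.1, 1.0], [0.9, 1.2], [1.2, 1.1], [1.0, 1.3], [0.8, 0.9],
     [5.0, 5.1], [5.2, 4.9], [4.8, 5.3], [5.1, 5.0], [4.9, 4.8], [5.3, 5.2]]
y = ["UC"] * 6 + ["SD"] * 6
observed, permuted, p_value = permutation_test(X, y, loo_centroid_accuracy)
print(observed, max(permuted), p_value)
```

If the observed score is not clearly higher than the permuted scores, the apparent group separation may be an artifact of overfitting rather than a real difference between producing areas.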
Based on fold-change (FC ≥ 2 or ≤ 0.5) and variable importance in projection (VIP ≥ 1) scores, 12 metabolites (10 upregulated, 2 downregulated) were identified as differential

Overall, the number of differential metabolites for UC was greater than that for the Chinese producing areas. Among the Chinese producing areas, neighboring provinces had fewer differential metabolites than nonadjacent provinces. These results suggest that the greater the geographical distance, the greater the metabolite difference.
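The screening rule stated above (FC ≥ 2 or ≤ 0.5, combined with VIP ≥ 1) can be written as a short Python sketch. The function name, the tuple layout, and the metabolite identifiers and values below are hypothetical and serve only to illustrate the thresholds.

```python
def classify_differential(metabolites, fc_up=2.0, fc_down=0.5, vip_min=1.0):
    """Split metabolites into up- and downregulated lists.

    `metabolites` is an iterable of (name, fold_change, vip) tuples.
    """
    up, down = [], []
    for name, fold_change, vip in metabolites:
        if vip < vip_min:
            continue  # not important enough in the OPLS-DA model
        if fold_change >= fc_up:
            up.append(name)
        elif fold_change <= fc_down:
            down.append(name)
    return up, down


# Hypothetical values for one pairwise comparison
example = [
    ("X1", 3.2, 1.8),  # FC >= 2 and VIP >= 1: upregulated
    ("X2", 0.4, 1.3),  # FC <= 0.5 and VIP >= 1: downregulated
    ("X3", 2.5, 0.7),  # fails the VIP threshold
    ("X4", 1.2, 2.1),  # FC between 0.5 and 2: not differential
]
up, down = classify_differential(example)
print(up, down)
```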
Venn diagrams showed the differential metabolites shared among the comparisons between a given producing area and the other four (Figure 4K-O). A total of 65 shared differential metabolites were identified between UC and the other four producing areas, including 33 saponins, 9 lipids, 7 amino acids and derivatives, 6 nucleotides and derivatives, 3 organic acids and derivatives, 1 carbohydrate and derivatives, and 6 others (Table S3).
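The shared differential metabolites shown in a Venn diagram correspond to the intersection of the differential metabolite sets of the individual pairwise comparisons. A minimal Python sketch follows; the function name, comparison labels, and metabolite identifiers are hypothetical.

```python
def shared_differential(comparisons):
    """Return the metabolites that are differential in every comparison."""
    sets = [set(names) for names in comparisons.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical differential metabolites of UC versus each Chinese area
uc_comparisons = {
    "UC_vs_HLJ": {"A", "B", "C", "D"},
    "UC_vs_JL": {"A", "B", "C"},
    "UC_vs_LN": {"A", "B", "E"},
    "UC_vs_SD": {"A", "B", "D", "F"},
}
shared = shared_differential(uc_comparisons)
print(sorted(shared))
```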

| Discovery of potential markers to distinguish American ginseng from different producing areas
Based on the results of HCA (Figure 5), only UC samples could be clearly separated from all other producing areas, while samples from the four provinces in China could not be separated from each other. Therefore, potential markers to distinguish American ginseng between UC and China were screened in the first step. To find more influential metabolites among the shared differential metabolites between UC and China, OPLS-DA models were established for samples from the two farther producing areas (Figure 6a). Based on VIP ranking, PPD-(Glc-Glc)-Glc-Glc-Glc-Mal (M212), guanosine (M57), and guanine (M58), which had the top three VIP scores, were identified as the potential markers.

FIGURE 5 Heatmap hierarchical clustering analysis of shared differential metabolites.

The relative content of these three markers in UC samples was significantly lower than that in samples from China (Figure 6b). Chen et al. (2022)

In the second step, potential markers were screened to distin-

It was reported that the content of 20(S)-Ginsenoside Rg2 in American ginseng from northeast China is higher than that from Shandong province (Huang et al., 2013), which is in accordance with our observation. Interestingly, 20(R)-Ginsenoside Rg2 made a more obvious distinction between the two regions in our study. Si (2021) screened 19 differential metabolites between American ginseng from Jilin and Shandong in China; however, none of them was a differential metabolite in our study. To sum up, this part of the study provided 17 potential markers for distinguishing American ginseng from the four producing areas in China, with over 91% correct predictions. As far as we know, these 17 potential markers have not been reported previously.

| CONCLUSION
In this study, a metabolomic method based on the UPLC-Orbitrap fusion platform was established to comprehensively determine and analyze metabolites of American ginseng. In total, 382 metabolites were identified in American ginseng from five producing areas. There were more differential metabolites between UC and the Chinese producing areas than among the four Chinese producing areas. Twenty potential chemical markers were identified for the first time, which could effectively discriminate (ROC