A novel strategy of “pick the best of the best” for the nondestructive identification of Poria cocos based on near‐infrared spectroscopy

Abstract In this paper, a novel strategy of “pick the best of the best” was proposed for the nondestructive identification of different‐origin and adulterated Poria cocos with near‐infrared spectroscopy. First, various preprocessing methods were divided into three classes: baseline correction, scattering and trend correction, and scaling. The single preprocessing methods with the best predictions in each class were selected. Then, the selected preprocessing methods were combined in pairs according to three classes. The pair combination preprocessing methods with the best predictions and also better predictions than single methods were selected. Finally, the selected pair combination preprocessing method was combined with the methods in the unselected class. The three combination preprocessing methods with the best predictions and also better predictions than pair combination methods were selected as the final prediction. With this strategy, the optimized preprocessing combination can be obtained quickly, and the identification accuracy with principal component analysis method can be greatly improved. 0% identification accuracy of adulterated samples and 12.5% identification accuracy of different‐origin samples were obtained with the raw data. However, 100% accuracy of adulterated samples, 93.8% accuracy of calibration dataset, and 75% accuracy of validation dataset can be obtained with the novel strategy. The developed technology can be regarded as a simple, rapid, and accurate nondestructive identification method for different‐origin and adulterated samples, and has a broad application prospect in the future.

Pharmacological studies have shown that Poria cocos has liver-protective, immunoregulatory, anti-tumor, antioxidant, anti-inflammatory, and antiviral effects (Khan et al., 2018; Lee et al., 2018; Miao et al., 2016). The active substances of Poria cocos are easily affected by the planting region. Research shows that the chemical components of Poria cocos from different producing areas are not the same, and that the efficacy and drug properties also differ (Y. Li et al., 2016; Wang et al., 2013). However, Poria cocos from different producing areas differs little in appearance and is difficult for laymen to identify. In recent years, some illegal traders have passed off starch as Poria cocos powder, or mixed starch into Poria cocos powder, for illegal profit (Chang et al., 2019), and it is not easy to distinguish Poria cocos powder from adulterated samples. Various analytical technologies have been used for the analysis of Poria cocos samples (K. Li et al., 2013; Y. Li et al., 2016; Zhu et al., 2018). For example, ultra-high performance liquid chromatography-ultraviolet-tandem mass spectrometry (UPLC-UV-MS) was applied to the fingerprint analysis of triterpenoid constituents in Poria cocos (K. Li et al., 2013). A quality assessment system combining ultraviolet spectroscopy and ultra-fast liquid chromatography (UFLC) was applied to the analysis of Poria cocos from different regions of Yunnan (Y. Li et al., 2016). The results showed that the composition and content of polysaccharides and triterpenoids differ among Poria cocos from different areas, and that the polysaccharide content also differs among medicinal parts. However, most of these methods are destructive, time-consuming, and laborious. Moreover, research on the identification of Poria cocos adulteration is largely lacking.
Therefore, it is urgent to develop a rapid and nondestructive method for the identification of Poria cocos from different producing areas and adulterated samples.
Near-infrared spectroscopy (NIRS), which is based on overtones and combinations of fundamental vibrations of H-containing groups, has been widely used in the rapid and nondestructive analysis of agricultural (Tardaguila et al., 2017), food (P. Li et al., 2020; P. Li et al., 2019), and pharmaceutical samples (P. Li et al., 2012). However, due to the complexity of the samples, the useful information is usually carried by broad spectral peaks. Besides, the spectra are often disturbed by stray light, noise, and baseline drift. In order to eliminate the background and noise interference in the spectra, a large number of spectral preprocessing methods have been proposed for use before modeling, including de-bias correction, detrend (DT), maximum and minimum normalization (MinMax), standard normal variate (SNV) transformation, multiplicative scatter correction (MSC), first-order derivative (1st Der), second-order derivative (2nd Der), and continuous wavelet transform (CWT) (Bian et al., 2016; Han et al., 2017; Ma et al., 2020). De-bias correction and DT algorithms are used to eliminate baseline drift. The MinMax method is the most frequently used scaling technique and normalizes all variables to a certain range. SNV and MSC methods can be used to eliminate the light scattering effect caused by uneven particle distribution and differing particle sizes. CWT and derivative methods can subtract the influence of background or drift in the signal. The resolution and sensitivity of the spectra can also be improved.
However, the signal-to-noise ratio may decrease at the same time.
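As a rough illustration of the preprocessing methods listed above, the following Python sketch implements simple textbook versions of de-bias, DT, SNV, MSC, MinMax, and finite-difference derivatives with NumPy. The function names, the polynomial degree used for detrending, the use of the mean spectrum as the MSC reference, and the plain finite-difference derivative are assumptions made for illustration; the exact variants and parameters used in this study are not specified here.

```python
import numpy as np

# Each function takes X with shape (n_samples, n_wavenumbers) and returns
# an array of the same shape. All names and defaults are illustrative.

def debias(X):
    """Subtract each spectrum's own mean (assumed form of de-bias)."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=1, keepdims=True)

def detrend(X, degree=2):
    """Subtract a low-order polynomial fitted to each spectrum (DT)."""
    X = np.asarray(X, dtype=float)
    t = np.linspace(-1.0, 1.0, X.shape[1])  # scaled axis for a stable fit
    coefs = np.polynomial.polynomial.polyfit(t, X.T, degree)
    return X - np.polynomial.polynomial.polyval(t, coefs)

def snv(X):
    """Standard normal variate: centre and scale each spectrum."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std

def msc(X):
    """Multiplicative scatter correction against the mean spectrum."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0)
    out = np.empty_like(X)
    for i, x in enumerate(X):
        slope, intercept = np.polyfit(ref, x, 1)  # x ~ intercept + slope*ref
        out[i] = (x - intercept) / slope
    return out

def minmax(X):
    """Scale each spectrum to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    return (X - lo) / (hi - lo)

def derivative(X, order=1):
    """Finite-difference derivative along the wavenumber axis."""
    out = np.asarray(X, dtype=float)
    for _ in range(order):
        out = np.gradient(out, axis=1)
    return out

def apply_pipeline(X, steps):
    """Apply preprocessing functions in the given order."""
    for step in steps:
        X = step(X)
    return X
```

For example, `apply_pipeline(X, [lambda S: derivative(S, 2), snv, minmax])` would correspond to a combination of the form 2nd Der+SNV+MinMax under these assumptions.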
In order to eliminate multiple interferences in the spectra, a combination of various preprocessing methods is often needed (Bian et al., 2020). How to choose the best preprocessing method and its combination is a problem that must be considered. Visual inspection (Engel et al., 2013) and trial-and-error (Chen & Grant, 2012) are the most commonly used strategies for selecting preprocessing methods.
However, these strategies are time-consuming, especially for the analysis of large datasets.
The aim of this study is to develop an effective way to find the optimized preprocessing method and to establish a nondestructive method for the identification of different-origin and adulterated Poria cocos. To the best of our knowledge, there is no report on the simultaneous identification of different-origin and adulterated Poria cocos. Spectra of Poria cocos powder samples were obtained directly with the NIRS instrument. The optimized preprocessing method was found quickly by a novel strategy of "pick the best of the best." Principal component analysis (PCA) combined with the optimized preprocessing methods was used for the identification of different-origin and adulterated Poria cocos.

| Poria cocos sample
Poria cocos powder samples produced in Anhui, Fujian, Guangxi, Guizhou, Henan, Hunan, Sichuan, and Yunnan were purchased from several local pharmacy shops. All

| Instrumentation and measurements
The spectra were obtained directly with the Antaris II NIRS instrument (ThermoFisher, USA) in diffuse reflectance mode with an integrating sphere diffuse reflection accessory. The powder samples were placed in a quartz sample bottle and measured by placing the sample bottle directly at the center of the light spot. To increase the signal-to-noise ratio (SNR), all reference and sample spectra were measured with 128 scans. The measurements were repeated three times and averaged. Each spectrum is composed of 1557 data points recorded from 10,000 to 4,000 cm⁻¹. The resolution of the instrument is 4 cm⁻¹.
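To make the data layout concrete, the short sketch below builds a wavenumber axis with 1557 points from 10,000 to 4,000 cm⁻¹ and averages three repeated measurements. It assumes evenly spaced data points, which is not stated explicitly in the text, and the names `wavenumbers` and `average_replicates` are hypothetical.

```python
import numpy as np

N_POINTS = 1557  # data points per spectrum, as stated in the text

# Assumption: the points are evenly spaced between 10,000 and 4,000 cm^-1.
wavenumbers = np.linspace(10000.0, 4000.0, N_POINTS)

# Data point spacing implied by that assumption (6000 / 1556 cm^-1).
spacing = abs(wavenumbers[1] - wavenumbers[0])

def average_replicates(replicates):
    """Average repeated spectra of one sample.

    replicates: array-like of shape (n_repeats, n_points).
    Returns the mean spectrum with shape (n_points,).
    """
    replicates = np.asarray(replicates, dtype=float)
    return replicates.mean(axis=0)
```

Under the even-spacing assumption, the data point interval is about 3.86 cm⁻¹, close to the stated instrument resolution of 4 cm⁻¹.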

| Data analysis
In order to eliminate the background and noise interference in the spectra, spectral preprocessing methods, including DT, de-bias, SNV transformation, MinMax, MSC, 1st Der, 2nd Der, and CWT methods, were used for spectra pretreatment before PCA calculation.
In the CWT calculations, the "haar" wavelet with scale = 20 was adopted. In order to eliminate multiple interferences in the spectra, a combination of various preprocessing methods was used.
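A Haar CWT at a single scale can be approximated by convolving each spectrum with a step-shaped kernel, which acts like a smoothed first derivative. The sketch below is one possible discretisation, not necessarily the one used in this study: the kernel normalisation, the edge handling (`mode="same"`), and the sign convention are all assumptions, and `haar_cwt` is a hypothetical name.

```python
import numpy as np

def haar_cwt(X, scale=20):
    """Approximate single-scale CWT with a Haar wavelet.

    X: one spectrum (1-D) or a matrix of shape (n_samples, n_points).
    Returns an array of shape (n_samples, n_points).
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    half = scale // 2
    # Discrete Haar wavelet: +1 on the first half, -1 on the second half,
    # scaled by 1/sqrt(scale) (an assumed normalisation).
    kernel = np.concatenate([np.ones(half), -np.ones(scale - half)])
    kernel = kernel / np.sqrt(scale)
    # Values within about one kernel length of either end are edge-affected.
    return np.array([np.convolve(x, kernel, mode="same") for x in X])
```

Away from the edges, a constant offset is mapped to zero and a linear ramp to a constant, which is why this transform removes baseline offsets while enhancing spectral slopes.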
How to choose the best preprocessing method and its combination is a problem that must be considered. However, it is time-consuming to get the best pretreatment combination by random combination.
A total of 109,601 preprocessing methods and their combinations can be obtained, including no preprocessing and combinations of one to eight preprocessing methods. Evaluating all of them is time-consuming, especially for large datasets. In this paper, the best preprocessing method was selected by employing the principle "pick the best of the best." First, according to the literature (Bian et al., 2020; Gerretzen et al., 2015, 2016), the eight preprocessing methods were divided into three classes: baseline correction, scattering and trend correction, and scaling, as shown in Table 1. CWT and derivative methods (1st Der and 2nd Der) are baseline correction methods, which can subtract the influence of background or drift in the signal. SNV and MSC are two well-known scatter correction methods, which can eliminate the light scattering effect caused by uneven particle distribution and differing particle sizes. MinMax is the most frequently used scaling technique and normalizes all variables to a certain range. The preprocessing methods with the best predictions in each class were selected. Then, the selected preprocessing methods were combined in pairs according to the three classes. The pair combination preprocessing methods with the best predictions, which were also better than those of the single methods, were selected. Finally, the selected pair combination preprocessing methods were combined with the methods in the unselected class. The three combination preprocessing methods with the best predictions, which were also better than those of the pair combination method, were selected as the final prediction.
A total of 119 preprocessing methods and their combinations can be obtained, shown in Table 2.
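The figure of 109,601 equals the number of ordered sequences of zero to eight distinct methods drawn from eight, and the selection procedure described above can be read as a three-step greedy search. The sketch below illustrates both points under stated assumptions: `score` is a hypothetical user-supplied function (higher is better), a single winner is kept per class, the class membership of de-bias and DT is assumed because Table 1 is not reproduced here, and the toy scores are invented purely to exercise the code. It requires Python 3.8 or later for `math.perm`.

```python
from itertools import permutations
from math import perm

def count_exhaustive(n_methods=8):
    """Ordered sequences of 0..n distinct methods out of n."""
    return sum(perm(n_methods, k) for k in range(n_methods + 1))

def pick_best_of_the_best(classes, score):
    """Greedy search over three classes of preprocessing methods.

    classes: dict mapping class name -> list of method names.
    score:   callable taking a tuple of method names (hypothetical).
    """
    # Step 1: best single method in each class.
    best_single = {name: max(methods, key=lambda m: score((m,)))
                   for name, methods in classes.items()}
    best = max(((m,) for m in best_single.values()), key=score)

    # Step 2: ordered pairs of the class winners; keep only if better.
    best_pair = max(permutations(best_single.values(), 2), key=score)
    if score(best_pair) <= score(best):
        return best

    # Step 3: insert each method of the unused class at every position.
    unused = next(methods for name, methods in classes.items()
                  if best_single[name] not in best_pair)
    triples = [best_pair[:i] + (m,) + best_pair[i:]
               for m in unused for i in range(3)]
    best_triple = max(triples, key=score)
    return best_triple if score(best_triple) > score(best_pair) else best_pair

# Assumed class membership and invented toy scores (not results of the study).
classes = {
    "baseline": ["de-bias", "CWT", "1st Der", "2nd Der"],
    "scatter_trend": ["DT", "SNV", "MSC"],
    "scaling": ["MinMax"],
}
toy_scores = {
    ("2nd Der",): 0.6, ("SNV",): 0.5, ("MinMax",): 0.4,
    ("2nd Der", "MinMax"): 0.7,
    ("2nd Der", "SNV", "MinMax"): 0.9,
}
selected = pick_best_of_the_best(classes, lambda seq: toy_scores.get(seq, 0.0))
```

In practice, `score` would be replaced by a measure such as the identification accuracy obtained after applying the given sequence of preprocessing steps. The greedy search evaluates far fewer candidates than the exhaustive enumeration, at the cost of possibly missing a combination that is not built from the per-class winners.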

| Identification of Poria cocos by pair combination preprocessing methods with "pick the best of the best" strategy
The best preprocessing method was selected by employing the principle "pick the best of the best." The selected single preprocessing methods were combined in pairs according to the three classes. Therefore, methods of Nos. 12-14, 17-19, 33, 34, 36-38, 40-42, 46, and 47 were selected. Figure 5 shows the cumulative variance contribution rate of the first two scores and the identification accuracies obtained with these pair combinations. The combination with the best predictions, which were also better than those of the single methods, was selected. Therefore, the preprocessing method of 2nd Der+MinMax was selected for the next calculation.
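The cumulative variance contribution rate of the first two scores can be computed from a PCA of the preprocessed spectra. The following is a minimal sketch using a singular value decomposition of the mean-centred data matrix; the function name `pca_scores` is hypothetical, and the software actually used for the PCA calculation in this study is not specified.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA via SVD of the column-centred data matrix.

    X: array of shape (n_samples, n_variables).
    Returns (scores, cumulative_ratio), where scores has shape
    (n_samples, n_components) and cumulative_ratio is the fraction of
    total variance explained by the first n_components components.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)          # variance ratio per component
    return scores, float(explained[:n_components].sum())
```

A low cumulative ratio means that a two-dimensional score plot discards much of the spectral variance, which is why accuracies read from such plots are treated with caution in the discussion of Figures 5 and 6.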

| Identification of Poria cocos by three combination preprocessing methods with "pick the best of the best" strategy
A combination of three preprocessing techniques selected with the "pick the best of the best" strategy was applied to further improve the accuracy.

FIGURE 5
The cumulative variance contribution rate of first two scores (a), the identification accuracies of calibration dataset (b), validation dataset (c), and adulterated samples (d) by pair combination preprocessing methods. The red line is the accuracy with the raw data, and the blue dashed line is the accuracy with the best single preprocessing methods.

FIGURE 6
The cumulative variance contribution rate of first two scores (a), the identification accuracies of calibration dataset (b), validation dataset (c), and adulterated samples (d) by three combination preprocessing methods. The red line is the accuracy with the raw data. The blue dashed line is the accuracy with the best single preprocessing methods. The green dotted line is the accuracy with the best pair preprocessing methods.

Based on Section 3.2, the selected 2nd Der+MinMax method was combined with the methods in the unselected class. Therefore, methods of Nos. 66, 67, 54, 55, 79, and 82 were selected. Figure 6 shows the identification accuracies of Poria cocos powder obtained with the three combination preprocessing methods. The identification accuracy of adulterated samples is 100.0% with all combinations of three combination preprocessing methods. The cumulative variance contribution rates of the first two scores with 2nd Der+MinMax+SNV and 2nd Der+MinMax+MSC are less than 71%, and the identification accuracies of the calibration dataset are unreliable. The best identification accuracy of the calibration dataset is 93.8% with the 2nd Der+SNV+MinMax, 2nd Der+MSC+MinMax, SNV+2nd Der+MinMax, and MSC+2nd Der+MinMax methods. Although the accuracy of the validation dataset with DT+MinMax+2nd Der is the highest (87.5%), the cumulative variance contribution rate of the first two scores is less than 69% and the identification accuracy of adulterated samples is 0%. The best identification accuracy of the validation dataset is 75% with the 2nd Der+SNV+MinMax, 2nd Der+MSC+MinMax, SNV+2nd Der+MinMax, and MSC+2nd Der+MinMax methods. Therefore, these four combinations are the best preprocessing methods.
The selected four combinations follow the order of baseline correction, scatter and trend correction, and scaling, or the order of scatter and trend correction, baseline correction, and scaling, which is basically the same as that reported in the literature (Bian et al., 2020; Gerretzen et al., 2015, 2016).
... producing areas is not significant, and is difficult to identify. In addition, the PCA method requires no prior knowledge of categories, and its identification ability is not very strong. Therefore, 100% identification accuracy cannot be obtained with the PCA method, and a supervised identification method using prior knowledge may be more effective.

| CONCLUSION
In this paper, a novel strategy of "pick the best of the best" was proposed to obtain the optimized preprocessing method for the non-

CONFLICTS OF INTEREST
The authors declared that they have no conflicts of interest to this work.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.