Qualitative discrimination of Chinese dianhong black tea grades based on a handheld spectroscopy system coupled with chemometrics.

Abstract The evaluation of Chinese dianhong black tea (CDBT) grades is important for ensuring tea quality. A handheld spectroscopy system combined with chemometrics was used to assess CDBT of eight grades. Two variable selection methods, namely genetic algorithm (GA) and successive projections algorithm (SPA), were employed to acquire the feature variables of each sample spectrum. Partial least‐squares discriminant analysis (PLS‐DA) and support vector machine (SVM) algorithms were applied to establish the grade discrimination models based on near‐infrared spectroscopy (NIRS). The portable and benchtop NIRS systems were compared to obtain the optimal discriminant models. Experimental results showed that the GA‐SVM models built with the handheld sensor yielded the best predictive performance, with correct discriminant rates (CDR) of 98.75% and 100% in the training set and prediction set, respectively. This study demonstrated that the handheld system, combined with a suitable chemometric and feature selection method, could be used successfully for the rapid and efficient discrimination of CDBT rankings. It is promising to establish a specific, economical, portable NIRS sensor for in situ quality assurance of CDBT grades.

(var. assamica) classified into different grades in accordance with the variation in black tea quality that is determined by growing conditions, harvesting season, and processing technology. In general, ordinary consumers hold the traditional view that a high price represents excellent quality of goods in the market (Zhu et al., 2019). Hence, the price of CDBT is nearly the only means by which ordinary consumers evaluate tea quality and discriminate its grades. Besides, the adulteration of CDBT and the passing off of inferior products as superior ones are quite widespread in the tea trade, which damages consumer rights and the far-reaching tea culture (Li, Wei, Ning, & Zhang, 2015; Xu, Wang, & Gu, 2019). A reliable and efficient method for discriminating the quality grades of CDBT is urgently needed (Pang et al., 2012).
During the last few decades, the quality assessment of CDBT has been decided chiefly by the sensory tests of experienced tea tasters (Pan, Sun, Li, Deng, & Zhang, 2019). These skilled experts grade the tea samples on a scale, separately for appearance, taste, and aroma, an approach that lacks reproducibility and impartiality because of the tasters' physical or physiological factors. The contents of the main biochemical constituents (e.g., catechins, tea pigments, and amino acids) in different tea rankings also affect the quality parameters. Currently, sensory evaluation combined with chemical composition analysis is considered a more accurate assessment scheme for CDBT grades. Conventional methods of chemical analysis have been used to determine the main chemical components of tea, such as gas chromatography (GC), high-performance liquid chromatography (HPLC) (Zhang, Jing, et al., 2019; Zhou, Sun, et al., 2019), and colorimetric measurements. However, all of the above-mentioned methods are time-consuming and costly, so these traditional means cannot achieve the rapid assessment of CDBT quality rankings. So far, the fast detection approaches for tea quality grades mainly involve nanotechnology (Zhu et al., 2019), colorimetric sensor array-based artificial olfaction (Li, Xie, et al., 2017), and near-infrared spectroscopy (NIRS) coupled with chemometric methods (Fu, Xu, Yu, Ye, & Cui, 2013; Ikeda, Kanaya, Yonetani, Kobayashi, & Fukusaki, 2007). However, nanotechnology and colorimetric sensor arrays, which are based on the reaction of toxic chemical reagents, are not yet mature enough for practical applications. NIRS has proven to be a powerful analytical tool applied widely for qualitative identification in the agricultural and food industries (Deng et al., 2020).
The NIRS technique extracts characteristic information by analyzing the molecular bonds that absorb in the NIR band (e.g., C-H, N-H, O-H, and S-H), which are the primary structural components of organic molecules. Extensive research on the use of NIRS in tea has been reported, but only two studies have applied the method to the discrimination of tea grades (Fu et al., 2013; Ikeda et al., 2007). Both published papers are specific to green tea; neither applied or discussed spectral feature variable screening algorithms, and neither compared benchtop and handheld NIRS instruments in order to simplify the models.
Besides, no study has attempted to distinguish black tea of eight grades using a self-developed system, because a higher number of rankings denotes smaller differences between them and thus greater difficulty in tea identification (Zhu et al., 2019).
In this work, the main goal is to design an identification model to discriminate CDBT grades by using a low-cost handheld near-infrared sensor combined with chemometric methods (Figure 1). The handheld and benchtop devices were compared to verify the feasibility of establishing a reliable and inexpensive discriminant model. PLS-DA and support vector machine (SVM), each combined with the genetic algorithm (GA) and successive projections algorithm (SPA) methods, were compared in order to select the optimal recognition model and to recommend a suitable method for the portable NIRS system.

| Sample preparation
A total of 240 CDBT samples were collected from Yunnan province in southwest China. All samples, processed into eight quality grades, were harvested in 2018 within the tea planting base of Yunnan Dianhong Group Co. Ltd. Each grade contained 30 samples. Every sample was ground for 10 s with a high-speed grinder (Beijing Ever Bright Medical Treatment Instrument Co., Ltd., model FW100). The ground tea (50 g) was sieved through a 40-mesh sieve and then used for spectra acquisition; the sieved samples were packed into kraft paper bags and stored in an airtight, dark place for further analysis.

| Spectra acquisition
The data were acquired with a Bruker MPA Fourier transform (FT) near-infrared spectrometer (Bruker Optik GmbH) and with an independently developed handheld system. The benchtop NIRS instrument, equipped with an integrating sphere, was used to record the diffuse reflectance spectra of the samples between the wavelengths of 900 and 1,700 nm at 8 cm⁻¹ resolution with 32 scans. The data were recorded at a 3.86 cm⁻¹ interval, which resulted in 1,354 variables in each spectrum. For each sample, 3.0 ± 0.1 g of tea powder was packed into the quartz cell (35 mm diameter), as is the standard procedure for the bulk density of materials. Every sample was measured three times, with the cup rotated by 120° between measurements. The handheld near-infrared spectrometer recorded spectra in diffuse reflection mode in the range between 11,100 cm⁻¹ and 5,880 cm⁻¹ (900-1,700 nm) with a resolution of 5.85 cm⁻¹. It was composed mainly of hardware and software systems. The hardware comprised the spectrometer, electric source, laptop, and the sample cup. The software included USB communication technology.
The spectrometer, with digital light processing and a single-phase indium gallium arsenide (InGaAs) detector, was produced by Texas Instruments. About 3.0 ± 0.1 g of each sample was put into a quartz cell. The spectra were measured three times consecutively with 32 scans, resulting in 512 variables. Performance differences between the benchtop and handheld spectrometers are described in Table 1.
The quartz cuvette was rotated manually between measurements.
The mean spectra from each sample were used in the further analysis. The temperature was kept around 25°C, and the humidity was kept at a steady level in the laboratory.
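The spectral ranges above are quoted both in wavenumbers and in wavelengths. As a quick check of how the two relate, the following minimal sketch applies the standard conversion λ (nm) = 10⁷ / ν (cm⁻¹); the function name is illustrative and not part of the original software.

```python
def wavenumber_to_wavelength_nm(wavenumber_cm: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nm (lambda = 1e7 / nu)."""
    if wavenumber_cm <= 0:
        raise ValueError("wavenumber must be positive")
    return 1.0e7 / wavenumber_cm


# Limits of the handheld spectrometer range quoted in the text.
for nu in (11100.0, 5880.0):
    print(f"{nu:.0f} cm^-1 -> {wavenumber_to_wavelength_nm(nu):.1f} nm")
```

The two limits come out close to 900 nm and 1,700 nm, consistent with the range stated for both instruments.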

| Spectral data preprocessing
The raw spectra acquired from the NIR spectrometer were prone to be affected by the physical properties of the samples, background information, and noise interference. Since the dry tea powders were composed of different particle sizes, and the scattering of light was high and variable (Huang, Li, Zhao, Huang, & Chen, 2015), the raw spectral data were subjected to spectral preprocessing to reduce the interference and enhance the contribution of the sample feature attributes before model building.
Standard normal variate (SNV) was selected in this study as the preferred method for the correction of light scatter. It is a mathematical transformation of the log (1/R) spectra used for removing slope variation and correcting for scattering effects. Each spectrum was corrected individually in two steps: first, centering the spectral values, and second, scaling by the standard deviation calculated on the individual spectral values (Barnes, Dhanoa, & Lister, 1989; Chen, Tan, Lin, & Wu, 2018).
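The two-step correction described above can be sketched in a few lines. This is a minimal illustration in Python with NumPy, not the Matlab code used in the study; the function name `snv` and the use of the sample standard deviation (`ddof=1`) are assumptions.

```python
import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """Row-wise standard normal variate: center each spectrum, then scale it.

    `spectra` has one spectrum per row and one wavelength variable per column.
    """
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)        # step 1: per-spectrum mean
    std = spectra.std(axis=1, ddof=1, keepdims=True)  # step 2: per-spectrum std
    return (spectra - mean) / std


# Two synthetic "spectra" that differ only by an offset and a scale factor
# collapse onto the same curve after SNV.
base = np.array([0.10, 0.30, 0.20, 0.50, 0.40])
raw = np.vstack([base, 2.0 * base + 0.7])
corrected = snv(raw)
print(np.allclose(corrected[0], corrected[1]))
```

Because every spectrum is normalized with its own mean and standard deviation, the correction needs no reference spectrum and can be applied to new samples one at a time.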

| Multivariate data analysis
FIGURE 1 Schematic representation of both benchtop and handheld NIRS systems combined with chemometric algorithms. NIRS, near-infrared spectroscopy; PLS-DA, partial least-squares discriminant analysis; RMSECV, root mean square error of cross-validation; SVM, support vector machine

TABLE 1 Performance differences between the benchtop and the handheld near-infrared spectroscopy spectrometers

Principal component analysis (PCA) is widely employed to identify and eliminate outliers, reduce the dimensionality of the existing data set, and extract important information (Chen, Tan, & Lin, 2019; Li, Zhang, Zhao, Huang, & Wang, 2017). The analysis can simplify the raw spectral data into several variables, eliminate collinearity, and reduce the machine learning time while retaining the spectral information correlated with the black tea grades (Subbuthai, Periasamy, & Muruganand, 2012).
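To make the dimensionality reduction concrete, the following sketch computes PCA scores and the variance contribution of the leading components through a singular value decomposition. It is an illustration only: the data are random numbers standing in for preprocessed spectra, and the function name `pca_scores` is hypothetical.

```python
import numpy as np


def pca_scores(X: np.ndarray, n_components: int = 2):
    """Return PCA scores and the explained-variance ratio of each kept component."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # mean-center every variable
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)               # variance share per component
    return scores, explained[:n_components]


# Synthetic stand-in for a (samples x wavelengths) matrix of spectra.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(24, 50))
scores, ratio = pca_scores(X_demo, n_components=2)
print(scores.shape, ratio.sum())
```

The sum of the returned ratios corresponds to the accumulated variance contribution rate of the retained components, and the score columns are what a PCA score plot displays.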
Partial least-squares discriminant analysis (PLS-DA) is widely applied as a simple, fast, linear discrimination method with relatively good performance for qualitative analysis (Costa, Uchida, Miguel, Duarte, & Lima, 2017). SVM is based on VC dimension theory and the structural risk minimization principle, and is devoted to improving generalization ability (Wang et al., 2020). At present, SVM is extensively used in practical applications. In general, the algorithm depends on the optimal combination of two primary parameters (viz. the penalty parameter and the kernel function parameter) to obtain satisfactory prediction results (Chen, Zhao, Fang, & Wang, 2007; Smola & Schölkopf, 2004). The penalty parameter (c) controls the trade-off between minimizing the training error and limiting model complexity. The kernel function parameter (g) defines the nonlinear mapping from the input space to a high-dimensional feature space (Zhang, Liu, & Wang, 2008).
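The role of the kernel parameter g can be illustrated with a small sketch. It assumes a radial basis function (RBF) kernel, K(x, z) = exp(-g‖x − z‖²), which is a common choice but is not named in the text; the function name `rbf_kernel` is illustrative.

```python
import numpy as np


def rbf_kernel(A: np.ndarray, B: np.ndarray, g: float) -> np.ndarray:
    """RBF kernel matrix K[i, j] = exp(-g * ||A[i] - B[j]||^2) (assumed kernel)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-g * np.maximum(sq_dist, 0.0))  # clip tiny negative round-off


# Two points at unit distance: a larger g makes them look less similar.
pts = np.array([[0.0, 0.0], [1.0, 0.0]])
K_small_g = rbf_kernel(pts, pts, g=0.1)
K_large_g = rbf_kernel(pts, pts, g=10.0)
print(K_small_g[0, 1], K_large_g[0, 1])
```

Under this assumption, a large g yields a very local, flexible decision boundary and a small g a smoother one, which is why c and g are usually tuned together, for example by a grid search with cross-validation.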

| Variables selection method
Two different variable selection algorithms, namely GA and SPA, were employed in this study to extract characteristic variable information and reduce the complexity of the classification model. GA is a heuristic search algorithm with self-organization abilities and high robustness, used to implement an automated wavelength selection procedure for building multivariate calibration models based on partial least-squares regression. The method seeks the optimum parameters from a population of candidate solutions, selecting representative characteristic variables and improving the accuracy of the models while optimizing the outcomes (Ning et al., 2018). SPA is proposed as a flex-
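The GA search over wavelength subsets described above can be sketched as follows. This is a simplified illustration, not the procedure used in the study: the fitness is the hold-out accuracy of a nearest-centroid classifier (swapped in for the PLS-based criterion to keep the example dependency-free), and all names, population settings, and the synthetic data are assumptions.

```python
import numpy as np


def centroid_accuracy(X_tr, y_tr, X_va, y_va) -> float:
    """Hold-out accuracy of a nearest-centroid classifier (stand-in fitness)."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dist = ((X_va[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(classes[dist.argmin(axis=1)] == y_va))


def ga_select(X_tr, y_tr, X_va, y_va, pop_size=20, n_gen=30, p_mut=0.02, seed=0):
    """Search boolean variable masks with a basic GA; needs at least 2 variables."""
    rng = np.random.default_rng(seed)
    n_vars = X_tr.shape[1]
    pop = rng.random((pop_size, n_vars)) < 0.5    # random initial chromosomes

    def fitness(mask):
        if not mask.any():                        # empty subsets are worthless
            return 0.0
        return centroid_accuracy(X_tr[:, mask], y_tr, X_va[:, mask], y_va)

    best_mask, best_fit = pop[0].copy(), -1.0
    for _ in range(n_gen):
        fits = np.array([fitness(m) for m in pop])
        i = int(fits.argmax())
        if fits[i] > best_fit:
            best_mask, best_fit = pop[i].copy(), float(fits[i])
        # Tournament selection: the fitter of two random individuals is a parent.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(fits[a] >= fits[b], a, b)]
        # Single-point crossover between consecutive parents.
        children = parents.copy()
        for j in range(0, pop_size - 1, 2):
            cut = int(rng.integers(1, n_vars))
            children[j, cut:] = parents[j + 1, cut:]
            children[j + 1, cut:] = parents[j, cut:]
        children ^= rng.random(children.shape) < p_mut  # bit-flip mutation
        children[0] = best_mask                         # elitism
        pop = children
    return best_mask, best_fit


# Synthetic data: 4 classes, 30 variables, only the first 3 are informative.
rng = np.random.default_rng(1)
y = np.repeat(np.arange(4), 20)
X = rng.normal(size=(80, 30))
X[:, :3] += 3.0 * y[:, None]
order = rng.permutation(80)
tr, va = order[:60], order[60:]
mask, fit = ga_select(X[tr], y[tr], X[va], y[va])
print(int(mask.sum()), "variables kept; hold-out accuracy", fit)
```

Each chromosome is a boolean mask over the wavelength variables, so the selected subset can be passed directly to whichever classifier is being evaluated. Because the search is stochastic, results vary with the seed, which is why repeated runs are commonly compared.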

| Software
All algorithms were implemented and all data were processed in Matlab R2014a (MathWorks) and SIMCA 14.1 (Umetrics) under Windows 8.1.

| Results of PCA
FIGURE 3 Spectral curves of the standard normal variate method with the handheld system (a) and benchtop spectrometer (b)

FIGURE 4 Principal component analysis score cluster plots for Chinese dianhong black tea of eight grades based on the handheld system (a) and benchtop spectrometer (b)

The SNV-pretreated spectral data could be reduced to several variables by PCA. The analysis procedure investigated the accumulated variance contribution rate of the top two principal components (PCs) of the spectral data. Figure 4a shows the PCA classification results for CDBT of eight grades by

| Optimal results of discriminant models via the two NIRS systems
All 240 Table 2. In the comparison of the two algorithms on the benchtop NIRS data, the SVM models performed satisfactorily, with a correct discriminant rate (CDR) of 100% for both the training and testing sets. PLS-DA modeling failed to give a completely effective performance on the self-developed system, and the CDR of the model only exceeded 80%. As can be seen from Table 2, the identification models obtained with the SVM approach gave calibration and prediction outcomes superior to those of PLS-DA. A possible reason is that the nonlinear SVM method could handle the interactions among several variables effectively. According to the above results, equally excellent model performance was obtained from both the desktop and the portable devices. The handheld sensor could evidently serve as a low-cost and easy-to-use tool for achieving high discriminant rates of CDBT grades, with potential for industry-wide application.
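The correct discriminant rate used throughout this comparison is the percentage of samples assigned to their true grade. A minimal sketch follows; the function name and the labels are purely illustrative and are not the study's data.

```python
def correct_discriminant_rate(y_true, y_pred) -> float:
    """Percentage of samples whose predicted grade equals the true grade."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Hypothetical example: 160 samples of eight grades, two of them misassigned.
true_grades = [g for g in range(1, 9) for _ in range(20)]
pred_grades = list(true_grades)
pred_grades[0] = 2    # a grade-1 sample predicted as grade 2
pred_grades[45] = 4   # a grade-3 sample predicted as grade 4
cdr = correct_discriminant_rate(true_grades, pred_grades)
print(f"CDR = {cdr:.2f}%")
```

The CDR is computed separately for the training and prediction sets, so that a model fitting the calibration data well but generalizing poorly can be recognized.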

| Optimal results from different variables selection methods
In this study, two wavelength variable selection approaches, GA and SPA, were used to select specific spectral   Table 4 shows the predictive results of the optimal PLS-DA and SVM models obtained within 100 runs of both variable selection methods via the handheld sensor. As can be seen from Table 4 The SVM method adopted a kernel function to map the data into a high-dimensional space, where it could separate the various classes. The evaluation of CDBT quality ranking was a complex process involving internal physicochemical component transformation and composition metabolism, making recognition and discrimination challenging.

| Comparisons and discussion of optimal results from different modeling methods
Third, the classification models based on the GA algorithm exhibited a higher CDR than the models using the SPA method.
Although SPA combined with PLS-DA or SVM could minimize information overlap and redundancy among the preferred variables (Diniz, Gomes, Pistonesi, Band, & de Araújo, 2014), it reduced the signal-to-noise ratio, so the identification accuracy obtained with the SPA algorithm was greatly affected and its applicability was low. GA was therefore more suitable for screening characteristic wavelengths related to tea grades in this paper. Consequently, the nonlinear GA-SVM algorithm achieved the best predictive results on the spectra from the developed system, where a small number of variables gave better generalization performance.

| CONCLUSION
In this study, a developed handheld NIRS system combined with chemometric tools was presented for the rapid discrimination of CDBT discrimination method has a high potential in the identification of CDBT grades, and it is promising to establish a specific economical portable NIRS sensor for in situ quality assurance of CDBT.

ACKNOWLEDGMENTS
This work has been financially supported by the National Key Research and Development Program of China (Project No: 2017YFD0400805).

CONFLICT OF INTEREST
The authors have declared no conflicts of interest for this article.

ETHICAL APPROVAL
This study does not involve any human or animal testing.

INFORMED CONSENT
Written informed consent was obtained from all study participants.

TABLE 4 Results of the optimal PLS-DA and SVM discrimination models based on different variable selection methods via the handheld NIRS sensor
Abbreviations: GA, genetic algorithm; NIRS, near-infrared spectroscopy; PC, principal component; PLS-DA, partial least-squares discriminant analysis; SPA, successive projections algorithm; SVM, support vector machine.