Moisture contents and product quality prediction of Pu‐erh tea in sun‐drying process with image information and environmental parameters

Abstract In this study, the moisture content and product quality of Pu‐erh tea were predicted with deep learning‐based methods. Images were captured continuously during the sun‐drying process. Environmental parameters (EP), namely air humidity, air temperature, global radiation, wind speed, and ultraviolet radiation, were collected with a portable meteorological station. Sensory scores for aroma, flavor, liquor color, residue, and the total score were given by a trained panel. Convolutional neural network (CNN) and gated recurrent unit (GRU) models were constructed from the image information and the EP, which were selected in advance using the neighborhood component analysis (NCA) algorithm. The deep learning‐based models achieved satisfactory results, with RMSE of 0.4332, 0.2669, and 0.7508 (R² of .9997, .9882, and .9986; RPD of 53.5894, 13.1646, and 26.3513) for moisture content prediction in each batch of tea, in tea at different sampling periods, and in the overall samples, respectively; and with RMSE of 0.291, 0.2815, 0.162, 0.1574, and 0.3931 (R² of .9688, .9772, .9752, .9741, and .8906; RPD of 5.6073, 6.5912, 6.352, 6.1428, and 4.0045) for final quality prediction of aroma, flavor, liquor color, residue, and total score, respectively. By analyzing and comparing the RMSE values, the most significant EP were selected. The proposed combinations of different EP can also provide a valuable reference for the development of new sun‐drying systems.

to achieve the best product quality. However, the moisture content is difficult to estimate instantly with traditional methods. In addition, with the loss of water from the tea leaves, the total flavonoid, phenolic, and vitamin C contents, chlorophyll II, antioxidant activity, and ascorbic acid equivalent change significantly (Chan et al., 2009; Roshanak et al., 2016). These changes are determined not only by the characteristics of the tea itself but also by environmental parameters (EP) such as air temperature, air humidity, global radiation, light intensity, wind speed, and ultraviolet radiation. Hence, online monitoring of the EP and fast prediction of the moisture content are necessities in the PT industry.
Traditionally, during the sun-drying process of PT, moisture content was estimated by the tea makers with their eyes and hands, and product quality was evaluated by sensory scoring performed by a panel of trained tasters after sun drying. The accuracy of such estimation and evaluation depends on people's experience, mood, and mental state, which makes it neither stable nor consistent (Zhi et al., 2017). Recently, sophisticated instruments, such as near-infrared spectroscopy (Huang et al., 2021; Zhang et al., 2020), hyperspectral imaging (Wei et al., 2019), micro-NIRS, the electronic nose (Tudu et al., 2009), and the electronic tongue (He et al., 2009), have been used for tea moisture and quality prediction. These methods can provide good results but require professional knowledge and expensive equipment. Hence, a rapid and convenient evaluation method is still needed.
Computer vision (CV) is an engineering technology that combines electromagnetic sensing, mechanics, digital video, and image processing (Zareiforoush et al., 2015). Evidence has shown that CV is suitable for food quality evaluation (Wu & Sun, 2013), including moisture detection of black tea in the withering process, prediction of moisture content in Congou black tea, rapid identification of tea quality, quality monitoring during black tea processing, determination of black tea fermentation quality, evaluation of black tea fermentation degree (Jin et al., 2020), and identification of tea category (Zhang et al., 2016). These studies mainly used color histograms, wavelet transforms, and the gray-level co-occurrence matrix for image information extraction, and used support vector machines (SVM), multilayer perceptrons (MLP), and radial basis function (RBF) neural networks to fit the data for quality prediction. Although such methods reduce the amount of computation and improve execution speed, they suffer from decreased accuracy in most situations. Recently, with the development of high-performance computing (HPC) and graphics processing units (GPU), deep learning (DL) methods have become a promising approach in various food quality evaluations. Among these technologies, the combination of convolutional neural networks (CNN) and gated recurrent units (GRU) has been successfully applied in tea leaf disease recognition (Chen et al., 2019), apple flower detection (Dias et al., 2018), fishery pond dissolved oxygen prediction, and univariate time series forecasting (Saini et al., 2020). The increased computing power of HPC and GPU allows people to process high-dimensional variables efficiently, which further promotes practical applications.
Limited studies have reported the quantitative evaluation of PT moisture content and product quality in the sun-drying process using DL-connected CV techniques. Thus, this study aimed to assess the feasibility of using image and environmental information based on DL-connected CV techniques for moisture content and product quality prediction. The specific objectives of this study were to: 1. use an industrial camera and a meteorological station to collect image information and EP during the sun-drying process; 2. establish prediction models based on the image information of tea leaves to predict their moisture content; 3. construct prediction models based on the image information and EP of sun-dried tea to predict the final product quality; and 4. select the most influential EP by comparing the change rate of RMSE, to support optimization of the PT sun-drying process in future studies.

| Sample preparation
The fresh leaves of PT were collected from large-leaf tea trees in Simao District, Pu-erh City, Yunnan Province, China. Grade III samples were used in this experiment according to the Chinese national standard GB/T 22111-2008: More than 50% of samples were one bud with two leaves or one bud with three leaves, and the others were buds or leaves with the same tenderness. A total of 100 batches of PT were used in the current study. The spreading, fixation, and rolling were all done by machines, and the processing conditions were set to be the same for all batches to reduce the impact of uncertain factors. During the sun-drying process, sample tests (image capture and moisture content measurement) were performed every 2 h, and moisture detection was performed immediately after the images of leaves were captured to reduce the measurement errors. A total of 546 sample tests were conducted during the sun drying of the 100 batches of tea leaves. The EP of each batch of tea were collected for analysis. After the sun-drying processes, the sensory scores of the tea products were given by a well-trained panel.

| Image information collection
A machine vision system in a separate room beside the drying shelter was used to capture the images of tea leaves during the drying process. The system consists of an industrial color camera (MV-CE120-10GC, HIKROBOT TECHNOLOGY CO., LTD., Hangzhou, China) with 8 million pixels and a C-mount vari-focal lens (MVL-MF0828M-8MP, HIKROBOT TECHNOLOGY CO., LTD.), as well as a uniform light source that can be adjusted on an articulating arm boom. The uniform light emitted in the room provides better tea leaf images without shadows, and the interior wall of the room was painted white to achieve uniform diffuse reflection.
Image capture included three steps: (1) 15 ± 0.5 g of PT leaves in the sun-drying process was placed in a glass vessel and spread uniformly, (2) the glass vessel was moved to the separate room and placed on the sampling platform of the machine vision system, and (3) the images of the tea leaves were captured. The parameters of the machine vision system were optimized as follows: (1) The industrial camera's white balance was set to automatic mode to display the color of the tea correctly and reduce the negative effect of illumination changes. (2) The aperture was adjusted for moderate brightness so that the tea leaves were clearly visible. (3) The working distance of the camera was fixed at 200 mm. After image capture, the raw digital images were saved as BMP files in the RGB color space.
During the whole sun-drying process, the original images of the tea leaves were collected every 2 h. It is worth noting that the last sampling interval may be <2 h because the drying process may finish early; the end of the sun-drying process was judged by the experts in the factory. Since the total drying duration differed between batches, 100, 100, 100, 89, 41, 16, and 100 images were collected at the zeroth, second, fourth, sixth, eighth, and tenth hours and at the end of drying, respectively. A total of 546 images of the above samples were used for moisture analysis, but only the images of the final products were used for product quality evaluation.

| Moisture detection
In each sample test, 300 g of PT leaves were collected and mixed evenly, and a 3-g subsample was used for moisture content measurement with a halogen moisture analyzer (AMTAST MB65, Amtast USA Inc.). Following the Chinese standard GB/T 8304-2013, the tea samples were placed in a drying vessel and heated at 120℃ for 1 h, and the weight loss before and after drying was recorded. The ratio of the lost weight to the original weight was taken as the moisture content (wet basis).
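The wet-basis calculation described above can be written out as a one-line helper (a sketch; the function name is ours):

```python
def moisture_content_wet_basis(w_before, w_after):
    # Wet-basis moisture content (%): mass lost on drying over the original mass.
    return (w_before - w_after) / w_before * 100.0
```

For example, a 3.0-g sample weighing 1.8 g after drying has a wet-basis moisture content of 40%.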

| Monitoring of EP
Pu-erh tea production specifications (Lv et al., 2013) were followed in the selection of the experimental site. The experimental area was located in a tea factory, where a drying shed was used for the experiment.
To maximize sunshine duration in the daytime, the shed was placed on top of a factory building. As illustrated in Figure 1, an intelligent plastic roof was installed on the shed, which can monitor air pressure and rainfall and automatically closes before and opens after rain. Ditches around the shed prevent rainwater from flowing into the experimental area. When there was no rain, the shed roof was kept open and free of obstructions to ensure good ventilation. An automatic meteorological station (RS-QXZM-M3-Y-4G, Shandong Renke Control Technology Co., Ltd.) was mounted in the center of the sample area to collect the EP, including air temperature (℃), air humidity (%RH), global radiation (W/m²), light intensity (lux), wind speed (m/s), and ultraviolet radiation (mW/cm²). The measurement error of the temperature sensor was ±0.4℃, and that of the humidity sensor was ±2%RH. The measurement ranges were 0-60℃ for the air temperature sensor and 1%-100%RH for the air humidity sensor.
In the meteorological station, global radiation was measured with a pyranometer with a measuring range of 0-1800 W/m² and a resolution of 1 W/m². Light intensity was measured by the light sensor, a light-dependent resistor (LDR) based on the semiconductor photoelectric effect, whose resistance varies with ambient light intensity. By determining the corresponding relationship between resistance and illumination, the light intensity can be derived.

FIGURE 1 Drying shed with environmental parameter monitoring system

| Sensory evaluation
After drying, the sensory quality was assessed. The residues were sniffed three times for aroma evaluation. The liquor was then evaluated for intensity, clarity, and brightness. When the temperature dropped to 40℃, 5-10 ml of liquor was drunk and swirled continuously by the tasters with their tongue tips. For the tea liquor taste, the aroma was expelled through the nose while the tip of the tongue was swirling the liquor (Wang & Ruan, 2009). The evaluation of the tea leaf then followed immediately.
According to the weighting of each sensory attribute for green tea provided by the Chinese National Standard (GB/T 23776-2018), the overall quality score was calculated by the following formula: appearance × 25% + aroma × 25% + flavor × 30% + liquor color × 10% + residue × 10%. The final scores were the mean values from the three experts. In this study, all PT samples were picked from similar tea trees and had the same tenderness; hence, their appearance scores were 20 points from all the panelists.
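The weighting formula above can be expressed as a small helper (a sketch; the function name is ours):

```python
def overall_quality_score(appearance, aroma, flavor, liquor_color, residue):
    # Attribute weighting for green tea stated in the text (GB/T 23776-2018):
    # appearance 25%, aroma 25%, flavor 30%, liquor color 10%, residue 10%.
    return (appearance * 0.25 + aroma * 0.25 + flavor * 0.30
            + liquor_color * 0.10 + residue * 0.10)
```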

| Quantitative prediction models with image information
Different models were constructed to predict the moisture content during the sun-drying process and the sensory scores after drying (as shown in the corresponding figure). More accurate prediction models have larger R² and RPD values and smaller RMSE values. Based on RPD, prediction models are classified into three categories: Category A (RPD > 2), Category B (1.4 < RPD < 2), and Category C (RPD < 1.4). Models falling into Categories A and B were presumed to have the potential to achieve satisfactory results (Chang et al., 2001).
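The three accuracy measures and the RPD categories can be computed as follows (a sketch; note that we take RPD as the sample standard deviation over the RMSE, a common simplification in which the RMSE stands in for the standard error of prediction):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    # Returns (RMSE, R-squared, RPD, RPD category) for one prediction model.
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - (resid ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()
    rpd = y_true.std(ddof=1) / rmse          # SD of reference values / RMSE
    cat = "A" if rpd > 2 else ("B" if rpd > 1.4 else "C")
    return rmse, r2, rpd, cat
```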

| Wavelet scattering
Wavelet scattering is a parameter-free, handcrafted convolution network originally proposed by Mallat (2012). Compared with wavelet scattering, the Fourier transform can also produce invariants, but its power spectrum depends only on second-order moments (Bruna & Mallat, 2013). Information on higher-order moments is captured by the scattering transform, which improves the discriminability of the scattering representation. Consequently, the scattering transform has advantages for texture feature extraction.

| CNN image feature extractor
With the help of GPU, the training process of CNN can be effectively accelerated. In the case of a large amount of training data input, compared with the generic (low-level) image feature extraction technique, CNN can extract high-level features in the pictures and achieve higher accuracy. Consequently, CNN was used in this study as the feature extractor to compensate for the inefficiency and low accuracy of generic methods.
Convolution is a shift-invariant operation that computes locally weighted combinations across the input image. The convolution layer is composed of several convolution kernels, which are used to compute different feature maps. Depending on the chosen weights, different input features are revealed.
Mathematically, the feature value at location (i, j) in the k-th feature map of the l-th layer, $Z^l_{i,j,k}$, is calculated by

$$Z^{l}_{i,j,k} = {W^{l}_{k}}^{\top} X^{l}_{i,j} + b^{l}_{k},$$

where $W^l_k$ and $b^l_k$ are the weight vector and bias term of the k-th filter of the l-th layer, and $X^l_{i,j}$ is the input patch centered at location (i, j) of the l-th layer. The kernel weight $W^l_k$ is shared across the entire visual field, which reduces model complexity and makes the CNN easier to train (Aloysius & Geetha, 2017).
In general, a traditional CNN consists of convolutional layers, activation layers, downsampling layers, fully connected (FC) layers, and a loss function. An FC layer is usually used to collect the feature information extracted in the filtering stage. The earlier layers of a CNN tend to learn low-level elements such as edges and contours, which are then combined by the later layers to recognize complex, high-level image features of task-specific objects (Dhillon & Verma, 2020). In short, with careful design, a good CNN structure can effectively turn complex information into simple features.
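For a single kernel, the feature-map formula above can be sketched as a plain NumPy loop; weight sharing means the same `W` and `b` are applied at every location (as in most CNN libraries, this is technically cross-correlation):

```python
import numpy as np

def conv2d(X, W, b):
    # Valid 2D convolution with one kernel: each output value is the locally
    # weighted combination Z[i, j] = sum(W * patch(i, j)) + b.
    kh, kw = W.shape
    H, Wd = X.shape
    Z = np.empty((H - kh + 1, Wd - kw + 1))
    for i in range(Z.shape[0]):
        for j in range(Z.shape[1]):
            Z[i, j] = (W * X[i:i + kh, j:j + kw]).sum() + b
    return Z
```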
In recent years, MobileNet V2 has become one of the most popular CNN models among researchers and practitioners. It is based on an inverted residual structure, which provides a more efficient and lightweight architecture (Sandler et al., 2018), and RexNet improves upon this design. As shown in Figure 5, RexNet was used as the CNN image feature extractor in this study.

Images of 546 samples and the corresponding moisture content labels were input to the RexNet in the training process. The PT images were augmented to increase the amount of training data; augmentation helps to improve the generalization of classifiers and reduce the possibility of overfitting. The augmentation methods in this study included combinations of adding white noise, random reflection, random rotation, and random scaling.
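The augmentation pipeline might be sketched as follows (a dependency-free approximation: the paper does not specify angle or scale ranges, so rotation is restricted here to multiples of 90° and scaling is implemented as a random center crop):

```python
import numpy as np

def augment(img, rng):
    # img: square H x W x 3 array. Applies white noise, random reflection,
    # random rotation (multiples of 90 deg), and random scaling via cropping.
    out = img + rng.normal(0.0, 0.01, img.shape)          # white noise
    if rng.random() < 0.5:
        out = out[:, ::-1]                                 # random reflection
    out = np.rot90(out, k=int(rng.integers(0, 4)))         # random rotation
    s = int(rng.integers(0, out.shape[0] // 4 + 1))        # random scale (crop)
    return out[s:out.shape[0] - s, s:out.shape[1] - s]
```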
In the feature extraction process, the weights of the RexNet layers were frozen. All pictures were passed through the CNN feature extractor, and the output vectors of the FC layer were obtained. Finally, the element-wise multiplication of the FC feature vector and the 546 moisture content labels was used as the CNN image features, as shown in Equation (3):

$$Z = (X \odot Y)^{\top}, \quad (3)$$

where Z is the CNN feature vector of a single image (1 × 546), X is the fully connected output vector (546 × 1), Y is the 546-element moisture content label vector (546 × 1), and ⊙ denotes element-wise multiplication.

FIGURE 5 Feature extraction process
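Under the vector shapes stated in the text (1 × 546 output from a 546 × 1 FC vector and a 546 × 1 label vector), the multiplication in Equation (3) is most naturally read as element-wise; a minimal sketch of that reading is:

```python
import numpy as np

def cnn_feature_vector(X, Y):
    # X: FC output vector (546 x 1); Y: moisture label vector (546 x 1).
    # Element-wise product, transposed into the 1 x 546 feature vector Z.
    return (X * Y).T
```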

| Neighborhood components analysis feature selection
The objective function for minimization is

$$\xi(w) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j \neq i} p_{ij}\, l(y_i, y_j) + \lambda \sum_{r=1}^{d} w_r^{2},$$

where w is the feature weight vector, $p_{ij}$ is the probability that sample j is picked as the reference neighbor of sample i (a softmax over the w-weighted distances), $l(y_i, y_j)$ is the loss between the corresponding labels, and λ is a regularization parameter that drives the weights of irrelevant features toward zero. The NCA method selects the feature variables with higher weights as the input of the prediction model, improving training speed while maintaining prediction accuracy.
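The idea behind NCA feature weighting can be illustrated with a toy diagonal NCA for regression (our simplified sketch using a numerical gradient, not the paper's implementation): a weight is learned per feature, and features that do not help predict the label are shrunk by the L2 penalty.

```python
import numpy as np

def nca_feature_weights(X, y, lam=0.05, lr=0.2, epochs=120, eps=1e-4):
    # Toy diagonal NCA for regression: minimize the expected leave-one-out
    # label loss under a softmax neighbour distribution, plus an L2 penalty.
    n, d = X.shape
    L = np.abs(y[:, None] - y[None, :])          # pairwise label loss l(y_i, y_j)

    def objective(w):
        D = ((X[:, None, :] - X[None, :, :]) ** 2 * w ** 2).sum(-1)
        np.fill_diagonal(D, np.inf)              # a point is never its own neighbour
        E = np.exp(-(D - D.min(axis=1, keepdims=True)))   # stabilized softmax
        P = E / E.sum(axis=1, keepdims=True)     # p_ij: neighbour probabilities
        return (P * L).sum(axis=1).mean() + lam * (w ** 2).sum()

    w = np.ones(d)
    for _ in range(epochs):                      # numerical-gradient descent
        g = np.zeros(d)
        for k in range(d):
            wp, wm = w.copy(), w.copy()
            wp[k] += eps
            wm[k] -= eps
            g[k] = (objective(wp) - objective(wm)) / (2 * eps)
        w -= lr * g
    return np.abs(w)
```

On synthetic data where the label equals the first feature, the first weight ends up dominating, which is exactly the behavior used here to rank features.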

| MLP and GRU predictor
The recurrent neural network (RNN) is an extension of the conventional feedforward neural network. However, the RNN cannot avoid the gradient explosion problem (Yu et al., 2019), which limits its application. The long short-term memory (LSTM) network overcomes this challenge, but its complex structure results in slow computation. In contrast, a GRU network has only two gate structures, an update gate and a reset gate. It reduces computation as much as possible while addressing the gradient explosion problem, and stacking multiple GRU layers can further increase its prediction ability. In recent years, many methods originally developed for deep CNNs have been reported, such as the ReLU activation function, the batch normalization layer, and the shortcut connection (He et al., 2016). As illustrated in Figure 6, these methods were used in this study to deepen the GRU model further while obtaining a smaller loss value.
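A single GRU step with its two gates can be sketched in NumPy (following the common Cho et al. formulation; gate conventions vary slightly between libraries, and the initialization below is illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    # One GRU time step: only an update gate z and a reset gate r,
    # versus the three gates plus cell state of an LSTM.
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"] + p["bz"])              # update gate
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"] + p["br"])              # reset gate
    h_cand = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])   # candidate state
    return (1 - z) * h + z * h_cand                               # interpolated state

def init_gru(n_in, n_hidden, seed=0):
    rng = np.random.default_rng(seed)
    p = {}
    for g in ("z", "r", "h"):
        p["W" + g] = rng.normal(0, 0.1, (n_in, n_hidden))
        p["U" + g] = rng.normal(0, 0.1, (n_hidden, n_hidden))
        p["b" + g] = np.zeros(n_hidden)
    return p
```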
To improve the prediction accuracy of the GRU, the rectified linear unit (ReLU) was used to clip negative values to zero and keep positive values unchanged (Hara et al., 2015). Activation functions are applied to the weighted sum of inputs and biases to decide whether a neuron fires. Because ReLU avoids exponential and division operations, it speeds up the overall computation of the neural network: it simply outputs 0 when X < 0 and a linear function when X ≥ 0, where X refers to the input.
A multilayer CNN is highly nonlinear, as it is a cascade of several nonlinear operations. Batch normalization was developed to improve the training of neural networks by stabilizing the distributions of layer inputs (Ioffe & Szegedy, 2015), and it has been adopted in most DL models as a default setting. Shortcut connections were first used in the residual network (He et al., 2016); they skip layers in the forward pass of an input. This milestone architecture addressed the slow training of deep neural networks and achieved better performance than comparable counterparts.
As shown in Figure 6, these components were combined to form the deepened GRU predictor used in this study.

| Moisture content prediction with image processing
Experiments were performed and moisture prediction models were established for the sun-drying process. The low-level image feature extractors (HSV color histogram, L*a*b* color moments, RGB color autocorrelogram, and wavelet scattering) and the high-level image feature extractor (CNN) were used to extract the color and textural features of the tea leaves. To simplify the modeling process and improve model performance, the NCA feature selection method was applied to select the variables essential for moisture prediction, and MLP and GRU were established as the moisture predictors. The trend curves of the color features varying with moisture content are shown in Figure 7a,b. In the sun-drying process, the tea leaves faded and their color changed gradually. To extract color features, the RGB images were converted into the HSV and L*a*b* color spaces, and (0, 1) normalization was conducted on the original data to investigate the dynamics of the color and texture features.
The correlation coefficients between color features and moisture contents were obtained from the Spearman two-tailed test (Zar, 1972). The coefficient of the blue channel's value is 0.094, which is not significant at the 0.05 level.

During the sun-drying process, the leaf texture also changed from a spreading to a shrinking form. The trend curves of the texture features varying with moisture content are shown in Figure 7c,d. The texture features were extracted using the gray-level co-occurrence matrix (Haralick et al., 1973). The Spearman correlation coefficient of homogeneity is 0.095, which is not significant at the 0.05 level. The coefficients of contrast, energy, correlation, sum average, and sum variance are −0.354, −0.286, 0.558, 0.577, and 0.580, respectively, which are significant at the 0.01 level. As shown in Figure 7, a* presented an upward trend with decreased moisture content, whereas R, G, H, S, V, L, and b* demonstrated a declining linear trend. Contrast, energy, correlation, sum average, and sum variance of the texture features presented a decreasing trend with reducing moisture content. Consequently, the color and texture features of PT leaves changed considerably with the loss of water during the sun-drying process, which indicated that the moisture content of tea leaves could be predicted from color and textural features.
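For reference, the Spearman coefficient used above is simply the Pearson correlation of the rank-transformed variables; a minimal version (our sketch, with no tie handling) is:

```python
import numpy as np

def spearman(x, y):
    # Spearman rho for tie-free data: Pearson correlation of the ranks.
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return (rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry))
```

Any monotonically increasing relationship (e.g. y = x³) yields a coefficient of exactly 1, which is why Spearman suits the nonlinear feature-moisture trends here.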

| Generic (low-level) image feature extraction
The extraction of low-level image features requires fewer computing resources, but it can easily cause insufficient feature extraction and reduced prediction accuracy. To extract the generic (low-level) features, the HSV color histogram, L*a*b* color moments, and RGB color autocorrelogram of the tea leaf images during the sun-drying process were calculated, in the following steps: 1. The hue, saturation, and value (HSV) histogram of the tea leaf images was calculated. The photos were transformed into the HSV color space with 8 × 2 × 2 equal bins, giving a 1 × 32 HSV histogram feature vector, as suggested by the literature (Barman & Choudhury, 2020).
2. Color moments are a practical, robust method to describe image color features. The mean, standard deviation, and skewness of the L*, a*, and b* channels of the tea leaf images were computed to form a 1 × 9 vector, as suggested by the same literature (Barman & Choudhury, 2020).
3. The color autocorrelogram was used to find the spatial correlation of identical pixels. The input images were transformed into 64 colors in RGB as a 4 × 4 × 4 color space and computed into a 1 × 64 color autocorrelogram feature with a distance set of 1, 3, 5, 7, as also suggested by the literature (Barman & Choudhury, 2020).

FIGURE 7 Changes in color (a and b) and texture (c and d) features with the variation of moisture contents

As illustrated in Figure 8, and referring to Section 2.6.1, the mean values along the second and third dimensions were calculated to obtain 391-element feature vectors for each image. This resulted in a significant reduction of data, from 65,536 elements down to 391. The wavelet scattering method was applied for the extraction of textural features. As illustrated in Figure 7, the textural features extracted by wavelets at multiple angles show significant differences, allowing wavelet scattering to analyze the texture features of tea leaf images comprehensively.
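The color-moment descriptor in step 2 above can be sketched directly (our implementation; the input is assumed to be a channels-last image already converted to the desired color space, e.g. L*a*b*):

```python
import numpy as np

def color_moments(img):
    # img: H x W x 3 float array in an arbitrary color space.
    # Returns the 1 x 9 vector [mean, std, skewness] per channel.
    feats = []
    for c in range(3):
        ch = img[..., c].ravel()
        mu, sd = ch.mean(), ch.std()
        skew = ((ch - mu) ** 3).mean() / (sd ** 3 + 1e-12)  # guard sd = 0
        feats.extend([mu, sd, skew])
    return np.array(feats)
```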

| CNN (high-level) image feature extraction
The extraction of CNN image features has benefited from the development of GPUs. With GPU assistance, and compared with the low-level extraction methods, the training time of the CNN extractor is shortened and the inference time per image is reduced. The tea leaves with different moisture contents were photographed, and the images were used as the input of the CNN extractor to extract the CNN (high-level) image features. The CNN extractor was mainly composed of a RexNet CNN.
As described in Section 2.6.2, the element-wise products of the FC vector and the moisture contents were used as CNN features. The training options of the CNN are shown in Table 1. Too few training epochs yield a poor classification effect, whereas too many waste time. In this study, only part of the data was used to determine a suitable number of training epochs; it was observed that 300 epochs allowed RexNet to achieve stable accuracy, so 300 training epochs were selected after comprehensive consideration.
The 10-fold cross-validation method was applied to assess the feature extraction capability of the RexNet model. After the 10-fold cross-validation, the accuracy was 99.45% on the training set and 98.82% on the validation set. However, the feature extraction process was implemented only on the validation set. In this study, all experiments were carried out under the PyTorch 1.4 framework with the Ubuntu 18.04 operating system.
To illustrate the information essential for determining the category, Grad-CAMs and saliency maps were produced. The feature maps in Table 2 show that the first convolutional layer focused on extracting the edges of leaves and branches in the tea leaf pictures. As shown in the feature maps, branches and leaves are clearly distinguished, and residues that cannot be used for classification are identified at the same time, which demonstrates the advantages of CNN as an image feature extractor. As illustrated in the Grad-CAM and saliency map, RexNet tended to analyze the leaf area changes caused by water loss when predicting the moisture contents of different tea leaves. As shown in Grad-CAM (b) and saliency map (b), since the shape of the branches is not sensitive to changes in moisture content, water loss has less influence on the shape of branches than on that of leaves. Consequently, the branch area appears blue in the Grad-CAM and black in the saliency map, while the leaf area appears red in the Grad-CAM and white in the saliency map, which shows that RexNet can reflect the moisture content of tea leaves. In sample (c), the red area in the Grad-CAM was smaller than the white area in the saliency map and the correct regions in the original picture; hence, the red area in Grad-CAM (c) should be larger.
As illustrated in the literature (Ju et al., 2021), the main reason for this phenomenon may be as follows: in Grad-CAM, the heat map is generated from features in the shallow layers of the CNN, which causes the presentation of high semantic features and the loss of part of the spatial information.

The results of the NCA feature selection are illustrated in Figure S1. The top 10% of features were selected from the original datasets. The numbers of image features were significantly decreased, by 619 for the low-level image features and by 490 for the high-level image features, demonstrating the advantages of the NCA method. According to the results, color moments, the color histogram, and wavelet scattering had higher priority in predicting the moisture content of the tea leaves. However, the CNN features of samples with moisture content below 10% or above 60% were relatively hard to select, indicating that the CNN features for tea moisture between 10% and 60% have a more significant impact on the accuracy of moisture prediction.
The datasets selected by NCA were used to develop the moisture content prediction models, accelerating training and improving model accuracy. The MLP is a multilayer feedforward network. It has a simple structure and can be trained quickly, but its prediction performance is usually worse than that of the GRU. The data were divided into training and testing datasets using a random number generator to implement 10-fold cross-validation in the neural network models. The hidden layer size was 10. Using a gradient-descent optimization algorithm, the network was trained for 200 epochs to achieve a stable loss value and optimize model performance. The same method used in Section 2.6.4 was also applied to determine the GRU training parameters, which are shown in Table 3.
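The 10-fold split used throughout can be sketched as follows (a generic sketch: indices are shuffled once with a seeded generator, then each fold serves as the test set exactly once):

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    # Shuffle sample indices and yield (train, test) index arrays for k-fold CV.
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```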
Two image feature extractors (generic and CNN feature extractor) and two predictors (MLP and GRU) were combined to form four models. Only the image information was applied for moisture content prediction during the whole sun-drying process. Moisture content prediction models were built in three situations, including the prediction for each batch of tea sample, tea sample in the same batch but at different sampling periods, and the overall samples.
The R², RMSE, and RPD values of each model were used to measure prediction accuracy.
As illustrated in Figures S2 and S3, the point with the most significant prediction error in the CNN-GRU model can still meet a relatively high moisture prediction accuracy for each batch of tea. The batch with the most significant error is shown in Figure S2, and the errors of the other batches were smaller than that of the 74th batch.
As illustrated in Figures S4 and S5, the high-level image features + GRU model shows a relatively larger prediction error at some sampling times but still achieves satisfactory prediction accuracy (R² > .9 and RPD > 2). Therefore, the high-level image features + GRU model is suitable for detecting moisture contents at the various sampling times.
For moisture content prediction for the overall samples, the data of the above two parts were merged and predictions were conducted. As shown in Figure S6 and Table 4, the combination of CNN image features and GRU predictor achieved the best performance, followed by high-level image features + MLP, low-level image features + GRU, and low-level image features + MLP.
In summary, as illustrated in Table 4, although the RMSE values differ across the above three cases of moisture content prediction, the RMSE rankings of the four models were consistent. The R² values are >.9 and the RPD values are >2. Consequently, the CNN-GRU model can achieve good prediction accuracy under various conditions.

| Sensory quality evaluation of sun-dried PT
Experimental data were selected and analyzed to build more effective models for sensory quality prediction. As reported in the literature (Chan et al., 2009; Roshanak et al., 2016), during the sun drying of PT the changes in chemical substances are closely related to the final sensory quality, so the moisture content prediction models are helpful for evaluating the sensory scores. The image features (reflecting moisture content) and the EP were input into the MLP and GRU prediction models, and the resulting RMSE values were compared to select the most effective combination, so as to reduce the number of input parameters, improve model accuracy, and find the EP that most affect the sensory quality.

TABLE 3 Training parameters of the GRU model

Validation method            Tenfold cross-validation
Optimizer                    Adaptive moment estimation (Adam)
Total training epochs        300
Number of GRU hidden units   100
Number of shortcut blocks    2

| Correlation between image information, EP, and sensory scores
As mentioned in Section 3.1.1, the correlation coefficients between image information, EP, and sensory scores were calculated with the Spearman two-tailed test. The larger the correlation coefficient is, the brighter the grid appears. The EP used here are the average values of the whole sun-drying process.
As illustrated in Figure 9, air humidity is significantly and negatively correlated with the sensory scores, while air temperature, global radiation, light intensity, wind speed, and ultraviolet radiation are significantly and positively correlated with them. The reasons for the above phenomena may be as follows: (1) excessively high air humidity reduced the quality of PT in the sun-drying process, whereas a reasonable increase in air temperature, global radiation, light intensity, wind speed, and ultraviolet radiation helped to improve the sensory scores.
(2) Image features and EP can reflect the changes in sensory scores.
(3) The correlations between EP and image features are not significant. The reason might be that the image features were affected by multiple EP in the sun-drying process, so a single environmental parameter could not fully reflect the changes in image features.
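The Spearman coefficient used in the analysis above is simply the Pearson correlation computed on rank-transformed variables, which makes it sensitive to any monotonic (not just linear) relationship. A minimal numpy sketch (assuming no tied values; ties would require averaged ranks):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Assumes no tied values; with ties, averaged ranks should be used.
    """
    def rank(v):
        order = np.argsort(v)
        r = np.empty(len(v))
        r[order] = np.arange(1, len(v) + 1)  # rank 1 = smallest value
        return r

    rx, ry = rank(np.asarray(x, float)), rank(np.asarray(y, float))
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))
```

Any strictly increasing relationship (even a nonlinear one) yields rho = 1, and any strictly decreasing one yields rho = -1.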

| Image features' extraction and sensory scores' prediction
In order to fit the sensory scores of the sun-dried PT, both EP and image information were used to evaluate the aroma, flavor, liquor color, residue, and total score of sun-dried PT. After adding various combinations of EP to the models, the significance of each EP was measured. By comparing the rate of change of the RMSE, the most effective combination of EP is proposed, which uses fewer inputs to achieve higher accuracy.
The CNN-GRU model that achieved the highest accuracy in moisture content prediction was first used to evaluate the sensory scores without incorporating EP. As illustrated in Figure S7, the sensory scores could not be evaluated when only the image information was input into the GRU model. Subsequently, EP (air humidity, air temperature, global radiation, light intensity, wind speed, and ultraviolet radiation) were added as GRU input variables, and the NCA feature selection method was used to decrease the number of inputs and improve model performance.
As illustrated in Figure 11a,b, after the EP were added, the NCA weights of air temperature, air humidity, light intensity, global radiation, wind speed, and ultraviolet radiation were higher than that of any single image feature. Furthermore, as illustrated in Figure S8, combinations of different EP were separately input into the CNN-GRU model to test their predictive ability. As shown in Figures S9-S11, the models were ranked from highest to lowest accuracy, ending with high-level image features + EP + MLP and low-level image features + EP + MLP. This sequence differs from the ranking obtained for moisture content prediction. The possible reason is as follows: when predicting the moisture content, the color and texture of the images were closely related to water, so the degree of image feature extraction had a greater impact on model performance; hence, whether CNN was used as an image feature extractor was a crucial factor. In contrast, when evaluating the sensory scores, the EP played a more critical role than the image information, so the advantages of the GRU were more easily reflected.
However, in all cases, the CNN-GRU model achieved the highest prediction accuracy, which demonstrates the superiority of the deep learning-based moisture content prediction and sensory quality determination models. To achieve high prediction accuracy while minimizing the number of required EP, the highest-accuracy points for different numbers of EP were analyzed, as shown in Figures S9-S11 and Table 5.
For example, when only three EP were input to the model, the CNN-GRU model using air temperature + air humidity + light intensity achieved the lowest RMSE for the evaluation of aroma and flavor, whereas for liquor color, residue, and total score the best combination was air temperature + wind speed + ultraviolet radiation.
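The search described above, finding the k-parameter combination of EP with the lowest validation RMSE, can be sketched as an exhaustive subset search. The `rmse_of` interface below is hypothetical (standing in for training and validating a model on each combination):

```python
import itertools

EP = ["air temperature", "air humidity", "light intensity",
      "global radiation", "wind speed", "ultraviolet"]

def best_subset(rmse_of, k):
    """Return the k-parameter EP combination with the lowest RMSE.

    `rmse_of` maps a tuple of EP names to the validation RMSE of a
    model trained with those inputs (hypothetical interface).
    """
    return min(itertools.combinations(EP, k), key=rmse_of)
```

With six EP and k = 3 this evaluates only C(6, 3) = 20 combinations, so exhaustive search is cheap here; for many more parameters a greedy or NCA-guided selection would be preferable.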
The RMSE values and change rates of the points in Tables S1-S3 are shown in Table 6. As illustrated in Table 7, the points with the fastest RMSE reduction were selected as the most efficient combinations of EP. The combinations in Table 7 were used to evaluate the sensory scores of PT after the sun-drying process, and the scattergram results are shown in Figure 12. It is evident that the recommended environmental parameters can accurately evaluate the aroma, flavor, liquor color, residue, and total score of PT. This conclusion is important for optimizing the sun-drying process to improve the sensory quality and ensure the consistency of the sensory scores of different batches of tea products.