Fast detection of moisture content and freshness for loquats using optical fiber spectroscopy

Abstract Detection of the moisture content (MC) and freshness of loquats is crucial for achieving optimal taste and economic efficiency. Traditional methods for evaluating the MC and freshness of loquats are destructive and time-consuming. To investigate the feasibility of rapid and non-destructive detection of the MC and freshness of loquats, optical fiber spectroscopy in the range of 200–1000 nm was used in this study. The full spectra were pre-processed using the standard normal variate method, and the effective wavelengths were then selected using the competitive adaptive reweighted sampling (CARS) and random frog algorithms. Based on the selected effective wavelengths, prediction models for MC were developed using partial least squares regression (PLSR), multiple linear regression, extreme learning machine, and back-propagation neural network. Furthermore, freshness level discrimination models were established using simplified k nearest neighbor, support vector machine (SVM), and partial least squares discriminant analysis. Among the prediction models, the CARS-PLSR model performed best for predicting the MC, with $R^2_P$ and RPD values of 0.84 and 2.51, respectively. Additionally, the CARS-SVM model achieved the best discrimination performance, with 100% accuracy for both the calibration and prediction sets. The results demonstrate that optical fiber spectroscopy is an effective tool for rapid detection of the MC and freshness of loquats.

water loss, leading to wilting, shriveling, and browning, which reduces their shelf-life and causes significant economic loss (Lufu et al., 2020; Pareek et al., 2014; Shah et al., 2023; Tian et al., 2011). Therefore, assessing the moisture content (MC) of loquats is a crucial step in ensuring their freshness and commercial value.
Detection of the MC by traditional instrumental techniques is often destructive. Consequently, the development of non-destructive techniques is essential for determining the MC and freshness level of loquats (Li, Huang, et al., 2022; Shang et al., 2023). Non-destructive detection methods that combine spectroscopic techniques with chemometrics offer several advantages, including fast analysis, no environmental pollution, and no damage to the sample (Li, Zhang, & Wang, 2022). In recent years, this approach has gained widespread use in assessing the quality of fresh fruit (Ye et al., 2022; Yildiz et al., 2022). Researchers have conducted studies on quality detection in various fruits, including kiwifruit (Xu et al., 2023), apple (Pissard et al., 2021), grape (Kanchanomai et al., 2020), watermelon (Ibrahim et al., 2022), and strawberry (Zhao et al., 2023), among others. Minas et al. (2021) accurately assessed peach internal quality and physiological maturity using near infrared spectroscopy (NIRS), which helps facilitate the wider application of NIRS throughout the tree fruit supply chain. Sun et al. (2020) developed a non-destructive method for detecting blackheart in pears and determining soluble solids content. The partial least squares discriminant analysis model achieved a discrimination rate of 96.88% for identifying blackheart pears, while the partial least squares model calibrated with healthy pears exhibited improved performance, with a root mean square error of prediction of 0.45. Alhamdan and Atia (2017) investigated the application of near infrared spectroscopy for assessing the quality of Barhi dates at different maturity stages. The proposed models achieved coefficients of determination of 0.97, 0.94, and 0.64 for total soluble solids, MC, and b* color, respectively. Tantinantrakun et al. (2023) demonstrated that both transmittance short wavelength near infrared spectroscopy (SW-NIRS) and reflectance near infrared hyperspectral imaging (NIR-HSI) could determine the maturity index in pineapples. The SW-NIRS achieved a cross-validated coefficient of determination ($R^2_{cv}$) of 0.70, while the NIR-HSI achieved an $R^2_{cv}$ of 0.72. Specifically, optical fiber spectroscopy, with its low cost and rapid measurement, has become a widely used technology for assessing fruit quality (Li, Sun, & Cheng, 2016). Tewari et al. (2008) combined optical fiber spectroscopy with machine learning algorithms to determine the origin and sugars of citrus fruits. Kawano et al. (1992) investigated the feasibility of fiber optics in interactance mode to predict sugar content in intact peaches. Guthrie and Walsh (1997) proposed modified partial least squares (MPLS) regression for analyzing pineapple juice Brix and mango flesh dry matter by near infrared spectroscopy. The MPLS models achieved a multiple coefficient of determination of 0.91 for pineapple juice Brix and 0.98 for mango flesh dry matter. These studies indicated the feasibility of determining fruit quality using optical fiber spectroscopy techniques. However, no studies have reported the simultaneous determination of the MC and freshness of loquats using optical fiber spectroscopy.
In this study, optical fiber spectroscopy combined with chemometrics was applied to explore the feasibility of determining the MC and freshness of loquats. The main objectives were (1) to establish prediction models for determining the MC of loquats, including partial least squares regression (PLSR), multiple linear regression (MLR), extreme learning machine (ELM), and back-propagation (BP) neural network; (2) to develop discrimination models for different freshness levels of loquats, including simplified k nearest neighbor (SKNN), support vector machine (SVM), and partial least squares discriminant analysis (PLS-DA); and (3) to select feature variables from the full spectra using competitive adaptive reweighted sampling (CARS) and random frog (RF) methods for designing simplified models.

| Sample preparation
The fresh loquat samples were harvested from the commercial or-
The light source was preheated for 30 min prior to spectral acquisition. Preliminary experiments were conducted to adjust the acquisition parameters, including integration time and number of averaged scans, based on the spectral intensity of the standard white reflectance board. The system parameters were an integration time of 110 ms, 8 averaged scans, and a sliding-average width of 1. The reflection probe was attached to the RPH-1 using the RPH-ADP. The surface of the RPH-1 was positioned 1 cm away from the loquat sample. Subsequently, the tested loquat sample was placed tightly on the surface of the RPH-1, and spectra were collected from equatorial positions on each sample.

| Moisture content measurement
The loquats were cut into slices approximately 3 mm thick using a sharp knife. The slices were evenly distributed in a weighing dish, and the initial weight of each sample was recorded using a high-precision balance (accuracy ±0.001 g). The weighing dish was then placed in a drying oven (101-2, Tianjin Taisite Instrument Co., Ltd., China) and heated at 75°C for 8 h. After the heating period, the dish was cooled to room temperature and weighed. The fruit slices were then heated for a further 2 h, cooled, and weighed again. The process was repeated until the absolute difference between two consecutive weighings did not exceed 0.001 g, signifying that the fruit had reached constant weight. The MC was calculated by the following equation (Crichton et al., 2018):

$$X = \frac{w_1 - w_2}{w_1 - w_0} \times 100\%$$

where $X$ represents the MC of loquats (%), $w_0$ denotes the weight of the empty weighing dish (g), $w_1$ is the weight of the weighing dish and the sample before oven drying (g), and $w_2$ is the weight of the weighing dish and the sample after oven drying (g).
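The oven-drying calculation can be illustrated with a short sketch. It assumes the wet-basis form implied by the definitions of w0, w1, and w2, namely (w1 − w2)/(w1 − w0) × 100; the function names and the example weights are hypothetical and are not measured data from this study.

```python
def moisture_content(w0, w1, w2):
    """Wet-basis moisture content (%) from weights in grams.

    w0: empty weighing dish
    w1: dish + sample before oven drying
    w2: dish + sample after drying to constant weight
    """
    fresh_mass = w1 - w0
    if fresh_mass <= 0:
        raise ValueError("w1 must be greater than w0")
    return (w1 - w2) / fresh_mass * 100.0


def reached_constant_weight(weighings, tol=0.001):
    """True once the last two consecutive weighings differ by at most tol grams."""
    return len(weighings) >= 2 and abs(weighings[-1] - weighings[-2]) <= tol


# Hypothetical weights, for illustration only.
mc = moisture_content(w0=10.000, w1=30.000, w2=12.800)
done = reached_constant_weight([12.8050, 12.8004, 12.8000])
```

The stopping rule mirrors the 0.001 g criterion described above; drying would continue in 2 h steps while `reached_constant_weight` returns `False`.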

| Spectra pre-processing algorithm
Spectral data acquired from the spectrometer contain not only valuable sample information but also background information and noise. To eliminate interference and enhance predictive ability, it is essential to pre-process spectral data before modeling (Zhang et al., 2022). In this study, the obtained reflectance spectra were pre-processed using standard normal variate (SNV), a row-oriented transformation that centers and scales individual spectra (Guo et al., 2016). SNV can mitigate the influence of particle size, surface scattering, and optical path variations on the spectra (Xiao et al., 2022). The spectra are pre-processed using the following formula:

$$Z_{ij} = \frac{x_{ij} - \bar{x}_i}{S_i}, \qquad \bar{x}_i = \frac{1}{p}\sum_{j=1}^{p} x_{ij}, \qquad S_i = \sqrt{\frac{\sum_{j=1}^{p} (x_{ij} - \bar{x}_i)^2}{p - 1}}$$

where $Z_{ij}$ represents the spectrum after pre-processing, $x_{ij}$ is the raw spectrum, $\bar{x}_i$ represents the mean of the $i$th spectrum, $S_i$ represents the standard deviation of the $i$th sample spectrum, and $p$ is the number of spectral points.
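As a minimal sketch of the SNV transformation described above, the following function centers each spectrum on its own mean and scales it by its own standard deviation. The use of the p − 1 denominator is the conventional choice and is an assumption here; the example spectra are made up for illustration.

```python
import math


def snv(spectrum):
    """Standard normal variate for one spectrum (a list of p reflectance values)."""
    p = len(spectrum)
    mean = sum(spectrum) / p
    # Sample standard deviation of this single spectrum (p - 1 denominator).
    sd = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (p - 1))
    if sd == 0:
        raise ValueError("flat spectrum: standard deviation is zero")
    return [(x - mean) / sd for x in spectrum]


# Two made-up spectra that differ only by a gain and an offset,
# mimicking scattering and path-length effects.
a = [0.10, 0.20, 0.40, 0.30]
b = [2 * x + 0.05 for x in a]
z_a, z_b = snv(a), snv(b)
```

Because SNV removes per-spectrum offset and scale, `z_a` and `z_b` coincide up to rounding, which is the behavior that reduces the spread between spectra after pre-processing.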

| Kennard-Stone algorithm
In this study, samples were divided into a calibration set and a prediction set using the Kennard-Stone (KS) algorithm. The KS algorithm starts by selecting the pair of samples with the largest Euclidean distance. Subsequently, for each remaining sample, the distance to its nearest already-selected sample is computed, and the sample for which this minimum distance is largest is added to the selected set. This process continues until the desired number of samples is reached (Wang & Wang, 2022).
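The selection rule can be sketched as follows. This is an illustrative implementation of the standard Kennard-Stone max-min procedure on a small made-up data set, not the code used in this study; the function name and the one-dimensional toy samples are assumptions.

```python
import math


def kennard_stone(X, n_select):
    """Return the indices of n_select rows of X chosen by Kennard-Stone."""
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(X)")
    dist = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Seed with the two samples that are farthest apart.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda ij: dist[ij[0]][ij[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    while len(selected) < n_select:
        # For each candidate, find the distance to its nearest selected sample,
        # then take the candidate for which that distance is largest.
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Toy one-variable "spectra"; selected indices form the calibration set,
# the rest would form the prediction set.
X = [[0.0], [1.0], [2.0], [10.0], [5.0]]
calibration_idx = kennard_stone(X, 4)
prediction_idx = [k for k in range(len(X)) if k not in calibration_idx]
```

Since the extremes are picked first, the calibration set tends to span a range at least as wide as the prediction set.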

| Effective variables selection algorithms
Each spectrum within the dataset consists of 1024 variables. Several studies have demonstrated that efficient wavelength selection can simplify calibration modeling and enhance the accuracy and robustness of the model (Li & Chen, 2017). This study employed CARS and RF to select effective variables.
The competitive adaptive reweighted sampling algorithm (Tang et al., 2021) is a variable selection method that applies the principle of "survival of the fittest". It employs the adaptive reweighted sampling technique to address collinear and uninformative variables. The variables are evaluated based on the absolute values of their PLSR regression coefficients: variables with larger absolute coefficients are retained, while those with smaller coefficients are eliminated during the selection process. The optimal subset of variables is determined through cross-validation, by minimizing the root mean square error of cross-validation (RMSECV) (Li et al., 2021).
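One ingredient of CARS can be sketched in a few lines: the forced shrinking of the variable set from run to run. The sketch below assumes the exponentially decreasing function commonly used in CARS, r_i = a·exp(−k·i), with a and k chosen so that all p variables are kept in the first run and two in the last; this form is not stated in the text above and is taken as an assumption. The Monte Carlo sampling, PLSR fitting, adaptive reweighted sampling, and RMSECV evaluation steps are deliberately omitted, and both function names are hypothetical.

```python
import math


def cars_retention_schedule(n_variables, n_runs):
    """Number of variables kept after each sampling run (run 1 .. n_runs)."""
    if n_variables < 2 or n_runs < 2:
        raise ValueError("need at least 2 variables and 2 runs")
    a = (n_variables / 2) ** (1 / (n_runs - 1))
    k = math.log(n_variables / 2) / (n_runs - 1)
    return [max(2, round(n_variables * a * math.exp(-k * i)))
            for i in range(1, n_runs + 1)]


def keep_largest(coefficients, n_keep):
    """Indices of the n_keep variables with the largest absolute coefficient."""
    order = sorted(range(len(coefficients)),
                   key=lambda j: abs(coefficients[j]), reverse=True)
    return sorted(order[:n_keep])


# 1024 spectral variables and 50 sampling runs, as used in this study.
schedule = cars_retention_schedule(n_variables=1024, n_runs=50)
# Hypothetical regression coefficients for four variables.
kept = keep_largest([0.1, -0.9, 0.5, -0.05], n_keep=2)
```

The schedule falls steeply in the early runs and flattens later, which corresponds to a fast coarse selection followed by a slower refined selection.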
The RF algorithm was initially proposed for the analysis of gene expression data in disease studies. It shares similarities with reversible jump Markov chain Monte Carlo: it simulates a Markov chain in the model space and calculates variable selection probabilities from its steady-state distribution. The more important a variable is to the model, the higher its probability of being selected. Consequently, all variables can be ranked by selection probability, and those with the highest probabilities are chosen as characteristic wavelengths. To mitigate the influence of random factors, multiple runs are necessary, and the results are tallied (Zhang et al., 2019).

| Quantitative modeling and evaluation
Four quantitative models, namely PLSR, MLR, ELM, and BP neural network, were utilized to predict the MC of loquats using the selected characteristic spectra.
Partial least squares regression is a reliable linear method widely used for building prediction models of quality parameters in food products. It can handle multicollinearity among variables and cases where the number of variables exceeds the number of samples (Shao et al., 2022). MLR is a statistical method used for regression analysis, offering the advantages of simplicity, low computational complexity, and high fitting accuracy (McCann et al., 2010). It establishes a linear regression model relating two or more explanatory variables to a response variable (Kamruzzaman et al., 2013). The ELM algorithm is designed for training single hidden layer feedforward neural networks (Wang et al., 2022). It exhibits rapid learning and superior generalization performance compared to traditional feedforward neural network algorithms (Li et al., 2019). The BP neural network is a multilayer feedforward neural network trained iteratively using the error back-propagation algorithm. Classical BP neural networks consist of input, hidden, and output layers (Qi et al., 2021). These networks employ forward propagation and backward error propagation to calculate activation values and update the weights of neurons in each layer (Buscema, 1998).
The performance of the calibration model was assessed using the calibration determination coefficient ($R^2_C$) and the root mean square error of calibration (RMSEC). To evaluate the prediction ability of the model, the prediction determination coefficient ($R^2_P$), the root mean square error of prediction (RMSEP), and the residual prediction deviation (RPD) were calculated. Usually, a well-performing model exhibits higher values of $R^2_C$, $R^2_P$, and RPD, and lower values of RMSEC and RMSEP. A model is considered poor when the RPD is below 1.5, whereas an RPD ranging from 1.5 to 2 indicates moderate performance. An RPD between 2 and 2.5 suggests good performance, while an RPD exceeding 2.5 indicates excellent performance (Askari et al., 2015).

$$R^2_C = 1 - \frac{\sum_{i=1}^{n_c} (y_{cal,i} - y_{act,i})^2}{\sum_{i=1}^{n_c} (y_{act,i} - y_{mean})^2}, \qquad RMSEC = \sqrt{\frac{1}{n_c} \sum_{i=1}^{n_c} (y_{cal,i} - y_{act,i})^2}$$

$$R^2_P = 1 - \frac{\sum_{i=1}^{n_p} (y_{pre,i} - y_{act,i})^2}{\sum_{i=1}^{n_p} (y_{act,i} - y_{mean})^2}, \qquad RMSEP = \sqrt{\frac{1}{n_p} \sum_{i=1}^{n_p} (y_{pre,i} - y_{act,i})^2}$$

$$RPD = \frac{SD}{RMSEP}$$

where $n_c$ is the number of samples in the calibration set and $n_p$ is the number of samples in the prediction set; $y_{act}$ and $y_{mean}$ are the reference values and their mean, respectively; $y_{cal}$ and $y_{pre}$ are the predicted values in the calibration and prediction sets, respectively; and SD is the standard deviation of the reference values in the prediction set.
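The evaluation indices can be computed with a few helper functions. This is a sketch under stated assumptions: the determination coefficient is taken as 1 minus the ratio of residual to total sum of squares, SD uses the n − 1 denominator, and the handling of the exact boundary values in `rpd_grade` is a choice made here. The function names and the example values are hypothetical.

```python
import math


def rmse(y_act, y_hat):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(sum((a - h) ** 2 for a, h in zip(y_act, y_hat)) / len(y_act))


def r_squared(y_act, y_hat):
    """Determination coefficient: 1 - SS_residual / SS_total."""
    y_mean = sum(y_act) / len(y_act)
    ss_res = sum((a - h) ** 2 for a, h in zip(y_act, y_hat))
    ss_tot = sum((a - y_mean) ** 2 for a in y_act)
    return 1 - ss_res / ss_tot


def rpd(y_act, y_hat):
    """Residual prediction deviation: SD of reference values over RMSE."""
    n = len(y_act)
    y_mean = sum(y_act) / n
    sd = math.sqrt(sum((a - y_mean) ** 2 for a in y_act) / (n - 1))
    return sd / rmse(y_act, y_hat)


def rpd_grade(value):
    """Map an RPD value onto the performance classes quoted in the text."""
    if value < 1.5:
        return "poor"
    if value <= 2.0:
        return "moderate"
    if value <= 2.5:
        return "good"
    return "excellent"


# Made-up reference and predicted values, for illustration only.
y_act = [1.0, 2.0, 3.0, 4.0]
y_pre = [2.0, 2.0, 3.0, 3.0]
scores = (r_squared(y_act, y_pre), rmse(y_act, y_pre), rpd(y_act, y_pre))
```

Applied to a prediction set, the three values in `scores` correspond to $R^2_P$, RMSEP, and RPD.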

| Qualitative modeling and evaluation
Three qualitative models, namely SKNN, SVM, and PLS-DA, were established to identify freshness levels of loquats using the selected characteristic spectra.
Simplified k nearest neighbor operates by first computing the center of gravity of each category in the calibration set and then calculating the distance between each sample in the prediction set and the center of gravity of each category. The smaller the distance, the more likely the sample belongs to that category. SVM is a popular algorithm widely employed in pattern recognition (Ning et al., 2017). Its objective is to build a hyperplane or set of hyperplanes in a high-dimensional space to achieve optimal separation between different sample classes (Sanz et al., 2016). PLS-DA combines the properties of PLS regression with a classification technique in order to establish a relationship between the spectral data variables in the X block and the represented classes in the y vector (Bonifazi et al., 2019).
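The SKNN rule described above amounts to a nearest-centroid classifier, which can be sketched as follows. The Euclidean distance, the function names, and the two-variable toy data are assumptions made for illustration; the study's own implementation may differ.

```python
import math
from collections import defaultdict


def class_centroids(X, labels):
    """Center of gravity (per-variable mean) of each class in the calibration set."""
    groups = defaultdict(list)
    for row, label in zip(X, labels):
        groups[label].append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict_nearest_centroid(centroids, sample):
    """Assign the sample to the class whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(centroids[label], sample))


# Toy calibration data with two freshness levels (hypothetical values).
X_cal = [[1.0, 1.0], [1.2, 0.8], [3.0, 3.0], [3.2, 2.8]]
y_cal = ["Level I", "Level I", "Level IV", "Level IV"]
centroids = class_centroids(X_cal, y_cal)
predicted = [predict_nearest_centroid(centroids, s)
             for s in ([1.0, 0.9], [2.9, 3.1])]
```

Only one distance per class is computed for each new sample, which is what makes this variant simpler than a full k nearest neighbor search over all calibration samples.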
A confusion matrix was used to evaluate the performance of the discrimination models. The discrimination rate was also used to estimate the performance of the discrimination models, as follows (Teye et al., 2013):

$$I_R = \frac{N_1}{N_2} \times 100\%$$

where $I_R$ represents the discrimination rate, $N_1$ represents the number of correctly discriminated samples, and $N_2$ represents the total number of samples.
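Both evaluation tools can be sketched together. The discrimination rate is taken here as the number of correctly discriminated samples divided by the total number of samples, expressed in percent; the labels below are made up for illustration and are not results from this study.

```python
def confusion_matrix(y_true, y_pred, levels):
    """Rows are true levels, columns are predicted levels."""
    index = {level: i for i, level in enumerate(levels)}
    matrix = [[0] * len(levels) for _ in levels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def discrimination_rate(matrix):
    """Percentage of samples on the diagonal (correctly discriminated)."""
    n_correct = sum(matrix[i][i] for i in range(len(matrix)))
    n_total = sum(sum(row) for row in matrix)
    return n_correct / n_total * 100.0


levels = ["I", "II", "III", "IV"]
# Hypothetical labels: one Level III sample is predicted as Level II.
y_true = ["I", "I", "II", "II", "III", "III", "IV", "IV"]
y_pred = ["I", "I", "II", "II", "III", "II", "IV", "IV"]
cm = confusion_matrix(y_true, y_pred, levels)
rate = discrimination_rate(cm)
```

Off-diagonal entries such as `cm[2][1]` show which levels are confused with each other, information that the single discrimination rate does not convey.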

| Spectral analysis
The raw and pre-processed spectra of all loquat samples in the range of 200-1000 nm are shown in Figure 3. The spectra presented a consistent tendency, but their reflection intensities differed. After SNV pre-processing, the differences in spectral curves between the loquat samples were reduced and the overall curves became smoother. The absorption peak at 675 nm was primarily attributed to chlorophyll absorption on the surface of the loquat, which is related to the color appearance of loquats (Munera et al., 2021). Furthermore, the absorption peak at 980 nm was primarily attributed to water absorption in the loquat, reflecting its water content (Camps & Christen, 2009).

| Statistics of moisture content
The MC of loquat samples at different storage times is shown in Figure 4 as mean ± SD (standard deviation). The MC of the loquat samples decreased as storage time increased.
The KS algorithm was used to divide the 120 fresh loquats into the calibration set (n = 90) and the prediction set (n = 30) at a 3:1 ratio.
Table 1 shows the statistics of the MC of all loquat samples. As can be seen, the calibration set exhibited a wider MC range (82.51-90.59%) than the prediction set. This indicates that the sample partition was reasonable and that the selected calibration samples were highly representative.

| Modeling based on feature variables by CARS
For CARS, 50 Monte Carlo sampling runs were conducted to determine the optimal feature wavelengths using 5-fold cross-validation. Uninformative wavelengths were removed, while the effective wavelengths were retained. Figure 5a-c presents the changes in the number of sampled wavelengths, the 5-fold RMSECV values, and the regression coefficient of each wavelength as the number of sampling runs increases.
In Figure 5a, the number of sampled wavelengths dropped rapidly and then declined more slowly, reflecting the fast coarse selection and subsequent refined selection in CARS. In Figure 5b, the RMSECV values gradually declined from sampling runs 1 to 27, mainly because uninformative wavelengths were removed. The subsequent increase in RMSECV values could be attributed to the elimination of key wavelengths. The optimal subset of wavelengths, marked with an asterisk, was chosen based on the minimum 5-fold RMSECV value. In Figure 5c, each line represents the coefficient recorded for one wavelength across the sampling runs. A larger absolute coefficient indicated a higher probability of selecting the corresponding wavelength. Therefore, a subset of wavelengths, along with their regression coefficients, could be obtained from each sampling run. The optimal subset, which corresponds to the minimum 5-fold RMSECV value, was marked with an asterisk on the vertical line. As calculated by CARS, the RMSECV reached its minimum value at 27 sampling runs, corresponding to a total of 37 wavelengths (3.61% of the full spectra). The feature wavelengths chosen by CARS for predicting the MC are presented in Table 2. The performance of the PLSR, MLR, ELM, and BP models developed using the selected effective wavelengths is presented in Table 3.
As shown in

| Modeling based on feature variables by RF
For RF, a higher selection probability indicated a greater significance of the corresponding wavelength. In this study, the RF algorithm was set to run 1000 iterations, the number of potential variables was 14, and the initial number of sampled variables was 2. The probability of selecting each wavelength by RF for assessing the MC of loquats is shown in Figure 6a. The selected variables, marked as hollow red squares, are shown in Figure 6b.
wavelengths. As shown in Table 3, the RF-ELM model outperformed the PLSR, MLR, and BP neural network models constructed using the feature variables selected by RF, with values of 0.69, 0.62, and 1.84 for $R^2_P$, RMSEP, and RPD, respectively. Comparing the models established on the variables selected by CARS and by RF, the prediction models built on CARS outperformed those built on RF. In particular, the CARS-PLSR model exhibited the best performance for predicting the MC, with $R^2_P$ and RPD values of 0.84 and 2.51, respectively. The RPD value exceeding 2.5 indicated the excellent performance of the CARS-PLSR model.
Figure 7 presents the scatter plots of the measured versus predicted values. The formula of the optimal CARS-PLSR detection model for the MC of loquats was as follows (Equation 9).
where $Y_{MC}$ is the predicted value of the MC and $\lambda_i$ is the reflectance at the characteristic wavelength, with the subscript $i$ denoting the wavelength (nm).
and RF, respectively. The optimal wavelengths for freshness classification selected by CARS and RF are given in Table 4. As shown in Figure 8, the RMSECV reached its minimum value at 26 sampling runs, corresponding to a total of 42 wavelengths (4.10% of the full spectra). As shown in Figure 9, a total of 16 effective wavelengths (1.56% of the full spectra) were selected using RF.
As shown in Table 5, the discrimination models built on CARS achieved higher accuracy than those built on RF. In particular, the CARS-SVM model had the highest discrimination accuracy for the

CONFLICT OF INTEREST STATEMENT
All the authors declare that they have no conflict of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.

ETHICS APPROVAL STATEMENT
This study does not involve any human or animal testing.

PRACTICAL APPLICATIONS
The MC and freshness are critical factors in determining the quality of loquats. Conventional methods for assessing the MC and freshness are typically destructive and time-consuming. Due to the remarkably short shelf-life of loquats, it is crucial to rapidly detect their MC and freshness for evaluating their commercial value and shelf-life. Optical fiber spectroscopy technology presents itself as an attractive non-destructive technique for assessing the quality of food products. The presented findings indicate that optical fiber spectroscopy has significant potential for online monitoring of the MC and freshness of loquats.

REFERENCES
chards (Loquat Green Planting Demonstration Garden of Kaiyang County) located in Guizhou Province, China, on June 7, 2022. A total of 120 loquats were collected for the experiment after removing defective loquats with bruises and deformities. The collected samples were then numbered and stored in a laboratory environment at 23 ± 2°C. The experiment commenced 1 day later and continued for 4 days. Each day, 30 samples were used for the experiment. During the experiment, the spectrum of each sample was acquired, and its MC was subsequently measured. Images of the loquat samples at the four storage times are shown in Figure 1.
FIGURE 1 Images of the loquat samples stored for 1 (a), 2 (b), 3 (c) and 4 (d) days.
| 4821 MENG et al.
FIGURE 3 Reflectance curves of raw spectra (a) and SNV pre-processed spectra (b).

TABLE 1 Statistics of the MC of loquats.

FIGURE 5 Processing of CARS for the MC. The change trend of the number of variables (a), 5-fold RMSECV values (b), and the regression coefficient path (c) with the increase of Monte Carlo sampling runs.

FIGURE 6 Processing of RF for the MC. (a) The probability of selecting each wavelength, (b) The selected variables.
FIGURE 7 Scatter plots of the modeling results of the CARS-PLSR model.
As shown in Section (Statistics of moisture content), the mean value of the MC decreased with prolonged storage time, indicating a gradual decline in the freshness of the loquat fruit. In this study, the freshness of loquats was defined as four levels (Level I, Level II, Level III, and Level IV) according to storage time. A total of 120 loquats were divided into the calibration set (n = 90) and the prediction set (n = 30) at a 3:1 ratio using the KS algorithm. Freshness recognition models, including SKNN, SVM, and PLS-DA, were established using characteristic variables selected by CARS and RF. The classification results are summarized in Table 5.
prediction set. As illustrated in Figure 10, both the CARS-SKNN and CARS-PLS-DA models misclassified one Level III sample in the prediction set as Level II, resulting in one misclassification each. In contrast, the CARS-SVM model exhibited no classification errors. In summary, the CARS-SVM model demonstrated superior performance compared to the other models in determining the freshness of loquats, with a discrimination accuracy of 100% for both the calibration and prediction sets.
FIGURE 8 Processing of CARS for the classification. The change trend of the number of variables (a), 5-fold RMSECV values (b), and the regression coefficient path (c) with the increase of Monte Carlo sampling runs.
FIGURE 9 Processing of RF for the classification. (a) The probability of selecting each wavelength, (b) The selected variables.
TABLE 5 Classification results of freshness of loquats by SKNN, SVM, and PLS-DA models.

FIGURE 10 Confusion matrix in the prediction set for SKNN (a), SVM (b), and PLS-DA models (c).

A novel method based on optical fiber spectroscopy in the range of 200-1000 nm was employed to determine the MC and the freshness level of loquats. CARS and RF were used to identify the optimal band combination capable of reflecting changes in sample characteristic information within spectral curves affected by overlapping and substantial noise. On this basis, prediction models for the MC were developed using PLSR, MLR, ELM, and BP neural network algorithms, while freshness level discrimination models were established using SKNN, SVM, and PLS-DA methods. The results showed that the models built using characteristic variables selected by CARS performed better than the models built using characteristic variables selected by RF. The CARS-PLSR model achieved satisfactory results for predicting the MC using only approximately 3.61% of the variables of the full spectrum, with an $R^2_P$ of 0.84 and an RPD of 2.51. Regarding the discrimination models, the CARS-SVM model obtained the best discrimination performance, achieving 100% accuracy for both the calibration and prediction sets. The study indicates that the use of optical fiber spectroscopy combined with chemometrics for rapid and non-destructive determination of the MC and freshness of loquats is feasible.

AUTHOR CONTRIBUTIONS
Qinglong Meng: Conceptualization (equal); funding acquisition (equal); project administration (equal); validation (equal); writing - original draft (equal); writing - review and editing (equal). Shunan Feng: Conceptualization (equal); data curation (equal); investigation (equal); methodology (equal); writing - original draft (equal); writing - review and editing (equal). Tao Tan: Data curation (equal); formal analysis (equal); methodology (equal); software (equal). Qingchun Wen: Data curation (equal); formal analysis (equal); investigation (equal). Jing Shang: Funding acquisition (equal); project administration (equal); resources (equal); supervision (equal); validation (equal); writing - review and editing (equal).

ACKNOWLEDGMENTS
This study was funded by the Fund Project of the Central Government Guide Local Science and Technology Department (QKZYD[2022]4050), the Fund Project of Guiyang Science and Technology Bureau (ZKHT[2021]43-15; ZKHT-GCC[2023]022), the Special Project of Academic New Seedling Cultivation and Free Exploration and Innovation of Guizhou Provincial Science and Technology Department (2023), and the Project of Undergraduate Innovation and Entrepreneurship Training (S202210976079).

Table 3, comparing the performances of the PLSR, MLR, ELM, and BP neural network models established based on feature variables by CARS, the CARS-PLSR model exhibited better results for predicting the MC, with $R^2_P$, RMSEP, and RPD values of 0.84, 0.46, and 2.51, respectively.