Determination of soybean routine quality parameters using near‐infrared spectroscopy

Abstract Large differences in quality exist among soybean samples. To rapidly assess the quality of soybeans from different areas, we developed near‐infrared spectroscopy (NIRS) models for the moisture, crude fat, and protein content of soybeans based on 360 samples collected from different areas. Compared with whole kernels, soybean powder with a particle size of 60 mesh was more suitable for modeling moisture, crude fat, and protein content. To improve the reproducibility of the prediction models, soybeans of different sizes and colors were ground and sieved to a uniform particle size. Modeling analysis showed that the internal cross‐validation correlation coefficients (Rcv) for the moisture, crude fat, and protein content of soybeans were .965, .941, and .949, respectively, and the determination coefficients (R²) were .966, .958, and .958. NIRS performed well as a rapid method for determining routine quality parameters, and the results provide reference data for the analysis of soybean quality using FT‐NIRS.

in samples can be rapidly quantified using NIRS by taking advantage of the vibrational absorption of the compounds in the NIR region of the spectrum (Martelovidal & Vazquez, 2014). The overtone and combination bands of the various hydrogen-containing groups in moisture, protein, fat, and carbohydrate all fall within the NIR region, and the characteristic vibrational information of these hydrogen-containing groups in organic molecules can be used to determine the chemical composition of mixtures (Givens, De Boever, & Deaville, 1997).
NIRS is nondestructive and fast, and it requires no complicated sample pretreatment. Because of these advantages, the technique has been evaluated for the analysis of many agricultural products, including beef, eggs, apples, and tomatoes (Mitsumoto, Maeda, Mitsuhashi, & Ozawa, 1991; Peirs, Scheerlinck, De Baerdemaeker, & Nicolai, 2003; Slaughter, Barrett, & Boersig, 1996; Uddin & Okazaki, 2004; Wehling et al., 1988). It is also widely used to evaluate the quality of crops such as rice, wheat, corn, rapeseed, and soybean (Agelet et al., 2012; Baianu, You, Guo, Costescu, & Prisecaru, 2011; Bao, Cai, & Corke, 2001; Barton, Shenk, Westerhaus, & Funk, 2000; Dowell et al., 2006; Kovalenko, Rippke, & Hurburgh, 2006; Liu et al., 2008; Peiris, Bockus, & Dowell, 2015; Peiris, Dong, Bockus, & Dowell, 2014; Peiris et al., 2010). For soybeans, AACC International (formerly the American Association of Cereal Chemists) currently recommends a near-infrared reflectance method for analyzing the protein, crude fat, and moisture content of intact seeds (International, 2010b). However, a number of factors, including the sophistication of the instrument and the particle size, moisture content, temperature, and color of the sample, affect the results (Fernandezahumada et al., 2006). Sample particle size and uniformity have been shown to be the main factors affecting the accuracy of NIR analysis, so well-controlled particle size and uniformity provide the basis for establishing a good model (Williams & Thompson, 1978). In our study, we found marked differences in grain size and color among soybeans, especially among the complex and diverse Chinese soybeans, so soybean quality in China needs to be investigated in order to update the Chinese standards.
We therefore crushed whole soybean grains to determine the appropriate particle size for establishing a diffuse reflectance Fourier transform NIRS (FT-NIRS) prediction model. Soybean quality index models were established using uniform particle sizes to avoid the poor reproducibility and accuracy caused by differences in variety, growing region, and grain size among soybean samples, and to provide reference data for the analysis of soybean quality using FT-NIRS.

| Reagents and apparatus
Concentrated sulfuric acid, sodium hydroxide, boric acid, hydrochloric acid, petroleum ether, bromocresol green, methyl red, anhydrous sodium carbonate, potassium sulfate, copper sulfate, and ethanol were all analytical grade (AR) reagents purchased from Sigma-Aldrich Shanghai Trading Co Ltd (Shanghai, China). The MB3600 FT-NIR spectrometer was purchased from ABB-Bomem (Quebec, Canada).

| Sample collection
The 360

| Preparation of soybean sample sets and classification of model samples
The particle-size experiment used 90 soybean samples. Each sample weighed 500 g and was divided into two equal parts: one part was stored for future use, and the other was divided into five equal portions. The portions were crushed using a high-speed multifunction mill and screened through 10-, 20-, 40-, 60-, and 80-mesh sieves to give particles with diameters of 2, 0.9, 0.45, 0.3, and 0.2 mm, respectively. When more than 95% of the particles had passed through the sieve, each powder was thoroughly mixed and scanned to determine the best particle size for modeling. The remaining 270 soybean samples were crushed and sieved using the best method identified from modeling the first 90 samples, and were then divided into a calibration set (n = 216) and an external validation set (n = 54) to establish the best models for moisture, crude fat, and protein content.
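The division of the 270 samples into a calibration set and an external validation set can be sketched as follows. The paper does not state how samples were assigned to each set, so this sketch assumes a simple seeded random split; the function `split_samples` and the sample IDs are hypothetical.

```python
import random

def split_samples(sample_ids, n_calibration, seed=0):
    """Assign samples to a calibration set and an external validation set.

    Assumption: a seeded random split; the study does not report its selection rule.
    """
    if n_calibration > len(sample_ids):
        raise ValueError("calibration set cannot exceed the number of samples")
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    calibration = sorted(shuffled[:n_calibration])
    validation = sorted(shuffled[n_calibration:])
    return calibration, validation

# 270 hypothetical sample IDs, split 216 / 54 as in the study design
ids = [f"S{i:03d}" for i in range(1, 271)]
cal, val = split_samples(ids, n_calibration=216)
print(len(cal), len(val))
```

A random split is only the simplest option; NIRS calibration sets are often selected so that they span the full range of reference values (for example with the Kennard-Stone algorithm).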

| Chemical analysis of soybean samples
The moisture content of the soybeans was determined according to AACC Method 44-15.02 (International, 2010c), the crude fat content according to AACC Method 30-25.01 (International, 2010a), and the protein content according to AACC Method 46-11.02 (International, 2010d), with protein determined by the combustion method using a nitrogen-to-protein conversion factor of 6.25 (protein = %N × 6.25). Each sample was analyzed three times, and the final results are presented as mean values.
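The arithmetic behind the reference protein values (nitrogen converted with the factor 6.25, then averaged over the three replicate determinations) can be sketched as below. The function names and the nitrogen results are hypothetical illustrations, not data from the study.

```python
from statistics import mean

NITROGEN_TO_PROTEIN = 6.25  # factor stated in the method: protein (%) = N (%) x 6.25

def protein_from_nitrogen(nitrogen_percent):
    """Convert a nitrogen result (%N) into crude protein (%)."""
    return nitrogen_percent * NITROGEN_TO_PROTEIN

def reference_value(replicates):
    """Report the mean of replicate determinations (three per sample in this study)."""
    if not replicates:
        raise ValueError("at least one replicate is required")
    return mean(replicates)

# Hypothetical triplicate nitrogen results (%N) for one soybean sample
nitrogen_runs = [6.40, 6.44, 6.42]
protein = reference_value([protein_from_nitrogen(n) for n in nitrogen_runs])
print(f"protein = {protein:.3f}%")
```

Because the conversion is linear, averaging the nitrogen values first and then multiplying by 6.25 gives the same result.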

| Collection of near-infrared spectra
To ensure consistency among samples during NIR scanning, the sample thickness was maintained at 2 cm. An MB3600 FT-NIR spectrometer with a spectral range of 3700-15,000 cm⁻¹ and built-in Horizon MB chemometric modeling software was used to collect the spectra of the soybean samples. The spectrometer was switched on and allowed to warm up for 30 min, and spectra were then collected over 4000-12,600 cm⁻¹ at a resolution of 16 cm⁻¹ with 60 scans per spectrum; this range contains the absorption regions of the traits of interest (4000-9000 cm⁻¹ for protein, moisture, and fat). Each sample was scanned three times to reduce random variation between measurements.
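Averaging the three replicate scans and restricting a spectrum to the 4000-9000 cm⁻¹ region of interest can be sketched with NumPy as follows. The 16 cm⁻¹ data-point spacing and the simulated scans are assumptions for illustration (an FT instrument's point spacing need not equal its nominal resolution), and the helper names are hypothetical.

```python
import numpy as np

def average_replicates(scans):
    """Average replicate spectra; rows are scans, columns are wavenumber points."""
    return np.asarray(scans, dtype=float).mean(axis=0)

def select_region(wavenumbers, spectrum, low=4000.0, high=9000.0):
    """Keep the points whose wavenumber (cm^-1) lies within [low, high]."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    mask = (wavenumbers >= low) & (wavenumbers <= high)
    return wavenumbers[mask], np.asarray(spectrum, dtype=float)[mask]

def wavenumber_to_nm(wavenumber_cm):
    """Convert wavenumber (cm^-1) to wavelength (nm): lambda = 1e7 / nu."""
    return 1e7 / np.asarray(wavenumber_cm, dtype=float)

# Hypothetical data grid (16 cm^-1 spacing) and three simulated replicate scans
nu = np.arange(4000.0, 12600.0 + 1.0, 16.0)
rng = np.random.default_rng(0)
baseline = 0.5 + 0.1 * np.sin(nu / 500.0)
scans = baseline + rng.normal(0.0, 0.002, size=(3, nu.size))

mean_spec = average_replicates(scans)
nu_sel, spec_sel = select_region(nu, mean_spec)
print(nu_sel.min(), nu_sel.max())
print(wavenumber_to_nm([4000.0, 9000.0, 12600.0]))
```

The conversion shows that the scanned range of 4000-12,600 cm⁻¹ corresponds to roughly 2500-794 nm.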

| Evaluation of the NIR model
The performance of the prediction model was evaluated by internal cross-validation using the root mean square error of calibration (RMSEC), the standard error of cross-validation (SECV), and the correlation coefficient of cross-validation (Rcv). Smaller RMSEC and SECV values and a higher Rcv indicate better performance of the prediction model (Ferreira, Galão, Pallone, & Poppi, 2014). External validation evaluates the predictive performance of the calibration model on the independent validation sample set.
In external validation, the predictive performance of the model can be evaluated using the determination coefficient (R²) and the p value of the regression.
A higher R² indicates better predictive performance, and p < 0.05 indicates a statistically significant linear relationship between predicted and reference values.
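The statistics named above can be computed as sketched below. Definitions of RMSEC and SECV vary between software packages; this sketch assumes RMSEC is the root mean square residual of the model refitted on all calibration samples, SECV is the bias-corrected standard deviation of the cross-validation residuals, and R² is the squared correlation between measured and predicted validation values. These are assumptions, not necessarily the exact formulas used by Horizon MB, and all data are hypothetical.

```python
import numpy as np

def rmsec(y_ref, y_fit):
    """Root mean square error of calibration (model refitted on all calibration samples)."""
    residuals = np.asarray(y_fit, dtype=float) - np.asarray(y_ref, dtype=float)
    return float(np.sqrt(np.mean(residuals ** 2)))

def secv(y_ref, y_cv):
    """Standard error of cross-validation, taken here as the bias-corrected SD of CV residuals."""
    residuals = np.asarray(y_cv, dtype=float) - np.asarray(y_ref, dtype=float)
    return float(np.std(residuals, ddof=1))

def correlation(y_ref, y_pred):
    """Pearson correlation between reference values and predictions (Rcv for CV predictions)."""
    return float(np.corrcoef(y_ref, y_pred)[0, 1])

# Hypothetical moisture values (%) for five calibration samples
y_ref = [10.0, 9.5, 8.8, 11.2, 10.4]
y_fit = [10.1, 9.4, 8.9, 11.0, 10.5]   # predictions of the full calibration model
y_cv = [10.3, 9.2, 9.0, 10.9, 10.6]    # predictions from cross-validation
r_cv = correlation(y_ref, y_cv)
print(f"RMSEC = {rmsec(y_ref, y_fit):.3f}, SECV = {secv(y_ref, y_cv):.3f}, Rcv = {r_cv:.3f}")

# Hypothetical external validation: measured vs. predicted crude fat (%)
y_val_ref = [18.0, 19.9, 20.8, 22.7, 23.6, 25.3]
y_val_pred = [18.2, 19.6, 21.1, 22.4, 23.9, 25.0]
r2_val = correlation(y_val_ref, y_val_pred) ** 2
print(f"R2 (validation) = {r2_val:.3f}")
```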

| Analysis of soybean components
The quality indices (moisture, crude fat, and protein content) of the soybean samples used in this study are presented in Table 1. For the 90 samples used in the particle-size study, moisture, crude fat, and protein content were 8.47%-10.67%, 17.71%-25.14%, and 37.37%-43.20%, respectively.
For the 216 samples in the calibration set, moisture, crude fat, and protein content were 7.42%-13.71%, 15.78%-25.57%, and 37.37%-43.21%, respectively. For the 54 samples in the external validation set, moisture, crude fat, and protein content were 6.92%-11.24%, 17.75%-25.39%, and 37.04%-43.56%, respectively. The number of samples used for modeling was well above 50, the minimum sample size proposed for NIRS modeling (Williams et al., 1985). The moisture, crude fat, and protein values were widely distributed and representative of the sample composition, which favors the establishment of quality models.

| Spectrogram of soybean samples
NIRS analysis is based on the characteristic absorption bands arising from the overtone and combination vibrations of NH, CH, OH, and CO groups in the chemical components of samples (Martin, 1992).
The positions of the absorption bands provide information about the chemical composition, and the strength of each band is proportional to the amount of the corresponding hydrogen-containing group present. The NIR spectra of soybean samples can therefore be used as a basis for quantitative analysis of quality indices.
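The proportionality between band strength and concentration is the Beer-Lambert relation A = εlc, which holds strictly for transmission through clear, dilute samples. For diffuse reflectance from powders, log10(1/R) is used as an apparent absorbance and is only approximately linear in concentration, partly because light scattering depends on particle size. A minimal sketch of both relations, with a hypothetical absorptivity and path length:

```python
import numpy as np

def apparent_absorbance(reflectance):
    """Convert diffuse reflectance R (0 < R <= 1) into apparent absorbance A = log10(1/R)."""
    r = np.asarray(reflectance, dtype=float)
    if np.any((r <= 0) | (r > 1)):
        raise ValueError("reflectance must lie in (0, 1]")
    return np.log10(1.0 / r)

def beer_lambert(epsilon, path_length, concentration):
    """Ideal Beer-Lambert absorbance A = epsilon * l * c (linear in concentration)."""
    return epsilon * path_length * np.asarray(concentration, dtype=float)

print(apparent_absorbance([1.0, 0.1, 0.01]))
# Hypothetical absorptivity (0.02) and path length (1.0): doubling c doubles A
a = beer_lambert(0.02, 1.0, [10.0, 20.0])
print(a[1] / a[0])
```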
If the number of samples is too small or too large, the distribution of the sample population under investigation is not reflected accurately, and useful information may be obscured because irrelevant statistical differences are emphasized, greatly reducing model performance. Fewer but more representative samples should therefore be chosen to build the model with the best predictive power. Figure 1 shows the variation in absorption intensity over 4000-12,600 cm⁻¹ for five samples of the same soybean ground to different particle sizes. Absorbance tended to increase with increasing particle size, and the spectral variation was greater at longer wavelengths, which affects the reliability of the NIR prediction model.

| Selection of optimal particle size for soybean modeling
Many studies that describe models for evaluation soybean qual- In this study, we first analyzed the moisture, crude fat, and protein content of whole kernels using NIRS, and the Rcv of the moisture content model was .971.
However, the crude fat and protein models had Rcv values of only .520 and .495, respectively, indicating low predictive ability. Modeling analysis of the 90 soybean samples crushed to different particle sizes was therefore performed using Horizon MB chemometric software with partial least squares (PLS) regression. The spectral data were pretreated using appropriate mathematical procedures, including multiplicative scatter correction, derivatives, detrending, normalization, offset correction, and standard normal variate, to determine the optimal particle size for modeling the moisture, protein, and crude fat content of soybeans. The crushed samples were sieved to 10, 20, 40, 60, and 80 mesh, a model was established for each particle size to find the best one, and the modeling results obtained with Horizon MB software are given in Table 2.
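Two of the scatter corrections listed above, standard normal variate (SNV) and multiplicative scatter correction (MSC), mainly compensate for the baseline shifts and slope changes that particle size causes in diffuse reflectance spectra. They can be sketched in NumPy as below. These are textbook forms; the settings used inside Horizon MB are not reported, so choices such as the MSC reference spectrum (here the mean spectrum) are assumptions, and derivatives, detrending, and offset correction are not shown.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and SD."""
    x = np.asarray(spectra, dtype=float)
    mean = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, ddof=1, keepdims=True)
    return (x - mean) / sd

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum (default: mean spectrum).

    Each spectrum x is fitted by least squares as x = a + b * ref and corrected to (x - a) / b.
    """
    x = np.asarray(spectra, dtype=float)
    ref = x.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(x)
    for i, row in enumerate(x):
        b, a = np.polyfit(ref, row, deg=1)  # polyfit returns slope first, then intercept
        corrected[i] = (row - a) / b
    return corrected

# Hypothetical reference spectrum and three copies distorted by scaling and offset
points = np.linspace(0.0, 6.0, 50)
ref = np.linspace(0.2, 0.8, 50) + 0.05 * np.sin(points)
spectra = np.vstack([1.2 * ref + 0.05, 0.8 * ref - 0.02, ref])
print(np.abs(msc(spectra, reference=ref) - ref).max())
print(np.abs(snv(spectra)[0] - snv(spectra)[1]).max())
```

Both corrections remove purely multiplicative and additive distortions, so the three distorted copies collapse onto the same corrected spectrum.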
As shown in

| Establishment of NIR calibration model
In this step, the calibration model was built using the 216 calibration samples crushed to the optimal particle size. The spectral data were processed using the mathematical procedures described above: multiplicative scatter correction (MSC), derivatives, detrending, normalization, offset correction, and standard normal variate. The predictive performance of the model was evaluated by internal cross-validation using RMSEC, SECV, and the cross-validation correlation coefficient Rcv; the smaller the RMSEC and SECV and the larger the Rcv, the better the predictive performance of the model.
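The study built its PLS models in Horizon MB; the underlying idea of a PLS calibration evaluated by internal cross-validation can be sketched in NumPy as follows. This is a generic single-response PLS (PLS1, NIPALS algorithm) with k-fold cross-validation, not the vendor's implementation; the function names, number of components, number of folds, and the simulated spectra are all assumptions for illustration.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS (PLS1) model with NIPALS; returns (x_mean, y_mean, coef)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean           # mean-centred data, deflated in the loop
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)                # weight vector (unit length)
        t = Xr @ w                            # scores
        tt = t @ t
        p = Xr.T @ t / tt                     # X loadings
        c = (yr @ t) / tt                     # y loading
        Xr = Xr - np.outer(t, p)              # remove the explained part of X
        yr = yr - c * t                       # remove the explained part of y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)    # regression vector for centred X
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    """Predict responses for new spectra with a fitted PLS1 model."""
    x_mean, y_mean, coef = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean

def cross_val_predict_pls(X, y, n_components, n_folds=10, seed=0):
    """k-fold cross-validation: every sample is predicted by a model that did not see it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(y))
    y_cv = np.empty_like(y)
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        model = pls1_fit(X[train], y[train], n_components)
        y_cv[fold] = pls1_predict(model, X[fold])
    return y_cv

# Simulated spectra (hypothetical): one band tracks the analyte, one tracks an interferent
rng = np.random.default_rng(1)
n_samples, n_points = 60, 100
grid = np.arange(n_points)
analyte = rng.uniform(15, 25, n_samples)
interferent = rng.uniform(0, 1, n_samples)
band_a = np.exp(-((grid - 30) / 8.0) ** 2)
band_b = np.exp(-((grid - 60) / 10.0) ** 2)
X = np.outer(analyte, band_a) + np.outer(interferent, band_b)
X += rng.normal(0, 0.01, X.shape)

y_cv = cross_val_predict_pls(X, analyte, n_components=2, n_folds=5)
print(f"Rcv = {np.corrcoef(analyte, y_cv)[0, 1]:.3f}")
```

In practice the number of PLS components is itself chosen by cross-validation, typically the smallest number after which SECV stops decreasing.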
In this study, the calibration samples were pretreated and modeled using the Horizon MB chemometric software of the NIR instrument. After appropriate mathematical treatment, Table 3 shows that for moisture, the 40-mesh crushed particles gave the smallest RMSEC and SECV and the largest Rcv (RMSEC 0.451, SECV 0.203, Rcv .965). Table 4 shows that for crude fat, the 60-mesh crushed particles pretreated with normalization and multiplicative scatter correction gave the best calibration, with the smallest RMSEC and SECV and the largest Rcv (RMSEC 0.735, SECV 0.540, Rcv .922).
As can be seen from
Under the best processing method, the Rcv values for moisture, crude fat, and protein content were .965, .922, and .920, respectively (Table 6). However, outliers inevitably occur when a prediction model is built from NIR spectral data, and they can seriously affect the accuracy of the model. To avoid removing samples by mistake, the reference values and spectra of the outliers were measured again.
If a sample was still an outlier, it was permanently removed from the calibration set; otherwise, it was retained. Results of the corrected NIR calibration model are shown in Table 6, and the data based on the corrected model are shown in Figure 3.
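The paper does not state the criterion used to declare a sample an outlier. One common rule of thumb, assumed here purely for illustration, flags samples whose cross-validation residual deviates from the mean residual by more than 2.5 × SECV; in line with the procedure above, flagged samples are candidates for re-measurement rather than automatic deletion. The function name, threshold, and data are hypothetical.

```python
import numpy as np

def flag_for_remeasurement(y_ref, y_cv, threshold=2.5):
    """Return indices of samples whose cross-validation residual deviates from the mean
    residual by more than `threshold` x SECV.

    Flagged samples are candidates for repeating the chemical analysis and NIR scan,
    not for automatic deletion.
    """
    residuals = np.asarray(y_cv, dtype=float) - np.asarray(y_ref, dtype=float)
    secv = np.std(residuals, ddof=1)
    return np.flatnonzero(np.abs(residuals - residuals.mean()) > threshold * secv)

# Hypothetical reference values and CV predictions; the last sample is badly predicted
y_ref = np.array([9.8, 10.2, 10.5, 9.1, 11.0, 10.7, 9.6, 10.1, 9.9, 10.3])
errors = np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.0, 1.5])
y_cv = y_ref + errors
suspects = flag_for_remeasurement(y_ref, y_cv)
print(suspects)
```

After the flagged samples are re-analyzed and re-scanned, the check is repeated; a sample that is flagged again would be removed from the calibration set, and the others kept.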

| External validation of NIR soybean model
Linear regression analysis in SPSS was performed between the NIR-predicted values of the external validation samples and their experimentally determined chemical values. The ANOVA table was used for the F test of the linearity of the regression. The F statistic is the ratio of the regression mean square to the residual mean square. A small F value indicates that the independent variable explains little of the variation in the dependent variable, in which case fitting a regression line is meaningless. The smaller the significance probability (Sig.), the stronger the evidence for a linear relationship.
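The F test reported in the SPSS ANOVA table can be reproduced outside SPSS, as sketched below with NumPy and SciPy. The data are hypothetical, and regressing measured on predicted values (rather than the reverse) is an assumption; for a single predictor the F value and Sig. are the same in either direction.

```python
import numpy as np
from scipy import stats

def regression_anova(x, y):
    """ANOVA for a one-predictor linear regression of y on x.

    Returns F = MS_regression / MS_residual and its p value (SPSS "Sig.").
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = intercept + slope * x
    ss_regression = np.sum((fitted - y.mean()) ** 2)
    ss_residual = np.sum((y - fitted) ** 2)
    ms_regression = ss_regression / 1          # one predictor, so 1 degree of freedom
    ms_residual = ss_residual / (n - 2)        # n - 2 residual degrees of freedom
    f_value = ms_regression / ms_residual
    p_value = stats.f.sf(f_value, 1, n - 2)    # upper-tail probability of F(1, n - 2)
    return f_value, p_value

# Hypothetical NIR-predicted and measured crude fat (%) for six validation samples
predicted = [18.2, 19.6, 21.1, 22.4, 23.9, 25.0]
measured = [18.0, 19.9, 20.8, 22.7, 23.6, 25.3]
f_value, p_value = regression_anova(predicted, measured)
print(f"F = {f_value:.1f}, Sig. = {p_value:.2g}")
```

With one predictor, this p value equals the two-sided slope t-test p value returned by `scipy.stats.linregress`, and F = r²(n - 2)/(1 - r²).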
As can be seen from

| CONCLUSION
In this study, we found that achieving a uniform particle size by crushing and sieving provides a good solution to the poor reproducibility of prediction models for soybean quality indices caused by differences among samples of different varieties. We determined the optimal particle sizes for FT-NIRS models of the moisture, crude fat, and protein content of soybeans. External validation of the calibration models with soybean samples from Northeast China and the Yangtze River Basin indicated that models built on a unified particle size had significant predictive ability for these components in soybeans of different varieties, from different regions, and with different grain sizes. The models showed high prediction accuracy and reproducibility for soybean moisture, crude fat, and protein content, with external validation R² values of .966, .958, and .958, respectively. Both internal cross-validation and external validation were performed, and the performance of the models established on the calibration set, when applied to the external validation set, was credible. These results indicate that NIRS models for the main soybean components are feasible and can be used for their rapid determination.

CONFLICT OF INTEREST
There was no conflict of interest.