A Comparison of Spectral Preprocessing Methods and Their Effects on Nutritional Traits in Cowpea Germplasm

Cowpea (Vigna unguiculata (L.) Walp.) is a multipurpose legume with good nutritional properties. Conventional assessment of nutritional parameters can be labour-intensive, costly and time-consuming for germplasm screening. Near-infrared reflectance spectroscopy (NIRS) is a rapid and nondestructive method that can facilitate high-throughput germplasm screening. In this study, amylose and sugars were estimated using NIRS. Two preprocessing methods, SNV-DT (standard normal variate with detrending) and MSC (multiplicative scatter correction), were applied to the original spectra, and modified partial least squares (MPLS) regression was then used to construct the prediction models. For amylose, the best RSQexternal (external coefficient of determination) (0.962) was obtained with SNV-DT and the mathematical treatment 3,8,8,2. Similarly, for sugars, the best RSQexternal (0.914) was obtained with SNV-DT and the mathematical treatment 3,4,4,1. Overall, SNV-DT was a better preprocessing treatment than MSC for both amylose and sugars. Paired t-test p-values for all treatments under both preprocessing methods were > 0.05, indicating no significant difference between predicted and reference values. The high RSQexternal values for both traits indicate that the prediction models are applicable. These models can therefore facilitate high-throughput germplasm screening in national and international crop improvement programmes focusing on quality traits.

the Fabaceae family, which can provide sustainable benefits. This crop is well known as a nutrient-rich indigenous vegetable in Africa and has the potential to enhance food and nutrition security in developing nations such as India. Originally from West and Central Africa, cowpea has spread to Latin America and South East Asia through cultivation and production (Ano and Ubochi 2008; Edeh and Igberi 2012). Cowpea stands out among legumes for its high protein content (19 to 27.3 g/100 g) (Padhi, Bartwal, et al. 2022) and low fat content (1 to 1.2 g/100 g) (Jayathilake et al. 2018), along with substantial amounts of carbohydrates, dietary fibre and micronutrients. Among the carbohydrates, it contains appreciable quantities of amylose and sugars. Amylose content affects the physicochemical and processing properties of pulses, as well as the tendency of legume starch to retrograde (Singh 2017). This property makes legume starch resistant to digestive enzymes, which in turn lowers its glycaemic index (Singh, Dartois, and Kaur 2010); legume starch can therefore be beneficial for diabetic patients. Amylose content in cowpea generally ranges from 9 to 25 g/100 g (Adebooye and Singh 2008; Hamid et al. 2015; Hoover et al. 2010; Padhi, Bartwal, et al. 2022). Sugars play a vital role in enhancing tolerance to abiotic stress, improving storability and providing energy for the human body. They also contribute to the desirable taste and texture experienced during cooking. The presence of soluble sugars is closely associated with carbohydrate metabolism, photosynthesis and overall seed development (Wilcox 2001). In previous studies, sugar content in cowpea ranged from 1.1 to 8.73 g/100 g (Nassourou et al. 2017; Omueti and Singh 1987; Padhi, Bartwal, et al. 2022; Weng et al. 2018). Despite these benefits, oligosaccharides limit cowpea consumption: because of their α-galactosidic bonds, they remain undigested in the human body and cause flatulence and discomfort (Singh 2017). The current trend towards plant-based protein, together with the considerable yield of cowpea, warrants closer examination of other important nutritional factors such as sugar and amylose, which affect the physicochemical properties and palatability of cowpea. Understanding these traits can help the food industry develop value-added products with accurate nutritional claims and help cowpea breeding programmes select accessions with improved quality. However, conventional analysis of nutritional traits is labour-intensive, time-consuming, costly and sometimes complicated, which is a major drawback when screening large germplasm collections in a quality breeding programme.
Near-infrared reflectance spectroscopy (NIRS) can overcome these disadvantages for analysing nutritional composition: it is a rapid, nondestructive, cost-effective, noninvasive, safe and simple method that can screen large germplasm collections, and it is finding increasing application across various disciplines and matrices worldwide (Hang et al. 2022; Johnson, Walsh, and Naiker 2020). The technique is based on the absorption of electromagnetic radiation in the near-infrared (NIR) region, 780-2500 nm, by the functional groups and bonds present in the sample (Osborne 2006). In biological samples, multiple overlapping NIR absorption bands arise from combinations and overtones of the vibrations of functional groups such as S-H, O-H, N-H and C-H (Tomar et al. 2021). These bands are valuable for analysing molecular interactions among functional groups and for obtaining chemical insight into the material (Shi et al. 2019). A common problem in practice is interference during signal acquisition, which distorts the spectra through both baseline shifts and nonlinearities. The quality of the final model also depends on spectral molecular vibrations, the mathematical treatments and the statistical methods used. Preprocessing the spectral data is therefore a critical step in developing a reliable prediction model: it improves the signal-to-noise ratio, increases relevant signal variation and removes sources of variation unrelated to the property of interest (Rinnan, Van Den Berg, and Engelsen 2009). Given the complexity of food crop samples and real measurement conditions, separating interferences from the spectra can be challenging. Spectral interferences can be viewed as a blend of additive components, multiplicative components, polynomial baseline shifts and spectral noise (Bi et al. 2016). Hence, empirical methods such as derivatives, multiplicative scatter correction (MSC) and standard normal variate (SNV) are widely used for spectral preprocessing (Barnes, Dhanoa, and Lister 1989; Bellon-Maurel et al. 2010). Preprocessing treatments have previously been compared for the prediction of soil organic matter (Carvalho et al. 2022), soluble solid content in hard kiwi (Sarkar et al. 2020) and soil organic carbon (Dotto et al. 2018). Derivatives effectively enhance the spectral resolution of a biological matrix such as cowpea. However, this enhancement amplifies not only the informative spectral regions but also regions without a response, which can lead to overfitted models. MSC and SNV can remove both additive and multiplicative effects from spectra. In SNV, each spectrum is centred and then scaled by its own standard deviation (Barnes, Dhanoa, and Lister 1989), which reduces the multiplicative effects of scattering. MSC, on the other hand, uses a reference spectrum, often the mean spectrum of the calibration data, and corrects the baseline and amplification effects of each individual spectrum relative to that reference (Geladi, MacDougall, and Martens 1985; Marten, Shenk, and Barton 1985). After scatter correction, regression methods such as modified partial least squares (MPLS), partial least squares (PLS) and principal component regression (PCR) are commonly used to relate the biochemical components to the spectral data, and the best model is selected mainly on the basis of the coefficient of determination (RSQ), residual prediction deviation (RPD) and bias.
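To make the SNV/MSC contrast concrete, the following NumPy sketch applies both corrections to synthetic spectra that share one underlying shape but differ by an additive offset and a multiplicative gain. It assumes the calibration mean spectrum as the MSC reference and uses hypothetical function names (`snv`, `msc`); it illustrates the textbook formulas, not the WinISI implementation used in this study.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre each spectrum and scale by its own SD."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / sd

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum x is regressed on the reference (x ~ a + b * ref) and
    corrected as (x - a) / b. The mean spectrum is used as the reference
    when none is given (an assumption for this sketch).
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        b, a = np.polyfit(ref, x, deg=1)  # returns slope first, then intercept
        corrected[i] = (x - a) / b
    return corrected, ref

# Synthetic example: one underlying spectrum distorted by additive (offset)
# and multiplicative (gain) scatter effects.
wavelengths = np.arange(1100, 2400, 2.0)
base = (np.exp(-((wavelengths - 1940) / 60) ** 2)
        + 0.3 * np.exp(-((wavelengths - 1450) / 80) ** 2))
offsets = np.array([0.00, 0.05, 0.12, 0.20])
gains = np.array([1.00, 1.10, 0.85, 1.30])
raw = offsets[:, None] + gains[:, None] * base

snv_spectra = snv(raw)
msc_spectra, msc_ref = msc(raw)
```

Both corrections collapse the four distorted copies onto a single curve; with MSC that curve is the reference spectrum itself, whereas SNV standardises each spectrum on its own and needs no reference.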
Various NIRS-based prediction models have been developed and reported for crops such as rice, pearl millet, amaranth, buckwheat, mung bean, cowpea and faba bean (Bartwal et al. 2023; John et al. 2022; Johnson, Walsh, and Naiker 2021; Padhi, John, et al. 2022; Shruti et al. 2023; Tomar et al. 2021). However, we found no reports of NIRS calibration models for total amylose and sugar contents in cowpea. The present study therefore aimed to quantify amylose and sugar in cowpea by developing NIRS prediction models, while comparing two preprocessing methods (MSC and SNV-DT) and their effects on each trait.
Based on the statistical results, the constructed models could efficiently classify cowpea accessions as having high, medium or low amylose and sugar contents.

| Sample Collection and Selection
A total of 475 cowpea accessions, comprising both indigenous and exotic collections, were obtained from medium-term storage at the National Gene Bank, ICAR-NBPGR, New Delhi, India. The accessions in this cowpea diversity panel represent diverse genotypes with significant variation in adaptation and other characteristics. They were grown at the Issapur experimental farm, New Delhi, India, following standard agronomic practices in an augmented block design (Padhi, Bartwal, et al. 2022). At maturity, the seeds were harvested, thoroughly sun-dried to a grain moisture content of 8%-10% and stored at 4°C. All 475 accessions were scanned using a FOSS NIRS 6500, and their reflectance spectra were recorded. Following the approach of John et al. (2022), in which representative samples are selected by stratified purposive sampling, hierarchical cluster analysis was performed on the spectra using Ward's method with squared Euclidean distance, and clusters were formed at a distance of 5. From these clusters and subclusters, 121 diverse accessions were selected for reference analysis and NIRS scanning. These samples were homogenised by grinding in a FOSS Cyclotec mill through a 1-mm sieve, and the resulting flour was used for both NIRS scanning and wet chemistry analysis.
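A rough sketch of the clustering-based selection follows, assuming SciPy's Ward linkage as a stand-in for the clustering software used in the study. SciPy's Ward merge heights are not on the same scale as the distance of 5 used here, so the sketch cuts the tree at a chosen number of clusters instead. The proportional allocation and the rule of picking evenly spaced ranks by distance to the cluster mean are hypothetical choices, not the published procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def select_representatives(spectra, n_clusters, n_select):
    """Ward clustering plus proportional allocation (hypothetical selection rule)."""
    spectra = np.asarray(spectra, dtype=float)
    Z = linkage(spectra, method="ward")  # Ward linkage on Euclidean distances
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    selected = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        # Allocate picks in proportion to cluster size, at least one per cluster.
        k = min(idx.size, max(1, int(round(n_select * idx.size / len(spectra)))))
        # Rank members by distance to the cluster mean and take evenly spaced
        # ranks, so that both central and peripheral samples are represented.
        centroid = spectra[idx].mean(axis=0)
        order = idx[np.argsort(np.linalg.norm(spectra[idx] - centroid, axis=1))]
        picks = (np.arange(k) * idx.size) // k  # distinct ranks because k <= size
        selected.extend(order[picks].tolist())
    return np.array(sorted(selected)), labels

# Synthetic "spectra": three well-separated groups of 50 samples each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.05, size=(50, 20)) for m in (0.0, 1.0, 2.0)])
sel, labels = select_representatives(X, n_clusters=3, n_select=30)
```

Because of rounding in the proportional allocation, the number of selected samples can differ slightly from `n_select` when clusters are unevenly sized.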

| Total Amylose Content
Total amylose content was estimated by the iodometric method of Juliano et al. (1981) with several modifications (John et al. 2023). Fifty milligrams of sample were weighed into a centrifuge tube. Absolute ethanol (0.5 mL) was added, followed by 1 N NaOH (4.5 mL), and the mixture was vortexed and incubated at boiling temperature for 15 min. After rinsing with distilled water, the volume was adjusted to 25 mL, and 0.5 mL of this solution was transferred to amber tubes. To this, 100 μL of 1 N glacial acetic acid and 200 μL of iodine solution were added, and the volume was adjusted to 10 mL. The amber tubes were kept in the dark for 20 min. The reaction produced a blue complex, and its absorbance was recorded at 620 nm.
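The dilution arithmetic implied by the volumes above can be checked with a short function. The linear standard curve (`slope`, `intercept`) is hypothetical, since the amylose standard is not described here, and concentrations are assumed to be read in μg per mL of the final 10 mL colour solution.

```python
def amylose_g_per_100g(absorbance, slope, intercept,
                       sample_mg=50.0, extract_ml=25.0,
                       aliquot_ml=0.5, final_ml=10.0):
    """Convert A620 to amylose (g/100 g) via a hypothetical linear standard curve.

    The curve is assumed to be A620 = slope * C + intercept, with C in
    ug amylose per mL of the final 10 mL colour solution.
    """
    conc_ug_per_ml = (absorbance - intercept) / slope
    ug_in_aliquot = conc_ug_per_ml * final_ml                # ug in the 0.5 mL aliquot
    ug_in_extract = ug_in_aliquot * extract_ml / aliquot_ml  # ug in the 25 mL digest
    sample_ug = sample_mg * 1000.0                           # 50 mg sample in ug
    return ug_in_extract / sample_ug * 100.0

# Hypothetical curve: A620 = 0.02 * C + 0.01
print(round(amylose_g_per_100g(0.32, slope=0.02, intercept=0.01), 2))  # 15.5
```

With these volumes the overall conversion factor is exactly 1, so the amylose content in g/100 g equals the concentration read from the curve in μg/mL.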

| Total Sugar Content
Total soluble sugars were estimated using the anthrone reagent method as described by Dubois et al. (1956). A 0.1-g sample was placed in a Falcon tube, 5 mL ethanol was added, and the tube was vortexed and kept in a water bath at 80°C for 30 min. The tubes were cooled and centrifuged at 12,000 rpm for 10 min.
The supernatant was collected, and the extraction was repeated two more times following the same procedure. The supernatants were pooled in fresh tubes and made up to a final volume of 10 mL. Aliquots (0.1 mL) of the extract, in triplicate, were evaporated at 100°C and reconstituted with 1 mL of double distilled water. A blank containing 1 mL of double distilled water and standard solutions with varying concentrations of D-glucose were also prepared. Ice-cold anthrone reagent (4 mL) was added to each tube (samples, blank and standards), followed by incubation in a water bath at 80°C for 8 min. The reaction produced a green-blue colour, which was quantified spectrophotometrically at 630 nm. Total soluble sugars were expressed as g/100 g.
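A sketch of how a D-glucose standard curve and the extraction volumes above translate absorbance into g/100 g. The standard concentrations and A630 readings are invented for illustration, and the standards are assumed to be expressed as μg glucose per tube so that they correspond to the 0.1 mL aliquot of sample extract.

```python
import numpy as np

# Hypothetical D-glucose standards (ug per tube) and their A630 readings.
std_glucose_ug = np.array([0.0, 20.0, 40.0, 60.0, 80.0, 100.0])
std_a630 = np.array([0.000, 0.101, 0.199, 0.302, 0.398, 0.500])

# Least-squares standard curve: A630 = slope * ug + intercept
slope, intercept = np.polyfit(std_glucose_ug, std_a630, deg=1)

def total_sugar_g_per_100g(a630_replicates, sample_g=0.1,
                           extract_ml=10.0, aliquot_ml=0.1):
    """Mean of triplicate readings -> ug glucose per tube -> g/100 g sample."""
    mean_a = float(np.mean(a630_replicates))
    ug_in_tube = (mean_a - intercept) / slope             # read off the standard curve
    ug_in_extract = ug_in_tube * extract_ml / aliquot_ml  # scale 0.1 mL aliquot to 10 mL
    return ug_in_extract / (sample_g * 1e6) * 100.0       # sample mass in ug, per 100 g

sugar = total_sugar_g_per_100g([0.248, 0.250, 0.252])
```

With a 0.1-g sample, 10 mL of extract and a 0.1 mL aliquot, every μg of glucose in the tube corresponds to 0.1 g/100 g, so about 50 μg per tube gives about 5 g/100 g.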

| Spectral Acquisition
The FOSS NIRS 6500 spectrophotometer, operated with WinISI III Project Manager software 1.50, was calibrated using a 100% white reference tile. Approximately 5 g of homogenised sample was placed in a circular ring cup with a quartz window (3.8 cm in diameter and 1 mm thick) and scanned 32 times over the wavelength range 400-2500 nm. The average spectrum was recorded as log(1/R) at 2-nm intervals, where R denotes reflectance.
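For reference, the acquisition settings imply a 1051-point wavelength grid (400-2500 nm at 2 nm). The sketch below averages 32 simulated scans and converts the mean reflectance to log(1/R); averaging reflectance before taking the logarithm is an assumption, as the instrument software may combine scans differently.

```python
import numpy as np

wavelengths = np.arange(400, 2501, 2)  # 400-2500 nm inclusive, 2-nm steps

def average_absorbance(reflectance_scans):
    """Average repeated scans, then convert to absorbance log10(1/R).

    reflectance_scans has shape (n_scans, n_wavelengths) with R in (0, 1].
    """
    mean_r = np.mean(reflectance_scans, axis=0)
    return np.log10(1.0 / mean_r)

# 32 simulated scans of a flat 40% reflector with small random noise
rng = np.random.default_rng(0)
scans = np.clip(0.4 + rng.normal(0.0, 0.002, size=(32, wavelengths.size)), 1e-6, 1.0)
absorbance = average_absorbance(scans)
```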

| Outlier Detection
Outliers were detected using the neighbourhood Mahalanobis distance (NH, threshold NH < 0.6) and the global H distance (GH), which represents the spectral distance of each sample from the mean spectrum of the population (threshold GH > 2.5). NH measures the proximity of each sample to all other samples in the population. Samples with scanning errors produce abrupt spectra, which makes them outliers for any trait. Superfluous spectra were removed from the calibration population using GH (Bartwal et al. 2023). The outlier-free sample set was then carried forward to form the calibration and validation sets.
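The GH and NH statistics can be approximated from principal component scores. This sketch assumes a WinISI-style definition (squared Mahalanobis distance in PCA score space divided by the number of components, so that the average GH is close to 1) and uses hypothetical helper names; the number of components and the synthetic data are illustrative only.

```python
import numpy as np

def pca_scores(spectra, n_components):
    """Principal component scores of mean-centred spectra (via SVD)."""
    centred = spectra - spectra.mean(axis=0)
    U, s, _ = np.linalg.svd(centred, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def global_h(scores):
    """Squared Mahalanobis distance from the score-space centre, divided by
    the number of components (assumed standardisation, average close to 1)."""
    inv_var = 1.0 / scores.var(axis=0, ddof=1)
    return (scores ** 2 * inv_var).sum(axis=1) / scores.shape[1]

def neighbourhood_h(scores):
    """The same standardised distance, measured to each sample's nearest neighbour."""
    inv_var = 1.0 / scores.var(axis=0, ddof=1)
    diff = scores[:, None, :] - scores[None, :, :]
    d = (diff ** 2 * inv_var).sum(axis=2) / scores.shape[1]
    np.fill_diagonal(d, np.inf)  # ignore each sample's distance to itself
    return d.min(axis=1)

rng = np.random.default_rng(42)
X = rng.normal(size=(60, 50))
X[0] += 5.0   # a spectrum with a gross offset, e.g. from a scanning error
X[2] = X[1]   # an exact duplicate, i.e. a redundant spectrum
scores = pca_scores(X, n_components=5)
gh = global_h(scores)
nh = neighbourhood_h(scores)
gh_outliers = np.flatnonzero(gh > 2.5)
```

Sample 0 is flagged by the GH > 2.5 rule, and the duplicated pair has an NH of essentially zero, which is the kind of redundancy a neighbourhood threshold is meant to detect.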

| Selection of the Calibration and Validation Set
The outlier-free spectral data and the reference data of the 121 samples were imported into WinISI III Project Manager 1.50 (Infrasoft International, USA), which was used for spectral preprocessing and to build the calibration and validation models. The samples were arranged in ascending order of their reference values, and every third sample was assigned to the validation set, giving a 2:1 ratio of calibration to validation samples while ensuring uniform variability in both sets (John et al. 2022). The samples were thus divided into 80 in the calibration (training) set and 41 in the validation set.
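The sort-and-split rule can be written in a few lines. Starting the selection with the lowest-ranked sample is an assumption; with 121 samples it reproduces the 80/41 split reported above. The amylose values are synthetic, drawn uniformly from the observed range.

```python
import numpy as np

def split_every_third(reference_values):
    """Sort by reference value and send every third sample to validation.

    Returns (calibration_indices, validation_indices) into the original array.
    """
    order = np.argsort(reference_values, kind="stable")
    validation = order[0::3]                        # ranks 0, 3, 6, ...
    calibration = np.setdiff1d(order, validation)   # all remaining samples
    return calibration, validation

rng = np.random.default_rng(7)
amylose = rng.uniform(10.02, 21.7, size=121)  # synthetic values in the observed range
cal_idx, val_idx = split_every_third(amylose)
```

Because the split is made on ranked reference values, both sets span nearly the full concentration range, which is the point of arranging the samples before splitting.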

| Preprocessing of the Spectra
Preprocessing of spectral data plays a crucial role in eliminating or reducing unwanted artefacts in the spectra while maintaining the linear relationship between the multivariate signals (reflectance) and the properties of interest. These mathematical methods encompass several stages, including data cleansing, weighting, standardisation, scatter correction, elimination of nonlinear trends, smoothing and the calculation of derivatives. During data processing, the spectra are smoothed to minimise noise and measurement inaccuracies. After removal of the noisy segment, the remaining spectra were smoothed using the Savitzky-Golay transformation (Savitsky and Golay 1964) with a second-order polynomial over a sliding window of seven points. Several preprocessing techniques can improve the predictions of calibrated models, including MSC, inverse MSC, extended MSC, detrending, SNV and normalisation. MSC and SNV-DT were compared in this study because MSC, followed by SNV, is the most widely used preprocessing technique (Rinnan, Van Den Berg, and Engelsen 2009). The MSC technique, first introduced by Marten, Shenk, and Barton (1985), alleviates nonlinearities in spectral data caused by the additive and multiplicative scattering effects of particles in the samples. It aligns each spectrum with a reference spectrum so that, on average, each spectrum has the same baseline and amplification as the reference (Isaksson and Naes 1988). SNV mitigates scattering effects by centring and scaling each spectrum, as outlined by Barnes, Dhanoa, and Lister (1989). DT (detrending), in contrast, is a transformation designed to remove the mean value or linear trends from spectroscopic data.
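A minimal sketch of the smoothing and SNV-DT chain using SciPy's `savgol_filter` with the settings stated above (7-point window, second-order polynomial). The order of operations, and detrending with a second-order polynomial in wavelength as commonly described for Barnes, Dhanoa, and Lister (1989), are assumptions rather than the exact WinISI processing; `detrend_degree=1` removes only a linear trend. The function name is hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter

def sg_snv_detrend(spectra, wavelengths, window=7, polyorder=2, detrend_degree=2):
    """Savitzky-Golay smoothing -> SNV -> polynomial detrending, row by row."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    # 1) Savitzky-Golay smoothing along the wavelength axis
    smoothed = savgol_filter(spectra, window_length=window, polyorder=polyorder, axis=1)
    # 2) SNV: centre each spectrum and divide by its own standard deviation
    z = ((smoothed - smoothed.mean(axis=1, keepdims=True))
         / smoothed.std(axis=1, ddof=1, keepdims=True))
    # 3) Detrending: least-squares polynomial in rescaled wavelength, subtracted per spectrum
    w = (wavelengths - wavelengths.mean()) / wavelengths.std()
    V = np.vander(w, detrend_degree + 1)
    coef, *_ = np.linalg.lstsq(V, z.T, rcond=None)
    return z - (V @ coef).T

wavelengths = np.arange(1100, 2400, 2.0)
band = np.exp(-((wavelengths - 1940) / 60) ** 2) + 1e-4 * wavelengths
distorted = 0.2 + 1.5 * band  # same band with additive and multiplicative effects
out = sg_snv_detrend(np.vstack([band, distorted]), wavelengths)
```

The two rows of `out` are identical, showing that the chain removes offset and gain, and a spectrum that is purely quadratic in wavelength is reduced to zero by the detrending step.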

| Calibration of the Model
WinISI Project Manager software v1.5 was used to develop calibration equations by multivariate regression of spectral values on reference values. For both preprocessing methods, MPLS regression was used to develop calibration equations using the full spectra. Six mathematical treatments were applied with both SNV-DT and MSC: '1,4,4,1', '1,8,8,2', '2,4,4,1', '2,8,8,2', '3,4,4,1' and '3,8,8,2'. The first digit is the order of the derivative (first, second or third). The second digit is the gap, that is, the number of data points over which the derivative is calculated; gaps of 4 and 8 were used. The third and fourth digits are the numbers of data points used in the first (S1) and second (S2) smoothing. The calibration equations were evaluated using the coefficient of determination (RSQ internal), standard error of cross-validation (SEC(V)), standard deviation (SD) and one minus the variance ratio (1 − VR).
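The 'D,G,S1,S2' notation can be illustrated with a simplified gap derivative and boxcar smoothing. This is a sketch under stated assumptions: smoothing S1 is applied before differentiation and S2 after it, the gap is split symmetrically around each point, and edge points without a full window are set to NaN. WinISI's own implementation may differ in these details, and the function names are hypothetical.

```python
import numpy as np

def moving_average(y, width):
    """Boxcar smoothing; points without a full window are set to NaN."""
    y = np.asarray(y, dtype=float)
    if width <= 1:
        return y.copy()
    valid = np.convolve(y, np.ones(width) / width, mode="valid")
    out = np.full(y.shape, np.nan)
    left = (width - 1) // 2
    out[left:left + valid.size] = valid
    return out

def gap_derivative(y, gap):
    """(y[i + gap//2] - y[i - gap//2]) / (2 * (gap//2)), i.e. slope per data point."""
    h = gap // 2
    if h < 1:
        raise ValueError("gap must be at least 2 data points")
    out = np.full(y.shape, np.nan)
    out[h:y.size - h] = (y[2 * h:] - y[:y.size - 2 * h]) / (2 * h)
    return out

def math_treatment(spectrum, deriv, gap, smooth1, smooth2):
    """Apply a 'D,G,S1,S2' treatment such as (1, 4, 4, 1) or (3, 8, 8, 2)."""
    y = moving_average(spectrum, smooth1)
    for _ in range(deriv):
        y = gap_derivative(y, gap)
    return moving_average(y, smooth2)

i = np.arange(200, dtype=float)
d1 = math_treatment(3.0 * i + 5.0, 1, 4, 4, 1)  # first derivative of a straight line
d2 = math_treatment(i ** 2, 2, 4, 4, 1)         # second derivative of a quadratic
d3 = math_treatment(i ** 2, 3, 8, 8, 2)         # third derivative of a quadratic
```

On a straight line the first-derivative treatment returns the slope (3), on a quadratic the second-derivative treatment returns a constant (2), and the third derivative of a quadratic vanishes, as expected.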

| Model Validation and Accuracy
The validation set of 41 samples was used to evaluate the predictive performance of the calibration equations. The accuracy of amylose and sugar prediction and the effectiveness of the preprocessing techniques were assessed using the coefficient of determination (RSQ external), standard error of prediction (SEP), residual prediction deviation (RPD) and bias.
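The validation statistics can be computed as follows. The sketch assumes bias = mean(predicted - reference), SEP(C) = the standard deviation of the bias-corrected residuals, RPD = SD of the reference values / SEP(C), and RSQ external = the squared Pearson correlation between reference and predicted values; some software reports bias with the opposite sign. The data are made up.

```python
import numpy as np

def validation_stats(reference, predicted):
    """RSQ, bias-corrected SEP, bias and RPD for an external validation set."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = predicted - reference
    bias = residuals.mean()
    sep_c = np.sqrt(((residuals - bias) ** 2).sum() / (residuals.size - 1))
    rsq = np.corrcoef(reference, predicted)[0, 1] ** 2
    rpd = reference.std(ddof=1) / sep_c
    return {"RSQ": rsq, "SEP(C)": sep_c, "bias": bias, "RPD": rpd}

reference = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
predicted = reference + np.array([0.1, -0.1, 0.1, -0.1, 0.0])
stats_out = validation_stats(reference, predicted)
```

Here the residuals average to zero, SEP(C) is 0.1 and the reference SD is √10, so RPD is about 31.6.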

| Statistical Analysis
WinISI III Project Manager software 1.50 was used for all calibrations and predictions, applying the various mathematical treatments to both spectral and analytical data. The coefficients of determination (RSQ internal and RSQ external) between reference and predicted values of amylose and sugar were additionally verified outside WinISI using MS Excel. To evaluate the prediction accuracy of the models, a paired t-test was conducted in Jamovi at the 95% confidence level, and the results are reported as p-values.
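The paired t-test can be reproduced with SciPy's `ttest_rel`, shown here on made-up reference and predicted values rather than data from this study. A p-value above 0.05 means the mean difference between predicted and reference values is not statistically significant at the 95% confidence level.

```python
import numpy as np
from scipy import stats

reference = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
predicted_unbiased = reference + np.array([0.2, -0.2, 0.1, -0.1, 0.0])
predicted_biased = reference + np.array([1.2, 0.8, 1.1, 0.9, 1.0])

def paired_check(ref, pred, alpha=0.05):
    """Return the paired t-test p-value and whether the means are indistinguishable."""
    result = stats.ttest_rel(pred, ref)
    return result.pvalue, result.pvalue > alpha

p_ok, same_ok = paired_check(reference, predicted_unbiased)
p_bad, same_bad = paired_check(reference, predicted_biased)
```

The unbiased predictions give p close to 1, whereas a systematic offset of about one unit gives p well below 0.05.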

| NIR Spectral Characteristics
The raw spectra of the 121 homogenised cowpea samples over the range 1100-2400 nm are shown in Figure 1A. Absorption in the NIR region comprises overtone and combination bands arising from molecular groups such as C-H, O-H, N-H, C=O and S-H (Williams, Manley, and Antoniszyn 2019). The NIR spectrum can be divided into three regions. Region I, from 800 to 1200 nm, also called the 'short-wave NIR region (SWNIR)', 'near-NIR region (NNIR)' or 'the Herschel region', contains bands arising from electronic transitions, overtones and combination modes. Region II, from 1200 to 1800 nm, contains the first overtones of X-H (X = C, O, N) stretching vibrations and various types of combination modes. Region III (1800-2500 nm) is characterised by combination modes. Numerous applications make use of Regions II and III (Ozaki 2012). Distinct absorption bands of sugars have been reported at 1200, 1437, 2074 and 2320 nm, at which NIR spectroscopy gave accurate results (López, García-González, and Franco-Robles 2017). Distinctive absorption peaks were observed at approximately 1930 nm, reflecting combined O-H bending and stretching in amylose. In addition, a peak near 1566 nm corresponds to combined symmetric O-H stretching in amylose (Tomar et al. 2021).

| Descriptive Statistics
The descriptive statistics and histograms of the reference values are given in Table 1 and Figure 2A,B, respectively. To establish robust calibrations, Williams, Manley, and Antoniszyn (2019) suggested that an even distribution of chemical measurements is preferable to a Gaussian distribution, because equal representation of all values within the calibration set is more favourable, as in Sánchez-Carnerero Callado et al. (2018). This avoids the tendency of future predictions to regress towards the mean. The frequency histograms generally showed a continuous distribution of reference values across the concentration intervals, and the statistics show a broad range of variability, indicating that the sample set is representative. Amylose ranged from 10.02 to 21.7 g/100 g, with a mean of 15.7 g/100 g. The higher amylose values in our study correspond to those obtained by Hamid et al. (2015) for red and black cowpea cultivars. Sugar content ranged from 2.54 to 8.13 g/100 g, with a mean of 5.38 g/100 g, in agreement with the range of 3.2-8.6 g/100 g reported for 113 cowpea samples by Weng et al. (2018).

To mitigate the impact of scattering, systematic noise, baseline interference and signal overlap, various preprocessing methods were applied to the experimental data. Figure 1D presents the spectra after multiplicative scatter correction (MSC), which substantially reduced light-scattering interferences compared with the original spectra. Figure 1E displays the SNV-DT spectra, in which, as in the derivative spectra, most baseline offsets have been eliminated.

| Application of the Two Preprocessing Treatments on the Data
A notable advantage of SNV is that no wavelength interval has to be preselected, as is necessary when applying Savitzky-Golay smoothing in derivative calculations. Moreover, SNV introduces no additional peaks, unlike the first and second derivative spectra. Consequently, SNV spectra (Figure 1E) are easier to interpret directly than the first and second derivative spectra (Figure 1B,C). A visual comparison of the MSC and SNV spectra (Figure 1D,E) shows that the SNV spectra have a smaller slope than the raw and MSC spectra; this slope is associated with scatter-related multiplicative interferences and variations in particle size. In addition, SNV spectra do not depend on the mean spectrum of the sample set.

| Comparison of Calibration Data of the Two Preprocessing Treatments
Preprocessing of NIR spectral data has become an integral part of NIRS prediction modelling. Its objective is to remove physical phenomena from the spectra in order to improve subsequent multivariate regression, classification or exploratory analysis (Rinnan, Van Den Berg, and Engelsen 2009). The relevant physical factors include particle size and arrangement, which cause variation in the effective pathlength (Barnes, Dhanoa, and Lister 1989). Applying preprocessing techniques before modelling not only reduces the impact of noise and external interference but also enhances the spectral features associated with the properties of interest, which increases the prediction accuracy of calibrated models (Nocita et al. 2015). A first derivative removes a constant baseline (offset), and a second derivative additionally removes a linear slope. Smoothing is often needed before differentiation to limit noise, and with complicated interferences or incorrect smoothing parameters, the derivative step can become ineffective (Bi et al. 2016).

A total of 24 equations for predicting amylose and sugars were generated by combining six mathematical treatments ('1,4,4,1', '1,8,8,2', '2,4,4,1', '2,8,8,2', '3,4,4,1' and '3,8,8,2') with two scatter correction algorithms (SNV-DT and MSC) for each of the two traits. Separate MPLS calibrations were developed for amylose and sugar using the calibration set of 80 samples. The calibration statistics (SEC(V) and RSQ internal) are given in Tables 2 and 3, respectively. For amylose, the highest RSQ internal (0.793) was obtained with MSC and treatment '2,4,4,1', followed closely by 0.792 with SNV-DT and treatment '1,4,4,1'; both models used three principal components (PCs) (Table 2). For sugars, the highest RSQ internal (0.943) was obtained with SNV-DT and treatment '3,4,4,1', while the highest with MSC (0.864) was obtained with treatment '2,4,4,1', both with eight PCs (Table 3). Overall, in the calibration set, SNV-DT performed better than MSC for sugars, whereas MSC performed marginally better than SNV-DT for amylose.
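The statement that a first derivative removes a constant offset while a second derivative also removes a linear slope can be checked numerically. This toy example uses plain finite differences (`np.diff`) on a synthetic band rather than the smoothed gap derivatives applied in WinISI.

```python
import numpy as np

x = np.arange(300, dtype=float)              # data-point index
peak = np.exp(-((x - 150.0) / 20.0) ** 2)    # synthetic absorption band
shifted = peak + 0.3                         # constant baseline offset
tilted = peak + 0.3 + 0.002 * x              # offset plus a linear slope

# First differences: the offset cancels, but the slope leaves a constant 0.002
d1_peak = np.diff(peak, n=1)
d1_shifted = np.diff(shifted, n=1)
d1_tilted = np.diff(tilted, n=1)

# Second differences: both the offset and the linear slope cancel
d2_peak = np.diff(peak, n=2)
d2_tilted = np.diff(tilted, n=2)
```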

| Comparison of Validation Data of the Two Preprocessing Treatments
Validation was performed on 41 samples for both traits under the two preprocessing treatments. No outliers were removed during validation, in order to test the robustness of the models. The statistical parameters used to choose the best-fit models for amylose and sugars, namely RSQ external, RPD and SEP(C), are given in Tables 4 and 5, respectively. For amylose, the highest RSQ external (0.962) was obtained with the third-derivative treatment '3,8,8,2' in SNV-DT, whereas with MSC pretreatment the highest RSQ external (0.959) was also obtained with the third-derivative treatment '3,8,8,2' (Table 4). A previous study of amylose prediction in rice reported the best model with the second derivative using SNV-DT (RSQ external = 0.540; Bagchi, Sharma, and Chattopadhyay 2016), while Xie et al. (2014) found MSC to give better equations than SNV-DT for both brown (RSQ = 0.920) and milled flour (RSQ = 0.920) with the second-order derivative. These discrepancies may arise from differences in spectral characteristics, spectral measurement methods, preprocessing techniques and calibration methods. Compared with these studies, we obtained high RSQ external values with both SNV-DT and MSC. For sugars, the best RSQ external (0.914) was obtained with the third-derivative treatment '3,4,4,1' in SNV-DT, while for MSC the best RSQ external (0.830) was obtained with the second-derivative treatment '2,4,4,1' (Table 5).
Overall, SNV-DT was the better preprocessing treatment for both amylose and sugars, particularly in combination with higher-order derivatives. A limitation of MSC is that it attempts to match each spectrum to an idealised reference spectrum, which can be problematic when spectra from diverse fractions must conform to the same reference (De Groot et al. 2001). Miloš, Bensa, and Japundžić-Palenkić (2022) similarly compared preprocessing treatments for soil organic carbon, cation exchange capacity and clay, and found DT more robust than MSC for all parameters, as DT removes the mean value or linear trends in spectroscopic data of densely packed solids more efficiently than MSC (Barnes, Dhanoa, and Lister 1989). Kho et al. (2020) conducted a similar study with two regression algorithms, PLS and SVM, and found SNV-DT preprocessing better than MSC, with the fewest latent variables (LVs), namely 2. A further reason relates to reference and baseline correction: in MSC, both corrections are applied simultaneously rather than consecutively, so MSC generally gives a smaller baseline correction than SNV-DT (Rinnan, Van Den Berg, and Engelsen 2009).
Detrending is known to reduce the curvilinearity of spectra from powdered and packed samples in addition to correcting baseline shifts, and when coupled with SNV it can compensate for additional baseline shifts. SNV-DT was found to be superior for prediction modelling of sugars. The regression plots of the best prediction models for both traits and preprocessing treatments are given in Figure 3A-D. RSQ external alone is not sufficient to test prediction accuracy, so RPD was used to check the robustness of the models. Based on RPD values, models from 2.5 to 2.9 are classified as fair and suitable for preliminary assessment, values between 3 and 3.4 as suitable for quality control, values exceeding 3.5 as fitting for process control and values surpassing 4.1 as excellent (Chadalavada et al. 2022). Conversely, calibrations with RPD values below 2 are considered inadequate and are not advisable. In this study, for amylose with SNV-DT preprocessing, treatments '2,4,4,1', '2,8,8,2' and '3,8,8,2' gave RPD values above 4, indicating excellent prediction models. With MSC preprocessing, treatments '2,8,8,2' and '3,8,8,2' gave RPD values above 4, also indicating excellent models, and the remaining treatments had RPD values above 3.5, which is considered fitting for process control.
However, for sugars with SNV-DT preprocessing, the RPD value for treatment '3,4,4,1' was between 3 and 3.4, which is deemed suitable for quality control.
For treatments '2,4,4,1', '2,8,8,2' and '3,8,8,2', the RPD values were between 2 and 3.0, which is considered fair and suitable for screening purposes, and for the remaining treatments the RPD was below 2, which is not advisable. With MSC, the RPD values for sugars were not as good as with SNV-DT: treatments '2,4,4,1' and '3,8,8,2' gave RPD values between 2 and 3, considered fair and suitable for screening, and the remaining treatments gave RPD values below 2, which is not advisable. Clearly, for sugars, SNV-DT gave better predictions than MSC.
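The RPD categories quoted above can be encoded as a simple lookup. The cited ranges leave small gaps (for example 2.0-2.5 and 3.4-3.5); this sketch closes them by assumption, treating 2.0-3.0 as fair and suitable for screening, as the discussion of the sugar models does. The function name is hypothetical.

```python
def classify_rpd(rpd):
    """Map an RPD value to the categories quoted in the text (gaps closed by assumption)."""
    if rpd < 2.0:
        return "inadequate"
    if rpd < 3.0:
        return "fair (screening)"
    if rpd < 3.5:
        return "quality control"
    if rpd <= 4.1:
        return "process control"
    return "excellent"

examples = {value: classify_rpd(value) for value in (1.8, 2.6, 3.2, 3.8, 4.5)}
```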
A paired t-test was performed at the 95% confidence level to determine whether the mean difference between the analytical (reference) and predicted values of each biochemical parameter differed significantly from zero. In our investigation, all p-values exceeded 0.05, indicating that the predicted values did not differ significantly from the reference values and supporting the reliability of the models. The p-values for amylose and sugars are given in Tables 4 and 5, respectively. A p-value greater than 0.05 means that the null hypothesis of no difference between the means of the predicted and reference values cannot be rejected, whereas a p-value below 0.05 indicates a significant difference.

| Conclusion
This study assessed the ability of NIRS to quantify amylose and total sugars in cowpea. NIRS chemometrics coupled with SNV-DT preprocessing proved efficient in extracting the spectral features of cowpea spectra. The spectroscopic models for amylose and sugars performed well in external validation, as shown by their coefficients of determination. Spectral preprocessing was a major contributor to the development of robust prediction models for both traits.
Future work should address the selection of specific wavelength regions, alternative multivariate regression techniques and combined scatter correction techniques, which could further improve the predictive ability of NIRS models.

Figure 1B,C depicts the first and second derivatives derived from the unprocessed absorbance spectra of the calibration samples shown in Figure 1A. These two figures illustrate a general improvement in addressing baseline interference and signal overlap.

FIGURE 1
FIGURE 1 | (A) Raw spectra of 121 homogenised cowpea samples. (B) First derivative spectra without preprocessing. (C) Second derivative spectra without preprocessing. (D) Spectra of 121 cowpea samples after MSC preprocessing. (E) Spectra of 121 cowpea samples after SNV-DT preprocessing.

FIGURE 3
FIGURE 3 | (A-D) Regression plots of the best prediction models for amylose and sugars under the two preprocessing treatments.

TABLE 1 |
Descriptive statistics of reference values for developing NIRS prediction models.
Note: All values are expressed in g/100 g.

TABLE 2 |
Calibration statistics for amylose content.

TABLE 3 |
Calibration statistics for sugar content.

TABLE 4 |
Validation statistics for amylose content.

TABLE 5 |
Validation statistics for sugar content.
Abbreviations: MSC, multiplicative scatter correction; RPD, residual prediction deviation; RSQ, coefficient of determination; SEP(C), standard error of prediction; SNV-DT, standard normal variate with detrend.

Williams, Manley, and Antoniszyn (2019) have given a suitable criterion for judging the accuracy of model prediction. According to Williams, Manley, and Antoniszyn (2019), models are deemed excellent if their RSQ external exceeds 0.91 and good for prediction if RSQ external falls between 0.82 and 0.90. RSQ external values from 0.66 to 0.81 indicate approximate quantitative predictions, while values between 0.50 and 0.65 indicate that over 50% of the variance in Y is explained by variance in X, enabling differentiation between high and low concentrations. Thus, for amylose, RSQ external exceeded 0.91 for both SNV-DT and MSC, indicating the excellent predictive capacity of the models. For sugars, RSQ external exceeded 0.91 only for SNV-DT, indicating excellent predictive capacity with this pretreatment, whereas for MSC it fell between 0.82 and 0.90, indicating a good prediction model.
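The RSQ external criterion of Williams, Manley, and Antoniszyn (2019) can likewise be expressed as a lookup. The boundaries between the quoted ranges (for example 0.90-0.91) and the label for values below 0.50 are assumptions; the example applies it to the best RSQ external values reported for the two traits.

```python
def classify_rsq(rsq):
    """Map RSQ external to the prediction categories quoted above."""
    if rsq > 0.91:
        return "excellent"
    if rsq >= 0.82:
        return "good"
    if rsq >= 0.66:
        return "approximate quantitative"
    if rsq >= 0.50:
        return "distinguishes high from low"
    return "below the quoted scheme"  # not covered by the cited criterion

# Best RSQ external values reported in this study
best = {"amylose SNV-DT": 0.962, "amylose MSC": 0.959,
        "sugars SNV-DT": 0.914, "sugars MSC": 0.830}
labels = {name: classify_rsq(value) for name, value in best.items()}
```

The resulting labels match the interpretation given above: excellent for both amylose models and for the SNV-DT sugar model, and good for the MSC sugar model.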