Assessment of bioactive compounds in faba bean using infrared spectroscopy

Faba bean (Vicia faba) is growing in popularity in Australia, partly due to its higher levels of health-benefiting compounds compared to other grain crops. This study investigated infrared spectroscopy for predicting levels of bioactive compounds such as antioxidants and phenolics in faba bean flour. Calibration models were developed using 60 samples of faba bean, comprising 10 varieties grown across two field locations in one year. For model validation, an independent test set comprising the same varieties grown in a different year was utilised. Near-infrared spectroscopy (NIRS) showed promise for the prediction of total phenolic content, with an R²pred of 0.66 and root mean square error of prediction (RMSEP) of 76 mg/100 g. Similarly, prediction of ferric reducing antioxidant power, a measure of antioxidant activity, gave an R²pred of 0.59 and RMSEP of 87 mg/100 g. Additionally, moving window optimisation was used to determine the most important wavelength region for the prediction of these analytes. Fourier transform infrared spectroscopy did not yield any suitable models for the analytes investigated. Although the NIRS models developed were not capable of exactly quantifying phenolic or antioxidant content, infrared spectroscopy appears useful for rapidly discriminating between samples containing high and low levels of phenolics or antioxidant compounds. With further refinement, this technique could potentially be applied for the quality assurance of phenolic content or antioxidant capacity in faba bean seeds.


| INTRODUCTION
Faba beans (Vicia faba L.) are growing in popularity in Australia, in part due to their significant potential for the development of value-added foods and ingredients. Marketed partly based on their enhanced nutrition and health benefits compared to common grains such as wheat (Triticum aestivum L.), faba beans comprise 10-15% of total pulse production in Australia (AEGIC, 2021; Australian Export Grains Innovation Centre, AEGIC, 2017). In addition to meeting or exceeding dietary requirements for all essential minerals except calcium (AEGIC, 2021), they contain relatively high levels of beneficial antioxidant and phenolic compounds, which may provide protection against reactive oxygen species, as well as anti-hypertensive and anti-cancer effects (Martineau-Côté, Achouri, Karboune, et al., 2022; Martineau-Côté, Achouri, Wanasundara, et al., 2022; Siah et al., 2012; Turco et al., 2016). However, the presence of anti-nutritional factors, principally the pyrimidine glycosides vicine and convicine, can also negatively impact on the health benefits associated with this crop (Duc et al., 1999; Khazaei et al., 2019; Wang & Ueberschär, 1990) and cause haemolytic anaemia in individuals with favism (Cappellini & Fiorelli, 2008; Pulkkinen et al., 2016; Sheikh et al., 2021). Consequently, there is a widespread need for closely monitoring the levels of both health-benefiting and anti-nutritive compounds in faba bean.
Traditionally, the concentrations of such compounds have been determined through time-consuming, expensive chemical methods.
Near-infrared spectroscopy (NIRS), which uses wavelengths between 750 and 2,500 nm (Pasquini, 2003), has been successfully used in the quality determination of food products for decades (Bureau et al., 2019; Pallone et al., 2018; Subramanian & Rodriguez-Saona, 2009). Infrared (IR) light is emitted by the instrument and penetrates the sample, where certain IR wavelengths are absorbed by chemical bonds present within the sample. The remaining wavelengths are reflected (or transmitted) back to the instrument. Consequently, it is possible to determine the identity and relative quantity of specific chemical bonds present in the sample based on the location of peaks and their relative absorbance across the infrared spectrum.
A similar technique to NIRS is mid-infrared spectroscopy (MIRS), which works on the same principles as NIRS, but utilises wavelengths between 2,500 and 25,000 nm (usually described as wavenumbers between 4,000 and 400 cm⁻¹ for convenience). Although MIRS offers greater chemical sensitivity due to the wider range of characteristic absorption peaks for common functional groups, its drawbacks include reduced sample penetration and lower reproducibility (Johnson, 2022).
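The correspondence between the wavelength and wavenumber ranges quoted above follows from the standard relation wavenumber (cm⁻¹) = 10⁷ / wavelength (nm). A minimal Python sketch of this conversion is given below; the function names are illustrative only and are not taken from any software used in this study.

```python
def nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a wavelength in nm to a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return 1e7 / wavelength_nm


def wavenumber_to_nm(wavenumber_cm: float) -> float:
    """Convert a wavenumber in cm^-1 back to a wavelength in nm."""
    if wavenumber_cm <= 0:
        raise ValueError("wavenumber must be positive")
    return 1e7 / wavenumber_cm


if __name__ == "__main__":
    # Boundaries of the NIR and MIR regions quoted in the text.
    for nm in (750, 1000, 2500, 25000):
        print(f"{nm} nm = {nm_to_wavenumber(nm):,.0f} cm^-1")
```

Under this relation, the MIR limits of 2,500 and 25,000 nm map onto 4,000 and 400 cm⁻¹, matching the figures given in the text.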
A number of studies have previously applied NIRS to the quality analysis of the faba bean crop (Johnson, Walsh, & Naiker, 2020), including the analysis of protein (El-Sherbeeny & Robertson, 1992; Williams et al., 1978), tannins (de Haro et al., 1988) and starch and oil content (Wang et al., 2014). However, there have been very few studies investigating the prediction of bioactive compounds in this matrix. Wang et al. (2014) did use NIRS for the prediction of total phenolic content in Chinese faba bean, although the study only used a dependent test set and the reported model accuracy is therefore likely to be significantly higher than its true performance if applied to an independent 'real-world' test set. Furthermore, there is no published literature on the prediction of antioxidant capacity in faba bean. Similarly, no studies have explored the use of MIRS for analysing bioactive compounds in this crop (Johnson, Walsh, & Naiker, 2020). The relatively high concentrations of alkaloids in the faba bean matrix (including vicine and convicine) (Skylas et al., 2019) may interfere with the IR signal of other phenolic or antioxidant compounds, meaning that NIR/MIR models reported for other pulse or grain matrices may not be directly applicable to faba bean. Hence, this study aimed to advance the state of research in this field through several specific objectives:
• To investigate the potential of IR spectroscopy for rapid screening of bioactive compounds in faba bean flour.
• To investigate the potential of IR spectroscopy for rapid screening of anti-nutritive compounds in faba bean flour.
• To compare the relative performance of NIRS and MIRS for these purposes.
This paper expands on our previous work on this topic, which was briefly presented in Johnson et al. (2021). This work may pave the way for the potential application of NIRS/MIRS for the rapid, non-destructive screening of bioactive constituents in faba bean in industrial/commercial settings.
and moisture content determined as previously described (Skylas et al., 2019). All measurements were expressed on a dry weight basis.
The MIR spectra were collected from the faba bean flour using a Bruker Alpha FTIR (Fourier transform infrared) spectrophotometer (Bruker Optics GmbH, Ettlingen, Germany) between 4,000 and 400 cm⁻¹, as previously described (Johnson, Collins, et al., 2020).
Five spectra were collected from each sample, repacking the instrument with fresh flour each time. The mean of these replicate scans was used in subsequent analysis.
The NIR spectra were collected in reflectance mode, using the integrating sphere with a sample cup on a Thermo Scientific Antaris II FT-NIR Analyzer. The spectra were collected between 1,000 and 2,500 nm (10,000-4,000 cm⁻¹), as the mean of 32 scans (resolution of 8 cm⁻¹). Spectra were collected in triplicate, repacking the sample cup with fresh flour each time. Spectra were exported in *.csv format, with the mean of the triplicate spectra for each sample used in subsequent analysis.
Chemometric analysis of the NIR spectra was conducted in RStudio running R 4.0.5 (R Core Team, 2023), using the spectrolab and prospectr packages. Spectra were pre-processed with first- and second-derivative transformations computed using a Savitzky-Golay algorithm (Savitzky & Golay, 1964) with varying numbers of smoothing points (5, 11, 15 or 21).
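The two pre-processing steps referred to in this study (standard normal variate scaling and Savitzky-Golay derivatives) can be illustrated with a short Python sketch. This is not the R code used by the authors: the function names are hypothetical, and the derivative assumes a quadratic polynomial fit over evenly spaced data points, which is an assumption because the polynomial order is not stated in the text.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    mu = mean(spectrum)
    sd = stdev(spectrum)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("SNV is undefined for a flat spectrum")
    return [(a - mu) / sd for a in spectrum]


def savgol_first_derivative(spectrum, points=5, spacing=1.0):
    """Savitzky-Golay first derivative for an odd window of `points` values.

    Assumes a quadratic (or linear) local fit and evenly spaced data, for which
    the convolution weights are i / sum(i^2) with i running over the window.
    The window cannot be centred on the first and last `points // 2` values,
    so the output is shorter than the input by `points - 1`.
    """
    if points < 3 or points % 2 == 0:
        raise ValueError("points must be an odd integer >= 3")
    if len(spectrum) < points:
        raise ValueError("spectrum is shorter than the smoothing window")
    half = points // 2
    offsets = range(-half, half + 1)
    norm = sum(i * i for i in offsets)
    return [
        sum(i * spectrum[k + i] for i in offsets) / (norm * spacing)
        for k in range(half, len(spectrum) - half)
    ]


if __name__ == "__main__":
    # Hypothetical absorbance values, for illustration only.
    absorbance = [0.50, 0.52, 0.55, 0.61, 0.70, 0.78, 0.83, 0.85, 0.86]
    scaled = snv(absorbance)
    print(savgol_first_derivative(scaled, points=5))
```

Wider windows (11, 15 or 21 points) smooth more strongly at the cost of blurring narrow spectral features, which is presumably why several window widths were trialled.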
For model development, the 2017 samples (n = 60) were used for the calibration set, while the 2016 samples (n = 40) were used as an independent test set. A maximum of 10 components was considered for each partial least squares regression (PLS-R) model.
Moving window analysis was conducted on the NIR spectra in RStudio using a custom script (Johnson, Mani, et al., 2023), with a 10 nm interval.
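The general idea of a moving window search can be sketched as follows. This is a hedged illustration in Python and not the custom R script cited above: the exhaustive search strategy, the `min_width_nm` parameter and the `score` callback are all assumptions introduced for the example. In practice, `score` would fit a PLS-R model on the wavelengths inside the window and return its RMSECV.

```python
from typing import Callable, Sequence, Tuple


def moving_window_search(
    wavelengths: Sequence[int],
    score: Callable[[int, int], float],
    step_nm: int = 10,
    min_width_nm: int = 100,
) -> Tuple[float, int, int]:
    """Evaluate every (start, end) window on a fixed grid and keep the lowest score.

    Returns (best_score, start_nm, end_nm). Lower scores are better, as for RMSECV.
    """
    lowest, highest = min(wavelengths), max(wavelengths)
    best = None
    start = lowest
    while start + min_width_nm <= highest:
        end = start + min_width_nm
        while end <= highest:
            value = score(start, end)
            if best is None or value < best[0]:
                best = (value, start, end)
            end += step_nm
        start += step_nm
    if best is None:
        raise ValueError("spectral range is narrower than min_width_nm")
    return best


if __name__ == "__main__":
    grid = list(range(1000, 2501, 10))

    # Toy stand-in for a cross-validated PLS-R error: it is smallest for the
    # 1,900-2,300 nm window and grows as the window edges move away from it.
    def toy_rmsecv(start: int, end: int) -> float:
        return abs(start - 1900) + abs(end - 2300)

    print(moving_window_search(grid, toy_rmsecv))
```

Because every window requires a full cross-validated model, the cost grows roughly with the square of the number of grid points, so a coarser interval (such as 10 nm) keeps the search tractable.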

| Descriptive statistics
The descriptive statistics for the calibration and test sets are provided in Table 1. Although the samples were sourced from 10 different varieties and two different growing sites, the level of variation in many of the analytes was relatively low. For example, there was only ~1% coefficient of variation in the protein content for the calibration set (2017 samples). Other analytes such as starch, amylose and amylopectin also showed minimal variation, which could lead to difficulty in creating predictive models. The amount of variation in the FRAP and TP contents was moderately high, with coefficients of variation of 34% and 31%, respectively.
FIGURE 1 The raw NIR spectra (a) and SNV-processed spectra (b) of the faba bean flour samples.
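For reference, the coefficient of variation quoted above is the standard deviation expressed as a percentage of the mean. A minimal Python sketch is shown below; the input values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * stdev(values) / mu


if __name__ == "__main__":
    # Hypothetical protein contents (%), for illustration only.
    protein = [27.9, 28.1, 28.4, 28.0, 28.3]
    print(f"CV = {coefficient_of_variation(protein):.1f}%")
```

A low coefficient of variation in the reference values leaves little analyte-related variance for a regression model to explain, which is why it is relevant to calibration performance.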

| Near-infrared spectra
The raw and pre-processed NIR spectra are shown in
TABLE 2 Optimum PLS-R models found for the prediction of the specified analytes using NIR spectroscopy.
explanation is that the analyte signal was unable to be detected in the NIR spectra. Similarly, the models for the TMA, amylose and amylopectin contents showed no predictive power.
Despite the low range of protein contents found among the calibration samples, the PLS-R model for protein showed a high level of linearity (R²) for both the calibration and test sets (Figure 3). The RMSEP of the test set was 0.35, indicating that the model could predict the protein content in independently sourced faba bean samples with a typical absolute error of approximately 0.35%. As shown in Figure 3b, the model was slightly biased towards under-predicting protein contents (bias = −0.26), which appears to be due to the lack of high-protein content samples found in the calibration set (Figure 3a). Examination of the loadings plot for the protein prediction model (Figure 4) revealed the strongest influence at 1,898 nm, apparently corresponding to the shoulder of the amide A/II region (Gergely & Salgó, 2007).
Compared to the protein model, the PLS-R model for the prediction of TP content did not perform as well (Figure 5), as anticipated for analytes that are present at lower concentrations. Although the cross-validation statistics showed a high linearity (R²cv of 0.88), the RMSEP (76 mg GAE/100 g) was more than twice that of the RMSECV (35 mg GAE/100 g), indicating relatively poorer performance of the model on independently sourced samples. This RMSEP value was also considerably higher than the standard laboratory error of the reference spectrophotometric method (mean SD of 8.3 mg GAE/100 g for n = 100 samples analysed in duplicate). The TP content of most samples was under-predicted by the PLS-R model, giving it an overall negative bias of −54 mg GAE/100 g. It is worthwhile noting that despite the broad range of genotypes and growing sites represented in the faba bean samples, there were relatively few samples containing moderate or high TP levels, which is likely the source of bias in this model. Nevertheless, the model could still discriminate between samples with low and high TP contents.
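The test set statistics discussed here (RMSEP and bias) can be computed from paired reference and predicted values as in the Python sketch below. This is an illustration rather than the authors' code. Two conventions are assumed: bias is taken as the mean of (predicted − reference), so that a negative value indicates under-prediction as in the text, and R² is taken as the squared Pearson correlation. The bias-corrected standard error of prediction (SEP) is included to show how much of the RMSEP is attributable to a systematic offset.

```python
from math import sqrt
from statistics import mean


def prediction_metrics(actual, predicted):
    """Return bias, RMSEP, bias-corrected SEP and R^2 for a test set."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    residuals = [p - a for a, p in zip(actual, predicted)]
    n = len(residuals)
    bias = mean(residuals)
    rmsep = sqrt(sum(r * r for r in residuals) / n)
    sep = sqrt(sum((r - bias) ** 2 for r in residuals) / (n - 1))
    mean_a, mean_p = mean(actual), mean(predicted)
    sxx = sum((a - mean_a) ** 2 for a in actual)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    sxy = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either sequence is constant")
    return {"bias": bias, "rmsep": rmsep, "sep": sep, "r2": sxy ** 2 / (sxx * syy)}


if __name__ == "__main__":
    # Hypothetical reference and predicted TP values (mg GAE/100 g).
    reference = [100, 200, 300, 400]
    estimated = [90, 180, 290, 360]
    print(prediction_metrics(reference, estimated))
```

With these definitions, RMSEP² = bias² + SEP² × (n − 1)/n, so a model with a large bias but a small SEP may still rank samples correctly even when its absolute predictions are offset.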
Interestingly, the loadings plot showed a strong influence of wavelengths around 1,901 nm, which may be attributable to the amide bond region, as previously discussed for protein.This may hint that the model is using a secondary correlation between protein and TP content in order to estimate the TP content of the unknown samples.
Other influential wavelengths were 1,960 and 2,149 nm, which may correspond to the first overtones of OH stretch and CH stretch, respectively (Toledo-Martín et al., 2018). These wavelengths were found by previous authors to be important in the prediction of TP content in blackberries using NIRS (Toledo-Martín et al., 2018).
The calibration and test set results for the FRAP PLS-R model (Figure 6) were quite similar to those found for the TP model.Again, the model did not perform as well on the independent test set, with the results limited to discriminating between samples with high or low FRAP values.
The loadings plot was almost identical to that observed for the TP prediction (cf. Figures 7 and 8), with the predominating wavelengths being 1,901, 1,960 and 2,149 nm. This indicates that the same analyte(s) were being measured by the TP and FRAP models, which is a logical outcome if the phenolic compounds present in faba bean are primarily responsible for its antioxidant activity.

| Moving window analysis
Given the promising results found using NIR spectroscopy for the prediction of several analytes (principally protein, TP and FRAP), further investigation was conducted into the key NIR wavelengths required for the accurate prediction of these analytes.
The results of the moving window analysis are shown in Figures 9-11. Similar to previously reported results (Anderson et al., 2020; Johnson, Mani, et al., 2023), comparison of the optimised and non-optimised wavelength models (Table 3) showed that wavelength optimisation provided a moderate improvement in R² and RMSECV values, while retaining a comparable number of model factors (principal components). The optimal wavelength windows generally mirrored the results seen in the loadings plots (Figures 4, 7 and 8). Notably, the optimal wavelength range for both protein and FRAP was within ~1,900-2,300 nm, demonstrating that the spectral information above ~2,300 nm (which showed significant contributions in the loadings plots for these analytes) was not needed for their accurate prediction.
FIGURE 9 Results of the moving window optimisation for the prediction of protein using the NIR spectra. Note that the figure shows RMSECV values.
On the other hand, the optimal wavelength window for TP prediction was much larger (1,360-2,440 nm), which was slightly unexpected given that its loadings plot was very similar to that of FRAP (compare Figures 7 and 8).
The high accuracy of the calibration results with a narrowed NIR window also suggests that NIR instruments with a limited spectral width (often the case with portable instruments) could nevertheless provide accurate predictions of FRAP and protein if they covered the 1,900-2,300 nm range.

| Mid-infrared spectra
The raw and SNV-processed MIR spectra of the faba bean flour samples are shown in Figure 12. The spectra were more complex compared to the NIR region (Figure 12a), with major peaks centred around 3,250 cm⁻¹ (attributable to OH stretch from moisture), 3,000-2,850 cm⁻¹ (CH₂ and CH₃ stretch), 1,640 cm⁻¹ (C=O stretch of amides or other carbonyl-containing compounds), 1,550-1,200 cm⁻¹ (various amide and phenol bonds) and 1,000 cm⁻¹ (aromatic rings in cellulose) (Abbas et al., 2017; Dufour, 2009; Johnson, Collins, et al., 2020; Karoui et al., 2010; Mecozzi & Sturchio, 2017; Sigma Aldrich, 2019). Again, there was considerable variation in the amplitude of the spectra between different samples, most of which was removed through the SNV algorithm (Figure 12b).
FIGURE 11 Results of the moving window optimisation for the prediction of FRAP using the NIR spectra. Note that the figure shows RMSECV values.
TABLE 3 NIR predictions of protein, TP and FRAP, with and without moving window optimisation. Note: The optimal number of factors was selected automatically, rather than manually as in the rest of the study.
a Wavelength optimisation was only performed at 10 nm intervals.
As detailed for the NIR spectra, PLS-R models were created for each analyte, with the optimum pre-processing method determined through the performance statistics for leave-one-out (LOO) cross-validation. The best-performing models for each analyte are detailed in Table 4.
Although the calibration models for moisture and amylopectin (and TP to a lesser extent) showed some potential signal, the models had no predictive power when applied to the independent test set.For this reason, no calibration plots or loading plots are shown.

| DISCUSSION
Overall, the appearance of the NIR spectra from the faba bean flour was quite comparable to that reported by Wafula et al. (2020) for common beans (Phaseolus vulgaris L.), while the MIR spectra were similar to those previously reported for faba bean (Johnson, Collins, et al., 2020) and other pulse crops (Carbas et al., 2020; Johnson et al., 2019). The NIRS model for the prediction of protein content performed acceptably on the independent test set, confirming the suitability of this method for the rapid assessment of proximate composition, as reported by numerous previous authors (Peiris et al., 2019; Wang et al., 2014). The model performance on the independent test set (R²pred = 0.86; RMSEP = 0.35%) was quite comparable to the (dependent) test set results reported by Wang et al. (2014)
Although the NIRS models for TP content and FRAP did not perform as accurately as the protein model, they showed some potential for the estimation of these parameters. Several of the key predictive wavelengths (1,960 and 2,149 nm) appeared to correspond to the first overtones of OH stretch and CH stretch of phenol-containing compounds, while the 1,901 nm wavelength also showed a large contribution. This may indicate that at least part of the predictive power of the model was due to a potential secondary correlation between protein and TP/FRAP content.
There does not appear to be any previous work reporting the non-invasive prediction of antioxidant capacity in faba bean; however, Wang et al. (2014) used NIRS to create prediction models for the TP content in Chinese faba bean. The reported performance on the (dependent) test set was somewhat better than that reported here, with an R²pred of 0.78 and RMSEP of 37 mg/100 g. However, this reported accuracy is likely somewhat over-optimistic, as the test set was not sourced independently from the calibration samples.
One of the challenges in creating accurate prediction models for TP and FRAP from the present dataset may stem from the lack of samples containing intermediate TP or FRAP values.This was despite the fact that this dataset included samples from 10 different genotypes and two different growing locations, which would be anticipated to provide a wide range of phytochemical diversity.Consequently, the models developed here would appear to be only useful for screening purposes (e.g., classifying samples into high or low phenolic or FRAP contents) and not the absolute quantification of these analytes.The failure to create a highly accurate and robust model for the prediction of TP content and antioxidant capacity is not unique to this study.Moderate to poor results for the prediction of TP content were reported in cocoa bean (Hernández-Hernández et al., 2021) and in blackberry fruit (Toledo-Martín et al., 2018).
Although MIRS has previously been reported for the prediction of TP content in common bean flour (Carbas et al., 2020) and other sample matrices (Johnson, Mani, et al., 2020), no accurate calibration models could be developed from the MIR spectra in this study for any of the analytes investigated.This concurs with a general dearth of MIR studies reporting the prediction of TP content or antioxidant capacity in any matrix (but particularly grains) among the recent literature (Johnson, Walsh, et al., 2023).Although several of the analytes showed moderate R 2 values in the calibration models, none showed acceptable performance when applied to the independent test set.This is likely due to the difficulty in applying a consistent level of pressure between the sample and the interface of the MIR instrument across all of the samples analysed.In turn, the amount of pressure applied has a strong influence on the signal amplitude and sensitivity.
Consequently, the lack of predictive power observed here appears to be due to the lack of reproducibility in the MIR spectra.The NIR instrument used in this work does not suffer from the same drawback, as the sample can be presented in a sample cup that is placed on top of the instrument, ensuring consistent presentation between different samples.Some authors have solved this issue with MIRS through the use of internal standards, which are mixed with the sample (Bekiaris et al., 2020;Sastre Toraño & Hattum, 2001); however, this would seem to defeat the purpose of IR spectroscopy as a rapid analytical technique with no required sample preparation.Other authors have reported using ATR-MIR for the analysis of bioactive compounds without any special modifications to the instrument or sample matrices (Amanah et al., 2020;Carbas et al., 2020;Cozzolino et al., 2020;Johnson, Mani, et al., 2020), although it should be noted that nearly all of these studies did not confirm the model performance using independent test sets.

| CONCLUSION
This study investigated the potential of NIR and MIR spectroscopy for the rapid prediction of key analytes present in faba bean flour, including protein, TP, FRAP, vicine and convicine contents. Although most of the analytes could not be predicted using NIRS in this study, good performance was found for the prediction of protein content and acceptable performance for the estimation (i.e., high or low content) of TP and FRAP values. Moving window analysis and examination of the loadings plots were used to identify the key wavelengths contributing to the models. None of the MIR models showed acceptable results for any analyte. The results suggest that NIRS may be used for the rapid approximation of TP and FRAP contents in faba bean, alongside its current use for protein determination.
Figure 1. Major peaks were located at 1,200, 1,470 and 1,936 nm, corresponding to the CH second overtone from structural carbohydrates, OH second overtone from moisture and OH first overtone, respectively (Manley, 2014). More minor peaks were observed between 1,700 and 1,840 nm (attributed to the first overtones of CH and CH₂), 2,050-2,200 nm (first overtones of the amide A/II and amide I/III bonds in protein) and 2,270-2,370 nm (combination bands of CH, CH₂ and CH₃ bonds) (Gergely & Salgó, 2007). There was some variation in the spectral amplitude between samples (Figure 1a), which was successfully removed following application of the standard normal variate (SNV) algorithm (Figure 1b).
PLS-R models were developed for each of the 11 analytes specified in Table 1, trialling 18 different pre-processing methods for each analyte. The following combinations of pre-processing methods were examined: none (raw spectra); SNV smoothed; 1d5 (first derivative with 5 smoothing points), 1d11, 1d15 and 1d21; 2d5 (second derivative with 5 smoothing points), 2d11, 2d15 and 2d21; and SNV + 1d5, SNV + 1d11, SNV + 1d15, SNV + 1d21, SNV + 2d5, SNV + 2d11, SNV + 2d15 and SNV + 2d21. For each analyte, the optimum model was selected from the R², RMSECV and ratio of performance to deviation (RPD) values from cross-validation. The optimum number of components ('factors') was identified using scree plots, which show the RMSECV plotted against the number of components (see the example shown in Figure 2). The optimised PLS-R models for each analyte are shown in Table 2, along with their corresponding figures of merit. A summary of the results obtained on the independent test set is also included in this table, namely, the R²pred coefficient, root mean square error of prediction (RMSEP) and the bias, slope and intercept of the calibration model when applied to the independent test set. The best performing model was found for protein content, followed by TP content, FRAP and then convicine. Certain other analytes (e.g., moisture, starch, amylopectin, vicine and total vicine/convicine) showed reasonable cross-validation statistics for the calibration but much poorer results for the independent test set. This is generally indicative of an over-fitted model or a model that is overly specific to the calibration data set. As the number of model components was limited to 10 to avoid over-fitting in this study, the most likely
FIGURE 2 Scree plot showing the selection of the optimum number of components (7) for the PLS-R model for the prediction of TP content.
FIGURE 3 (a) Actual versus NIRS-predicted protein contents for the calibration set (n = 60). (b) Actual versus predicted protein contents for the test set (n = 40).
FIGURE 4 Loadings plot for the prediction of protein content in faba bean flour.
FIGURE 5 Actual versus predicted TP contents for the calibration set (a) and test set (b).
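The 18 pre-processing treatments listed in the text can be viewed as a small grid: an optional SNV step combined with either no derivative or one of eight Savitzky-Golay derivative settings. The Python sketch below enumerates that grid; the label format follows the abbreviations used in the text, while the variable and function names are illustrative assumptions.

```python
from itertools import product

# Optional scatter correction: none, or standard normal variate (SNV).
SCATTER_OPTIONS = ["", "SNV"]

# No derivative, or a first/second derivative with 5, 11, 15 or 21 smoothing points.
DERIVATIVE_OPTIONS = [""] + [
    f"{order}d{points}" for order in (1, 2) for points in (5, 11, 15, 21)
]


def treatment_label(scatter: str, derivative: str) -> str:
    """Build a label such as 'SNV + 1d11', or 'none' for the raw spectra."""
    parts = [step for step in (scatter, derivative) if step]
    return " + ".join(parts) if parts else "none"


TREATMENTS = [
    treatment_label(scatter, derivative)
    for scatter, derivative in product(SCATTER_OPTIONS, DERIVATIVE_OPTIONS)
]

if __name__ == "__main__":
    print(len(TREATMENTS))
    print(TREATMENTS)
```

Enumerating the grid in this way makes it straightforward to loop over every treatment when fitting models and to compare their cross-validation statistics in a single table.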

FIGURE 6 Actual versus predicted FRAP values for the calibration set (a) and test set (b).
FIGURE 7 Loadings plot for the prediction of TP content in faba bean flour.
FIGURE 8 Loadings plot for the prediction of FRAP in faba bean flour.

FIGURE 10 Results of the moving window optimisation for the prediction of TP using the NIR spectra. Note that the figure shows RMSECV values.
(R²pred of 0.94 and RMSECV of 0.33%). However, Wang et al. (2014) used a much broader range of protein contents for calibration (23.8-33.1%) compared to those used here (26.5-30.2%), indicating that the model accuracy could be retained when using a narrower calibration range for this analyte. Examination of the model loadings plot confirmed that the selected wavelengths (principally 1,898 nm) corresponded to the absorbance of amide bonds, which are found in protein. In other words, the model was looking in the correct region(s) of the NIR spectrum to be able to detect protein. This was comparable to Wu et al. (2023), who used this wavelength region in the NIRS prediction of protein content of corn.
FIGURE 12 The raw MIR spectra (a) and SNV-processed spectra (b) of the faba bean flour samples.
TABLE 4 Optimum PLS-R models found for the prediction of various analytes using MIR spectroscopy.
: Conceptualization; data curation; formal analysis; investigation; methodology; resources; software; validation; visualization; writing-original draft; writing-review and editing. Kerry B. Walsh: Project administration; supervision; writing-review and editing. Mani Naiker: Funding acquisition; project administration; resources; supervision; writing-review and editing.
ACKNOWLEDGEMENTS
Thanks to Daniel Skylas and Ken Quail from the Australian Export Grains Innovation Centre (AEGIC) for supplying the faba bean samples used in this study. Open access publishing facilitated by Central Queensland University, as part of the Wiley - Central Queensland University agreement via the Council of Australian University Librarians.
Bias, slope and intercept were not calculated for any analyte, as no acceptable prediction results could be made for the test set.