Raman spectroscopic assessment of fibers and seeds of six cotton genotypes

Raman spectroscopy (RS) is a vibrational spectroscopic technique. This work reports the RS spectral characteristics of the fiber and seed of six cotton (Gossypium sp.) genotypes differing in fiber length. While the RS spectra of the fiber samples were dominated by cellulose-related peaks, the spectra of the cottonseed samples featured bands related to oil, protein, carbohydrate, and lignin components. Principal component analysis (PCA) revealed that the first two principal components (PCs) accounted for >87% of the total variation in each of the two types of samples. The PC1 versus PC2 plot classified the six fiber samples into three groups, but their cottonseeds into four groups. This experimental evidence suggests that RS combined with PCA could be used for rapid fiber phenotyping of cotton as well as for evaluating cottonseed nutrient information.


INTRODUCTION
Like infrared spectroscopy (IR), Raman spectroscopy (RS) is a vibrational spectroscopic technique. While IR is an excellent tool for structural and quantitative analysis of natural products (Di Meo et al., 2022; He & Liu, 2021; He & Ohno, 2012), RS can provide additional structural information because of its different detection mechanism (Johnston & Aochi, 1996; Makarem et al., 2019). For example, unlike IR, RS is well suited to analyzing alkyl chains because C-C and C-H bonding can be determined in the presence of water (Özgenç et al., 2017). RS has been shown to be capable of confirmatory identification of plant species and their varieties, as well as analysis of the nutritional value of seeds (Payne & Kurouski, 2021). Farber et al. (2020) (Cabrales et al., 2014; da Mata et al., 2022) even though various IR techniques have been applied to a wide array of cotton biomass characterization (He & Liu, 2021; He, Liu et al., 2022; Liu et al., 2016).
Principal component analysis (PCA) is a classic multivariate technique that analyzes a data table in which observations are described by several inter-correlated quantitative dependent variables. It has been frequently applied to analyze complicated spectroscopic results, including IR and RS data (He et al., 2018; Liu et al., 2016). Calculated from the covariance matrix of the original data set, this analysis is performed to understand the factors affecting the spectral variation across the samples. In other words, the algorithm is an axis-rotation technique that aligns a new set of axes, called principal components (PCs), with the directions of maximal variance within a dataset, yielding the scores, the loadings, and the residuals. While the score plot shows whether the samples are different, the corresponding loading plot shows the differences in the spectra and the different polymorphs of the samples. Typically, two or three PCs can explain the most influential variables of the spectral variations (da Mata et al., 2022; Liu et al., 2015; Pećinar et al., 2021; Radulescu et al., 2021).
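The covariance-based PCA described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the routine used in this study: the function name `pca_covariance`, the synthetic six-spectrum data set, and the single Gaussian band placed at 1096 cm⁻¹ are all assumptions made for the example.

```python
import numpy as np


def pca_covariance(X, n_components=2):
    """PCA by eigendecomposition of the covariance matrix.

    X has shape (n_samples, n_variables), e.g. one spectrum per row.
    Returns scores, loadings, explained-variance fractions, and residuals.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)              # mean-center each variable
    cov = np.cov(centered, rowvar=False)       # (n_variables, n_variables)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: symmetric matrices
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components]       # new axes (PCs)
    scores = centered @ loadings               # samples projected on the PCs
    explained = eigvals[:n_components] / eigvals.sum()
    residuals = centered - scores @ loadings.T # part not captured by the PCs
    return scores, loadings, explained, residuals


# Synthetic illustration only (NOT data from this study): six "spectra"
# that differ mainly in the height of one band near 1096 cm^-1.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(350, 1650, 200)
band = np.exp(-((wavenumbers - 1096) / 15.0) ** 2)
amplitudes = np.array([1.0, 1.1, 1.2, 2.0, 2.1, 3.0])
spectra = amplitudes[:, None] * band + 0.01 * rng.standard_normal((6, 200))

scores, loadings, explained, residuals = pca_covariance(spectra)
peak = wavenumbers[np.argmax(np.abs(loadings[:, 0]))]
print("explained variance fractions:", explained)
print("largest |PC1 loading| near (cm^-1):", peak)
```

In this toy case nearly all variance falls on PC1, and the PC1 loading peaks at the band that differs between samples, which is the sense in which a loading plot relates score differences back to spectral features.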
Previously, the fiber component of six cotton genotypes, a wild type (WT) and three fiber-length mutant lines comprising five types (Li1, Li2-short, Li2-long, Li2-mix, and liy), was characterized by attenuated total reflectance Fourier transform infrared (FT-IR) spectroscopy (He, Nam et al., 2021). We hypothesized that RS, as a spectroscopic tool complementary to FT-IR, would provide additional structural and compositional information on the fiber and seed components of these cotton samples. The purpose of this research was to demonstrate the potential of RS for identifying fiber traits of different genotypes and value-added traits of cotton crops in a reliable, fast, and simple way.

MATERIALS AND METHODS
The background information and fieldwork for the six cotton genotypes were described in an earlier report (He, Nam et al., 2021). Briefly, WT is the cotton line Delta Pine DP5690. Both Li1 and Li2 are monogenic mutants that exhibit an early cessation of fiber elongation. The Li2 mutant was further separated into three types, Li2-short, Li2-long, and Li2-mix, according to their fiber traits. Both the Li1 and Li2 mutants used in this experiment are near-isogenic to the WT DP5690. The liy mutant was created by treating seeds of the cotton line MD15 with ethyl methanesulfonate. The six cotton lines were selected for their differing fiber lengths, which have been studied previously in various ways (He, Nam et al., 2021). From about 30 bolls of 10 plants of each genotype, the fiber and seed parts of 5-10 cottonseeds were manually separated in the laboratory. The cottonseeds were dehulled, and the kernels were ground manually with a mortar and pestle. The intact fiber bundles and the ground seed kernels were subjected to RS. The RS spectra were collected using a Raman spectroscope (DXR2, Thermo Fisher Scientific Inc.). The instrumental settings involved a 785 nm laser with an output power of 10 mW and a 20X confocal microscope objective with a 50-μm slit width, with a 10 s integration time. Each spectrum was collected over the range of 3400-250 cm⁻¹ by focusing the laser spot on the individual samples. For each sample, triplicate measurements were conducted, and their average spectrum was presented. PCA was performed on the covariance matrix using OriginLab software. The number of variables (n) for PCA was 1349, as the data points were taken from the signature range of 1650-350 cm⁻¹ at a 0.9635 cm⁻¹ interval. Prior to the multivariate analysis, the RS spectra were baseline corrected and normalized to the bands at 1096 and 1444 cm⁻¹ for fiber and seed samples, respectively.

Core Ideas
• Cotton fiber and seed samples were evaluated by Raman spectroscopy (RS) with principal component analysis (PCA).
• RS spectra of fiber samples were dominated by cellulose-related peaks.
• RS spectra of cottonseed samples featured bands related to oil, protein, carbohydrate, and lignin components.
• The PCA plot classified the RS data of the six genotypes into three groups based on fiber traits.
• The PCA plot of RS data revealed that the six cottonseed samples clustered into four groups.

Raman spectroscopy analysis of fiber samples
Figure 1A shows the RS spectra of the fiber samples in the signature range of 1650-350 cm⁻¹. The prominent RS bands were typical of cotton fiber and were mainly contributed by cellulose components (Agarwal et al., 2021; Cabrales et al., 2014; Lee et al., 2015). For example, the bands at 1381, 1153, 1100, and 974 cm⁻¹ could be assigned to CH₂ or C-C ring stretching of cellulose. The bands at or near 1124 and 521 cm⁻¹ should be due to glycosidic bond stretching. The bands at 436 and 380 cm⁻¹ could be attributed to cellulosic ring deformations; the relative intensity of the RS bands at 380 and 1096 cm⁻¹ is indicative of the crystallinity of cellulose (Cabrales et al., 2014). The intensity ratios of these two bands for the six fiber samples were 0.42 (Li1) ≤ 0.44 (liy) < 0.48 (Li2-long) ≤ 0.50 (WT) < 0.76 (Li2-short) ≤ 0.84 (Li2-mix). Compared to the similar FT-IR spectra of the six samples (He, Nam et al., 2021), there were more apparent differences in the RS features (i.e., band intensity and shift) between these samples, as some peaks (e.g., 1096, 1123, 1381, and 1480 cm⁻¹) are relatively strong in RS but weak in IR (Makarem et al., 2019). Notably, extra bands (e.g., 1603 and 653 cm⁻¹) were observed in the liy fiber. The strong band at 1603 cm⁻¹ was indicative of a greater lignin component in the liy fiber (Özgenç et al., 2017). The band at 653 cm⁻¹, assigned to C-C-O bonds (Pećinar et al., 2021), implied that the liy fiber might contain more noncellulosic or abnormal cellulosic components, as this band has not been reported in most cellulose.
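The replicate averaging, band normalization, and 380/1096 cm⁻¹ intensity ratio used for the fiber spectra can be illustrated with a minimal Python sketch. The helper names and the three-point toy replicates are hypothetical and are not measurements from this study. Baseline correction is omitted because the text does not specify the method used. The final lines are only a consistency check of the stated variable count against the stated range and interval.

```python
def average_replicates(replicates):
    """Point-by-point mean of replicate spectra (lists of equal length)."""
    return [sum(values) / len(values) for values in zip(*replicates)]


def intensity_at(wavenumbers, spectrum, target):
    """Intensity at the grid point closest to `target` (cm^-1)."""
    index = min(range(len(wavenumbers)),
                key=lambda i: abs(wavenumbers[i] - target))
    return spectrum[index]


def normalize_to_band(wavenumbers, spectrum, reference):
    """Scale a spectrum so the reference band has intensity 1."""
    ref = intensity_at(wavenumbers, spectrum, reference)
    return [value / ref for value in spectrum]


def band_ratio(wavenumbers, spectrum, numerator, denominator):
    """Ratio of two band intensities, e.g. I(380) / I(1096)."""
    return (intensity_at(wavenumbers, spectrum, numerator)
            / intensity_at(wavenumbers, spectrum, denominator))


# Hypothetical toy replicates sampled at three shifts (NOT measured data).
wavenumbers = [380.0, 436.0, 1096.0]
replicates = [
    [4.0, 3.0, 10.0],
    [5.0, 3.0, 10.0],
    [6.0, 3.0, 10.0],
]

mean_spectrum = average_replicates(replicates)
normalized = normalize_to_band(wavenumbers, mean_spectrum, reference=1096.0)
ratio = band_ratio(wavenumbers, normalized, 380.0, 1096.0)
print("normalized spectrum:", normalized)
print("I(380)/I(1096):", ratio)

# Consistency check: a 1650-350 cm^-1 window sampled every 0.9635 cm^-1
# spans about 1349 intervals, in line with the 1349 variables stated.
n_intervals = int((1650 - 350) // 0.9635)
print("intervals in window:", n_intervals)
```

Because the spectrum is normalized to the 1096 cm⁻¹ band, the normalized intensity at 380 cm⁻¹ is itself the ratio, so the two steps can be read off the same preprocessed spectrum.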
The purpose of PCA was to interpret the complex RS spectra of the six samples by revealing differences between the samples (expressed as so-called "scores") and relating them to differences in the variables (i.e., "loadings") defining a sample. PCA results revealed that the first two PCs accounted for 89.6% of the total variation, with PC1 explaining 71.2% and PC2 explaining 18.4% of the spectral variation (Figure 1B,C). The loading plots (Figure 1B) showed generally negative loadings on PC1, grouped in three regions around 1380, 1096, and 400 cm⁻¹. In particular, the most negative value of PC1 was observed at the RS peak at 1096 cm⁻¹. This observation indicated that the PC1 variable was dominated by the unique COC glycosidic ring breathing of the cellulose structure (Cabrales et al., 2014). On the other hand, PC2 varied along the two RS regions (i.e., >550 cm⁻¹ and <550 cm⁻¹) with negative and positive contributions, respectively. PC2 corresponded positively to the multiple RS bands around 1600, 1480, 1381-1294, 1153-1100, 900-800, and 700 cm⁻¹. This association of multiple RS bands with PC2 suggested that more comprehensive RS vibrational features contributed to PC2. The score plot (Figure 1C) suggested that WT, Li2-mix, and Li1 formed a cluster along the PC2 axis with low scores for both PC1 and PC2. Li2-long and Li2-short were together at the upper right corner of the plot. liy was alone at the lower right corner, with a low PC1 score but a high negative PC2 score. This means that there were clear differences in RS profile not only among these three fiber samples but also between them and the cluster of the other samples.
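Reading a loading plot in the way described above amounts to locating the variables with the most extreme loadings. The sketch below shows one way to do this in Python. The function names are hypothetical, and the five loading values are placeholders chosen only to mimic the qualitative pattern described in the text. They are not values read from Figure 1B.

```python
def most_negative_loading(wavenumbers, loadings):
    """Return (wavenumber, loading) of the most negative loading."""
    index = min(range(len(loadings)), key=lambda i: loadings[i])
    return wavenumbers[index], loadings[index]


def strongly_negative(wavenumbers, loadings, threshold):
    """Wavenumbers whose loading is at or below a (negative) threshold."""
    return [w for w, value in zip(wavenumbers, loadings) if value <= threshold]


# Placeholder PC1 loadings (hypothetical, for illustration only).
wavenumbers = [400, 700, 1096, 1380, 1600]
pc1_loadings = [-0.2, 0.01, -0.6, -0.3, 0.02]

peak = most_negative_loading(wavenumbers, pc1_loadings)
regions = strongly_negative(wavenumbers, pc1_loadings, threshold=-0.1)
print("most negative PC1 loading at (cm^-1):", peak[0])
print("strongly negative PC1 loadings at (cm^-1):", regions)
```

With real loadings, the threshold would need to be chosen with the loading scale in mind, since eigenvector-based loadings shrink as the number of variables grows.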

Raman spectroscopy analysis of cottonseed samples
The RS spectra of the cottonseed samples exhibited strong peaks only in the range >800 cm⁻¹, with a few weak, featureless bands at Raman shifts <800 cm⁻¹ (Figure 2A). These RS features were greatly different from those of the cotton fiber samples (Figure 1A), as the cottonseed samples contained multiple macro components (i.e., protein, carbohydrate, oil, and lignin) (He et al., 2020; He et al., 2014) while the latter were cellulose-dominated (He, Nam et al., 2022; He et al., 2017). Indeed, the RS spectra of the cottonseed samples were very similar to those of other seed samples (e.g., peanut and maize kernels) (Farber et al., 2020; Krimmer et al., 2019). Specifically, the intensities of the 1004, 1304, 1442, and 1608 cm⁻¹ vibrational peaks could be used to evaluate the relative content of protein, carbohydrate, oil, and fiber materials (Farber et al., 2020). PCA modeling results showed that a two-component model was able to explain 87.0% of the data variation (Figure 2B,C). PC1 accounted for 78.4% of the overall data variance and PC2 for 8.6%. The PCA identified the broad range from 1200 to 1500 cm⁻¹ as the major positive contributor to PC1 (Figure 2B). In other words, the content of oil and carbohydrate had a dominating influence on PC1. On the other hand, PC2 scores were influenced positively by the RS signals at 1304 and 1608 cm⁻¹, but negatively by the RS signals at 1081 and 1442 cm⁻¹. The RS variables below 800 cm⁻¹ also seemed to affect PC2, even though the intensities of those bands were weak in the RS spectra (Figure 2A). The score plot (Figure 2C) shows that five of the six cottonseed samples had similar PC1 scores between 0 and 500, with liy as the outlier (PC1 score, 1200). For PC2, four samples had near-zero scores around the PC2 axis. WT and Li2-short had high positive and negative PC2 scores (absolute value ≥250), respectively. These differences classified the six samples into four groups: Li1, Li2-mix, and Li2-long clustered together, whereas each of the other three (WT, liy, and Li2-short) stood alone. The PCA grouping of the six cottonseed samples was different from that of the six fiber samples, indicating that the fiber and seed compositions of the six cotton genotypes were unrelated, as was the genetic control of their biosynthesis.

CONCLUSION
This comparative work demonstrated the potential of RS as a fast and simple method for characterizing cotton fiber and seed based on their spectroscopic signatures. Cotton fiber showed band features across the examined range from 350 to 1650 cm⁻¹, contributed mainly by the cellulose structures. On the other hand, cottonseed samples showed characteristic bands mainly at Raman shifts >800 cm⁻¹, contributed by oil, protein, carbohydrate, and lignin components. PCA results classified the lint (fiber) of the six cotton genotypes into three groups, but their seed chemical compositions into four groups. This experimental evidence suggests that RS combined with PCA could be used for rapid fiber phenotyping of cotton as well as for evaluating cottonseed nutrient information.

CONFLICT OF INTEREST STATEMENT
The authors declare no conflict of interest.