Phenotyping agronomic and physiological traits in peanut under mid‐season drought stress using UAV‐based hyperspectral imaging and machine learning

Agronomic and physiological traits in peanut (Arachis hypogaea) are important to breeders for selecting high‐yielding and resilient genotypes. However, direct measurement of these traits is labor‐intensive and time‐consuming. This study assessed the feasibility of using unmanned aerial vehicle (UAV)‐based hyperspectral imaging and machine learning (ML) techniques to predict three agronomic traits (biomass, pod count, and yield) and two physiological traits (photosynthesis and stomatal conductance) in peanut under drought stress. Two different approaches were evaluated. The first approach employed eighty narrowband vegetation indices as input features for an ensemble model that included K‐nearest neighbors, support vector regression, random forest, and a multi‐layer perceptron (MLP). The second approach utilized the mean and standard deviation of canopy spectral reflectance per band. The resultant 400 features were used to train a deep learning (DL) model consisting of one‐dimensional convolutional layers followed by an MLP regressor. Predictions of the agronomic traits obtained using feature learning and DL (R2 = 0.45–0.73; symmetric mean absolute percentage error [sMAPE] = 24%–51%) outperformed those obtained using feature engineering and conventional ML models (R2 = 0.44–0.61, sMAPE = 27%–59%). In contrast, the ensemble model had slightly better performance in predicting physiological traits (R2 = 0.35–0.57; sMAPE = 37%–70%) compared to the results obtained from the DL model (R2 = 0.36–0.52; sMAPE = 47%–64%). The results showed that the combination of UAV‐based hyperspectral imaging and ML techniques has the potential to assist breeders in rapid screening of genotypes for improved yield and drought tolerance in peanut.


INTRODUCTION
The peanut (Arachis hypogaea) is one of the most important cash crops in the United States, valued at over 1 billion USD. Over 3 million tons of peanuts were harvested in 2020 from approximately 1.6 million acres in the United States (USDA-NASS, 2020). Peanuts are grown in many Southern states in the United States and around the world. For this reason, peanut breeding programs aim to develop cultivars that have desirable and improved traits and are adapted to their respective environments. Moreover, as droughts become more frequent, severe, and widespread, drought-tolerant cultivars need to be developed for regions affected by drought.

In a breeding program, a breeder may need to measure multiple traits for hundreds to thousands of peanut genotypes at multiple field locations every year. Typical agronomic traits in peanut include biomass, pod count, and pod yield, which quantify how peanut plants convert energy and nutrients into different yield components. Measuring these three agronomic traits is normally done manually and involves drying, weighing, counting, and shelling. Physiological traits such as photosynthetic rate and stomatal conductance can indicate whether a plant is under drought stress (Buezo et al., 2019; Zhang et al., 2022). These traits are measured using a portable infrared gas analyzer that detects the plant's CO2 fixation and the water liberated through stomata. Both procedures are labor-intensive and time-consuming, especially at large scales (Baslam et al., 2020).

High-throughput plant phenotyping (HTPP) offers solutions to alleviate the phenotyping bottleneck in breeding programs. Remote sensing techniques have made it possible to monitor crop phenotypes in a nondestructive and efficient manner and are thus a valuable tool for estimating agronomic traits. The use of unmanned aerial vehicles (UAVs) for precision agriculture has recently gained significant attention because of their greater flexibility in mission scheduling and the possibility of mounting different high-resolution sensors on the platform (Araus & Cairns, 2014; Araus et al., 2018). High-resolution sensors such as red-green-blue (RGB), multispectral, and hyperspectral cameras have become available to researchers and consumers and can provide valuable information regarding plant phenotypes. The images collected by these sensors can be used to calculate vegetation indices (VIs), which are mathematical combinations of two or more spectral bands that highlight vegetation properties, plant health, or stress. Therefore, UAV imagery can rapidly reveal information about the health of plants over a large area. RGB cameras are the most accessible and common type of sensor utilized on UAVs. They are often used to assess plant physiological and agronomic traits. Examples include a study by Choudhary et al. (2021), where VIs obtained from an RGB camera were used to assess the nitrogen status of wheat (Triticum aestivum). Bendig et al. (2014) also used UAV-based RGB imagery to estimate the biomass of barley (Hordeum vulgare).

Core Ideas
• A drought experiment was conducted for an F1 population of peanut lines at the pod-filling stage.
• Unmanned aerial vehicle-based hyperspectral imagery data were collected 14, 18, and 29 days after drought.
• Machine learning models were used to predict three agronomic and two physiological traits.
• Both machine and deep learning methods explained ∼50% of the variation in the dataset.
• The overall best prediction accuracy occurred 18 days after drought stress imposition.
Due to the strong reflectivity of the plant canopy at near-infrared (NIR) wavelengths, multispectral sensors incorporating NIR channels are becoming more popular. NIR gives information about the cellular structure within leaves, and when combined with a band such as red or red edge (RE), it yields VIs such as the normalized difference vegetation index (NDVI) and normalized difference red edge index (NDRE), which provide measures of overall plant health. These VIs have been applied in numerous studies to estimate aboveground biomass, leaf area index (LAI), water stress, and yield (Maresma et al., 2016; Romero et al., 2018; Su et al., 2019). Qi et al. (2021) also used VIs from a multispectral camera to monitor chlorophyll content in peanut leaves. Hyperspectral cameras are more complicated and costly compared to RGB and multispectral sensors, and the resulting images require large amounts of storage. Despite their complexity, they provide invaluable information about crop reflectance in hundreds of narrow spectral bands. This allows a more advanced analysis of plant characteristics in high-throughput phenotyping applications. Fenghua et al. (2017) used a hyperspectral camera mounted on a UAV for phenotyping LAI, leaf chlorophyll content (Cab), canopy water content (Cw), and dry matter content (Cdm) of rice. Yield and biomass predictions are also a common use of these sensors (Feng et al., 2020; Moghimi et al., 2020). Hyperspectral sensors have also been used for assessing photosynthetic attributes in several studies. Kanning et al. (2018) estimated chlorophyll content and LAI from UAV-based hyperspectral data. To the best of our knowledge, there is no previous work on estimating peanut photosynthetic rate and stomatal conductance from aerial hyperspectral imagery. A more common method for assessing these traits in the literature is to measure individual leaves using a spectrometer. Buchaillot et al. (2022) estimated peanut and soybean (Glycine max) photosynthetic traits such as midday photosynthesis, maximum rubisco capacity (Vcmax), and maximum RuBP regeneration capacity (Jmax) using leaf spectral reflectance obtained by a handheld spectrometer (FieldSpec 4 Hi-Res, Malvern Panalytical). Qi et al. (2020) employed the same device to measure the chlorophyll content of peanut leaves. However, handheld spectrometers do not allow high-throughput screening and are time-consuming, as the reflectance of individual plants needs to be assessed manually. Aerial hyperspectral imaging, on the other hand, can capture a large area of plants in a far more efficient and automated manner.
As UAVs and spectral imagers become more viable, compact, and affordable, the focus of HTPP has shifted from data collection to data analytics. A common approach for analyzing the data is correlating extracted VIs with crop phenotypes or diseases using statistical methods. Patrick et al. (2017) assessed the correlation of several VIs, such as NDRE and the green difference vegetation index, with tomato spotted wilt disease in peanut using linear regression. Balota and Oakes (2017) compared VIs derived from ground and aerial sensor data with leaf wilting, pod yield, and crop value in peanut based on Pearson correlation results. However, as sensor data become more complex, as with hyperspectral data, more advanced methods are required to determine the underlying patterns between sensor data and phenotypes.
Machine learning (ML) and deep learning (DL) models have been shown to be highly capable of extracting information from complex and high-dimensional data, and for that reason, they have become a popular data analytics method for HTPP. A common approach is feeding the extracted VIs as the input to an ML model such as K-nearest neighbors (KNN), support vector machine, or random forest (RF) (Eugenio et al., 2020; Maimaitijiang et al., 2017; Qi et al., 2021; Sankaran et al., 2021; Wang et al., 2021). Feng et al. (2020) showed that ensemble models are more powerful than individual ML models. They developed an ensemble model by combining ML models and trained it on narrowband VIs derived from aerial hyperspectral imagery for alfalfa yield prediction. Instead of using predetermined wavelengths for the VIs, Feng et al. (2020) used analysis of variance (ANOVA), a multilayer perceptron, and reduced sampling to identify the most significant wavelengths, which were then utilized to construct new VIs able to detect bacterial wilt. Another approach is using the average spectrum at the plot level with no dimension reduction and training a DL model to learn the most significant bands of the spectrum. Rehman et al. (2020) developed DeepRWC, an end-to-end DL model that predicts the relative water content (RWC) of plants directly from mean spectral reflectance. Moghimi et al. (2020) implemented a deep neural network consisting of fully connected layers for high-throughput yield phenotyping in wheat, using both the mean and standard deviation (SD) of reflectance in addition to the area of leaves and spikes as the input.
There are many studies assessing remote sensing and ML techniques for phenotyping major cash crops such as maize (Zea mays), wheat, barley, alfalfa (Medicago sativa), and rice (Oryza sativa). However, there is limited research on high-throughput phenotyping of agronomic and physiological traits in peanut. While several studies reviewed above evaluated UAV-based multispectral imaging for prediction of agronomic traits in peanut under field conditions, the potential of UAV-based hyperspectral imaging has rarely been investigated. The main objective of this study was to evaluate the feasibility of predicting pod yield, pod count, biomass, photosynthetic rate, and stomatal conductance in peanut under drought-stressed field conditions using UAV-based hyperspectral imaging and ML methods. Two approaches representing two ML paradigms (i.e., feature engineering and feature learning) were compared. The first approach utilized VIs as the input features to an ensemble of conventional ML models, while the second approach employed a one-dimensional (1-D) convolutional neural network (CNN) that leveraged the mean and SD of peanut canopy reflectance as input features. Other secondary objectives were to assess the dynamic change in model prediction accuracy at multiple timepoints after drought imposition and to identify the top-performing features.

MATERIALS AND METHODS

Experimental design
The field experiment was conducted at the U.S. Department of Agriculture-Agricultural Research Service National Peanut Research Laboratory in Dawson, GA (31.7599, −84.4349). The field was divided into four blocks, and each block was equipped with an automatic rainout shelter (Blankenship et al., 1989). Each metal shelter covers a ground area of 5.5 m × 12.2 m and automatically closes by moving on two trolley rails when a rain detector (IR Digital Rain Sensor, Agrowtek) is triggered and opens otherwise. Each shelterable area was planted as a common garden experiment and was further divided into 16 rows and 4 columns, resulting in 64 plots. Two of the four rainout shelters were employed to impose drought treatments, while the others were maintained under well-irrigated conditions. Each plot was 0.609 m × 0.762 m in dimensions. A single peanut plant was grown in each plot following a generalized randomized block design. The plant materials were the parent cultivars PI502120, AU-NPL 17, Ga-Green, AP-3, x587, C76-16, AT3085RO, Line-8, and TifRunner, as well as the F1 population from crossing TifRunner with the other parent lines.
Each parent cultivar and F1 descendant was replicated three times per shelter. PI502120, AU-NPL 17, and Line-8 have high drought tolerance; C76-16, TifRunner, and x587 have moderate drought tolerance; and AP-3, Ga-Green, and AT3085RO are drought sensitive (Zhang et al., 2022). A set of Watermark soil moisture sensors (Irrometer) was placed in the center of each block at depths of 0.1 and 0.2 m. Each automated rainout shelter was also equipped with an irrigation system that acted as a linear-move irrigation system. Irrigation was triggered when the soil water potential dropped below −60 kPa before the drought was imposed. During the drought, the two irrigated blocks followed the same regime, but the two blocks under the drought treatment did not receive any water. The drought was imposed on July 26, 2021 and terminated with rewatering after 6 weeks. Note that the only minor environmental difference between plants growing under the rainout shelters and those in an open field is the lighting condition during rainfall and irrigation. During rainfall, plants under the drought treatment are covered, but the ambient lighting outside the shelters would be low anyway. When soil moisture falls below the irrigation threshold, the shelters move to irrigate the plants under the well-irrigated treatment and thus cover the plants. However, since the footprint of the shelters is relatively small, the plants remain in the shade for only several minutes. Therefore, the shading caused by the rainout shelters was not expected to impact crop growth and development.
The peanuts were harvested on September 23, 2021. During the growing period, UAV-based hyperspectral images were collected on August 9, August 13, and August 24, corresponding to 14, 18, and 29 days after drought (DAD), respectively. Biomass, pod yield, and pod count were measured after harvest. Photosynthesis and stomatal conductance were measured on the same image collection dates using four LI-6400 systems (LI-COR Biosciences) at midday (11:00 to 13:30). Measurements were performed on fully expanded young leaves corresponding to the second or third leaf from the top of the main stem. The LI-6400 chambers were set to the same environmental conditions (i.e., light, relative humidity, and temperature) as the ambient atmosphere, which varied between measurement days. A summary of the statistics of the measured ground-truth data is shown in Table 1, and an aerial image of the field taken on August 5 is presented in Figure 1. Moreover, the distributions of spectral reflectance across the 256 plots and three data collections are shown in Figure 2.

UAV-based hyperspectral imaging and data preprocessing
The UAV platform was a Matrice 600 Pro hexacopter (Shenzhen DJI Sciences and Technologies Ltd.). Flight missions were planned using UgCS (SPH Engineering) at an aboveground altitude of 20 m with a 40% side overlap. The resultant ground sample distance was 12 mm. The camera faced nadir during the flight and was stabilized using a Ronin-MX gimbal (Shenzhen DJI Sciences and Technologies Ltd.) on the UAV. A push-broom visible-near-infrared (VNIR) hyperspectral camera (Nano-Hyperspec, Headwall Photonics, Inc.) was used for the data collection. This camera has 270 spectral bands ranging from 400 to 1000 nm with a spectral resolution of approximately 2.2 nm. Each line scanned by the camera has 640 spatial pixels. Exposure time was determined by imaging a white polyvinyl chloride (PVC) panel facing the sky at a distance of 0.2 m and adjusting the exposure time such that the average spectral intensity reached approximately 75% of the maximum allowed intensity of the imaging sensor. The purpose was to fully utilize the dynamic range of the imaging sensor while leaving some headroom to avoid sensor saturation. Given the user-defined aboveground altitude and exposure time, the flight speed was automatically determined by the camera software such that each pixel captured a square area on the ground. The hyperspectral images were saved in cubes of 700 frames. After acquisition, the raw data were radiometrically calibrated using a dark reference collected on the same day before the flight. The dark reference was a single image cube acquired with the lens cap on, with the same exposure settings as the other image cubes. The resulting radiance cubes were then calibrated to reflectance using a 3 m × 3 m calibration tarp with three regions of 56%, 32%, and 11% reflectivity, respectively. This tarp was placed on a flat surface in the field for every data collection. Following the conversion to reflectance, all the images were geometrically corrected. The described post-processing steps were performed using SpectralView, a software program provided by Headwall Photonics, Inc. Subsequently, a hyperspectral orthomosaic was created from the orthorectified images, a grid was overlaid on the orthomosaic in QGIS 3.18.2 (QGIS Development Team, 2022), and individual single-plant plot images of 64 pixels × 51 pixels were extracted from the map (Figure 1). Soil pixels were removed from each plot image by thresholding a custom NDVI, defined in Equation (1), where R(x) denotes the reflectance at wavelength x:

NDVI = [R(804) − R(693)] / [R(804) + R(693)]    (1)

For our dataset, the combination of 804 nm in the NIR region, 693 nm in the red region, and a threshold of 0.2 for the custom NDVI was found by trial and error to perform satisfactory soil removal. The selection of these three parameters was informed by previous studies (Fang et al., 2020; Liang et al., 2015; Moghimi et al., 2020). Additionally, noisy spectral bands above 844 nm were removed, resulting in 200 effective bands. The general workflow of the described procedure is shown in Figure 3.

FIGURE 2 Average canopy spectral reflectance plus and minus standard deviation (SD) at 14, 18, and 29 days after drought (DAD).
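The soil-removal step can be sketched as follows. This is a minimal illustration assuming a reflectance cube of shape (rows, columns, bands); the helper name and toy values are hypothetical, while the 804 nm / 693 nm band pair and the 0.2 threshold follow the text:

```python
import numpy as np

def ndvi_soil_mask(cube, wavelengths, nir_nm=804.0, red_nm=693.0, threshold=0.2):
    """Return a boolean vegetation mask for a reflectance cube.

    Bands are matched to the nearest available wavelength; pixels whose
    custom NDVI exceeds the threshold are kept as vegetation.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    nir = cube[..., np.argmin(np.abs(wavelengths - nir_nm))]
    red = cube[..., np.argmin(np.abs(wavelengths - red_nm))]
    ndvi = (nir - red) / (nir + red + 1e-12)  # guard against division by zero
    return ndvi > threshold  # True = vegetation, False = soil

# Toy 2 x 2 plot with three bands centered at 693, 750, and 804 nm
wl = [693.0, 750.0, 804.0]
cube = np.array([[[0.05, 0.20, 0.50], [0.20, 0.22, 0.25]],
                 [[0.04, 0.30, 0.60], [0.18, 0.20, 0.21]]])
mask = ndvi_soil_mask(cube, wl)
```

In this toy cube the left column has the strong NIR/red contrast typical of canopy pixels, so only those two pixels survive the mask.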

Machine learning models
Two methods were implemented in this study. The first method employed an ensemble ML model consisting of four ML models: KNN, support vector regression (SVR), RF, and a multi-layer perceptron (MLP) regressor. The inputs to this model were 80 narrowband VIs. The second method was a DL model with a 1-D CNN followed by an MLP regressor. The inputs to this model were the mean and SD of spectral reflectance per band for each plot. The ensemble model and the DL model were both trained for 2000 epochs, and their hyperparameters were tuned with grid search. Ten-fold cross-validation was performed to ensure the robustness of the models. There were a total of 256 plants, so the initial dataset had 256 data points. After removing eight data points related to diseased plants, 248 data points remained in the final dataset, of which 90% was used for model training and 10% for model testing.
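The 10-fold split over the 248 retained plots can be illustrated with a short sketch; the function name and random seed are illustrative, not from the study:

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k folds; each fold in
    turn serves as the ~10% test set while the rest is used for training."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

folds = kfold_indices(248, k=10)
test_fold = folds[0]                  # ~10% held out for testing
train_idx = np.concatenate(folds[1:]) # ~90% used for training
```

With 248 samples the folds contain 24 or 25 plots each, matching the 90/10 train/test proportion described above.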

Ensemble model
The hyperspectral orthomosaic had 270 continuous spectral bands, and adjacent bands are normally correlated. Instead of using all of the original bands, 80 published VIs (Table 2) were computed at the plot level. References to these VIs can be found in the study by Feng et al. (2020), which explored them for alfalfa yield prediction. These VIs included 12 narrowband NDVIs and 19 simple ratio indices (SRIs). NDVI and SRI were examined more closely due to their capability of characterizing canopy vigor, biomass, and photosynthetic rate.
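Narrowband NDVI and SRI features of this kind can be computed from a plot's mean spectrum as below; the specific band pairs are illustrative examples only, as the full set of 80 indices follows Feng et al. (2020):

```python
import numpy as np

def band(spectrum, wavelengths, nm):
    """Reflectance at the band nearest to the requested wavelength (nm)."""
    return spectrum[np.argmin(np.abs(np.asarray(wavelengths) - nm))]

def narrowband_ndvi(spectrum, wavelengths, nm1, nm2):
    """Normalized difference of two narrow bands."""
    r1, r2 = band(spectrum, wavelengths, nm1), band(spectrum, wavelengths, nm2)
    return (r1 - r2) / (r1 + r2)

def sri(spectrum, wavelengths, nm1, nm2):
    """Simple ratio of two narrow bands."""
    return band(spectrum, wavelengths, nm1) / band(spectrum, wavelengths, nm2)

wl = np.arange(400.0, 844.0, 2.2)            # ~200 effective bands
spectrum = np.linspace(0.05, 0.60, wl.size)  # toy plot-level mean spectrum
features = [narrowband_ndvi(spectrum, wl, 750, 705),  # a red-edge NDVI variant
            sri(spectrum, wl, 750, 710)]              # a simple ratio variant
```

Because the toy spectrum rises monotonically toward the NIR, the NDVI variant is positive and the ratio exceeds one, as expected for a vegetated pixel.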
It has been shown in several studies that ensemble models outperform individual ML models and produce more robust results because of their diversity and their independence from any single model's output (Feng et al., 2020; Zhang et al., 2022). Our ensemble model used a voting regressor to produce a final prediction from four models: KNN, SVR, RF, and an MLP regressor. The hyperparameters of each ML model were tuned using the grid search method, which exhaustively tests a grid of model parameter values and identifies the best-performing combination. KNN regression is a nonparametric supervised ML algorithm based on the assumption that similar samples exist in close proximity in the feature space, where K is a hyperparameter that needs to be tuned for a specific dataset. After ranking training samples by their distance to the unknown (testing) sample, it estimates the response by averaging the responses of the K nearest neighbors in the training set. K was tuned to 4 for the KNN model. SVR is a supervised ML model that transforms input data into another space using a kernel function; a linear kernel was selected in our case. RF regression is a combination of regression trees whose final prediction is the average across all trees. The MLP regressor was configured with one hidden layer of 100 neurons and was trained for 2000 epochs. Adam (adaptive moment estimation) was chosen as the optimizer with a learning rate of 0.1 and exponential decay rates of 0.9 and 0.99 for the first and second moment estimates, respectively. The ensemble model and its estimators were implemented in Python 3.9.7 using the libraries scikit-learn (Pedregosa et al., 2011) and NumPy (Harris et al., 2020).
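A voting-regressor ensemble of this kind can be assembled in scikit-learn as sketched below. K = 4, the linear SVR kernel, and the 100-neuron MLP follow the text; the remaining hyperparameter values and the toy data are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

ensemble = VotingRegressor([
    ("knn", KNeighborsRegressor(n_neighbors=4)),        # K tuned to 4
    ("svr", SVR(kernel="linear")),                      # linear kernel
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("mlp", MLPRegressor(hidden_layer_sizes=(100,),     # one hidden layer, 100 neurons
                         max_iter=2000, random_state=0)),
])

# Toy data: 50 plots x 80 vegetation-index features
rng = np.random.default_rng(0)
X = rng.random((50, 80))
y = 10 * X[:, 0] + rng.normal(0, 0.1, 50)

ensemble.fit(X, y)
pred = ensemble.predict(X[:5])  # average of the four models' predictions
```

In practice, the grid search described above would be wrapped around each estimator (e.g., with `GridSearchCV`) before fitting the voting regressor.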

Deep learning model
With a small dataset of 248 data points in total, the input data needed to be simplified to reduce the model complexity and the number of trainable parameters. Instead of three-dimensional (3-D) hyperspectral cubes, the mean and SD of reflectance were used as the inputs to the DL model, inspired by Moghimi et al. (2020). 1-D convolution was chosen as the convolution method for these 1-D inputs. The architecture of the DL model consisted of four 1-D convolution layers and three dense layers, each of which was followed by a batch normalization layer and a dropout layer. The activation function of all layers was a Leaky ReLU (rectified linear unit) (Maas et al., 2013) with α = 0.3 (Equation 2). Unlike ReLU, Leaky ReLU allows a small gradient when the unit is not active.
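The two core operations of this architecture, 1-D convolution over the reflectance vector and the Leaky ReLU activation, can be sketched in NumPy. This is a conceptual illustration only; the actual model learns its convolution kernels during training:

```python
import numpy as np

def leaky_relu(x, alpha=0.3):
    """Leaky ReLU (Equation 2): f(x) = x for x > 0, alpha * x otherwise,
    with alpha = 0.3 as in the text."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * x)

def conv1d(signal, kernel):
    """'Valid' 1-D convolution along a reflectance vector."""
    return np.convolve(signal, kernel, mode="valid")

# Toy spectrum and a fixed difference-like kernel (a real layer learns its kernels)
spectrum = np.array([0.1, -0.2, 0.4, 0.3, -0.1])
activated = leaky_relu(conv1d(spectrum, np.array([1.0, -1.0])))
```

Negative convolution outputs are scaled by 0.3 rather than zeroed, which is what keeps a small gradient flowing through inactive units.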
Dropout was added after each Leaky ReLU, with dropout rates of 0.5 for the convolution layers and 0.3 and 0.1 for the fully connected layers, as shown in Figure 2. The purpose of dropout was to avoid overfitting. Adam was chosen as the optimizer with a learning rate of 0.1 and exponential decay rates of 0.9 and 0.99 for the first and second moment estimates, respectively. The input to this model was the normalized average and SD of canopy reflectance per plot as a 1-D vector. To evaluate the models, root mean square error (RMSE), coefficient of determination (R2), and symmetric mean absolute percentage error (sMAPE) were used; the equations of these metrics are shown in Equations (3)-(5), respectively.
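The three evaluation metrics referenced as Equations (3)-(5) can be written out as follows. Note that sMAPE has several variants in the literature; the 0%-200% form below is an assumption about the exact definition used:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (0%-200% variant)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(2.0 * np.abs(y_pred - y_true)
                                 / (np.abs(y_true) + np.abs(y_pred))))

y_obs, y_hat = [10.0, 20.0, 30.0], [12.0, 18.0, 33.0]
```

Unlike MAPE, the symmetric form bounds the per-sample error and penalizes over- and under-prediction more evenly, which is useful when trait values span a wide range.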

Feature importance evaluation
Feature importance of the 80 VIs for the ensemble ML model, or of the continuous spectral bands for the 1-D CNN model, was evaluated using permutation importance (Altmann et al., 2010). This algorithm provides insight into the importance of each feature by assessing the decrease in model accuracy when the feature is not available. This would be computationally intensive if performed during training, so it was instead performed during model testing. Because the model expects all features to be present during training and testing, each feature was replaced with random noise rather than removed. This noise was drawn from the same distribution as the original feature by shuffling the feature's values across samples. The metric used in this algorithm was R2, and the reported permutation importance score is the amount by which R2 decreased when a feature was not present. This algorithm was applied to the models used to report the results in Tables 3-6, from the same fold, on the data collected at the timepoint (of the three) that led to the highest average R2 for prediction of the five traits. The rationale is that those models are more capable of identifying the most significant VIs.
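The shuffling procedure described above can be sketched as follows. This is a simplified NumPy version with a toy model; the study's analysis applied the same idea to the trained ensemble and CNN models:

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination used as the importance metric."""
    return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Drop in test-set R2 when one feature's values are shuffled across
    samples, following the idea of Altmann et al. (2010)."""
    rng = np.random.default_rng(seed)
    baseline = r2_score(y, predict(X))
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # replace feature j with a shuffled copy
            drops.append(baseline - r2_score(y, predict(Xp)))
        scores[j] = np.mean(drops)
    return scores

# Toy check: a "model" that depends only on feature 0
rng = np.random.default_rng(1)
X = rng.random((200, 3))
y = 5.0 * X[:, 0]
scores = permutation_importance(lambda X: 5.0 * X[:, 0], X, y)
```

Shuffling the feature the model actually uses collapses R2 and yields a large score, while shuffling the ignored features leaves predictions unchanged and scores them at zero.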

RESULTS
Note that 10-fold cross-validation was used in both methods described in the ML models section, and the presented results are the median of the calculated metrics across all folds. The performance of the ensemble model and the DL model on testing data is shown in Tables 3-6 for both agronomic and physiological traits. Overall, the two models produced results close to each other, with DL having slightly higher accuracy for prediction of agronomic traits and the ensemble model performing marginally better on predictions of physiological traits. Averaging across the three collection dates, predictions of biomass had R2 values of 0.60 and 0.51, RMSEs of 49.03 and 61.09 g·plant−1, and sMAPEs of 26.60% and 29% from the DL and ML models, respectively (Tables 3 and 4). Pod count estimations using the two models yielded the same average R2 of 0.55, with slightly lower RMSE and sMAPE using the DL model, at 47.29 and 42%, compared to 53.27 and 53% from the ML model. Yield predictions achieved an R2 of 0.6, RMSE of 54.23 g·plant−1, and sMAPE of 50.50% with the DL approach; the same metrics using the ML method were 0.48 (R2), 54.04 g·plant−1 (RMSE), and 38% (sMAPE) (Tables 3 and 4). ML and DL had close RMSE values for photosynthetic rate predictions, at 8.33 and 8.54 μmol·m−2·s−1, respectively, and the same average sMAPE of 51%. ML yielded a slightly higher R2 of 0.48, compared to 0.44 from DL. Stomatal conductance predictions produced average R2 values of 0.43 and 0.40 and sMAPEs of 63% and 79% using the DL and ML methods, respectively. Both methods had an average RMSE of 0.19 mmol·m−2·s−1 (Tables 5 and 6).
For the scatter plots shown in Figures 5-8, the fold with the R2 closest to the median R2 across all folds was chosen. These plots present ground-truth values versus predicted values for each trait. With DL, 18 DAD yielded the highest R2 for most traits, and the highest R2 across all traits was obtained for predictions of biomass from the data collected on that day.

Feature importance

The ensemble machine learning model

Feature importance was calculated for the models used to report the results in Tables 3-6, from the same fold, on the data collected 18 DAD. This dataset (18 DAD) was chosen because it resulted in the highest average R2 for prediction of the five traits among the three timepoints (0.486 for 14 DAD, 0.550 for 18 DAD, and 0.506 for 29 DAD). The results of this analysis, the top 10 VIs for each model, are shown in Figure 9. Gitelson1, SRI [710, 750], and Gitelson2 were found to be the most important VIs for photosynthesis, stomatal conductance, and biomass, respectively, and Carte4 was the most important VI for both pod count and pod yield. Several top VIs were shared across traits; for example, Gitelson2 and the new vegetation index (NVI1) were both among the top 10 VIs in the photosynthesis and stomatal conductance models. Overall, Gitelson2, variations of NVI, the modified chlorophyll absorption ratio index (MCARI), the transformed chlorophyll absorption in reflectance index (TCARI), and the red edge position index (REP) were among the most common top features across all models.

The deep learning model
To identify the most important wavelengths in the DL model, the same approach applied to the ensemble model, permutation importance, was used. Again, the models for 18 DAD (the August 13 collection) were used because they resulted in the highest average R2 for prediction of the five traits (0.504 for 14 DAD, 0.582 for 18 DAD, and 0.478 for 29 DAD). Overall, there were 200 features (wavelengths) from the average reflectance of each plot and 200 features from the SD of reflectance per band per plot (400 features in total). Since many features are adjacent to each other, the top 20 features for each model were selected, and among any adjacent bands within a ±2 nm range, the band with the highest permutation importance score was chosen. For example, if 407, 409, and 411 nm all appeared among the top features, the one with the highest score was chosen as a representative in these charts. Therefore, the number of top features differs between models. The retrieved top wavelengths are shown in Figure 10. There were top wavelengths for biomass in the blue, green, red, red edge, and NIR regions, but the highest concentration was in the green region. Pod count and pod yield had relatively similar results, with top wavelengths in the blue and RE regions. The top wavelengths in the photosynthesis model were mostly in the green and RE regions, similar to stomatal conductance. Stomatal conductance also had some top features in the blue region.

FIGURE 5 Performance of the deep learning (DL) model on test data for predictions of biomass, pod count, and yield, 14, 18, and 29 days after drought (DAD). The shown data points are the folds from the test dataset with the R2 closest to the median values shown in Table 3.
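The grouping of adjacent bands can be sketched as below; the chained-run interpretation of the ±2 nm rule, the function name, and the importance scores attached to the 407/409/411 nm example are assumptions:

```python
def prune_adjacent(wavelengths, scores, min_gap=2.0):
    """Among top wavelengths closer than `min_gap` nm to each other, keep
    only the one with the highest permutation importance score."""
    order = sorted(range(len(wavelengths)), key=lambda i: wavelengths[i])
    kept = []
    for i in order:
        if kept and wavelengths[i] - wavelengths[kept[-1]] <= min_gap:
            if scores[i] > scores[kept[-1]]:
                kept[-1] = i  # replace the previous band with a stronger one
        else:
            kept.append(i)
    return [wavelengths[i] for i in kept]

# The 407/409/411 nm example from the text, with hypothetical scores
top_wl = [407, 409, 411, 550, 720]
top_scores = [0.10, 0.25, 0.15, 0.30, 0.20]
representatives = prune_adjacent(top_wl, top_scores)  # 409 nm represents its run
```

Here 407, 409, and 411 nm form one adjacent run, so only the strongest band (409 nm) is kept as its representative, while the isolated 550 and 720 nm bands pass through unchanged.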
As discussed before, these features were chosen from the plot-level average reflectance and the SD of reflectance per plot. To explore the variation of plot-level mean and SD of reflectance across all data points, the mean and SD of the reflectance profiles across the 256 plots were calculated. As shown in Figure 11, the SD has higher variation within plots.

FIGURE 6 Performance of the ensemble model on testing data for predictions of biomass, pod count, and yield, 14, 18, and 29 days after drought (DAD). The shown data points are the fold from the test dataset with the R2 closest to the median values shown in Table 4.

DISCUSSION

Ensemble ML model versus DL model
On average, the DL models performed better at predicting biomass, pod yield, and pod count (the agronomic traits), while the ensemble ML models performed better at predicting the physiological traits, photosynthesis rate and stomatal conductance. The difference could stem from the choice and availability of relevant VIs and their capability to explain the studied traits (Abdu et al., 2020). It is possible that the VIs included in the ensemble ML models were better indicators of the physiological traits and could not fully explain the variability in the agronomic traits. The physiological phenotyping models therefore had fewer input features and fewer trainable parameters, which made training easier and gave better results. Assuming the studied VIs were not strong indicators of pod count, biomass, and pod yield, the DL models were a more adaptive solution: the 1-D CNN could capture the detailed shape of the canopy reflectance of each plot and learn which wavelengths are more important during training, unlike the ensemble ML models, where the wavelengths chosen for the VIs were predetermined. Moreover, the SD of canopy reflectance within each plot provided the model with information about the distribution of pixel reflectance per band per plot. When an additional DL model was trained without the SD in the input, R2 decreased by 9% on average.
Both the ensemble ML and DL methods showed that the overall prediction accuracy of the agronomic and physiological traits peaked at 18 DAD. This could be interpreted as follows. Our previous study using the same population and rainout shelters showed that the drought-tolerant genotypes maintained higher photosynthesis and stomatal conductance during the drought period, and eventually higher yield, compared to the drought-susceptible genotypes (Zhang et al., 2022). During drought imposition, the differences in photosynthesis and transpiration between genotypes likely caused increasing variation in canopy spectral reflectance. These variations could have peaked near 18 DAD in our experiment, which facilitated model training and resulted in the best overall prediction accuracy at that time point for our dataset. As the drought treatment continued after 18 DAD, the water stress could have exceeded the coping capacity of the drought-tolerant genotypes, diminishing the variation in canopy reflectance among genotypes and negatively impacting prediction accuracy.

F I G U R E 7 Performance of the deep learning (DL) model on testing data for predictions of photosynthetic rate and stomatal conductance, 14, 18, and 29 days after drought (DAD). The data points shown in these scatter plots are from the test-set fold with the R2 closest to the median values shown in Table 5.
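The 1-D CNN followed by an MLP regressor described above can be sketched as follows. The 400-element input (per-band mean and SD of canopy reflectance) follows the text, but the filter counts, kernel sizes, and layer depths are illustrative assumptions, not the exact published hyperparameters.

```python
from tensorflow.keras import layers, models

def build_model(n_features=400):
    """Sketch of a 1-D CNN + MLP regressor for a single trait.

    Inputs are the per-band mean and SD of canopy reflectance; the
    specific layer hyperparameters here are assumptions for illustration.
    """
    inputs = layers.Input(shape=(n_features, 1))
    x = layers.Conv1D(16, kernel_size=5, activation="relu")(inputs)
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.Conv1D(32, kernel_size=5, activation="relu")(x)
    x = layers.GlobalAveragePooling1D()(x)   # collapse to a feature vector
    x = layers.Dense(64, activation="relu")(x)  # MLP regressor head
    outputs = layers.Dense(1)(x)                # predicted trait value
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_model()
```

A separate model of this form would be trained per trait and per imaging date, consistent with the per-date results reported in the tables.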

Interpretation of the most important features
As shown in Figure 9, Gitelson2, variants of NVI, MCARI, TCARI, and REP were among the most common top features across all the models. Most of these VIs include the spectral response at a wavelength in each of the NIR, RE, green, and red regions, which was expected. Gitelson2 includes wavelengths from the NIR, red, and RE regions, and NVI includes wavelengths in the blue, RE, and NIR regions. MCARI and TCARI leverage reflectance in the green, red, and RE regions, which are normally used to estimate chlorophyll absorption. REP also includes wavelengths from the red and NIR regions. These results confirm the importance of reflectance in these regions of the electromagnetic spectrum for rapid plant phenotyping.
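For reference, several of these narrowband indices can be computed directly from band reflectances. The definitions below are commonly used forms of MCARI, TCARI, and a linear-interpolation REP; the exact band variants used in this study may differ slightly.

```python
def mcari(r550, r670, r700):
    """Modified chlorophyll absorption ratio index (common definition)."""
    return ((r700 - r670) - 0.2 * (r700 - r550)) * (r700 / r670)

def tcari(r550, r670, r700):
    """Transformed chlorophyll absorption in reflectance index."""
    return 3.0 * ((r700 - r670) - 0.2 * (r700 - r550) * (r700 / r670))

def rep(r670, r700, r740, r780):
    """Red edge position (nm) via linear interpolation."""
    r_re = (r670 + r780) / 2.0  # reflectance at the inflection midpoint
    return 700.0 + 40.0 * (r_re - r700) / (r740 - r700)
```

For example, with reflectances of 0.05 at 670 nm, 0.10 at 700 nm, 0.40 at 740 nm, and 0.45 at 780 nm, `rep` returns 720.0 nm, placing the red edge midway along the 700-740 nm slope.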
Despite the black-box nature of DL models, top features in the DL models were also identified using permutation importance. Results from both analyses show that green and RE are the most important regions of the spectrum for predicting biomass. The top VIs from the ML model, including Gitelson2, NDVI [471, 584], NVI2, and REP2, contain wavelengths from those ranges as well, confirming their importance. The ranges of 410-430 nm (blue) and 710-740 nm (RE) held the most important wavelengths for predicting pod count and pod yield. The top VIs for these models also include wavelengths from the RE region (such as Carte4, SRI [675, 700], NVI1, and NVI2), but none of them include blue wavelengths. The green and RE ranges contained the most significant wavelengths for predicting photosynthesis and stomatal conductance. Most top VIs for these models also include RE in their formulas, such as NVI, Gitelson, NDVI [734, 750], and the Vogelmann index. Considering all top wavelengths assessed, plot-based SD delivered most of the important features, possibly because SD varies more across plots (Figure 11).

F I G U R E 8 Performance of the ensemble model on testing data for predictions of photosynthetic rate and stomatal conductance, 14, 18, and 29 days after drought (DAD). The data points shown in these scatter plots are from the test-set fold with the R2 closest to the median values shown in Table 6.
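Permutation importance of the kind used here is model-agnostic: each feature is shuffled in turn and the resulting drop in R2 is recorded. The sketch below demonstrates the procedure with scikit-learn on synthetic data as a stand-in for the trained trait models.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 "plots" x 10 "spectral features"; the trait
# depends mainly on features 2 and 5, which permutation should recover.
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 2] - 2.0 * X[:, 5] + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Shuffle one feature at a time and measure the drop in R^2.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
top2 = np.argsort(result.importances_mean)[::-1][:2]
```

In this toy setup, features 2 and 5 dominate the ranking; applied to the DL trait models, the same procedure ranks input wavelengths (and their mean versus SD variants) by their contribution to prediction accuracy.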

Effect of drought on peanut canopy spectral response
Hyperspectral data collected 18 DAD resulted in the best overall prediction accuracy, likely because this was when the effect of drought was most severe. Since there were drought-tolerant varieties among the peanut genotypes, some experienced wilting and recovered by August 24 (i.e., 29 DAD); therefore, the effect of drought was no longer fully visible in the canopy spectral response. Figure 12 shows examples of the spectral responses of a drought-tolerant genotype (Line-8) and a drought-sensitive genotype (AP-3) at 14, 18, and 29 DAD. The canopy reflectance of the drought-sensitive variety in the NIR region decreased by 18 DAD and remained at a similar level until 29 DAD. In contrast, the drought-tolerant variety's NIR reflectance was lowest at 18 DAD and rose again over the following 11 days. Since high NIR reflectance indicates high plant vigor, the rise of the spectral response in this region suggests recovery of the drought-tolerant genotype. The temporal changes in VNIR spectral response of the peanut canopy may assist breeders in quantifying recoverability from water stress in peanut.
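The plot-level spectral features underlying these comparisons (per-band mean and SD of canopy pixels) can be extracted from a hyperspectral cube roughly as follows. The array names, the random stand-in data, and the canopy mask (in practice derived from, e.g., an NDVI threshold) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical plot cube: height x width x bands of reflectance values,
# with a boolean canopy mask standing in for a real vegetation mask.
cube = rng.uniform(0.0, 0.6, size=(50, 40, 200))   # 200 spectral bands
canopy_mask = rng.uniform(size=(50, 40)) > 0.3

pixels = cube[canopy_mask]          # (n_canopy_pixels, 200)
band_mean = pixels.mean(axis=0)     # per-band mean reflectance
band_sd = pixels.std(axis=0)        # per-band SD of reflectance

# Concatenate into the 400-element feature vector used by the DL model.
features = np.concatenate([band_mean, band_sd])
```

Comparing `band_mean` across dates for a given genotype is essentially how the NIR decline and recovery described above would be observed numerically.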

Comparison to related studies
Our model performance is comparable to or better than that in related studies on peanut. Pod count and yield predictions obtained via both models estimated these complex traits with R2 greater than 0.5, the highest R2 being 0.65 for pod count and 0.61 for pod yield. The R2 value for our pod yield prediction is comparable to the highest R2 of 0.62 achieved using multispectral imaging-based NDRE by Patrick et al. (2017).
As a more direct comparison, we computed narrowband NDRE from our hyperspectral data and found that the R2 values between NDRE and pod yield were significantly lower: 0.32, 0.44, and 0.34 at 14, 18, and 29 DAD, respectively. The attained R2 for yield is also greater than those obtained by Balota and Oakes (2016), where the highest R2 values using color and RGB-derived indices were 0.39 and 0.26, respectively. This may suggest a superior performance of hyperspectral imaging with ML/DL models for peanut yield prediction compared to broadband multispectral imaging-based VIs as proxies.
In comparison to other crops, studies combining UAV-based hyperspectral imaging and ML-based modeling for agronomic trait prediction achieved comparable or better results than ours. The highest R2 we achieved for aboveground biomass was 0.73, using the DL model. Most of our results fall into the R2 range of 0.64-0.89 achieved by Masjedi et al. (2020), who employed hyperspectral imaging and LiDAR to estimate sorghum biomass. On the other hand, our yield prediction accuracy was lower than that reported for other crops. For instance, an R2 of 0.87 was obtained using an ensemble ML model for alfalfa yield prediction (Feng et al., 2020), and Moghimi et al. (2020) used a deep neural network to predict wheat yield with R2 in the range of 0.64-0.81.
Physiological traits are known to be challenging to estimate accurately from remote or even proximal sensing methods. Our highest R2 values for estimating photosynthetic rate and stomatal conductance were 0.56 and 0.57, respectively, using the ensemble ML method. Buchaillot et al. (2022) achieved a higher R2 of 0.62 for estimating photosynthetic rate using a handheld 350-2500 nm spectrometer and ML models. Although our accuracy appears lower, there are three major differences between the two studies. First, Buchaillot et al. measured photosynthetic rate and spectral reflectance on the same individual leaves, whereas we imaged canopy reflectance, which may not represent the reflectance of the particular leaf from which the photosynthetic rate was measured. Second, the handheld spectrometer measures wavelengths beyond the NIR into the shortwave infrared (SWIR), which our hyperspectral camera does not capture; these could add useful features for the ML models to extract. Third, Buchaillot et al. grew the plants in a controlled environment, whereas our experiment was conducted under field conditions. For other crops, El-Hendawy et al. (2019) also used a 350-2500 nm spectrometer and ML models to predict photosynthesis and stomatal conductance in wheat under field salt stress, achieving moderate to high R2 values for photosynthesis (0.58-0.98) and stomatal conductance (0.44-0.92). In addition to the potential contribution of the SWIR spectral range, there could be differences in physiological responses between peanut and wheat under the two different types of abiotic stress.
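The comparisons above rest on R2 and sMAPE. Both can be computed as below; note that sMAPE has several variants in the literature, and the definition here (absolute error over the mean of absolute actual and predicted values, in percent) is one common choice that may differ from the exact formula used in any given study.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent
    (one common definition; variants differ in the denominator)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return 100.0 * np.mean(np.abs(y_true - y_pred) / denom)

y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 33.0])
print(round(r2_score(y_true, y_pred), 3))  # 0.915
print(round(smape(y_true, y_pred), 1))     # 12.7
```

Unlike plain MAPE, sMAPE is bounded (0%-200% under this definition), which makes it more stable when measured trait values are close to zero, as can happen with stressed plots.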
Our overall prediction accuracies for the peanut agronomic and physiological traits were lower than those achieved for other crops in previous studies. Several factors may explain this. For yield in particular, peanut pods are belowground and thus cannot be imaged directly. As discussed above, pod yield was correlated with canopy health during water stress, which made ML-based prediction feasible. In contrast, yield components such as the alfalfa canopy and wheat heads can be observed directly by aerial hyperspectral imaging, so more direct features can be learned by ML models. For all the traits of interest, the use of single-plant plots might have introduced noise into our plot-based hyperspectral data, since a small number of leaves from neighboring plants could be present in a plot image. Another limitation of our study was the small dataset. As a result, the spatial information in the hyperspectral imagery was compressed into the mean and SD of canopy reflectance so that a lightweight 1-D CNN DL model could be trained. To improve model performance, scaling up the experiment from single-plant plots to normal multi-plant plots with clear plot delineation could be essential. This could be implemented economically using stationary greenhouses with motorized roofs opened and closed automatically by rain sensors. Additionally, enlarging the dataset with field trials across multiple years and locations may improve the prediction accuracy of our current lightweight ML/DL models, while enabling the effective training of more complex DL models such as a 3-D CNN that can learn both spatial and spectral features from a hyperspectral cube. Although our study did not yet yield sufficient accuracy to replace the established, time-consuming agronomic and physiological phenotyping techniques in a conventional peanut breeding program, the proposed high-throughput approach may be acceptable in a breeding strategy where a large number of peanut lines (e.g., hundreds) need to be screened to find a small number of top-performing candidate lines and manual phenotyping is prohibitive.

CONCLUSIONS
In this study, we evaluated the feasibility of using UAV-based hyperspectral imaging and ML to predict biomass, pod count, pod yield, photosynthesis rate, and stomatal conductance in peanut. Two common approaches in this domain were compared: ML with feature engineering versus DL with feature learning. Both methods showed promising results; the DL model outperformed the ensemble ML model in predicting the agronomic traits, while the ensemble ML model performed better in estimating the physiological traits. Moreover, data collected 14, 18, and 29 days after midseason drought imposition were tested with both models, and 18 days after drought provided the most valuable information and the highest accuracy. Additionally, the most important input features of both the ML and DL models were investigated, and the most effective detection wavelengths were in the visible, near-infrared, and RE regions.
The results demonstrated the ability of both the ensemble ML and DL models to extract valuable information from hyperspectral imagery for phenotyping the agronomic traits in peanut 30-45 days before harvest, and for estimating the physiological traits from same-day measurements. The proposed methods have the potential to become a high-throughput screening tool for breeding climate-resilient peanut lines.

F I G U R E An aerial image of the experimental field with four rainout shelters open. A red grid is overlaid on the image to indicate plot boundaries.

F I G U R E Deep learning (DL) model architecture: (a) model schematic, (b) detailed DL model layers and their hyperparameters. Conv1D, one-dimensional convolution; FC, fully connected.

The implementation and training of this model were done using Python 3.9.7, TensorFlow 2.5.0, and Keras 2.5.0 on an NVIDIA GeForce RTX 2080 Max-Q graphics processing unit. The architecture of this model is shown in Figure 4.

F I G U R E 9 Top 10 vegetation indices used in the ensemble model trained for (a) photosynthesis, (b) stomatal conductance, (c) biomass, (d) pod count, and (e) pod yield. CI, curvature index; DCNI, double peak canopy nitrogen index; MCARI, modified chlorophyll absorption ratio index; mND, modified normalized difference vegetation index; mSRI, modified simple ratio index; MTVI, modified triangular vegetation index; NDVI, normalized difference vegetation index; NVI, new vegetation index; PRI, physiological reflectance index; SRI, simple ratio index; REP, red edge position index; TCARI, transformed chlorophyll absorption in reflectance index; TGI, triangular greenness index; VIopt, optimal vegetation index.

F I G U R E 1 0 Top wavelengths used in the deep learning (DL) model trained for (a) photosynthesis, (b) stomatal conductance, (c) biomass, (d) pod count, and (e) pod yield. SD, standard deviation.

F I G U R E 1 1 (a) Mean of plot-level standard deviation (SD) across 256 plots ± SD of plot-level SD across 256 plots, and (b) mean of plot-level average reflectance across 256 plots ± SD of plot-level average reflectance across 256 plots.

F I G U R E 1 2 The spectral responses of a drought-tolerant genotype, Line-8 (a), and a drought-sensitive genotype, AP-3 (b). DAD, days after drought.
This project was supported by the Alabama Agricultural Experiment Station, the Hatch program of the U.S. Department of Agriculture National Institute of Food and Agriculture (USDA-NIFA), and the USDA-NIFA AFRI Foundational and Applied Science Program (Grant No. 2020-67013-32164).

Yin Bao https://orcid.org/0000-0002-3548-1823
Charles Chen https://orcid.org/0000-0001-6677-7187

T A B L E Summary of the statistics of the measured agronomic and physiological traits.
T A B L E Performance of the deep learning (DL) model on test data for each agronomic trait, 14, 18, and 29 days after drought (DAD).
T A B L E Performance of the ensemble machine learning (ML) model on test data for each agronomic trait, 14, 18, and 29 days after drought (DAD).
T A B L E Performance of the deep learning (DL) model on test data for each physiological trait, 14, 18, and 29 days after drought (DAD).
T A B L E Performance of the ensemble machine learning (ML) model on test data for each physiological trait, 14, 18, and 29 days after drought (DAD).
T A B L E 3