Optimization of UAS‐based high‐throughput phenotyping to estimate plant health and grain yield in sorghum

High‐throughput phenotyping (HTP) has enabled the acquisition of vast amounts of data. Therefore, finding the most informative phenological stage(s) and high‐throughput traits could lead to significant optimization of HTP‐assisted selection. An investigation as to when phenotypic data should be collected and how it should be processed from unmanned aerial system (UAS) imagery for the optimization and assessment of two primary traits in grain sorghum [Sorghum bicolor (L). Moench], namely, grain yield and plant health (based on anthracnose scores) was conducted. By evaluating multiple flight dates across the growing season via multispectral UAS‐based imagery, a set of scenarios composed of combinations of flight dates and vegetation indices were constructed for analysis. In this sense, results showed no increase in predictive ability when combining multiple vegetation indices. Hence, using only an index with a higher predictive ability (e.g., normalized difference vegetation index (NDVI) or modified simple ratio (MSR) for plant health with 0.75; and any tested index but chlorophyll index (CIg) for grain yield with ∼0.55) is recommended. Likewise, the combining of multiple flights did not result in a significant increase in predictive ability for either primary trait. Thus, we observed that a single flight for each trait (e.g., 121 d after sowing with 0.81 for plant health; 104 d after sowing with 0.59 for grain yield) was optimal. Concerning, the predictive algorithms examined, partial least squares regression (PLSR) and neural network, results were similar, with PLSR generally outperforming. In addition, we discuss our findings from an application standpoint of a field‐based breeding program and suggest additional optimization options.


INTRODUCTION
Given the advancements in plant genotyping during the last decade, plant phenotyping is now the limiting factor in the ability to determine the genetic control of important phenotypic traits (Araus & Cairns, 2014). This fact has generated interest in determining how compensation for the lack of high-quality phenotypic information can be obtained (Cobb, DeClerck, Greenberg, Clark, & McCouch, 2013). High-throughput phenotyping or simply, phenomics, has thus been deployed to measure features of plants (Araus & Cairns, 2014). Vegetation indices (band ratios) collected with HTP can be used as secondary traits for indirect selection of main or primary traits (e.g., grain yield [GY]). This indirect selection is based on specific features (e.g., biomass via NDVI; chlorophyll content via a CIg) that compose agronomically relevant traits (Li & Shi, 2018). In sorghum [Sorghum bicolor (L.) Moench], multiple indices have been shown to be correlated with overall plant health (Pugh et al., 2018;Stanton et al., 2017), GY (Pugh et al., 2018;Shafian et al., 2018;Stanton et al., 2017), leaf area index (Shafian et al., 2018), senescence (Potgieter et al., 2017), fractional vegetation cover (Shafian et al., 2018), biomass yield, and chlorophyll and nitrogen content (Li & Shi, 2018).
The response of an organism to a set of multidimensional exogenous and endogenous signals that occur in an integrated way throughout its development is termed the phenome (Cobb et al., 2013), which is highly conditional and dynamic. Consequently, the association of primary and secondary traits throughout the life cycle of a plant depends on a series of predictable and unpredictable events. Gene expression is temporal; canopy features and their genetic control change during crop development (Feng et al., 2017). Furthermore, singular events such as droughts can also change this relationship (Shi et al., 2017). Both predictable and unpredictable events condition traits that can be captured by HTP platforms with high temporal resolution. However, with higher temporal resolution, there is an expected increase in the redundancy of information along with the need for better storage and processing equipment to handle the astonishing amount of newly generated data. This raises questions on the usefulness of multiple measures based on the complementarity of secondary traits at multiple phenological stages and its implications for understanding or predicting primary complex traits .
Sorghum figures in the list of the most important grain crops worldwide along with rice (Oryza sativa L.), maize (Zea mays L.), barley (Hordeum vulgare L.), and wheat (Tritcum aestivum L.) (FAO, 2011). Hence, sorghum breeding is an essential approach towards food security. In this sense, GY and anthracnose resistance are two com-

Core Ideas
• Optimization of grain yield and plant health high-throughput phenotyping was possible. • Combining flights or vegetation indices did not increase predictive ability of primary traits. • Using a single flight (121 DAS) and a single index (NDVI or MSR) maximized predictive ability. • PLSR outperformed neural networks for predicting primary traits in most scenarios.
plex traits of high importance. Anthracnose (caused by Colletotrichum sublineola Henn. ex Kabát & Bubák) is a fungal disease that drastically affects plant health and limits yield potential in specific environments (da Costa et al., 2011), with resistance being oligogenic in nature (da Costa et al., 2011;Patil et al., 2017). Symptoms commonly appear as circular lesions with red, orange, tan, or purple margins around straw-colored centers (TeBeest, Kirkpatrick, & Cartwright, 2004). Other symptoms include premature defoliation, limitation of nutrient movement, and reduction of kernel number and size (Mofokeng, Shimelis, Laing, & Shargie, 2017). Grain yield is dependent upon various characteristics and is polygenic in nature (Boyles et al., 2017). As these traits are correlated (Pugh et al., 2018), an identification of the appropriate phenological stage(s) at which the signal of genotype effect is maximized for both would lead to significant improvement of HTP-based selection. The relevant cost of phenotyping for both traits makes it a suitable subject for a proof-of-concept study using HTP, thus producing results of benefit to breeders in the decision-making process. The aims of this study were to: (a) determine the optimal number of evaluation dates needed to predict anthracnose resistance (plant health) and GY in sorghum hybrids; (b) assess the most appropriate (combination of) vegetation indices based on multispectral imagery in the prediction of anthracnose (plant health) and GY in sorghum hybrids; (c) compare algorithms for the prediction of primary traits (plant health and GY); and (d) verify the utility of these tools for sorghum breeding programs.

Germplasm and experimental design
A set of 32 experimental and open-pedigree sorghum hybrids were evaluated during the summer of 2018 in College Station, TX (30 • 32′33.4″ N, 96 • 25′46.6″ W) using a randomized complete block design with three replications. Experimental plots were sown on 3 April (0 d after sowing [DAS]) and consisted of two 5.5-m plots with a row spacing of 0.762 m, an approximate sowing density of 200,000 plants per hectare. Fungal inoculations of anthracnose were administered 45 DAS by placing infested sorghum seeds within the plant whorl. Inoculate was prepared according to the protocol presented by Pugh et al. (2018). The trial was rainfed, and standard agronomic practices for grain sorghum were used; these included preplant fertilization at 0.1 Mg ha −1 and a seed-applied fungicide to prevent early seedling death due to fungal infections. Additional information on weather and climate conditions (Supplemental Figure S1) were obtained using nasapower (Sparks, 2018) under the EnvRtype R library (github.com/allogamous/EnvRtype). Traits evaluated in this experiment hereafter are referred to as primary (ground truth or target) and secondary (surrogate or remotely acquired).

Phenotyping of primary traits
Anthracnose severity was measured using the area under the disease progress curve (AUDPC).To create the AUDPC, anthracnose incidence and severity scores (0-9) were recorded as the proportion of plants in a given plot showing typical symptoms of anthracnose infection, i.e., grey diseased tissue and or necrotic lesions (Pugh et al., 2018). The utilized scale was adapted from the original to include level 0 for no symptoms. Anthracnose measurements were recorded on 11 different dates throughout the growing season (57,65,70,77,83,91,99,104,112,119,and 127 DAS). Anthracnose incidence and severity were then utilized to estimate an anthracnose AUDPC score for the ith plot by using where n is the number of evaluations, k is the time point, y is the score (disease intensity or incidence), and t is the time at which the score was taken. The higher the AUDPC value, the lower the disease resistance. At the end of the field season when grain moisture content was <15%, the GY was measured by mechanically harvesting plots. For reporting purposes, the GY (Mg ha −1 ) of all entries was adjusted to 13% moisture.

Phenotyping of secondary traits
Image acquisition for vegetation index estimation was performed using small unmanned aerial systems equipped with a MicaSense RedEdge multispectral camera (blue: 475-nm center and 20-nm bandwidth; green: 560 and 20 nm; red: 668 and 10 nm; red edge: 71 and 10 nm; near infrared: 840 and 40 nm) or a SlantRange 3P multispectral camera (green: 560 and 40 nm; red: 655 and 35 nm; red edge: 710 and 20 nm; near-infrared: 830 and 110 nm). As different cameras were utilized, flight altitudes were adjusted to obtain similar ground resolutions. Data captured by UAS were collected across five separate dates beginning at 71 DAS and ending on 129 DAS ( Table 1). The flight dates were selected based on the expected and observed expression of disease symptoms. Images were collected ±1-2 h of solar noon, minimizing inconsistencies due to solar angle and light scattering, and were subsequently radiometrically calibrated for comparison across multiple flight dates using reflectance panels and onboard illumination sensors. Flights were projected to achieve image overlapping of 75-80%. Additionally, ground control points were placed around and at the middle of the study site and coordinates recorded using post-processing kinematic (PPK) GPS for precision georeferencing. Additional information regarding data collection is presented in Table 1. Data acquisition and mosaicking were performed by the Unmanned Aerial System (UAS) Remote Sensing, Digital Agriculture Program at the Texas A&M AgriLife Corpus Christi, TX, research and extension center. Image processing was performed in Agisoft Photoscan Pro (Agisoft LLC) Version 1.4.3 (Build 6529). Processing was comprised of alignment and optimization of image position, generation of point clouds, and generation of digital surface models and orthomosaics. Mosaics were imported into QGIS software (Version 3.02 Girona; QGIS Development Team, 2018) where plot boundaries were drawn using v.mkgrid, a function of GRASS plugin (Version 7.4.0; GRASS Development Team, 2018). A negative buffer of 0.15 m was drawn to minimize neighboring effects of plot leaf overlap using the internal Buffer function. Orthomosaics and plot shapefiles were loaded and processed in the R environment (R Development Core Team, 2011) using raster (Hijmans et al., 2015) and sp (Pebesma & Bivand, 2005) packages to extract relevant vegetation indices (Table 2). The index value is given by the median of all pixel values in a given plot.

Phenotypic analysis
Primary and secondary traits were subjected to individual analysis using linear mixed modeling for the TA B L E 1 Days after sowing (DAS) on which flight was performed, accumulated growing degree days (GDD), estimated growth stage, sensor, altitude (relative to ground), ground resolution of orthomosaic, number of ground control points (GCPs), and accuracy of five flights over the sorghum trial. Ground resolution and accuracy were estimated in Agisoft Photoscan Pro averaged across all aligned images  Vanderlip and Reeves (1972) and Gerik, Bean, and Vanderlip (2003).

Index Acronym Formula
Chlorophyll index CIg Red-edge modified simple ratio Gitelson et al. (2003); CIre from Gitelson, Viña, Ciganda, Rundquist, and Arkebauer (2005); MSR from Chen (1996); MSRre from Wu et al. (2008); NDVI from Rouse et al. (1973); NDVIre from Gitelson and Merzlyak (1994). CIre, MSRre, and NDVIre are the red-edge counterparts of CIg, MSR, and NDVI, respectively. estimation of variance components and genotypic adjusted means. Adjusted means were estimated using ordinary least squares, while variance components were estimated using restricted maximum likelihood/best linear unbiased prediction (Henderson, 1975) with the ASReml-R package (Version 3.0; Butler, Cullis, Gilmour, & Gogel, 2007). The model structure was where y is the vector of phenotypic observations (primary or secondary traits); β is the vector of the fixed effects of replication added to the overall mean-for the analysis of secondary traits, the effect of flight date was also added as fixed; r is the vector of row within replication and is regarded as random [ ∼ (0, σ 2 )]; c is the random effect of column [ ∼ (0, σ 2 )]; g is the vector of the adjusted means for hybrids; ε is the vector for error [ε ∼ (0, σ 2 ε )]; and X, H, V, and T are the incidence matrices that relate the independent vectors to the response variable y. Wald's and a likelihood ratio test were utilized for the assessment of significance of fixed and random factors, respectively. To obtain genotypic variance components, the previous model was run treating g as a random effect [ ∼ (0, σ 2 )]. Genotypic correlations (r ps ) between primary and secondary traits were estimated based on covariances obtained by fitting bivariate models following the same factor structure as the aforementioned model. Both treatment and residual variances from the single-trait analysis were fed as initial values to facilitate convergence. Once covariances were obtained, genotypic correlations were estimated using where cov ps is the covariance between primary and secondary traits; σ 2 gp is the genotypic variance of the primary trait; and σ 2 gs is the genotypic variance of the secondary trait.

Phenotyping quality and indirect selection efficiency
To assess overall data quality, both repeatability and experimental accuracy estimates for primary and secondary traits were estimated. Repeatability (R) estimates were conducted at the plot level and on an entry-mean level as where r is the number of replicates. Experimental accuracy was estimated as given that CV R = CV g /CV ε , considering CV = ( √ σ 2 ∕μ)100 and CV ε = ( √ σ 2 ε ∕μ)100, where μ is the grand mean of the trait (de Resende & Duarte, 2007).
Repeatabilities and genotypic correlations were used to estimate the efficiency of indirect selection using where RS s p is the indirect response to selection; RS p is the response to direct selection for the primary trait; R is is the plot level repeatability of the secondary trait; R ip is the plot level repeatability of the primary trait; and r ps is the genotypic correlation between the primary and secondary trait. Additionally, we estimated the coincidence of selection between direct and indirect selection based on rankings. To minimize the influence of the intensity of selection (IS), values ranging from 10 to 50% (IS = {10, 15, . . . , 50%}) were utilized.

Phenomic prediction of primary traits
To determine the utility of vegetation indices, flight dates, and their specific combinations for predicting the primary traits, three prediction structures were analyzed regarding the utilization of the secondary traits: F I G U R E 1 Example structure of the utilized neural network. The structure is composed of an input layer (predictors; black circles), two hidden layers of three and two neurons (gray), and one output layer (red). Input data (ST1, ST2, ST3, ST4) are the values of indices or spectral bands of a plot. Output data (MT) is the network prediction of a primary trait 1. A single index but with multiple flight dates as predictors (6 indices × 31 combinations of dates × 2 traits = 372 scenarios); 2. Combinations of indices in a single flight date as predictors (5 dates × 63 combinations of indices × 2 traits = 630 scenarios); and 3. Spectral bands as predictors in a single flight date (5 dates = 5 scenarios).
These structures generated multiple prediction scenarios, for example, NDVI from flights at 71 and 129 DAS as predictors of AUDPC, or NDVI and MSR from the last flight as predictors of GY. The number of secondary traits utilized for predicting the primary trait varied depending on the use case. To accommodate these three structures, prediction methods included supervised machine-learning (ML) algorithms and partial least squares (PLSR).
The ML method was a neural network with resilient backpropagation and weight backtracking (RPROP algorithm; Riedmiller & Braun, 1994) fit with the R library neuralnet (Version 1.33; Fritsch and Guenther (2016)). The network structure was created using the neuralnet function and was composed of an input layer with varying number of neurons depending on the scenario, two hidden layers of three and two neurons, and one output layer to be compared with the ground-truth (Figure 1). Predictors (input data) were used as the values of indices or spectral bands. The number of hidden layers and their neurons were selected based on preliminary tests (data not shown). A reduction of error between iterations of <1% (threshold of 0.01) as the stopping (convergence) criterion was administered. The entire training population was used in a single batch. Parameters of the model were then optimized using the compute function.
Partial least squares regression was also utilized for the prediction of primary traits when secondary traits were used as predictors. This method was implemented using the pls (Version 2.7-0) R library (Mevik & Wehrens, 2015). The regression model was applied as = + where y is the vector of observations of the primary trait; α is the scalar/vector of regression coefficients of latent variables; ε is the vector of errors [ ∼ (0, σ 2 ε )]; and W is the weight matrix relating the primary and secondary traits, which was obtained by the joint single value decomposition of primary and secondary traits. To determine the optimal number of latent variables (components), a permutation algorithm was utilized as described by Mevik and Wehrens (2015) using the selectNcomp function. These latent variables were utilized as predictors in the validation.
Prior to neural network and PLSR analyses, the data (primary and secondary traits) were normalized to a 0-1 range using R's scale function. The assessment of relevant parameters to the predictive ability of the models in each scenario was completed using the following steps: 1. Plots were randomly allocated in training (75%; TS) and validation populations (25%, VS). 2. The models were trained based on the training population information. 3. Validation population plotŝ= {̂1, … ,̂v s } were predicted. 4. Predictive ability (̂; Pearson's product-moment correlation between predicted and observed values) was estimated. 5. Steps 1-4 were replicated 10 times =1 = { 1 , … , 10 } with common training and validation populations between scenarios, = {VS , TS }, where is the replication number.
The effect of methods and scenarios on predictive abilities was tested using analysis of variance, considering each repetition of the validation as a replication. As the distribution of predictive abilities (correlations) tended to be skewed, they were normalized using Fisher's z transformation, = (1∕2)ln[(1 +̂)∕(1 −̂)], prior to the analysis. Additionally, Tukey's test was used to compare mean predictive abilities of scenarios using the agricolae Version 1.2-8 library (De Mendiburu, 2016). Although the analyses were performed with transformed data, the results are presented with untransformed correlation values.

Exploratory analysis and data description
Anthracnose symptoms were minimal until 70 DAS, after which there was a visual linear increase in incidence and severity until the last rating at 127 DAS (Figure 2). On a plot basis, anthracnose ratings ranged from 0 (no presence of disease) to 8 (severe disease) at 127 DAS. The AUDPC values derived from these recordings varied from 0 to 304.50 with an overall mean of 100.25 (Supplemental Figure S2). Grain yield ranged from 4.95 to 9.73, with a mean of 7.66 Mg ha −1 . Spatial dissemination of the anthracnose progression can be observed from the recorded anthracnose incidence/severity scores with time (Supplemental Figure S3). Hence, preliminary phenotypic analyses were carried out on primary and secondary traits to assess the influence of correlated errors on model fitting (Akaike criterion), which did not result in consistent modeling improvements.
Minimum  Figure S2); however, most indices presented a decrease in mean value throughout the season. This same pattern was not present for the red-edge-based indices, where a meaningful increase from 71 to 93 DAS was observed. Additionally, when this band was utilized for index calculation, lower mean values were obtained in each flight. The adjusted means of primary and secondary traits is also reported (Supplemental Figure S4).

Factor significance testing and data quality
Based on the results from the phenotypic analysis, genotypic variation existed for all traits (Supplemental Table  S1). While replication partitioned some variation from the error term, the row and column factors were not significant. Flight date, a secondary-trait-related source of variation, was significant for all evaluated indices.
Data quality information was assessed using the repeatabilities and experimental accuracy as metrics (Table 3). Repeatabilities on a plot basis were moderate for GY and high for anthracnose (AUDPC). Estimates of repeatability F I G U R E 2 Boxplot of anthracnose scores on sorghum hybrids at 11 time points. Dark red points are the values of plots and solid black points are the scores considered outliers for secondary traits ranged from moderate to high, generally increasing in later stages of evaluation. Overall, the red-edge-based indices had the highest values. Repeatabilities on an entry-mean basis were high for both primary and secondary traits. Experimental accuracy was high for primary and secondary traits regardless of flight date according to the scale proposed by de Resende and Duarte (2007).

Indirect selection based on spectral indices (secondary traits)
Multi-trait (bivariate) modeling revealed genotypic correlations between primary and secondary traits (Figure 3). Spectral indices were positively correlated to GY and negatively correlated to AUDPC. Correlations of spectral indices to AUDPC increased in magnitude as the flight date neared crop maturity. For GY, values were highest at 104 and 121 DAS and decreased thereafter. Maximum correlations to AUDPC were observed at 121 DAS using NDVI (−0.89) and to GY at 104 DAS (0.69) using the same index. Genotypic correlations to AUDPC were generally higher than those to GY, especially after 93 DAS. No index had higher correlations with any primary trait across all measured flight dates, suggesting specific index × flight date interactions. The genotypic correlation between GY and AUDPC traits was −0.55.
Indirect selection using secondary traits appears to be effective in some situations. For GY, CIg (129 DAS flight), MSR, MSRre, NDVI, and NDVIre (121 DAS) resulted in an indirect selection to direct selection efficiency ratio ≥1.00 (Table 4). For AUDPC, this was observed for CIre, MSR, MSRre, NDVI, and NDVIre (121 DAS) and for MSR, MSRre, NDVI, and NDVIre (129 DAS) ( Table 4). The coincidence of selection presented distinct results depending on flight × index combinations and selection intensity. Nevertheless, no combination of index and flight resulted in clear superiority across all selection intensities (Supplemental Figure S5).

Validation of prediction models
Predictive abilities obtained from the validation process are presented by index (Supplemental Figure S6) and by flight (Supplemental Figure S7) for primary traits. These values were regarded as a response variable and modeled in an analysis of variance with method, scenario (which is combinations of indices or flights), index or flight, and double and triple interactions as fixed explanatory variables (Table 5). For clarity purposes, double and triple interactions are presented but not discussed in greater detail due to the number of levels. For the "by index" validation scheme, the method, scenario, index, and method × index effects were statistically determinant in their predictive ability for AUDPC. For GY, similar factors were significant, in addition to the method × scenario interaction. Overall predictive ability means of indices varied from 0.62 to 0.75 for AUDPC and from 0.50 to 0.55 for GY (Table 6). Tukey's test revealed that NDVI and MSR were the best indices for predicting AUDPC, while CIg was the worst predictor. For GY, all indices were statistically similar except for CIg, which had the lowest predictive ability. Statistical differences for GY were minimal for the tested scenarios, while more statistical differences were observed for AUDPC (Supplemental Table S2). It is possible that information from early flights decreases predictive ability and should not be included in the model. Regarding the prediction methods, PLSR outperformed the machine learning method for predicting both AUDPC and GY (Table 7).
In the "by flight" (Table 5)  ability. For GY, flight and method × flight were the only significant factors. Predictive ability means of flights for GY ranged from 0.16 to 0.59, while for AUDPC, they varied from −0.07 to 0.81 (Table 8). Tukey tests indicated that 121 and 104 DAS were the most useful in predicting AUDPC and GY, respectively. Scenarios (combination of spectral indices) examined resulted in statistically similar predictive abilities, except CIg alone for AUDPC, which had the lowest mean (Supplemental Table S3). Concerning the prediction method, PLSR was more effective than the machine learning method for AUDPC but not for GY (Table 7).

Index development and assessment
Multiple scenarios were built to assess the optimum combinations of spectral indices and flight dates to predict GY and AUDPC. The predictive abilities of scenarios with either single or multiple indices were not different in their explanation and prediction of either GY or anthracnose resistance. This corroborates the findings of Li & Shi (2018) on combinations of spectral indices for the prediction of sorghum biomass. Despite being developed for specific applications, the utilized spectral indices invariably target common traits such as chlorophyll content (Babar et al., 2006;Gitelson, Gritz, & Merzlyak, 2003, leaf area TA B L E 4 Efficiency of indirect selection for hybrid sorghum grain yield (GY) and anthracnose area under the disease progress curve (AUDPC) using vegetation indices calculated from five flights (71, 93, 104, 121 and 129 d after sowing  index (Chen & Street, 1995;Xie et al., 2018), biomass (Babar et al., 2006;Rouse, Hass, Schell, & Deering, 1973), and other health and yield component traits. These indices have different saturation points (Wu, Niu, Tang, & Huang, 2008), with potentially different responses to optical and geometrical properties (Chen & Street, 1995), sensitivity to chlorophyll changes (Xie et al., 2018), and atmospheric and environmental influences (Myneni & Asrar, 1994). However, they did not increase the predictive ability when combined. In this study, indices were not complementary under the utilized methods and algorithms. Hence, the most parsimonious option is to identify the single index that provides the higher predictive ability for each primary trait. For AUDPC, statistical differences of predictive abilities of indices were more common than for GY. In validation, NDVI and MSR were superior for the prediction of AUDPC. This is consistent with reports by Pugh et al. (2018), who showed a strong linear relationship TA B L E 7 Tukey's HSD test based validation predictive abilities of anthracnose area under the disease progress curve (AUDPC) and grain yield (GY) using two prediction methods (machine learning [ML] and partial least squares regression [PLSR]) on sorghum hybrids between NDVI and AUDPC scores. Huang et al. (2014) compared a set of indices for monitoring disease in wheat and found that NDVI and MSR were good at classifying healthy and diseased leaves (accuracies of 83.6 and 82.9%, respectively). Within the same species, NDVI has been reported in quantitative trait loci mapping studies to be associated with spot blotch (Kumar et al., 2016) and stripe rust (Pretorius et al., 2017) resistance. For GY, besides CIg, all indices presented similar predictive abilities. This index had the lowest values despite its higher genotypic correlation to the primary trait for two flight dates (121 and 129 DAS). The relationship between NDVI and GY in sorghum has previously been reported (Pugh et al., 2018). Moreover, Shafian et al. (2018) found NDVI to be the best predictor among a range of indices. In maize, Torino, Ortiz, Fulton, Balkcom, and Wood (2014) reported good results using NDVIre and CIre for predictions of GY. Thus, examples of the robustness of these indices at predicting GY with considerable ability are present.
In this study, we used multiple well-known established indices for the prediction of AUDPC and GY. Nevertheless, band values (by date) were also tested for the determination of their predictive ability of the evaluated primary traits (Supplemental Figure S8). The highest mean for AUDPC (0.48 at 121 DAS in PLSR) and GY (0.57 at 129 DAS in neural networks) was lower than the best scenarios presented when using indices (0.84 at 121 DAS using NDVI and MSR in PLSR for AUDPC; 0.63 at 129 DAS using MSRre in ML). Handcrafted indices resulted in better prediction abilities, even though the utilized prediction methods are able to internally reproduce their equations and explain the variation using band values. Studies conducted on other crops have reported that hyperspectral imagery was found to have better predictive abilities with bands than with individual indices, suggesting that index estimation leads to information loss (Aguate et al., 2017;Montesinos-López et al., 2017). However, all bands available to our research were utilized for index estimation, which did not happen as reported in the aforementioned reports. Hence, additional studies in sorghum using hyperspectral data are warranted.
A series of secondary considerations can be drawn from this study that may be useful for researchers when working with vegetation indices in grain sorghum: (a) plots should have at least two rows due to high interplot leaf overlap (Supplemental Figure S9A and S9B); (b) whenever genotype is an explanatory variable in a primary-secondary trait correlation study (e.g., NDVI and GY), the number of treatments should be large enough so non-causal relationships are minimized; and (c) both panicles and disease symptoms have lower index (e.g., NDVI) values than healthy leaves for all tested indices (Supplemental Figure S9C and S9D). Hence, the association of indices with GY or AUDPC-an association between plant health or vigor and yield or disease resistance-may be somewhat compromised. Therefore, one could use further image treatments and segmentation techniques (which was not in the scope of our study) to improve the correlation between primary traits and spectral indices.

4.2
On the best (combinations of) flight date(s) As shown above, prediction scenarios were built for determining the usefulness of remotely sensed temporal data for predicting GY and AUDPC. Results suggest that combining measurements of a spectral index from multiple flights does not result in a statistically significant increase in predictive ability for both primary traits. Aguate et al. (2017), working with hyperspectral data in maize, reported that higher predictive abilities of GY were obtained when data from all the time points (five flights) were combined. Zhou et al. (2017), estimating GY in rice, also found that combining flight dates resulted in increased correlation estimates. To our knowledge, however, no other result is available on the combination of dates using multispectral imagery for sorghum hybrids.
Regarding the dynamics of spectral indices during the crop cycle, the amplitude of their adjusted means increased from the first to the last evaluation date (Supplemental Figure S4). This behavior matches the disease progress scores (Supplemental Figure S3). Nevertheless, as genotypic variances for spectral indices respectively increased throughout the growth cycle (Supplemental Figure S10), the increment in genotypic correlations between primary and secondary traits ( Figure 3) resulted from boosts in the genotypic covariance (Supplemental Figure S11). Hence, as indirect selection relies on the repeatability of the primary and secondary traits and their covariance, better predictive abilities are expected at later growth stages.
In this sense, our results suggest that 104 DAS for GY and 121 DAS for AUDPC were the best dates to collect the secondary traits for prediction. These findings corroborate those of Pugh et al. (2018) for AUDPC yet contradict those of Shafian et al. (2018) for GY. Shafian et al. (2018) reported that index assessment at the flowering stage was the best estimator for this trait. However, GY reduction is a known symptom of anthracnose infection (Mofokeng et al., 2017). Hence, stronger associations between GY and secondary traits are expected when the widest range of disease resistance levels are present (final stages). Finally, this study used data from a single year and location. Given that environment and genotype × environment interaction affect anthracnose resistance (Patil et al., 2017), results will likely vary from year to year. However, the field conditions in this evaluation were consistent with a normal production year (Supplemental Figure S1).

On the best prediction method
Partial least squares regression is the method of choice for many remote sensing scientists due to its ability to deal with highly correlated parameters such as spectral bands and temporal data (Aguate et al., 2017;Thorp et al., 2015;Weber et al., 2012). Recently, interest in ML has increased in biological fields due to its ability to extract hidden patterns and structures, learn from data sets, and adapt to iteratively increase predictive ability (Angermueller, Pär-namaa, Parts, & Stegle, 2016;Liakos, Busato, Moshou, Pearson, & Bochtis, 2018). In this study, PLSR was generally more effective than ML in predicting both GY and anthracnose incidence, and the best results were achieved with single flight dates and indices. In these cases, PLSR is equivalent to a conventional linear model. The superiority of the benchmark model was also reported by Montesinos-López, Montesinos-López, Gianola, Crossa, and Hernández-Suárez (2018), for whom the ML method was inferior to genomic best linear unbiased prediction. Given that the accuracy of neural networks potentially improves as the number of observations increases, additional studies with different parameters and higher number of observations should be completed to further elucidate the effective use of ML methods.

Application of remote sensing data for plant breeding
The results presented here indicate that both GY and foliar disease ratings (mostly due to anthracnose) can be estimated with considerable accuracy using HTP traits. In addition, we also demonstrated that the ability to select depends on spectral indices and specific flight dates. However, the difference in mean predictive ability of GY from the flights at 104 (best for GY) and 121 DAS (best for AUDPC) was only 0.02. Thus, the prediction of both GY and AUDPC using a single flight at 121 DAS and a single index (NDVI or MSR) should suffice and would potentially increase selection efficiency. Ultimately, the user must adjust the methodology so it is best utilized within their breeding program. For example, one could carry indirect selection at early stages if time (e.g., pre-flowering selection) is more relevant than predictive ability.
Increasing the rate of genetic gain is an important goal of any breeding program. Indirect selection should be considered when it is faster or cheaper than direct selection. In this study, the most predictive flight date (121 DAS) combined with the best general indices (NDVI and MSR) yielded selection efficiencies of ∼1.03 for GY and ∼1.02 for AUDPC. In this situation, selection based on secondary traits may lead to greater genetic gain than direct selection on the primary trait. Additionally, further increases in the efficiency of a breeding program may be achieved when one considers the relative cost of HTP for phenotyping, the advantage of earlier selection windows, reduction of subjectivity in measurements, and its ability to exploit genetic variability by increasing the effective size of the breeding population (Araus, Kefauver, Zaman-Allah, Olsen, & Cairns, 2018).
Further breeding inferences rely on the coincidence of selection. In this study, there was a moderate correlation between AUDPC and GY (−0.55), and only a few hybrids were best for AUDPC and GY at high selection intensities (Supplemental Figure S12). Thus, while high yield was associated with disease resistance, not all genotypes with good disease resistance had high GY. Nonetheless, in some situations there was a considerable coincidence of direct and indirect selection for both traits, and the coincidence increases as selection intensity decreases. This follows patterns similar to those previously presented regarding genomic prediction studies (Galli et al., 2018;Matias, Galli, Granato, & Fritsche-Neto, 2017). In this sense, spectral indices may serve as a form of phenomic selection where, after optimization, selection of hybrids could be based on a series of remotely sensed traits associated with breeding values in the same way that genomic selection is applied to genotypes. Like genomic selection, phenomic selection is likely more useful for eliminating undesirable genotypes than for selecting superior genotypes. Ultimately, this study demonstrated it is possible to optimize data collection and processing to increase the efficiency of phenotyping in grain sorghum.

A C K N O W L E D G M E N T S
We thank Luiz de Queiroz College of Agriculture (University of São Paulo); Texas A&M University; Coordenação de Aperfeiçoamento de Pessoal de Nível Superior-Brasil (CAPES), Finance Code 001; and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for the (financial) support.