Opportunities and challenges in phenotyping row crops using drone‐based RGB imaging

Developing the resilient crops of the future will require access to a broad set of tools. While advances in sequencing and marker technologies have facilitated marker‐trait associations and the ability to predict the phenotype of an individual from its genotypic information, other tools such as high‐throughput phenotyping are still in their infancy. Advances in sensors, aeronautics, and computing have enabled progress. Here, we review current platforms and sensors available for top‐down field phenotyping with a focus on unoccupied aerial vehicles (UAVs) and red, green, blue sensors. We also review the ability and effectiveness of extracting traits from images captured using combinations of these platforms and sensors. Improvements in trait standardization and extraction software are expected to increase the use of high‐throughput phenotyping in the coming years and further facilitate crop improvement.


INTRODUCTION
With current rates of population growth and ongoing climate change, advancements in crop improvement and production are crucial. Advances in sequencing technology have driven the development of genome sequencing and marker platforms that can elucidate the genetic mechanisms of complex agronomic traits to speed up crop improvement through selection of quantitative trait loci (QTL) (W. Yang et al., 2020). Advancing crop improvement also requires accurate, high-throughput trait acquisition; here we focus on top-down phenotyping with UAVs due to their advantages of speed, ease of use, and high spatial coverage.

CURRENT PLATFORMS AND SENSORS FOR COLLECTING FIELD PHENOTYPIC DATA
There are many vehicle and sensor combinations available for collecting top-view images of crop fields. Vehicles range from ground machines that navigate through fields to aerial machines that operate at heights ranging from close to the canopy (10 m) to higher altitude land observation satellites (currently up to 1,334 km) (Fu et al., 2020). There is a wide range of sensor options for spatial (pixel size) and spectral (number of spectral bands) resolutions. Each vehicle and sensor has advantages and disadvantages that impact the amount of time it takes to image the field; the spatial, temporal, and spectral resolution that can be achieved (Figure 1A-C); and the type of data that can be extracted.

Ground-sensing vehicles
Ground-based vehicles, including tractors, carts, or sprayers, can be co-opted for high-throughput phenotyping. Converting these ground vehicles for image-based phenotyping has been achieved by adding imaging sensors directly to the vehicle, beams, or an attached trailer that extends above the crop canopy (Barker et al., 2016; Busemeyer et al., 2013; Comar et al., 2012; Svensgaard et al., 2014; White & Conley, 2013). Ground-based platforms are able to achieve high spatial resolution since their sensors operate close to the crop canopy, are able to support multiple sensors, and have limited payload restrictions. However, they are difficult to implement at large scales due to the time required to traverse the entire field, resulting in limited ground coverage and often limited temporal resolution. Tractors are further constrained by their large size and low vertical clearance, which hinders their application over taller crops (Busemeyer et al., 2013; Comar et al., 2012; White et al., 2012). Other types of ground vehicles can be unoccupied, such as rovers or ground robots, that semi-automatically navigate the field, acquire data, and upload it without the need for extensive manual labor (Gage et al., 2019; Ruckelshausen et al., 2009; Young et al., 2019). These unoccupied machines include small vehicles that can be outfitted with side-scanning systems to measure traits within the canopy, as well as larger vehicles that primarily measure traits from above the canopy. While these platforms avoid some of the limitations associated with larger occupied ground vehicles and are often automated to reduce labor, they are still restricted by wet soil, weather conditions, and other common weather-induced events such as plant lodging, which can block the alleys through which they navigate (Tirado, Hirsch, et al., 2021). Unoccupied ground vehicles are also impacted by common obstructions such as weeds or rocks.

Core Ideas
• High-throughput phenotyping is an area of active development.
• Aerial imagery from drones allows for high spatial, temporal, and potentially spectral resolution.
• Variability in the ability and effectiveness of extracting traits from images from different platforms and sensors is reviewed.

Fixed sensing platforms
Rail-based gantry systems or fixed phenotyping towers equipped with sensors eliminate the need for manual operation and are not affected by wet field conditions or obstructed alleys. They additionally offer some of the highest spatial resolutions and are the easiest to standardize due to their location-fixed nature. However, these platforms are limited to the installation area, which typically encompasses only a few acres, and tend to have high set-up costs (Burnette et al., 2018; Virlet et al., 2016) that scale linearly for large-scale experiments (W. Yang et al., 2020).

Aerial sensing vehicles
Air-based platforms offer solutions to a number of the challenges faced by ground-based platforms: they are not confined to specific areas or soil conditions, can in some cases collect data with high temporal resolution, and can pass over a range of field obstacles including rocks, weeds, and lodged plants. Air-based platforms include satellites, occupied aerial vehicles, and UAVs. Satellites can acquire very large-scale data over a multi-year range depending on how long the system has been in orbit. A current challenge with satellites is that most available satellite data comes from systems with a long revisit cycle, and therefore low temporal resolution, low spatial resolution, and poor sensitivity under cloudy conditions (W. Li et al., 2015; Matese et al., 2015). The WorldView-3 satellite currently provides the highest spatial resolution of approximately 31 cm for shortwave infrared bands, which is useful for assessing regions of large production fields but is insufficient for extracting plot- or plant-level traits. Approaches that utilize generative adversarial networks and dense convolutional networks have started to be implemented to increase the spatial resolution of satellite imagery and achieve super-resolution (Ganguli et al., 2019; Tao et al., 2017; Wang et al., 2020). Occupied aerial platforms can achieve higher spatial and temporal resolutions than satellites as they image closer to the crop canopy, but are limited by wind and rain and can be constrained by high operating costs and operational complexity (G. Yang et al., 2017). UAVs, on the other hand, fly the closest to the canopy and can achieve subcentimeter spatial resolution depending on the vehicle and sensor in use. Two broad categories of UAVs are multi-rotor and fixed-wing types, each of which has its own advantages and disadvantages.
Multi-rotor UAVs are easy to fly, can easily perform autonomous flights, can fly close to the canopy due to their maneuverability, and require limited landing/take-off space, but are limited in flight time and coverage area. Fixed-wing UAVs generally have better flight endurance and can cover much larger areas but require higher flight elevations, more flight expertise, and larger landing/take-off areas (Hashemi-Beni et al., 2021). Regardless of the type chosen, UAV platforms are relatively low-cost and offer a wide range of commercially available models, as well as existing open-access platforms, such as Pix4D Capture, which require limited operational expertise (Pix4D, n.d.). Unoccupied aerial vehicles provide an efficient means of gathering plant-level resolution information for multiple plots simultaneously (Araus & Cairns, 2014; Shi et al., 2016) but remain limited by battery life, payload restrictions, and the inability to fly during high wind, rain, extreme temperatures, or moving cloud conditions. Some of these challenges can be easily circumvented by using simple lightweight camera systems such as RGB sensors with a small payload and by purchasing multiple batteries. Given the distinct advantages of UAVs to rapidly image fields with high temporal and spatial resolution, and without concerns for field obstructions, we will focus specifically on this vehicle platform for the remainder of this review.

RGB spectral sensors
Red, green, blue sensors are the most common spectral imagery sensors currently used and capture individual wavelengths within the red, green, and blue ranges encompassing the visible part of the electromagnetic spectrum (Figure 1D) (Malacara, 2011). These sensors are inexpensive, easy to acquire, and widely used to extract morphological and color traits across many crop species in both indoor and outdoor settings (A. Feng et al., 2019; Enders et al., 2019; Varela et al., 2017; Watanabe et al., 2017). While RGB sensors do not provide the wavelengths necessary for some plant health indices, they do provide an economical and efficient means of collecting phenotypic data due to their small size, weight, and relatively minimal data storage and processing requirements. Although there is still significant room for advancement, extensive efforts have been made to improve RGB UAV imaging and processing to extract useful parameters across crop types, as described in more detail below.

Non-RGB Spectral sensors
Non-RGB sensors capturing spectral imagery mainly differ in spectral resolution or the wavelength range they measure. The wavelengths used in plant phenotyping typically range from 100 to 12,000 nm depending on the application (Figure 1D). One of the most common non-RGB wavebands is the near-infrared (NIR) waveband, which spans 740 to 1,400 nm (Malacara, 2011). When collecting only one NIR waveband, NIR sensors are typically added on to RGB color sensors to extract the normalized difference vegetation index, which has been used to monitor canopy structure, light absorption, photosynthetic ability, and stress in plants (Gamon et al., 1995; Jin et al., 2017; Krause et al., 2020; Q. Zhang et al., 2012). Even a single NIR waveband added to an RGB camera increases both the cost and payload requirements for the UAV. It is also possible to collect multiple different wavebands in the NIR range for an increase in spectral resolution, which is one example of a multispectral or hyperspectral sensor. Multispectral and hyperspectral sensors can have tens to hundreds of spectral bands extending from ultraviolet (100 nm) through shortwave infrared wavelengths (2,500 nm) (Figure 1D) (Garg, 2020; Malacara, 2011). This increased spectral resolution is used to observe traits beyond the human visible spectrum and monitor plant biochemical processes (Alisaac et al., 2019; Blackburn, 2007; Obeidat et al., 2018; P. Pandey et al., 2017). These sensors can also identify, classify, and quantify a range of biotic and abiotic stresses acting on plants (Adão et al., 2017; Alisaac et al., 2019; Obeidat et al., 2018; P. Pandey et al., 2017). Although hyperspectral and multispectral sensors can provide additional resolution for crop phenotypes, they are sensitive to ambient light and require controlled lighting conditions or extensive use of reflective standards.
Difficulties in standardizing light make implementation in field settings challenging (Tirado, Dennis, et al., 2021; Virlet et al., 2016). Moreover, hyperspectral sensors can generate terabytes of data in a short amount of time, which can lead to challenges with data storage (Vadez et al., 2015), as well as increased complexity in data analysis, which is often addressed with machine and deep learning methods (X. Feng et al., 2020).
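As a concrete example of the index computations these sensors enable, the normalized difference vegetation index mentioned above combines an NIR band with the red band. The sketch below is a minimal per-pixel version (the function name is illustrative):

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index for one pixel.

    Inputs are reflectance values; NDVI ranges from -1 to 1, with dense
    green canopy approaching +1 and bare soil near 0.
    """
    denom = nir + red
    if denom == 0:  # guard against division by zero on dark pixels
        return 0.0
    return (nir - red) / denom
```

In practice the same formula is applied band-wise over whole rasters, and the bands must first be radiometrically calibrated against reflectance standards.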
There are also thermal sensors that measure emitted infrared radiation in the thermal range (8,000 to 12,000 nm; Figure 1D) and have been used to monitor plant temperature, which has been correlated with plant water status and changes in transpiration due to early pathogen infections (Calderón et al., 2013; De Swaef et al., 2021; Jones et al., 2002; Lapidot et al., 2019; Messina & Modica, 2020; Oerke et al., 2006; Sagan et al., 2019). A major limitation of thermal sensors is the influence of environmental conditions, including wind, ambient temperature, and sunlight, which increases the difficulty of standardizing thermal data both temporally and spatially.

Nonspectral sensors
Light detection and ranging (LiDAR, active laser scanning) sensors measure the reflection of laser light by rapidly sending pulses of light and measuring the time for the light to travel and reflect back to the sensor. When equipped on UAVs flown above crops, the time and angle of return are used to calculate the distance traveled, which is then translated to elevation. The pulse elevations give a 3D image of the target and reveal the ground by penetrating through the crop canopy. LiDAR was first applied to plant phenotyping in forestry, where it was used to estimate canopy height, canopy volume, and biomass in large forest settings (Lim et al., 2003; Naesset, 1997). LiDAR has since gained attention in crop phenotyping with work in multiple crops. It has been used to extract plant height and volume in cotton (Gossypium L.), maize (Zea mays L.), and wheat (Triticum aestivum L.) (Andújar et al., 2013; Eitel et al., 2014; Qiu et al., 2019; Sun et al., 2017), and LiDAR images were collected from 360˚ to determine canopy structure, leaf area, and leaf angle in tomato (Solanum lycopersicum L.) (Hosoi et al., 2011). However, LiDAR is seldom ideal for collecting phenotypes that necessitate full plant structure or color unless placed on a rover that travels under the canopy to capture the entire plant (Gage et al., 2019; Hosoi et al., 2011; Saeys et al., 2009). Additionally, LiDAR systems can be expensive and computationally intensive to implement (Wallace et al., 2012).

Many morphological, biochemical, and stress-related traits have been successfully measured using a variety of sensors on the different platforms described above. As sensors continue to improve in efficiency, weight, and cost, more multispectral, hyperspectral, thermal, and LiDAR sensors will be used on UAVs. However, due to current payload restrictions and costs, studies to date have focused largely on developing methods for extracting traits with RGB sensors.
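The time-of-flight principle behind LiDAR ranging, described earlier in this section, reduces to a simple calculation. This sketch (names illustrative) ignores scan angle and sensor attitude, which real processing pipelines must account for:

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def lidar_range(return_time_s: float) -> float:
    """Distance to the reflecting surface from the round-trip pulse time.

    The pulse travels out and back, so the one-way distance is half
    the total travel distance.
    """
    return SPEED_OF_LIGHT * return_time_s / 2.0

def surface_elevation(sensor_altitude_m: float, return_time_s: float) -> float:
    """Elevation of the reflecting surface below a downward-looking sensor."""
    return sensor_altitude_m - lidar_range(return_time_s)
```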
For the remainder of this review, we focus specifically on traits extracted using RGB sensors in conjunction with UAVs to varying levels of success, as this platform and sensor combination has been extensively developed due to its ease of use and cost effectiveness, which has democratized access across study systems and research groups.

MORPHOLOGICAL TRAIT EXTRACTION FROM UAV RGB IMAGES
Numerous important morphological agronomic traits have been estimated using RGB UAV platforms. The traits that can be measured need to be visible from a top-down perspective, which limits measurements of lower canopy traits. The two main morphological traits that have been studied using RGB UAV imagery are plant height and canopy cover or greenness.

Plant height
In crop breeding programs, plant height is evaluated to aid in selection and field management decisions. The height of a plant is a compromise between access to light and allocation of resources to plant structure versus yield. Plant height has been used to estimate traits such as relative yield (Watanabe et al., 2017), senescence (Schirrmann et al., 2016), and biomass (Tilly et al., 2015). Conventionally, plant height has been measured from the ground to the top of the plant (or another point on the plant such as the flag leaf) with a measuring stick. This is a manual and time-consuming task. In recent years, researchers have shifted to UAVs to improve the speed and efficiency of plant height data collection (Han et al., 2018; W. Li et al., 2016; Thompson et al., 2019; Tirado et al., 2020; Watanabe et al., 2017). Most plant height data collected using RGB UAVs utilize structure from motion (SfM) algorithms in downstream image analysis. Structure from motion algorithms have revolutionized crop phenotyping by using overlapping 2D images to reconstruct a 3D scene: features present in multiple images are matched to recover the unknown 3D scene geometry, the camera positions, and the camera orientations (James & Robson, 2012; Snavely et al., 2008). Morphological and structural parameters, including plant height, can be extracted from the 3D reconstructions (SfM dense point clouds), orthomosaics, and digital surface models (DSMs), also called digital elevation models, which are created by these algorithms (Figure 2). Orthomosaics serve as a distortion-corrected image of the field in which areas of interest can be identified, while DSMs are a 2D representation that uses a color scale for the height elements, providing a pseudo-color presentation of the field.
Plant height metrics can be extracted directly from SfM point clouds and generally have higher accuracies, but point clouds consist of thousands of points, and the point count grows rapidly with higher imaging resolution and overlap, which makes their file size large, difficult to handle, and computationally intensive (W. Li et al., 2016). In contrast, DSM datasets are much smaller in size, require less hard drive storage space, and are less computationally intensive to process. These attributes provide an efficient means for plant height extraction, although their accuracy is typically lower than that of point clouds. In most methods to extract plant height, 3D reconstructions are made, plots are defined on the orthomosaic, and the altitude information is extracted from the DSM rather than the point cloud.
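The DSM-based workflow above (define plots on the orthomosaic, then summarize DSM altitudes per plot) can be sketched as follows, assuming the DSM has already been loaded as a 2D array and plots are given as pixel bounds (names illustrative):

```python
import numpy as np

def plot_heights(dsm, plots, percentile=95):
    """Summarize DSM elevations for each plot.

    dsm   : 2D array of elevation values, one per pixel
    plots : dict mapping plot name -> (row0, row1, col0, col1) pixel bounds
    Returns the chosen percentile of elevations within each plot.
    """
    return {name: float(np.percentile(dsm[r0:r1, c0:c1], percentile))
            for name, (r0, r1, c0, c1) in plots.items()}
```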
Digital surface model accuracy and quality depend on precise positioning information (Ruiz et al., 2013). For many aerial vehicles, the internal GPS is not accurate enough to generate an accurate DSM. Ground control points (GCPs) with high-quality GPS information placed in the field are used to compensate for UAV GPS inaccuracy. They can vary greatly in form, and there are a number of features that need to be considered when deciding what to use for a GCP. Height, placement, size, brightness, and contrast with ground and plant material need to be evaluated to ensure visibility throughout the season. Ground control points also need to be robust against weather events and have a clear center from which accurate GPS measurements can be taken. In trait extraction pipelines, the GCP locations in images are used to transform 3D point clouds into real-world coordinates and rectify image positions before creating the DSM. Initial use of GCPs in SfM photogrammetry resulted in unexpectedly low correlations between UAV and manual measurements across multiple timepoints (R² = 0.22-0.71) (Bendig, Bolten, et al., 2013; Bendig, Willkomm, et al., 2013). This was attributed to low GPS accuracy in GCP locations, GCPs that were shorter than crop heights, low image overlaps, and the need for image acquisition beyond field borders (Bendig, Bolten, et al., 2013; Bendig, Willkomm, et al., 2013). Current practices that emphasize better use of GCPs have improved DSM accuracy and UAV-derived height measurement extractions (Ruiz et al., 2013). Using these best practices, correlations between UAV and manual measurements exceeding R² = 0.99 have been observed (Holman et al., 2016).
Further issues affecting the accuracy of height measurements from DSMs can arise when there are inconsistencies in ground altitude, but several methods have been developed to address this issue. One method identifies ground pixels and interpolates missing values to create a digital terrain model (DTM). The DTM is subtracted from plot DSM values to obtain plant height measurements for the canopy (Figure 3A). This method has been useful for fields with bare ground areas in several crops (Table 1). The correlations between manual height measurements and UAV-derived heights vary with crop species, growth stage, planting density, and spatial resolution (Watanabe et al., 2017). Shorter crops, later growth stages, higher plant densities, and higher spatial resolutions tend to have higher correlations (Holman et al., 2016). However, high plant density can cause low correlations if there are too few ground pixels to accurately estimate the DTM. In such cases, interpolation can be improved with the use of terrain modeling algorithms (Anderson et al., 2019). This method to improve ground accuracy is computationally intensive and can fail with wide canopy crops or high weed coverage.
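A toy version of the DTM-subtraction idea, assuming ground pixels have already been classified; for simplicity the DTM under the canopy is filled with the mean ground elevation rather than spatially interpolated, which is where the terrain modeling algorithms cited above come in:

```python
import numpy as np

def canopy_height_model(dsm, ground_mask):
    """Subtract an estimated DTM from the DSM to obtain canopy heights.

    dsm         : 2D array of surface elevations
    ground_mask : boolean array marking pixels classified as bare soil
    """
    dtm = np.where(ground_mask, dsm, np.nan)
    fill = np.nanmean(dtm)                  # crude stand-in for interpolation
    dtm = np.where(np.isnan(dtm), fill, dtm)
    return dsm - dtm
```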
Another approach to address ground accuracy is the difference-based method. In this approach, one DSM is subtracted from another to obtain height for any one flight date ( Figure 3B). The ground DSM is captured from a flight completed before the crops are growing, while the flight DSM captures the plant canopies on the day of interest (Acorsi et al., 2019;Belton et al., 2019;Chu et al., 2018). This method relies on accurate flight paths, DSM registration, and high DSM resolution for accuracy (Table 1). Subtraction is typically completed by using linear binning to average elevation values within a pixel (Chu et al., 2018). As pixel size increases, computational load decreases, but so does accuracy (Chu et al., 2018). A comparison of this method to more advanced point cloud methods used with LiDAR data found the difference-based method for plant height estimation resulted in lower genetic variation and repeatability in most circumstances (Anderson et al., 2019).
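A minimal sketch of the difference-based method: subtract a bare-ground DSM from an in-season DSM, optionally averaging elevations into coarser pixels first (the linear binning step), assuming the two rasters are already co-registered:

```python
import numpy as np

def difference_height(flight_dsm, ground_dsm, bin_size=1):
    """Per-pixel crop height as the difference of two co-registered DSMs.

    bin_size > 1 averages elevations into bin_size x bin_size blocks,
    reducing computational load at the cost of spatial detail.
    """
    diff = flight_dsm - ground_dsm
    if bin_size > 1:
        h, w = diff.shape
        h2, w2 = h // bin_size * bin_size, w // bin_size * bin_size
        diff = diff[:h2, :w2].reshape(h2 // bin_size, bin_size,
                                      w2 // bin_size, bin_size).mean(axis=(1, 3))
    return diff
```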
An alternate, less computationally intensive approach is the exposed alley subtraction method. In this method, ground points in the alleys surrounding the plot are identified based on the distribution of the height values within the plot and subtracted on a per-plot basis to extract the height of the plot (Figure 3C) (Tirado et al., 2020). Computing ground height for each individual plot reduces the high computational intensity required for interpolating a DSM for the entire field. A variant of this approach uses an adaptive triangulated irregular network ground classifier to split the field into segments, extract the lowest height value for each segment, triangulate these points, and interpolate the space in between them. This variant does not require alleyways and may be able to handle small slopes better. However, it is more computationally intensive and has a larger chance of pixel misclassification, especially when the crop is early in development. Both of these methods, while successful, rely on the presence of unobstructed areas of exposed soil, which are not always available for all crops and field layouts (Table 1).
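A simplified per-plot sketch of the exposed alley subtraction idea: ground level is taken as the median elevation of a ring of alley pixels around the plot and subtracted from a high percentile of the plot's elevations (bounds and parameter names are illustrative):

```python
import numpy as np

def alley_subtraction_height(dsm, plot_bounds, alley_width=2, pct=95):
    """Plot height relative to ground estimated from the surrounding alleys.

    plot_bounds: (row0, row1, col0, col1) pixel bounds of the plot.
    """
    r0, r1, c0, c1 = plot_bounds
    plot = dsm[r0:r1, c0:c1]
    ring = dsm[max(r0 - alley_width, 0):r1 + alley_width,
               max(c0 - alley_width, 0):c1 + alley_width].copy()
    # mask out the plot itself so only alley pixels remain in the ring
    rr0 = r0 - max(r0 - alley_width, 0)
    cc0 = c0 - max(c0 - alley_width, 0)
    ring[rr0:rr0 + (r1 - r0), cc0:cc0 + (c1 - c0)] = np.nan
    ground = np.nanmedian(ring)
    return float(np.percentile(plot, pct) - ground)
```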
A final method for addressing ground accuracy is to use self-calibration for sites where there is no visible ground. This method estimates the ground level by fusing the UAV data with a small number of manual canopy height measurements (Figure 3D) (Hu et al., 2018). An inverse distance weighted interpolation algorithm predicts the ground level of unmeasured plots by using the estimated ground levels of the eight nearest measured plots (Hu et al., 2018). A drawback of this method is the reliance on manual measurements of 5-10% of the plots, which require time and carry a margin of human error (Table 1) (Hu et al., 2018). However, when comparing the point cloud method, the ground reference method, and the self-calibration method for obtaining ground level, the point cloud and ground reference methods resulted in lower R² values and lower repeatability than the self-calibration method (Hu et al., 2018).

Aside from ground height, the accuracy of plant height measurements using RGB UAVs also depends on which pixels are chosen to represent the tops of plants. Outlier points caused by lower canopy plant material can be removed by applying a moving cuboid filter across field segments (Song & Wang, 2019). Each field segment is then broken into subcolumns in which plant height is estimated by subtracting the minimum from the maximum height within the subcolumn. The values estimated in the subcolumns are then averaged to obtain the overall average for that segment (Song & Wang, 2019). Alternatively, a combination of high-resolution 3D models and advanced segmentation algorithms that identify and determine height for individual plants can be used. In corn, the accuracy of this method was limited after the Vegetative 3 stage due to overlapping leaves.
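The inverse distance weighted interpolation at the heart of the self-calibration scheme can be sketched as below, assuming each manually measured plot contributes a known ground level at a known position (function and variable names are illustrative):

```python
import math

def idw_ground_level(target_xy, measured, k=8, power=2.0):
    """Predict the ground level of an unmeasured plot.

    measured : list of ((x, y), ground_level) pairs from measured plots.
    Uses the k nearest measured plots, weighting each by 1/distance**power.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    nearest = sorted(measured, key=lambda m: dist(m[0], target_xy))[:k]
    if nearest and dist(nearest[0][0], target_xy) == 0:
        return nearest[0][1]  # exact match: no interpolation needed
    weights = [1.0 / dist(p, target_xy) ** power for p, _ in nearest]
    values = [v for _, v in nearest]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```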
Despite this issue, a plant height accuracy of 89.2% has been observed with this method when validated using artificial corn plants emulating real-world scenarios and real plants up to the Vegetative 6 developmental stage. Development of algorithms that take leaf occlusion of growing plants and wind movement into account would increase the accuracy of phenotype extraction and reduce reconstruction noise (Zermas et al., 2020). Another approach extracts the top leaves of the maize canopy before plot height estimation by using plant segmentation and spatial Kriging interpolation based on multiple neighboring maximum pixels from multiple plants in a plot. This method identifies pixels from the top of the canopy, and the highest value is taken as representative of plant height at the plot scale with a high correlation to manual measurements (R² = 0.896), although heights are systematically underestimated overall (Han et al., 2018).
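The subcolumn max-minus-min step described above can be illustrated with a 2D simplification (the published moving cuboid method operates on 3D point cloud slices; names here are illustrative):

```python
import numpy as np

def segment_height(points, n_cols=4):
    """Average per-subcolumn (max - min) height for one field segment.

    points : array of (x, z) pairs from a filtered point cloud slice;
    x positions are split into n_cols subcolumns along the row direction.
    """
    x, z = points[:, 0], points[:, 1]
    edges = np.linspace(x.min(), x.max(), n_cols + 1)
    heights = []
    for i in range(n_cols):
        mask = (x >= edges[i]) & (x <= edges[i + 1])
        if mask.any():
            heights.append(z[mask].max() - z[mask].min())
    return float(np.mean(heights))
```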
In addition to simply removing erroneous pixels, high accuracy can be achieved through optimal SfM metrics for DSM plant height extraction. Each metric's accuracy varies by crop and study design. Common SfM metrics include the median height value (50th percentile), the 90th, 95th, and 99th percentiles, and the maximum height value (100th percentile) for all pixels in a plot. Morphological elements of taller grain crops, like occluding leaves, can generate SfM artifacts, or pixels with unrealistic height values. This makes their optimal percentile lower (i.e., ∼90th percentile) and variable based on growth stage, planting density, and image resolution (Bendig et al., 2014; Madec et al., 2017; Watanabe et al., 2017). Canopy structure influences UAV-based measurement accuracy, as the tallest vegetative parts typically move easily in the wind. Moving plant tissue decreases accuracy during vegetative stages (Tirado et al., 2020; Tirado, Hirsch, et al., 2021; Varela et al., 2017). The optimal SfM metric for high-density small grains, such as barley and wheat, tends to be a higher percentile, such as the 99th percentile, due to the lack of pixel artifacts. These crops have achieved higher measurement accuracy than taller crops such as maize and sorghum (Bendig et al., 2014; Madec et al., 2017; Watanabe et al., 2017). Maximum and median height metrics, although frequently evaluated in studies, have been shown to be relatively inaccurate for plot height due to the presence of ground points, outlier points, or SfM artifacts (W. Li et al., 2016; Madec et al., 2017; Malambo et al., 2018).
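The sensitivity of these metrics to SfM artifacts can be demonstrated directly: a single unrealistic pixel shifts the maximum but leaves the 90th percentile untouched. A small sketch of the common summaries (names illustrative):

```python
import numpy as np

def sfm_metrics(plot_pixels):
    """Common SfM height summaries for one plot's DSM pixels."""
    pct = np.percentile(plot_pixels, [50, 90, 95, 99, 100])
    return dict(zip(["p50", "p90", "p95", "p99", "max"], pct.tolist()))
```

For a plot of 99 pixels at 2.0 m plus one 10.0 m artifact, the maximum is pulled to 10.0 m while the 90th percentile stays at 2.0 m, which is why lower percentiles are preferred for artifact-prone crops.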
In general, a trend of underestimation of UAV height values compared with ground measurements has been observed across most studies due to the inclusion of ground pixels in point clouds and the inability to capture sparse pixels pertaining to fine structures, such as the canopy apex (Anderson et al., 2019; Holman et al., 2016; Madec et al., 2017; Varela et al., 2017; Watanabe et al., 2017). However, overestimation has also been reported and attributed to contamination from taller neighboring plots (Watanabe et al., 2017). Current best practices for plant height extraction rely on collecting images with high (85% or more) overlap over areas with an even distribution of GCPs. For best results, the locations of the GCPs should be consistent from one flight to the next and collected with high-accuracy GPS. With the development of drones equipped with advanced GPS technology, such as real-time kinematic (RTK) positioning, the need for GCPs may be reduced or eliminated.

Canopy cover
Similar to plant height, canopy cover is an important indicator of growth, and its rates are associated with light interception, weed suppression, biomass accumulation, and yield potential (Campillo et al., 2008; Jannink et al., 2001; Purcell, 2000; Xavier et al., 2017). Canopy cover is commonly measured manually using a densiometer, which consists of a concave or convex mirror with 24 ¼-inch squares engraved on its surface. Each square is then divided into four smaller squares that are each represented by an engraved dot. The number of dots with an obstructed view of the sky is then divided by 96 to obtain the percentage of canopy cover (Cook et al., 1995). This method, which is commonly used in forestry but can be applied to agricultural species, is time consuming and often overestimates canopy cover. Attempts to simplify and increase the speed of the process have been unsuccessful, as accuracy issues still remain (Cook et al., 1995; Korhonen et al., 2006). When applied to crops, this method is particularly difficult due to the limited number of data points being measured and the difficulty of measuring from below the canopy in some crops.

UAV imagery has been used in multiple ways to track canopy cover changes in crop fields. One method involves classifying orthomosaic pixels into vegetative and nonvegetative categories. A variety of classification techniques have been successful. The most common method applies thresholds to color parameters such as hue and saturation values, the green vegetation index, the lightness, green-magenta, blue-yellow (LAB) color space, the excess green (ExG) index, and the normalized green red difference index (NGRDI) (Ashapure et al., 2019; Han et al., 2018; Lang et al., 2019; Marcial-Pablo et al., 2019; Purcell, 2000; Schirrmann et al., 2016; Varshney, 2017).
However, finding an optimal threshold value that is effective across different timepoints and lighting conditions requires normalization or calibration of the image color parameters prior to analysis, because soil color and light conditions change on a daily basis. Shadows and other types of vegetation also need to be accounted for in these approaches (W. Li et al., 2016; Purcell, 2000; Schirrmann et al., 2016). Other kinds of classification, such as K-means clustering, Gaussian mixture models, support vector machines, and fully convolutional networks, can achieve high accuracies. K-means clustering and fully convolutional networks in particular process relatively quickly with high accuracy and are highly useful when analyzing large datasets (Varshney et al., 2017). The size and complexity of the data are important to take into account when deciding which classification method to use. No matter the classification method, RGB indices have a limited ability to capture nongreen canopy pixels; therefore, early season classification has the highest accuracy (80%+), with a rapid drop throughout development due to browning vegetative tissues and nongreen flowers or fruits (60%+) (Ashapure et al., 2019; Marcial-Pablo et al., 2019). Accuracy can be improved through a morphological closing operation that fills in small gaps and keeps the boundary of the canopy intact (Ashapure et al., 2019). This operation does not improve classification of weed pixels, and weed removal or suppression is still necessary for accurate canopy cover estimations.
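A minimal sketch of threshold-based canopy cover classification using the ExG index on chromatic (sum-normalized) coordinates; the threshold value here is illustrative and in practice must be tuned or calibrated per dataset, as discussed above:

```python
import numpy as np

def canopy_cover_exg(rgb, threshold=0.1):
    """Fraction of pixels classified as vegetation with ExG = 2g - r - b.

    rgb : H x W x 3 array of floats; channels are normalized by their sum
    so that overall lighting intensity largely cancels out.
    """
    total = rgb.sum(axis=2)
    total[total == 0] = 1.0          # avoid division by zero on black pixels
    r, g, b = (rgb[..., i] / total for i in range(3))
    exg = 2.0 * g - r - b
    return float((exg > threshold).mean())
```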
Classification methods such as a maximum likelihood classifier or support vector machine algorithm can also be used to detect color variation across orthomosaics and have been used to effectively identify crops including cotton, sorghum, soybean, and watermelon with high accuracy (J. Zhang et al., 2017). Advanced classifications, such as the ability to distinguish between different genotypes of a single crop, will be useful when evaluating productivity in a large production field containing mixed varieties but might not be practical with RGB sensors (Barot et al., 2017). Additionally, vegetation indices such as the ExG index, the NGRDI, and the visible atmospherically resistant index have been shown to capture morphological variation between rice varieties (Afdhalia et al., 2019).
Red-green-blue aerial imagery can provide additional canopy information beyond the proportion of canopy cover, serving as an indicator of overall plant health and vitality. The distribution of RGB colors in orthomosaics can illustrate spatial variation in plant coverage, growth stage, leaf vitality, and degree of senescence (Schirrmann et al., 2016). For example, color variation within plant canopies can serve as an indicator of crop stress. Darker green colors in wheat fields have been shown to represent areas with denser crop canopy and higher plant vitality, while light brown or yellow colors indicated sparser canopy coverage (Schirrmann et al., 2016). Similarly, identification of early senescence or other stay-green traits can be used to estimate traits such as yield or drought resistance (Liedtke et al., 2020; Makanza et al., 2018). Early identification of abiotic stress through this method could be used for breeding superior varieties as well as for agronomic intervention and reduction of losses. However, in-season agronomic intervention may be difficult to implement, given the amount of stress plants must withstand before color variation becomes visible. Additionally, accurate identification of color variation would necessitate extensive reflective standards placed throughout the field.

Leaf area index
Leaf area index (LAI) is another canopy parameter that can be extracted from orthomosaics. It is defined as the one-sided leaf area per unit ground surface area and is a key biophysical trait of interest due to its role in photosynthesis, transpiration, and nutrient cycling (J. M. Chen & Black, 1992; Shibles & Weber, 1965). Manual measurements of LAI use specialized harvesting techniques, direct leaf area measurements within a delimited area, or dry leaf biomass collection (Breda, 2003). These manual measurements are time-consuming and destructive, and because LAI is largely determined by the upper canopy, it is a prime target for UAV measurement. Common approaches correlate extracted colors, image variables, and vegetation indices to target canopy parameters. One of the first studies to extract biophysical traits from UAV flights tested image variables and spectral indices including canopy cover, the ExG index, the individual red (R), green (G), and blue (B) color channels, and the ratios between the RGB channels (Schirrmann et al., 2016). All image variables except the RG ratio were useful in predicting wheat LAI, with RB and BG being the most strongly correlated (0.80 to 0.95 and −0.82 to −0.92, respectively) (Schirrmann et al., 2016). A similar study in rice compared individual RGB channels, the L*a*b* color space, and hue to a variety of biophysical parameters using single regression analysis (Shimojima et al., 2017). The image variables were useful in assessing LAI, and the a* index had strong linear relationships with LAI at the flowering and grain filling stages (R² > 0.70) (Shimojima et al., 2017). Other studies looking at cotton, soybean, sorghum, and watermelon (J. Zhang et al., 2017), soybean (Maimaitijiang et al., 2017), and winter wheat (Hasan et al., 2019) also had success predicting LAI with indices such as the color index of vegetation (CIVE = 0.441R − 0.811G + 0.385B + 18.78745), ExG, and the red band.
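As a worked example of this index-regression approach, the sketch below computes CIVE from the formula quoted above and fits a single-predictor least-squares regression; the per-plot channel means and LAI values are hypothetical placeholders, not data from the cited studies.

```python
import numpy as np

def cive(r, g, b):
    """Color index of vegetation; r, g, b are per-plot mean
    channel values on a 0-255 scale."""
    return 0.441 * r - 0.811 * g + 0.385 * b + 18.78745

# Hypothetical per-plot channel means and ground-truth LAI.
plots = np.array([
    #  R    G    B   LAI
    [ 80, 150,  60, 3.1],
    [ 95, 130,  70, 2.2],
    [ 70, 160,  55, 3.8],
    [110, 120,  80, 1.6],
])
x = cive(plots[:, 0], plots[:, 1], plots[:, 2])
y = plots[:, 3]
slope, intercept = np.polyfit(x, y, 1)      # least-squares line
pred = slope * x + intercept
r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```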
Together, these studies showed that assorted color indices from UAV RGB imagery were useful in predicting LAI in multiple crops. The models used in these studies had relatively few inputs and were fairly simple to use, which makes this approach widely accessible. However, color calibration was still necessary to obtain reliable results (Hasan et al., 2019).
Leaf area index has also been estimated by quantifying the structural complexity of the canopy. These metrics represent the spatial patterns and heterogeneity of the crop canopy, such as the spatial distribution of leaves in three-dimensional space (W. Li et al., 2017). A linear regression model built by stepwise selection on the complexity metrics showed a significant, high correlation with manual LAI measurements in maize. Most models selected the rumple index and the standard deviation of point height as the two major predictors for LAI (W. Li et al., 2017). This method reduced the need for color calibration standards. A similar method in soybean combined uncalibrated reflectance measurements, a segmentation approach, and gap fraction theory to account for row interference and leaf overlap. This approach achieved high accuracy compared to manual, destructive measurements, but only covered earlier growth stages, which do not have issues with leaf chlorosis, senescence, or disease (Roth et al., 2018). Adding texture information to prediction models has improved LAI estimation in rice (correlation improvement from 0.66-0.72 to 0.75-0.76) (Duan et al., 2019). However, when used alone, the textures for R, G, and B were less successful, which indicates that it is critical to combine textures from all R, G, and B wavelengths. Together these studies showed that LAI can be accurately estimated through a range of both spectral and structural metrics collected with UAV imaging.

Biomass
Crop biomass is a biophysical trait that has been extensively studied due to its importance in monitoring plant health and productivity. It is a crucial parameter for determining the amount of nitrogen (N) needed to maximize yield while preventing overfertilization (P. Chen et al., 2010; Lemaire & Gastal, 1997). Biomass serves as a primary indicator of yield, can be used to develop yield variation maps, and can aid in targeted management decisions (Jannoura et al., 2015). Traditional methods of monitoring biomass are destructive and labor intensive, requiring weighing the plant before and after drying. As such, biomass has been a major focus of UAV phenotyping research.
Several groups have investigated UAV spectral indices alone to predict biomass. The NGRDI showed significant correlations to biomass in peas and oats, with an even stronger relationship in soybeans, alfalfa, and corn when sampling the same genotype under different growth conditions (Table 2). While this method requires the use of light calibration standards, it requires less hands-on work than methods that require various manual measurements in addition to UAV data. Other groups have utilized plant height to predict biomass across a range of crops (Table 2). Although plant height showed overall moderate correlations to biomass, this was primarily driven by differences across growth stages, and low prediction power was observed within any particular growth stage (W. Li et al., 2016; Madec et al., 2017; Tilly et al., 2015). Plant height used in conjunction with spectral indices, however, has yielded high prediction accuracies. Other structural parameters that have shown promise include canopy roughness (the Euclidean distance between each point and the best fitting plane of its neighbors within a sphere of a user-set radius) (Herrero-Huerta et al., 2020) and a combination of image texture and spectral indices (Yue et al., 2019; Zheng et al., 2019). Both of these methods showed a moderate relationship between their best prediction models and manually measured biomass. Further combination of spectral indices with a variety of structural and volumetric parameters of the crop canopy into a vegetation index weighted canopy volume model suggested that structural parameters were more important for estimating biomass than spectral indices.
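Combining plant height with a spectral index amounts to a multiple linear regression; a minimal NumPy sketch, in which the variable names and the choice of NGRDI as the spectral predictor are illustrative assumptions:

```python
import numpy as np

def fit_biomass_model(height, index, biomass):
    """Least-squares fit of biomass ~ b0 + b1*height + b2*index,
    combining a structural predictor (plant height) with a spectral
    one (e.g., NGRDI), as in the combined models discussed above."""
    X = np.column_stack([np.ones_like(height), height, index])
    coefs, *_ = np.linalg.lstsq(X, biomass, rcond=None)
    return coefs

def predict_biomass(coefs, height, index):
    """Predict biomass from fitted coefficients."""
    return coefs[0] + coefs[1] * height + coefs[2] * index
```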
In general, plant dry biomass can be predicted with high accuracy using a combination of spectral and structural information, primarily plant height, gathered from UAV RGB imaging; predictive capacity for plant fresh biomass is lower unless additional information is included, because RGB sensors cannot capture moisture content (Acorsi et al., 2019; Brocks & Bareth, 2018). New methods applying machine learning to a range of structural and spectral parameters are being implemented to increase prediction power and minimize error, but further work with improved UAV accuracy and larger sample sizes is necessary to accurately evaluate these methods. Utilizing multispectral or hyperspectral sensors that can better capture plant moisture content could further increase the ability to predict plant fresh biomass.

Note. For studies with multiple methods, only the method with the highest accuracy per species is included. AGB, above ground biomass; ExG, excess green; ExGR, excess green minus excess red; NTDI, normalized difference texture index; NGRDI, normalized green red difference index; VEG, vegetation index.

BIOCHEMICAL PARAMETER EXTRACTION FROM UAV IMAGES
Key biochemical components of interest are leaf chlorophyll and nitrogen content, as these can be used to quantify foliar photosynthetic rate and primary productivity (Gitelson et al., 2006; Ripullone et al., 2003). Unlike biophysical traits, the direct use of vegetation indices and variables from RGB imagery has been less successful for extracting biochemical parameters (Roth et al., 2018; Schirrmann et al., 2016). Some studies have approached biochemical parameters with various spectral bands and vegetation indices, such as NDRE, and have found low to moderate correlations with chlorophyll a, chlorophyll b, and nitrogen (Maimaitijiang et al., 2017; Simic Milas et al., 2018). However, when leaf chlorophyll was multiplied by LAI, the coefficient of determination increased from 0.177 to 0.774 in maize (Simic Milas et al., 2018). While this method showed promise in maize, these findings are likely dependent on crop type, as other experiments have found only a moderate correlation of NDRE with LAI in soybean (Maimaitijiang et al., 2017). Adding other information such as texture and removing background pixels have been suggested to improve correlations with chlorophyll content (Lang et al., 2019). Much of the variation in accuracy for leaf chlorophyll is likely due to chlorophyll variation within a leaf or to slight variations in lighting during a flight (Simic Milas et al., 2018).
There is still much room, and need, for improving the accuracy of canopy biochemistry estimates. Sensors with higher spectral resolution, such as multispectral or hyperspectral sensors, have the potential to provide advanced opportunities in this area (Haboudane et al., 2002; Jay et al., 2019; Maimaitijiang et al., 2017). Overall, estimations with multispectral sensors appear to perform better, which has been attributed to the biochemical and geometric properties captured by NIR bands. Thermal radiation may also contribute to an increase in accuracy, as leaf chlorophyll content can affect the temperature of the crop canopy (Maimaitijiang et al., 2017). This points to the limited ability of RGB sensors for this trait and the need to move to other types of sensors for accurate assessments.

Biotic and abiotic stress detection
Detecting biotic and abiotic stress can be accomplished by collecting data on the previously mentioned crop traits or by evaluating other, harder-to-quantify traits. Monitoring crop health and making field management decisions often depend on identifying stresses. Manual detection of biotic and abiotic stresses involves scouting fields by foot and scoring crop stands. UAVs can increase the efficiency of scoring by covering large areas quickly. Previous research using thermal sensors has been very successful in this space due to the high correlation between canopy temperature and water content. These sensors, while highly effective, are expensive and heavy (De Swaef et al., 2021). Other algorithms have been developed and tested for detecting stresses, including root and leaf diseases as well as nutrient deficiencies, from RGB imagery. These approaches have also been successful but have been limited by the ability of RGB sensors to distinguish between nongreen pixel classes. For example, a recommendation algorithm built to detect maize leaves exhibiting symptoms of nitrogen deficiency achieved a correct-classification accuracy of 79.2% (Zermas et al., 2015). However, there were issues with false positive and false negative classifications: yellow nonleaf elements such as tassels were classified as N-deficient regions, and lower leaves with N-deficiency symptoms were missed due to high occlusion and low light (Zermas et al., 2015). Quantification of potato leaf blight infection had similar issues, as it also focused on the detection of chlorotic or necrotic spots, and the necrotic spot RGB features were found to overlap with soil RGB features (Sugiura et al., 2016). Therefore, a percentage of healthy tissue relative to soil and diseased tissue was calculated and used to track differences in infection rates between plots. This resulted in a high correlation between UAV RGB image processing and visual assessment (Sugiura et al., 2016).
These results could be further improved with additional spectral bands such as NIR, which have less difficulty distinguishing plant tissue.
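The healthy-tissue percentage described above reduces to simple pixel counting over a classified image; a minimal sketch, with the class encoding being a hypothetical assumption:

```python
import numpy as np

# Hypothetical class labels from a per-pixel classification.
SOIL, HEALTHY, DISEASED = 0, 1, 2

def healthy_percentage(classes):
    """Percent of healthy-tissue pixels relative to all soil,
    diseased, and healthy pixels in a plot; tracking this value
    across flights follows the change in infection rate."""
    total = np.isin(classes, [SOIL, HEALTHY, DISEASED]).sum()
    return 100.0 * (classes == HEALTHY).sum() / total
```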
More advanced analysis methods based on machine learning have further facilitated the use of RGB sensors for quantifying stresses. For example, drought stress severity in wheat was quantified by extracting features from a square of neighboring plant pixels rather than classifying individual pixels (Su et al., 2020). This method resulted in faster and more effective classifications, since the spatial information reduced misclassification. Utilizing spectral and color index features together achieved higher accuracy than spectral features alone (89.9 vs. 82.8%) (Su et al., 2020). This work shows promise for future work with RGB sensors and drought detection, which could alleviate the need for thermal sensors (De Swaef et al., 2021). The advent of deep learning algorithms that can take spectral features, image texture, and shape information into account could allow UAV RGB imaging to become a more effective platform for scoring complex biotic and abiotic stresses.
Biotic stresses that do not heavily rely on leaf color classification have been more successful using RGB UAV imagery. In cases where the primary symptom is plant death, such as root rot disease in alfalfa (Mattupalli et al., 2018) and white grub larvae damage in soybean (Puig et al., 2015), the assessments rely on identifying regions of bare soil or low plant density. However, this process can be encumbered by the presence of weeds, the difficulty of which is illustrated by classification differences between occupied and unoccupied aircraft images. Automated classification labeled 20% of pixels differently between the two collection methods, but manual classification revealed that most of these pixels belonged to either alfalfa or soil (Mattupalli et al., 2018). Identifying white grub larvae damage in soybeans was more successful, due in part to the absence of weeds in the study system. Identification of the boundaries for the areas of interest was made easier through soft K-means clustering for identifying healthy vegetation, bare soil, and transitional areas (Puig et al., 2015).
Abiotic and biotic stresses have also been successfully classified using multispectral sensors. Biotic stresses such as oat crown rust have been imaged with RGB + NIR sensors with high classification accuracy within a timepoint (McNish & Smith, 2021). Similarly, abiotic stresses such as iron deficiency chlorosis in soybeans have been classified using RGB + NIR sensors. This study showed a decrease in the least significant difference when using UAV images instead of visual ratings in 33 out of 36 trials, indicating that the UAV-based scores were more precise (Dobbels & Lorenz, 2019).
Depending on the kind of stress, the crop being studied, and the weed cover, simple classification of UAV RGB imagery has been effective in some cases, but in others more spectral information is needed for accurate classification at the field level. Further work is needed to identify the best way to classify abiotic and biotic stresses that involve leaf color identification using UAVs.

Advantages of high temporal resolution
The ultimate goal of crop management and research is to use methods that increase yield, either by selecting top-performing lines or by making efficient farm management decisions that optimize performance. To this end, remote sensing allows the evaluation of larger areas per unit time than could be covered by in-person field evaluations. Although unable to directly measure grain yield, UAV platforms can measure, and thereby help select and improve, traits that contribute to end-of-season performance. Monitoring these parameters periodically throughout development is essential to understand crop development and nutrient and pesticide demand throughout the growing season (Dammer et al., 2009; Schirrmann et al., 2016). These traits include many of those mentioned above, such as plant height, LAI, plant senescence, plant growth, and biomass, but also field variability, nutrient capacity, and nutrient availability (Kefauver et al., 2017). Different kinds of curves can be fit to temporal morphological measurements (e.g., plant height, canopy cover, LAI, and biomass) to represent plant growth rates. These include sigmoidal curves (Chang et al., 2017), linear regressions across the means of each timepoint (Han et al., 2018), logistic functions (Anderson et al., 2019), and polynomial curves (Tirado et al., 2020; Tirado, Hirsch, et al., 2021). An understanding of the different elements impacting crop growth is essential when deciding which curve to use. For example, plant lodging events that occur during the growing season may not be captured or accurately accounted for by some methods used to fit growth curves (Tirado, Hirsch, et al., 2021). Growth curves can be used for many applications, including genotypic differentiation, root lodging factor identification, and terminal height factor identification.
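Fitting a logistic function to temporal plant height can be sketched with SciPy as follows; the flight dates and heights below are hypothetical placeholders, not data from the cited studies.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, asym, rate, t_mid):
    """Three-parameter logistic growth curve: asym is the terminal
    (asymptotic) height, rate the growth rate, and t_mid the
    inflection point (time of maximum growth)."""
    return asym / (1.0 + np.exp(-rate * (t - t_mid)))

# Hypothetical flight dates (days after planting) and plot-mean
# heights (cm).
t = np.array([20, 30, 40, 50, 60, 70, 80], dtype=float)
h = np.array([11, 32, 80, 145, 193, 214, 222], dtype=float)

# Fit the curve; the fitted parameters (e.g., terminal height and
# growth rate) can then serve as traits for selection or prediction.
params, _ = curve_fit(logistic, t, h, p0=[h.max(), 0.1, np.median(t)])
asym, rate, t_mid = params
```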
For instance, cluster analysis in maize found that the relative growth rate based on final height could effectively discriminate between genetic backgrounds during different growth stages (Han et al., 2018). Temporal plant height measurements have also allowed the tracking of lodging responses of maize genotypes across different environmental conditions and identified factors contributing to the severity of lodging, which is beneficial for understanding root lodging resistance and identifying appropriate genotypes (Tirado, Hirsch, et al., 2021). Temporal plant height estimates have also proven useful in the detection of QTL when included in genomic prediction studies (Hassan et al., 2019). In this study, two major stable QTL were identified that explained more than 60% of the total phenotypic variation, and two new QTL associated with rapid growth rate were also identified (Hassan et al., 2019).
Temporal canopy parameters have also been useful in QTL detection. Canopy coverage observations in soybean were treated as a quantitative trait and fit to a logistic model to extract a single trait value representing the mean of the seasonal values. These mean seasonal values were used in a genome-wide association analysis to detect QTL linked to increased yield (Xavier et al., 2017). Other canopy parameters, such as LAI, can vary drastically from one day to the next, making temporal measurements necessary for assessing plant development, productivity, and potential input demands (Dammer et al., 2009; Roth et al., 2018; Schirrmann et al., 2016). Canopy biophysical traits can lead to a better understanding of responses to environmental factors and stresses. For instance, tracking biophysical traits through time can identify changes in early senescence at the individual plant level, soil-induced water deficit stress, crop vigor, and senescence patterns (Makanza et al., 2018; Schirrmann et al., 2016). These traits could be integrated into selection processes if their heritability is moderately high (Makanza et al., 2018).
Increasing the temporal resolution and the number of traits assessed has been shown to improve end-of-season yield prediction accuracy. Parameters extracted from a logistic function fit to temporal plant height in maize achieved around a 400% improvement in grain yield predictions and improved the selection accuracy of the top 10% yielding hybrids (Anderson et al., 2019). These parameters can be predicted before plant maturity and can therefore allow for faster genotypic selection prior to flowering (Anderson et al., 2019). Temporal plant height measurements have also allowed for assessment of in-season root lodging in maize. Plots with extreme lodging showed a mean 12.3 bu/acre yield loss compared to plots with no lodging (Tirado, Hirsch, et al., 2021). Temporal canopy closure has also been used to detect QTL linked to increases in yield in soybean, as canopy closure was shown to have a high correlation to yield and to be highly heritable (Xavier et al., 2017). These studies all show great promise in using temporal information to improve early season yield predictions, which could have profound impacts on making crop selection and in-season management decisions more effective. Future work could move toward utilizing deep learning methodologies to build models that capture spatial features to aid in in-season yield prediction. Initial work using deep learning has shown improved estimation accuracy with temporal resolution over traditional vegetation index predictions in both rice (Q. Yang et al., 2019) and soybean (Maimaitijiang et al., 2020).
Increased temporal resolution has also led to an improved understanding of the response of plants to their environment. A combination of high temporal resolution data and climatic information has been used to understand genotype × environment interactions and responses to environmental challenges (G. Yang et al., 2017; Malambo et al., 2018; Tirado, Hirsch, et al., 2021). High temporal resolution data have also been used to correlate differences in growing environments (e.g., nutrient status, crop rotation, and tillage systems) with growth rates (Boomsma et al., 2010; Liu et al., 2021; Pedersen et al., 2021; Zhu et al., 2019).

MAJOR CHALLENGES STILL TO BE ADDRESSED IN UAV PHENOTYPING
While the collection and analysis of UAV-based data has advanced rapidly in recent years, numerous challenges remain that provide opportunities for improvement. The GPS accuracy of UAV imagery continues to be an issue for trait extraction using SfM algorithms (Bendig, Bolten, et al., 2013; Bendig, Willkomm, et al., 2013; Holman et al., 2016; Ruiz et al., 2013). Using GCPs has greatly reduced this error; however, it requires accurate coordinate surveying, which in turn necessitates expensive GPS surveying equipment such as RTK GPS systems. Furthermore, GCPs are difficult to implement in settings that lack visible alleys and where GCPs hinder the mobility of ground vehicles. Drones equipped with RTK-GPS devices that accurately log image coordinates could reduce the need for GCPs, but they increase the cost of the UAV substantially. Other remaining challenges include low payload capacity, lack of stability under windy conditions, and short battery life (Belton et al., 2019).
In terms of analyzing UAV RGB imagery, issues with plant height accuracy are exacerbated early in the season due to low canopy density and late in the season due to plant senescence (Tirado et al., 2020). Additional issues arise with trait extraction due to artifacts caused by image stitching. One solution gathers plot-level data by combining single images of the plot rather than using the stitched orthomosaic (Hearst, 2019).
The largest challenge in analyzing UAV RGB imagery is the need for automated, standardized software and protocols for gathering images, processing data, segmenting individual plots, extracting traits, and sharing data. This is made difficult by variability in research needs, which differ across platforms, sensors, field layouts, and crops. Developing protocols for uniform data collection that serve many purposes, rather than different protocols for specific aims, would enable the generation of large, sharable datasets. This would also aid in standardizing processing software prior to data analysis. Groups such as the Genomes2Fields initiative are moving toward more standardized practices for within-crop data/image collection and processing (https://www.genomes2fields.org), and more of this will be needed to maximize the value of UAV data collection.
For breeding purposes, the ability to track individual plots through time is crucial. This is particularly challenging because most techniques rely on manual adjustments of a grid based on planting patterns to segment individual plots, since crop rows are rarely in perfect grids; these adjustments are time consuming and labor intensive. They can be reduced through improved image registration utilizing GCPs and/or RTK GPS onboard the platform, which will also enable the reuse of plot boundaries. In terms of data processing and analysis, common SfM software packages such as Pix4Dcapture and Agisoft Photoscan are proprietary, costly, and their outputs alone are difficult to utilize (Agisoft LLC, 2019; Pix4D, n.d.). These outputs, primarily the orthomosaics and digital elevation models, can be useful when further analyzed, often focusing on extracting traits that are normally measured by hand but also providing opportunities for new traits such as textural features. As optimal algorithms to extract desired phenotypes are developed, software such as FIELDImageR or Phenix will need to become a focus in the research community for providing easy, adaptable solutions that can complete larger portions of processing in one package (Matias et al., 2020; Progeny Drone Inc, 2020). Open-source software will allow for wide and efficient implementation across breeding programs and large-scale experiments. A key component of this goal is the generation of publicly available, high quality, annotated datasets that can be used to develop and train new software and trait extraction pipelines. Fast and efficient pipelines for extracting agronomic traits are crucial to relieve this bottleneck in crop improvement and selection.

CONFLICT OF INTEREST
The authors declare no conflict of interest.