QScout: A QGIS plugin tool suite for georeferencing and analysis of field‐scouted and remote sensing data

Field scouting is an important part of many research methodologies in plant pathology and plant phenomics. However, linking scouting data to field imagery is often hampered by the time‐consuming task of georeferencing with a GIS. Here, we present the QScout tool suite for integrating remote sensing imagery and raster data with field‐scouting data in QGIS, an open‐source GIS program. The central features of QScout are the Drop Pins and Locate Pins plugins, which allow the user to easily link scouted data to remote sensing imagery. QScout also includes the Value Grabber and Grid Aggregator plugins, which transfer raster data into pins and aggregate the data from the pins into a grid, respectively. The final two tools, "Drop, Grab, and Aggregate" and "Locate, Grab, and Aggregate," are plugins that combine subsets of the four core plugins. The interface allows GIS users to effectively make use of field‐scouted observations with remote imagery and can improve data organization, analysis, and identification of locations of interest for further scouting or targeted management. QScout is publicly available as a GitHub repository (https://github.com/GoldLabGrapeSPEC/QScout).


This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2021 The Authors. The Plant Phenome Journal published by Wiley Periodicals LLC on behalf of American Society of Agronomy and Crop Science Society of America.

INTRODUCTION
Field scouting has a long history of importance in crop production and as a foundational research methodology in many agricultural science disciplines. Field scouting is the act of physically visiting a plot of land and inspecting the plants systematically to assess their health, growth status, or other condition, such as disease (Nutter et al., 1991). Traditionally, field scouting has been the primary method by which growers and scientists collect data on the individual plant level. Field-scouted observations serve as important points of reference and validation in many phenomic investigations (Martinelli et al., 2015; Polgar & Primack, 2011). In contrast to the long history of field scouting, remote sensing has emerged only recently (Martinelli et al., 2015; Zeng et al., 2020). Whereas standard photography techniques sample light at only a few wavelengths, typically red, blue, and green, multispectral imaging, a commonly used type of remote sensing in plant pathology and plant phenomics, consists of sampling light from additional near-infrared wavelengths to yield a more complete absorbance spectrum and understanding of plant condition (Zeng et al., 2020).
The availability and quality of satellite and other types of remote sensing imagery have increased exponentially in recent years. Although kilometer resolutions were once standard, recent developments in constellation architecture (Nagel et al., 2020) have brought satellite imagery resolution to the sub-meter level, with commercial providers offering resolutions as fine as 30 to 50 cm, such as the data available from WorldView3 or Planet Labs SkySat (Digital Globe Team, https://gbdxdocs.digitalglobe.com/docs/worldview-3; Planet Team, 2017). For the first time, this allows researchers to analyze satellite datasets at an individual plant level. Much of this analysis is conducted in a GIS program such as the free open-source tool QGIS (QGIS Development Team, 2021). However, the very scale and resolution that make these data so promising can also hamper analysis. Although some software tools exist to easily process and analyze large quantities of multispectral satellite data, such as the built-in QGIS processing toolbox (QGIS Development Team, 2021), GRASS (GRASS Development Team, 2020), and GDAL (GDAL/OGR contributors, 2021), we found that the lack of specific open-source tools for processing and validating imagery with field-scouting data was significantly slowing our workflow. Most significantly, we lacked a tool to easily convert field-scouted observations in spreadsheet form into QGIS points and then combine these points into a grid of rectangular polygon geometries.
New tools are needed to aid users in efficiently processing validation data and high volumes of imagery in a standardized format to meet the growing interest in the use of satellite observations in addition to observations from ground-level equipment and unoccupied aerial vehicles in plant phenomics. The integration of ground data with multispectral imagery with a high spatial resolution allows researchers to find correlations between imagery-derived plant health metrics and in vivo plant health. Further, this advancement may aid researchers and extension specialists in helping stakeholders to diagnose and treat problems in their crops more accurately. To more effectively manage, convert, and analyze our remote sensing and field-scouted data, we developed the QScout suite of plugins for QGIS. The tool suite applications include routine processes such as locating observations of plants or environmental traits for statistical or spatial analysis in relation to image data, mapping plant locations for repeated observations, and spatially interpolating field data for correlating them with remote sensing-derived plant health indices.

Core Ideas
• Linking remote sensing imagery to ground validation data is an essential first step in nearly all use cases.
• Validation data are frequently not georeferenced, which makes connecting them to remote sensing data difficult and time consuming.
• We have developed an open-source tool that can help users easily link non-georeferenced field data to raster and remote sensing imagery.
• Our tool can improve data organization, analysis, and identification of locations of interest for investigation.

FIGURE 1 Cartoon demonstrating the mechanism of the Drop Pins and Locate Pins algorithms. In both algorithms, the user provides the algorithm with a row vector, a bounding polygon, a starting corner, and values for within- and between-row spacing. (a) The program calculates the angle between the x axis and the row vector and uses trigonometry to calculate the x and y components of the within- and between-row spacing. (b) The program uses those values to place points above, below, to the right, and to the left of the origin point of the row vector. (c) The program then continues to place points adjacent to existing points, avoiding redundant points and not placing points outside the bounding polygon. (d) The program places points in empty spaces adjacent to existing points. (e) On the 15th iteration, the program will place no additional points because no empty locations adjacent to the existing points remain within the bounding polygon. With Δx and Δy referring to longitude and latitude, respectively, and H and V referring to coordinate positions in the field ("horizontal" and "vertical"), the trigonometric functions take the angle between the row vector and the line of latitude and use that angle to decompose the row height and point interval (perpendicular and parallel to the row vector, respectively) into H Δx, H Δy, V Δx, and V Δy. These vectors are used to place adjacent points. In this example, the row height is 6 ft and the point spacing is 5 ft. Therefore, H Δx = 5 × cos (
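The decomposition described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not taken from the QScout source: the function name `spacing_components`, the choice to rotate the between-row step by +90° from the row vector, and the 30° angle in the usage line are all assumptions made for illustration (the angle used in the original example is not recoverable from the caption).

```python
import math


def spacing_components(row_angle_deg, point_interval, row_height):
    """Decompose within-row and between-row spacing into x/y offsets.

    row_angle_deg is the angle between the row vector and the x axis
    (the line of latitude). Returns (h_dx, h_dy, v_dx, v_dy).
    """
    theta = math.radians(row_angle_deg)
    # Step to the next point in the same row (parallel to the row vector).
    h_dx = point_interval * math.cos(theta)
    h_dy = point_interval * math.sin(theta)
    # Step to the next row (perpendicular to the row vector).
    # Assumption: the perpendicular is the row vector rotated by +90 degrees.
    v_dx = -row_height * math.sin(theta)
    v_dy = row_height * math.cos(theta)
    return h_dx, h_dy, v_dx, v_dy


# Spacing values from the Figure 1 example (5 ft point interval, 6 ft row
# height); the 30 degree angle is hypothetical.
h_dx, h_dy, v_dx, v_dy = spacing_components(30.0, 5.0, 6.0)
```

Whatever the angle, the two resulting step vectors stay perpendicular and keep lengths of 5 ft and 6 ft, which is what allows them to be reused to place every adjacent point.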

Sample data
Field scouting data were collected from Vitis vinifera L. 'Chardonnay' plantings in the Cornell Pathology Vineyards at Crittenden Research Farm in Geneva, NY, on 3 Aug. 2021. The plant attributes collected were disease incidence and severity of grapevine powdery mildew (Erysiphe necator), grapevine downy mildew (Plasmopara viticola), and miscellaneous plant damage. Incidence was recorded as the percentage of leaves with any disease, and severity was recorded as the percentage of leaf area with visible disease. The row number, panel number, and vine number within the panel were also recorded for each data point.
The satellite terrain raster used in the figures showing the QGIS workflow was captured by a Planet Labs SkySat Generation C satellite at 42˚52′43″ N, 77˚0′57″ W on 2 Aug. 2021. SkySat is a commercial constellation of 21 spectroscopic satellites. SkySat multispectral imagery captures four regions of the electromagnetic spectrum (blue, green, red, and near-infrared), and a panchromatic image is also provided. The spatial resolution of these products is 75 cm for the multispectral measurements and 50 cm for the panchromatic image.

QScout's use
Conceptually, QScout was developed to replace the time-consuming manual processes outlined in Figure 1. QScout plugins can be accessed from the processing toolbox (Ctrl + Alt + T) of QGIS. The Drop Pins and Locate Pins plugins require the user first to draw a bounding polygon and a row vector. The bounding polygon is a single-feature geometry layer, where the polygon's sides form the boundary area of the region under analysis. The row vector provides a starting location from which the program will generate points at regular intervals and sets the orientation of the rows of points. In the case of Drop Pins, the user can also provide a spreadsheet file in .csv or .xlsx format, as shown in Table 1, to limit where the pins are dropped and to provide data to attach to the dropped pins.
Neither the Value Grabber algorithm nor the Grid Aggregator algorithm requires a bounding polygon or a row vector. Value Grabber requires a point layer and a raster file to run. The Grid Aggregator algorithm requires only a layer of point geometries. The grid boundaries produced by the algorithm match the bounds of the points layer, unless the user specifies bounds in the algorithm's optional parameters.

QScout's algorithms
The Drop Pins and Locate Pins algorithms function very similarly. The mechanism by which they identify locations for pins is identical (see Figure 1), but they differ in how the pins' information is used. After the steps shown in Figure 1, the program numbers the row and position values of the points, originating at the starting corner specified by the user.
Drop Pins: After the pins' locations have been identified, the Drop Pins algorithm numbers the row and position of each pin from the starting corner specified by the user. If the user has not provided a data file or has checked the "Drop Data-Less Points" box, the program will create QGIS features for all the points calculated by the method shown in Figure 1; otherwise, the program will only create QGIS features for points with data.
Locate Pins: After the points' locations have been identified, the program will take each point in the input layer and calculate which point location is nearest to that input point. The program will then create a copy of the input point in a new geometry layer and assign it attributes for the row number and location within the row.
Grid Aggregator: The program first creates a grid using the parameters of grid cell width, grid cell height, and (optionally) grid extent provided by the user (if the grid extent is not provided, the extent of the points layer will be used). The program then creates a list of the points contained within each grid cell. The program then goes through each attribute of the

RESULTS AND DISCUSSION
The QScout algorithms described here for implementation in QGIS are focused on processing and analyzing field-scouting data with aerial or satellite remote sensing imagery. The Drop Pins algorithm creates a grid of point geometries, with optional attributes taken from a spreadsheet file (The GIMP Development Team, 2019). The Locate Pins algorithm takes a set of points, duplicates them, and assigns them row and column index attributes within a local user-defined coordinate system. The Value Grabber algorithm takes a set of points and a raster and assigns each point the attributes corresponding to the raster band values at that point. The Grid Aggregator algorithm takes a set of points and overlays a grid on top of them, assigning each grid cell a set of attributes that are the aggregates (e.g., mean, median, sum, SD) of the points within that cell. The Drop, Grab, and Aggregate algorithm applies Drop Pins, followed by Value Grabber, followed by Grid Aggregator. Similarly, the Locate, Grab, and Aggregate algorithm applies Locate Pins, followed by Value Grabber, followed by Grid Aggregator. With these algorithms, QScout allows users to efficiently process high volumes of non-geotagged scouting data and imagery in a format that can be easily incorporated into analyses, modeling, and downstream decision-making.

The Drop Pins algorithm
The Drop Pins workflow begins with a data spreadsheet file in .csv or .xlsx format (as illustrated in Table 1) and an approximate understanding of the geography of the plot of land where the data were collected. The user begins by drawing a bounding polygon around the plot of land in QGIS to indicate where pins should be dropped. Within the bounding polygon, the user then draws a row vector, indicating the ordinal direction of the rows, and a reference point for the location of plants.
The user then opens the Drop Pins graphical user interface from the QGIS processing plugins panel and sets the Bounding Polygon, Row Vector, and Input Data to the appropriate values. The user also specifies the row spacing and the interval between plants within a row with the Row Spacing and Point Interval parameters. If the user does not provide the algorithm with a data file, the algorithm will drop a pin at every point in the field where it estimates a plant to be and will assign row and plant number attributes to those pins (see Figure 2).

FIGURE 4 The mechanism of the Value Grabber algorithm. The circles at the top of the figure represent pin geometries, and each grid represents a raster band. The pin geometries are copied and the values of the raster pixels occupying the same location as each pin are set as the attributes of the pins, as shown at the top of the figure.

Once the data have been transferred from the spreadsheet to the point layer, the user can easily perform any number of geographic analyses on them. For the purposes of this article, the focus has been placed on an analysis comparing scouting data with satellite raster datasets. Using Drop Pins and a datasheet, QScout can drop pins on a field at a close approximation of the individual plants' locations, allowing the user to georeference plants and the corresponding scouting data easily. When used without a datasheet, the Drop Pins functionality can be used to collect a random sample of points for downstream analyses or to identify all individual plants within a field, or as a foundation for using other tools such as Grid Aggregator to obtain individual plant raster values.
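The point-placement mechanism that Drop Pins relies on (Figure 1) can be sketched in plain Python as a breadth-first fill over lattice positions. This is an illustrative approximation only, not the QScout implementation: the names `drop_pins` and `point_in_polygon`, the ray-casting containment test, and the rule of numbering rows and positions from the lowest lattice indices (standing in for the user-selected starting corner) are assumptions.

```python
import math
from collections import deque


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def drop_pins(origin, row_angle_deg, point_interval, row_spacing, polygon):
    """Return {(row, position): (x, y)} for lattice points inside polygon."""
    theta = math.radians(row_angle_deg)
    # Step along a row and step between rows (assumed +90 degree rotation).
    h = (point_interval * math.cos(theta), point_interval * math.sin(theta))
    v = (-row_spacing * math.sin(theta), row_spacing * math.cos(theta))

    def location(i, j):
        return (origin[0] + i * h[0] + j * v[0],
                origin[1] + i * h[1] + j * v[1])

    if not point_in_polygon(origin[0], origin[1], polygon):
        return {}
    pins = {(0, 0): origin}
    queue = deque([(0, 0)])
    while queue:
        i, j = queue.popleft()
        # Try the four neighbours of an existing point, skipping points
        # already placed and points outside the bounding polygon.
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            key = (i + di, j + dj)
            if key in pins:
                continue
            x, y = location(*key)
            if point_in_polygon(x, y, polygon):
                pins[key] = (x, y)
                queue.append(key)
    # Assumption: number rows and positions from 1, starting at the corner
    # with the lowest lattice indices.
    min_i = min(i for i, _ in pins)
    min_j = min(j for _, j in pins)
    return {(j - min_j + 1, i - min_i + 1): xy for (i, j), xy in pins.items()}


field = [(0.0, 0.0), (11.0, 0.0), (11.0, 13.0), (0.0, 13.0)]
pins = drop_pins((0.5, 0.5), 0.0, 5.0, 6.0, field)
```

The fill stops on its own once no unvisited neighbour lies inside the polygon. One consequence of this design is that a location is only reached if it is connected to the origin through other in-polygon locations.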

The Locate Pins algorithm
The Locate Pins algorithm estimates the location of each pin in a field in terms of the row number and plant number within a row, or some other specified designator (such as "panel" for trellised crops). It takes the same set of inputs as the Drop Pins algorithm, but with a pin geometry layer instead of a data spreadsheet, and it produces a duplicate of the pin geometry layer provided, with plant and row number attributes added to each pin (see Figure 3). Thus, when the user starts with previously geotagged or dropped pins in a field, the Locate Pins function can identify plant locations in a manner that facilitates easy communication with field workers. This allows the user to see how the scouted data correspond to satellite-derived plant health assessments and to use and/or return that information to individuals on the ground for making decisions about crop management.
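The nearest-location assignment performed by Locate Pins can be sketched as follows. This is a simplified illustration, not the plugin's code: it assumes the candidate plant locations are already available as a dictionary keyed by (row, position), uses a brute-force nearest-neighbour search, and the function name `locate_pins` and the sample coordinates are hypothetical.

```python
def locate_pins(input_points, pin_locations):
    """Copy each input point and attach the row/position of the nearest pin.

    input_points: iterable of (x, y) tuples.
    pin_locations: dict mapping (row, position) -> (x, y).
    """
    located = []
    for x, y in input_points:
        # Brute-force search for the candidate location closest to the point.
        (row, position), _ = min(
            pin_locations.items(),
            key=lambda item: (item[1][0] - x) ** 2 + (item[1][1] - y) ** 2,
        )
        located.append({"x": x, "y": y, "row": row, "position": position})
    return located


# Hypothetical 3 x 3 layout: 5 units between plants, 6 units between rows.
candidates = {
    (r, p): (0.5 + 5.0 * (p - 1), 0.5 + 6.0 * (r - 1))
    for r in range(1, 4)
    for p in range(1, 4)
}
result = locate_pins([(5.2, 6.9), (0.0, 0.0)], candidates)
```

The input coordinates are kept unchanged in the copies; only the row and position attributes are added, which mirrors the duplicate-and-annotate behaviour described above.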

The Value Grabber algorithm
The Value Grabber algorithm transfers values from a raster layer to a pin geometry layer (see Figure 4). The user provides the algorithm with a pin geometry layer and a raster layer, and the algorithm produces a duplicate of the pin geometry layer with attributes added from the raster file. The raster value attributes for each pin are set to the value of the raster pixel at the location of the pin. The algorithm also has more advanced optional parameters that allow the user to grab values from a circular area around the pin, with customizable options for how multiple pixel values should be weighted and combined. The user also has the option to specify their own grab function by passing the algorithm a Python script. Value Grabber is similar to the Sample raster values plugin that is packaged with QGIS (QGIS Development Team, 2021), but it can be used on a raster that is not a QGIS layer, allowing for the analysis of large rasters that may cause issues for QGIS on less powerful machines. In addition, Value Grabber allows for greater user specificity in the method of sampling through the optional Grab Radius and Grab Function parameters. The functionality of Value Grabber allows the user to easily export data files for downstream analysis, create summary statistics, and share plant health data with stakeholders.
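A minimal sketch of this sampling step is given below. It is not the Value Grabber implementation: it assumes a north-up raster described by a GDAL-style six-element geotransform, stores each band as a nested list, and, when a radius is given, combines the pixels whose centres fall inside the circle with an unweighted mean unless another function is supplied. The names `grab_values`, `world_to_pixel`, and the `band_N` attribute keys are hypothetical.

```python
import math


def world_to_pixel(x, y, gt):
    """Map map coordinates to (row, col) for a north-up geotransform."""
    col = math.floor((x - gt[0]) / gt[1])
    row = math.floor((y - gt[3]) / gt[5])
    return row, col


def grab_values(points, bands, gt, radius=0.0, combine=None):
    """Attach one value per raster band to a copy of each point."""
    combine = combine or (lambda values: sum(values) / len(values))
    n_rows, n_cols = len(bands[0]), len(bands[0][0])
    results = []
    for x, y in points:
        row, col = world_to_pixel(x, y, gt)
        if radius > 0:
            reach_c = math.ceil(radius / abs(gt[1]))
            reach_r = math.ceil(radius / abs(gt[5]))
            # Keep pixels whose centre lies within the grab radius.
            cells = [
                (r, c)
                for r in range(row - reach_r, row + reach_r + 1)
                for c in range(col - reach_c, col + reach_c + 1)
                if math.hypot(gt[0] + (c + 0.5) * gt[1] - x,
                              gt[3] + (r + 0.5) * gt[5] - y) <= radius
            ]
        else:
            cells = [(row, col)]
        # Discard pixels that fall outside the raster.
        cells = [(r, c) for r, c in cells
                 if 0 <= r < n_rows and 0 <= c < n_cols]
        attrs = {"x": x, "y": y}
        for b, band in enumerate(bands, start=1):
            values = [band[r][c] for r, c in cells]
            attrs[f"band_{b}"] = combine(values) if values else None
        results.append(attrs)
    return results


# Hypothetical 3 x 3 single-band raster with 1-unit pixels.
gt = (100.0, 1.0, 0.0, 200.0, 0.0, -1.0)
band = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
single = grab_values([(101.5, 198.5)], [band], gt)
area = grab_values([(101.5, 198.5)], [band], gt, radius=1.0, combine=max)
```

Passing a different `combine` callable (here `max`) plays the role that the text assigns to a user-supplied grab function.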

The Grid Aggregator algorithm
The Grid Aggregator algorithm functions as the near reverse of the Value Grabber algorithm (see Figure 5). Whereas the Value Grabber algorithm transfers data from a raster grid to points, the Grid Aggregator algorithm transfers data from points to a grid. The user provides a points layer and width and height values for the grid cells and specifies which attributes of the points' features to aggregate. The algorithm produces a vector layer of consecutive rectangular polygon geometries forming a grid aligned north-south and east-west. Each rectangle, which represents a grid cell, is assigned attributes aggregated from all the pins within the grid cell. The user is able to decide whether to set the values to the mean, median, minimum, maximum, sum, SD, or weighted mean of the pin values. The user also has the option to specify their own aggregation function by passing the algorithm a Python script.
Varying the size of the grid allows the user to accommodate datasets at different resolutions and to determine which resolution is best suited for constructing a model for their variables of choice. In addition to the QGIS polygon geometry layer, the user can enter the name of an Excel file, to which Grid Aggregator will write the data. This function makes it easy to aggregate ground-scouted information (e.g., disease severity or incidence) to the same spatial scale as the imagery (see Figure 6).
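The binning and aggregation described for Grid Aggregator can be sketched in a few lines of Python. This is an illustrative simplification rather than the plugin's code: points are plain dictionaries, only non-empty cells are returned, weighted means are omitted, and the names `aggregate_to_grid` and `AGGREGATES` as well as the sample severity values are hypothetical.

```python
import statistics
from collections import defaultdict

AGGREGATES = {
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
    "sum": sum,
    # Sample SD is undefined for a single value, so report 0.0 in that case.
    "sd": lambda v: statistics.stdev(v) if len(v) > 1 else 0.0,
}


def aggregate_to_grid(points, cell_width, cell_height, fields,
                      how="mean", extent=None):
    """Bin points into rectangular cells and aggregate selected fields.

    points: list of dicts with 'x', 'y', and the attributes named in fields.
    how: key of AGGREGATES or any callable taking a list of values.
    extent: optional (x_min, y_min, x_max, y_max); defaults to the bounds
    of the points.
    """
    func = AGGREGATES[how] if isinstance(how, str) else how
    if extent is None:
        xs = [p["x"] for p in points]
        ys = [p["y"] for p in points]
        extent = (min(xs), min(ys), max(xs), max(ys))
    x_min, y_min, x_max, y_max = extent
    cells = defaultdict(list)
    for p in points:
        if not (x_min <= p["x"] <= x_max and y_min <= p["y"] <= y_max):
            continue
        col = int((p["x"] - x_min) // cell_width)
        row = int((p["y"] - y_min) // cell_height)
        cells[(row, col)].append(p)
    grid = []
    for (row, col), members in sorted(cells.items()):
        x0 = x_min + col * cell_width
        y0 = y_min + row * cell_height
        cell = {"row": row, "col": col, "count": len(members),
                "bounds": (x0, y0, x0 + cell_width, y0 + cell_height)}
        for field in fields:
            cell[field] = func([m[field] for m in members])
        grid.append(cell)
    return grid


scouted = [
    {"x": 0.0, "y": 0.0, "severity": 10},
    {"x": 1.0, "y": 1.0, "severity": 20},
    {"x": 6.0, "y": 1.0, "severity": 30},
    {"x": 7.0, "y": 8.0, "severity": 50},
]
grid = aggregate_to_grid(scouted, 5.0, 5.0, ["severity"], how="mean")
```

Rerunning the call with a different cell size, or with a callable such as `max` in place of `"mean"`, shows how changing the grid resolution or the aggregation rule changes the cell values.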

Combination algorithms
The combination algorithm Drop, Grab, and Aggregate allows the user to quickly georeference scouting data, combine them with raster data, and aggregate the points into a grid for easy analysis. Similarly, the combination algorithm Locate, Grab, and Aggregate makes it easy for the user to identify areas of poor plant health for targeted scouting by field workers. In both cases, the combined tools take the same parameters and behave in the same way as the individual algorithms, but the steps are contained in a single interface.

CONCLUSION
With the growing interest in agricultural remote sensing, it is more important now than ever to provide users with tools to link these data with on-the-ground validation in order to provide a full analysis of agricultural systems on multiple scales and to look for biologically relevant patterns. QScout provides a tool suite that is appropriate for GIS users to georeference scouting data, determine the location of georeferenced data relative to a larger layout, combine scouting data with remote sensing data, and form geometric vector data into formats that are easier to process. As remote sensing datasets, particularly high-resolution multispectral and hyperspectral datasets, continue to proliferate in the coming years, tools such as QScout and other QGIS plugins will provide an invaluable organizational aid for plant pathologists, plant phenomicists, and others in the agricultural sciences.

ACKNOWLEDGMENTS
Funding for the QScout development was provided by USDA-ARS Grape Genetics Research Unit CRIS project 8060-21220-007-00D. Support for Fernando E. Romero Galvan comes from NASA JPL Strategic University Research Program and NASA project #80NSSC21K1605. We thank Graham Trolley, Nikita Gambhir, and Shivranjani Baruah for their work as early beta-testers of the program, David Combs