ARD‐VO: Agricultural robot data set of vineyards and olive groves

The availability of real‐world data in agricultural applications is of paramount importance to develop robust and effective robotic‐based solutions for farming operations. In this application context, however, very few data sets are available to the community and for some important crops, such as grapes and olives, they are almost absent. Therefore, the aim of this paper is to introduce and release ARD‐VO, a data set for agricultural robotics applications focused on vineyards and olive cultivations. Its main purpose is to provide the researchers with a real‐world extensive set of data to support the development of solutions and algorithms for precision farming technologies in the aforementioned crops. ARD‐VO has been collected with an unmanned ground vehicle (UGV) equipped with different heterogeneous sensors that capture information essential for robot localization and plant monitoring tasks. It is composed of sequences gathered in 11 experimental sessions between August and October 2021, navigating the UGV for several kilometers in four cultivation fields in Umbria, a central region of Italy. In addition, to highlight the utility of ARD‐VO, two application case studies are presented. In the first one, the data set is used to compare the performance of simultaneous localization and mapping and odometry estimation methods using vision systems, light detection and ranging, and inertial measurement unit sensors. The second one shows how the multispectral images included in ARD‐VO can be used to compute Normalized Difference Vegetation Index maps, which are crucial to monitor the crops and build prescription maps.

Vehicles can now be equipped with high-throughput embedded computers (also with graphics processing units) that open further possibilities for automatic inspection and, more generally, for smart agriculture (Basri et al., 2021). To fill the gap with human capabilities, robots must use their perception systems to achieve two main goals: navigate through the operative scenario and collect data from it.
To develop the navigation and the perception modules of robotic platforms, the availability of data sets collected in agricultural contexts is fundamental. The advent of Deep Learning techniques has, indeed, brought impressive benefits and unlocked applications that were previously unthinkable. However, this comes at the cost of intensive training phases, whose effectiveness heavily depends on the availability of domain-specific data sets. In addition, the design of non-data-driven (i.e., model-based) approaches and, most importantly, their validation require data that are representative of the scenario under consideration.
Despite their importance, very few data sets are, in general, available for precision farming (in contrast to automotive scenarios) and almost none have been collected for some specific crops. This is the case for vineyards and olive cultivations, despite their great importance within the overall agricultural field: the 2020/2021 season saw the production of 3,029,000 tons of olives (Commission, 2021), while 14,811,725 tons of grapes were exported from Europe (Eurostat, 2020). In addition, derived products such as olive oil and wine represent a significant economic factor in the agricultural chain (Palliotti et al., 2014; Poni et al., 2018). Smart monitoring and robotic support represent a promising technological approach to preserve these crops. Recent works show that intelligent systems can be easily incorporated into farming processes, with promising results for both vineyards (Fernández-Novales et al., 2021; Majeed et al., 2021; Roure et al., 2017) and olive crops (Babanatsas et al., 2018; Rey et al., 2019).
Motivated by these considerations, the purpose of this work is to collect and make available to the community a data set, namely, ARD-VO, specific to vineyard and olive crops, to improve the development of robust and effective robotic systems for navigation and monitoring applications. Data acquisition is performed by using a real robotic platform and is designed to embed the information and the challenges associated with agricultural environments. The robot sensor setup guarantees multiple information sources essential for different research areas: simultaneous localization and mapping (SLAM) algorithms, prognostic analysis of the robot state, and specific agricultural studies, such as Normalized Difference Vegetation Index (NDVI) analysis (Carlson & Ripley, 1997; Carter, 1994; Matese & Di Gennaro, 2021). ARD-VO is composed of more than 1 TB of data collected in approximately 10 h of navigation sessions traveling for several kilometers. Data come from eight heterogeneous sensors and were acquired in four different cultivation fields.
The paper is organized as follows. After recent and relevant works surveyed in Section 2, the robotic platform Agrobot and its equipment are described in Section 3. The robot is used to record raw data in different crops as described in Section 4. The information is then processed, organized, and packed as described in Section 5. Two application studies that show the utility of ARD-VO are presented and discussed in Section 6. In Section 7, we summarize the conclusions and the future work.

| RELATED WORK
Although different publicly available localization and mapping data sets exist for indoor or urban outdoor scenarios, data sets for autonomous robotic tasks in agricultural environments are less common. Those that are available are mainly focused on monitoring and precision agriculture (PA) tasks (Y. Lu & Young, 2020), such as crop/weed discrimination, plant phenotyping, leaf detection, canopy volume calculation (Potena et al., 2020), fruit counting (Bellocchio et al., 2020), and yield estimation (Bellocchio et al., 2022).

| Agricultural plant monitoring data sets
In the last few years, most of the agriculture-related data sets that have emerged are focused on plant monitoring. They typically consist of images collected with RGB cameras and multispectral sensors. The most common applications of those data sets are weed control, pests monitoring (Lippi et al., 2021), and fruit detection.
Weed control data sets are gathered using unmanned ground (Haug & Ostermann, 2014) or aerial vehicles (Sa et al., 2018), handheld sensors (Lameski et al., 2017), or ground-fixed platforms (Leminen Madsen et al., 2020). These data sets are typically used to train semantic segmentation models when the reported annotations are pixel-level (Haug & Ostermann, 2014), or object-detection strategies when annotations consist of bounding boxes (Leminen Madsen et al., 2020), which allows one to distinguish between cultivated crop areas and weeds.
Fruit detection data sets are used for training object-detection or semantic/instance segmentation models capable of detecting and counting fruits within the images. In particular, they often consist of images depicting plant canopies with fruits, collected using either remotely operated platforms (Bargoti & Underwood, 2017; Bellocchio et al., 2019) or handheld sensors (Häni et al., 2020). Each image is annotated using bounding boxes or pixel-level ground truth reporting the positions of the fruits and the regions of interest.
Less common applications of agriculture-related data sets include tree pruning (Akbar et al., 2016), flower detection (Dias et al., 2018; Kusumam et al., 2016), and the detection of diseases or damage on plants (Alencastre-Miranda et al., 2018). Y. Lu and Young (2020) provide a broader survey of such data sets.

| Urban scenarios data sets
Several common data sets for robot localization involve indoor (e.g., buildings) and outdoor (e.g., city streets) environments. The majority of indoor data sets are collected in industrial buildings (Burri et al., 2016) or offices (Klenk et al., 2021) using handheld sensor kits, sensors mounted on a backpack (Wen et al., 2019), or unmanned robotic platforms (Shi et al., 2020). Outdoor urban data sets are usually collected using a sensorized car (Blanco-Claraco et al., 2014; Choi et al., 2018; Geiger et al., 2013; Kim et al., 2020) or a wheeled robot (Smith et al., 2009). In the last few years, there has also been growing interest in data sets built using virtual urban environments (Cabon et al., 2020; Gaidon et al., 2016). The main advantage of virtual environments is the possibility of collecting new data sequences without the cost and the risk of experiments in the real world.

Aguiar et al. (2020) provide a comprehensive survey of robot applications in rural environments. The study examines the current state-of-the-art localization and mapping techniques for ground robots in agricultural and forest settings, including SLAM and Visual Odometry (VO) approaches. It highlights the challenges and the importance of an adequate system of sensors (proprioceptive and exteroceptive) when the surrounding environment presents harsh conditions: absence of structured objects, repetitive scenes, reflections, rough terrains, and vegetation appearance that changes according to the season and the weather conditions. These represent significant design constraints (hardware and software).
In addition, the survey reviews different data sets that can be used to test and study the performance of SLAM and VO algorithms in outdoor environments. Among those, the data sets that we believe are most related to ours are those proposed by Pire et al. (2019) and Chebrolu et al. (2017). The Rosario Data set (Pire et al., 2019) is gathered within soybean fields using an unmanned ground vehicle (UGV) designed for weed removal. The data set is composed of six sequences and contains synced measures from wheel odometry, an inertial measurement unit (IMU), a stereo camera, and a global positioning system real-time kinematics (GPS-RTK) system, recorded while driving the platform within the field. Similarly, the BoniRob (Chebrolu et al., 2017) data set is acquired in sugar beet fields using a multipurpose UGV developed by Bosch DeepField Robotics and aims at crop/weed distinction. The data were recorded three times per week over 3 months in the spring of 2016. Each sequence contains data from different sensors: Leica and Ublox GPS with RTK, a Velodyne light detection and ranging (LIDAR) sensor, Kinect and JAI cameras, and an FX8 camera. While the sequences of the abovementioned data sets are collected within herbaceous crops, our proposed data set is gathered in two substantially different types of arboreal cultivation.

| Rural scenarios data sets
Other relevant data sets in rural scenarios for localization and obstacle detection purposes are collected without a robotic platform, such as in Kragh et al. (2017) and Ali et al. (2020). In Kragh et al. (2017) the authors present a data set for obstacle detection in agricultural scenarios collected with a tractor-mounted sensor system in a grass-mowing scenario in Denmark. The acquired sequences are composed of 2 h of data gathered by a stereo camera, thermal camera, web camera, 360° camera, LIDAR, radar, IMU, and Global Navigation Satellite System (GNSS). All the static and moving obstacles are labeled with object class information and geographic coordinates. In Ali et al. (2020) the authors propose a data set for visual SLAM evaluation in a forest landscape, collected using a vehicle-mounted sensor rig comprising four RGB cameras, an IMU, and a GNSS receiver. The data set is composed of 12 sequences collected while driving a vehicle on a road crossing a forest in different seasons and times of the day.
Although these data sets undoubtedly provide a useful base for designing and comparing localization and mapping strategies, they do not cover the application scenario of a robot working within the field.
Drawing inspiration from Aguiar et al. (2020) and differently from previous works, we collected a data set with a robotic platform in vineyards and olive cultivations to provide researchers with a benchmark that favors the analysis of the challenges associated with these rural contexts, particularly with respect to the development of effective navigation algorithms and the analysis of agriculture-specific information (e.g., multispectral images).

| Contributions
All the aforementioned data sets are gathered in indoor buildings, urban scenarios, or rural environments characterized by herbaceous crops. To the best of our knowledge, our proposed data set is the first one gathered in unstructured outdoor scenarios related to olive cultivations and vineyards. The main contributions of this work are summarized as follows:
• We propose a data set of rural outdoor unstructured environments gathered in two different arboreal crop scenarios. Specifically, ARD-VO consists of sequences collected in different vineyards and olive cultivations. The sequences include data collected with a stereo-camera rig, a Velodyne LIDAR, a GPS-RTK module with IMU, and a multispectral camera. In addition, inverter states are collected, providing information about the operation of the robot motors. A detailed description of the data acquisition campaign and of the data set structure is provided in the following sections.
• We provide a detailed technical description of the UGV, which was specifically designed for autonomous plant monitoring in unstructured outdoor cultivations. The UGV description includes a list of all the sensors installed on the platform, the details of the robotic structure, and the specifications of the electric devices.
• The experimental section illustrates two use cases of the proposed data set, that is, a benchmark comparison between state-of-the-art LIDAR-based and vision-based localization approaches, and the computation of geo-referenced NDVI maps.

| THE ROBOTIC PLATFORM
In agriculture, conventional farming machines (i.e., tractors and harvesters) and robotic platforms share the same working conditions: a challenging and heterogeneous environment. Vineyards and olive groves are often located in terrains that are generally not difficult to reach (the harvesting process must be guaranteed) but that present unpredictable situations, such as uneven surfaces with boulders, steep slopes, and positive/negative obstacles (such as bumps and holes). In addition to these unreliable surfaces, the weather can play a crucial role in maintaining grip and traction (e.g., mud or other slippery conditions after the rain). These aspects were considered when designing the robotic platform, named Agrobot, which we used to gather the data set. In the following, the platform is described and the details about its structure, sensors, and onboard computer units are provided.

| Robot structure
The upper part of the robot was designed to facilitate the deployment of the devices. Figure 1 shows a simplified scheme of the Agrobot platform and the corresponding main dimensions are provided in Table 1.
Specifically, three flattened surfaces yield flexibility in positioning external sensors, while the trapezoidal profile guarantees a considerable internal space for fitting the onboard electronics. The lower part of the chassis houses the battery compartment and the propulsion motors. Agrobot has been designed as a four-wheel Skid-Steering Mobile Robot, a kinematic configuration commonly employed for ATVs (Campion et al., 1996; Kozłowski & Pazderski, 2004). The main advantages of this configuration are the simplicity and robustness of the mechanical system (i.e., no additional steering system) and the high maneuverability. When compared with the Ackermann steering model (Carpio et al., 2020; King-Hele, 2002), skid-steering kinematics ideally offers zero-radius turning. Although wheeled, Agrobot operates like a tracked vehicle: the difference between the equivalent speeds of the left and right sides translates into a rotational motion of the body. Each wheel is connected through a kinematic chain with an 80:1 transmission ratio to a 2-kW three-phase brushless direct current motor that can reach 4300 RPM and 4.6 N·m of torque. Two dual-channel inverters ensure that the wheels on each side of the robot have the same angular speed (otherwise, skid steering would not be possible).
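Under this kinematic model, the body velocities follow directly from the left and right side speeds. The sketch below illustrates the idea in Python; the gear ratio and motor RPM limit are those stated above, while the wheel radius and track width are illustrative placeholders, not values from Table 1.

```python
import math

GEAR_RATIO = 80.0        # motor-to-wheel reduction stated for Agrobot
WHEEL_RADIUS_M = 0.28    # illustrative value, not from the paper
TRACK_WIDTH_M = 1.0      # illustrative lateral wheel separation

def wheel_speed_from_motor_rpm(motor_rpm: float) -> float:
    """Linear speed (m/s) of a wheel given the motor shaft RPM."""
    wheel_rpm = motor_rpm / GEAR_RATIO
    return wheel_rpm * 2.0 * math.pi * WHEEL_RADIUS_M / 60.0

def body_twist(v_left: float, v_right: float) -> tuple:
    """Idealized skid-steering kinematics: equal side speeds give pure
    translation, opposite speeds give zero-radius rotation."""
    v = (v_right + v_left) / 2.0                 # forward velocity (m/s)
    omega = (v_right - v_left) / TRACK_WIDTH_M   # yaw rate (rad/s)
    return v, omega
```

Under these assumed dimensions, the 4300 RPM motor limit corresponds to a wheel speed of roughly 1.6 m/s, comfortably above the 1 m/s average driving speed used during the acquisition campaign.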
The ground clearance (g*) is evaluated on a flat surface and may change according to the surface conditions. The overall weight (**) of the robot refers to the normal working condition (including all the devices, sensors, and items required for the operations).

| Sensors equipment
In Figure 2, an overview of the Agrobot experimental setup is shown.
The external sensors that are fundamental for navigation tasks (i.e., cameras, GPS, and IMU) have been fixed to the robot body employing polycarbonate sheets and antivibration mounts to damp high-frequency noise. This arrangement guarantees minimal visual occlusion and the best Field-of-View (FoV) with respect to the robot's bulk.
The drivetrain and processing units have been installed inside the robot following the same antivibration strategy to extend the lifespan of the electronics. Thanks to the full-electric propulsion and a dedicated communication bus (CANbus), it is possible to acquire all motor signals directly from the inverters. In the following, a brief description of the sensor devices is provided.

Tyres: Genial Tyre Agri Line R 6.5/80-13.

• Multispectral camera: the Micasense RedEdge-MX captures the blue 475 (32), green 560 (27), red 668 (14), red edge 717 (12), and near IR 842 (57) nm bands, with a fixed resolution of 1280 × 960 pixels at 1 Hz. The main benefit of this device is a narrow bandwidth for each set of images: instead of using one wide-band imager and a set of blocking filters, it embeds five optimized imagers, each with its own sensor and filter. The camera module also has a stand-alone external GPS to geotag the images.
• GPS-RTK and IMU: the Swiftnav Duro Inertial supports the Satellite-based Augmentation Systems, allowing for positioning measurements with a refresh rate of 10 Hz and horizontal and vertical accuracies of 1 cm + 1 ppm and 1.5 cm + 1 ppm, respectively. It also has an integrated 6-DoF IMU module (Bosch BMI160) with a selectable data rate between 10 and 200 Hz. The CL55 unit is an experimental, under-development device that has been configured to work with the Galileo constellation at the same refresh rate as the Duro device (10 Hz).
• Camera rigs: the robot is equipped with two stereo-camera rigs. The front of the robot houses a Bosch Rexroth profile with two Blackfly S BFS-PGE-04S2C-C cameras equipped with 4-mm FIFO lenses. The cameras can capture images at a maximum rate of 350 frames per second (FPS) with a resolution of 720 × 540 pixels (0.4 megapixels). The stereo rig has a baseline of 0.62 m.
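Given the 0.62 m baseline of the frontal rig, the depth of a matched point can be recovered from its disparity through the standard pinhole stereo relation Z = f·B/d. The snippet below is a minimal sketch; the focal length in pixels is an assumed illustrative value, not the calibrated one shipped with the data set.

```python
def depth_from_disparity(disparity_px: float,
                         baseline_m: float = 0.62,
                         focal_px: float = 700.0) -> float:
    """Depth (m) of a point from its stereo disparity (px).
    baseline_m matches the rig described above; focal_px is an
    illustrative focal length in pixels, not a calibrated value."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

With these numbers, a disparity of about 43 px corresponds to a point roughly 10 m away, which suggests the rig resolves the typical inter-row distances found in the surveyed fields.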

| Onboard computer and embedded control unit
The block diagram of the Agrobot architecture is depicted in Figure 3. In particular, motor inverters, Human Machine Interfaces, and teleoperation modules (RC and the BlueTooth transceivers) are connected to the B&R X90 control unit. The X90 acts as a level 1 Supervisory Control and Data Acquisition PLC controller (Boyer, 2009) that handles the connection of the physical devices (i.e., with General Purposes Input Output ports and the CANopen buses) and offers functionalities to the higher-level computer. The X90 is programmed to communicate with the supervisory computer using the Ethernet/IP protocol (Brooks, 2001): transmission control protocol and UDP packets carry data and high-level control signals using the OPC-UA standard (Leitner & Mahnke, 2006).

| DATA SET ACQUISITION
The Agrobot platform was employed to acquire data in vineyards and olive crops located in Umbria, a region of central Italy. The alias names, varieties, and locations of the crops within the data set are reported in Table 2. The two vineyards shown in Figure 4a,b and two olive crops shown in Figure 4c,d have been selected for the recording campaign.
The vineyard crops are 24 km from each other and share the same Grechetto Todi G5 cultivar (Esti et al., 2010), a local white wine grapevine variety. Inside the vineyards, the objective was to create NDVI maps using geo-tagged multispectral images: yield estimation (Barriguinha et al., 2021), water stress monitoring (Frioni et al., 2021; Kazmierski et al., 2011; Sanchez et al., 2017), and pest monitoring (Hassan et al., 2019) are examples of how these maps can be used in agriculture. The two olive groves are very close to each other (less than a kilometer apart) and are located in the agricultural site of Deruta, a small town in Umbria. Owing to the genetic biodiversity of the geographical area, there are three olive varieties in these crops: "Moraiolo," "Leccino," and "Frantoiano" (Cicatelli et al., 2013). The first olive grove (alias OlvCs A) has only the "Moraiolo" variety (monocultivar crop), while the second has all three varieties (multicultivar).
Olive crops suffer from the infestation caused by the olive fly pest Bactrocera oleae (Malheiro et al., 2015; Marchi et al., 2016; Marchini et al., 2017; Nardi et al., 2005; Petacchi et al., 2015), and continuous monitoring can support prevention and pest control.
F I G U R E 3 Block diagram of the systems: the on-board computer is connected via Ethernet protocol to the external sensors and the X90 control unit, acting as an interface for the robot's built-in devices. CAN, controller area network; ETH, Ethernet; GPIO, General Purposes Input Output; GPS, global positioning system; LIDAR, light detection and ranging; PC, personal computer; RC, remote control; ROS, robot operating system.
T A B L E 2 Fields employed for the data set collection.

As reported in Table 2, the data acquisition campaign was performed between the summer and the autumn of 2021, from August to October. The robot was driven in manual mode at an average speed of 1 m/s to guarantee the reliability of the recorded data for both agricultural and navigation analyses. In vineyards, the average traveled distance for each session was about 8000 m, while in olive groves the trajectory length spans from 400 to 1000 m. In Table 4, we report information on the production characteristics and the indices of the vegetative-productive balance of the vineyards. These values are obtained by performing manual inspections in different zones of the crops. In particular, we provide these values grouped with respect to the NDVI index computed at each location (see Section 6.2 for a description of the computation of the NDVI maps).
The collected sequences that have been included in the ARD-VO data set come from a subset of the sensors listed in Section 3.
Specifically, we considered the two frontal cameras, the GPS-RTK and the IMU of the Swiftnav Duro Inertial kit, the RedEdge-MX multispectral camera, the Velodyne laser scanner, and the two inverters.

| DATA SET
This section describes the content and characteristics of the proposed data set. For each session listed in Table 2, two sets of data are available: the first is recorded with the sensors connected to the onboard computer, and the second only with the multispectral camera, whose streams are independently geotagged by using the GPS of the RedEdge Micasense kit. For each session, we used the robot operating system (ROS) (Quigley et al., 2009) to handle the data related to the devices directly connected to the onboard unit, while the multispectral images are stored directly on the solid-state drive (SSD) memory of the RedEdge camera. Since the duration of the sessions may vary between 1 and 2 h, each one is split into a number of shorter sequences. In the following sections, we introduce a brief description of the content of these data collections. An example set of RGB images and laser scans is shown in Figure 5.

| Calibration and sensors displacement
The extrinsic and intrinsic calibration parameters for the various sensors are summarized in Tables 5 and 6. The online data set also includes complete calibration files containing the camera, IMU, and LIDAR calibration parameters and the transformations among all sensors. Both the left (i.e., cam0) and the right (i.e., cam1) cameras, and the IMU, are calibrated through the kalibr (Furgale et al., 2012; Maye et al., 2013) toolkit. For the cameras, the pinhole camera model and the radial-tangential distortion model are used for calibration. The IMU noise model is characterized by the accelerometer noise density (continuous-time), σ a , the accelerometer random walk, σ ba , the gyroscope noise density (continuous-time), σ g , the gyroscope random walk, σ bg , and the sampling rate (Hz). The values associated with those parameters are reported in Table 5. We employ the kalibr tool also for estimating the stereo extrinsics and the relative transformations between the IMU and the cameras. For the LIDAR-IMU transformations, we measured the relative distance and orientation between the sensors. The coordinate systems of the LIDAR and the IMU mounted on the robot are shown in Figure 6.
Finally, the associated relative transformations are reported in Table 6.
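The pinhole plus radial-tangential camera model used in the calibration can be sketched compactly: the function below projects a 3D point expressed in the camera frame to pixel coordinates. This is a generic illustration of the model, not code from the data set; the intrinsic and distortion values passed in are placeholders, and the calibrated parameters should be read from the files shipped online.

```python
def project_pinhole_radtan(X, Y, Z, fx, fy, cx, cy, k1, k2, p1, p2):
    """Project a 3D camera-frame point with the pinhole model and
    radial-tangential (radtan) distortion, the model family used by
    kalibr for the calibration described above."""
    x, y = X / Z, Y / Z                       # normalized image coordinates
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2     # radial distortion factor
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return fx * xd + cx, fy * yd + cy         # pixel coordinates (u, v)
```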

| Rosbags
In ROS, each topic handles different messages (containing the data) depending on the type of service (or device) associated with it. The rosbag package allows recording these topics and messages in a unique file that can be played back to reproduce the experiment. In the topic names reported in Table 7, fim abbreviates fixed_imu_msg and im abbreviates imu_msg. The topics /agrobot/Inverter/HBL2360A_* provide diagnostic information gathered and processed by the two inverters. The data include their status (fault, fault flag, and board temperature) and motor measurements (RPM and current absorption). Each inverter drives two electric engines (left and right sides) using two channels (noted as "CH1" and "CH2"): the names of the topics and their messages reflect this configuration.
As an example, Figure 7 shows some diagnostic data of the motors: current absorption and revolutions per minute from a short test sequence.
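The inverter diagnostics in these topics originate as raw CAN payloads. As a purely hypothetical illustration of how such a frame could be decoded, the sketch below unpacks an assumed 8-byte layout (two RPM words and two current words per inverter); the real HBL2360A message layout is defined by its CANopen object dictionary and is not taken from the paper.

```python
import struct

def decode_inverter_frame(payload: bytes) -> dict:
    """Decode a HYPOTHETICAL 8-byte inverter status frame:
    int16 RPM (CH1), int16 RPM (CH2), int16 current x0.1 A (CH1),
    int16 current x0.1 A (CH2), little-endian. Illustrative only;
    the actual HBL2360A frame layout may differ."""
    rpm1, rpm2, cur1, cur2 = struct.unpack("<4h", payload)
    return {"rpm_ch1": rpm1, "rpm_ch2": rpm2,
            "current_ch1_a": cur1 / 10.0, "current_ch2_a": cur2 / 10.0}
```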

| Micasense MX data
Before each experimental session, radiometric calibration is performed to compensate for sensor black-level, sensor sensitivity, and related gain and exposure settings.

| Image sequence postprocessing
Due to voltage oscillations, shocks, network congestion, and ROS node miscommunications, we experienced spurious and corrupted frames among the raw RGB image sets. To produce a high-quality data set, the collected video sequences were, therefore, subjected to a frame-by-frame automated check to ensure their integrity and guarantee frames equally spaced in time. As a result, the image streams in the postprocessed sequences are free from corrupted data and characterized by a framerate in the range of 8-10 FPS.
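A simplified version of such a frame-by-frame check can be sketched as follows: duplicated or out-of-order timestamps are dropped, and the effective framerate of the surviving stream is reported. This only illustrates the timing side of the procedure; the integrity checks on the image payload itself are omitted.

```python
def validate_stream(timestamps):
    """Drop duplicated or out-of-order frames, then report the mean
    framerate of what remains. A simplified stand-in for the
    frame-by-frame check described above."""
    kept = []
    for t in timestamps:
        if not kept or t > kept[-1]:   # keep only strictly increasing times
            kept.append(t)
    if len(kept) < 2:
        return kept, 0.0
    fps = (len(kept) - 1) / (kept[-1] - kept[0])
    return kept, fps
```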

| APPLICATIONS
ARD-VO is a versatile data set that can be employed in different fields of research, from mobile robotics to crop monitoring. In this section, two use cases focused on robot navigation and precision farming are shown to highlight the utility of the ARD-VO data set. In the first one, the recorded sequences are used for evaluating the performance of robot localization algorithms in agricultural scenarios. In the second application study, the multispectral images and the GPS positions gathered by the Micasense RedEdge-MX multispectral camera are used for calculating NDVI maps.

T A B L E 6 Relative transformations between the sensors employed to collect the proposed data set.

F I G U R E 6 Sensors reference systems (LIDAR in blue, cameras in yellow, and IMU in red). IMU, inertial measurement unit; LIDAR, light detection and ranging.

| Robot pose estimation
The ARD-VO data set represents a challenging scenario to test the robustness of pose estimation algorithms, since several nonideal conditions are present, for example, repetitive texture, lighting variations or nonsmooth motions of the robot.
We performed comparative experiments on a selection of eight sequences, considering two sequences for each cultivation field.
Quantitative evaluation is performed with respect to the ground-truth trajectories obtained with the GPS-RTK measurements provided by the Swiftnav Duro device.
Before discussing the results, in the following we briefly summarize the characteristics of the evaluated strategies.

T A B L E 7 Topics, messages, and the description of the rosbags provided with the data set.

| Vision-based pose estimation
• DSO (Engel et al., 2017) samples pixels from the whole image, and it is based on the continuous optimization of the photometric error evaluated over a window of recent frames. Each new frame is initially tracked with respect to these reference frames, and then it is either discarded or accepted to create a new keyframe.
• VINS-Mono tracks features between consecutive frames using the Kanade-Lucas-Tomasi (KLT) optical flow method (Lucas & Kanade, 1981). To maintain a minimum number of features, Shi-Tomasi corner features (Shi, 1994) are detected in each image, and finally, the keyframes are selected.
• Open-VINS (Geneva et al., 2020) is a visual-inertial approach whose core is an Extended Kalman filter that fuses inertial information with sparse visual feature tracks. These visual features are detected with the features from accelerated segment test (FAST) algorithm (Rosten & Drummond, 2006), tracked with KLT (Lucas & Kanade, 1981), and finally fused leveraging the Multi-State Constraint Kalman Filter sliding window formulation.

| LIDAR-based pose estimation
The second group of methodologies used for the localization experiments performs robot localization and environment mapping using the LIDAR scans:
• LeGO-LOAM (Shan & Englot, 2018) is a lightweight and ground-optimized LIDAR odometry and mapping method. It can provide real-time pose estimation on a low-power embedded system. In the first phase, the method applies a point cloud segmentation to filter out the noise. Afterward, the feature extraction phase is performed to derive distinctive planar and edge features. The method leverages the presence of a ground plane in its segmentation and optimization procedure.

| Metrics
The selected VO-VIO approaches have been evaluated through four distinct metrics: the pose estimation accuracy, the duration ratio of the estimated trajectory with respect to the ground-truth trajectory, the error rate with respect to the trajectory length, and the failure ratio. Since the algorithms rely on some nondeterministic operations, multiple runs on the same sequence might exhibit different results. Therefore, we perform a total of five experiments per algorithm and report the mean and standard deviation of the results for each metric.

Pose estimation accuracy
The evo tool is used to compute the Absolute Trajectory Error (ATE), that is, the translational distance between each estimated pose and the corresponding ground-truth pose: ATE_i = ||p_est,i − p_gt,i||. Finally, the Root Mean Square Error (RMSE) is computed by averaging the squared ATE over all the N sequence poses: RMSE = sqrt((1/N) Σ_{i=1}^{N} ATE_i²). This value is computed after performing both scale and reference system alignment.
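As a minimal sketch of this metric (assuming the two trajectories have already been associated pose-by-pose and aligned), the RMSE of the translational ATE can be computed as:

```python
import math

def ate_rmse(est, gt):
    """Root Mean Square of the Absolute Trajectory Error over N
    associated pose pairs; est and gt are equal-length lists of
    (x, y, z) translations, assumed already aligned."""
    assert est and len(est) == len(gt)
    sq_sum = 0.0
    for (xe, ye, ze), (xg, yg, zg) in zip(est, gt):
        sq_sum += (xe - xg) ** 2 + (ye - yg) ** 2 + (ze - zg) ** 2
    return math.sqrt(sq_sum / len(est))
```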

Trajectory duration ratio
Whenever a VO-VIO algorithm fails in estimating the trajectory, it may stop providing pose estimations. This behavior is strictly related to the authors' implementation and may vary between different algorithms. This aspect has to be taken into consideration, since the APE value could turn out to be misleading. For this reason, an additional metric has been taken into account, which is the ratio between the time duration of the estimated trajectory, Δt_est, and the time duration of the ground-truth trajectory, Δt_gt. More specifically: R_ts = (ts_est^end − ts_est^start) / (ts_gt^end − ts_gt^start), where, for example, ts_est^end is the last timestamp of the resulting output estimation and ts_gt^start is the first timestamp of the ground-truth trajectory. R_ts ≈ 1 when the algorithm succeeds in predicting the whole trajectory. Conversely, R_ts → 0 when the algorithm stops the estimation at the beginning of the sequence. Hence, the closer R_ts is to 1, the lower the failure rate indicated by this metric.

Error rate
This metric takes into consideration the percentage of error with respect to the length of the estimated sequence. In particular, this error rate is computed as: E_rate = 100 · RMSE / (L · R_ts), where L is the total length of the sequence, so that L · R_ts corresponds to the portion of the sequence that has been predicted by the algorithm.

Failure ratio
Finally, the failure ratio is taken into account since multiple runs are performed. An experiment is considered to be failed either if the algorithm fails and does not return any estimation, or if the estimated trajectory covers less than 10% of the ground-truth one. The ratio is computed over the five runs as F = N/5, where N is the number of failed runs (i.e., if one run out of five fails, the failure ratio is 0.2).
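The three auxiliary metrics above can be sketched directly from their definitions; the error-rate normalization by L · R_ts reflects our reading of the text and should be treated as an assumption.

```python
def duration_ratio(ts_start_est, ts_end_est, ts_start_gt, ts_end_gt):
    """R_ts: estimated-trajectory duration over ground-truth duration."""
    return (ts_end_est - ts_start_est) / (ts_end_gt - ts_start_gt)

def error_rate(rmse, length_m, r_ts):
    """Error as a percentage of the predicted portion of the path,
    assuming L * R_ts approximates the length actually covered."""
    return 100.0 * rmse / (length_m * r_ts)

def failure_ratio(n_failed, n_runs=5):
    """Fraction of runs that failed (no output, or <10% of ground truth)."""
    return n_failed / n_runs
```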

| Results and discussion
The results are reported in Table 8. Among the vision-based methods, the best-performing ones are VINS-Mono and OpenVINS. However, OpenVINS has a very high standard deviation, which highlights that nondeterministic operations have a significant impact on its performance. ORB-SLAM Mono achieves worse RMSE results than its stereo version, although the latter has, in general, a lower trajectory duration ratio (i.e., $R_{ts}$). DSO, instead, is the approach that experiences the most failures.
It is worth mentioning that all the vision-based approaches suffer from hairpin turns: the accumulated drift error leads them to estimate a larger curvature. This is an additional aspect that makes the proposed data set interesting, since it represents a challenging scenario for testing the robustness of VO-VIO approaches.

| NDVI maps
In the second case study, the multispectral images of the ARD-VO data set have been used to compute geo-referenced NDVI maps. The NDVI is a simple graphical indicator that represents the amount of chlorophyll in a plant, and it is used for estimating the live green vegetation contained in a remotely sensed area. It can be calculated using the formula:

$$\mathrm{NDVI} = \frac{NIR - Red}{NIR + Red},$$

where $Red$ and $NIR$ represent the spectral reflectance measurements acquired in the red and near-infrared regions, respectively.

FIGURE 9 Trajectory estimation on Vynrd B (2021-09-01-12-25-09). Some approaches may lose tracking or fail before the end of the trajectory; in these cases, the execution of the algorithm is stopped and estimated poses are no longer provided, so the corresponding plots show shorter trajectories, depicting only the poses estimated before the failure (and the associated ground-truth ones) and discarding the remaining ground-truth trajectory points.
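The per-pixel NDVI computation, and the thresholding used to overlay vegetation pixels on the RGB images, can be sketched as follows. This assumes the red and near-infrared bands are already registered; the 0.5 threshold and the epsilon guard are hypothetical choices for this example.

```python
import numpy as np

def ndvi(red, nir, eps=1e-12):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red).

    red / nir: reflectance arrays of the registered red and
    near-infrared bands. eps guards against division by zero; the
    result lies in [-1, 1], with higher values indicating denser
    green vegetation.
    """
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    return (nir - red) / (nir + red + eps)

def vegetation_mask(red, nir, threshold=0.5):
    """Boolean mask of pixels whose NDVI exceeds a fixed threshold
    (threshold value is a hypothetical choice)."""
    return ndvi(red, nir) > threshold
```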

| CONCLUSIONS
In this work, we describe and make available to the community a novel data set for robotic applications in agricultural scenarios, that is, ARD-VO. The data set is collected in vineyards and olive groves and includes sequences of heterogeneous data collected by navigating a UGV for several kilometers in four different cultivation fields.
In the first part of the paper, we provided a description of the technical characteristics of the robotic platform and the sensors used for data collection. In addition, we detailed the acquisition campaign, the structure, and the characteristics of ARD-VO.
The utility of the data set is illustrated through two application studies, that is, robotic localization and mapping in rural environments, and plant health monitoring using multispectral imaging and NDVI maps. Specifically, a performance comparison of state-of-the-art SLAM and VO estimation methods using visual, LIDAR, and inertial data has been performed on the sequences of the proposed data set. We also presented an NDVI map computation procedure that processes the multispectral images and the GPS measurements gathered with the Micasense RedEdge-MX camera to build a geo-referenced map encoding the plant health status within the monitored cultivation.

FIGURE 10 Example of elaborated multispectral images. The visible red, green, and blue channels are overlaid with the pixels whose NDVI level is over a fixed threshold.

| Limitations and future challenges
As shown in the previous sections, the proposed data set can be used as a benchmark to facilitate the development of autonomous navigation and data analysis algorithms for applications in olive grove and vineyard contexts. Nonetheless, the data acquisition campaign we performed offers several insights into the limitations of the overall acquisition system and open challenges that might inspire future work.
First, the Agrobot platform we employed collects information about the surrounding environment from a "ground-level" viewpoint. While this makes it possible to observe fine-grained details of olives or grapes, it prevents gathering information about the upper parts of the cultivation or acquiring laser scans and single images that capture, for instance, an entire tree. Combining the ground platform with an aerial robot (e.g., a Micro Aerial Vehicle, MAV) might compensate for this limitation and provide data sets that offer information about the crops from different viewpoints. This would give access to both fine-grained details of single fruits and bird's-eye views of a group of trees or a row of vines.
Using a ground robot also limits the size of the cultivation fields in which data can be collected. This is due to both battery autonomy constraints and the velocity of the platform, which together limit the distance traveled within a single battery cycle. To address this problem, besides MAVs, a multirobot setup with multiple ground platforms could also be a viable alternative. The different robots could be coordinated to maximize the coverage of the cultivation field while optimizing the energy resources and the acquisition time. However, designing these solutions should take into account the additional complexity that stems from the need to synchronize the data collected by each unit.
While the proposed data set provides different information, such as the olive growth stage, the production characteristics, and the indices of the vegetative-productive balance of the vineyards, there are other important details that would be a valuable addition to foster agricultural analysis. These include (i) other phenological features of the plants, (ii) information such as the moisture or nitrogen content, and (iii) the presence of possible diseases. Future work will integrate additional devices, for example, humidity sensors, more multispectral cameras, and thermal imaging cameras, ensuring their tight synchronization with the other data and the overall acquisition system. Furthermore, the plant health status and characteristics will be recorded and associated with the data sequences.
Finally, during the acquisition sessions, the memory required to store the device measurements grows with the size of the cultivation and increases as more sensors are added to the system. To address this issue, the onboard computational unit of the platform could be provided with online data processing procedures that extract the information of interest from the raw data.

DATA AVAILABILITY STATEMENT
The ARD-VO data set is available at https://isar.unipg.it/datasets/.