GrowliFlower: An image time series dataset for GROWth analysis of cauLIFLOWER

This article presents GrowliFlower, a georeferenced, image-based UAV time series dataset of two monitored cauliflower fields of size 0.39 and 0.60 ha, acquired in 2020 and 2021. The dataset contains RGB and multispectral orthophotos, from which about 14,000 individual plant coordinates are derived and provided. The coordinates enable users to extract complete and incomplete time series of image patches showing individual plants. The dataset contains collected phenotypic traits of 740 plants, including the developmental stage as well as plant and cauliflower head size. As the harvestable product is completely covered by leaves, plant IDs and coordinates are provided to extract image pairs of plants pre and post defoliation, to facilitate the estimation of cauliflower head size. Moreover, the dataset contains pixel-accurate leaf and plant instance segmentations as well as stem annotations to address tasks like classification, detection, segmentation, instance segmentation, and similar computer vision tasks. The dataset aims to foster the development and evaluation of machine learning approaches. It specifically focuses on the analysis of growth and development of cauliflower and the derivation of phenotypic traits to foster the development of automation in agriculture. Two baseline results for instance segmentation at plant and leaf level, based on the labeled instance segmentation data, are presented. The entire dataset is publicly available.


Introduction
Field-grown crops are strongly affected by the prevailing environmental conditions. As a consequence, crop production requires careful plant management and complex decisions to minimize yield losses due to abiotic or biotic stresses. Farmers support plant growth and development through irrigation, fertilization, weeding, and pesticide applications, which are, however, costly and labor-intensive. To optimize plant management and support decision-making, farmers rely on frequent crop monitoring, but to date, this remains time-consuming and requires expert knowledge. Typically, farmers and agricultural advisors monitor fields regularly through spot checks of individual plants. Here, remote sensing and analysis methods can help farmers to monitor whole fields more comprehensively (Chi et al., 2016; Weiss et al., 2020).
Remote sensing data can be collected at all scales without damaging or impacting crops. Large-scale observations from satellites or aircraft and medium-scale observations from Unmanned Aerial Vehicles (UAVs) provide an overview of larger agricultural areas (Lillesand et al., 2015). Large-area crop monitoring with such sensors makes it possible to detect heterogeneity in the field and support the farmer's decision-making regarding field management. With such area-wide yet detailed information on biotic and abiotic stress, these factors can be counteracted more selectively to support environmentally friendly plant management. Medium-scale and close-range observations acquired from UAVs and ground robots are beneficial for collecting detailed information and can be used especially well for phenotyping individual plants. Nock et al. (2016), for example, use optical remote sensing data to define traits like structural and phenotypical characteristics on all levels from individual plants to whole areas. Other applications using remote sensing data are yield estimation (Chaparro et al., 2018), yield forecasting (Mosleh et al., 2015), and monitoring of rapid land surface changes (Verger et al., 2014).
To process and interpret large amounts of remote sensing data, machine learning (ML) methods become increasingly important (Lary et al., 2016). ML is concerned with learning a predictive function that relates observations to the desired output. The learned models can be flexibly designed with respect to the type of observations (Debolini et al., 2015; Reichstein et al., 2019) and can, for example, identify plant traits from remote sensing data (Ali et al., 2015; Verrelst et al., 2019). A main area of application is plant phenotyping, which can be made more objective and automated by using advanced ML methods such as deep neural networks. Romera-Paredes and Torr (2016), Ren and Zemel (2017), and Scharr et al. (2016), for example, learn ML models to infer phenotypic traits such as the number of leaves per plant. Similar traits can also be derived with a combination of object and leaf keypoint detection, allowing the observation of plant growth, as done by Weyler et al. (2021). Sa et al. (2016) use deep convolutional neural networks for the detection of single fruits, which serves, e.g., as a precursor for later autonomous harvesting (Arad et al., 2020). Drees et al. (2021) use image time series of cauliflower and broccoli to predict growth in the field using conditional generative adversarial networks (Isola et al., 2017). In detail, they generate an image of a plant at a later time point and use Mask R-CNN (He et al., 2017) to calculate the projected leaf area. Another typical agricultural application is weed control in the field, where weeds, crops, and soil need to be distinguished. With the use of neural networks, promising results have already been achieved, where the task can be approached using classification, detection, or semantic segmentation (Ahmadi et al., 2021).
To foster the development of ML methods for plant-specific tasks using remote sensing data, benchmark datasets with annotations and in-situ measurements are beneficial. Although various benchmark datasets already exist, many of these are domain-specific, with objects such as buildings (Roscher et al., 2020), animals (Deng et al., 2009), and other semantics such as land cover (Cordts et al., 2015). Therefore, they are not suitable for plant applications in general. The link between ML and plant sciences is becoming increasingly important (Lary et al., 2016), as can be seen from the growing number of related publications in recent years (Chebrolu et al., 2017; Förster et al., 2019; Kierdorf et al., 2019; Zabawa et al., 2019; Halstead et al., 2020; Ahmadi et al., 2021). Despite the increased demand, there are only a few publicly available plant-specific datasets that can be used for ML purposes.
Among the few datasets that are publicly available or covered in the literature, many have been acquired in a greenhouse (Scharr et al., 2014; Minervini et al., 2016; Mureşan and Oltean, 2017; Halstead et al., 2020) or are based on synthetically generated data (Ward and Moghadam, 2018; Kierdorf et al., 2022), making them difficult to apply to real-world scenarios. In particular, greenhouse-grown Arabidopsis thaliana is a frequently used research plant in the ML community due to its simple rosette morphology (Scharr et al., 2014). However, agricultural crop plants are more diverse in their morphology, and their development is affected by changing environmental conditions as well as abiotic and biotic stresses. Because of this, there is a high need for agricultural datasets that represent field conditions and cover challenges such as occlusions and variable shapes, poses, and colors of plants and plant parts.
An active research area is modeling the temporal development of plant growth and plant traits, which requires datasets that monitor plants over time. However, publicly available time series datasets of plants are rare. One of these is the cauliflower and broccoli (Brassica oleracea) dataset from Bender et al. (2020). The dataset was acquired with a camera-equipped robot that took close-range images at several time points. However, the dataset is limited to a few plants and lacks semantic information and accurate georeferencing of single plants.
Cauliflower is a suitable crop plant for developing ML algorithms, as its cultivation, morphology, and economic value give rise to many potential applications in the context of the digitization of agriculture. It is a high-value crop that needs to fulfill high quality criteria. Precise timing of plant management procedures is needed to avoid yield losses due to abiotic or biotic stress and to produce marketable cauliflowers. Cauliflower harvesting is a labor-intensive process, as each cauliflower must be harvested within about one week, when heads have a sufficient size but are not yet overripe. Due to within-field variability in plant development, cauliflower must be harvested by hand. As the head is covered by leaves, each individual cauliflower head must be touched to assess whether it fulfills the size criteria. After cutting and removing the surrounding leaves, product quality is visually assessed to dismiss heads with discolorations, misshapes, or stress symptoms. Cauliflower growth is highly climate-dependent, making it hard to predict the time of harvest. Depending on the prevailing temperature, irradiance, and soil water availability, plants may develop rather heterogeneously, so that harvesting of simultaneously established fields can take weeks. Under favorable conditions, plants in sequentially established fields may need to be harvested at the same time, which requires more workers and lowers the price per cauliflower. The early prediction of harvestable plants and of the time of harvest would allow for better planning of sales and bring economic advantages for farmers. This article presents an agricultural dataset suitable for the development of ML approaches. The provided dataset is meant to specifically address the analysis of growth and development of crop plants and the derivation of phenotypic traits relevant for agricultural applications, in order to foster the development of automation in agriculture.
The dataset comprises:
• RGB and multispectral orthophotos of two different cauliflower fields acquired over the whole growing period, from planting to harvest;
• plant IDs and coordinates enabling users to extract complete and incomplete time series of image patches showing individual plants, accompanied by in-situ reference data captured manually in the field;
• plant IDs and coordinates enabling users to extract image pairs of plants pre and post defoliation, accompanied by a time series of the respective plant, to allow an analysis of the correlation between the external appearance and the internal head of the cauliflower plant;
• pixel-accurate labeled data useful for classification, detection, segmentation, instance segmentation, and similar computer vision tasks at plant and leaf level.

Field design
Cauliflower fields were located on a farm in Western Germany (50°46'6.742" N, 6°58'20.271" E), close to the city of Bornheim, 20 km south of Cologne (see Fig. 1). The mean annual temperature in Bornheim is 14°C and the mean annual precipitation is 383 mm. There are 142 dry days per year, with an average humidity of 81%. The farm has fertile soil.
We acquired data for two fields: 1) the field shown in blue in Fig. 1, further referred to as field 1, in the year 2020, and 2) the field shown in orange, further referred to as field 2, in the year 2021. The cauliflower plants in both fields are planted in rows oriented from northwest to southeast. The fields are designed for sprayers with a working width of 18 m. Before planting, the fields were plowed to prepare the soil. Tractors with 1.8 m track width were used to plant five rows of nursery-grown young cauliflower plants at a time, with 3 rows between the tractor tracks. The distance between rows was 0.6 m and the distance between plants within a row was 0.5 m, resulting in a planting density of about 33,000 plants/ha. Every 18 m, there is a 2 m wide lane for spraying and irrigation. The fields were subject to conventional farming practices, including hoeing of cauliflower plants before canopy closure to reduce weeds as well as the application of pesticides (including herbicides, insecticides, and fungicides). In addition, the fields were irrigated when needed using sprinklers. In both fields, abiotic and biotic stresses were consequently rather low, and plants developed rather uniformly.
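The reported planting density follows directly from the row and plant spacing. A quick sanity check of this arithmetic, written as a minimal Python sketch (function name is our own, not part of the dataset tooling):

```python
# Sanity check of the planting density implied by the reported field layout.
# Spacing values are taken from the text: 0.6 m between rows, 0.5 m within a row.
ROW_SPACING_M = 0.6
PLANT_SPACING_M = 0.5

def planting_density_per_ha(row_spacing_m: float, plant_spacing_m: float) -> float:
    """One plant occupies row_spacing * plant_spacing square metres."""
    plants_per_m2 = 1.0 / (row_spacing_m * plant_spacing_m)
    return plants_per_m2 * 10_000  # 1 ha = 10,000 m^2

density = planting_density_per_ha(ROW_SPACING_M, PLANT_SPACING_M)
print(round(density))  # ~33,333 plants/ha, matching the ~33,000 plants/ha in the text
```

The small difference to the stated 33,000 plants/ha is plausibly due to the 2 m wide spraying lanes, which reduce the effective planted area.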

Field 1
This field has a width of about 100 m and a length of 240 m. Thus, the area is about 2.4 ha. The field was planted with the cultivar Korlanu (Syngenta, Maintal, Germany). Three quarters of the field were planted with plants from seedling trays (Fig. 2, left) on July 28th, 2020, starting from the southwest. The remaining north-eastern part of the field was planted on July 29th, 2020. It is worth mentioning that almost no weeds grew in this field.

Field 2
This field has a width of about 55 m and a length of 210 m. Thus, the area is approximately 1.32 ha. The field was planted with the cultivar Guideline (Syngenta, Maintal, Germany). The plants were transplanted from seedling trays on June 15th, 2021. Field 2 shows more weeds than field 1, especially along the southwestern edge of the field.

Figure 2: Example field and plant images. Image (a) shows seedling trays before planting. Image (b) shows plants two weeks after planting. Images (c-d) were taken four weeks after planting and illustrate how different plants develop over time and how a field looks at this stage. Image (e) depicts plants shortly before head formation.

Data collection
Three types of data are collected:
1. RGB and multispectral UAV image data with a high spatial resolution, which provides an indirect measurement of the phenotypic development of the plants;
2. georeferenced ground control points (GCPs) to locate the data in space, spatially arranged according to the field size to ensure accurate and robust processing of orthophotos (Persia et al., 2020);
3. in-situ measurements of phenotypic traits characterizing the developmental state and stress factors, which serve as reference observations.
The different types of data are collected on the same day to synchronize them. However, to ensure that workers are not visible in the image data, the data acquisitions were not conducted at the same time. The acquisition was carried out once a week during the entire growth period. During the harvest period, data was collected once between two different harvest days and once after the last harvest. Drone flights were only performed on sunny or overcast days to ensure stable illumination for the generation of orthophotos without shading effects due to moving clouds. Due to this, the time intervals between successive overflights vary slightly. Fig. 3 illustrates the dates of data collection for both monitored fields. As seen in the top timeline, seven orthophotos are only partly available, which will be discussed in Sec. 4.1. The data collection took a few hours per day, with the in-situ measurements being the most time-intensive. Data collection was adjusted to both field conditions, resulting in adaptations to camera settings, number of GCPs, and flight altitude.
Thus, the following subsections describe the procedure separately for field 1 and field 2.

RGB and multispectral imaging using UAVs
UAV images were taken with a DJI Matrice 600 hexacopter and two mounted cameras (Fig. 4). The first camera is a Sony A7R III RGB camera with a resolution of 47.4 MP, equipped with a Zeiss Batis 2.0/25 lens. The focal length is 25 mm with a field of view of 71.5°. A shutter speed of 1/1250 s and a floating aperture with a maximum of f/2.0 were chosen. The ISO value was set to automatic for field 1 and changed to 50 for field 2 in order to align our approach with the image-capture settings recommended by Agisoft. The second camera is a MicaSense 5CH for multispectral image data. It contains five built-in lenses with a resolution of 1.2 MP per band. The central wavelengths of the five acquired bands are 475 nm, 560 nm, 668 nm, 717 nm, and 840 nm. The focal length of the camera is 5.4 mm. For field 1, an altitude of around 10 m and an image overlap of 60/80 (forward/side) were used, whereas for field 2, an altitude of around 16 m and an image overlap of 80/80 were used to optimize data acquisition and the processing of the resulting image data. We ensured the following for each flight: no irrigation in or too close to the flight area, flying at a temperature and wind speed within the drone's safe operating range, and no rain during the whole flight.
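The flight altitude and focal length determine the ground sampling distance (GSD) of the raw images via the pinhole camera model. A minimal sketch of this relationship, assuming a physical pixel pitch of about 4.5 µm for a 47 MP full-frame sensor (the pitch is our assumption, not stated in the text):

```python
def ground_sampling_distance_mm(altitude_m: float, pixel_pitch_um: float,
                                focal_length_mm: float) -> float:
    """GSD = flying height * physical pixel size / focal length (pinhole model)."""
    return altitude_m * 1000.0 * (pixel_pitch_um / 1000.0) / focal_length_mm

# Assumed pixel pitch of ~4.5 um (not stated in the text), 25 mm focal length:
print(ground_sampling_distance_mm(10, 4.5, 25))  # ~1.8 mm/px at 10 m (field 1)
print(ground_sampling_distance_mm(16, 4.5, 25))  # ~2.9 mm/px at 16 m (field 2)
```

These values are in the same range as the orthophoto resolutions of 1.65 mm/px (field 1) and 3.10 mm/px (field 2) reported later, which also depend on the photogrammetric processing.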

Time series flights
For each acquisition date, a specified field area was flown over once which remains the same for the whole growing period. For field 1, this area had a width of 91 m and length of 62 m, resulting in approximately 0.60 ha. For field 2, the area had a width of 30 m and length of 131 m, resulting in approximately 0.39 ha.
Because plants do not necessarily grow straight, the center of the plant at later growth stages does not exactly match the position of the seedling (Grenzdörffer, 2019). A shift of up to ±10 cm between the center position of the head and the stem position of the early growth stages was observed.

Defoliation flights
In addition to the time series flights, so-called defoliation flights were conducted. For those images, the upper leaf layers covering the cauliflower head were manually removed for individual plants after the time series flight. This step is further referred to as defoliation. Care was taken to ensure that the defoliated leaves did not affect neighboring plants. The defoliated plants give information about the development of the head in relation to the outer appearance of the plant. By performing another UAV flight after defoliation, we obtain a dataset of plants for which both the time series of the outer appearance of the plant (Fig. 5b) and the inner head on the day of defoliation (Fig. 5c) are recorded.
For field 1, the defoliation of plants was performed on two days, October 27th and 29th, after harvesting had taken place. Thus, the defoliated plants represent plants whose head size did not fulfill the quality criteria for harvest, which mostly means that the head size was too small. For field 2, starting on August 19th, when most of the cauliflower heads started developing, between 70 and 200 plants were defoliated weekly. All plants with developed heads were defoliated in rectangular plot regions to minimize the impact of defoliation on the biological growth of neighboring plants. Care was taken not to defoliate the reference plants described in Sec. 3.3. The distribution of plots for the first 5 defoliation time points is shown in Fig. 5a. For the last overflight (after the last harvest), most of the remaining plants that had not yet been harvested were defoliated, which resulted in a random distribution that is therefore not shown in Fig. 5a.

Georeferenced Ground Control Points (GCPs)
To localize the image data globally in space, the data were georeferenced with the help of circular 12-bit GCPs with a diameter of around 20 cm, as shown in Fig. 6. The GCPs were fixed in the ground using plastic pegs. GCPs were evenly distributed across the field (see appendix Fig. 17) and positioned on tractor tracks or between plants to avoid displacement by external influences such as plowing. To ensure their visibility in the image data, surrounding plants were removed where necessary. We used 21 GCPs in field 1 (35 GCPs/ha) and 44 GCPs in field 2 (113 GCPs/ha) (see Fig. 17 in the appendix), with each GCP showing a different pattern. The greater number of GCPs in field 2 facilitates subsequent image alignment by ensuring that at least 3 GCPs are visible in each captured UAV image, especially for growth stages with a high degree of plant overlap and a dense canopy.
As measuring device, a Trimble R4 Model 3 base station with a horizontal standard deviation of ±5 mm + 0.5 ppm RMS and a vertical standard deviation of ±5 mm + 1 ppm RMS was used for both fields. In addition, a Trimble Juno slate controller was used. The measured coordinates are given in the coordinate system WGS84 / UTM zone 32N. To verify that the GCP markers were not displaced by external influences, the GCPs were measured at the beginning and end of the campaign, and displaced GCPs were dismissed. For field 2, a third measurement was added in the middle of the growing period.
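The displacement check described above can be sketched as a simple comparison of the two survey campaigns. This is an illustrative sketch, not the authors' actual procedure; the function name, data layout, and the 2 cm threshold are our assumptions:

```python
import math

def displaced_gcps(first, last, threshold_m=0.02):
    """Return IDs of GCPs whose position changed by more than threshold_m
    between two survey campaigns. `first`/`last` map GCP id -> (easting, northing)."""
    displaced = []
    for gcp_id, (e0, n0) in first.items():
        if gcp_id not in last:
            continue
        e1, n1 = last[gcp_id]
        if math.hypot(e1 - e0, n1 - n0) > threshold_m:
            displaced.append(gcp_id)
    return displaced

# Example with made-up UTM coordinates (metres):
first = {"GCP01": (362000.00, 5626000.00), "GCP02": (362050.00, 5626020.00)}
last  = {"GCP01": (362000.01, 5626000.00), "GCP02": (362050.00, 5626020.30)}
print(displaced_gcps(first, last))  # ['GCP02'] moved 0.30 m and would be dismissed
```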

In-situ measurements of plant development
In each field, so-called reference plots were selected, in which information about the plants located in these areas, denoted as reference plants, was manually captured. For field 1, there are four reference plots (see appendix Fig. 18a). Each plot consists of 3 rows with 20 plants each, resulting in 60 plants per plot and 240 plants in total. The plots are distributed in the north-western half along the long side of the field. For field 2, there are five reference plots (Fig. 7a). Each plot consists of 5 rows with 20 plants each (Fig. 7b). Thus, there are 100 plants per plot, 500 plants in total. The plots are evenly distributed in the south-western half along the long side of the field. Thus, reference data are collected along the entire field. Each reference plant is assigned a specific plant ID, which consists of the row (field 1: A-C; field 2: A-E) and the plant number (field 1: 1-10, 90-99; field 2: 1-20).
For all reference plants of field 1, phenotypic traits including the developmental stage as well as plant and head size were measured. As the farmer pursued a rigorous plant protection schedule and hardly any stresses were detected in 2020, information about stresses was no longer explicitly recorded in 2021.

Dataset
The basis of the dataset (Fig. 8) is formed by RGB and multispectral orthophotos derived from the captured UAV images. Single plants are identifiable in the orthophotos via their coordinates and plant IDs. The dataset contains four subsets intended for different machine learning tasks. The instance segmentation subset GrowliFlowerL contains patches that are extracted and processed from the RGB orthophotos. The other three subsets contain time series of individual plants. The subset GrowliFlowerT comprises randomly selected time series representing a large variety of cauliflower developments. The subset GrowliFlowerD contains, besides the time series, additional image pairs of the plants before and after defoliation. GrowliFlowerR contains in-situ measurements in addition to the time series. For each field, a txt-file including the measured coordinates of the GCPs at the beginning and the end of the field monitoring is provided. For field 2, coordinates measured during the growing period are also given.

Orthophotos (GrowliFlowerO and GrowliFlowerM)
The acquired RGB and multispectral UAV images are aligned into orthophotos using the Agisoft Metashape Professional software to obtain a large-scale overview of the monitored fields. In combination with the measured GCP coordinates, the orthophotos are georeferenced. The individual orthophotos are exported in the coordinate system WGS84 / UTM zone 32N.
The ground resolution of the RGB orthophotos of field 1 is 1.65 mm/px in both pixel width and height, with a minimum and maximum file size of 1.64 GB and 6.7 GB. The ground resolution for field 2 is 3.10 mm/px, with a minimum and maximum file size of 1.3 GB and 5.0 GB. For field 1, 12 orthophotos are available, of which 5 are processed entirely and 7 contain data gaps in small areas where the quality of the UAV images was not sufficient. For field 2, 15 orthophotos are available, as shown in Fig. 3b. This set of orthophotos is provided as GrowliFlowerO. Additionally, the dataset contains multispectral orthophotos for field 2 with a ground resolution of 2.5 cm/px, denoted as GrowliFlowerM.
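Since the orthophotos are georeferenced with a known ground resolution, UTM plant coordinates map to pixel coordinates through a simple affine relation. A minimal sketch for a north-up orthophoto; the function name and the example origin are hypothetical:

```python
def utm_to_pixel(easting, northing, origin_e, origin_n, gsd_m):
    """Convert a UTM coordinate to (col, row) in a north-up orthophoto.
    origin_e/origin_n: UTM coordinate of the top-left pixel corner;
    gsd_m: ground resolution in metres per pixel."""
    col = int((easting - origin_e) / gsd_m)
    row = int((origin_n - northing) / gsd_m)  # image rows grow southwards
    return col, row

# Example with a made-up origin and the field-1 resolution of 1.65 mm/px:
col, row = utm_to_pixel(362010.0, 5625990.0,
                        origin_e=362000.0, origin_n=5626000.0, gsd_m=0.00165)
print(col, row)  # a point 10 m east and 10 m south of the origin
```

In practice, a geospatial library that reads the orthophoto's affine geotransform directly (e.g. rasterio or GDAL) would replace this hand-rolled conversion.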

RGB image patches
The data described in this section is extracted from the RGB orthophotos. The ground resolution of the resulting image patches is the same as for the respective orthophotos.
Each of the datasets described below (excluding the labeled dataset in Sec. 4.2.1) contains a txt-file with global information for each field, containing the image ID, which includes the plant ID, and the corresponding georeferenced UTM coordinate of each plant. The coordinates refer to the center of the plants observed on August 19th, 2020 for field 1 and July 7th, 2021 for field 2. Additionally, information about the planting day and a proposed assignment to the training, validation, or test subset is given as a basis for the comparison of machine learning methods. The proposed training, validation, and test subsets are spatially disjoint to minimize spatial correlation between sets. Yet, certain systematic factors from a biological point of view are not excluded. The use of these sets is expected to promote the development of machine learning methods with high generalization ability. For the reference data presented in Sec. 4.2.3, the harvesting time is specified, and for the defoliation data presented in Sec. 4.2.4, the defoliation date of the plants. Furthermore, txt-files with local information for each acquisition date are provided, which contain the image ID to connect the local information with the global information, and the corresponding local pixel coordinate with respect to the respective orthophoto for each data acquisition day. Moreover, information about the day after planting (dap) is added.
To use image patches showing single plants, patches have to be extracted from the orthophotos using the plant IDs and coordinates. A patch side length of at least 490 px for field 1 and of at least 256 px for field 2 is recommended to ensure that, regardless of the plant's developmental stage, the whole plant is fully captured in the image patch.
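The patch extraction step described above can be sketched as a crop around the provided local pixel coordinates. This is a minimal illustration with NumPy, not the dataset's official tooling; the boundary handling (discarding patches that leave the orthophoto) mirrors the recommendation to use only fully covered patches:

```python
import numpy as np

def extract_patch(ortho, center_col, center_row, size):
    """Crop a size x size patch centred on (center_col, center_row).
    Returns None if the patch would extend beyond the orthophoto."""
    half = size // 2
    top, left = center_row - half, center_col - half
    if top < 0 or left < 0 or top + size > ortho.shape[0] or left + size > ortho.shape[1]:
        return None
    return ortho[top:top + size, left:left + size]

# Toy example: a fake 1000 x 1000 RGB orthophoto and the field-2 patch size of 256 px.
ortho = np.zeros((1000, 1000, 3), dtype=np.uint8)
patch = extract_patch(ortho, center_col=500, center_row=500, size=256)
print(patch.shape)  # (256, 256, 3)
```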

Labeled image patches (GrowliFlowerL)
This subset consists of pixel-wise, manually annotated images and is well suited for tasks like classification, semantic segmentation, detection, instance segmentation, or stem detection. For this set, image patches of four acquisition dates of field 1 are extracted by a sliding-window approach. The image patches have a size of 368 px × 448 px. The size of the patches deviates from the proposed sizes, as only plants from the earlier stages of development are included. Furthermore, in this dataset, the focus is not on individual plants but on the variability between images, so the plants are not necessarily located in the center of the patch. Four types of annotation masks are provided:
(1) The plant instance mask segments the image into soil and plant pixels, with instance information for each plant.
(2) The leaf instance mask segments each plant into its individual leaves. Plants at image borders for which no stem or only a quarter of the plant is visible are annotated as void and receive no leaf annotation.
(3) The void mask is a binary mask in which only plants located at image borders are segmented as void, namely where no stem is visible or only a small number of leaves is visible in the RGB image.
(4) The stem annotation mask represents the position of the stems of non-void plants.
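Combining the plant and leaf instance masks allows deriving phenotypic traits such as the number of leaves per plant. A minimal sketch, assuming both masks are encoded as 2D integer arrays with 0 for background/soil and positive instance IDs (the encoding is our assumption, not a documented file format):

```python
import numpy as np

def leaves_per_plant(plant_mask, leaf_mask):
    """Count distinct leaf instances per plant instance.
    Both masks are 2D integer arrays where 0 is background and
    positive values are instance IDs (an assumed encoding)."""
    counts = {}
    for plant_id in np.unique(plant_mask):
        if plant_id == 0:
            continue
        leaf_ids = np.unique(leaf_mask[plant_mask == plant_id])
        counts[int(plant_id)] = int((leaf_ids != 0).sum())
    return counts

# Toy 4x4 example: one plant (id 1) made up of two leaves (ids 1 and 2).
plant = np.array([[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
leaf  = np.array([[0, 1, 2, 0], [0, 1, 2, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
print(leaves_per_plant(plant, leaf))  # {1: 2}
```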

Time series for plant data (GrowliFlowerT)
For each field, plant coordinates are determined that enable the dataset user to extract plant image time series. This data is denoted as GrowliFlowerT. Time series of field 1 comprise the early plant developmental stages as well as the harvest dates, but lack the dates when the canopy was closed. Time series of field 2 comprise all growth stages.
For field 1, coordinates for about a third of the plants in the field are determined, which are 3804 plants in total. The spatial distribution of the extracted data is shown in Fig. 19a in the appendix. The chosen plants are distributed along the southeastern edge of the field due to the availability of data for most time points and the possibility to determine the harvest window of individual plants. October 19th, one week before harvest, is the only day for which no image data is available for GrowliFlowerT. The dataset is divided into a training, validation, and test set as shown in Fig. 19a in the appendix. In addition, it is ensured that cauliflower planted on July 28th or July 29th is included in all three sets. Since the orthophotos do not overlap completely, image data is not available for all plants at all times. This leads to temporally incomplete time series. For field 2, 8736 coordinates of plants were extracted, evenly distributed over the field. The dataset is divided into training, validation, and test sets as shown in Fig. 19b in the appendix. All plant coordinates are provided as georeferenced UTM coordinates. To use individual plant images, patches have to be cropped by the dataset user around the local plant coordinates provided in the dataset. In addition to all global plant coordinates, the dataset contains the local coordinates of the patches for each acquisition date, which, at a size of 490 px × 490 px for field 1 and 256 px × 256 px for field 2, lie completely within the orthophoto and do not show spatial data gaps (cf. Fig. 12b). Five examples of time series are shown in Fig. 10 for field 1 and four in Fig. 11 for field 2. Due to spatial data gaps, the number of coordinates per date for field 1 varies, which leads to temporal data gaps within the time series.
The largest set of time series with equal time steps consists of 3611 time series based on eight time points: the five time points up to day 42 after planting (September 8th) and all three time points from day 91 after planting (October 27th) onwards. In addition to the file that contains all UTM coordinates, a txt-file containing the UTM coordinates for this set is provided, so that time series for these selected plant IDs can be extracted by the dataset user. After removing patches with spatial data gaps, there are 8402 complete image time series for field 2. Due to the heterogeneous weed occurrence in field 2, the patches contain different amounts of weeds, as can be seen in Fig. 12a. Given the UTM coordinates, the complete time series set of local coordinates can be extracted for both fields if needed.
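Assembling a per-plant time series from the per-date coordinate files can be sketched as follows. The data layout (a dict mapping date to per-plant coordinates) is a simplification of the provided txt-files; skipping dates where a plant is missing reproduces the temporally incomplete series described above:

```python
def build_time_series(per_date_coords, plant_id):
    """Collect (date, (col, row)) pairs for one plant across acquisition dates.
    per_date_coords: dict mapping date string -> {plant_id: (col, row)}.
    Dates without an entry for the plant are skipped, yielding an
    incomplete time series when spatial data gaps occur."""
    series = []
    for date in sorted(per_date_coords):
        coords = per_date_coords[date]
        if plant_id in coords:
            series.append((date, coords[plant_id]))
    return series

# Toy example with a spatial data gap on 2020-09-15 (made-up IDs and coordinates):
per_date = {
    "2020-08-11": {"A01": (120, 340)},
    "2020-09-15": {},                    # plant outside the processed orthophoto
    "2020-10-27": {"A01": (131, 352)},
}
print(build_time_series(per_date, "A01"))
```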

Time series for reference plant data (GrowliFlowerR)
For each field, the dataset includes plant IDs and coordinates that enable users to extract an image time series set of the monitored reference plants. Those time series look similar to those described in Sec. 4.2.2. Time series of field 1 comprise the early plant developmental stages as well as the harvest dates, but lack the dates when the canopy was closed. Time series of field 2 comprise all growth stages (see Fig. 11). Tab. 2 gives the distribution of available plant IDs and thus the number of images of plants per time point for field 1. The orthophotos pre defoliation of October 27th and October 29th do not overlap with the reference plots due to the low quality of the underlying UAV images. Since the reference plants were not defoliated, the orthophotos of the defoliation flights are used to extract images of these days for the reference time series. For field 2, all local coordinates are given for every acquisition date, which enables the user to extract complete image time series. The data is divided into a training, validation, and test set for both fields. Plants of each plot are represented in each set. The visual distribution for both fields is shown in the appendix in Fig. 20.

Time series for defoliated plant data (GrowliFlowerD)
For field 1, the dataset contains in total 130 plant IDs and coordinates of defoliated plants, 30 for October 27th and 100 for October 29th. For field 2, it contains in total 722 plant IDs and coordinates of defoliated plants. The coordinates enable the dataset user to extract time series of defoliated plants. Tab. 3 gives an overview of how many plants were defoliated on which acquisition day. Besides the time series, pairs of pre and post defoliation images are provided. The data is divided into a training, validation, and test set for both fields. Each day of defoliation is represented in each set. The visual distribution for both fields is shown in the appendix in Fig. 21.

In-situ data
Two csv files are provided, one for each field, which contain the plant ID in addition to the measurements described in Sec. 3.3 for each data acquisition day. The measured values correspond to the images of GrowliFlowerR. Fig. 13 shows the distribution of the number of harvested plants within the reference plots per acquisition date for both fields.

Baseline for instance segmentation application

Experimental Setup
We show two possible applications of the presented data by creating baselines using the labeled dataset GrowliFlowerL and Mask R-CNN (He et al., 2017), a state-of-the-art method for instance segmentation. We address the tasks of segmenting plant instances and segmenting leaf instances: we use the masks and the bounding boxes derived from the masks of plant instances as targets for one baseline, and the masks and bounding boxes derived from the masks of leaf instances as targets for the second baseline. For the leaf instance segmentation baseline, the given void instances are used as background, since only leaves that do not belong to void plants have been labeled. The estimation of the semantic masks for the individual instances enables a derivation of phenotypic traits. As data augmentation, we use random horizontal flipping with a probability of 0.5.
We train on a single-GPU machine with an Intel Core i7-6850K 3.60 GHz processor and a GeForce GTX 1080 Ti with 11 GB of RAM. The network is pretrained on the COCO dataset (Lin et al., 2014). Training is done over 100 epochs with a learning rate of 0.001 and a batch size of 2. We use an SGD optimizer and a ResNet-50 backbone.

Evaluation metrics
Following the evaluation metrics of the COCO dataset (Lin et al., 2014), the Intersection over Union (IoU) is calculated as

IoU = TP / (TP + FP + FN),

where TP are true positives, FP false positives, and FN false negatives. Two additional scores are precision p and recall r, defined as

p = TP / (TP + FP),   r = TP / (TP + FN).

The F1 score summarizes the precision and recall scores and is defined as

F1 = 2 · p · r / (p + r).

We compute precision, recall, and F1 score with respect to the single object class cauliflower plant and calculate the scores for the IoU thresholds tIoU = 0.50 and tIoU = 0.75. In addition, we determine the average precision (AP), average recall (AR), and average F1 (AF1) score over all IoU thresholds in the interval 0.50 − 0.95 with step size 0.05, as for the COCO benchmark. This is indicated by (·)@0.5 − 0.95. For the leaf instance segmentation baseline, we restrict the evaluation to recall, as we do not want to penalize predictions on void pixels.

Results
We perform the metric calculations with respect to the detected bounding boxes and the segmented masks of the respective objects. In the case of segmented masks, we consider the accumulated number of correctly classified pixels and thus the more precise shape of the object. With the bounding box, the focus is more generally on the accuracy of the detection and thus the localization of the object.
Tab. 4 summarizes the results of the baseline method for plant instance segmentation. The table shows that 95% of the plants are predicted correctly at IoU ≥ 0.5. The precision on bounding box level and pixel level is above 80% for all IoU thresholds ≤ 0.8 (see Fig. 14a). At an IoU ≥ 0.85 it decreases quickly. This also applies to recall (Fig. 14b) and F1 score (Fig. 14c). For higher IoU values, the prediction on pixel level is less accurate than on bounding box level, since slight changes in the segmentation generally lead to higher errors in the segmentation mask than in the bounding box.
Looking at the visual results, it can be seen that many of the objects and masks are estimated accurately (Fig. 16a).
The results show all predictions with a score higher than a threshold of 50%. A precise contour is estimated, and in the earlier developmental stages the instances are spatially well separated. The ground is never considered to be an object, nor are the smaller weeds that are visible on some patches. Inaccuracies occur especially with plants that lie at the edge of the image patches: only small parts of the plant are visible and therefore its leaves are not adjacent to each other, see Fig. 16b top left and bottom left. Some errors occur in later developmental stages where the plants overlap (Fig. 16b bottom right), which constitutes a more challenging scenario than well-separated plants.
In particular for overlapping plants, it is difficult even for the human eye to assign the leaves to the individual instances. Furthermore, fewer training images are available for the later stages of maturity than for the earlier developmental stages, where no overlap occurs. Another distinctive case involves plant objects that lose leaves or are impaired in their growth and thus decay; here it is difficult for the model to distinguish whether one or more plants are represented (Fig. 16b top right).

In the baseline for leaf instance segmentation, the recall results are slightly worse than for plant instance segmentation (see Tab. 15b), suggesting that this task is more difficult than plant instance segmentation. The distinction between individual leaf instances is more complex than the distinction between plant instances. In addition, we do not assign the void-labeled objects to the class leaf but to the class BG for this baseline, since individual void plants can also contain several leaves, which, however, were not individually labeled. The calculated recall values are similar on pixel level and bounding box level.
We find explanations in the visual inspection of the results, which again show predictions with a score higher than a threshold of 50%. Through our definition of void instances as background, the model is challenged to not predict leaves belonging to void instances, as shown in Fig. 16d top left and bottom left. The model has difficulty distinguishing whether plants at the edge of the patches are void instances or leaf instances. Therefore, either leaves are predicted that are not present in the target (low precision), or no leaves are predicted although they are present in the target (low recall). For plants that are completely visible in the patch, the model predicts more reliably. Another source of error is the prediction of several instances on one leaf, as shown in Fig. 16d top right and bottom right, because the model needs to learn features such as leaf structure and size, as these play a crucial role in distinguishing different leaves.
We observe that both our instance segmentation baselines, at plant as well as at leaf level, perform well and can be used for different growth stages of the cauliflower plants.

Conclusion and future directions
This article introduces the GrowliFlower dataset, a georeferenced, image-based UAV time series dataset of two monitored cauliflower fields covering their entire growth period. The paper describes the dataset and provides insights into the data collection process that can be helpful for other data collection activities. The dataset consists of weekly RGB and multispectral UAV orthophotos and image time series of individual plants reflecting weekly plant growth. For a subset of the time series, in-situ reference measurements such as plant size are available. For another subset, pre- and post-defoliation images are available to provide a relation between the interior and exterior of a cauliflower plant. The dataset also contains annotations of segmented plant and leaf instances as well as stem annotations. The data is available at http://rs.ipb.uni-bonn.de/data/. The dataset is intended to advance and evaluate machine learning methods and to foster close collaboration between different disciplines such as agricultural sciences, remote sensing, and machine learning. We present baseline results from two applications approached using Mask R-CNN: one for plant instance segmentation and one for leaf instance segmentation. Furthermore, the findings and descriptions should help to ensure that the data collection procedure can be reused and transferred to other areas.

Figure 18: Visual overview of (a) reference plots for in-situ measurements within field 1 and (b) the respective design of reference plot 4, including reference plants and the ordering of reference plant numbers. The plot design is valid for all reference plots of field 1.

[Caption fragment] … and test set (red). For field 1, the two planting days are separated using dark colors for July 28th, 2020 and light colors for July 29th, 2020.