Automating analysis of vegetation with computer vision: Cover estimates and classification

Abstract This study develops an approach to automating the process of vegetation cover estimates using computer vision and pattern recognition algorithms. Visual cover estimation is a key tool for many ecological studies, yet quadrat‐based analyses are known to suffer from issues of consistency between people as well as across sites (spatially) and time (temporally). Previous efforts to estimate cover from photographs require considerable manual work. We demonstrate that an automated system can be used to estimate vegetation cover and the type of vegetation cover present using top–down photographs of 1 m by 1 m quadrats. Vegetation cover is estimated by modelling the distribution of color using a multivariate Gaussian. The type of vegetation cover is then classified, using illumination robust local binary pattern features, into two broad groups: graminoids (grasses) and forbs. This system is evaluated on two datasets from the globally distributed experiment, the Nutrient Network (NutNet). These NutNet sites were selected for analyses because repeat photographs were taken over time and these sites are representative of very different grassland ecosystems: a low stature subalpine grassland in an alpine region of Australia and a higher stature and more productive lowland grassland in the Pacific Northwest of the USA. We find that estimates of treatment effects on grass and forb cover did not differ between field and automated estimates for eight of nine experimental treatments. Conclusions about total vegetation cover did not correspond quite as strongly, particularly at the more productive site. A limitation of this automated system is that the total vegetation cover is given as a percentage of pixels considered to contain vegetation, whereas ecologists can distinguish species with overlapping coverage and thus can estimate total coverage to exceed 100%.
Automated approaches such as this offer techniques for estimating vegetation cover that are repeatable, cheaper to use, and likely more reliable for quantifying changes in vegetation over the long‐term. These approaches would also enable ecologists to increase the spatial and temporal depth of their coverage estimates with methods that allow for vegetation sampling over large spatial scales quickly.


KEYWORDS
automation, computer vision, image analysis, visual cover estimate

| INTRODUCTION
Point quadrat analyses and visual cover estimation of vegetation are widely used, standard vegetation survey techniques that date back to the 1920s (Gleason, 1920). These survey techniques are expensive and can be time-consuming because they require an expert ecologist or botanist to make decisions about the amount and type of vegetation present. For these reasons, vegetation surveys are often conducted sparsely in terms of geographical location as well as temporally. In the case of the management of natural spaces on either private or public land, vegetation surveys and regular monitoring can be constrained by the availability of expert ecologists to be in the field, with a reasonable assessment taking up to half a day per plot (survey time based on a plot size of 40 m²; Vittoz & Guisan, 2007).
With long-term research and monitoring, different ecologists may be required to conduct vegetation surveys over time, which increases the probability of inconsistencies in the accuracy of cover estimates made by multiple experts at the same site (Bergstedt, Westerberg, & Milberg, 2009; Vittoz & Guisan, 2007). Vittoz and Guisan (2007) found that one ecologist completing a survey was more reliable than a succession of different ecologists. These issues motivated us to explore automated methods that could permit vegetation surveys to be conducted more frequently (temporally) and potentially over larger areas (spatially) with consistency similar to that of an expert and, certainly for long-term monitoring, likely better.
Algorithms that interpret visual imagery are key to automating plot-based vegetation surveys. Considerable work on detecting the presence of vegetation from imagery, and on classifying vegetation/weeds, has been conducted in the fields of robotics, automation, and computer vision. In robotics, vegetation detection has been dominated by the use of vegetative indices that usually consist of a ratio between pixel values of color and/or near-infrared (NIR) imagery (Haug, Michaels, Biber, & Ostermann, 2014; Keranen, Aro, Tyystjarvi, & Nevalainen, 2003; Weis et al., 2008). A downside of these approaches is that they often require specialized hardware, as joint color and near-infrared imagery is necessary. Outside of vegetative indices, researchers have explored transforming the raw color information (RGB) to a more suitable color space. Philipp and Rath (2002) found that the Lab, Luv, and HSV color spaces were particularly effective for discriminating between vegetation and nonvegetation.
Automated classification of vegetation is a challenge because of large intraclass and small interclass variations. Intraclass variation includes differences among growth stages of the same vegetation (species of plant), differences among species in the same vegetation type, and differences in growth form (shape, size) based on environmental conditions. Interclass variation can be small because species that are visually similar may belong to different vegetation types. For example, distinguishing a forb with long linear leaves (e.g., Plantago lanceolata) from a grass is a significant challenge. Successful applications of vegetation classification may require restricting the classification to particular subsets of plants. For example, Gerhards and Oebel (2006) distinguished between three different broadleaf weed species but grouped all grasses together. A variant of local binary patterns (LBPs) has also been used to classify leaf images for 51 species (Herdiyeni & Santoni, 2012), achieving a good accuracy of 72% for this challenging multiclass problem. Haug et al. (2014) and Hung, Xu, and Sukkarieh (2014) approached this as a two-class classification problem rather than multiclass classification. Haug et al. (2014) performed crop versus weed classification in an agricultural setting and achieved high accuracy using shape and pixel statistic features. Hung et al. (2014) learnt features to classify weeds versus nonweeds to detect invasive weeds; using imagery collected by an unmanned aerial vehicle (UAV), they achieved high accuracy (>90%) for two of three weed species. Recently, an approach was proposed to detect vegetation that was promising for estimating total cover; however, it was not able to differentiate vegetation types, especially between species. Another limitation was that it required a ColorChecker board to be in the image so that the color in the images could be standardized.
A reliable and consistent automated approach for estimating vegetation cover from visual imagery could revolutionize how we assess vegetation changes over time.
In this study, we take the approach of Bawden et al. (2017) and apply it to automate quadrat-based studies. This approach can make use of nonspecialized or consumer-level cameras, potentially even mobile phone imagery, a near-ubiquitous technology.
Because such an approach relies only on collecting imagery, the time spent in the field is much less than that required for a field ecologist to perform vegetation surveys.
Furthermore, such an approach has the advantage of being deployable on many lightweight robotic platforms (including unmanned aerial vehicles), as well as being usable by citizen-scientists.
We apply the automated system to digital SLR imagery and evaluate its efficacy to: (1) detect vegetation, and (2) classify vegetation into two broad groups (classes), grasses and forbs. Overall, we find grass and forb cover estimates are correlated with estimates made in-situ by field ecologists. We then consider the implications of these estimates for determining differences among experimental treatments; in many cases, the same conclusions were arrived at whether based on estimates from the automated system or from field ecologists.

| AUTOMATIC ESTIMATION OF VEGETATION COVER
We present an automated approach 1 to detect and classify vegetation using still camera images. The proposed algorithms are evaluated on quadrat images taken by a field ecologist, using a top-down view of 1 m by 1 m quadrats. These images are taken at the same time that the field ecologist estimates the amount and type of vegetation present. This provides us with a catalogue of images with associated ground truth estimates of the amount and type of vegetation recorded by the ecologist in the field.
An assumption made is that the images taken from a site are consistent. This means that a similar camera was used to take the images and that the scale and pose of the camera were similar; that is, the images were taken at a similar distance and angle.
The performance of our proposed algorithms is compared against the question that is central for this study: "Can images from consumer-level cameras be analyzed to automatically estimate important coverage values?" To this end, we only evaluate the performance of our system against the result of the in-field ecologist (ground truth) for three key tasks: (1) estimating total vegetation cover, (2) estimating the total cover of each vegetation type (grasses vs. forbs), and (3) the effect that a particular treatment had on the environmental system. As we concentrate on this central question, we do not explore engineering approaches to incrementally improve performance such as data augmentation (Krizhevsky, Sutskever, & Hinton, 2012; Poh, Marcel, & Bengio, 2003). More details on this dataset can be found in Section 3. Below, we outline the automated approaches for detecting and then classifying vegetation.

| Vegetation detection
To detect vegetation, we model the distribution of vegetation color using a multivariate Gaussian. We use a pretrained model as described in Bawden et al. (2017). The model is trained on data not taken from the quadrat images to ensure its generality for detecting (green) vegetation. This enables it to be deployed quickly and easily to different settings. The detection model was trained, evaluated, and tested using 40 images taken with two cameras: a Canon 7D DSLR and a mobile phone (Sony Xperia Z3 Compact). The 40 images were split such that 14 images were used for training, 14 for evaluation, and 12 for testing. The DSLR camera had an image resolution of 2,592 × 1,728 and the mobile phone an image resolution of 3,840 × 2,160. Below we describe the approach taken; further details can be found in Bawden et al. (2017).
The standard RGB representation of each pixel is first transformed to other well-known color spaces. We make use of the Lab, Luv, and HSV color spaces, as these are known to provide more consistent representations of color than standard RGB. Each of these color spaces consists of two chromaticity components and an intensity component (either L or V). To provide robustness to varying illumination conditions, we discard the intensity components and combine the three representations into a D = 6 dimensional feature vector z.
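As an illustrative sketch of this feature construction (function names and the stdlib-only color conversions are ours, not the authors' implementation; a production system would more likely use a library such as OpenCV), the following converts a single sRGB pixel to the six chromaticity components:

```python
import colorsys


def rgb_to_chroma(r, g, b):
    """Sketch: map an sRGB pixel (components in [0, 1]) to the D = 6
    chromaticity feature described in the text, dropping the intensity
    components of Lab, Luv, and HSV."""
    # sRGB -> linear RGB
    def lin(c):
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = lin(r), lin(g), lin(b)

    # linear RGB -> CIE XYZ (D65 white point)
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl

    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29

    fy = f(y / yn)
    # Lab chromaticity (the intensity L is discarded)
    a_star = 500.0 * (f(x / xn) - fy)
    b_star = 200.0 * (fy - f(z / zn))

    # Luv chromaticity (L is again discarded); guard against a black pixel
    denom = x + 15.0 * y + 3.0 * z
    up = 4.0 * x / denom if denom else 0.0
    vp = 9.0 * y / denom if denom else 0.0
    l_star = 116.0 * fy - 16.0
    u_star = 13.0 * l_star * (up - 0.19784)
    v_star = 13.0 * l_star * (vp - 0.46832)

    # HSV chromaticity (the intensity V is discarded)
    h, s, _ = colorsys.rgb_to_hsv(r, g, b)

    return [a_star, b_star, u_star, v_star, h, s]
```

For a mid-gray pixel the four opponent-color components are near zero and the saturation is exactly zero, illustrating that these components carry chromaticity rather than intensity.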
The color of green vegetation is then modelled using a multivariate Gaussian; this model is independent of the imaging equipment used to take the photographs. The model combines the information from all D = 6 dimensions of the feature vector, z, and is defined by its mean μ and covariance Σ (assumed to be diagonal, as calculating the log-likelihood with a diagonal covariance is the most efficient 2 ).

The likelihood that a pixel represents green vegetation is then given by

$$\log p(\mathbf{z} \mid \theta) = -\tfrac{1}{2}\left[ D\log(2\pi) + \log\lvert\boldsymbol{\Sigma}\rvert + (\mathbf{z}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{z}-\boldsymbol{\mu}) \right].$$

A pixel is declared as being green vegetation if its log-likelihood is greater than a predetermined threshold τ. The parameters of this model, θ = [μ, Σ], are estimated on an independent training set of 14 annotated images as

$$\boldsymbol{\mu} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{z}_n, \qquad \boldsymbol{\Sigma} = \mathrm{diag}\!\left(\frac{1}{N}\sum_{n=1}^{N}(\mathbf{z}_n-\boldsymbol{\mu})(\mathbf{z}_n-\boldsymbol{\mu})^{\top}\right),$$

where the operator diag(·) takes the diagonal of the resultant matrix.

The predetermined threshold, τ, is calculated on an evaluation set of 14 annotated images; the resulting detector achieved an F1 score of 91.5 on the 12 annotated test images (Bawden et al., 2017). The F1 score reported is the point at which the precision (the proportion of selected pixels that are vegetation) and the recall (the proportion of vegetation pixels that are correctly selected) are equal.
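The detection model can be sketched as follows (a minimal illustration with our own function names, not the authors' code):

```python
import math


def fit_diag_gaussian(samples):
    """Estimate the mean and a diagonal covariance from annotated vegetation
    feature vectors (a list of D-dimensional lists)."""
    n = len(samples)
    d = len(samples[0])
    mu = [sum(s[j] for s in samples) / n for j in range(d)]
    var = [sum((s[j] - mu[j]) ** 2 for s in samples) / n for j in range(d)]
    return mu, var


def log_likelihood(z, mu, var):
    """Log of the multivariate Gaussian density with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (x - m) ** 2 / v
        for x, m, v in zip(z, mu, var)
    )


def is_vegetation(z, mu, var, tau):
    """Declare a pixel green vegetation when its log-likelihood exceeds tau."""
    return log_likelihood(z, mu, var) > tau
```

In practice, τ would be tuned on the evaluation set to balance precision and recall, as described above.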

| Vegetation classification
Once a pixel has been identified as containing vegetation, we classify the type of vegetation present using visual texture based on a region around the pixel. The visual texture feature is a histogram of LBPs, which was shown by Bawden et al. (2017) to achieve an accuracy of 96.0% and 95.9% for classifying grass and forbs, respectively. The procedure consists of three steps.

1. For each pixel declared as being green vegetation, we take a two-dimensional (2D) window around it, and each pixel within the window is converted to an LBP.

2. We use a histogram of LBPs as the feature to compactly represent each 2D window. This histogram of LBPs is our feature vector, y_k, which represents the visual texture associated with the k-th window, W_k.

3. The k-th window is compared against templates for each class (grass and forb), and the class template that best matches (represents) the window is declared as the vegetation type for that pixel.
An example of output of this procedure is given in Figure 1. Below, we describe each step in more detail.
For step 1, we use a square sliding window of W × W pixels, a technique commonly used to detect faces and other objects. Here, we used W = 100 pixels. Each pixel within the window is converted to an LBP.
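A minimal sketch of the window extraction (the non-overlapping stride is our assumption; the text specifies only W = 100):

```python
def sliding_windows(height, width, w=100, stride=100):
    """Yield the top-left corner (row, col) of each w x w window covering
    an image of the given size. A stride equal to w gives non-overlapping
    windows; this stride choice is illustrative."""
    for r in range(0, height - w + 1, stride):
        for c in range(0, width - w + 1, stride):
            yield r, c
```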
Local binary patterns encode local texture information into a binary string consisting of N entries. These entries are obtained by comparing the central pixel P_c with N surrounding pixels sampled at a distance R from the central pixel. Because only local differences are used, the resulting feature is robust to illumination changes.
The local binary pattern for a pixel can be interpreted as a binary string that is converted to an integer value,

$$\mathrm{LBP} = \sum_{n=0}^{N-1} h(P_n - P_c)\,2^{n}, \qquad h(x) = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0, \end{cases}$$

where h(x) is a binary function and P_n is the n-th surrounding pixel. This process is applied to each pixel. Pixels at an image border would normally be handled by reflection; however, because this process uses only the imagery within the quadrat, edge effects are not an issue here.
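The LBP computation for a single pixel, with N = 8 and R = 1, can be sketched as follows (the neighbour ordering is one common convention; this is not the authors' implementation):

```python
def lbp_code(img, r, c):
    """8-neighbour local binary pattern (N = 8, R = 1) for pixel (r, c).
    img is a 2D list of grayscale values; neighbours are visited clockwise
    starting from the top-left."""
    center = img[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for n, (dr, dc) in enumerate(offsets):
        # h(x) = 1 if x >= 0 else 0, applied to P_n - P_c
        if img[r + dr][c + dc] - center >= 0:
            code += 1 << n
    return code
```

Adding a constant brightness offset to every pixel leaves the code unchanged, which is the illumination robustness noted above.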
For step 2, we use a histogram of LBPs to compactly represent each 2D window. This is achieved by summarizing the W × W LBP values from each 2D window as a histogram. The total number of histogram bins is 256 (i.e., 2^N with N = 8 neighbors) for this work. Thus, the k-th window, W_k, is represented by the feature vector y_k, which is the LBP histogram.
The feature vectors for two windows, A and B, are then compared using the cosine similarity measure,

$$\mathrm{sim}(\mathbf{y}_A, \mathbf{y}_B) = \frac{\mathbf{y}_A^{\top}\mathbf{y}_B}{\lVert\mathbf{y}_A\rVert\,\lVert\mathbf{y}_B\rVert}.$$

For step 3, we perform a two-class classification, declaring each window of vegetation as being either grass or forb. Using the cosine similarity measure, the k-th window is compared against templates for each class (grass and forb), and the class template that best matches (represents) the window is declared as the vegetation type for that pixel; this is a nearest neighbor classifier.
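Steps 2 and 3 can be sketched together as follows (illustrative names; histogram normalization is our choice and does not affect the cosine similarity, which is scale invariant):

```python
import math


def lbp_histogram(codes, n_bins=256):
    """Summarize the LBP codes of one window as a normalized histogram
    (the feature vector y_k)."""
    hist = [0.0] * n_bins
    for code in codes:
        hist[code] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def classify_window(hist, templates):
    """Nearest-neighbour classification: templates maps a class name
    ('grass' or 'forb') to a list of annotated template histograms; the
    class of the most similar template is returned."""
    best_cls, best_sim = None, -1.0
    for cls, temps in templates.items():
        for t in temps:
            sim = cosine_similarity(hist, t)
            if sim > best_sim:
                best_cls, best_sim = cls, sim
    return best_cls
```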
The class templates were obtained by manually annotating a small number of randomly selected images from each site. A subset of the windows that contained only grass or only forb was chosen as the templates. A total of 15 and 10 windows were annotated for Bogong High Plains and Smith Prairie, respectively. The low number of annotations makes this approach rapidly deployable to new sites; however, it meant that other classifiers, such as support vector machines or random forests, could not be utilized.

| Nutrient Network
The automatic estimation of vegetation cover is evaluated on two sites from the Nutrient Network (NutNet; Grace et al., 2016; Hautier et al., 2014). Bogong is an alpine grassland situated in the Bogong High Plains of Australia; Smith Prairie is a more productive lowland grassland in the Pacific Northwest of the USA.

Vegetative cover was surveyed at the Bogong site each year at the start of the growing season in January. Cover was estimated separately for each species, so total cover could sum to >100% because of overlapping canopies. Cover was generally recorded to the nearest percent. At Smith Prairie, the vegetation was assessed in both May (spring) and June (summer) to capture the variation associated with species turnover at this higher productivity site, which is characterized by a longer growing season. Where a species was recorded during both census times, the higher cover estimate was retained.

FIGURE 1 An example output of the vegetation classification procedure. On the left is the area within the quadrat and on the right, from top to bottom, is the result of grass and forb classification.
As each site was part of the NutNet study, the experimental treatments were the same. There were ten treatment types, including the impact of the boundary (Fence) as well as the addition of nitrogen (N), phosphorus (P), potassium (K), and combinations thereof. Treatments were initiated in 2009 at the Bogong site and in 2008 at the Smith Prairie site. Fertiliser application generally increases species cover, particularly the cover of graminoids in areas fenced from vertebrate consumers, but this response can vary depending on the type of grassland and on annual climatic conditions such as rainfall and temperature.

| Performance analysis
To evaluate the effectiveness of the automatic system in comparison with vegetation surveys conducted by field ecologists, we employed two performance measures. First, we evaluate how accurately the automated approach estimates the amount and type of vegetation present compared with visual estimates from ecologists. Second, we evaluate whether inferences were consistent between the two methods, the automated system and ecologists, by comparing analyses of treatment effects using automated and expert (ecologist)-derived data.
To measure how accurately the automated system estimates the amount and type of vegetation, we use two measures: (1) the correlation between the automated system and the field ecologist across all of the quadrats and (2) the kappa coefficient, which measures the agreement between the two sets of estimates beyond that expected by chance. To compare inferences about treatment effects, the cover estimates were analyzed with a linear mixed effects model (LMEM) of the general form

$$\mathbf{y}_i = \beta_0 + \sum_{j}\beta_j\,h(t_i = j) + \mathbf{Z}_i\mathbf{u}_i + \boldsymbol{\varepsilon}_i, \qquad \mathbf{u}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{D}), \qquad \boldsymbol{\varepsilon}_i \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_i),$$

where Z_i is the random effect parameter and D is the covariance matrix of random effects, u_i is the covariate associated with each treatment for the random effects, β_0 is the control treatment, ε_i is the irreducible error, Σ_i is the covariance matrix of the irreducible errors, and h(·) is a binary indicator function.
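The agreement measure can be computed as standard unweighted Cohen's kappa (our assumption; the text does not state a weighting scheme), for example:

```python
def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa between two equal-length label sequences,
    e.g. binned cover classes from the ecologist and the automated system.
    Undefined (division by zero) if chance agreement is exactly 1."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # observed agreement
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # agreement expected by chance from each rater's marginal frequencies
    p_e = sum(
        (rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels
    )
    return (p_o - p_e) / (1.0 - p_e)
```

Kappa equals 1 for perfect agreement and 0 when the observed agreement matches chance.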

| Estimating vegetation and group coverage
We found reasonable Pearson's correlations between cover estimates made by the field ecologist(s) and automated system (Table 1).
For Bogong, across the 3 years, the correlation for grass and forb is 0.59 or greater. The correlations for Smith Prairie, for grass and forb, are slightly lower but always 0.48 or greater. This trend can also be seen in Figure 3, for Bogong, and Figure 4, for Smith Prairie, which present the estimated coverage for the automated system and field ecologist. Equivalent scatter plots are given in Figure 5, for Bogong, and Figure 6, for Smith Prairie.
Examining the kappa coefficients (Table 2) reveals a comparable pattern of agreement. A limitation of the automated system is that it is unable to detect some of the peaks in vegetation coverage, in particular when the estimated coverage exceeds 1.0. The automated system cannot report a coverage value greater than 1.0 because vegetation coverage is computed as the proportion of pixels considered to be vegetation (green); consequently, the maximum value is 1.0. This explains the discrepancies in the Smith Prairie data for 2016 (see Figure 4).

| Impact of treatment type
The field ecologist(s) and automated system are reasonably correlated when we examine cover response to the treatments (i.e., nutrient and/or exclusion of vertebrate consumers) using an LMEM. We attribute the poorer performance for total vegetation coverage at Smith Prairie to the fact that the automated system describes vegetation coverage as the percentage of pixels that contain vegetation. This is not equivalent to what a field ecologist estimates. In particular, this analysis focused on the dominant species at the time the images were taken, whereas the ecologist estimated cover in both spring and summer, and some species were subordinates that grew beneath the dominant species and thus were not evident in the images.

| DISCUSSION
In this study, we have presented an automated approach that can estimate vegetation cover from digital images of quadrats taken with a regular camera and can classify that cover into two functional groups: grass and forb. We found that using the automated system would lead to similar treatment conclusions. Based on grass and forb coverage, nine of the 10 interpretations of treatment responses would be the same at each of the two sites whether using the automated photograph interpretation process or the field ecologist data, although inference was less consistent for total vegetation cover. This demonstrates the potential of an automated system to facilitate the interpretation of quadrat imagery and provide important information with consistency and accuracy over space and time similar to a field ecologist.
A particularly exciting aspect of this work for ecology is the prospect to increase the spatial and temporal resolution of vegetation studies. An automated system like this could be applied to images of larger areas than quadrats. For example, images could be obtained from a UAV during a short time window, thus avoiding errors in interpretation that arise from species turnover and the phenological changes that occur during a long field season of vegetation sampling.
In fact, those phenological changes could also be studied at large scales by taking additional images throughout the season.
There are three limitations with the current automated system.
First, the photographs convert a three-dimensional system into a two-dimensional image, which limits the ability to assess understory cover. Second, we have assumed that the scale of the images is similar for a given site and that the viewing angle of the camera does not affect the ground area that each pixel represents. Third, as noted above, total cover is capped at 100% because it is computed as a proportion of pixels, whereas field estimates of overlapping canopies can exceed 100%. Nevertheless, this method illustrates how the cover of different vegetation types can be easily collected and then quantitatively evaluated to record changes over time. In particular, these methods offer a reliable alternative or complementary approach to estimating cover that is more repeatable and likely to be more reliable over the long term, and so is especially useful for long-term monitoring. We recommend that top-down photographs be taken as a standard adjunct to any quadrat-based measurement, and possibly even top-down video of an entire site using transects. While aggregated cover estimates can reliably be made, photographs also provide a permanent record of vegetation at a point in time that can be analyzed retrospectively. As image analysis methods develop, the photographs can be reanalyzed to reflect these advances and so enable improved measurement of historical data.

CONFLICT OF INTEREST
None declared.

AUTHOR CONTRIBUTIONS
The following is a summary of the author contributions. The first author, Dr Chris McCool, designed, implemented, and ran the computer vision algorithms; he wrote much of the manuscript and collated the input from the other authors. The second author, Mr James Beattie, ran the statistical interpretation of the results comparing the computer vision algorithms and the ground truth (provided by in-field ecologists); he also wrote considerable content for the manuscript. The third author contributed to writing major sections of the manuscript, in particular the ecological interpretation of the results.

FIGURE 7 Linear mixed effects model (LMEM) response effect sizes (intercepts) for forb, grass, and total vegetation, respectively, for the Bogong High Plains (left) and Smith Prairie (right) datasets. Ground truth is based on the field ecologist estimates (Ground truth, salmon) and the results from the algorithm (model, aqua). Each row corresponds to, from top to bottom, the estimated forb, grass, and total vegetation coverage. The 95% confidence interval is included for each value.

ENDNOTES
1 The code and data used are available through the following link: https://tinyurl.com/automation-ecv.
2 Initial experiments found no improvement in performance when using the full covariance matrix.
3 Thirty quadrats were examined by a field ecologist at Smith Prairie; however, only 27 and 28 images were used, for 2015 and 2016 respectively, as the other images had inconsistent top-down views of the quadrats.