Evaluating new technology for biodiversity monitoring: Are drone surveys biased?

Abstract Drones and machine learning‐based automated detection methods are being used by ecologists to conduct wildlife surveys with increasing frequency. When traditional survey methods have been evaluated, a range of factors have been found to influence detection probabilities, including individual differences among conspecific animals, which can thus introduce biases into survey counts. There has been no such evaluation of drone‐based surveys using automated detection in a natural setting. This is important to establish, since any biases in counts made using these methods will need to be accounted for to provide accurate data and improve decision‐making for threatened species. In this study, a rare opportunity to survey a ground‐truthed, individually marked population of 48 koalas in their natural habitat allowed for direct comparison of the factors affecting detection probability in ground observation and in drone surveys with manual and automated detection. We found that sex and host tree preferences affected detection in ground surveys and in manual analysis of drone imagery, with female koalas likely to be under‐represented, and koalas higher in taller trees detected less frequently when present. Tree species composition of a forest stand also affected detections. In contrast, none of these factors affected automated detection. This suggests that the combination of drone‐captured imagery and machine learning does not suffer from the same biases that affect conventional ground surveys. This provides further evidence that drones and machine learning are promising tools for gathering reliable detection data to better inform the management of threatened populations.

areas more rapidly than ground surveyors, often where it may not be feasible or practical for ground-based researchers to conduct surveys (Koh & Wich, 2012; Seymour et al., 2017). While the data captured by drones have most often been analyzed manually, machine learning-based methods are being used with increasing frequency to automate detection (Corcoran et al., 2019; Lhoest et al., 2015; Longmore et al., 2017; Oishi et al., 2018; Seymour et al., 2017). This has led to faster and more accurate counts of wildlife populations, with fewer false-negative results than manual image analysis and conventional ground surveys (Corcoran et al., 2019; Gonzalez et al., 2016; Lhoest et al., 2015; Longmore et al., 2017; Oishi et al., 2018; Seymour et al., 2017).
Given the surge in the use of drones and machine learning, and their potential importance in conservation programs, it is important to establish their limits and any possible biases. In traditional surveys, for example, accounting for the probability of detecting individual animals is fundamental to accurate abundance estimation (Royle & Dorazio, 2006). Unaccounted-for biases in detection introduced by new survey methods could mask declines if the new methods have a higher probability of detection than currently used methods (Kery & Schmidt, 2008). Conversely, they could erroneously suggest declines if the new methods have a lower probability of detection than currently used methods (Kery & Schmidt, 2008). Differences among individuals in demographic traits, behaviors, and habitat selection can lead to hidden biases that produce inaccurate population counts when traditional survey methods are used (Biro, 2013; Carter et al., 2012; Merrick & Koprowski, 2017). For example, the abundance of wolves in France was underestimated by 27% when individual differences in detectability were not considered (Marescot et al., 2011).
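Both failure modes can be made concrete with a little arithmetic. The sketch below assumes the simplest count model, in which each animal is detected independently with probability p, so the expected count is N·p and a detection-corrected estimate is N̂ = C / p. All abundances and detection probabilities are hypothetical values chosen for illustration, not data from this or any cited study.

```python
def expected_count(abundance: float, detection_prob: float) -> float:
    """Expected raw count if each animal is detected independently with probability p."""
    return abundance * detection_prob


def corrected_abundance(count: float, detection_prob: float) -> float:
    """Simple detection-corrected estimate: N_hat = C / p."""
    if not 0 < detection_prob <= 1:
        raise ValueError("detection_prob must lie in (0, 1]")
    return count / detection_prob


# Scenario 1 (hypothetical): a real 20% decline (100 -> 80 animals) coincides
# with a switch from a method with p = 0.50 to a more sensitive one with p = 0.75.
old_count = expected_count(100, 0.50)   # 50.0
new_count = expected_count(80, 0.75)    # 60.0
raw_change = (new_count - old_count) / old_count   # +0.2: raw counts suggest growth

old_n = corrected_abundance(old_count, 0.50)   # 100.0
new_n = corrected_abundance(new_count, 0.75)   # 80.0
corrected_change = (new_n - old_n) / old_n     # -0.2: the decline is recovered

# Scenario 2 (hypothetical): equal numbers of males and females, but females
# are less detectable, so they are under-represented in the raw counts.
males_seen = expected_count(50, 0.75)    # 37.5
females_seen = expected_count(50, 0.50)  # 25.0
female_share_in_counts = females_seen / (males_seen + females_seen)  # 0.4, not 0.5
```

The correction only works if p is known for each method and group; this is why unmeasured differences in detectability, whether between methods or between individuals, translate directly into biased counts.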
While efforts have been made to understand variables that influence detection probabilities in drone surveys (Baxter & Hamilton, 2018), no study has yet been able to robustly compare these factors with those affecting traditional survey methods. Traits such as sex, age, and reproductive status, and behaviors such as movement, burrowing, and microhabitat preference, that differ among individual animals have been shown to affect detection probability in remotely sensed camera trap data (Karp, 2020; Meek et al., 2014, 2016). These factors are important to consider because they may make certain demographic groups within a population less likely to be detected and therefore under-represented in counts. This leads to underestimates of abundance, with important conservation implications (Brack et al., 2018; Karp, 2020; Meek et al., 2014, 2016). We explore this issue by comparing the factors that affect the probability of expert ground observers detecting koalas (Phascolarctos cinereus) with those that affect the probability of detecting the same koala population in manual and automated analysis of thermal images collected from drones. Koalas are a vulnerable Australian mammal species that is cryptic, widespread, and often occurs in inaccessible areas (Beyer et al., 2018; McAlpine et al., 2015). Automated analysis of drone-derived thermal imagery of koalas has been shown to be more accurate than ground-based detection (Corcoran et al., 2019) and to be useful for koala detection across a range of environments. How individual traits and behaviors such as sex and host tree preference affect detection of koalas with this method, and how these detectability variables compare with those affecting ground observation and manual analysis, remain unknown (Corcoran et al., 2019; Ellis & Bercovitch, 2011).
Gaining an understanding of this will provide insight into how individual differences may bias automated detection of wildlife in drone imagery more broadly and better inform the use of these new technologies so that reliable data on vulnerable populations can be gathered from which to make effective management decisions.

| METHODS
A more detailed description of the survey design and of the manual and automated image analysis methods used to collect data for model development can be found in Corcoran et al. (2019). In summary, the dataset for this study consisted of 119 observations of individually identified, radio-collared koalas surveyed at two sites in Petrie, Queensland.
The north site was approximately 1.1 km by 0.5 km in size, and the south site approximately 1.0 km by 0.7 km. The radio-collared koalas were surveyed five times between May and August 2018 using a […]. All surveys were conducted during predawn hours, at temperatures ranging from 8.0 to 21.7°C, to maximize the contrast between the body temperature of koalas and their surroundings (Corcoran et al., 2019). A ground survey by expert observers was conducted on the same day as each of the five drone surveys, giving a total of five ground surveys. All radio-collared koalas present within the study sites at the time of surveying were tracked using conventional radio telemetry, and their GPS locations were recorded. Because koalas were free to move in and out of the study sites, some individuals were observed in all or several surveys and others in only one.
Determining the factors that affect the probability of detection is generally challenging because availability (whether an individual animal is present in the area and thus available to be detected) is difficult to disentangle from the capacity to detect the individual once it is available (Brack et al., 2018). Because the koalas in this study were radio-collared, their exact locations were known.
This means that every nondetection can be attributed with certainty to a koala being present but not seen. Ground observers therefore assigned a visibility score, which can be considered an independent proxy for ground-based detection probability, to each individual koala identified within the site during each survey. Scores ranged from 0 to 5, with 0 meaning completely obscured from the ground observer's view at all angles and 5 meaning completely unobscured from all angles.
No visibility score of 5 was allocated by the ground observers at any time, reducing the scale to five points, from 0 to 4. The identity, sex, and height in tree of individual koalas were recorded, as were the genus, diameter at breast height (DBH), and height of host trees. The heights of host trees and of koalas in host trees were measured using a clinometer. The distance of koalas below the top of host trees was then calculated by subtracting the height of koalas from the total height of host trees. Tree genus was divided into three categories: Corymbia spp., Eucalyptus spp., and "Other", which included tree species of the Casuarina, Lophostemon, Ulmus, Melaleuca, Acacia, Cinnamomum, and Alphitonia genera.
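The derived covariates described above are simple to compute. The sketch below uses a hypothetical record layout, not the authors' data structure: field names are illustrative, heights are assumed to be in metres, and DBH is assumed to be in centimetres (the text does not state the unit). It derives the distance of a koala below the top of its host tree and maps the recorded genus onto the three categories used in the models.

```python
from dataclasses import dataclass

# Genera pooled into the "Other" category in the text.
OTHER_GENERA = frozenset({
    "Casuarina", "Lophostemon", "Ulmus", "Melaleuca",
    "Acacia", "Cinnamomum", "Alphitonia",
})


def genus_category(genus: str) -> str:
    """Map a host-tree genus onto the three modelling categories."""
    if genus == "Corymbia":
        return "Corymbia spp."
    if genus == "Eucalyptus":
        return "Eucalyptus spp."
    if genus in OTHER_GENERA:
        return "Other"
    raise ValueError(f"genus not covered by the study's categories: {genus!r}")


@dataclass(frozen=True)
class KoalaObservation:
    """Hypothetical record for one koala in one survey."""
    koala_id: str
    sex: str                 # "F" or "M"
    koala_height_m: float    # height of koala above ground (clinometer)
    tree_height_m: float     # total host-tree height (clinometer)
    tree_dbh_cm: float       # diameter at breast height (unit assumed)
    tree_genus: str

    @property
    def height_below_top_m(self) -> float:
        """Distance below the treetop: tree height minus koala height."""
        return self.tree_height_m - self.koala_height_m

    @property
    def genus_group(self) -> str:
        return genus_category(self.tree_genus)
```

For example, `KoalaObservation("K01", "F", 12.5, 20.0, 35.0, "Melaleuca")` yields a distance below the treetop of 7.5 m and the genus group "Other".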
The drone-derived thermal images were analyzed both manually, by a researcher evaluating the footage frame by frame and recording instances and co-ordinates of possible koalas, and with the automated koala detection algorithm described in full in Corcoran et al. (2019). This algorithm pipeline combined two previously published deep convolutional neural networks, Faster R-CNN and YOLO, whose results were fused so that only objects detected above a set confidence threshold (0.05) were recorded as possible koalas (Corcoran et al., 2019; Redmon & Farhadi, 2016; Ren et al., 2015). The authenticity of possible koalas identified in both manual and automated image analysis was verified by comparing their co-ordinates with those of the radio-collared koalas confirmed to be on site by the telemetry-guided ground observers. For each radio-collared koala present within the survey site at the time of ground and drone surveying, it was then recorded whether it was detected (1) or not detected (0) in manual and in automated image analysis.
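Verifying candidate detections against telemetry positions amounts to a spatial matching step that turns each survey into a 1/0 outcome per collared koala. The sketch below is one way to do this, not the procedure of Corcoran et al. (2019): it assumes coordinates in a projected system in metres, a hypothetical distance tolerance `max_dist_m`, and a greedy nearest-neighbour match in which each detection can confirm at most one koala. Detections left unmatched are returned as candidate false positives.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def match_detections(
    collared: Dict[str, Point],
    detections: List[Point],
    max_dist_m: float,
) -> Tuple[Dict[str, int], List[Point]]:
    """Return ({koala_id: 1 if detected else 0}, unmatched detections)."""
    outcomes: Dict[str, int] = {}
    unused = list(detections)
    for koala_id, (kx, ky) in collared.items():
        best_index, best_dist = None, max_dist_m
        # Find the closest still-unused detection within the tolerance.
        for i, (dx, dy) in enumerate(unused):
            dist = math.hypot(kx - dx, ky - dy)
            if dist <= best_dist:
                best_index, best_dist = i, dist
        if best_index is None:
            outcomes[koala_id] = 0   # present (telemetry) but not detected
        else:
            outcomes[koala_id] = 1   # a detection lies within tolerance
            unused.pop(best_index)   # each detection confirms one koala only
    return outcomes, unused
```

A greedy match is order-dependent when two koalas sit close together; an optimal assignment (e.g. the Hungarian algorithm) would avoid this, at the cost of more code.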
Shapiro-Wilk tests showed that the height of host trees (W = 0.98, p = .14) and the height of koalas above ground (W = 0.98, p = .08) did not deviate significantly from normality, whereas the distance of koalas below the top of the host tree (W = 0.90, p = 1.84 × 10⁻⁶) and the DBH of host trees (W = 0.82, p = 2.56 × 10⁻⁹) were not normally distributed.
Ordinal regression models were therefore developed to investigate the effect of covariates on the detectability of individual radio-collared koalas to ground observers, because the response variable, the five-point visibility rating, was ordered categorical and some of the covariates investigated were not normally distributed. These models were fitted with the "polr" function of the "MASS" package in R (Venables & Ripley, 2002). Because the response variable for manual and automated analysis of drone images was binary (detected = 1, undetected = 0) and some of the covariates investigated were not normally distributed, generalized linear models (GLMs) with binomial error distributions and logit link functions were developed for the probability of detection with these methods using the "lme4" R package (Bates et al., 2015).
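The two model families differ in how a linear predictor η is turned into probabilities. In the proportional-odds (cumulative logit) model fitted by MASS::polr, logit P(Y ≤ k) = ζ_k − η, where the ζ_k are increasing cut-points, so a positive coefficient shifts probability toward higher visibility scores; in a binomial GLM with a logit link, P(detected) = 1 / (1 + e^(−η)). The Python sketch below reproduces both mappings; the cut-points are hypothetical placeholders, not fitted values from this study.

```python
import math
from typing import List, Sequence


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probs(eta: float, cutpoints: Sequence[float]) -> List[float]:
    """P(Y = k), k = 0..K, under logit P(Y <= k) = zeta_k - eta (polr parameterization)."""
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    cumulative = [logistic(zeta - eta) for zeta in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)   # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = c
    return probs


def binomial_detection_prob(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """P(detected = 1) for a logit-link binomial GLM."""
    eta = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return logistic(eta)


# Hypothetical cut-points for visibility scores 0-4 (four cut-points, five classes).
CUTPOINTS = [-2.0, -0.5, 0.5, 2.0]
```

Note the sign convention: because polr subtracts η from the cut-points, a positive β (such as a higher-scoring group) lowers P(Y ≤ k) for every k and so raises the chance of a higher score.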
Separate models of detectability of koalas by ground researchers, in manual analysis, and in automated analysis of drone-derived thermal images were developed using the same training dataset, which comprised a random 80% split of the total dataset (n = 95). The data were split into separate training and testing datasets in this way to ensure that predictions made with the final models were not biased by predicting values for data on which the models had already been trained. Models of the probability of detecting koalas with each method were developed separately in a forward stepwise manner.
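A reproducible version of the random 80/20 split is sketched below. The seed is an arbitrary, hypothetical choice (the text does not report one), and the split is made at the level of individual observations, as described above. With 119 observations, rounding 0.8 × 119 = 95.2 gives the 95 training and 24 test observations reported.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    records: Sequence[T], train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and split it into training and test sets."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    rng = random.Random(seed)        # local generator: reproducible, no global state
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


observations = list(range(119))      # stand-ins for the 119 koala observations
train, test = train_test_split(observations, seed=2018)
```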
First, univariate models were evaluated for each variable listed in Table 1 and ranked by Akaike information criterion (AIC), residual deviance, and p-value. Interactions between host tree height and koala height above ground, host tree height and koala distance below the treetop, and host tree height and host tree DBH were also investigated as possible covariates. Subsequently, multivariate models with all possible combinations of the covariates that were significant in univariate testing were evaluated. A covariate was retained in the final model only if it reduced AIC by at least two compared with the model with the next best fit (Bozdogan, 1987; Terletzky & Koons, 2016). McFadden's pseudo R-squared values were then calculated for the final models to determine the percentage of variation in the response variable that could be explained by the selected model covariates.
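The quantities used in model selection have standard definitions: AIC = 2k − 2 ln L̂ for a model with k estimated parameters, and McFadden's pseudo R² = 1 − ln L̂_model / ln L̂_null. The sketch below pairs these with a greedy forward-selection loop that adds a covariate only if it lowers AIC by at least 2. It is a simplified illustration, not the authors' exact procedure (which also screened covariates by univariate p-values and compared all combinations of the significant ones); `fit_aic` is a hypothetical callback standing in for whatever routine fits a model and returns its AIC.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    return 2 * n_params - 2 * log_likelihood


def mcfadden_r2(ll_model: float, ll_null: float) -> float:
    """McFadden's pseudo R-squared from model and intercept-only log-likelihoods."""
    return 1.0 - ll_model / ll_null


def forward_select(
    candidates: Iterable[str],
    fit_aic: Callable[[FrozenSet[str]], float],
    min_improvement: float = 2.0,
) -> Tuple[List[str], float]:
    """Greedily add the lowest-AIC covariate while it improves AIC by >= min_improvement."""
    selected: List[str] = []
    remaining = list(candidates)
    current = fit_aic(frozenset())          # intercept-only model
    while remaining:
        scores = {c: fit_aic(frozenset(selected + [c])) for c in remaining}
        best = min(scores, key=scores.get)
        if current - scores[best] < min_improvement:
            break                            # no candidate improves AIC enough
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected, current
```

With a toy lookup table in which adding "height" lowers AIC from 100 to 90 but then adding "sex" only lowers it to 89.5, the loop keeps "height" alone, because the second improvement (0.5) falls short of the threshold of 2.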
The final models were then used to predict the probability of detection for koalas in manual and automated image analysis and which visibility rating they would receive from ground observers.
The classification rate of each model was calculated as the proportion of test data correctly classified, either as "undetected" or "detected" (manual and automated analysis of drone-derived thermal images) or into the correct visibility score from 0 to 4 (ground observation). The dataset used to evaluate predictions from the final models comprised the remaining 20% of the randomly split data (n = 24).
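The classification rate is simply the share of test cases whose predicted class matches the observed one. The sketch assumes, as a hypothetical rule not stated in the text, that binary predictions use a 0.5 probability threshold and that ordinal predictions take the most probable visibility score. As an arithmetic check, with n = 24 test observations the 54.17% rate reported for the ground-observer model corresponds to 13 correct classifications (13/24 ≈ 0.5417).

```python
from typing import Sequence


def predict_binary(prob_detected: float, threshold: float = 0.5) -> int:
    """Classify as detected (1) if the predicted probability reaches the threshold."""
    return int(prob_detected >= threshold)


def predict_ordinal(class_probs: Sequence[float]) -> int:
    """Return the visibility score (index) with the highest predicted probability."""
    return max(range(len(class_probs)), key=lambda k: class_probs[k])


def classification_rate(observed: Sequence[int], predicted: Sequence[int]) -> float:
    """Proportion of cases whose predicted class equals the observed class."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)
```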
To examine differences in the host trees used by males and females, t tests were conducted to compare mean host tree height and mean koala height above ground between sexes, while Mann-Whitney U tests were conducted to compare host tree DBH and koala distance below the treetop between sexes.
TABLE 1 List of covariates used in models of probability of manual and automatic detection of koalas in RPAS-derived thermal imagery, and visibility rating given to koalas by ground observers

| RESULTS
The sex of koalas significantly affected the visibility rating assigned to individuals by experienced ground observers (p < .01). Male koalas were more likely to receive higher ratings of 3 or 4, whereas females were more likely to receive lower ratings of 1 or 2. Overall, male koalas were 3.817 (β koala sex = 1.16 ± 0.41) times more likely to receive a higher visibility rating than females.
Tree genus also significantly affected the visibility of koalas to ground observers, specifically in the comparison of Corymbia spp. host trees with Eucalyptus spp. host trees (p = .02). Visibility of koalas was similar in Eucalyptus spp. and in trees of "other" genera: koalas in Eucalyptus trees and in "other" trees were 3.50 (β Eucalyptus = 1.25 ± 0.52) and 3.31 (β Other = 1.20 ± 0.62) times more likely, respectively, to receive a higher rating than koalas in Corymbia spp.
Height of koalas above ground also had a significant effect on the visibility rating given to individuals by ground observers (p = 1.35 × 10⁻⁵). Koalas positioned higher above the ground were more likely to receive a lower rating: for every 1 m increase in height, the odds of receiving a higher visibility score were multiplied by 0.82 (β koala height above ground = −0.20 ± 0.05). Koalas positioned higher above the ground during surveys were therefore more difficult for ground-based observers to detect.
Finally, host tree height significantly affected the visibility rating given to koalas by ground observers (p < .01). For every 1 m increase in host tree height, the odds of receiving a higher visibility score were multiplied by 0.903 (β Host tree height = −0.10 ± 0.04), so koalas in taller trees tended to receive lower ratings. Host tree DBH (p = .36), distance of koalas from the top of host trees (p = .23), and the identity of the observer who assigned the score (p = .28) were not significant predictors of the visibility rating ground observers gave koalas.
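The multiplicative effects quoted above are exponentiated ordinal-regression coefficients, OR = exp(β). The short check below recomputes them from the rounded β values reported in the text, so small discrepancies (for example 3.49 vs 3.50) reflect rounding of β. It also shows that per-metre effects compound over larger height differences, as exp(10 × β) for a 10 m difference.

```python
import math

# Rounded coefficients as reported in the text (ordinal regression, ground observers).
coefficients = {
    "Eucalyptus vs Corymbia": 1.25,
    "Other vs Corymbia": 1.20,
    "koala height above ground (per m)": -0.20,
    "host tree height (per m)": -0.10,
}

# Odds ratio for each coefficient: exp(beta).
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}
# Eucalyptus ~3.49, Other ~3.32, height above ground ~0.819, host tree height ~0.905

# Per-metre effects compound multiplicatively: a koala 10 m higher has
# exp(10 * -0.20) ~ 0.135 times the odds of a higher visibility score.
ten_metre_or = math.exp(10 * coefficients["koala height above ground (per m)"])
```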
The top-performing model of visibility rating for koalas observed by ground observers was a multivariate ordinal regression model with koala height above ground, koala sex, and host tree genus as covariates (Table 2). This model explained 14.95% of variation in visibility score (McFadden's pseudo R-squared = 0.15) and achieved a classification rate of 54.17% for the test dataset.
Tree genus significantly affected the probability of manually detecting koalas in thermal imagery, specifically in the comparison of trees of "other" genera with Corymbia spp. trees (p < .01). Koalas in "other" trees were 0.11 times as likely to be detected as koalas in Corymbia trees (β Other = −2.23 ± 0.79). Host tree height also significantly affected the probability of manual detection (p = .04). For every 1 m increase in host tree height, koalas were 1.08 times as likely to be detected (β Host tree height = 0.081 ± 0.039); koalas resting in taller trees during drone surveys were therefore more likely to be detected in manual analysis than koalas in shorter host trees.
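Because these effects come from a logit-link model, the factors of 0.11 and 1.08 multiply the odds of detection, p / (1 − p), rather than the probability itself; how much the probability changes depends on the baseline. The sketch below converts an odds ratio into a new probability for hypothetical baseline detection probabilities (0.9 and 0.5), which are illustrative assumptions rather than estimates from the study.

```python
import math


def apply_odds_ratio(p_baseline: float, odds_ratio: float) -> float:
    """Probability after multiplying the baseline odds p / (1 - p) by an odds ratio."""
    if not 0 < p_baseline < 1:
        raise ValueError("p_baseline must lie strictly between 0 and 1")
    odds = p_baseline / (1 - p_baseline) * odds_ratio
    return odds / (1 + odds)


# Coefficients reported for manual detection in drone imagery.
beta_other = -2.23         # "Other" genera vs Corymbia
beta_tree_height = 0.081   # per metre of host tree height

# Hypothetical baseline: a koala in a Corymbia tree detected with p = 0.9 ...
p_other = apply_odds_ratio(0.9, math.exp(beta_other))               # ~0.49
# ... and a koala detected with p = 0.5 whose host tree is 10 m taller.
p_taller = apply_odds_ratio(0.5, math.exp(10 * beta_tree_height))   # ~0.69
```

Under these assumed baselines, an odds ratio of 0.11 roughly halves a high detection probability rather than cutting it to a ninth, which is why odds ratios should not be read directly as ratios of probabilities.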
Koala sex (p = .48), host tree DBH (p = .79), height of koalas from ground (p = .12), and distance of koalas from tree tops (p = .18) had no significant effect on probability of manual detection of koalas in RPAS-derived thermal imaging.
The top-performing model of probability of manual detection was a univariate logistic regression (binomial GLM) with host tree genus as the explanatory variable, as the multivariate model with host tree genus and host tree height did not provide a lower AIC.
In contrast to ground-based observations and manual detections in drone-captured thermal imagery, none of the covariates investigated explained a significant amount of the variation in the probability of detecting koalas in drone-derived thermal images using automated methods (Table 4).
A summary comparison of the effects of each covariate on the probability of detecting koalas using ground observation and manual and automated analysis of drone-acquired thermal imagery can be found in Table 5.

| DISCUSSION
A key finding of this study was that none of the traits of koalas or their host trees had a significant effect on the probability of detecting individuals in automated analysis of drone-derived thermal imagery. Conversely, several of these factors significantly affected ground-based detection by experts and even manual detection of koalas in drone-derived thermal imagery. This is significant in light of recent work that has proposed using drone thermal imagery to detect koalas without automated image analysis (Beranek et al., 2020). The results of the current study suggest that attempts to detect koalas without machine learning approaches may yield significant biases. More broadly, this finding suggests that drone-based automated detection methods have the potential to overcome biases in detection due to environmental and behavioral factors (Corcoran et al., 2019; Hamilton et al., 2020).
When using traditional ground-based counts for koalas, this analysis suggests that female koalas are likely to be under-represented.
A possible explanation is the size difference between the sexes: adult female koalas have a significantly lower average body mass than males (6.2 ± 0.2 kg vs. 7.1 ± 0.3 kg), making females smaller targets for ground observers and more easily obscured by foliage between the animal and the observer (Ellis & Bercovitch, 2011). Koalas higher in taller trees may also be detected less often, likely because they were further from the ground observer. This may also explain why koalas in taller trees were more likely to be found in manual analysis of drone imagery, as koalas higher above the ground were closer to the overhead drone-mounted camera. The tree species composition of a forest stand affected detection in both ground observation and manual image analysis.
A surprising result was that ground observers found it harder to detect koalas in Corymbia spp. host trees than in trees of Eucalyptus spp. and other genera, given the morphological similarities between trees of the Corymbia and Eucalyptus genera (Vlasveld et al., 2018). Further investigation revealed a higher ratio of female koalas in Corymbia host trees, which may explain part of the decreased detectability of animals in these trees, as smaller female koalas were more difficult to detect. However, as this […]
While drone data have the potential for false-positive detections, methods for abundance estimation have been developed to account for this. A review by Kellner and Swihart (2014) […] (McKellar et al., 2020). However, thus far only manual analysis of simultaneously collected RGB and thermal imagery from drones has been conducted, and it remains unknown whether these two data streams could be integrated into automated image analysis methods, owing to difficulties synchronizing the images collected with RGB and thermal infrared sensors (McKellar et al., 2020).
The capture of thermal data from drones in combination with machine learning may have advantages over human observers because the thresholds and features that computer vision uses to identify koala heat signatures differ from those used by the human eye, allowing detection even when animals are partially obscured (Lhoest et al., 2015; Longmore et al., 2017; Oishi et al., 2018; Seymour et al., 2017). This is an intriguing possibility considering that in this experiment human observers had an advantage: they knew not only that koalas were available to be detected but also, because they tracked radio-collared koalas, roughly where to search in the canopy. Where radio telemetry is not used, we would expect ground-based observers to perform worse than in this experiment. Compounding this, observer expertise and fatigue can reduce detections in traditional surveys, whereas they have no effect on detection with machine learning (Lhoest et al., 2015; Longmore et al., 2017; Oishi et al., 2018; Seymour et al., 2017).
While this study was conducted at a single location with several sampling events and with a single species, the rapid increase in the number of drone surveys for threatened species using automated detection suggests that it is important to continue to evaluate the methodology comprehensively. Every combination of species and environment will have its own set of factors affecting detectability. However, as demonstrated in this study, machine learning algorithms can detect cryptic koalas in a complex and challenging environment without being subject to the same observation biases as traditional detection methods. This provides some reassurance that these methods are robust for wildlife surveys, and we look forward to additional studies that will contribute to this important topic.

ACKNOWLEDGMENTS
We thank Jon Hanger, Bree Wilson, and all members of Endeavour Veterinary Ecology who assisted in designing and conducting ground surveys. This work was enabled by use of the Research Engineering Facility hosted by the Institute for Future Environments at QUT.
Funding for surveys was provided by the Queensland Government.

E.C. was supported by an Australian Government Research Training Program scholarship.

CONFLICT OF INTEREST
The authors declare no conflicts of interest.