Structure from motion photogrammetry in ecology: Does the choice of software matter?

Abstract Image-based modeling, and more precisely Structure from Motion (SfM) and Multi-View Stereo (MVS), is emerging as a flexible, self-service remote sensing tool for generating fine-grained digital surface models (DSMs) in the Earth sciences and ecology. However, drone-based SfM + MVS applications have developed at a rapid pace over the past decade and there are now many software options available for data processing. Consequently, reproducibility issues caused by variations in software choice, and their influence on data quality, are relatively poorly understood. This understanding is crucial for the development of SfM + MVS if it is to fulfill a role as a new quantitative remote sensing tool to inform management frameworks and species conservation schemes. To address this knowledge gap, a lightweight multirotor drone carrying a Ricoh GR II consumer-grade camera was used to capture replicate, centimeter-resolution image datasets of a temperate, intensively managed grassland ecosystem. These data allowed the exploration of method reproducibility and the impact of SfM + MVS software choice on derived vegetation canopy height measurement accuracy. The quality of DSM height measurements derived from four different, yet widely used, SfM + MVS software packages (Photoscan, Pix4D, 3DFlow Zephyr, and MICMAC) was compared with in situ data captured on the same day as the images. We used both traditional agronomic techniques for measuring sward height and a high accuracy and precision differential GPS survey to generate independent measurements of the underlying ground surface elevation. Using the same replicate image dataset (n = 3) as input, we demonstrate that there are 1.7, 2.0, and 2.5 cm differences in RMSE (excluding one outlier) between the outputs from different SfM + MVS software using "High," "Medium," and "Low" quality settings, respectively. Furthermore, we show that there can be a significant difference, although of small overall magnitude, between replicate image datasets (n = 3) processed using the same SfM + MVS software and following the same workflow, with a variance in RMSE of up to 1.3, 1.5, and 2.7 cm (excluding one outlier) for "High," "Medium," and "Low" quality settings, respectively. We conclude that SfM + MVS software choice does matter, although the differences between products processed using "High" and "Medium" quality settings are of small overall magnitude.


| INTRODUCTION
There is a pressing need within ecology for spatial data that can deliver information about ecosystem functional traits and their dynamics through time. Due to the rapid and at times complex nature of ecosystem dynamics, it is critical to have access to agile, effective, and reproducible methods for capturing key habitat or species traits such as canopy structure. Such data can allow differentiation between early trends and short-term fluctuations and can also be used for identifying and establishing conservation sites with specific protected features (Fourcade & Öckinger, 2017). An example habitat requiring such information is high-value temperate grasslands, which are threatened by agricultural intensification (Fritch, Sheridan, Finn, McCormack, & Ó hUallacháin, 2017; Ridding, Redhead, & Pywell, 2015) and climate change (Ibáñez et al., 2013; McCauley, Ribic, Pomara, & Zuckerberg, 2017). Remote sensing techniques have proven their worth in delivering spatio-temporal data for evaluating ecosystem dynamics across a range of ecosystems (Dalponte, Frizzera, & Gianelle, 2018; Lesak et al., 2011; Luoto, Toivonen, & Heikkinen, 2002; Mori, Tatsumi, & Gustafsson, 2017; Phinn, Menges, Hill, & Stanford, 2000), but in grassland systems there are methodological challenges. Airborne LiDAR-derived data products potentially provide the best opportunity for gathering fine-grained measurements describing grassland vegetation structure (Müller et al., 2018), but laser penetration through the canopy can be inconsistent and factors including vegetation canopy density can bias results (Luscombe et al., 2015). Hence, it is not straightforward to determine whether the signals originate from the canopy or the soil surface, or whether the signal represents something in between (Bretar & Chehata, 2007; Yang, Ni-Meister, & Lee, 2010).
Consequently, new techniques are needed for delivering operational, cost-effective measurements describing the spatial distribution of fine-grained canopy structure in such ecosystems (Forsmoo, Anderson, Macleod, Wilkinson, & Brazier, 2018).
The emergence of SfM + MVS-based data analysis approaches has been complemented in recent times by an upsurge in drone-based environmental monitoring (Anderson & Gaston, 2013). The two approaches combined offer a means of executing a workflow for low cost and frequent capture of fine-grained data to generate surface structural models, including digital surface models (DSMs) from which vegetation height metrics may be obtained (Dandois, Olano, & Ellis, 2015;Forsmoo et al., 2018).
The quality of drone and SfM + MVS-based models depends on a range of factors including type of camera used and flying speed and altitude, with work by O'Connor, showing how varying camera settings can impact SfM + MVS-based data products. There are also issues of methodological uncertainty to consider, for example the impact of lighting conditions and image overlap on resultant model quality (James, Robson, & Smith, 2017). Additionally, there are now a great number of commercial or free and/or open-source SfM + MVS software options available for researchers and stakeholders to use. Table 1 summarizes the software packages that are available, but restricts the list to those with GPS-based capabilities, since these can be used to generate spatially meaningful mapping products. From a user's perspective, it is difficult to evaluate which of these software options is optimal, because there is a lack of comparative work that evaluates the products against a consistent baseline. This is particularly true with respect to proprietary SfM + MVS-based software, where there is little to no information on the algorithms used (Smith, Carrivick, & Quincey, 2016; Verhoeven et al., 2015). Indeed, Fraser and Congalton (2018) call for more analysis on SfM + MVS-based approaches. Hence, there is a need to quantify the influence of software on data quality, and yet to our knowledge, there have been no statistically robust investigations of this type. This makes it challenging to attribute differences in results to variations in the SfM + MVS-based method (e.g., software used). This problem limits the quantitative understanding of change in ecosystems surveyed using an SfM + MVS-based workflow, and it is the problem this paper sets out to address.
The experiment described in this manuscript sought to determine the influence of SfM + MVS-based software used to process aerial photographs captured from a low-flying multirotor drone, over a low sward, intensively managed grassland system. The experiment quantifies the extent to which derived sward height measurements can be replicated and thus facilitates the adoption of SfM + MVS-based workflows for land management frameworks and conservation schemes. We explored and evaluated this problem by quantifying the influence of the choice of SfM + MVS software and replicate image acquisition workflows. Specifically, the following hypotheses were tested:
1. Three independently captured, replicate image datasets taken over the same field, but from different drone flights (where the drone followed the same preprogrammed flightplan), and processed using the same SfM + MVS workflow can produce significantly different digital surface models (DSMs).

KEYWORDS
drone, elevation model, photogrammetry, reproducibility, structure from motion and multiview stereo, sward height

| Study area
The study area was a single agricultural field (8,059 m²) located on a grazed, organic dairy farm in Cornwall, southwest England (50°12′09.5″N 5°09′28.4″W, 90 m above mean sea level) with a surface cover of Lolium perenne (perennial ryegrass) and Trifolium pratense (red clover). The site included a 25 × 20 m patch of set-aside, unmanaged grassland. The site was chosen because there is a need to understand short sward ecosystems where it is difficult to derive high quality DSMs (Forsmoo et al., 2018; Zahawi et al., 2015). The site was gently sloping with a maximum elevation of 90.8 m (HAMSL) and minimum elevation of 86.8 m (HAMSL).

| In situ sward height and topographic validation data
In situ data were collected using a centimeter precision and accuracy differential GPS (DGPS; a Leica GS08plus base and rover GNSS system). Over 2 days, and immediately following the drone flight acquisitions, 236 DGPS data points were collected inside the area covered by the SfM + MVS DSM (6,800 m²). The DGPS points were collected across the full spatial extent of the field using a systematic survey pattern, walking along near-linear transects where the direction and sampling frequency were varied according to the perceived degree of topographic heterogeneity. Data points were collected more frequently where the perceived topographic heterogeneity was greater, that is, where breaks in slope occurred. In addition to the DGPS data points, sward height measurements were collected using a drop disk (Stewart, Bourn, & Thomas, 2001; Waring, 1992).

| Drone aerial photography survey
A small multirotor drone (3D Robotics Iris) was used to obtain aerial photographic data of the field on 21 June 2016, when the grass was in a period of active growth. The mean wind speed during the flight was 2 m s⁻¹. The 3DR Iris was chosen due to its low cost (US$400), good reputation regarding flight stability and low rate of mechanical and electrical failures, lightweight construction (1,020 g take-off weight), and ease of use. A multirotor drone was chosen over a fixed wing drone due to the small area covered and to reduce photographic motion blur. A fixed, prime lens consumer-grade digital camera (Ricoh GR II) was used to capture the images, and a Pixhawk autopilot guided the drone along a waypointed route (see Figure 1a-c). A more detailed description of the camera settings is outlined in Forsmoo et al. (2018).
Mission Planner (ver. 1.3.38) software was used to prepare the flight. A cross-stitch lawnmower flight pattern was chosen (Figure 1c), with 70%/70% side/forward overlap in each of the two directions of the grid. Fourteen georeferenced high contrast markers were dispersed throughout the study area using a cluster of ten in the center of the scene and four in two of the opposite edges of the scene, following recommendations by Cunliffe, Brazier, and Anderson (2016). The georeferenced markers were used to convert the SfM + MVS generated DSMs from a relative coordinate system to British National Grid (BNG36); these markers were surveyed in terms of their x,y,z position using the DGPS. Flying at a height of 50 m, the drone produced image data with a ground sampling distance (GSD) of between 0.52 and 0.60 cm. The survey was repeated three times using exactly the same parameters and
MICMAC was significantly more difficult to learn and took the lead author of this paper approximately 30 days, though the exact
FIGURE 2 Workflow outline. A typical SfM + MVS workflow, the workflow utilized in this study, is outlined. The major steps in terms of computational cost or labor intensity are as follows: (I) aerial images are collected using a consumer-grade drone along a waypointed route, (V) generate a DSM in an absolute coordinate system (e.g., BNG36), (VI) utilize the SfM + MVS DSM and in situ collected DTM data points to calculate the sward canopy height
In terms of computational cost, three different processing workflows ("High," "Medium," and "Low") were identified for each software (n = 4). These settings were used for each replicate image dataset (n = 3) to explore how accuracy depends on the theoretical grade of desktop workstation or server the user has access to (see Table 3).

| DSM generation
Sward height validation points located in edges with poor image overlap (n < 3) and/or which were not covered by either of the dense

| Comparison of SfM photogrammetric outputs with ground validation data
To quantify the quality of the DSM generated using an SfM + MVS workflow, the SfM + MVS model was compared to sward height ground validation data. The elevation was extracted at the locations where the DGPS (soil surface elevation and sward height) was
To test for significant differences between results, a two-sided, paired t test was used with an alpha value of 0.05. This was carried out using MATLAB 2016b. More specifically, the following were tested for significance:
1. Is there a significant difference between results from different software (n = 4) when using the same image dataset and the same ground control points?
2. Is there a significant difference between replicate image datasets (n = 3) processed using the same software and workflow?
3. Is there a significant difference between the combined results (software n = 4) for replicate image datasets (n = 3)?
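The two-sided, paired t test used for the three questions above can be sketched as follows. This is a minimal illustration in Python, not the MATLAB 2016b routine used in the study: the function names and the sample heights are hypothetical, and the p-value is approximated by integrating the Student t density numerically with Simpson's rule.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def paired_t_test(a, b, steps=20000):
    """Two-sided paired t test; returns (t statistic, degrees of freedom, p-value)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / math.sqrt(n))
    df = n - 1
    # Simpson's rule (steps must be even) for the area under the density on [0, |t|].
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    p = max(0.0, 1.0 - 2.0 * total * h / 3.0)
    return t, df, p


# Hypothetical DSM-derived sward heights (cm) at the same validation points,
# taken from two replicate flights; these are not values from the study.
heights_flight_1 = [10.2, 12.5, 9.8, 15.1, 11.0, 13.4]
heights_flight_2 = [11.0, 13.1, 10.9, 15.8, 12.2, 13.9]
t_stat, dof, p_value = paired_t_test(heights_flight_1, heights_flight_2)
print(f"t = {t_stat:.2f}, df = {dof}, significant at alpha 0.05: {p_value < 0.05}")
```

The test is paired because every DSM is sampled at the same validation locations, so the comparison is made on the per-point differences rather than on two independent samples.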

| Change detection with M3C2
To understand the rationale for using M3C2, one must understand how it works. In short, M3C2 consists of two steps: First, for each point cloud a plane is fitted to the points within the radius D/2 of point i, which enables the calculation of a normal vector.
Secondly, the normal vector is used to calculate the distance be-
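The first step described above, fitting a plane to the points within radius D/2 of point i and taking its normal, can be sketched as follows. This is an illustration only and not the CloudCompare implementation: it assumes an ordinary least-squares fit of z = a*x + b*y + c, which suits near-horizontal surfaces such as a field, and the function names, the toy point cloud, and the value of D are all hypothetical.

```python
import math


def neighbours_within(points, core, radius):
    """Return the points lying within `radius` of the core point."""
    return [p for p in points if math.dist(p, core) <= radius]


def plane_normal(points):
    """Unit normal of the least-squares plane z = a*x + b*y + c through the points."""
    n = len(points)
    if n < 3:
        raise ValueError("at least three points are needed to fit a plane")
    # Centre the coordinates so the intercept drops out and large map
    # coordinates do not reduce numerical precision.
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    sxx = sxy = syy = sxz = syz = 0.0
    for x, y, z in points:
        dx, dy, dz = x - cx, y - cy, z - cz
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
        sxz += dx * dz
        syz += dy * dz
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("points are collinear in plan view; the plane is undefined")
    # Solve the 2 x 2 normal equations for the two slopes.
    a = (sxz * syy - syz * sxy) / det
    b = (sxx * syz - sxy * sxz) / det
    length = math.sqrt(a * a + b * b + 1.0)
    return (-a / length, -b / length, 1.0 / length)  # oriented upward


# Toy cloud lying exactly on the plane z = 2x + 3y + 1 (hypothetical data).
cloud = [(float(x), float(y), 2.0 * x + 3.0 * y + 1.0) for x in range(3) for y in range(3)]
D = 20.0  # hypothetical normal scale
core_point = (1.0, 1.0, 6.0)
local = neighbours_within(cloud, core_point, D / 2)
normal = plane_normal(local)
print(normal)
```

Only the normal estimation is shown, because the description of the second step is incomplete in the text above.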

| Overview of field site and drone survey
Over 90% of the field site was covered by a high degree of image overlap with at least three images per point, but with a central area of interest coinciding with the field validation points where overlap was consistently very high (see Figure 1). The remaining ~10% where image overlap was <3 images per point was excluded from the analysis. In situ measurements on the day of the drone flight showed that the mean canopy height was 11.5 cm (min: 4.9 cm, max: 48.4 cm; Figure 3).

| Reproducibility with computational cost
To understand the robustness of the software better, the significant differences between the resulting dense point clouds for each of the three replicate image datasets were computed using the M3C2 method (Lague et al., 2013). This was carried out for each software (n = 4) using CloudCompare (ver. 2.9.1; see Figures 4, S1 and S2, Appendix S1).

| Replicate image datasets
A boxplot of the RMSE for Pix4D, Photoscan, 3DFlow, and MICMAC for each of the three image datasets with "High" quality settings is shown in Figure 5. The median RMSE of the SfM + MVS-derived sward height is consistently reduced when using higher quality settings when compared to sward height validation data (n = 228; see Figures 5 and S3, Appendix S1).
To determine if there is a significant difference, overall, in derived height measurements between replicate image datasets, a paired t test was used. It was found that there was a statistically significant difference between the SfM + MVS-derived DSMs produced between each of the three replicate image datasets (first-second, first-third, and second-third), for each of the three quality settings ("High," "Medium," and "Low"; see Table 4).

| Reproducibility across software
To understand the robustness of SfM-MVS-based workflows better, the significant differences between the resulting dense point clouds were computed using the M3C2 method (Lague et al., 2013). This was carried out between each of the software (n = 4) and the second replicate image dataset using CloudCompare (ver. 2.9.1; see Figures 6, S3 and S4, Appendix S1).

| Key statistics
The number of points per unit area is not necessarily a robust indica- image residual below half a pixel, and a GCP residual below 2 cm, though the requirements differ between use cases. Table 5 allows comparison between software and, in particular, helps to identify absolute and relative differences between replicate image datasets for the "High" quality settings.

| Replicated independent image datasets and different SfM software produce significantly different DSMs
Sward height measurements derived from an SfM + MVS workflow were compared to in situ validation sward height measurements (see Figure 6). The SfM + MVS-derived measurements are compared in terms of RMSE and R². The RMSE ranged from 3.4 cm to 5.7 cm for MICMAC and 3DFlow, respectively, seen over the three replicate image datasets. The correlation coefficient (R²) was calculated as the correlation between validation sward height and the sward height measured using the proposed SfM + MVS workflow. Using a paired t test, it was found that there was a statistically significant difference between the model with lowest RMSE and the model with the highest RMSE for the first, second, and third replicate image datasets, respectively, using "High" quality settings. While improvements are significant in statistical terms, the differences, given the magnitude, are minimally important in practice. The replicate image datasets are in order, 1 to 3, from left to right (see Figure 7).
FIGURE 4 Spatial distribution of significant changes between replicate image datasets (n = 3) for four software (Photoscan, 3DFlow, Pix4D, and MICMAC) at "High" quality settings, respectively. *(ns = not significant, s = significant)
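The two accuracy measures used here, RMSE and R², can be sketched as follows. This is a minimal illustration under stated assumptions: sward height is taken as the DSM elevation minus the DGPS ground elevation at each validation point, R² is computed as the squared Pearson correlation, and all function names and sample values are hypothetical rather than taken from the study.

```python
import math
from statistics import mean


def sward_heights(dsm_elevations_m, ground_elevations_m):
    """Canopy height in cm: DSM elevation minus ground elevation at each point."""
    return [(dsm - ground) * 100.0 for dsm, ground in zip(dsm_elevations_m, ground_elevations_m)]


def rmse(predicted, observed):
    """Root mean square error between predicted and observed heights."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(mean([(p - o) ** 2 for p, o in zip(predicted, observed)]))


def r_squared(predicted, observed):
    """Squared Pearson correlation between predicted and observed heights."""
    mp, mo = mean(predicted), mean(observed)
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov * cov / (var_p * var_o)


# Hypothetical elevations (m) and drop-disk sward heights (cm).
dsm = [88.12, 88.40, 89.05, 89.61]
ground = [88.02, 88.31, 88.90, 89.50]
drop_disk_cm = [8.5, 10.0, 13.0, 12.5]
derived_cm = sward_heights(dsm, ground)
print(f"RMSE = {rmse(derived_cm, drop_disk_cm):.2f} cm, R2 = {r_squared(derived_cm, drop_disk_cm):.2f}")
```

RMSE reports the typical size of the height error in the units of the measurement, whereas R² only reports how closely the two series covary, so a DSM with a constant offset can have a high R² and a poor RMSE.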

| Is there an important difference in financial cost between software?
To allow users to quantify software differences in terms of financial cost, customizability, and ease of use, a simple matrix was developed.
The first step (see Table 6) quantifies the different software in terms of (a) customizability, (b) financial cost, (c) CPU time, (d) ease of use, and (e) range of data products, each ranked between 1 and 4 (the higher the better; in case of a tie, the same rank is given). Customizability refers to the extent to which a user can modify the core settings of the software and/or the type of analysis carried out. For example, in Photoscan and Pix4D a user is restricted to a limited number of key parameters (number of tie points, number of key points, etc.), whereas in 3DFlow and MICMAC, a user can often adjust more than 20 different parameters at each step in the processing pipeline. MICMAC gets the higher rank, though, for its flexible processing pipeline, where different modules can be combined in several different ways depending on the user's needs. It is also worth pointing out that MICMAC gets a rank of 2 in ease of use/support because, since this study was started, articles such as Rupnik et al. (2017) have been published, which simplify the learning process.
By dividing the score for each software (n = 4) for each category (n = 5) by the total score for each category, each score can be normalized (see Table 7).
With each score normalized, the user can rank the five different categories in terms of their relative importance. The normalized value is multiplied with the user-defined rank which can be adjusted depending on the project (the example values chosen below are for the study detailed herein). The score for each software and category can then be added together. Table 8 outlines an example.
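The normalization and weighting steps described above can be sketched as follows. This is an illustration only: the software labels, ranks, and weights are hypothetical placeholders, not the values in Tables 6 to 8, and the function names are assumptions.

```python
CATEGORIES = ["customizability", "financial cost", "CPU time", "ease of use", "data products"]


def normalize(scores):
    """Divide each score by the total score for its category across all software."""
    totals = [sum(row[i] for row in scores.values()) for i in range(len(CATEGORIES))]
    return {name: [value / total for value, total in zip(row, totals)]
            for name, row in scores.items()}


def weighted_totals(normalized, weights):
    """Multiply each normalized score by the user-defined rank and sum per software."""
    return {name: sum(value * weight for value, weight in zip(row, weights))
            for name, row in normalized.items()}


# Hypothetical ranks between 1 and 4 (the higher the better), one per category.
ranks = {
    "Software A": [1, 1, 4, 4, 3],
    "Software B": [2, 2, 3, 4, 4],
    "Software C": [3, 3, 2, 3, 2],
    "Software D": [4, 4, 1, 2, 2],
}
# Hypothetical user-defined importance of each category for a given project.
importance = [3, 5, 2, 4, 1]

totals = weighted_totals(normalize(ranks), importance)
for name, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {total:.2f}")
```

Because each category is normalized to sum to one before weighting, the final ranking is driven by the user-defined importance values and not by which category happens to have the largest raw scores.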

| H1. (1) Replicated independent image datasets can produce significantly different DSMs
We tested whether replicated, proximal image datasets processed using the same workflow produced statistically different topographic models. In order to test this, we collected three replicate image datasets and analyzed them using three different quality settings ("High," "Medium," and "Low"). As can be seen in Tables 4 and 5 and Figures 6 and 7 (and Figures S5-S7, Appendix S1), our results support the above hypothesis. That is, there is a statistically significant (p < 0.05) difference between each of the three replicate image datasets processed using the same workflow, including SfM + MVS software, with "High," "Medium," and "Low" settings, respectively (see Table 4). This result is something that all researchers should consider for their particular application, as the true difference could be larger in more heterogeneous systems, with a greater range of vegetation cover and more variable canopy height, for example. Reproducibility of a method is key to being able to attribute detected changes to actual changes within the system of concern, and not to artificial differences over time introduced by the methodological approach. To address the variance between replicate image datasets processed using an SfM + MVS workflow, we suggest incorporating replicate image datasets in an SfM + MVS workflow. This has already been outlined as an important consideration by Dandois et al. (2015), who collected five replicate image datasets and used the average of the replicate image datasets for further analysis. However, most studies to date do not acknowledge the reproducibility limitations of an SfM + MVS workflow. As such, the implications of the findings of many studies (Hugenholtz et al., 2013; Mancini et al., 2013; Obanawa & Hayakawa, 2015; Ouédraogo et al., 2014; Tonkin, Midgley, Graham, & Labadz, 2014; Wang et al., 2014) are limited as the conclusions are based on a single SfM + MVS model.
TABLE 4 Using a paired t test, differences between the SfM + MVS-derived DSMs produced using replicate image datasets were tested for significance
Note: DSM height measurements from each software (n = 4) were combined, which was then compared between the three replicate image datasets (first-second, first-third, and second-third).
Further work needs to be carried out to find the optimal number of replicate image datasets to describe potential variance and to find a compromise between reproducibility and computational cost.

| M3C2 analysis
The M3C2 analysis suggests two things: (a) that there are (systematic) patterns in the data and (b) that there are relatively few points/areas that are statistically similar across replicate image datasets. While part of this can probably be attributed to vegetation, as the algorithm was developed for scenes with bare soil, it is important to point out that potentially adverse effects associated with vegetation can be minimized with the appropriate choice of constants (Lague et al., 2013).
FIGURE 6 Spatial distribution of significant changes between software (n = 4) for one replicate image dataset (#2) and "High" quality settings
TABLE 5 Overview of three variables of interest: (i) number of points in the point cloud, (ii) image residual, and (iii) GCP residual for each software (n = 4) and replicate image dataset (n = 3) using "High" quality settings
Additionally, this is a cloud-to-cloud comparison in an environment that is known to have undergone no physical change in between data collections. Hence, even though the vegetation complicates the analysis, it can in this case be treated as a fixed, albeit complex surface, with fine-grain topographic patterns. Therefore, we would argue there is still validity to the patterns apparent in the M3C2 analysis.
Systematic patterns in the accuracy analysis of an SfM + MVS-derived DSM can be due to vegetation patterns, ground control point distribution, and/or the camera lens calibration model. The predominantly circular patterns present in the data presented in this study do not conform with either the vegetation pattern or the location and distribution of ground control points. Hence, it is likely that the patterns highlighted in Figure 4 (see also Figures S1 and S2, Appendix S1) are due to insufficiencies in the (internal) camera lens calibration model (James & Robson, 2014). This hypothesis is further supported by the fact that systematic patterns are largely software dependent. Hence, as each software uses a different lens calibration model, the patterns may depict the influence of the camera calibration process. A "poor" camera lens calibration model can be improved by including oblique image data as a complement to the nadir image data (James & Robson, 2014) and/or by calibrating the camera lens distortion model using a separate (high quality) image dataset with convergent viewing angles of a textured 3D object.
In order to address the above issue, a fixed camera mount was used in this study, and this provides a greater range of camera view-
TABLE 6 Each software has been given a value between 1 and 4 for each of the five categories deemed to be of importance
Note: The value given is, where possible, based on actual data such as CPU time in minutes and acquisition cost of software (as of 08/2018).
TABLE 7 The score for each software (n = 4) for each category (n = 5) is divided by the total score for each category
James, Robson, d'Oleire-Oltmanns, et al., 2017). The influence of wind speed and light conditions was studied in Dandois et al. (2015), and both were found not to exert an important influence on the quality of the SfM + MVS-derived DSM. Having said that, light conditions influence the image contrast (increased contrast with direct lighting) and shadows, which in turn influence the identification of keypoints in images (Lowe, 2004). However, in this study the replicate image datasets were collected within the time span of an hour, with very similar weather conditions (2-3 m/s mean southerly wind speed, 16.8-17.9°C, cloud cover ~30%), so we are confident that the light, temperature, and wind conditions were similar and are thus assumed to have an insignificant effect on the results.
Yet it is possible that the light wind blowing at the time of the flight would have caused movement in the blades of grass, but this is the only expected change between the three flights. Flying height has been discussed and our choice to fly at 50 m was determined to be the optimal compromise between area coverage and data quality (Mesas-Carrascosa et al., 2016).
The robustness of the software is another potential explanation for the observed variance between the replicate image datasets. Given the difference in variance in RMSE for the replicate image datasets between the software (see Figures 7, S6 and S7, Appendix S1), we argue that it is likely that an important part of the variance is due to the robustness of the SfM + MVS software. This warrants further studies exploring the robustness, or sensitivity, of the SfM + MVS software, including how the quality of information derived from the software depends on a combination of methodological workflow (Verhoeven, 2017) and the attributes (e.g., vegetation, buildings, homogeneity of textures) in and of the surveyed scene (Furukawa & Hernández, 2015; Mancini et al., 2013; Remondino, Pizzo, Kersten, & Troisi, 2012; Ryan et al., 2015; Turner et al., 2012).

| H2. (2) Vertical and horizontal error varies significantly between different SfM + MVS software
We accept this hypothesis demonstrating that the choice of software is an important consideration which may determine the quality of the DSM (see Figures 5, 7, S3, S4, S6 and S7, and Appendix S1).
There is a statistically significant (p < 0.05) difference between the software with the lowest and highest RMSE compared to in situ validation data, respectively, for each of the replicate image datasets (n = 3) and choice of quality settings (n = 3).
However, the differences might not be of practical significance.
While centimeter differences are often important for change monitoring (Forsmoo et al., 2018; Lucieer et al., 2012) and when modeling processes such as surface runoff based on topographic variability (Mügler et al., 2011; Thompson, Katul, & Porporato, 2010), where small differences can lead to important cumulative biases (Liu et al., 2019; Lucieer et al., 2014), it is important to acknowledge that for some, if not many, purposes measurement uncertainties at the centimeter magnitude are negligible. In fact, we would argue that these fine-grain uncertainties highlight exactly why a user would choose drones over aerial or satellite imagery for change detection. However, drone and SfM + MVS-based data can give a false sense of security due to their ease of application and visual appeal, and software factors may become more important than RMSE differences at the centimeter magnitude. It is also important to acknowledge that the analysis presented herein is from a relatively small and homogeneous field site, and a larger and more complex image dataset would likely influence the findings (Colomina & Molina, 2014; Remondino et al., 2012).

| H3. (3) The vertical error in SfM + MVS-derived DSMs decreases with computational cost
We demonstrate (Figures 7, S6 and S7, Appendix S1) that the vertical error, on average, decreases with computational cost.
The RMSE of the SfM + MVS-derived DSM for the three replicate image datasets processed using "High" settings is, on average across the software, lower than when processed with "Medium" and "Low" settings, respectively (see Figures 7, S6 and S7, Appendix S1). Therefore, we can confirm that hypothesis (3) is supported. Figure 4 and Table 5 (and Figures S1, S2 and Tables S1, S2, Appendix S1) suggest that changes to the settings affect software differently. While there is a trend toward increasing image residuals (pixels) with decreasing computational cost, Pix4D instead shows dataset-specific effects that are exacerbated with decreased computational cost (see Table 5 and Tables S1, S2, Appendix S1).
TABLE 8 The normalized score for each category is multiplied by a user-defined rank which is based on the relative importance of the five categories
While there were important differences between the software, both in terms of processing time and ease of learning (see Tables 2, 6-8), each software has its own advantages and disadvantages.
Hence, the recommended software depends on the type and requirements of the application/project in question and the relevant expertise of the user. For example, while a Pix4D license comes at a relatively high financial cost, it offers straightforward and seamless integration with a range of camera types, such as the multispectral camera Sequoia and the thermal cameras Zenmuse XT and Flir VUE Pro. MICMAC, on the other hand, lacks the support framework of proprietary solutions, but is open source and handles large datasets well. This allows datasets of the size users would normally encounter (500-2,000 images) to be processed using the highest settings on an average-specification ("consumer-grade") desktop/workstation. Whether there is a significant difference in terms of cost between SfM + MVS software solutions, though, largely depends on the project. Having said that, we show that the difference in quantified financial value between software (the higher the better) can differ by a factor close to two (see Table 8). Hence, it is clear that there can be significant differences between software, though in many use cases the difference will be negligible.

| Implications of findings
We argue that confidence in the fine-grained resolution of drone and SfM + MVS-based outputs in vegetated areas has been undermined both by lack of ground validation data captured at similar grain size, and diversity in workflows. Indeed, this study builds on the work of Fraser and Congalton (2018) and highlights the need to develop standardized workflows within drone and SfM + MVS-based research and development. The results detailed herein represent an important step toward enabling the establishment of widespread confidence in the longevity of drone and SfM + MVS-based workflows for biotic resource management. Standardized workflows should make it possible to attribute and report differences in results between studies to variations in the methodological approach or the system studied and therefore should include factors such as number of replicate image datasets, weather conditions, camera type and settings, flying altitude, and software and settings used. This is necessary as we demonstrate that there are statistically significant differences between replicate image datasets, an effect previously largely overlooked. Centimeter-level variance in RMSE using replicate image datasets captured within the time span of one hour, under very similar conditions, processed using the same workflow limits the confidence of drone-based SfM + MVS as a simple tool to measure ultra-fine-grained changes over time when relying on a single image dataset.

| CONCLUSION
The