Quantitative estimation in ecology
Estimation plays a major role in ecology. Investigations often require a quantitative description of vegetation to monitor status, trends and dynamics. Researchers and practitioners routinely approximate species abundance in a variety of ways, including counts, percentage cover (projected foliar, canopy or basal), density and biomass (see Mueller-Dombois & Ellenberg 1974). Visual cover estimates are common because they are rapid and not labour intensive, but they are known to vary in quality. In some cases, this variation is tolerable; in others, it affects the conclusions we can draw from a study. For example, Vittoz et al. (2010) determined that observer variation meant changes in alpine vegetation could only be detected for abundant species (> 10% cover) or where relative changes were large (> 50%).
Measurement error should be disentangled from true vegetation variation by determining the source and size of the error (Kennedy & Addison 1987). The size of the error is investigated either by i) evaluating the repeatability of estimates between observers, assuming that high variability indicates high error (deviation from the true value), or ii) comparing observer estimates with more precise, objective measurements resembling ‘true values’, obtained from, for example, point quadrat sampling. Past studies show cover estimates to vary substantially between observers (Sykes, Horrill & Mountford 1983; Bråkenhielm & Liu 1995; Helm & Mead 2004; Cheal 2008) and also within observers over repeated judgements (van Hees & Mead 2000). While the degree of variability is highly inconsistent between studies, average cover estimates tend to reflect 10–20% error (Sykes, Horrill & Mountford 1983; Kennedy & Addison 1987). Furthermore, error is not confined to novices. In Cheal's (2008) study, one-off cover estimates of the same Triodia field from 16 experienced observers ranged from 20% to 60%, even though three observers believed they could reliably discriminate to within 5% cover intervals. Estimation errors are unpredictable and vary across environments and scales (Klimeš 2003), indicating that the source and magnitude of the error depend on vegetation type, sampling area, species characteristics (e.g. morphology, misidentification), total cover, time pressure, assessment scale and recording methods (e.g. Hope-Simpson 1940; Daubenmire 1959; Sykes, Horrill & Mountford 1983).
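To make the two approaches concrete, the sketch below (using hypothetical numbers, not data from any study cited here) computes both quantities for a single quadrat: between-observer spread as a proxy for error under approach (i), and deviation from a point-quadrat ‘true value’ under approach (ii).

```python
# Illustrative sketch (hypothetical data): the two ways of sizing
# estimation error described above, for a single quadrat.
import statistics

# Hypothetical visual cover estimates (%) from five observers
estimates = [32.0, 45.0, 28.0, 50.0, 38.0]

# Hypothetical 'true' cover (%) from point quadrat sampling of the same quadrat
true_cover = 36.0

# i) Repeatability between observers: high spread suggests high error
between_observer_sd = statistics.stdev(estimates)

# ii) Deviation from an objective measurement resembling the 'true value'
mean_absolute_error = statistics.mean(abs(e - true_cover) for e in estimates)

print(f"Between-observer SD: {between_observer_sd:.1f}% cover")
print(f"Mean absolute error vs point quadrat value: {mean_absolute_error:.1f}% cover")
```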
Management decisions can only be made confidently if methods provide reliable estimates that appropriately reflect uncertainty, but most methods do not. Common methods of abundance estimation either promote false precision (point estimates or too-narrowly defined classes) or sacrifice information (large, inflexible classes). Classification boundaries can be arbitrary and lead to large boundary errors (Helm & Mead 2004) that undermine decisions. For example, boundary errors have implications for threshold-based weed management programmes (Andujar et al. 2010). The use of classes has also been criticised for overestimating the cover of rare species (Floyd & Anderson 1987).
In the present study, we used a 4-point interval estimation approach that allows estimators to quantify their own uncertainty (see 'Materials and methods'). Tested in epidemiology, marine biology and biosecurity, it reduced overconfidence from around 40–50%, typical of interval judgements (e.g. Teigen & Jørgensen 2005), to 5–12% (e.g. Speirs-Bridge et al. 2010). It has not yet been applied in ecology. We propose that this technique will avoid some of the issues accompanying arbitrary cover interval classes, allowing the observer to reflect the different levels of uncertainty associated with different species morphologies, detectability and total cover.
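For readers unfamiliar with the format, the sketch below shows the four quantities elicited per estimate, together with one standardization often paired with this elicitation format: linear extrapolation of the interval to a shared confidence level. The function name, the 80% target level and the example numbers are our own illustrative choices; the exact protocol we used is given in 'Materials and methods'.

```python
# Minimal sketch of a 4-point interval estimate and one common
# standardization (linear extrapolation to a shared confidence level).
# Names and the 80% target are illustrative assumptions.

def standardize_interval(low, high, best, stated_conf, target_conf=0.8):
    """Rescale an elicited interval from the observer's stated confidence
    to a common target confidence by linear extrapolation."""
    scale = target_conf / stated_conf
    adj_low = max(0.0, best - (best - low) * scale)      # cover cannot be < 0%
    adj_high = min(100.0, best + (high - best) * scale)  # ... or > 100%
    return adj_low, adj_high

# Example 4-point estimate for one species' percentage cover:
# lowest plausible value, highest plausible value, best guess, and the
# observer's confidence that the interval contains the true value.
low, high, best, conf = 10.0, 30.0, 18.0, 0.6

print(standardize_interval(low, high, best, conf))  # -> (approx. 7.3, 34.0)
```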
Feedback for improving estimation
Beyond quantifying estimation error, procedures to reduce it are needed. Previous research shows that feedback is important for learning and generally improves estimation (Kopelman 1986); expertise accumulates slowly when fieldworkers receive no systematic feedback. However, not all types of feedback are equally effective. Indeed, some approaches to providing feedback may be detrimental, prompting a distinction between two main types: outcome feedback and cognitive feedback (Todd & Hammond 1965; Balzer, Doherty & O'Connor 1989). Outcome feedback simply refers to learning the results or true value (e.g. ‘actual’ species abundance in a quadrat). Cognitive feedback focuses on the relational aspects of the results, such as the relationship between the outcome (truth) and the judgement (i.e. estimation error), or the features of the task (e.g. trends and variability) (Bolger & Önkal-Atay 2004).
Outcome feedback is common, but tests have shown it to be ineffective in improving probability forecasts (Fischer 1982): it provides neither the information forecasters need to understand environmental relationships (Brehmer 1980) nor the long-run series of outcomes a forecaster needs to calibrate probability forecasts against relative frequencies of occurrence (Benson & Önkal 1992). Outcome-only feedback is also relatively unstructured and easily ignored (e.g. Jacoby et al. 1984). Worse, it may even detract from learning under uncertainty (Brehmer 1980), because people's biases prevent them from interpreting results objectively.
Cognitive feedback, on the other hand, has seen much more success (Balzer, Doherty & O'Connor 1989; Newell et al. 2009), perhaps because actively engaging with the task accelerates learning, as demonstrated in education research. We test a specific form of cognitive feedback called calibration feedback (e.g. Lichtenstein & Fischhoff 1980). It involves comparing a person's overall proportion of correct answers (their percentage of ‘hits’) with their stated confidence. If a person is 80% confident in their judgements and they answer correctly 80% of the time, they are well calibrated; if they answer correctly less than 80% of the time, they are overconfident. Note that for interval judgements (which we use in this study), correct answers, or hits, are those where the interval captures the truth.
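As a concrete illustration (with invented numbers, and a hypothetical helper function of our own naming), the sketch below computes calibration feedback for interval judgements: the hit rate, and its difference from stated confidence as a simple over/underconfidence score.

```python
# Sketch of calibration feedback for interval judgements (hypothetical data):
# compare the percentage of 'hits' (intervals capturing the truth) with
# the observer's stated confidence.

def calibration_feedback(intervals, truths, stated_conf):
    """Return hit rate and over/underconfidence for one observer.

    intervals   -- list of (low, high) interval estimates
    truths      -- corresponding true values
    stated_conf -- observer's average stated confidence (e.g. 0.8)
    """
    hits = sum(low <= t <= high for (low, high), t in zip(intervals, truths))
    hit_rate = hits / len(intervals)
    overconfidence = stated_conf - hit_rate  # positive = overconfident
    return hit_rate, overconfidence

# An observer 80% confident in intervals that capture the truth only 60%
# of the time is overconfident by 20 percentage points.
intervals = [(10, 30), (5, 15), (40, 60), (20, 35), (0, 10)]
truths = [25, 20, 55, 30, 50]
print(calibration_feedback(intervals, truths, stated_conf=0.8))  # (0.6, 0.2)
```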
Researchers assert that ‘calibration feedback appears to be a promising means of improving the performance of probability forecasters’ (Benson & Önkal 1992, p. 560). While much of the literature introduced above concerns forecasting and general-knowledge tasks, we believe the benefits of calibration feedback should also translate to quantitative estimation in the field.
We are not aware of any research that has experimentally tested cognitive feedback, particularly calibration feedback, in ecology and environmental science. On-ground training in vegetation condition assessment protocols such as the ‘Habitat Hectares’ approach (Parkes, Newell & Cheal 2003) is routinely conducted in government agencies, but the benefits of that training are not tested. Studies suggest that experience and training can reduce error within individuals (Smith 1944; Kennedy & Addison 1987; Cropper 2009), but general field experience does not necessarily correlate with performance (Gorrod & Keith 2009) or with the consistency of observer biases (Sykes, Horrill & Mountford 1983). The high observer repeatability reported by Symstad, Wienk & Thorstenson (2008) was attributed to rigorous training, although this claim was not specifically tested within the study.
Learning from the crowd
One reason feedback is rare in practice is the difficulty of obtaining ‘true values’ (such as measured percentage cover) to learn from. As feedback about the truth is essential for building expertise, it would be useful to know whether the type of feedback we usually do have access to (other people's estimates of the same thing) functions in the same way. Fortunately, we know that the group average of multiple judgements tends to be very close to the truth, because individuals' independent errors tend to cancel each other out. This statistical sampling phenomenon is remarkably robust. On examining some 800 estimates of the weight of a fat ox at a country fair in England, Francis Galton (1907) marvelled that the median (and mean) fell within 1% of the true value, outperforming most participants and even the best cattle experts in the crowd, a phenomenon now known as the ‘Wisdom of Crowds’ (Surowiecki 2005).
Fortunately, we do not require 800 people at a country fair to see an improvement in judgement. The average judgement of two people tends to be better than either alone (Soll & Larrick 2009), and even the average of two judgements from a single person tends, over the long run, to be closer to the truth than a single estimate (Herzog & Hertwig 2009). Sykes, Horrill & Mountford (1983) found that mean cover values from ten observers corresponded closely with measured point quadrat values in 4-m² quadrats. By extension, we suggest that the group average could be substituted for the true value in feedback to improve cover estimates.
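A minimal simulation illustrates why this substitution is plausible: when each observer's error is independent, the error of the group mean shrinks roughly as 1/√n. The true cover value and error spread below are arbitrary choices for illustration.

```python
# Sketch (simulated data): individuals' independent errors cancel, so the
# mean of n estimates converges on the true value as n grows.
import random

random.seed(1)
TRUE_COVER = 35.0  # hypothetical true percentage cover

def observer_estimate():
    # Each observer's estimate = truth + independent error (SD 10% cover)
    return TRUE_COVER + random.gauss(0, 10)

for n in (1, 2, 10, 100, 800):
    group_mean = sum(observer_estimate() for _ in range(n)) / n
    print(f"n = {n:3d}: group mean = {group_mean:5.1f} "
          f"(error {abs(group_mean - TRUE_COVER):4.1f})")
```

Note that the simulation assumes independent errors; errors shared by all observers (for example, a common bias for a particular growth form) would not cancel in this way.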
In experiment 1, we test whether feedback using different information (true values or group-average estimates from participants) similarly improves estimation performance. Presumably, the effectiveness of group-average feedback in improving accuracy depends on how close other people's estimates are to the truth. However, feedback about inaccurate group estimates might still improve judgements if it illustrates variability and prompts participants to adjust their interval widths, thereby reducing overconfidence. In experiment 2, we identify the components of feedback that are critical for improving individual estimation, to explore how best to structure a feedback session when true values are unavailable. We compare two formats for feeding back information about other people's judgements, and hypothesise that participants will respond most positively if they actively evaluate their own performance (hereafter ‘active feedback’, based on calibration feedback), rather than simply observing other people's estimates (hereafter ‘passive feedback’, analogous to outcome feedback).