How you count counts: the importance of methods research in applied ecology

Authors


*Correspondence author. E-mail: chris.elphick@uconn.edu

Summary

  1. Methods papers play a crucial role in advancing applied ecology. Counting organisms, in particular, has a rich history of methods development with many key advances both in field sampling and the treatment of resulting data.
  2. Most counts, however, have associated errors due to portions of the population of interest being unavailable for detection (e.g. target population not fully sampled; individuals present but not detectable), detection mistakes (e.g. detectable individuals missed; non-existent individuals recorded), or erroneous counts (e.g. large groups miscounted; individuals misidentified).
  3. Developments in field methods focus on reducing biases in the actual counts. Simultaneously, statisticians have developed many methods for improving inference by quantifying and correcting for biases retrospectively. Prominent examples of methods used to account for detection errors include distance sampling and multiple-observer methods.
  4. Simulations, in which population characteristics are set by the investigator, provide an efficient means of testing methods. With good estimates of sampling biases, computer simulations can be used to evaluate how much a given counting problem affects estimates of parameters such as population size and decline, thereby allowing applied ecologists to test the efficacy of sampling designs. Combined with cost estimates for each field method, such models would allow the cost-effectiveness of alternative protocols to be assessed.
  5. Synthesis and applications. Major advances are likely to come from research that looks for systematic patterns, across studies, in the effects of different types of bias and assumption violation on the ecological conclusions drawn. Specifically, determining how often, and under what circumstances, errors contribute to poor management and policy would greatly enhance future application of ecological knowledge.

Introduction

Much of applied ecology involves counting organisms. Population size estimates are central to topics ranging from conservation biology to game management to pest control. Estimating the number of species is also common in applied studies (e.g. O'Dea, Whittaker & Ugland 2006). Counting, however, can be difficult. Some organisms move, many are inconspicuous, and others actively avoid the counter, all of which complicate detection and accurate counting. Consequently, we are rarely in a position to obtain absolute counts and must resort to sampling the population of interest – and any sampling protocol is subject to potential biases (Greenwood 1996). Even when we think we know where all the individuals we want to count are located, it is hard to be certain that others do not occur elsewhere (cf. Murchison 2007). As a result of these problems (Table 1), a distinct body of research has focused on refining sampling techniques (e.g. Sutherland 1996) and improving methods for interpreting the resulting data (e.g. Buckland et al. 2001; Zuur, Ieno & Smith 2007).

Table 1. Classification of problems associated with population sampling, with examples from this Special Profile that touch on each topic

Type of problem – Examples
Availability compromised
  Sampled locations do not include entire population of interest – Newson et al. 2008
  Individuals present but not available for detection – Brook et al. 2008; MacSwiney et al. 2008
Detection compromised
  Individuals detectable, but missed – all papers
  Individuals imagined – Alldredge et al. 2008
Counting compromised
  Individuals detected, but counts have errors (double-counting, groups miscounted, recording errors, etc.) – Alldredge et al. 2008; Jenkins & Manly 2008
  Individuals detected, but misidentified – MacSwiney et al. 2008

Methods research, therefore, is fundamental to all that we do, and without it, progress would be severely hampered. For instance, hundreds of thousands, perhaps millions, of point counts – a method commonly used to survey birds (Bibby, Burgess & Hill 2000) – are conducted annually for numerous purposes (Bart 2005; Alldredge et al. 2008). Understanding what such counts really tell us about populations is therefore critical.

Methods research usually addresses either the improvement of data-gathering so as to reduce biases or increase precision, or the development of statistical methods that can account for biases or uncertainty in the collected data. The major challenge for field ecologists is to ensure that their data meet the assumptions of the statistical analyses used. As statistical methods become more sophisticated, understanding and testing these assumptions is especially important. Simultaneously, when developing analytical methods, statistical ecologists must grapple with the logistical constraints that field workers face. Statisticians also need to explain analytical methods sufficiently well so that field workers understand the assumptions their data must meet and the limits of the inferences they can make. Balancing these concerns requires constant collaboration and communication between ecologists who really know their study organisms, and those who really understand the quantitative techniques.

For some time, the Journal of Applied Ecology has highlighted papers that offer broad methodological insights with a goal of encouraging better ‘communication and development of methods in applied ecology’ (Ormerod et al. 2003). Among the first such papers were those that laid out the design for the most comprehensive tests to date of the ecological effects of genetically modified crops (Firbank et al. 2003; Perry et al. 2003). Subsequent papers have addressed diverse issues, ranging from new sampling technologies (Parker, Harding & Berger 2004), to the consequences of sampling scale for ecological interpretations (Hill & Hamer 2004), to the unanticipated consequences of marking techniques (McCarthy & Parris 2004), among many others.

In this Special Profile, the journal brings together six papers that address issues associated with counting organisms or species. The papers cover various problems, involve disparate taxa, and range from the deeply theoretical to the blatantly empirical. Yet, all are united by the broad problem of needing to get better counts in order to address key questions that applied ecologists face daily.

Improving counts

Improving methods can take many forms. The adoption of new technology often has the greatest immediate impact. For instance, development of radio telemetry and autonomous data loggers that can be attached to animals has revolutionized understanding in several areas of population biology (Millspaugh & Marzluff 2001; Ropert-Coudert & Wilson 2005). The introduction of completely novel approaches is rare, especially for a well-developed topic such as counting organisms, but the application of existing methods to new circumstances can be just as important. For example, researchers have adopted algorithms developed to study the night sky for use in mark–recapture analyses (Arzoumanian, Holmberg & Norman 2005), used molecular methods to estimate population size (Frantz et al. 2004) or improve species detection (Gariepy et al. 2008), and used image analysis to count organisms (Hooper et al. 2006).

Even minor refinements of methods can improve counts substantially. For example, Brook et al. (2008) present experiments designed to increase the efficiency of suction sampling, whereby invertebrates are vacuumed from the vegetation to estimate their abundance and community composition. By comparing sampling options, the authors determined how long each bout of vacuuming should last in order to collect the bulk of the individuals in a patch of grass, how many subsamples are required to detect most of the species in an area, how these numbers differed among different types of invertebrate, and how vegetation height interfered with the sampling technique.

Many studies address the differences between methods in an ad hoc fashion. Those that, like Brook et al.'s, directly compare methods under controlled conditions provide clearer insights into the relative value of each method, and into how they can be combined to complement one another. In another study, MacSwiney et al. (2008) compared conventional capture methods with acoustic sampling to measure the composition of bat communities in southern Mexico and found substantial differences in the species detected by each approach: using bat detectors increased the number of species known for the region by 40%. The acoustic sampling, however, failed to detect many of the species sampled through more traditional capture techniques, and only a fifth of the species were detected by both approaches. Clearly, complete descriptions of Neotropical bat faunas require the use of multiple methods in combination. In contrast, for three of four invertebrate groups, Brook et al. (2008) found little difference in the detected assemblage structure when they compared data obtained using suction sampling vs. turf removal. In this case, it seems that the more destructive turf removal method added little, even though it provides more complete sampling.

Nichols et al. (2008) took the comparison of alternative methods a step further. As their starting point, these authors pointed out two problems. First, simply analysing data from each sampling method separately is statistically inefficient. Secondly, combining data from different types of sampling conducted in the same area can be problematic when the same individuals might be sampled by more than one of the methods. To solve these problems, they developed a formal statistical approach for combining data from different sampling methods. One benefit of this approach is that separate detection probabilities can be estimated for each method, allowing direct comparisons of the methods. Striped skunks Mephitis mephitis, for example, are shown to be more easily detected using remote cameras with infrared sensors or with enclosed track plates than with hair removal traps (Nichols et al. 2008). Both the magnitude of the differences and which method had the highest detection rate, however, depended on the seasonal conditions under which the surveys took place, suggesting that a mixture of methods might still be warranted. In contrast, a second case study found only weak evidence for a difference in detection rates between methods used to sample stream salamanders (Nichols et al. 2008).
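To make the logic of such a combined analysis concrete, the sketch below fits a deliberately simplified two-method detection model to simulated data. It is not the model of Nichols et al. (2008); the site numbers, detection rates and method labels are hypothetical, and the model assumes a single visit with two independent methods per site.

```python
# A much-simplified two-method occupancy/detection sketch (assumed values,
# not Nichols et al.'s model). Requires numpy and scipy.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # inverse-logit keeps parameters in (0, 1)

rng = np.random.default_rng(7)
S = 400                                   # number of survey sites
psi, p_cam, p_track = 0.6, 0.7, 0.4       # true occupancy and detection rates
occupied = rng.uniform(size=S) < psi
y_cam = occupied & (rng.uniform(size=S) < p_cam)      # camera-trap detections
y_track = occupied & (rng.uniform(size=S) < p_track)  # track-plate detections

def neg_log_lik(theta):
    psi_hat, p1, p2 = expit(theta)
    # Probability of each site's detection history, given that it is occupied
    pr_occ = ((p1 ** y_cam) * ((1 - p1) ** ~y_cam)
              * (p2 ** y_track) * ((1 - p2) ** ~y_track))
    never_detected = ~(y_cam | y_track)
    # Sites with no detection by either method may simply be unoccupied
    pr = psi_hat * pr_occ + never_detected * (1 - psi_hat)
    return -np.sum(np.log(pr))

fit = minimize(neg_log_lik, x0=np.zeros(3), method="Nelder-Mead")
print("estimated occupancy and per-method detection:", np.round(expit(fit.x), 2))
```

Because the data are simulated, the recovered occupancy and method-specific detection probabilities can be checked against the values used to generate them; with real data, the same joint likelihood is what allows the methods to be compared directly.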

Improving inference

Because counts are often unsatisfactory in ways that cannot be resolved in the field, many methods of analysis are designed to improve the inferences that can be made from field data. Imperfect detection has received particular attention, and various methods have been introduced to help researchers estimate what portion of the target population is detected so that appropriate adjustments can be made (Thompson 2002; Simons et al. 2007). One common method, distance sampling, allows researchers to estimate how detection rates decline with distance from the observer and thus to extrapolate true population density from the detections made (Buckland et al. 2001). Because they are based on estimating detection probabilities associated with different sampling protocols, the methods developed by Nichols et al. (2008) provide similar advantages.
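As a rough illustration of that logic (a minimal sketch, not the implementation used in any of the papers discussed here), the following code simulates line-transect data, fits a half-normal detection function by maximum likelihood, and converts the fitted effective strip half-width into a density estimate. The density, transect dimensions and detection function are all assumed values.

```python
# Minimal line-transect distance-sampling sketch with assumed parameters.
# Requires numpy and scipy.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(1)

# Hypothetical truth: density D (animals per unit area) on a strip of length L
# and half-width w, surveyed along the centre line.
D_true, L, w, sigma_true = 50.0, 20.0, 0.5, 0.15
N = rng.poisson(D_true * 2 * w * L)                    # animals present in the strip
x = rng.uniform(0, w, N)                               # perpendicular distances
detected = rng.uniform(size=N) < np.exp(-x**2 / (2 * sigma_true**2))
obs = x[detected]                                      # distances actually recorded

def effective_half_width(sigma):
    """Integral of the half-normal detection function from 0 to w."""
    return sigma * np.sqrt(2 * np.pi) * (norm.cdf(w / sigma) - 0.5)

def neg_log_lik(sigma):
    g = np.exp(-obs**2 / (2 * sigma**2))
    return -np.sum(np.log(g / effective_half_width(sigma)))

fit = minimize_scalar(neg_log_lik, bounds=(0.01, 5.0), method="bounded")
mu_hat = effective_half_width(fit.x)
D_hat = obs.size / (2 * mu_hat * L)                    # detections / effective area
print(f"true density {D_true}, estimated density {D_hat:.1f}")
```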

Newson et al. (2008) used distance sampling on a grander scale than most – to improve population estimates for an entire nation's avifauna. Using data from the extraordinarily rigorous UK Breeding Bird Survey, they calculated national population sizes for 92 species. By comparing their estimates to those derived through other means, they also tested several hypotheses about the sources of bias in population estimates. As expected, distance sampling revealed evidence for various biases in earlier estimates. Nonetheless, several positive conclusions arose from the results. First, the earlier results were well correlated with the new ones, suggesting that overall, the old numbers were not misleading. Secondly, the revised population estimates for most species were higher than the previous numbers. There were exceptions, including a few species with populations now thought to be half the size previously estimated, and it is possible that distance sampling systematically causes population overestimates when most detections are based on sound (Alldredge et al. 2008; see below); but in general, this result suggests a somewhat rosier picture than expected. Finally, most biases that were detected had been anticipated, suggesting that ecologists have good intuition about the limitations of their inferences (although, of course, unanticipated biases are less likely to be studied). The most serious errors were related to situations where extrapolations had been made beyond the sampled population, such as when the survey was used to estimate population sizes for species that primarily occur in habitats that were not targeted by the design. This finding reinforces the need for investigators to be clear on what populations their inferences actually refer to. In contrast, potential problems that prior analyses had attempted to address – such as those caused by geography – were not an issue, suggesting that previous corrections had worked.

Like all extrapolation methods, distance sampling comes with clear assumptions that should be met by the data to which it is applied. One assumption is that all objects at zero distance from the observer (i.e. on the transect line, or at the survey point itself) are detected (Buckland et al. 2001; but see Laake & Borchers 2004). Sometimes, however, this assumption is not met, for example, when visually surveying marine mammals that spend much of their time underwater (Laake et al. 1997; Skaug & Schweder 1999). In their study of ungulate populations, Jenkins & Manly (2008) showed that detection of deer faecal pellets, especially when in small groups, can be quite low, even when they lie on the survey transect. In a similar test, Bächler & Liechti (2007) found that even when the location of birds is known with certainty, because they have been fitted with radio transmitters, they can be very hard to detect. More worrying than these isolated results for individual species is that, in a review of 28 papers in which distance sampling was used to estimate population densities, no studies tested the assumption, only one explicitly attempted to maximize detection at distance zero, and more than half did not even mention the issue (Bächler & Liechti 2007).

Another important assumption of distance sampling is that distances from the observer to the observed can be determined accurately. For studies of items that are fixed in space (e.g. tracks, nests, sessile organisms), distance accuracy should not be a problem as distances can be measured directly. For moving organisms and species detected by sound, however, distances are often estimated. Training and the use of range finders can improve estimates (Kepler & Scott 1981; Scott, Ramsey & Kepler 1981), especially when the organism is seen; but even trained observers appear not to be very good at accurately locating organisms detected only by sound (Alldredge, Simons & Pollock 2007a, 2008).

Distance from the observer is just one of several factors that affect detection, and accounting for other factors also influences population estimates (Jenkins & Manly 2008; Newson et al. 2008). Another method for correcting detection errors is to have multiple observers conduct independent surveys simultaneously at the same location (Cook & Jacobson 1979; Graham & Bell 1989). Their respective observations can then be used in a removal (Nichols et al. 2000) or capture–recapture (Alldredge, Pollock & Simons 2006) framework to estimate the rate at which each observer missed individuals that could have been detected. Jenkins & Manly (2008) used double-observer methods to estimate faecal pellet abundance, thereby replacing key assumptions of distance sampling that were violated in their study with assumptions that seemed more reasonable. Because different approaches have different advantages, combining methods – such as the use of double-observer techniques to improve distance sampling when detection at the survey location is imperfect (Laake & Borchers 2004) – may become increasingly useful.
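The capture–recapture logic of an independent double-observer survey can be sketched very simply: each observer's detections act as a ‘marked’ sample for the other, giving per-observer detection probabilities and an estimate of the number of individuals available for detection (here via the bias-corrected Lincoln–Petersen, or Chapman, estimator). The counts below are invented purely for illustration.

```python
# Minimal independent double-observer sketch; counts are invented.
n1 = 62        # individuals recorded by observer 1
n2 = 55        # individuals recorded by observer 2
both = 41      # individuals recorded by both observers

# Per-observer detection probabilities, estimated from the overlap
p1 = both / n2          # chance observer 1 detects a bird that observer 2 found
p2 = both / n1          # chance observer 2 detects a bird that observer 1 found

# Chapman's bias-corrected estimate of the number available for detection
N_hat = (n1 + 1) * (n2 + 1) / (both + 1) - 1

print(f"p1 = {p1:.2f}, p2 = {p2:.2f}, estimated available population = {N_hat:.0f}")
```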

Double-observer methods revealed clear differences among observers in faecal pellet surveys (Jenkins & Manly 2008), and other evidence suggests that observer effects are widespread (e.g. Nichols et al. 2000; Diefenbach, Brauning & Mattice 2003). That these differences have consequences is shown by the annual US Breeding Bird Survey, in which the number of birds detected dips in the year after a new observer takes over a survey route, causing a positive bias in trend estimates that must be accounted for (Kendall, Peterjohn & Sauer 1996). Many survey protocols require careful training, and computer simulations have been developed to help researchers practise their skills in preparation for field work (e.g. Hodges 1993; Wildlife Counts 2003). Training appears to be unable to remove some observer effects, however, as shown by Alldredge et al. (2008), who found substantial differences among paired observers during avian point counts, even when experimental trials involved highly experienced field workers.

Although there is clearly more work to do, the detection of available individuals has received extensive treatment. Other sampling problems (Table 1) would benefit from a similar level of systematic scrutiny. Plenty of ad hoc approaches exist. For example, by combining remote sensing data with information on habitat use, investigators can better identify the full range of the species they want to enumerate, or test whether a sampled population is representative of a broader target population. But well-developed solutions to the full range of counting problems are not in widespread use, and we lack a good understanding of the nature of some problems. How often, and under what circumstances, species are misidentified during ecological surveys, for example, is not well known. Likewise, the effects of miscounting, or of making estimation errors when faced with groups of organisms, have received limited study despite discussion of the problem for half a century (cf. Matthews 1960).

Studies have estimated the errors made when counting groups by using situations in which the true number of individuals is known, for example by having observers estimate numbers from photographs (Prater 1979; Erwin 1982) or scaled models (Frederick et al. 2003). Although these studies consistently show that people tend to underestimate the number of individuals in large groups (Frederick et al. 2003), the magnitude and nature of the error vary among studies. Some data also suggest that error varies with group size, with small flocks overestimated and large flocks underestimated (Prater 1979). If this pattern is widespread, it raises serious concerns because a decline in average group size over time would create a shift from a situation where population size is systematically underestimated to one where it is systematically overestimated. Thus, the estimate of the decline would be too low (Fig. 1). Other studies, however, have not found such systematic errors (e.g. Frederick et al. 2003). Another difference among studies is that some suggest that overestimates and underestimates cancel out, whereas others find that they do not (Frederick et al. 2003). A more general understanding of the nature of such errors, rather than reliance on isolated case studies such as those described here, would allow us to better recognize how miscounts might affect inferences and to develop solutions.

Figure 1. If observers consistently overestimate when counting relatively small groups, and underestimate when counting large groups (a; modified from Prater 1979), and if group size declines in proportion to population size, then counting errors could cause population declines to be consistently underestimated (b). How often this problem actually occurs is not known.
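A toy simulation of the scenario in Fig. 1 makes the concern concrete. The bias function, group sizes and magnitude of decline below are entirely hypothetical; the point is only to show how group-size-dependent counting error can make a real decline appear shallower than it is.

```python
# Toy simulation of the Fig. 1 scenario (all values hypothetical).
import numpy as np

rng = np.random.default_rng(3)

def perceived_count(n):
    """Counting bias: ~20% overestimation of small groups grading to ~20%
    underestimation of groups of 500 or more (loosely after Prater 1979)."""
    bias = 1.2 - 0.4 * min(n / 500.0, 1.0)
    return rng.poisson(bias * n)

# Early survey: 25 large groups; later survey: 25 smaller groups after a decline
groups_before = rng.poisson(400, size=25)
groups_after = rng.poisson(200, size=25)
true_before, true_after = groups_before.sum(), groups_after.sum()

count_before = sum(perceived_count(int(n)) for n in groups_before)
count_after = sum(perceived_count(int(n)) for n in groups_after)

print(f"true decline:     {1 - true_after / true_before:.0%}")
print(f"apparent decline: {1 - count_after / count_before:.0%}")
```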

Using simulations

The ultimate test of any sampling method is to compare the estimate to the true number of items in the population. In ecology, this is generally impossible because the true population size is unknown, which, of course, is why sampling is used. For instance, Jenkins & Manly (2008) found that their population estimates compared favourably with previous guesstimates, but had no firm benchmark for comparison. Although consistent estimates from different methods are encouraging, confidence will always be limited when there is uncertainty associated with every method used (Newson et al. 2008). Imaginative simulations, however, can provide insights into the absolute accuracy of different methods. In some cases, simulation can involve physical models. Frederick et al. (2003), for example, built a model of a large wading bird colony wherein everything was scaled so that a person standing over the model had a view equivalent to that of an observer surveying from a plane. The model was seeded with alfalfa ‘birds’, and counts were conducted both by biologists circling around the model and from photographs.

Brook et al. (2008) took a similar approach, simulating invertebrate sampling by scattering known quantities of small plastic beads in hay meadows and determining how the recovery rate when using suction sampling varied in response to sward height. The use of beads not only provided certainty over the true population size, but also removed a potentially confounding factor by controlling for any biological differences that exist between the invertebrates that live in the subtly different microhabitats sampled.

Physical models are not the only type of simulation, of course. Computer simulations are often used to explore ecological questions that cannot easily be addressed empirically, and to make predictions about the future. Likewise, simulations are useful for evaluating estimation methods because the true values of parameters can be pre-set in the test data. Computer simulations can also be used to compare sampling options (e.g. Gruber et al. 2008). Münzbergová & Ehrlén (2005) used this approach to evaluate the efficiency of different methods of collecting data for demographic models. Simulations have also demonstrated that violating a key assumption of mark–recapture models can sometimes be warranted, because the larger sample sizes that become possible when the assumption is relaxed can improve precision without introducing excessive bias (O’Brien, Robert & Tiandry 2005).

With these examples in mind, simulation should perhaps be used more often when designing monitoring programmes. Power analysis using simulations is already commonly used to determine sampling effort (Diefenbach et al. 2003; Roy, Rothery & Brereton 2007). But by simulating populations and ‘monitoring’ them using known detection rates and biases associated with different sampling methods, one could do far more to test the ability of alternative protocols to detect predetermined trends or population sizes. Such computer experiments could rank the accuracy of each method, evaluate the benefits of combining methods, and so on. If the cost of each method were known, one could extend the evaluation to identify the most cost-effective protocol (Nichols et al. 2008). This ability to compare methods directly in both monetary and biological terms is of particular value in applied ecology, where the goal is not just to generate knowledge but also to help managers decide how best to direct limited resources. Consideration of costs is important to many management questions, ranging from local issues such as crop depredation (Vickery, Watkinson & Sutherland 1994) or the control of introduced species (Smith, Henderson & Robertson 2005; Ellis & Elphick 2007), to global questions about resource allocation (Wilson et al. 2006). Clearly, there is a case for addressing economics in the design of sampling protocols too (Campbell, Swanson & Sales 2004).
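A minimal sketch of this kind of evaluation is given below, using assumed values throughout: a population declining at a known rate is ‘monitored’ by two hypothetical methods with different detection probabilities, and the proportion of simulated surveys in which a simple regression detects the decline is compared.

```python
# 'Monitor' a simulated population declining at a known rate with methods of
# known but imperfect detection; all rates and sample sizes are hypothetical.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(11)

def power_to_detect(p_detect, n_sites, years=10, start=20.0,
                    annual_decline=0.02, n_sims=1000, alpha=0.05):
    """Proportion of simulated surveys in which the decline is detected."""
    t = np.arange(years)
    expected = start * (1 - annual_decline) ** t   # true mean abundance per site
    hits = 0
    for _ in range(n_sims):
        # Poisson abundance at each site and year, thinned by imperfect detection
        counts = rng.poisson(expected * p_detect, size=(n_sites, years))
        res = linregress(t, counts.mean(axis=0))
        if res.pvalue < alpha and res.slope < 0:
            hits += 1
    return hits / n_sims

# Two hypothetical protocols surveying the same 30 sites
print("low-detection method  (p = 0.3):", power_to_detect(0.3, 30))
print("high-detection method (p = 0.6):", power_to_detect(0.6, 30))
```

Attaching a per-site cost to each protocol would turn the same comparison into the kind of cost-effectiveness ranking discussed above.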

Despite their many advantages, neither computer nor physical models can ever capture the full complexity of real sampling. The final paper in this Special Profile goes further than most in simulating the reality that field workers face. Alldredge and colleagues have developed a computer-operated system for broadcasting bird songs that they can use to test the abilities of observers in the field, and to determine the consequences of observer errors (Alldredge et al. 2007a,b,c; Simons et al. 2007). Using this system, they estimated how accurately observers could locate singing birds (Alldredge et al. 2008). Despite using experienced, trained observers, they found considerable error in the mapping of the simulated bird songs, and a high rate of double-counting. More troubling is that, after analysis using distance methods, their density estimates were ~70% greater than the true values (averaged across experiments in their Table 2). Considerable variation in the degree of bias across species and among observers precludes simple corrections of these overestimates. Because the authors knew the true location of each simulated bird, they could repeat their analyses using the correct distances and parse out the importance of different errors. Doing so showed that the average bias dropped to ~5% when the true distances were used, clearly implicating poor localization of songs as the main problem.

The same study also found that substantial errors in population estimates persist when double-observer methods are used. The average error across all experiments was small, but for individual tests it ranged from estimates that missed a third of the birds to ones that overestimated by half. Moreover, less than half the estimates included the true population size within the 95% confidence interval (Alldredge et al. 2008).

The best studies not only identify problems, but also point to solutions. By comparing scenarios with different numbers of species and individuals in the surveyed population, Alldredge et al. (2008) showed that the complexity of the surveying situation affects accuracy. This result is unsurprising, but the method provides a means of determining just how big a difference complexity makes, and a way of quantifying the trade-off between surveying more species vs. getting better information on fewer species. From Alldredge et al.'s results, one cannot separate the effects of species number (which is easy for an investigator to modify, by collecting data on fewer species) and number of individuals in the sampled population (which is harder to control). That relatively modest reductions in the sampled population had an appreciable effect, however, should be sobering for those who routinely conduct surveys in which dozens of species are sampled. For example, one wonders whether the increased population estimates for British birds reported by Newson et al. (2008) could, in part, be attributed to systematic overestimation caused by distance estimation errors.

Conclusions

The lessons that can be drawn from these papers are not new, but they bear repeating. First, the papers reiterate basic truths about sampling: biases are ubiquitous; assumptions must be recognized and, ideally, met; the sampled population must be clearly identified, and caution used when extending inferences beyond that population. Most ecologists know all this, of course, but the realities of data collection often result in compromise even when rigour is at the forefront of an investigator's mind. Biases cannot always be identified or eliminated; assumptions often cannot be met, sometimes simply because of constraints inherent to a species’ biology; and often (perhaps usually), resources are insufficient to sample the entire population of interest. Simply giving up and deciding that a study should not be done because of these problems is often unacceptable to managers and policy makers who seek information, and who will make decisions whether there is scientific input or not. Indeed, the very notion that our work has applications often dictates that we must make available methods fit the problems, rather than seeking idealized study systems that fit the needs of our methods.

Under these circumstances, how should applied ecologists proceed? When assumptions are violated, should we use analytical solutions such as distance sampling and double-observer surveys anyway, or should we fall back on treating raw counts as indices and simply do a good job of identifying their limitations? And when are the flaws so great that studies or past data sets really should be abandoned? What we really need to know to choose between these options is not just whether problems exist, which is often obvious, but whether they result in flawed understanding and bad decision-making. If biases lead to erroneous population counts, but the counts are generally conservative and rank species fairly accurately, as appears to be the case for British birds (Newson et al. 2008), then addressing those biases is far less important than when biased methods cause us to miss much of the ecological picture (cf. MacSwiney et al. 2008). Even serious biases may not result in bad management decisions (but see McKelvey, Aubry & Schwartz 2008).

Currently, we lack good data on when methodological limitations are most likely to compromise policy. Without this information, it is hard to know when problems are likely to be so bad that we really should abandon studies until we have better methods. Likewise, without information on which types of problem most often result in poor decisions, it is hard to know where most effort should be put when developing better methods. For example, huge advances have been made in addressing detection problems (Buckland et al. 2001, 2004; Williams, Nichols & Conroy 2002), and more could clearly be done. But, it is less clear which detection issues are most pressing (Alldredge et al. 2007d), or whether further advances in this arena would create greater benefits than focusing attention on other problems, such as those associated with linking the sampled population to the population of interest, counting organisms in groups, or identification accuracy.

The papers discussed here, and many others like them, show that substantial advances have been made in the development of methods used in applied ecology. Many more improvements can be expected in the future. The biggest advances, however, are perhaps most likely to come from studies that replicate tests under different circumstances, or that pool and synthesize information from multiple sources. Moving beyond the examination of isolated cases, which can give the impression of idiosyncratic results without clear pattern, to studies that seek generalities across the field might provide better insights into where new methods research would be most profitable. Important questions include:

  1. How accurate do estimates need to be to ensure that good decisions are made? Improving methods is never a bad thing, but continued incremental improvements may not substantially affect our understanding of a problem. Identifying and recognizing when data collection is good enough for the desired purpose will hasten advances in the application of ecological information.
  2. Which types of bias or assumption violation are most ubiquitous, and which are most likely to result in poor inferences or policy decisions? If we can isolate certain types of problem that consistently have serious consequences, then we would be better equipped to focus methods development where it will be most efficient. Compiling information from studies that differ in the specific questions asked, organisms studied and data collected, but that use similar techniques, could prove especially helpful in generating broad insights.
  3. What are the relative costs of different methods, and how does cost trade off against improved knowledge? The cost of research looms over us constantly, and yet there are few systematic cost–benefit analyses that try to provide general guidelines on which approaches make most sense for particular types of question. In a world of tight budgets, and no shortage of ecological problems (Sutherland et al. 2006, 2008), such research would seem very relevant.
  4. How often do counting errors result in poor management? Ultimately, this is the question that matters most to applied ecologists. When errors are small, or do not influence the conclusions drawn, then maybe we have reached the point of diminishing returns and should move on to other problems. Any assessment of this question, however, should carefully consider the rate of bad management decisions after the use of flawed analyses relative to the rate expected without any analysis at all.

Editors, too, would benefit from studies that address questions like those listed above, because such questions bear on the thorny problem of whether methodologically flawed papers should always be rejected, or whether they should sometimes be accepted (with their flaws acknowledged) on the grounds that even a slightly flawed study can contribute more insight than data that never see the light of day.

Acknowledgments

Thanks to E.J. Milner-Gulland, Rob Freckleton and Gill Kerby for the opportunity to write this editorial and for editorial comments, and to the authors of the featured papers for making me think more carefully about the importance of studying ecological methods and about my own data collection methods.
