A method for measuring the relative information content of data from different monitoring protocols


Correspondence author. E-mail: mmunson@cs.cornell.edu


1. Species monitoring is an essential component of assessing conservation status, predicting the effects of habitat change and establishing management and conservation priorities. Pervasive access to the Internet has led to the development of several extensive monitoring projects that engage massive networks of volunteers, who provide observations following relatively unstructured protocols. However, the value of these data is largely unknown.
2. We develop a novel cross-data validation method for measuring the value of survey data from one source (e.g. an Internet checklist program) relative to a second, benchmark data source. The method fits a model to the data of interest and validates the model using benchmark data, allowing us to isolate the training data's information content from its biases. We also define a data efficiency ratio to quantify the relative efficiency of the data sources.
3. We apply our cross-data validation method to quantify the value of data collected in eBird – a Western Hemisphere, year-round citizen-science bird checklist project – relative to data from the highly standardized North American Breeding Bird Survey (BBS). The results show that eBird data contain information similar in quality to that in BBS data, although each BBS datum carries more information.
4. We suggest that these methods have more general use in evaluating the suitability of sources of data for addressing specific questions for taxa of interest.
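
The cross-data validation idea in point 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: the per-habitat frequency model, the Brier score, the comparison against a model fitted to benchmark training data, and all function names and records below are hypothetical choices made for the example. It does not attempt the data efficiency ratio, which the summary does not define.

```python
from collections import defaultdict


def fit_frequency_model(records):
    """Fit a toy occurrence model: detection rate per habitat class.

    records: iterable of (habitat, detected) pairs with detected in {0, 1}.
    This stand-in model is an assumption; any predictive model could be used.
    """
    counts = defaultdict(lambda: [0, 0])  # habitat -> [detections, surveys]
    for habitat, detected in records:
        counts[habitat][0] += detected
        counts[habitat][1] += 1
    total_detections = sum(d for d, _ in counts.values())
    total_surveys = sum(n for _, n in counts.values())
    overall = total_detections / total_surveys  # fallback for unseen habitats
    rates = {h: d / n for h, (d, n) in counts.items()}
    return lambda habitat: rates.get(habitat, overall)


def brier_score(model, records):
    """Mean squared error between predicted probability and outcome."""
    records = list(records)
    return sum((model(h) - y) ** 2 for h, y in records) / len(records)


def cross_data_validate(candidate_records, benchmark_train, benchmark_test):
    """Score models fitted to each source on the same held-out benchmark data.

    Validating both models on benchmark data means that biases in the
    candidate source cannot inflate its own validation score.
    """
    candidate_model = fit_frequency_model(candidate_records)
    reference_model = fit_frequency_model(benchmark_train)
    return {
        "candidate": brier_score(candidate_model, benchmark_test),
        "reference": brier_score(reference_model, benchmark_test),
    }


# Hypothetical records, invented purely for illustration.
candidate_records = [
    ("forest", 1), ("forest", 1), ("forest", 0), ("forest", 1),
    ("field", 0), ("field", 0), ("field", 1), ("field", 0),
]
benchmark_train = [("forest", 1), ("forest", 1), ("field", 0), ("field", 0)]
benchmark_test = [("forest", 1), ("forest", 0), ("field", 0), ("field", 1)]

scores = cross_data_validate(candidate_records, benchmark_train, benchmark_test)
print(scores)
```

A lower Brier score indicates better predictions on the benchmark. Comparing the two scores as the amount of candidate data grows is one way to examine how much information each datum contributes.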