False positive errors are a significant component of many ecological data sets, which in combination with false negative errors, can lead to severe biases in conclusions about ecological systems. We present results of a field experiment where observers recorded observations for known combinations of electronically broadcast calling anurans under conditions mimicking field surveys to determine species occurrence. Our objectives were to characterize false positive error probabilities for auditory methods based on a large number of observers, to determine if targeted instruction could be used to reduce false positive error rates, and to establish useful predictors of among-observer and among-species differences in error rates. We recruited 31 observers, ranging in abilities from novice to expert, who recorded detections for 12 species during 180 calling trials (66 960 total observations). All observers made multiple false positive errors, and on average 8.1% of recorded detections in the experiment were false positive errors. Additional instruction had only minor effects on error rates. After instruction, false positive error probabilities decreased by 16% for treatment individuals compared to controls with broad confidence interval overlap of 0 (95% CI: −46 to 30%). This coincided with an increase in false negative errors due to the treatment (26%; −3 to 61%). Differences among observers in false positive and in false negative error rates were best predicted by scores from an online test and a self-assessment of observer ability completed prior to the field experiment. In contrast, years of experience conducting call surveys was a weak predictor of error rates. False positive errors were also more common for species that were played more frequently but were not related to the dominant spectral frequency of the call. Our results corroborate other work that demonstrates false positives are a significant component of species occurrence data collected by auditory methods. Instructing observers to only report detections they are completely certain are correct is not sufficient to eliminate errors. As a result, analytical methods that account for false positive errors will be needed, and independent testing of observer ability is a useful predictor for among-observer variation in observation error rates.