We evaluated the variability and sensitivity of a suite of biological metrics for detecting the ecological effects of metals in streams. We assessed variability by using a three-way analysis of variance to partition the total variance among spatial, seasonal, annual, and temporal–spatial interaction components, with data from 6 years of biomonitoring on the Arkansas River, Colorado, USA. We then calculated the statistical power of these metrics for a likely experimental design aimed at detecting metal-pollution effects in streams, using estimates of variability from the field data. Finally, we experimentally tested the sensitivity of these metrics to a metal mixture in stream microcosms. More than half of the variation in richness and scraper functional feeding group metrics was explained by differences among sampling sites, which were presumably due to metal pollution. Statistical power was highest for richness measures and low for all other metrics examined. Experimental exposures revealed that richness measures were also more sensitive than functional group metrics. Our results support those of previous comparative studies showing that taxa richness measures are superior in sensitivity, variability, and statistical power. Most abundance, ratio, and functional group metrics were insensitive to metal pollution, highly variable, or both. We conclude that similar systematic testing of a variety of metrics with other stressors will greatly enhance the utility of biological metrics for assessing the ecological effects of contaminants and establishing biological criteria.
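
The variance partitioning described above can be illustrated with a minimal sketch. It assumes a balanced layout with one metric value per site, season, and year, and reports each component's share of the total sum of squares. The function name `partition_variance`, the nested-list layout, and the example values are hypothetical and do not come from the study; the authors' actual ANOVA model and software are not stated here.

```python
from itertools import product
from statistics import fmean


def partition_variance(y):
    """Split the total sum of squares of a balanced site x season x year layout.

    y[i][j][k] is the metric value at site i, season j, year k
    (assumed: exactly one value per cell, no missing cells).
    Returns each component's fraction of the total sum of squares.
    """
    I, J, K = len(y), len(y[0]), len(y[0][0])
    cells = list(product(range(I), range(J), range(K)))
    grand = fmean(y[i][j][k] for i, j, k in cells)
    ss_total = sum((y[i][j][k] - grand) ** 2 for i, j, k in cells)

    # Marginal means for each factor and for the two site-by-time tables.
    site = [fmean(y[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
    season = [fmean(y[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
    year = [fmean(y[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
    site_season = [[fmean(y[i][j]) for j in range(J)] for i in range(I)]
    site_year = [[fmean(y[i][j][k] for j in range(J)) for k in range(K)] for i in range(I)]

    ss = {
        "site": J * K * sum((m - grand) ** 2 for m in site),
        "season": I * K * sum((m - grand) ** 2 for m in season),
        "year": I * J * sum((m - grand) ** 2 for m in year),
        "site_x_season": K * sum(
            (site_season[i][j] - site[i] - season[j] + grand) ** 2
            for i in range(I) for j in range(J)
        ),
        "site_x_year": J * sum(
            (site_year[i][k] - site[i] - year[k] + grand) ** 2
            for i in range(I) for k in range(K)
        ),
    }
    # Everything not attributed above (season x year, three-way, error).
    ss["remainder"] = ss_total - sum(ss.values())
    return {name: value / ss_total for name, value in ss.items()}


# Made-up richness values: 2 sites x 2 seasons x 3 years (illustration only).
example = [
    [[22.0, 24.0, 23.0], [20.0, 21.0, 22.0]],  # hypothetical reference site
    [[11.0, 13.0, 12.0], [10.0, 12.0, 11.0]],  # hypothetical polluted site
]
for component, share in partition_variance(example).items():
    print(f"{component}: {share:.3f}")
```

In a balanced design these components are orthogonal, so the fractions sum to one; a metric whose `site` share dominates would correspond to the pattern reported for the richness metrics. The function assumes the metric is not constant, since a total sum of squares of zero would make the fractions undefined.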