Stream restoration researchers face a complex array of alternative macroinvertebrate-based assessment methodologies. We examined sources of variation among three field sampling methods and five metrics in three networked streams impacted by a circumneutral coal mine discharge that was treated midway through an 11-year study. Constructed wetlands captured the primary stressor, 700 kg iron/day. Before pollution abatement, copious iron hydroxide had smothered downstream sites for decades. Two second-order streams and one fourth-order receiving stream, each with matching locations upstream and downstream, were monitored in midsummer from 1994 to 2004. We compared taxa density (TD; number/sample), abundance, expected taxa richness (ETR), U.S. regional pollution tolerance values (RTV), and community similarity (CS) indices from 3 to 11 replicate samples/site using grab samples (i.e., D-nets and rock washes) and incubated leaf packs. Regression and analysis of variance showed that sampling method, metric, location, and year significantly influenced outcomes. TD, RTV, and CS indicated that biological recovery lagged 6 years behind chemical improvement; ETR and abundance showed more severe, persistent impairment in the two highly impacted second-order streams than in the fourth-order stream. Incubated leaf packs, which provide clean food (leaves) and substrate (mesh), offered a preview of stream recovery at downstream sites and attracted more taxa and greater abundance than grab samples. Because coal mining is distributed worldwide and is often accompanied by metal hydroxide deposition in streams, we suggest that restoration project managers use a variety of sampling methods, metrics, and models to evaluate remediation of physical as well as chemical impairment from mining.