Adequacy of Visually Classified Particle Count Statistics From Regional Stream Habitat Surveys

Authors

John M. Faustini, Postdoctoral Research Associate, Department of Fisheries and Wildlife, Oregon State University, Corvallis, Oregon 97331 (E-Mail: faustini.john@epa.gov)

Philip R. Kaufmann, Research Physical Scientist, Western Ecology Division, National Health and Environmental Effects Research Laboratory, U.S. Environmental Protection Agency, 200 SW 35th Street, Corvallis, Oregon 97333

Paper No. J06065 of the Journal of the American Water Resources Association (JAWRA). Discussions are open until April 1, 2008.

Abstract

Streamlined sampling procedures must be used to achieve a sufficient sample size with limited resources in studies undertaken to evaluate habitat status and potential management-related habitat degradation at a regional scale. At the same time, these sampling procedures must achieve sufficient precision to answer science and policy-relevant questions with an acceptable and statistically quantifiable level of uncertainty. In this paper, we examine precision and sources of error in streambed substrate characterization using data from the Environmental Monitoring and Assessment Program (EMAP) of the U.S. Environmental Protection Agency, which uses a modified “pebble count” method in which particle sizes are visually estimated rather than measured. While the coarse (2ϕ) size classes used in EMAP have little effect on the precision of estimated geometric mean (Dgm) or median (D50) particle diameter, variable classification bias among observers can contribute as much as 0.3ϕ, or about 15-20%, to the root-mean-square error (RMSE) of Dgm or D50 estimates. Dgm and D50 estimates based on EMAP data are nearly equal when fine sediments (<2 mm) are excluded, but otherwise can differ by a factor of 2 or more, with Dgm < D50 for gravel-bed streams. The RMSE of reach-scale particle size estimates based on visually classified particle count data from EMAP surveys, including variability associated with reoccupying unmarked sample reaches during revisits, is up to five to seven times higher than that reported for traditional measured pebble counts by multiple observers at a plot scale. Nonetheless, a variance partitioning analysis shows that the ratio of among-site to revisit variance for several EMAP substrate metrics exceeds 8 for many potential regions of interest, suggesting that the data have adequate precision to be useful in regional assessments of channel morphology, habitat quality, or ecological condition.

Introduction

Background and Motivation

Since the mid-1980s, a large number of procedures to survey and monitor aquatic and riparian habitat quality in rivers and streams have been proposed and applied, typically focusing on habitat quality for fish (Platts et al., 1983, 1987; Bain and Stevenson, 1999; Kaufmann et al., 1999; Johnson et al., 2001). In 1988, the U.S. Environmental Protection Agency’s (USEPA) Science Advisory Board recognized the need to develop methods for monitoring ecological status and trends to provide better assessments of the condition of the nation’s ecological resources, leading to initiation of the USEPA Environmental Monitoring and Assessment Program (EMAP) (Paulsen et al., 1991). Implementation of the Northwest Forest Plan (USDA and USDI, 1994), developed in the mid-1990s in an effort to resolve an escalating conflict between protection of endangered species and timber production in the Pacific Northwest, also provided a major regional impetus for ecological monitoring. These developments marked a shift in federal land management policy “away from piecemeal agency or program mandates toward management of ecological systems within a geographic area as an integrated whole” (Baker et al., 1995). One indication of the resultant explosion of interest and activity in the area of aquatic and riparian habitat assessment is provided by Johnson et al. (2001), who inventoried 429 sampling protocols drawn from 112 documents for monitoring physical and biological habitat attributes relevant to salmonids in the Pacific Northwest alone.

Streambed substrate is a key aspect of physical habitat in lotic ecosystems that is assessed in some form by many if not most stream habitat survey protocols (Johnson et al., 2001). Here, “substrate” refers to the material comprising the surface layer of the streambed, predominantly fluvial gravels, fine sediments, and bedrock, but also including wood and fine and coarse organic matter. Substrate is a key factor determining habitat quality for both benthic invertebrates and aquatic vertebrates. In general, gravel or cobble substrates with a low abundance of fine sediment (<2 mm) provide the highest quality habitat for both benthic macroinvertebrates and fish, particularly salmonids (Wood and Armitage, 1997). Benthic macroinvertebrate community structure is strongly influenced by substrate, and macroinvertebrate diversity and richness generally increase with both substrate size and heterogeneity (Hynes, 1970; Minshall, 1984; Biesel et al., 2000). Salmon preferentially select spawning sites with low amounts of fine sediment (Coulombe-Pontbriand and Lapointe, 2004), and deposition of fine sediments in spawning gravels decreases egg-to-emergence survival (Lapointe et al., 2004) and growth and survival of juvenile salmonids (Suttle et al., 2004).

Bed surface particle size in gravel bed streams can change in response to changes in sediment supply to streams, decreasing as sediment supply increases or vice versa (Dietrich et al., 1989; Lisle et al., 1993; Buffington and Montgomery, 1999). Many human activities on the landscape (e.g., logging, agriculture, mining, road construction) tend to increase erosion and supply of sediment to streams (Sidle et al., 1985; Grant and Wolff, 1991; Allan et al., 1997; Sidorchuk and Golosov, 2003), and an increase in the percentage of fine particles in bed surface sediments can occur in response (Lisle, 1982; Dietrich et al., 1989). Jackson et al. (2001) present preharvest and postharvest streambed particle size data from six small headwater streams in the Coast Ranges of western Washington whose catchments were 100% clearcut, showing a nearly fourfold average increase in the percentage of fine sediments (<2 mm), from 12 to 44%. The D50 in one stream decreased by a factor of 5, from 25 to 5 mm, while the D50 in three others decreased from medium gravel (10-42 mm) to sand size (<2 mm). Madej (2001) documented two- to threefold changes in D50 (both increases and decreases) using a 16-year record of pebble counts at gauging stations in Redwood Creek, California, associated with the passing of a sediment wave attributed to the combined effects of surface erosion, gullying, and road-crossing and road-fill failures triggered by a series of large floods following the advent of widespread commercial timber harvest.

Particle counts (or pebble counts, as they are commonly, if inaccurately, called) are widely used to determine the surface particle-size distribution in streams with predominantly coarse (gravel or larger) bed material. A particle count is conducted by selecting a preset number of particles, typically at regular intervals along a grid or a series of transects (Wolman, 1954), or in a longitudinal zigzag pattern along a stream reach (Bevenger and King, 1995). Particles are selected either by pacing and selecting the first particle touched by the observer’s vertical finger with eyes averted (to avoid bias), or at marked points along a tape or at the intersections of an established grid (Bunte and Abt, 2001). Particle diameters are typically determined either by measuring the intermediate (b-axis) diameter with a ruler or calipers or by using a template with square holes graduated in logarithmic increments (Bunte and Abt, 2001).

Monitoring programs intended to cover a large geographic area commonly employ streamlined sampling procedures to enable sampling of a large number of sites within practical constraints of limited manpower, time, and funds. Sampling protocols must balance at-a-site detail and precision against the landscape-scale resolution and precision gained from increased sample size. Simplifying sampling procedures can reduce the sampling effort per site and thereby increase the number of sites sampled for a given level of effort. While increasing the number of sites sampled increases the spatial and/or temporal resolution of a monitoring program, questions have been raised about the precision and reliability of data derived from streamlined habitat monitoring protocols (Platts et al., 1987; Bauer and Ralph, 2001; USFS, 2004). In response, a small body of literature evaluating and comparing habitat assessment methods has emerged (Roper and Scarnecchia, 1995; Wang et al., 1996; Roper et al., 2002; Sennatt et al., 2006). In this paper, we evaluate the EMAP physical habitat survey protocol for wadeable streams (Lazorchak et al., 1998; Peck et al., 2006), in which particles sampled in a systematic particle count are tallied by size class on the basis of visual estimates, as a tool for assessing streambed surface particle size and abundance of fine sediments. (Note that this is quite different from visual estimation methods used in some habitat monitoring protocols, such as the Instream Flow Incremental Methodology (Bovee, 1982), in which an observer is required to estimate the relative percentages of various size classes of sediment within a defined area.) Since the early 1990s, this sampling protocol has been applied at more than 5,000 sites across the United States (U.S.) by a range of federal and state agencies and cooperators on projects covering areas ranging from single ecoregions (e.g., Hayslip et al., 2004) to a national assessment of wadeable streams (USEPA, 2006). These data collectively represent a major ecological data resource. However, while assessments of the validity and precision of various aspects of the EMAP protocol have been performed (Kaufmann et al., 1999; Larsen et al., 2004; Sennatt et al., 2006), a comprehensive evaluation of the substrate monitoring portion of the protocol and its accuracy and precision in relation to traditional pebble counts, in which particle diameters are individually measured, has not been previously reported.

Objectives

The objectives of this paper are (1) to assess the sources and relative magnitudes of bias and uncertainty in reach-scale estimates of particle size or abundance of fine sediments derived from stream surveys using the EMAP sampling protocol, (2) to quantify the precision of such visual classification-based particle size metrics and compare it to the precision of data from traditional pebble counts, and (3) to evaluate whether such data have sufficient accuracy and precision to address science and management questions at a regional scale and to discuss appropriate uses for such data.

Sources of Error in Particle Size Data

In any environmental sampling, there is random sampling error due to natural variability (Gilbert, 1987) and what we call methodological error, or error attributable to the way in which samples are collected and analyzed. Random sampling error arises because, due to the “luck of the draw,” independent random samples of size n will have mean values that differ slightly from each other and from the true population mean even if there is no sampling bias or measurement error. This error is unavoidable and is wholly determined by the variability of the population of interest (e.g., diameters of particles on the bed of a stream) and the sample size. Methodological error arises from the sampling protocol and analysis methods used, and therefore is to some degree under the researcher’s control.
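
For a simple random sample with unbiased, error-free observations, this irreducible component is the familiar standard error of the mean, stated here for reference (textbook sampling theory, not a formula reproduced from the original article):

```latex
\operatorname{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
```

where σ is the population standard deviation and n is the sample size; the thin reference lines in Figure 5 plot this quantity, σ/√105, for the 105-particle EMAP sample.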

The accuracy of a sample (i.e., a random or systematic set of observations or measurements of a specified attribute) depends upon the precision and bias of the observations or measurements comprising the sample. Precision denotes the typical size of deviations from the mean value obtained by repeated sampling or measurements of the same quantity (e.g., particle diameter) at the same location, and is a measure of the uncertainty associated with the estimated mean value of that quantity. Bias refers to the difference between the mean value of repeated measurements or samples and the true value of the quantity being measured or sampled. The true value (e.g., the median diameter of particles on the surface of a streambed within a defined area) is generally unknown and can only be estimated; hence, bias and accuracy can only be estimated but not known with certainty. A sampling procedure has high accuracy if it has low bias and high precision (i.e., low variance); it has low accuracy if it has either high bias or low precision. However, if observations have low bias, then increasing sample size can increase the precision and accuracy of a sample even if the precision of individual observations is low.

Because we are working with particle count data, in which particles are visually classified using broad size classes, some of the aspects of sampling error we focus on in this study are unique to methods that use visually estimated size class data, while others are common to all particle count methods. Visual classification error arises when an observer incorrectly estimates the size of a particle and assigns it to the wrong size class, which can contribute to both bias and loss of precision. Binning error arises from aggregating observations into size classes prior to computing particle size statistics [e.g., the median (D50) or geometric mean particle diameter], which effectively reduces the precision of individual particle size observations and can lead to errors in whole-sample particle-size statistics. This occurs whenever particle sizes are lumped in classes, regardless of whether the classification is done by visual estimation or by using a template or sieve. In addition, because the standard summary metric for the central tendency of streambed particle size in EMAP datasets, lsub_dmm (Kaufmann et al., 1999), is the log of the geometric mean particle size (Dgm) rather than the median particle diameter (D50) typically reported from pebble count data, we examine differences between Dgm and D50 (as two alternative measures of central tendency of a particle size distribution) and the errors that could result from using these statistics interchangeably. Finally, because lsub_dmm in EMAP datasets and assessments (e.g., Hayslip et al., 2004; Kaufmann and Hughes, 2006) includes bedrock and hardpan as a size class (see next section and Discussion), we examine the effects of including vs. excluding these “nonparticle” observations in the computation of Dgm or D50 from EMAP data.

Methods

Data Sources and Field Methods

The data discussed in this paper are drawn principally from the EMAP West study (Stoddard et al., 2005), a probability survey of perennial streams and rivers covering 12 western states (Figure 1). We focus here on the wadeable streams from this survey (first- to third-order streams on 1:100,000-scale USGS topographic maps). A total of 1,232 wadeable stream reaches (872 randomly selected and 360 hand-picked sites) were sampled as part of this project from 2000 through 2004. A subset of sites was sampled multiple times (113 total revisits), for a total of 1,345 site visits. Sites were sampled between May and October each year during an “index period” of approximately 90 days that varied from region to region to coincide with relatively stable low flow conditions. We also examine substrate measurement data from 15 sites in the Necanicum River and E. Beaver Creek basins in the northern Oregon Coast Range (method comparison sites, Figure 1) that were sampled by the lead author in summer 2003 using the EMAP protocol in addition to template measurements of substrate size.

Figure 1.

 Map Showing Wadeable Stream Locations Sampled for EMAP West Assessment, 2000-2004 (Stoddard et al., 2005). Regions discussed in text are indicated by shading and outlines.

Under the survey protocols used in the EMAP West survey (Peck et al., 2006), the channel bed surface material (substrate) is sampled at five locations (at 0, 25, 50, 75, and 100% of wetted channel width) on each of 21 evenly spaced transects, for a sample size of 105 observations per site. The streambed substrate at each location is visually classified into one of seven size classes (fine sediment, sand, fine gravel, coarse gravel, cobble, small boulder, large boulder) or one of six other nonparticle substrate classes such as bedrock, hardpan, and wood (Table 1). For coarse sediments (coarse gravel to large boulder), EMAP size classes span uniform logarithmic increments of 2ψ units. [We use ψ = −ϕ = log2 D, where D is particle diameter in mm (Parker and Andrews, 1985), as ψ yields positive values for particles >1 mm in diameter and hence is more convenient than ϕ for coarse sediments. Note that ψ = log10 D/log10 2 = 3.32 log10 D.]

Table 1. Visually Estimated Size Classes Used to Compute Geometric Mean (Dgmv) and Median (D50v) Particle Size Estimates.

                                                                      Class Limits ψ (approx.)   Class Limits (mm)
Code               Description                                        Lower    Upper             Lower    Upper     Geometric Mean (mm)
Nonparticle
 WD, OT            Wood (any size), other                             Not used
 RR, RS, RC, HP(1) Rough bedrock, smooth bedrock, concrete, hardpan    12       13               4,000    8,000     5,657
Particles
 XB(2)             Large boulder                                       10       12               1,000    4,000     2,000
 SB(2)             Small boulder                                        8       10                 250    1,000       500
 CB                Cobble                                               6        8                  64      250       126.5
 GC                Coarse gravel                                        4        6                  16       64        32
 GF                Fine gravel                                          1        4                   2       16         5.66
 SA(1)             Sand                                                −4        1                0.06        2         0.346
 FN(1)             Fine sediment                                      −10       −4               0.001     0.06         0.0078

Notes: (1) All nonparticle observations are omitted when computing Dgmv (no bedrock) and D50v (no bedrock). SA, FN, and nonparticle observations are omitted when computing Dgmv (no fines) and D50v (no fines). (2) Prior to 2002, the XB and SB classes were combined in a single boulder (BL) size class with a geometric mean diameter of 1,000 mm.
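
As a concrete illustration of the class scheme in Table 1, the sketch below (our own Python illustration; the helper names are hypothetical and not part of the EMAP protocol) converts a measured diameter in mm to ψ units and assigns it to an EMAP particle size class:

```python
import math

# EMAP particle size classes from Table 1: (code, lower limit mm, upper limit mm)
EMAP_CLASSES = [
    ("FN", 0.001, 0.06),  # fine sediment
    ("SA", 0.06, 2),      # sand
    ("GF", 2, 16),        # fine gravel
    ("GC", 16, 64),       # coarse gravel
    ("CB", 64, 250),      # cobble
    ("SB", 250, 1000),    # small boulder
    ("XB", 1000, 4000),   # large boulder
]

def psi(d_mm):
    """Convert particle diameter in mm to psi units (psi = log2 D)."""
    return math.log2(d_mm)

def emap_class(d_mm):
    """Return the EMAP size class code for a measured diameter."""
    for code, lower, upper in EMAP_CLASSES:
        if lower <= d_mm < upper:
            return code
    raise ValueError(f"diameter {d_mm} mm is outside the tabulated class limits")

# Example: a 100 mm particle is a cobble (CB) at psi = log2(100) = 6.64
print(emap_class(100.0), round(psi(100.0), 2))
```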

Particle size estimates based on visual classification were compared with estimates based on actual measurements at 15 sites (Figure 1). At these sites, particles were first visually classified into EMAP substrate classes (Table 1) and then measured using a square-hole template or “gravelometer” (Bunte and Abt, 2001, p. 25) with openings in increments of 0.5ψ, ranging from 2 to 181 mm. The nominal size of the largest opening on the template through which the particle would not pass (retaining sieve size, Sr) was recorded. Sr for particles too embedded to be removed from the streambed was estimated in situ and the observation flagged as an estimate. For particles larger than the maximum template opening (up to 1,024 mm), Sr was estimated using a meter stick.

We also obtained data from a test reported by the Stream Systems Technology Center (USFS, 2004), in which five observers ranging from inexperienced to very experienced measured and visually classified a single sample of 100 riverbed gravel particles using EMAP size classes (John Potyondy, personal communication). In that test, each observer first tallied particles by EMAP size classes (Table 1) using visual estimates, then tallied the same particles by 0.5ψ increments using a square-hole template, and finally repeated the process one more time using a ruler to measure the intermediate (b-axis) diameter (again tallying numbers of particles by 0.5ψ increments).

Finally, to separate the potential effect of the broad visual size classes used in EMAP on the precision of particle size estimates from the effect of observer classification bias, we performed a Monte Carlo analysis using virtual samples drawn from lognormal distributions with specified combinations of geometric mean and standard deviation of particle diameter. We generated samples of 105 virtual particles and binned these particles into 0.5ψ and EMAP size classes, then independently computed geometric mean and median particle diameters (see next section) for each sample from the “actual” (continuous) and binned (0.5ψ and EMAP) size data. For each of 48 combinations of population geometric mean diameter (10, 25, 50, 75, 100, 150, 200, and 300 mm) and population standard deviation (0.5, 0.75, 1.0, 1.5, 2.0, and 2.5ψ), we generated 500 random samples and computed the standard deviation of the mean and median log-transformed particle diameters for the continuous and binned data.
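
A minimal sketch of the replicate-sampling step of this analysis, simplified to a single binning scheme (0.5ψ classes) and one population parameter combination (the original analysis code is not published here, so this is illustrative only):

```python
import numpy as np

rng = np.random.default_rng(42)

def dgm_psi(psi_values):
    """Geometric mean diameter in psi units = arithmetic mean of log2 diameters."""
    return np.mean(psi_values)

def bin_midpoints(psi_values, width=0.5):
    """Assign each psi value the midpoint of the bin of the given width containing it."""
    return (np.floor(psi_values / width) + 0.5) * width

n_particles, n_reps = 105, 500
mu_psi, sigma_psi = np.log2(50.0), 1.5  # population Dgm = 50 mm, sigma = 1.5 psi

continuous, binned = [], []
for _ in range(n_reps):
    # Lognormal diameters in mm are normally distributed in psi (log2) units
    sample = rng.normal(mu_psi, sigma_psi, n_particles)
    continuous.append(dgm_psi(sample))
    binned.append(dgm_psi(bin_midpoints(sample)))

# Standard deviation among replicate Dgm estimates, in psi units
print(f"exact diameters: {np.std(continuous):.3f} psi")
print(f"0.5-psi bins:    {np.std(binned):.3f} psi")
```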

Particle Size Metric Definitions

In this paper, we discuss several variations on each of two statistical measures of the central tendency of the size-frequency distribution of a sample of bed surface particles, the geometric mean diameter (Dgm) and the median diameter (D50). While D50 is the most commonly reported statistic for streambed particle size analyses, Dgm has been proposed as the best statistic to use in evaluating spawning habitat (Platts et al., 1979; Shirazi and Seim, 1981) and is the statistic reported in EMAP data sets and assessments. We use the term metric to refer to these different variations – that is, to a particular statistic (e.g., Dgm) calculated in a specific way (e.g., including vs. excluding bedrock and hardpan). We computed the geometric mean particle size using the entire size distribution, by taking the frequency-weighted arithmetic mean of the log-transformed particle sizes (representing each size class by its midpoint in log units) and exponentiating (Bunte and Abt, 2001, p. 65). We computed the D50 by linear interpolation from the cumulative frequency distribution using log-transformed particle diameter (Bunte and Abt, 2001, p. 41). We also computed two metrics of fine sediment abundance derived from the visually classified substrate data representing the percent of streambed area occupied by sediment finer than 2 mm [i.e., in the FN and SA classes (Table 1)] and 16 mm (FN, SA, and GF classes), designated Pfs2 and Pfs16, respectively.
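
Expressed as code, these two definitions reduce to a frequency-weighted mean of log2-transformed class midpoints (then exponentiated) for Dgm, and linear interpolation on the cumulative frequency curve in log2 units for D50. A sketch under those definitions (function names are ours, for illustration only):

```python
def dgm_from_classes(counts, midpoints_psi):
    """Geometric mean diameter (mm) from class tallies: frequency-weighted
    mean of log2 class midpoints, then exponentiate (after Bunte and Abt, 2001)."""
    n = sum(counts)
    mean_psi = sum(c * m for c, m in zip(counts, midpoints_psi)) / n
    return 2 ** mean_psi

def d50_from_classes(counts, limits_psi):
    """Median diameter (mm) by linear interpolation of the cumulative
    frequency distribution using log2-transformed class limits."""
    n = sum(counts)
    cum = 0.0
    for count, (lower, upper) in zip(counts, limits_psi):
        if cum + count >= n / 2.0:
            frac = (n / 2.0 - cum) / count  # fraction of this class below the median
            return 2 ** (lower + frac * (upper - lower))
        cum += count
    raise ValueError("median not reached; empty tally?")

# Example tally over the GF, GC, and CB classes of Table 1 (limits in psi units)
counts = [20, 55, 30]
limits = [(1, 4), (4, 6), (6, 8)]
mids = [(lo + hi) / 2 for lo, hi in limits]
print(round(dgm_from_classes(counts, mids), 1),    # 34.2 mm
      round(d50_from_classes(counts, limits), 1))  # 36.3 mm
```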

We denote particle size metrics computed from visually classified data using a subscript v (e.g., Dgmv and D50v for the geometric mean and median particle size, respectively). Unless otherwise specified, these visually based estimates exclude all nonparticle substrate observations (Table 1). We also discuss Dgm and D50 metrics based on visually classified substrate data including bedrock, concrete and hardpan substrate classes as the largest “size class” (Table 1, Row 2), which we designate as Dgmv (with bedrock) and D50v (with bedrock), respectively. Dgmv (with bedrock) is the particle size estimate that has been used in EMAP studies in lieu of D50 as measure of the dominant or representative particle size (Herger and Hayslip, 2000; Hayslip et al., 2004; Kaufmann and Hughes, 2006); log Dgmv (with bedrock) is equivalent to lsub_dmm in Kaufmann et al. (1999), although it is computed slightly differently. We also computed Dgm and D50 metrics based on visually classified particle size data after excluding fine sediments (<2 mm) in addition to bedrock and hardpan; these are designated as Dgmv (no fines) and D50v (no fines), respectively. Dgm or D50 estimates based on particle sizes measured with a ruler and those measured using a template graduated in 0.5ψ increments are designated Dgmr or D50r and Dgmt or D50t, respectively. To separate the effects of binning error associated with the coarse size classes used in the visual classification-based metrics from classification error, we computed metrics of Dgm and D50 (excluding bedrock) based on ruler or template measurements subsequently binned by EMAP size classes (Table 1) prior to analysis. These metrics are designated Dgmr (EMAP) and D50r (EMAP) for ruler measurements and Dgmt (EMAP) and D50t (EMAP) for template measurements.

Statistical Analyses

We computed the absolute and relative differences between each of these metrics, testing for significant differences using paired t-tests for zero mean difference between metrics in ψ units (i.e., log-transformed values). Although this test assumes that differences are normally distributed, it is relatively robust against deviations from normality for moderate to large sample sizes (Ramsey and Schafer, 2002, p. 73). We report these test results both directly (i.e., as a mean difference between pairs of log-transformed metrics and a 95% confidence interval for this mean difference) and as back-transformed (exponentiated) values in the original units. Assuming the log-transformed data are reasonably symmetrically distributed, the back-transformed values can be interpreted as estimates of the median (not mean) ratio of the two metrics in their original units and a 95% confidence interval for this median ratio (see Ramsey and Schafer, 2002, pp. 70-74). Thus, differences are expressed in terms of a multiplicative factor [e.g., if mean(X1 − X2) = 0.5ψ with a 95% confidence interval of ±0.1ψ, then median(X1/X2) = 2^0.5 = 1.41 with a 95% confidence interval of 2^(0.5−0.1) = 1.32 to 2^(0.5+0.1) = 1.52].
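
The sketch below illustrates this procedure on hypothetical paired data (SciPy stands in here for whatever statistical software was actually used): differences are formed in ψ units, tested against zero, and the mean difference and its confidence limits are back-transformed to a median ratio.

```python
import numpy as np
from scipy import stats

# Hypothetical paired metric values (mm) for the same sites under two methods
x1 = np.array([34.0, 50.0, 22.0, 61.0, 45.0, 28.0, 39.0])
x2 = np.array([40.0, 58.0, 27.0, 66.0, 55.0, 30.0, 47.0])

d = np.log2(x1) - np.log2(x2)                # paired differences in psi units
t_stat, p_value = stats.ttest_1samp(d, 0.0)  # paired t-test for zero mean difference

# 95% confidence interval for the mean difference (psi units)
mean_d = d.mean()
se = d.std(ddof=1) / np.sqrt(len(d))
tcrit = stats.t.ppf(0.975, df=len(d) - 1)
lo, hi = mean_d - tcrit * se, mean_d + tcrit * se

# Back-transform to the median ratio of the unlogged metrics and its 95% CI
print(f"median(X1/X2) = {2**mean_d:.2f}, "
      f"95% CI = ({2**lo:.2f}, {2**hi:.2f}), p = {p_value:.4f}")
```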

We based our evaluation of the precision of visually based particle size metrics on data from within-season site revisits in the EMAP West probability survey. After excluding hand-picked sites and sites with fewer than 55 particles [excluding “nonparticle” observations (Table 1)], the dataset consisted of 957 site visits with 848 unique site locations, 69 within-season revisits, and 40 between-year revisits. Following Larsen et al. (2004), we computed four major components of variation for particle size metrics: Site, Year, Site × Year interaction, and residual. Site variance, σ²site, is the component of total variance due to persistent differences among stream reaches across a region, and is a measure of the inherent spatial variability of the response variable of interest within the region. Year variance is the synchronous or coherent year-to-year variation among all sites in a region that might reflect regional influences, such as climate. Site × Year interaction variance is the independent, unsynchronized year-to-year variation among sites in a region, reflecting local-scale influences. Finally, residual variance, σ²rep, is the variance for within-year revisits to the same site during the index period (in effect, replicate samples). This term captures all remaining variation, including measurement error, team-to-team differences in applying the sampling protocol, imprecision in exactly relocating the unmarked sampling locations, and real short-term variation in the response variable during the temporal window when measurements are made within a given sampling season. We estimated these variance components using restricted maximum likelihood estimation in the Mixed Procedure in the SAS statistical package (Version 9.1, SAS Institute, Inc., Cary, North Carolina).

We defined two measures of precision for substrate metrics based on these variance components. The first, σrep, is simply the square root of the residual variance σ²rep (a root-mean-square error, RMSE), which is equivalent to the pooled standard deviation of within-season revisit values of the response variable. This is an absolute measure of precision of within-season revisits in the units of the response variable. For our analysis of substrate data from the EMAP West dataset, σrep was based on 69 same-season revisits to randomly selected sites over the five-year sampling effort. The second measure of precision, the “signal-to-noise ratio,” S:N = σ²site/σ²rep, is a relative measure. It is so called because it is the ratio of among-site variance, σ²site (the “signal” of interest in a regional survey), to residual variance, σ²rep (a measure of effective sampling error, or “noise”). Because a uniform methodology was applied at all sites and the number of within-season revisits precludes reliable estimates of σrep at smaller spatial scales, we make the assumption that σrep is uniform across geographic subregions of the whole sampling region. S:N is a relative measure of precision in that it depends upon the range of variation of the parameter of interest within the sampling domain or a specified region of interest relative to the effective measurement precision (σ²rep) of that parameter for same-season revisits. The higher a metric’s S:N, the greater its ability to detect differences among sites.
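
The full variance partitioning requires the mixed-model fit described above, but the two precision measures themselves are simple to state. The sketch below is a deliberate simplification (it ignores the Year and Site × Year terms and estimates the among-site variance naively from single-visit site values, which the REML fit would correct for sampling noise):

```python
import numpy as np

def sigma_rep(revisit_pairs):
    """Pooled within-site standard deviation from same-season revisit pairs.
    For a pair (x1, x2), the within-site sample variance is (x1 - x2)**2 / 2."""
    diffs = np.array([x1 - x2 for x1, x2 in revisit_pairs])
    return np.sqrt(np.mean(diffs**2 / 2.0))

def signal_to_noise(site_values, s_rep):
    """S:N = among-site variance ("signal") / revisit variance ("noise").
    The raw variance of site values slightly overstates the signal because
    it still contains sampling noise; REML separates these components."""
    return np.var(site_values, ddof=1) / s_rep**2

# Hypothetical Dgmv values in psi units: revisit pairs plus one value per site
revisit_pairs = [(3.1, 3.6), (1.2, 0.8), (5.0, 4.4), (-1.0, -0.3)]
site_values = np.random.default_rng(1).normal(2.2, 3.9, 200)

s_rep = sigma_rep(revisit_pairs)
print(f"sigma_rep = {s_rep:.2f} psi, S:N = {signal_to_noise(site_values, s_rep):.0f}")
```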

Metric Comparison Results

We examined the relative magnitudes of errors in or differences between the previously described metrics of median and geometric mean particle size that can arise from several different sources. In particular, we evaluated the following: (1) differences between metrics based on visually classified vs. measured particle sizes, including effects of classification error and binning error; (2) differences between geometric mean (Dgm) and median (D50) particle size and various estimates of each; and (3) effects of including/excluding bedrock and hardpan on Dgm derived from visually classified particle count data.

Visual Classification vs. Measurements

For the 15 method-comparison sites where we estimated Dgm and D50 using both visually classified and measured particle size data, there was little difference between visual classification-based estimates (Dgmv, D50v) and measurement-based estimates (Dgmt, D50t), where all estimates excluded bedrock (Figures 2 and 3). Nonetheless, pair-wise comparisons between particle size metrics for the same site based on log-transformed data found that geometric mean diameter Dgm differed significantly (p = 0.003) between template and visual estimates, albeit only by 5%, while there was no significant difference between measured and visually estimated median diameter D50 (p = 0.59) (Table 2, Section 1).

Figure 2.

 Box Plots Showing Distributions of Reach-Average Geometric Mean (Dgm) and Median (D50) Particle Size Metrics Based on Particle Sizes Estimated Visually and Measured With a Template for 15 Sites in the Oregon Coast Range (“Method comparison sites” in Figure 1). Box midline and lower and upper ends show median and 25th and 75th percentile values, respectively; whiskers show maximum and minimum observations; plus indicates mean.

Figure 3.

 Scatter Plots Comparing Selected Metrics From Visually Classified and Measured Particle Size Data for 15 Sites in the Oregon Coast Range. In both plots, vertical axis is Dgm estimated from visually classified particle size samples (excluding bedrock). Horizontal axis shows Dgm (A) and D50 (B) for the same samples with particle sizes measured using a template. Short-dashed line is 1:1 reference line; long-dashed line in (B) is linear regression fit.

Table 2. Comparison of Differences Between Alternate Estimates of Reach-Scale Geometric Mean (Dgm) and Median (D50) Particle Size (n = 105) for 15 Sites in the Northern Oregon Coast Range.

                                Mean Var1 − Var2      Ratio (Var1/Var2)(1)
Var1          Var2              ψ        log10        Geom. Mean   95% LCL   95% UCL   p-Value(2)
1. Visually classified vs. measurement-based metrics
 Dgmv         Dgmt              −0.069   −0.021       0.95         0.93      0.98      0.003
 Dgmv         D50t              −0.51    −0.15        0.70         0.66      0.75      <0.0001
 D50v         D50t              −0.013   −0.004       0.99         0.96      1.03      0.59
2. Comparisons to evaluate classification error
 Dgmv         Dgmt(EMAP)         0.015    0.005       1.01         0.99      1.03      0.19
 D50v         D50t(EMAP)         0.044    0.013       1.03         1.00      1.06      0.03
3. Comparisons to evaluate binning error
 Dgmt(EMAP)   Dgmt              −0.084   −0.025       0.94         0.91      0.98      0.003
 D50t(EMAP)   D50t              −0.056   −0.017       0.96         0.93      1.00      0.03
4. Comparisons between Dgm and D50
 Dgmv         D50v              −0.50    −0.15        0.71         0.66      0.76      <0.0001
 Dgmt         D50t              −0.44    −0.13        0.74         0.69      0.79      <0.0001

Notes: See text for variable definitions. (1) Relative differences are based on back-transformed mean differences between log-transformed variables. For symmetrically distributed differences, this quantity may be interpreted as an estimate of the median ratio of the two unlogged metrics (Var1/Var2) (Ramsey and Schafer, 2002). (2) From paired t-test for zero mean difference of log-transformed values (n = 15); p-values <0.01 are in bold in the original table.

Estimated Dgm based on visually classified particle sizes, Dgmv, was highly correlated with estimated D50 based on measured particle sizes, D50t, but tended to be approximately 30% smaller for the 15 method-comparison sites. That is, on average, Dgmv − D50t = −0.51ψ (Figure 3B, Table 2), which is equivalent to a median ratio of Dgmv/D50t of 0.70. This difference was principally due to a real difference between Dgm and D50 for the comparison sites (rather than visual classification bias), as is apparent from inspection of Figure 3A. The very close agreement between Dgmv and Dgmt confirms that classification error and binning error effects were small and nearly unbiased.

We found only statistically insignificant and/or small effects (6% or less) due to classification bias and binning error. For Dgm estimated from visually classified particles, there was no evidence of significant classification bias (i.e., that log2Dgmv − log2Dgmt (EMAP) ≠ 0ψ; p = 0.19, Table 2). For D50 from visually classified particles there was moderate evidence (p = 0.03) for a small positive classification bias (D50v/[D50t(EMAP)] = 1.03). There was strong evidence (p = 0.003) of a small negative binning bias for Dgm (Dgmt[EMAP]/Dgmt = 0.94), and there was moderate evidence (p = 0.03) for a similar binning bias for D50 (D50t[EMAP]/D50t = 0.96, Table 2).

While our test for the 15 Oregon Coast Range sites using a single experienced observer revealed only minor to nondetectable classification error effects on estimated Dgm and D50 from visually classified particle count data, visual estimation bias for less experienced observers can be much larger. In a test using five observers ranging from inexperienced to very experienced to measure and classify a single sample of 100 riverbed gravel particles using EMAP size classes (USFS, 2004), visually estimated particle size distributions were biased toward finer particle size for all five observers relative to distributions based on both template and ruler measurements (Figure 4A). Mean estimated D50 from measured particle sizes was 38.0 mm for template measurements (range: 37.3-38.4 mm) and 42.1 mm for ruler measurements (range: 41.7-42.4 mm), while visual classification-based estimates of D50 averaged 30.3 mm (range: 24.5-35.8 mm). For the same observer, the visually based D50 estimates averaged 7.7 mm (20%) lower than estimates from template measurements and 11.8 mm (28%) lower than estimates from ruler measurements, differences that were statistically significant (p-values of 0.03 and 0.006, respectively, from paired t-test, Table 3). For Dgm, estimates ranged from 35.6 to 36.8 mm for template measurements and from 38.9 to 40.6 mm for ruler measurements, while visual classification-based estimates averaged 27.1 mm (range: 20.6-33.2 mm). For the same observer, visual classification-based Dgm estimates averaged 8.9 mm (25%) lower than the template-based estimates and 12.3 mm (31%) lower than the ruler-based estimates (p-values = 0.02 and 0.009, respectively, Table 3).

Figure 4.

 Particle Size Distribution for 100 Streambed Particles Determined by Five Observers Using a Template, a Ruler, and Visual Classification (modified from Figure 1 in USFS, 2004). (A) Comparison of particle size distribution curves from visual classification vs. composite (average) curves from measured values. Particle size distributions from ruler and template measurements were nearly identical among observers. (B) Comparison of composite particle size distribution curves (averaged among five observers) from visual classification vs. template and ruler measurements binned at 0.5ψ increments and by EMAP size classes (see Table 1).

Table 3. Mean Differences and Associated p-Values for Comparisons Between Visually Estimated and Measured Particle Size Metrics Determined by the Same Observer (based on 100 streambed particles measured by five observers; data from USFS, 2004).

                                  Diff. (Var1 − Var2)
Var1     Var2(1)                  mm        %          p-Value(2)
Visual vs. measured D50
 D50v    D50t                     −7.7      −20        0.03
 D50v    D50r                     −11.8     −28        0.006
Visual vs. measured Dgm
 Dgmv    Dgmt                     −8.9      −25        0.02
 Dgmv    Dgmr                     −12.3     −31        0.009
Classification error – D50
 D50v    D50t(EMAP)               −4.4      −13        0.10
 D50v    D50r(EMAP)               −10.2     −25        0.02
Classification error – Dgm
 Dgmv    Dgmt(EMAP)               −6.9      −20        0.03
 Dgmv    Dgmr(EMAP)               −12.8     −32        0.01

Notes: (1) Subscripts: v, t, r – visual, template, and ruler, respectively. Template and ruler data binned at 0.5ψ intervals; Dxxt(EMAP) and Dxxr(EMAP) are based on template and ruler measurements, respectively, binned by EMAP size classes (Table 1). (2) p-Values from paired t-test (n = 5); boldface values in the original table are significant at α = 0.05.

The difference between Dgm and D50 values based on visually classified vs. measured particle size data can be attributed to a combination of observer bias (classification error) and binning error. As before, we assessed observer bias using the differences between the D50 and Dgm estimates from visual classification (D50v and Dgmv, respectively) and the corresponding estimates based on measured values binned by EMAP size classes (D50r[EMAP], Dgmr[EMAP] and D50t[EMAP], Dgmt[EMAP] for ruler and template measurements, respectively; Figure 4B). For D50, the visual estimates averaged 10 mm (25%) smaller than the ruler-based estimates and 4.4 mm (13%) smaller than the template-based estimates binned by EMAP size classes for paired comparisons by the same observer, with corresponding p-values of 0.02 and 0.10 from a paired t-test (Table 3). For Dgm the evidence for observer bias was somewhat stronger, with visual estimates averaging 13 mm (32%) smaller than ruler-based estimates and 7 mm (20%) smaller than template-based estimates and corresponding p-values of 0.01 and 0.03, respectively, for these differences.

Due to the variable classification bias among observers, visual classification-based estimates of Dgm and D50 were much more variable among observers (i.e., less precise) than estimates based on measured particle sizes. The visual classification-based estimates of D50 had a standard deviation of 5.2 mm (17%) among observers, compared with 0.3 mm (0.7%) and 0.4 mm (1.1%) for ruler and template-based measurements, respectively. For Dgm, the standard deviation among observer estimates was 5.5 mm (20%) for visual vs. 0.7 mm (1.8%) and 0.5 mm (1.3%) for ruler and template-based estimates, respectively. Thus, visual classification yielded estimates that were roughly an order of magnitude more variable among observers than estimates based on either ruler or template measurements.

To evaluate the extent to which binning error due to the broad visual size classes affects the precision of particle size estimates separately from the issue of observer classification bias, we performed a Monte Carlo analysis as described in Methods to compare the precision of Dgm estimates based on exact particle diameters vs. data binned in 0.5ψ and EMAP size classes. Results of this analysis are presented in Figure 5 for four values of population standard deviation that represent a range from “moderate” to “very poor” sorting by the criteria of Folk and Ward (1957). The Monte Carlo analysis shows that the standard deviation among replicate samples (i.e., σrep) is consistently lower for samples binned in 0.5ψ size classes than for those binned by EMAP size classes when the population standard deviation σ is smaller than the EMAP size class interval. Thus, the difference in precision for samples binned by EMAP vs. 0.5ψ size classes disappears for σ = 2.5ψ (Figure 5) when the geometric mean particle size falls within the cobble to boulder size classes (which span intervals of 2ψ), but still persists where the geometric mean falls within the fine to coarse gravel size classes (which span 3ψ; Table 1). More broadly, the difference in σrep between Dgm binned in EMAP vs. 0.5ψ size classes decreases as the heterogeneity of the sampled population of particles increases, from an average of 0.021ψ for σ = 0.75ψ to 0.016, 0.015, and 0.008ψ for σ = 1.5, 2.0, and 2.5ψ, respectively.

Figure 5.

 Standard Deviation of Dgm(ψ) Computed Using EMAP Size Classes (solid lines) and 0.5ψ Size Classes (dashed lines) for 500 Replicate Samples of 105 Virtual Particles From Lognormal Distributions With Indicated Values of Population Geometric Mean (horizontal axis) and Standard Deviation (labels). Thin dashed lines show value of σ/√105 for various values of σ (labels at right) representative of streambed gravels having moderate (bottom) to very poor sorting (top).

Geometric Mean vs. Median Particle Size

For the 15 method-comparison sites, Dgm was consistently smaller than D50 (Figure 2) by approximately 20-30%, a difference that was statistically significant (p < 0.0001) for all comparable metrics (Table 2, Section 4). For the visual classification-based metrics (Dgmv, D50v) and the measured values (Dgmt, D50t), the estimated median ratio of Dgm to D50 was 0.71 (95% CI = 0.66-0.76) and 0.74 (95% CI = 0.69-0.79), respectively. The difference between Dgm and D50 is driven by fine sediment, which skews the particle-size distributions toward finer particle size and affects Dgm much more than D50, which is insensitive to extreme values. When sand and silt, which averaged just 4.3% of the bed surface for these sites (range: 0-11.5%), were excluded, the difference between Dgm and D50 decreased. The median ratio of Dgmv (no fines) to D50v (no fines) was 0.83 (95% CI = 0.79-0.88) and the median ratio of Dgmt (no fines) to D50t (no fines) was 0.88 (95% CI = 0.84-0.93).

For the EMAP West sites, where only the visual size class data were available, we found a similar but somewhat larger difference between Dgm and D50 for streams whose beds were predominantly gravel. In gravel-bed streams (D50 > 2 mm), the Dgm metrics based on visually classified particle size data, both including and excluding bedrock, were approximately half as great as their D50 counterparts (i.e., log2[Dgm/D50] = log2 Dgm − log2 D50 ≈ −1.1ψ, or Dgm/D50 ≈ 0.5; Table 4, Section 1). For sand- and silt-bed streams (D50 < 2 mm), Dgm was greater than D50 by a somewhat smaller factor of approximately 1.5-1.6 (log2 Dgm − log2 D50 ≈ 0.6 to 0.7ψ; Table 4). However, there was considerable variation among sites in the ratio of Dgm/D50 for both fine- and coarse-bedded streams (Figure 6).

Table 4. Comparisons Between Selected Dgm and D50 Particle Size Metrics for the EMAP West Sites in Figure 1.

                                                        Mean Var1 − Var2     Ratio (Var1/Var2)(2)
Var1    Var2    Filter(1)            N        ψ        log10    Geom. Mean  95% LCL  95% UCL  p-Value(3)
1. Comparisons between Dgm and D50
(including bedrock)
 Dgmv    D50v    D50v ≤ 2 mm          410      0.71     0.21    1.64        1.52     1.76     <0.0001
                 D50v > 2 mm          897     −1.07    −0.32    0.48        0.46     0.50     <0.0001
(excluding bedrock)
 Dgmv    D50v    D50v ≤ 2 mm          420      0.62     0.19    1.54        1.44     1.64     <0.0001
                 D50v > 2 mm          887     −1.18    −0.35    0.44        0.43     0.46     <0.0001
(excluding fines and bedrock)
 Dgmv    D50v    D > 2 mm             808     −0.044   −0.013   0.97        0.96     0.98     <0.0001
2. Comparisons between metrics that exclude vs. include fine sediments (<2 mm)
[Var1 = (no fines), Var2 = (with fines)]
 Dgmv    Dgmv    D > 2 mm             808      1.79     0.54    3.45        3.25     3.67     <0.0001
 D50v    D50v    D > 2 mm             808      0.69     0.21    1.61        1.56     1.67     <0.0001
3. Comparisons to evaluate effect of including vs. excluding bedrock and hardpan
[Var1 = (bedrock), Var2 = (no bedrock)]
 Dgmv    Dgmv    %BR + HP < 5         1,078    0.056    0.017   1.04        1.03     1.05     <0.0001
                 5 ≤ %BR + HP < 10    91       0.62     0.19    1.53        1.48     1.58     <0.0001
                 10 ≤ %BR + HP < 20   76       1.25     0.38    2.38        2.19     2.59     <0.0001
                 %BR + HP ≥ 20        62       2.57     0.78    5.95        4.68     7.57     <0.0001
                 All sites            1,307    0.28     0.086   1.22        1.19     1.25     <0.0001

Notes: (1) Criteria used to filter data for comparison. ‘D > 2 mm’ indicates that fine sediment (≤2 mm) was excluded from metric calculations; D50v ≤ 2 mm and D50v > 2 mm indicate that the comparison applies to the subset of sites meeting those criteria. (2) Ratio of Var1 to Var2 in original units (mm) back-transformed from differences in ψ units. Confidence limits should be considered approximate due to the often skewed distribution of the test statistics. (3) From paired t-test for zero mean difference of log-transformed values.
Figure 6.

 Distribution of Dgm(ψ) − D50(ψ) From Visually Classified Particle Size Data (excluding bedrock and hardpan) for the EMAP West Sites in Figure 1. Left and middle plots show data for sites with D50v < 2 mm and >2 mm, respectively. Right plot shows the same comparison when fine sediment (<2 mm) is excluded from samples. Box midline and lower and upper ends show median and 25th and 75th percentile values, respectively; whiskers show maximum and minimum observations within 1.5 times the interquartile range above/below box ends; asterisks show outliers; plus indicates mean. Numbers above box plots indicate sample size.

Differences between Dgm and D50 were much smaller and less variable when fine sediments (FN and SA size classes) (i.e., particles <2 mm) were excluded from the metrics (Figure 6). Dgm and D50 estimates from visually classified particles excluding fine sediments were highly correlated (R = 0.98), with the vast majority of sites plotting close to the 1:1 line over the entire range of median particle size in a log-log plot (Figure 7). While the difference between the two metrics was still statistically significant due to the large sample size (log2Dgmv [no fines] – log2D50v [no fines] = −0.044ψ, p < 0.0001, N = 808), it was equivalent to a mean difference of only 3% (Dgmv [no fines]/D50v [no fines] = 0.97, Table 4, Section 1). Excluding fine sediments had a much greater effect on Dgm than on D50. Dgm estimates that exclude fine sediments were approximately 3.5 times as great as those that include fine sediment, while the corresponding ratio for D50 was 1.6 (Table 4, Section 2).

Figure 7.

 Scatter Plot of Dgm vs. D50 Estimated From Visually Classified Substrate Data Excluding Fine Sediments (<2 mm) and Bedrock for EMAP West Sites (Figure 1). Only sites with ≥55 particles after excluding particles <2 mm are included (N = 808). Heavy dashed line is 1:1 line; light solid and dashed lines show regression fit and 95% prediction limits, respectively.

Influence of Bedrock and Hardpan

To explore the effects of including bedrock and hardpan (primarily bedrock) as a size class in computing the geometric mean diameter used in EMAP, we compared estimates of Dgm based on visually classified particle sizes including vs. excluding bedrock and hardpan. Inclusion of bedrock and hardpan increased estimated Dgm for streams where these substrates were present, and the increase grew with the percentage of bedrock and hardpan (Figure 8). For sites with >20% bedrock and/or hardpan, Dgmv (with bedrock) ranged from 0.9 to 8.2ψ larger than Dgmv (no bedrock). In other words, including bedrock and hardpan as a size class at these sites increased the estimated Dgm by a factor ranging from about 2 to nearly 300 (for a site with 46% silt/clay and 46% bedrock/hardpan substrate). For most sites, however, the difference in Dgm estimated from visually classified particle sizes including vs. excluding bedrock was negligible, with a median value of zero. The mean difference of 0.284ψ is equivalent to a 22% difference between Dgmv (with bedrock) and Dgmv (no bedrock) (Table 4, Section 3). Ten percent of sites differed by more than 0.9ψ, equivalent to a nearly twofold difference in the unlogged metrics. For the 82.5% of sites having <5% bedrock and/or hardpan, Dgmv with and without bedrock differed by an average of 0.056ψ, equivalent to a 4% difference (i.e., Dgmv [with bedrock] = 1.04 times Dgmv [no bedrock]; Table 4). Not surprisingly, inclusion of bedrock and hardpan as a large substrate class (Table 1) in the computation of Dgmv has a very large effect on the estimated value of Dgm for sites with a substantial amount of bedrock substrate, but has a negligible effect for the large majority of sites with little or no exposed bedrock in the low-flow streambed.

Figure 8.

 Box Plots Showing Distribution of Dgmv (including bedrock) (ψ) − Dgmv (excluding bedrock) (ψ) Estimated From Visually Classified Substrate Data for EMAP West Sites (Figure 1) vs. Percent of Substrate Sample Points Classified as Bedrock or Hardpan (see Table 4, Section 3). Plot definition as in Figure 6.

Effective Precision of EMAP Substrate Sampling

EMAP substrate metrics from replicate (within-season) visits to the same site show generally good agreement between visits, with a Pearson correlation coefficient of 0.961 for first vs. second visit values of Dgmv (in ψ units) and 0.947 for Pfs2 (Figure 9). Figure 9 also reveals a small but statistically significant difference between first and second visit estimates of Dgm and Pfs2 (p = 0.01 and 0.05, respectively), with Dgmv increasing and fine sediments decreasing on revisits, which occurred an average of 39 days (interquartile range: 21-47 days) later during the sampling season. At its maximum, the estimated difference (distance of the regression line from the 1:1 line in Figure 9) was about 0.7ψ for Dgmv (equivalent to a 60% increase in second vs. first visit values of Dgmv) at Dgmv ≈ −6ψ (Figure 9A) and approximately −5.5% for Pfs2 at Pfs2 = 100% (Figure 9B). Within-season revisit values differed by less than 1ψ (long-dashed lines on Figure 9A) for Dgmv at 50 of 70 sites (71%), while the median absolute difference was 0.53ψ and the maximum was 2.97ψ. For Pfs2, 59 of 70 sites (84%) had absolute differences between revisit values of less than 10 percentage points, with a median absolute difference of 5.7 points.

Figure 9.

 Visit 1 vs. Visit 2 Values of Selected Habitat Metrics for 70 Same-Year Revisits for the EMAP West Sites. (A) Dgm based on visually estimated particle sizes (excluding bedrock). (B) Percent fine sediment <2 mm, Pfs2. (C) Mean wetted width. Solid line shows linear regression. Inner dashed lines (short dash) show 95% confidence limits for regression line; outer dashed lines are 1ψ (A), 10 percentage points (B) or 0.176 log units (C) above and below a 1:1 line. For reference, the long dashed lines in A are equivalent to a twofold difference in Dgm [i.e., a doubling (upper dashed line) or halving (lower dashed line)] between the first and second visit values, while in C the long dashed lines represent a 1.5-fold difference in wetted width between visits.

The apparent within-season coarsening of substrate (increase of Dgmv and decrease of Pfs2) revealed by the within-season resampling data is likely due to a corresponding small but statistically significant (p = 0.0002) decrease in mean wetted channel width between visits (Figure 9C). The mean difference in log (width) was −0.039, equivalent to a median 9% decrease in wetted width between visits. When we compared second vs. first visit values of Dgmv (ψ) and Pfs2 after excluding the first and last observations from each transect (collected at the margins of the wetted channel), we found no significant difference between visits (p = 0.79 and 0.87, respectively). This result, in combination with the observed decrease in wetted width (Figure 9C), suggests that the apparent increase in Dgm and decrease in Pfs2 (Figure 9A and B, respectively) were likely due to changes in the sampling domain (i.e., narrowing of the wetted channel) rather than to any actual coarsening of the streambed. As wetted width decreases, fine sediments preferentially deposited along the channel margins are increasingly less likely to be sampled.

We used the data from pooled same-site revisits within the same season to quantify effective sampling precision of EMAP substrate metrics in absolute and relative terms (Table 5), as described in Methods. We use the term “effective precision” because σrep (a key element of both precision metrics in Table 5) may include actual differences due to changes in the measured site characteristics between visits (as suggested by the small but statistically significant difference in Dgmv and Pfs2 among within-season visits to the same site, discussed above) and/or differences in the exact location of the sampled reach on each visit, in addition to sampling error. The signal-to-noise ratio, S:N, defined previously, is a relative measure of sampling precision that depends on the among-site variance of a metric within the region of interest. Hence, we present S:N values for the entire 12-state EMAP West sampling domain and for selected geographic regions within it. The entire region was partitioned into three domains (Mountains, Xeric, and Plains) by aggregating similar Level 3 ecoregions (Omernik and Gallant, 1986), while the Rocky Mountains and Cascade Mountains are smaller distinct geographic regions (Figure 1). These regions illustrate a range of spatial scales at which the data might be analyzed depending upon the questions or geographic regions of interest.

Table 5. Precision Metrics and Regional Mean and Between-Site Variance for Selected Substrate Variables Based on Visually Classified Particle Size Data From EMAP West Surveys in the Western United States and Selected Subregions (2000-2004).(1)

Central tendency metrics (Dgm and D50)

Dgmv (incl. bedrock)
 Study Region        No. of Sites   Regional Mean ψ (log10)   Regional Std Dev ψ (log10)   σrep ψ (log10)   S:N Ratio
 EMAP-West           848             2.20 ( 0.66)              3.97 (1.20)                  0.69 (0.21)      33
 Mountains           514             4.04 ( 1.21)              2.99 (0.90)                                   19
 Xeric               177             0.91 ( 0.27)              3.80 (1.14)                                   30
 Plains              157            −2.17 (−0.65)              2.71 (0.81)                                   15
 Rocky Mountains     168             3.56 ( 1.07)              3.05 (0.92)                                   19
 Cascade Mountains    82             4.95 ( 1.49)              2.14 (0.64)                                    9

Dgmv (no bedrock)
 EMAP-West           848             1.94 ( 0.58)              3.87 (1.17)                  0.70 (0.21)      30
 Mountains           514             3.76 ( 1.13)              2.86 (0.86)                                   16
 Xeric               177             0.71 ( 0.21)              3.74 (1.12)                                   28
 Plains              157            −2.44 (−0.73)              2.59 (0.78)                                   14
 Rocky Mountains     168             3.47 ( 1.04)              3.02 (0.91)                                   18
 Cascade Mountains    82             4.60 ( 1.39)              1.93 (0.58)                                    8

D50v (no bedrock)
 EMAP-West           848             2.59 ( 0.78)              4.29 (1.29)                  1.10 (0.33)      15
 Mountains           514             4.64 ( 1.40)              2.84 (0.85)                                    7
 Xeric               177             1.31 ( 0.39)              4.52 (1.36)                                   17
 Plains              157            −2.48 (−0.75)              2.92 (0.88)                                    7
 Rocky Mountains     168             4.40 ( 1.32)              3.35 (1.01)                                    9
 Cascade Mountains    82             5.20 ( 1.56)              2.01 (0.61)                                    3

Fine sediment metrics

Pfs2 (%)
 EMAP-West           848            36.2                      28.8                          6.0              23
 Mountains           514            22.4                      18.8                                           10
 Xeric               177            45.2                      30.1                                           25
 Plains              157            69.4                      21.8                                           13
 Rocky Mountains     168            25.2                      21.5                                           13
 Cascade Mountains    82            18.4                      12.0                                            4

Pfs16 (%)
 EMAP-West           848            46.7                      28.6                          8.2              12
 Mountains           514            33.0                      20.3                                            6
 Xeric               177            54.5                      27.5                                           11
 Plains              157            81.2                      16.8                                            4
 Rocky Mountains     168            36.0                      24.1                                            9
 Cascade Mountains    82            27.7                      15.8                                            4

Notes: (1) Total site visits = 957, including 69 same-year revisits and 40 between-year revisits. (2) Units are ψ for Dgm and D50 (log10 values in parentheses) and % for fine sediment metrics; σrep was estimated only for the full EMAP-West region.

Absolute precision of Dgm based on visually classified particle sizes as measured by σrep was 0.69ψ and 0.70ψ with and without bedrock, respectively (Table 5), corresponding to an approximately 1.6-fold variation in Dgmv. Over the entire 12-state sampling region, the signal-to-noise ratio (S:N) values were 33 and 30 including and excluding bedrock, respectively. D50v (no bedrock) had somewhat lower absolute and relative precision than the Dgm metrics, with a σrep of 1.10ψ and an S:N of 15. For the fine sediment metrics, Pfs2 had better absolute precision than Pfs16 (σrep = 6.0 and 8.2%, respectively), as well as better relative precision (S:N = 23 and 12, respectively). For these metrics, the theoretically achievable precision assuming no classification error or selection bias depends upon the unknown true proportion of particles in the sample population, p, and sample size, n, and is given by [p(1 − p)/n]^0.5 (Kaufmann et al., 1999, p. 64). Thus, substituting 105 for n and the regional mean values for Pfs2 and Pfs16 in Table 5 for p, the σrep values of 6.0 and 8.2% for Pfs2 and Pfs16 are 1.3 and 1.7 times their respective theoretically achievable values of 4.7 and 4.9% for a 105-particle sample size. In other words, the precision of Pfs2 is within 30% of the best one could hope to achieve for a sample size of 105 observations per site.
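
The theoretical limit quoted above is the binomial standard error; a quick check reproducing the arithmetic in the text (regional mean proportions from Table 5, n = 105):

```python
import math

def binomial_se(p, n=105):
    """Theoretically achievable SD of a proportion estimate: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1.0 - p) / n)

for name, p, observed in [("Pfs2", 0.362, 6.0), ("Pfs16", 0.467, 8.2)]:
    theoretical = 100 * binomial_se(p)  # as a percentage
    print(f"{name}: theoretical {theoretical:.1f}%, observed {observed}% "
          f"({observed / theoretical:.1f}x theoretical)")
```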

Signal-to-noise ratios (S:N) for Dgm based on visually classified particle sizes (with and without bedrock) were generally higher and less variable among regions than were the corresponding values for D50 and the fine sediment metrics. For Dgmv (with and without bedrock), S:N was smaller for each of the individual regions than for the 12-state EMAP West sampling region as a whole (Figure 1, Table 5). This was not surprising, because as geographic extent decreases so too, in general, does the diversity of conditions encountered, leading to lower among-site variance σ²site, the numerator of the S:N ratio. For Dgmv expressed in ψ units (or equivalently, in any logarithmic units, regardless of the log base), S:N was 9 and 8 with and without bedrock, respectively, in the smallest region (the Cascade Mountains, N = 82), and well over 10 for the larger regions, indicating that these metrics are sufficiently sensitive to detect and quantify differences in substrate size among sites within these regions (see Discussion). For D50v(ψ) and the fine substrate metrics, the variation of S:N among regions was considerably greater, and the pattern of decreasing S:N with decreasing geographic extent did not hold. While S:N for these metrics was higher for the entire West than for most regions, D50v(ψ) and Pfs2 for the Xeric region were exceptions (Table 5). The percentage of bed surface sediments of sand size and finer (<2 mm), Pfs2, had the highest S:N values of the fine sediment metrics; only the Cascade Mountains, a region with a consistently low proportion of fine sediments, had S:N < 10. In contrast, S:N exceeded 10 in only a single region (Xeric) for D50v(ψ) and Pfs16. Thus, among substrate metrics derived from visually classified particle counts for the EMAP West sites, estimates of geometric mean particle size Dgm were significantly more precise than estimates of median particle diameter D50, while among the fine sediment metrics a 2 mm cutoff for “fine sediments” yielded a more useful and precise metric (Pfs2) than did a 16 mm cutoff (Pfs16).

The EMAP sampling protocol prior to the year 2000 included only 55 substrate sample points (11 transects instead of 21). We evaluated the sampling precision for the same substrate metrics under the old protocol by excluding data from the 10 supplemental transects added to the protocol in 2000. The RMSE (σrep) for the 55-particle substrate metrics was only slightly larger than for the 105-particle metrics: by 0.05ψ for Dgmv (with or without bedrock) and by 0.06ψ for D50v. For the fine sediment metrics as well, σrep was only slightly higher for the smaller sample size, increasing from 6.0 to 7.0 percentage points for Pfs2 and remaining at 8.2 for Pfs16 (Table 6). Similarly, S:N values for the 55-particle substrate metrics were only slightly to moderately lower than for the 105-particle versions, with Pfs2 showing the biggest decrease (from 23 to 17). These results suggest, somewhat surprisingly, that the visually classified 55-particle estimates of Dgm and D50 have only slightly lower precision than the 105-particle estimates.

Table 6.   Precision Metrics for Selected Substrate Variables Based on Visually Classified Particle Size Data From EMAP West Surveys in the Western United States (2000-2004) for Two At-a-Site Substrate Sampling Intensities.
Variable               Units   --- 105 Sample Points ---   --- 55 Sample Points ---
                               σst     σrep    S:N          σst     σrep    S:N
Dgmv (incl. bedrock)   ψ       3.97    0.69    33           3.88    0.74    28
                       log10   1.20    0.21                 1.17    0.22
Dgmv (no bedrock)      ψ       3.87    0.70    30           3.77    0.75    26
                       log10   1.17    0.21                 1.13    0.22
D50v (no bedrock)      ψ       4.29    1.10    15           4.20    1.16    13
                       log10   1.29    0.33                 1.26    0.35
Pfs2                   %       29      6.0     23           29      7.0     17
Pfs16                  %       29      8.2     12           28      8.2     11

Note: Data represent 957 site visits to 848 sites, including 69 same-year revisits and 40 between-year revisits.

Discussion

Sources of Error in Visually Classified Particle Count Data

We assessed the influence of several potential sources of error on estimates of the representative bed surface particle size (D50 or Dgm) derived from visually classified particle count data: visual classification bias, binning error due to the coarse size classes used in visual size-classification schemes, differences between Dgm and D50, and the inclusion of nonparticulate substrates (bedrock, concrete, and hardpan) in the computation of Dgm. We discuss each of these potential sources of error below, focusing primarily on differences among metrics and sources of bias.

Classification Error and Observer Bias In our test comparing visually classified vs. measured particle size data for the same set of particles at 15 sites in the Oregon Coast Range, the visually estimated particle size metrics, Dgmv and D50v, were very similar to the measurement-based metrics, Dgmt and D50t (Figure 2), and the median observer bias was negligible at approximately 1-3% (Table 2, Section 2). However, this is a best-case scenario in that all data were collected by a single experienced observer and each particle was measured using a template immediately after assigning a visually estimated size class, providing an ongoing "eyeball recalibration." For less experienced observers, visual estimation bias can be much larger, as reported by the Stream Systems Technology Center (USFS, 2004), which found that Dgm and D50 estimated from visually classified data for five observers averaged 20-30% smaller than estimates based on measured particle sizes for the same sample (Table 3). The cause of this discrepancy was an apparent tendency of all five observers to misclassify some fraction of the "coarse gravel" particles (16-64 mm) as "fine gravel" and some "cobble" particles (64-250 mm) as "coarse gravel" (Figure 4A). This test intentionally included inexperienced as well as experienced observers, and so may represent a "worst-case scenario" (albeit perhaps a realistic one for long-term or large-scale monitoring studies) with respect to observer bias and observer-to-observer variability.

Previous studies have found statistically significant differences among observers even for replicate particle counts in which particle sizes were measured. These differences have been attributed to observer bias, including both measurement bias and particle selection bias (mainly the latter). Based on a random effects analysis of variance, Roper et al. (2002) found that observer variability comprised 11% of total variation in D50 based on 100-particle counts in six Idaho streams each sampled by six or seven different teams, while among-stream variation comprised 89% of the total. Using similar methods, Olsen et al. (2005) reported that observer variability ("sample error") comprised 14% of total variation in D50 in 20 streams in the Upper Columbia River Basin of the northwestern U.S. that were each sampled by seven observers. Marcus et al. (1995) reported significant differences among observers in D50 (as well as other percentiles) estimated from particle diameters measured using a ruler in a test in which eight observers each collected five 100-particle samples at two sites. Mean estimates for individual observers ranged from approximately 60-85 mm at one site and from 15 to 30 mm at the other. In a test involving five observers ranging from "veteran" to "novice," Wohl et al. (1996) also reported significant differences among observers in D50 estimated from independent 100-particle counts (with b-axis diameters measured using a ruler) at a single site, although the magnitude of the differences was not reported. Hey and Thorne (1983) found no significant difference in D50 estimates among eight observers conducting particle counts using a finely graded template on a single point bar for sample sizes of 100 or fewer particles, but did find significant differences for sample sizes of 120 to 300. They point out that differences among observers become statistically significant with increasing sample size even though the absolute differences remain unchanged.
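The variance partitioning underlying these percentages (and the S:N ratios reported above) can be sketched with a balanced one-way random-effects analysis. The sketch below uses hypothetical data and a simple method-of-moments estimator, not the mixed-model fit applied to the actual, unbalanced EMAP revisit design:

```python
import numpy as np

def variance_components(data: np.ndarray) -> tuple[float, float]:
    """Method-of-moments variance components for a balanced one-way
    random-effects design: rows = sites, columns = repeat visits
    (or observers). Returns (among-site variance, replicate variance)."""
    n_reps = data.shape[1]
    ms_within = data.var(axis=1, ddof=1).mean()          # replicate mean square
    ms_among = n_reps * data.mean(axis=1).var(ddof=1)    # among-site mean square
    var_among = max((ms_among - ms_within) / n_reps, 0.0)
    return var_among, ms_within

rng = np.random.default_rng(0)
# Hypothetical log2 diameters (psi): 100 sites, 2 visits each, with
# sigma_st = 3.97 psi and sigma_rep = 0.69 psi as in Table 6
site_effect = rng.normal(4.0, 3.97, size=(100, 1))
obs = site_effect + rng.normal(0.0, 0.69, size=(100, 2))
v_site, v_rep = variance_components(obs)
print(f"S:N ~ {v_site / v_rep:.0f}")   # roughly (3.97/0.69)^2 = 33
```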

Binning Error We found that for particle sizes measured using a template, coarsely binned data (EMAP size classes, Table 1) yielded smaller estimates of geometric mean particle size than estimates based on data binned at smaller, uniform intervals of 0.5ψ for 15 sites in the Oregon Coast Range. This bias was only slightly larger than (and opposite in sign to) the observer classification bias, averaging 4-6% (Table 2). In principle, coarsely binned particle size data can bias D50 and Dgm either positively or negatively relative to continuous size data or data binned in smaller intervals. For D50, the direction of the bias depends upon whether the true particle size distribution at the finer resolution has convex curvature (Figure 10, Sample 1) or concave curvature (Figure 10, Sample 2) within the coarsely binned size class that contains the median particle size: coarsely binned data will overestimate D50 where the cumulative size distribution curve is convex within that class, and underestimate it where the curve is concave. The potential magnitude of binning bias increases as bin width becomes large relative to the range of particle sizes within the sample.
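The size of this binning effect is easy to explore numerically. A minimal Monte Carlo sketch follows; the class boundaries below are approximate EMAP size-class limits in mm (after the conventions of Table 1), and the sample is a hypothetical lognormal gravel bed:

```python
import numpy as np

def dgm_from_bins(diam_mm: np.ndarray, edges_mm: np.ndarray) -> float:
    """Geometric mean diameter (mm) computed after binning, assigning
    each particle the geometric midpoint of its size class."""
    idx = np.clip(np.digitize(diam_mm, edges_mm) - 1, 0, len(edges_mm) - 2)
    mid_psi = 0.5 * (np.log2(edges_mm[idx]) + np.log2(edges_mm[idx + 1]))
    return float(2.0 ** mid_psi.mean())

rng = np.random.default_rng(1)
diam = 2.0 ** rng.normal(4.0, 1.5, size=105)   # Dgm ~ 16 mm, sigma ~ 1.5 psi

half_psi_edges = 2.0 ** np.arange(-2.0, 12.5, 0.5)              # 0.5-psi bins
emap_edges = np.array([0.06, 2.0, 16.0, 64.0, 250.0, 1000.0, 4000.0])

print(f"0.5-psi bins: Dgm = {dgm_from_bins(diam, half_psi_edges):.1f} mm")
print(f"EMAP classes: Dgm = {dgm_from_bins(diam, emap_edges):.1f} mm")
```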

Figure 10.

 Hypothetical Particle Size Distributions for Two Samples, Showing Effect of Binning on Apparent Particle Size Distribution. For Sample 1, binning particles into EMAP size classes shifts the curve to the right (coarser particle size) relative to binning in 0.5ψ increments, while the opposite is true for Sample 2.

Differences Between Dgm and D50 Most summaries of EMAP data report Dgm, which some have argued is the preferred statistic for purposes of spawning habitat assessment (Platts et al., 1979; Shirazi and Seim, 1981), rather than the more widely used D50 (Kaufmann et al., 1999). For samples exhibiting a lognormal particle size distribution, Dgm and D50 are equivalent. Although natural streambed gravels may in some instances be lognormally distributed, and a lognormal distribution is often implicitly or explicitly assumed (Bunte and Abt, 2001), many streams have bed material that exhibits bimodal distributions or other deviations from a lognormal distribution (Church et al., 1987; Rice and Church, 1996). For such streams, significant differences may occur between Dgm and D50. Our data, based on particle counts with visually classified particle sizes from a regional survey of western U.S. streams (Figure 1), show that Dgm is significantly smaller than D50 for most gravel-bed streams in the western U.S., but that the difference largely disappears when particles <2 mm are excluded (Figure 6). For samples truncated at a size of 2 mm, Dgm and D50 estimates based on visually classified particle counts differed by only 3% on average (Table 4). This suggests that Dgm and D50 can be used more-or-less interchangeably as a measure of “average” bed surface particle size without introducing large errors so long as both statistics are based on samples that exclude particles <2 mm, but not otherwise.
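The distinction is straightforward to compute. A minimal sketch with a hypothetical bimodal (sand plus gravel) sample illustrates both the divergence and its disappearance under truncation:

```python
import numpy as np

def dgm_d50(diam_mm: np.ndarray, truncate_mm: float = 0.0):
    """Geometric mean (Dgm) and median (D50) diameter in mm,
    optionally truncating the sample at a minimum size."""
    d = diam_mm[diam_mm >= truncate_mm]
    return float(2.0 ** np.log2(d).mean()), float(np.median(d))

rng = np.random.default_rng(2)
sand = 2.0 ** rng.normal(-1.0, 1.0, size=30)    # ~0.5 mm mode, 30% of bed
gravel = 2.0 ** rng.normal(5.0, 1.0, size=70)   # ~32 mm mode, 70% of bed
diam = np.concatenate([sand, gravel])

print(dgm_d50(diam))                   # Dgm well below D50 (fines included)
print(dgm_d50(diam, truncate_mm=2.0))  # nearly equal once fines are excluded
```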

However, in excluding fine sediments from particle size statistics, one loses important information about the full particle size distribution. Particularly in studies with an ecological emphasis, such as EMAP, fine sediments cannot be ignored, as they strongly influence habitat suitability for benthic organisms and for fish spawning and rearing (Wood and Armitage, 1997; Kondolf, 2000; Suttle et al., 2004). For such studies it is important to include fine sediments in assessing streambed substrate size, and Dgm, which is more strongly influenced by fine sediments than is D50, may be the more ecologically relevant measure of sediment size where fine sediment is of particular concern. For streams with predominantly fine bed material, however (e.g., the 420 out of 1307 sites in the EMAP West dataset with D50v < 2 mm, Figure 6), Dgm and D50 estimates based on visually classified particle count data should be used with caution, as they may be based on the relative proportions of as few as one or two broad size classes (e.g., FN and SA, Table 1). For such streams, estimates of the percentage of streambed substrate less than a specified size, such as Pfs2, are probably more reliable (and less potentially misleading) quantitative summaries of visually classified particle count data.

Nonparticulate Substrate The EMAP metric of central tendency of substrate size distribution, lsub_dmm (Kaufmann et al., 1999), equivalent to log Dgmv (with bedrock), is somewhat atypical as a measure of streambed substrate size in its inclusion of bedrock and hardpan as substrate size categories. This reflects its origin as a biologically relevant measure of substrate characteristics for stream habitat studies, adapted from the approach of Bain et al. (1985). Whether bedrock and hardpan should be included in the computation of a metric that purports to measure substrate size may depend upon the intended use of the data. From a biological perspective (e.g., that of aquatic invertebrates) it makes sense to treat bedrock and hardpan surfaces as equivalent to very large particles. However, this approach clearly is not appropriate if the substrate data are intended primarily to characterize alluvial streambed material for use in sediment transport and particle-mobility analyses. On the other hand, if one is interested in streambed stability (i.e., resistance to erosion) as opposed to the mobility of streambed sediments, then Dgmv (with bedrock) may be an appropriate metric to use. Including bedrock and hardpan can greatly increase estimated Dgm when these nonparticulate substrates comprise a significant portion of the streambed (Figure 8).

Precision of EMAP Substrate Metrics

The effective precision of the estimated geometric mean particle size from EMAP West substrate data, expressed as the root mean squared error for same-season revisits, σrep, was 0.7ψ for Dgmv (either including or excluding bedrock, Table 5). This is equivalent to a 1.6-fold variation in Dgmv (e.g., if Dgm were 100 mm, a ±1 SD range would be from 100 ÷ 1.6 = 62.5 mm to 100 × 1.6 = 160 mm), or roughly half the magnitude of change in D50 due to land-use-related disturbance in Redwood Creek, California, documented by Madej (2001), and much smaller than the roughly order-of-magnitude decrease in D50 reported by Jackson et al. (2001) following clearcut logging in several streams in the Washington Coast Range (see Introduction). This precision compares favorably to values of σrep reported by Kaufmann et al. (1999) for lsub_dmm [equivalent to log Dgmv (with bedrock)] of 0.26 (0.86ψ) for mid-Atlantic streams (n = 169 with 50 replicates) and 0.32 (1.06ψ) for Oregon streams (n = 44 with 22 replicates) based on 55-particle samples, which would be expected to be less precise. The EMAP West σrep values for Pfs2 and Pfs16 (6.0 and 8.2, respectively) also compare favorably with corresponding values of 7.7 (Mid-Atlantic) and 11 (Oregon) for PCT_SAFN (Pfs2) and 7.5 and 12 for PCT_SFGF (Pfs16) reported by Kaufmann et al. (1999), and are much smaller than the average increase in fine sediment abundance of 32 percentage points reported by Jackson et al. (2001) following clearcut logging in six catchments in the Coast Ranges of Washington.
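For readers wishing to reproduce this statistic, σrep can be estimated as the pooled within-pair standard deviation of repeat visits. A minimal sketch with hypothetical paired values (the study's actual estimator, fit to an unbalanced mix of same-year and between-year revisits, may differ in detail):

```python
import numpy as np

def sigma_rep(visit1: np.ndarray, visit2: np.ndarray) -> float:
    """RMS repeat-visit error: pooled within-pair standard deviation
    of paired first and second visits to the same sites."""
    return float(np.sqrt(np.mean((visit1 - visit2) ** 2) / 2.0))

# Hypothetical same-season log2(Dgm) values (psi) at five revisited sites
v1 = np.array([3.1, 5.6, 2.4, 6.0, 4.2])
v2 = np.array([3.8, 5.1, 2.9, 5.2, 4.9])
print(f"sigma_rep = {sigma_rep(v1, v2):.2f} psi")   # 0.46 psi here
```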

How does the precision for the classification-based EMAP particle size metrics compare with precision of standard particle count methods in which particles are measured? In his seminal paper introducing the particle count method, Wolman (1954) reported results of nine samples of 100 particles each from a single stream that were measured by one observer using a ruler and had a mean D50 of 82 mm with a standard deviation of 6.7 mm, or 8.2%. Based on the data in Table 2 of Wolman (1954), σrep for D50 from these nine samples would be 0.12ψ, or approximately one-sixth as large as σrep for Dgmv from the EMAP West study (Table 7). Hey and Thorne (1983) reported data collected by eight observers from a single point bar on the River Severn, in which each observer collected three separate 100-particle samples by pacing within a 5 × 25 m area and measured the particles using a gravelometer apparently graduated in increments of 0.25ψ or less. Their data, reported in log units, had a pooled standard deviation (i.e., including observer variability) of 0.045 log units or 0.15ψ (computed from the data in their Figure 2), or 21% as large as σrep for Dgmv from the EMAP West sites. Marcus et al. (1995) reported data from 100-particle counts with b-axis diameters measured using a ruler by six to ten observers at each of 11 cross sections on two stream reaches. The standard deviation of replicate samples at a cross-section (σrep) for D50 in their data ranged from 0.16 to 0.63ψ and averaged 0.34ψ (calculated from data provided by A. Marcus, personal communication), just under half the σrep value of 0.70ψ for Dgmv in the EMAP West study.
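Because the studies compared here report precision variously in log10 units and ψ units, a small conversion helper makes the comparisons explicit (ψ is log2 of diameter in mm, so one log10 unit is about 3.32ψ):

```python
import math

PSI_PER_LOG10 = 1.0 / math.log10(2.0)   # ~3.32 psi per log10 unit

def log10_to_psi(x: float) -> float:
    """Convert a spread expressed in log10(mm) to psi (log2) units."""
    return x * PSI_PER_LOG10

print(f"{log10_to_psi(0.045):.2f} psi")  # Hey and Thorne: 0.045 log10 -> 0.15 psi
print(f"{log10_to_psi(0.26):.2f} psi")   # Kaufmann et al.: 0.26 log10 -> 0.86 psi
```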

Table 7.   Estimates of the Contribution to Sample Error (standard deviation of replicate samples) in Dgm and D50 From Selected Sources of Variability.
Component of Variability                        Dgm              D50                  Reference
                                                ψ       log10    ψ          log10
Total EMAP revisit error (reach-scale)          0.70    0.21     1.10       0.33      This study (Table 5)
Particle selection and measurement (single channel unit or bar)
 One observer                                                    0.12       0.036     Wolman (1954)
 Multiple observers                                              0.15       0.045     Hey and Thorne (1983)
 Multiple observers                                              0.16–0.63  0.05–0.19 Marcus et al. (1995)
Measurement only
 Ruler                                          0.026   0.008    0.011      0.005     USFS (2004)
 Template                                       0.018   0.006    0.015      0.003     USFS (2004)
 Visual classification                          0.29    0.09     0.25       0.07      USFS (2004)
Binning (EMAP size classes)                     <0.03   <0.01                         This study (Figure 5)

The visually classified EMAP data exhibit variability that is five to seven times greater than that reported by Wolman (1954) and Hey and Thorne (1983), and about double the average standard deviation for replicate samples obtained by Marcus et al. (1995). However, the latter studies are based on sampling a single channel unit, bar, or cross section on a single day or several consecutive days (also, the Wolman data do not include observer variability), so they would be expected to have inherently lower variability than the EMAP samples, which incorporate greater spatial (reach-scale) and temporal variability (typically, several weeks between within-season revisits). Assuming only random sampling error and excluding any error due to sampling methodology (e.g., particle selection or classification bias), sampling precision is determined by the population standard deviation of particle diameter, σ, and the number of particles collected, n. For sediments with a lognormal size distribution, the expected standard deviation for replicate samples (i.e., σrep) is σ/√n. Thus, σrep will be higher for more poorly sorted sediments than for sediments with better sorting (Figure 5). In mountain gravel-bed streams with grain sizes ranging from sand to boulders, σ is typically in the range of 1.5-2ψ (Bunte and Abt, 2001, p. 69), so the expected σrep for samples of 105 particles from random sampling error alone would be in the range of 0.15-0.2ψ (Figure 5). On the other hand, the reach-scale design of EMAP particle size metrics may provide greater reliability as an assessment tool. Sennatt et al. (2006), for example, compared five commonly used methods for assessing embeddedness and reported that the EMAP method (not evaluated in this article) and one other transect-based method (USGS NAWQA) performed better than the other methods, which were not transect-based. They suggest that this is because "multiple channel transects capture both the lateral and longitudinal variability that exists between stream morphologies in a given reach" and thus avoid the potential bias of methods that may sample only within a particular channel unit or morphologic type (Sennatt et al., 2006:1679).
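As a quick check of the σ/√n bound quoted above:

```python
import math

n = 105
for sigma_psi in (1.5, 2.0):
    print(f"sigma = {sigma_psi} psi -> sigma_rep = {sigma_psi / math.sqrt(n):.2f} psi")
# sigma = 1.5 psi -> sigma_rep = 0.15 psi
# sigma = 2.0 psi -> sigma_rep = 0.20 psi
```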

Classification and binning error also account for some of the additional variability of Dgmv in the EMAP West dataset relative to values reported for traditional pebble counts in which particles are measured using a template or ruler. Based on our Monte Carlo simulation analysis, however, the potential effect of binning by EMAP size classes on sampling precision is very small (<0.03ψ) (Figure 5), and we conclude that binning error makes a relatively insignificant contribution to overall effective sampling precision of EMAP substrate data (Table 7). Hence, the among-observer variability for visually classified particle sizes in the Stream Systems Technology Center test (USFS, 2004) can be attributed primarily to visual classification bias that varies among observers (Figure 4A). This error (0.25ψ and 0.29ψ for D50 and Dgm, respectively) is approximately double the combined errors associated with particle selection plus measurement (0.15ψ) reported for multiple observers by Hey and Thorne (1983) (Table 7) and slightly less than the average value of 0.34ψ based on the data of Marcus et al. (1995). These results contrast with the very low classification bias that we obtained for a single experienced observer (Table 2) and highlight the importance of thorough and consistent training to reduce bias and variability when relying on visually based particle size estimates.

It is likely that the largest contributor to σrep for the EMAP substrate metrics is spatial imprecision in locating sample reaches in the field. Because the reaches are typically not flagged or otherwise monumented in the field, but are instead identified by latitude and longitude or map location and verified by a combination of GPS coordinates (in most cases) and field notes (e.g., distances upstream/downstream of tributary confluences or location relative to other local landscape or cultural features), revisits to the same target sampling location may often not sample exactly the same reach as previous or subsequent visits. In addition, because the EMAP sampling protocol calls for a reach length of 40 times the average wetted channel width rounded to the nearest meter (Lazorchak et al., 1998), variations in flow conditions or differences in starting location may cause the sampled reach length to vary between visits to a given site. Hence, the actual sampled reach for different visits to the same site may only partially overlap, or in rare cases may not overlap at all. Significant differences in measured stream characteristics that are unlikely to change within a sampling season (e.g., gradient, abundance of instream wood, riparian vegetation characteristics) and notes on the field forms suggest that this is a contributing factor for most of the larger outliers in Figure 9.

The signal-to-noise ratio is a relative measure of precision that indicates the capability of a habitat metric to discern differences among streams. If anthropogenic changes in habitat characteristics are similar in magnitude to the differences in those characteristics observed among streams within a region, then S:N is also a useful predictor of the metric's potential to detect the impacts of anthropogenic disturbances on stream habitat within a region (Kaufmann et al., 1999). S:N also determines the potential usefulness of a metric in regression analyses, because it determines the r2-value for two perfectly correlated but imperfectly known variables that are subject to random, unbiased measurement errors that are uncorrelated between the two variables (Allen et al., 1999; Kaufmann et al., 1999). Under those assumptions, the theoretical maximum r2 that can be obtained between two variables that have signal-to-noise ratios of S:N1 and S:N2, respectively, is given by Kaufmann et al. (1999) as

$$ r^2_{\max} \;=\; \left(\frac{S{:}N_1}{S{:}N_1 + 1}\right)\left(\frac{S{:}N_2}{S{:}N_2 + 1}\right) \qquad (1) $$

By Equation (1), the theoretical maximum r2 for two variables each having an S:N of 1 is only 0.25, while r2max increases to 0.56 for two variables having S:N = 3, 0.69 for S:N = 5, 0.79 for S:N = 8, and 0.83 for S:N = 10. It is clear from these values that sampling error and/or short-term fluctuations (within the index period) limit the usefulness of metrics with signal-to-noise ratios below about 3 for associational analyses involving regression or correlation techniques, since such metrics would yield low r2-values even if they were strongly correlated with some predictor variable(s). In contrast, sampling error would have minimal impact on the usefulness of metrics with signal-to-noise ratios above 8-10.
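A one-line helper reproduces these values; a minimal sketch of Equation (1):

```python
def r2_max(sn1: float, sn2: float) -> float:
    """Theoretical maximum r^2 between two error-prone variables with
    signal-to-noise ratios sn1 and sn2 (Equation 1)."""
    return (sn1 / (sn1 + 1.0)) * (sn2 / (sn2 + 1.0))

for sn in (1, 3, 5, 8, 10):
    print(f"S:N = {sn:2d}: r2_max = {r2_max(sn, sn):.2f}")
# 0.25, 0.56, 0.69, 0.79, 0.83
```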

Potential Uses of EMAP-Style Substrate Monitoring Data and Suggested Improvements

What are suitable uses of classification-based substrate data, such as the EMAP substrate metrics? Given that S:N values for log Dgmv (with and without bedrock) were 8 or greater (in most cases, well above 10) for all regions for which we computed precision metrics (Table 5), these metrics have adequate precision for use in associational analyses relating substrate characteristics to other sampled or derived physical characteristics, such as channel slope or bankfull shear stress; to biological characteristics, such as “biotic integrity” indices for aquatic invertebrates or fish; or to landscape or riparian condition/disturbance metrics, such as road density or land cover/land use information. For example, Kaufmann and Hughes (2006) found that a fish-based index of biotic integrity was significantly inversely related to an index of bed stability based on log Dgmv (with bedrock), particularly in streams with sedimentary lithology. Among the fine sediment metrics, Pfs2 had the best precision both in absolute terms (σrep) and in relative terms, with S:N ≥ 10 for all regions tested except the Cascade Mountains (where the abundance of fine sediments was uniformly low). Based on EMAP data including only 55 (rather than 105) substrate observations per site, Larsen et al. (2004) estimated that a trend in the regional mean of Pfs2 of 2% per year (e.g., a change from 20 to 20.4% over one year) could be detected with 80% likelihood and 95% confidence within 15 years using a network of as few as 20 sites monitored annually. The above examples suggest that the visually classified particle count data used in EMAP surveys have adequate precision for monitoring status and trend of streambed substrate at the regional scale and for relating spatial or temporal variations in substrate conditions to the biological condition of streams and to riparian and watershed disturbances, such as land use change.

However, several simple changes to the EMAP sampling protocol could significantly improve its substrate monitoring component and maximize the usefulness of the data for multiple monitoring objectives. First, σrep could be decreased (i.e., sampling precision increased) simply by monumenting the location and fixing the length of reaches where revisits are planned. Second, sampling precision could clearly be improved significantly, for a very modest increase in effort, by measuring particles using a ruler or template rather than relying on visual classification, replacing visual classification error with a much smaller measurement error (Table 7). Measuring particles, whether by ruler or template, should add no more than 15-20 min of field time for a 100-particle count. By decreasing σrep, this change could significantly increase the statistical power of EMAP monitoring data to detect temporal trends in bed surface particle size (Larsen et al., 2004). Finally, for sand- or silt-bed streams, a composite grab sample (e.g., one sample per transect) could be collected for laboratory sieve analysis if particle size distribution data are desired. Such data would be particularly useful for surveys in areas, such as the Great Plains, where bed material in most streams consists predominantly of fine sediments (Table 5) and particle counts have limited utility.

A separate issue is the representativeness of the substrate sample. The EMAP protocol uses transect-based samples tied to the wetted channel dimensions. This is representative for purposes of assessing aquatic habitat under relatively low-flow conditions occurring for most of the year, but may not accurately characterize the particle size distribution for the entire streambed that is the population of interest for sediment transport studies (and which also comprises benthic habitat at higher flows). Also, sampling only within the wetted channel makes the results flow-dependent, presumably contributing to the small but statistically significant difference we found between earlier and later same-season samples in the EMAP West dataset. Sampling substrate across the entire active (unvegetated) channel width would overcome these limitations and would yield data that might be more suitable for a broader range of uses. This could be accomplished either by collecting a second set of substrate observations from sample locations tied to the active channel width or by collecting a single set of substrate observations on transects across the entire active channel at an increased frequency (e.g., nine observations per transect rather than five) that would ensure a sufficient number of observations to characterize both the wetted and active channel. The trade-off is that either approach would require a significantly increased level of sampling effort.

Summary

We examined the overall precision of particle size and fine sediment abundance estimates based on visually classified particle count data collected as part of the USEPA's EMAP program, and the relative contribution of several sources of error or uncertainty in these estimates, using a large dataset comprising samples from more than 1,200 sites in 12 western states. The EMAP data contain several potential sources of error in addition to the random sampling error (due to natural variability) and the observer bias in particle selection present in standard pebble count methods. These include classification error (misclassification of particle size), binning error resulting from the use of broad size classes to quantify the size of individual particles, and the effect of including (or not) bedrock as the largest size class. In addition to these factors, we also assessed the difference between the geometric mean particle size (Dgm) typically reported in EMAP assessments and the more widely used median particle size, D50.

We found that the broad (2ψ) size classes used in EMAP have a relatively insignificant effect on the precision of estimated Dgm or D50, but that variable classification bias among observers can contribute as much as 20% to the RMSE of particle size estimates, although this can be reduced by careful training. Including bedrock as a size class can substantially increase particle size estimates where bedrock covers a significant fraction of the bed: at nearly 10% of sites, estimated Dgm increased by a factor of two or more when bedrock was included vs. when it was excluded. However, the vast majority (82.5%) of sites in our regional survey had <5% bedrock, and for these streams Dgm estimates including and excluding bedrock differed by an average of only 4%. The difference between Dgm and D50 can also be quite large. For gravel-bed streams (D50 > 2 mm), Dgm was, on average, slightly less than half as large as D50. Excluding fine sediments (<2 mm) reduced the difference between Dgm and D50 to only about 3%. Therefore, Dgm and D50 cannot be used interchangeably in general, but it may be safe to do so if fine sediments are a minor component of the particle size distribution or if they have been excluded from both samples.

The EMAP particle size estimates have significantly lower precision than traditional pebble count methods, with an RMSE ranging from approximately double to as much as seven times that reported in published studies evaluating the precision of particle count methods with multiple observers. However, the EMAP particle size data are reach-scale estimates and would be expected to have greater variability than the plot-scale values reported in these other studies, owing to the greater range of particle sizes likely to be encountered in a reach-scale sample. We conclude that the precision of EMAP estimates of Dgm and D50, as well as the EMAP estimate of percent fine (<2 mm) sediments, Pfs2, is adequate for use in regional habitat assessments and long-term monitoring studies, although this precision (particularly for purposes of trend monitoring) could be improved significantly by implementing some fairly simple changes to the monitoring protocol.

Acknowledgments

We thank Ellen Wohl, Tom Lisle, and John Potyondy for thoughtful reviews that improved this paper. We are also indebted to George Robison, Phil Larsen, and Mostafa Shirazi for helpful reviews of early drafts of the manuscript. We thank John Potyondy for providing the data used in Table 3 and Figure 4, and Andrew Marcus for providing an electronic copy of data from Marcus et al. (1995) used in Table 7. The authors are indebted to John Van Sickle and Tom Kincaid for statistical assistance, Curt Seeliger and David Cassell for SAS programming help, and Sue Pierson for GIS and graphics support. Curt Seeliger, Marlys Cappaert, Meghan Arbogast, Suzanne San Romani, and Jana Seeliger provided essential data management and QA services in assembling the EMAP West dataset used in this study; and Ryan Taylor assisted in collecting field data for the "method comparison sites." We are indebted to EMAP for project design and implementation and to USEPA Regions 8, 9, and 10 and cooperating agencies in the states of Arizona, California, Colorado, Idaho, Montana, Nevada, North Dakota, Oregon, South Dakota, Utah, Washington and Wyoming, which cooperated in data acquisition for the EMAP West assessment. John Faustini's participation in this project was supported by a Post-doctoral training coop (CR 831682-01) at Oregon State University funded by a USEPA Office of Research and Development Regional Methods Initiative (RMI-99-B-1b). This manuscript has been subjected to review by the National Health and Environmental Effects Research Laboratory's Western Ecology Division and approved for publication. Approval does not signify that the contents reflect the views of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.
