A method for deriving water-quality benchmarks using field data



The authors describe a methodology that characterizes effects on individual genera observed in the field and estimates the concentration at which 5% of genera are adversely affected. Ionic strength, measured as specific conductance, is used to illustrate the methodology. Assuming some resilience in the population, 95% of the genera are afforded protection. The authors selected an unambiguous effect, the presence or absence of a genus from sampling locations. The absence of a genus, extirpation, is operationally defined as the point above which only 5% of the observations of a genus occur. The concentrations that cause extirpation of each genus are rank-ordered from least to greatest, and the benchmark is estimated at the 5th percentile of the distribution using two-point interpolation. When a full range of exposures and many taxa are included in the model of taxonomic sensitivity, the model broadly characterizes how species in general respond to a concentration gradient of the causal agent. This recognized U.S. Environmental Protection Agency methodology has many advantages. Observations from field studies include the full range of conditions, effects, species, and interactions that occur in the environment and can be used to model some causal relationships that laboratory studies cannot. Environ. Toxicol. Chem. 2013;32:255–262. © 2012 SETAC


Over the last 30 years, chemical, physical, and biological monitoring programs have been developed to document conditions and trends in the environment 1. Among the potential uses of this empirical information is the characterization of causal relationships and prediction of effects 2–5. These field-based data sets include paired measurements of physical, chemical, and biological attributes that can be used to develop benchmarks for desired or adverse effects. Commonly, benchmarks and criteria have been derived by modeling the distribution of species sensitivities 6 as expressed by standard acute and chronic toxicity test data 7. In the United States, criteria are derived from the 5th percentile hazardous concentration (HC05) of those species sensitivity distributions (SSDs), so they are intended to protect at least 95% of species in an exposed community based on a set of representative genera.

Laboratory-based methods have been an effective approach for developing criteria for many pollutants 2, 3, 7. However, some pollutants and effects do not lend themselves to laboratory testing. Migration, predation, and other behaviors are not expressed; tests of large species are logistically prohibitive. Some life stages, such as spawning, are seldom included. Endangered species are protected from routine testing. Complex exposure pathways and bioaccumulative chemicals are not readily tested. Susceptible species and sensitive life stages may be difficult to maintain and test in the laboratory. Effects that involve interactions among species are not included. Long-term effects due to short-term exposure (e.g., reproductive effects resulting from exposure during a critical stage of development) are not measured. In addition, the relative sensitivity of most species is not known a priori, and it is impractical to test even a substantial fraction of the species inhabiting an ecosystem. Also, some exposures are impractical to replicate, such as highly variable concentrations and interactions within mixtures and with the environment. In sum, SSDs based on laboratory tests cannot replicate the full range of ambient exposures, effects, and interactions.

A potential solution is to use field observations in addition to laboratory toxicity tests. Approaches using field data have properties that are advantageous 6, 8. The option of using field data is permitted by the European Water Framework Directive 3 and has been recommended by the U.S. Environmental Protection Agency (U.S. EPA) Science Advisory Board 9. It has been specifically recommended for suspended sediment benchmarks and nutrient criteria in the United States 4, 10, 11.

The present study describes an approach for identifying protective thresholds from field data that is inspired by the U.S. EPA's laboratory-based method 7, 12. For both the laboratory-based method and the field-based method, the exposure–response relationship is an SSD. A benchmark differs from criteria or standards in that it is not mandated by regulation but does provide scientific information to support decision making in various contexts. The method has been used to derive a benchmark for ionic strength as measured by specific conductance. That derivation and its data set are described in detail in a companion article in this issue 13, but the derivation is also used to illustrate aspects of the method in the present article.


A decision to develop a benchmark using a field-based approach is likely to be initiated because undesirable biological effects have been observed in the field. The same monitoring programs that detect the effects also measure biological and water-quality parameters that can be used to develop a field-based benchmark. However, before using field data to derive a benchmark that will be used to avoid or reduce undesirable effects, it is necessary to determine if a field-based approach is suitable. Some considerations are size of the data set, ability to characterize effects on many sensitive taxa, and a range of exposures sufficient to cause a full range of effects. For more discussion, see the section Data Set Selection and Adequacy in the present report and an example in Cormier and Suter 8, this issue.

Overview of method

We followed the standard U.S. EPA methodology in aggregating species to genera and using the 5th percentile of the SSD as a protective threshold 7. However, instead of using median lethal concentrations (LC50s) or other laboratory test end points, we developed a measure of extirpation. By extirpation, we mean the depletion of a population of a genus to the point that it is no longer a viable resource or is unlikely to fulfill its function in the ecosystem 14. An extirpation concentration (XC) is the ambient concentration beyond which a taxon is rarely observed in its original habitat or range. The 95th percentile of the observed occurrence of a genus is estimated and referred to as the XC95. These XC95 values are the foundation of a field-based SSD (Fig. 1).

Figure 1.

Example of a species sensitivity distribution (SSD) depicting the proportion of genera extirpated with increasing ionic strength measured as specific conductance. Insets are examples of generalized additive models of the occurrence of four genera, Ephemerella, Stenonema, Isonychia, and Cheumatopsyche. Vertical lines indicate 95th percentile extirpation concentration (XC95) values. Each point on the SSD plot represents an XC95 value of one of the 163 genera arranged from the most to the least sensitive. The 5th percentile hazardous concentration (HC05) used to define the benchmark is at the 5th percentile of the SSD (dotted horizontal line) and occurs at 295 µS/cm. Data source: West Virginia Department of Environmental Protection. Graphs adapted from U.S. Environmental Protection Agency 12.

Criterion assessment

At the core of a criterion assessment (assessments that derive standards, criteria, benchmarks, etc.) is the idea that models of causal relationships can be used to estimate exposure levels that will provide a desired level of protection. Performing an assessment involves three main activities: planning and problem formulation, performing the analysis, and interpretation, synthesis, and reporting of the findings 5, 15.


Before any analysis can begin, there needs to be a clear understanding of what will be assessed and how it will be assessed. By identifying the important components of the problem, the assessor is more likely to skillfully address the key issues and provide a solution for an environmental problem that is credible and compelling. The outcome of the planning and problem formulation process is a plan for quantitatively modeling the relationship between the causal agent and the biological effect that is used to identify the benchmark. The plan also includes selection of thresholds for characterizing the biological effect and for identifying the benchmark.

Reason for developing a benchmark

A benchmark is developed because there is a request or mandate to set limits or remedial goals for a particular agent or as a screening value to determine whether it should be further assessed in a particular case. The impetus for a benchmark may include information from prior condition or causal assessments that identifies a causal agent and biological effect. It may also include policy goals and constraints. All of this information is used by assessors in the problem formulation.

For our example case, based on the derivation of a benchmark for ionic strength, the initiator was ultimately the U.S. Clean Water Act; but, more immediately, U.S. state and federal agencies discovered reduced diversity of aquatic life and undesirable changes in Appalachian stream communities associated with minerals leaching from mountaintop mines and valley fills 16, 17. We chose to develop a benchmark for ionic strength in freshwater streams to inform decisions regarding protection of streams with respect to evaluations of mine permits.

General description of the causal relationship

The causal relationship between the causal agent and the deleterious effect needs to be deeply understood by the assessor and communicated to the audience of the assessment. In the process of assembling the information, the assessor gains a deeper understanding of the problem and insights about the best ways to measure and model the causal relationship. Factors that may affect development of the benchmark emerge. For example, for ionic strength, reports in the physiological literature might indicate that available laboratory test species prefer moderately hard water. Therefore, these species would not be useful for modeling effects that occur in soft water in which ionic strength measured as specific conductance is low. The assessor would then consider either using a field-based approach or finding suitable laboratory test species.

A conceptual diagram can provide a framework for data collection and analysis and a template on which to organize and present results. The diagram depicts the linkages among the sources and the intermediate steps leading to the exposure of the causal agent and the resulting biological effect. It is particularly useful for describing chemicals with complex environmental chemistries, dietary exposures, or indirect effects. Also, if a causal assessment is necessary, a conceptual diagram depicts the network of prior causal relationships that led to the exposure–response relationship in the field and may suggest how to develop evidence that the agent does or does not cause the effect. Conceptual diagrams are available for many chemical and physical agents that cause biological impairments 18 and need not be developed de novo.

The outcome of describing what is known about the causal relationship is a clearer understanding of the problem and the background information needed to perform the assessment. The causal agent is described, as well as what is known about the sources, fate and transport, exposure routes, background levels, the agent's mode of action and effects, and any factors that can affect the exposure. The entities that are affected are identified, and the alteration that occurs due to the exposure is described. In general, a detailed description of the relationship between the cause and the effect emerges. Based on that knowledge, the assessor chooses the assessment end points and the measures that represent those end points.

For the ionic strength benchmark assessment, the biological effect of concern was reduced benthic invertebrate diversity 16, 17, which was attributed to increased ionic strength with a specific ionic matrix of Ca²⁺, Mg²⁺, HCO₃⁻, Cl⁻, and SO₄²⁻. The ionic matrix was identified based on published reports 16, 19 and data sets used in the benchmark report 13. Specific conductance, hereafter referred to as “conductivity,” was selected as the measurement method because the biological effect is due to the mixture, not a single ion, and because measuring conductivity is fast, inexpensive, and reliable. A literature review 19 and a regional causal assessment 8 were performed to document what was known about the toxicity of the ionic mixture, to ensure that the benchmark would address a true cause, and to communicate the extensive but dispersed scientific knowledge about the causal relationship. The causal assessment described sources of bicarbonate and sulfate ions, mechanisms of action, and specific effects. It showed that high conductivity causes extirpation of sensitive species 8. Conceptual models of the source to effect pathways were provided along with a literature review 19.

This characterization of the causal relationship need not be repeated if a benchmark is developed for a different region. If an ionic strength benchmark is developed for a region outside of Appalachia with a similar ionic matrix of Ca²⁺, Mg²⁺, HCO₃⁻, Cl⁻, and SO₄²⁻, the new assessment would simply summarize salient points and cite previously published sources. This is because the causal assessment characterizes the general response of representative organisms to elevated ionic strength from this mixture 13.

Assessment end points

Assessment end points represent the actual environmental values to be protected and are defined by an ecological entity (species, community, etc.) and an attribute, such as survival, growth, and reproduction 20. The valued resource for this aquatic life benchmark is biotic communities. For this case, the entities are macroinvertebrate genera and the desired attribute is occurrence. Macroinvertebrates are appropriate assessment entities because they occur in all but the poorest-quality streams, they are highly diverse, they are commonly monitored, and they are affected by many different natural and human-made agents. Extirpation of genera is an appropriate assessment attribute because it is easy to understand that a serious adverse effect has occurred when a genus is lost from an ecosystem.

Genus extirpation concentration

Selection of a taxonomic level

Species are aggregated to the genus level because species within the same genus tend to have similar susceptibility 21. Although this assumption is not always valid, 5% of affected genera has become the generally accepted threshold of environmental change in the United States 7. Effect levels may be different for species within a genus due to niche partitioning afforded by naturally occurring causal agents such as ionic strength 22, 23. Hence, an apparently salt-tolerant genus may contain both sensitive species and tolerant species. However, genus is often the lowest level at which invertebrates are identified in field studies, so the genus level may be imposed by the data.

Selection of a model of the exposure–response relationship

In this field-based method, the occurrence of a genus at sampling sites relative to levels of the causal agent is modeled by a weighted cumulative distribution of frequency (CDF). Prior to selection of this model, we compared results using alternate models for estimating an XC, including logistic regression, a generalized additive model, unweighted CDF, and other options. The HC05 varied between 260 and 313 µS/cm depending on the method. A weighted CDF model was selected because the HC05 value fell within the range of the other methods and it was computationally simple and better understood by reviewers than either logistic regression or a generalized additive model.

Selection of the threshold of extirpation of a genus

Extirpation is operationally defined as the exposure level at or below which 95% of the observations of the genus occur. The 95th percentile was selected because it is more stable than the maximum value, yet it still represents the extreme of an organism's tolerance of the agent. The maximum value is sensitive to phenomena such as organism drift from clean tributaries, misidentifications, episodic dilution of the agent by high flows, and other false occurrences.

Estimation of the measurement end point for each genus, the XC

The XC95 is the exposure level that extirpates a genus at the 95th percentile of its observed occurrences; that is, the level beyond which even the least sensitive species within the genus is rarely observed. The XC95 is estimated by two-point interpolation at the extirpation threshold for a genus, the 95th percentile. For additional details see the section Analytical Methods and the example application in Cormier et al. 13, this issue.

Extirpation of a proportion of genera

Selection of a model to estimate extirpation of a proportion of invertebrate assemblage

In this field-based method, the exposure–response model is an SSD constructed from the ranked XC95 values.

Selection of the benchmark threshold

Like the laboratory-based method 7, this field-based method uses the 5th percentile as the threshold proportion of genera; hence, no more than one in 20 genera should be lost from an aquatic community due to the benchmark agent.


Data set selection and adequacy

A field-based approach requires a large data set with certain characteristics. The adequacy of the data set can be judged by the following attributes: measurements of the agent must be paired in space and time with biological sampling; high-quality sites are included in the data set; background levels (e.g., of conductivity) are similar throughout the region; characteristics of the agent (e.g., ionic matrix) are similar across the region; some biological sampling occurs when sensitive genera are likely to be collected (e.g., March through June); the gradient is broad enough to include weak effects and strong effects; data are available to evaluate potential confounding factors; and there is an independent data set or other means to validate the benchmark.

An objective of the assessment is to include as many appropriate genera as possible; however, some exclusions are inevitably necessary. Analyses can be performed to assess adequate sample size. In our experience, a genus occurring at a minimum of 20 sites provides a reliable estimate of extirpation 13. Usually, an SSD that includes a minimum of 100 invertebrate genera provides a reliable estimate of the HC05 13. Inclusion of many genera and the proportionate inclusion of sensitive genera ensure that the model of the SSD represents the 5th percentile. The lack of sensitive genera in the SSD can be a problem in regions where many species are already extirpated or samples do not represent enough high-quality sites. Species that require disturbance or pollution are excluded by eliminating those that are not observed at reference sites. The data set may be truncated to minimize confounders. Methods for evaluating potential confounders of the model of the causal relationship are described in the section on related assessments in Suter and Cormier 24, this issue.

In the case of the ionic strength benchmark assessment, sites were removed that were not in the defined geographic region, that had a different ionic matrix, that had a pH < 6, and that were sampled with a different method. The data set was screened so that all taxa were identified to genus and were observed at least 25 times in the data set. Genera were excluded from analysis if they did not occur at least once at a reference site. The final SSD contained 163 genera.
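The screening rules in this example can be expressed as a short filter. The sketch below is illustrative only: the record keys (`region`, `ph`, `reference`, `genera`) and the function names are hypothetical, and screening on ionic matrix and sampling method is omitted for brevity.

```python
from collections import Counter

def screen_samples(samples, region, min_ph=6.0):
    """Keep samples from the study region whose pH is at or above the cut-off."""
    return [s for s in samples if s["region"] == region and s["ph"] >= min_ph]

def screen_genera(samples, min_occurrences=25):
    """Return genera seen in enough samples and at least once at a reference site."""
    # Count each genus once per sample, even if it is listed twice.
    occurrences = Counter(g for s in samples for g in set(s["genera"]))
    at_reference = {g for s in samples if s["reference"] for g in s["genera"]}
    return sorted(g for g, n in occurrences.items()
                  if n >= min_occurrences and g in at_reference)
```

Applying `screen_samples` before `screen_genera` means that occurrence counts are taken only over the samples that survive the site-level exclusions.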


Analytical methods

Analytical methods used to develop a benchmark include developing exposure–response models for each genus and estimating XC95 values, developing exposure–response models for the proportion of genera extirpated, and estimating the HC05 (Fig. 2). Other analyses estimate natural background, characterize uncertainties, and characterize the effect of confounders on the causal models.

Figure 2.

Details of the estimation of the exposure level that causes harm to the valued resource. Multiple arrows indicate multiple inputs. Single arrow indicates single input.

Derivation of the XC95

For each genus meeting the data-selection conditions, a cumulative distribution function was constructed that was weighted to correct for any potential bias from the unequal distribution of sampling of sites across the range of conductivity 13. This weighted CDF represents the proportion of observations of a genus with respect to increasing exposure levels. The extirpation effect threshold for a genus was 95% of the total occurrence of the genus. The two exposure levels bracketing the 95th percentile were interpolated to give an XC95 for a genus. For example, with no weighting and 200 observations of a genus ranked from lowest to highest exposure level (e.g., conductivity), the level at the 190th observation would be the XC95.

The assessor first defines equally sized bins. Bin size depends on the data set and requires balancing two needs: enough observations in each bin to define the proportion, and enough bins to define the form of the response. In the example case using conductivity, each bin was 0.017 log10 conductivity units wide, and a total of 60 bins spanned the range of observed conductivity values.

Next, the assessor weights the bins. The assigned weights are wi = 1/ni, where ni is the number of samples in the ith bin.
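The binning and weighting steps can be sketched as follows. This is a minimal illustration rather than the code used in the assessment: the function name `bin_weights` is hypothetical, bins are assumed to be equal-width on the log10 scale as in the example case, and the largest value is assumed to belong to the last bin.

```python
import math
from collections import Counter

def bin_weights(conductivity, n_bins=60):
    """Return (bin index, weight 1/n_i) for every sample, using equal-width log10 bins."""
    logs = [math.log10(c) for c in conductivity]
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / n_bins or 1.0   # guard against identical values
    # Assumption: the maximum value is placed in the last bin, not a new one.
    idx = [min(int((v - lo) / width), n_bins - 1) for v in logs]
    counts = Counter(idx)               # n_i: all samples in bin i, with or without the genus
    return idx, [1.0 / counts[i] for i in idx]
```

With these weights, the samples in each occupied bin sum to 1, so heavily sampled parts of the gradient do not dominate the cumulative distribution.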

The value of the weighted cumulative distribution function, F(x), is computed using the following equation for each unique observed value of the agent x associated with observations of a particular genus

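$$F(x) = \frac{\sum_{i=1}^{N_b} w_i \sum_{j=1}^{M_i} I\left(x_{ij} \le x \text{ and } G_{ij}\right)}{\sum_{i=1}^{N_b} w_i \sum_{j=1}^{M_i} I\left(G_{ij}\right)} \tag{1}$$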

where xij is the agent's level in the jth sample of bin i, Nb is the total number of bins, Mi is the number of samples in the ith bin (the same count as ni in the bin weights), Gij is true if the genus of interest was observed in the jth sample of bin i, and I is an indicator function that equals 1 if the indicated conditions are true and 0 otherwise.

The XC95 value is defined as the value of x for which F(x) = 0.95. Equation 1 is an empirical cumulative distribution function, and the output is the proportion of observations of the genus that occur at or below a given exposure level. However, the individual observations are weighted to account for the uneven distribution of observations across the range of conductivities. See Figure 3 for examples of weighted CDFs.
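The weighted CDF and the XC95 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used for the benchmark: the function names are hypothetical, the per-sample weights are supplied by the caller (1/n for the bin each sample falls in), and linear interpolation between the two bracketing points stands in for the two-point interpolation described in the text.

```python
def weighted_cdf(levels, present, weights):
    """Weighted CDF at each unique agent level where the genus was observed."""
    obs = sorted((x, w) for x, g, w in zip(levels, present, weights) if g)
    total = sum(w for _, w in obs)
    xs, F, running = [], [], 0.0
    for x, w in obs:
        running += w
        if xs and xs[-1] == x:
            F[-1] = running / total    # tied level: keep the value accumulated through all ties
        else:
            xs.append(x)
            F.append(running / total)
    return xs, F

def xc95(levels, present, weights, q=0.95):
    """Agent level at which the weighted CDF reaches q, by two-point interpolation."""
    xs, F = weighted_cdf(levels, present, weights)
    if q <= F[0]:
        return xs[0]
    for k in range(1, len(F)):
        if q <= F[k]:
            frac = (q - F[k - 1]) / (F[k] - F[k - 1])
            return xs[k - 1] + frac * (xs[k] - xs[k - 1])
    return xs[-1]
```

With equal weights the function reduces to an ordinary empirical CDF: for 200 occurrences at levels 1 through 200, `xc95` returns the level of the 190th ranked occurrence.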

Figure 3.

Examples of weighted cumulative distribution of frequencies. Extirpation can be estimated from the graph on the left but not from the graph on the right. Horizontal broken line is at the 95th percentile. Vertical line intercepts the x axis at the 95th percentile extirpation concentration for each genus. Data source: West Virginia Department of Environmental Protection. Graphs from U.S. Environmental Protection Agency 12.

This method for calculating the XC95 will generate a value even if the genus is not extirpated. For example, the occurrence of Nigronia changes little with increasing conductivity (i.e., the cumulative distribution is linear [see Fig. 3]). Because of the data distributions, not all 95th percentiles correspond to extirpation, and some imprecisely estimate the extirpation threshold. The CDFs (Fig. 3) and scatter plots (Fig. 1) should be visually inspected for anomalies; if there is no clear trend in the response or if the response does not include extirpation, the XC95 can be given a qualifying assignation such as either ∼ or >. The assignation of > or ∼ does not affect the HC05 provided the flagged XC95 values lie above the 5th percentile of the SSD, but it alerts users to the uncertainty of those XC95 values 13.

Derivation of the HC05

The SSDs are cumulative distribution plots of XC95 values for each genus relative to levels of the agent (Fig. 1). The cumulative proportion for each genus P is calculated as P = R/(N + 1), where R is the rank of the genus and N is the number of genera. Tolerant genera coded as ∼ or > are included. They are reported as “greater than” values. Their inclusion assures that N is the correct number of genera, but they do not contribute substantially to the HC05 because they fall in the upper portion of the SSD. The HC05 is derived by two-point interpolation between the XC95 values bracketing P = 0.05 (i.e., the 5th percentile of modeled genera).
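The HC05 calculation can be sketched as below. This is an illustrative sketch with a hypothetical function name; it assumes the XC95 values are passed as plain numbers (any ∼ or > qualifiers stripped) and uses linear interpolation between the two ranked values that bracket P = 0.05.

```python
def hc05(xc95_values, p=0.05):
    """Interpolate the SSD, with P = R / (N + 1), at cumulative proportion p."""
    xc = sorted(xc95_values)
    n = len(xc)
    P = [r / (n + 1) for r in range(1, n + 1)]
    if p < P[0]:
        raise ValueError("too few genera: the lowest rank already exceeds p")
    for k in range(1, n):
        if p <= P[k]:
            frac = (p - P[k - 1]) / (P[k] - P[k - 1])
            return xc[k - 1] + frac * (xc[k] - xc[k - 1])
    return xc[-1]
```

For an SSD of 163 genera, P = 8/164 ≈ 0.049 and P = 9/164 ≈ 0.055, so the HC05 is interpolated between the 8th and 9th most sensitive genera.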

Uncertainty of the benchmark

The uncertainty of the benchmark and other model outputs can be estimated as depicted in Figure 4. Bootstrap estimates of the XC95 are derived for each genus used in the derivation of the benchmark by sampling with replacement to obtain a number of samples equal to the number of observations in the data set 25. From each bootstrap sample, the XC95 is calculated for each genus by the same method applied to the original data. Typically, that process is repeated 1,000 times to create a distribution of XC95 values for each genus. These distributions are then used to calculate a two-tailed 95% confidence interval on the XC95 for each genus.

Figure 4.

Diagram depicting the process for estimating the uncertainty of the 5th percentile hazardous concentration (HC05). This same process can be adapted to evaluate different numbers of occurrences of genera, numbers of samples, or other parameters of the model such as exclusion of potential confounders.

The uncertainty in the HC05 value is evaluated by generating an HC05 from each of the 1,000 sets of bootstrapped XC95 estimates. The distribution of 1,000 HC05 values is used to generate two-tailed 95% confidence bounds on these bootstrap-derived values (Fig. 5). The same process is used to evaluate the effect of different sample sizes or exclusion of genera using different database selection criteria [for an example, see 13].
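The bootstrap procedure can be sketched as follows. This is a simplified illustration, not the authors' code: all names are hypothetical, an unweighted 95th percentile stands in for the weighted XC95, a genus that is absent from a resample is skipped (an assumption the source does not address), and the confidence bounds are approximate nearest-rank percentiles.

```python
import random

def quantile95(values, q=0.95):
    """Unweighted stand-in for the XC95: two-point interpolation on sorted values."""
    v = sorted(values)
    pos = q * (len(v) - 1)
    k = int(pos)
    if k + 1 >= len(v):
        return v[-1]
    return v[k] + (pos - k) * (v[k + 1] - v[k])

def hc05(xc95_values, p=0.05):
    """Interpolate ranked XC95 values at the fractional rank where R / (N + 1) = p."""
    xc = sorted(xc95_values)
    n = len(xc)
    pos = p * (n + 1)
    k = int(pos)
    if k < 1:
        return xc[0]
    if k >= n:
        return xc[-1]
    return xc[k - 1] + (pos - k) * (xc[k] - xc[k - 1])

def bootstrap_hc05(levels, presence, n_boot=1000, seed=0):
    """Approximate 95% bounds on the HC05; presence[s][g] is True if genus g is in sample s."""
    rng = random.Random(seed)
    n_samples, n_genera = len(levels), len(presence[0])
    estimates = []
    for _ in range(n_boot):
        # Resample whole samples with replacement, keeping the original sample count.
        rows = [rng.randrange(n_samples) for _ in range(n_samples)]
        xc = []
        for g in range(n_genera):
            occ = [levels[r] for r in rows if presence[r][g]]
            if occ:   # assumption: skip a genus that is missing from this resample
                xc.append(quantile95(occ))
        estimates.append(hc05(xc))
    estimates.sort()
    lo = estimates[int(0.025 * (n_boot - 1))]
    hi = estimates[int(0.975 * (n_boot - 1))]
    return lo, hi
```

Resampling whole samples, rather than individual genus records, keeps the pairing between each conductivity measurement and the set of genera collected with it.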

Figure 5.

Cumulative distribution of the 95th percentile extirpation concentration values for the 36 most sensitive genera (dark circles) and 95% confidence intervals (dotted lines) based on 1,000 bootstrapping results. Each small gray dot represents a species sensitivity distribution probability for each bootstrapping iteration.

For the ionic strength benchmark assessment, the HC05 was 295 µS/cm. The benchmark varied <5% when SSD models were constructed from ≥20 occurrences of each genus. The HC05 stabilized with >800 samples and >110 genera in the SSD.

Background estimation

If background for a causal agent is not available from the literature or previous assessments, it should be estimated from the data set used to develop the benchmark. If background is already known, confirmation of the background using the working data set strengthens the assessment. By “background,” we mean the levels of the agent that represent natural conditions. In some cases, the current baseline for an agent may be greater than natural background, and this poses challenges for estimating natural background for a region.

Characterization of the unaltered background of the agent is important to ensure that benchmarks are not set within that background range and to define where the benchmarks are relevant. Background levels may be estimated from reference sites, which are sites that are judged to be among the best within a category. However, because disturbance is pervasive, reference sites are not necessarily pristine or representative of natural background. Many reference sites have unrecognized disturbances in their watersheds or recognized disturbances that are less than most others in their category. Some may have extreme values of an agent or unusual conditions at the time the sample was taken. When estimating background concentrations, it is conventional to use only the best 75% of reference values. That cut-off percentile is based on precedent and on the collective experience of U.S. EPA field ecologists 4.

Alternatively, background values may be estimated using samples from a random or probability-based design 26. Such samples include all waters within the sampling frame, including impaired sites. To characterize the best streams, the 25th percentile is commonly used by U.S. EPA field ecologists 4. In some regions, there may be no undisturbed streams. If land cover modification is pervasive, a percentile <25% may be justifiable.
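Both background estimates reduce to a percentile calculation, sketched below. The function name and the conductivity values in the usage note are hypothetical, and the linear-interpolation definition of a percentile is an assumption; other percentile conventions give slightly different values for small samples.

```python
def percentile(values, pct):
    """Percentile by linear interpolation between the two nearest ranked values."""
    v = sorted(values)
    pos = pct / 100 * (len(v) - 1)
    k = int(pos)
    if k + 1 >= len(v):
        return v[-1]
    return v[k] + (pos - k) * (v[k + 1] - v[k])
```

For example, `percentile(reference_values, 75)` gives the cut-off for reference sites, and `percentile(probability_sample_values, 25)` gives the estimate from a probability-based sample, where both lists hold conductivity in µS/cm.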

For the ionic strength benchmark assessment, background was estimated using both reference site samples and a random sample. Both estimates were well below the final benchmark, and the random sample gave a lower estimate (116 µS/cm) than the selected reference sites (150 µS/cm).

Treatment of confounders

Because field observations are uncontrolled, unreplicated, and unrandomized, they are subject to random relationships and to confounding. Confounding is the appearance of apparently causal relationships that are due to noncausal correlations. Noncausal correlations and the inherent noisiness of environmental data can obscure true causal relationships. We suggest reducing confounding, as far as possible, by identifying potential confounding variables; determining their contributions, if any, to the relationships of interest; and eliminating their influence when possible and as appropriate based on credible and objective scientific reasoning. A method to assess the potential effect of confounders is described separately 24. For the ionic strength benchmark assessment, pH < 6 was identified as a likely confounder and the data set was truncated to minimize the influence of acidity and associated dissolved metals.


The analytical results include the list of genera and their XC95 values, the HC05 and confidence bounds, and plots of distributions of the occurrences of genera with respect to the causal agent. A separate concise summary of this set of results and the background is appreciated by most audiences.

Synthesis of considerations

Because many decisions must be made in deriving a benchmark, it is useful to provide an accounting of the more substantive ones. For example, in the ionic strength benchmark assessment, rationales were provided for the use of a field-based method, treatment of mixtures, region of applicability, influence of uncertainties, background, choice of taxa, special cases such as important species, inclusion of reference sites, range of exposure, seasonality, life history, sampling methods, and treatment of causation and potential confounding. These considerations may have an effect on the final benchmark, such as where and when it is relevant.

Describe the benchmark

The final benchmark is provided with uncertainty bounds. The limits of the geographical range, season, water body type, or other constraining factors are described. Qualitative uncertainties may also be explained. The level of scientific review should be stated.

For the ionic strength benchmark assessment, the benchmark was obtained by rounding the HC05 to two significant figures 7. This resulted in a benchmark of 300 µS/cm. The benchmark was limited to three ecoregions within the states of Kentucky and West Virginia for waters having an ion matrix in which HCO₃⁻ + SO₄²⁻ ≥ Cl⁻ on a chemical mass basis. The process and analyses were subject to a panel review by representatives from academia, industry, and government as members of the Science Advisory Board 9.
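The rounding step can be made explicit with a small helper. The function name is hypothetical, and rounding halves upward is an assumption, since the source does not state a rounding mode; for an HC05 of 295 µS/cm the result is 300 µS/cm either way.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_sig(value, figures=2):
    """Round to a number of significant figures, with halves rounded up (assumed)."""
    d = Decimal(str(value))
    if d == 0:
        return d
    # Exponent of the last digit to keep, e.g. 1 (tens) for 295 at two figures.
    exponent = d.adjusted() - (figures - 1)
    return d.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_UP)
```

Calling `round_sig(295)` returns a `Decimal` equal to 300.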


We have adapted the U.S. EPA's standard method for deriving water-quality criteria so that it can be used with data from the field. In a separate article in this issue 13, we demonstrated its application to aqueous ionic strength in two regions of West Virginia, USA. The standard method is well supported by experience in application and is based on the SSD approach, which is scientifically defensible and used internationally 6. We hypothesize that if the waters are managed to the 5th percentile of the field-derived SSD, the invertebrate assemblage will be similar to reference conditions and will support designated uses.

This field-based method helps to address some of the difficulties encountered with SSDs derived from toxicity testing when the agent is not amenable to toxicity testing or when effects are not measured by standard toxicity tests 8. Use of field data allowed us to evaluate realistic exposures to mixtures for many species throughout all life stages, as well as complex interactions involving mating, competition, and predation. Because more species are represented than in a laboratory-derived SSD, there is more confidence that the SSD represents the effects on the more susceptible taxonomic groups. In particular, some aquatic insects were found to be much more sensitive to ionic strength than the standard test species.

Local extirpation of species is an unambiguous end point that can be clearly communicated to nonscientists. However, the implications of extirpation are still matters of research. As previously mentioned, an invertebrate genus may represent several species, and this approach identifies the level that extirpates all species within that genus (i.e., it is the level at which even the least sensitive species is rarely observed). Thus, a false sense of protection may occur. In our exploratory studies, and as exemplified by the plots for each genus (Figs. 1 and 3, and Appendix E 12), the frequency of occurrence begins to decline at concentrations well below a genus's XC95 value. For some groups, such as fish, birds, and mammals, identification to species is relatively easy and would alleviate concerns about underprotection. However, it is important to remember that this is a concern only if the genera below the 5th percentile have more species than those above or have a wider range of sensitivities among species. Otherwise, the proportion of species extirpated is the same as the proportion of genera.
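The first of these two conditions can be made concrete with a short calculation. This is a simplified sketch, not part of the original analysis: it assumes that a species is counted as extirpated exactly when its genus is, and the symbols $\bar{s}_b$ and $\bar{s}_a$ (mean number of species per genus below and above the 5th percentile, respectively) are introduced here only for illustration.

```latex
% Fraction of species extirpated at the HC05, assuming a species is
% extirpated exactly when its genus is (illustrative simplification).
\[
  p_{\text{species}}
  = \frac{0.05\,\bar{s}_b}{0.05\,\bar{s}_b + 0.95\,\bar{s}_a}
\]
% Equal species richness per genus recovers the genus-level proportion:
\[
  \bar{s}_b = \bar{s}_a \;\Rightarrow\; p_{\text{species}} = 0.05
\]
% If the sensitive genera hold twice as many species on average:
\[
  \bar{s}_b = 2\,\bar{s}_a \;\Rightarrow\;
  p_{\text{species}} = \frac{0.10}{0.10 + 0.95} \approx 0.095
\]
```

The second condition, a wider range of sensitivities among species within the sensitive genera, is not captured by this calculation.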

Alternative end points should be explored. For example, a percentage reduction in abundance may be used in place of the XC values. The U.S. EPA used a 20% reduction of abundance of distinct taxa and incorporated that into an SSD for sediment 10, and Relyea et al. 27 developed a sediment-sensitive index. Linton et al. 28 used a 20% reduction in abundances of individual macroinvertebrate families in the field, estimated by quantile regression. Changes in abundance may be more appropriate for highly valued taxa such as fish. It is important to avoid aggregate metrics such as EPT richness (the number of taxa of Ephemeroptera, Plecoptera, and Trichoptera) and indices such as the Index of Biotic Integrity because they obscure the responses of susceptible taxa even more than an analysis at a genus level of aggregation.

Ecological relationships are dependent on environmental conditions. For that reason, when we used the method, we restricted it to a well-defined region and a relatively homogeneous set of streams with similar sources and similar ionic mixture. It will be important to develop experience and guidelines for developing and using field-derived benchmarks. Development using data from regions where many genera are already extirpated would result in a benchmark that would protect only the remaining tolerant species. Field data that are collected after susceptible taxa such as flying insects have emerged (usually in the spring) will miss sensitive taxa that are present only as eggs or early instars, and a benchmark derived from such data could allow the extirpation of many genera. In the case of ionic strength measured as conductivity, inclusion of other ionic mixtures may also lead to greater XC95 values that are not protective for the ionic mixture evaluated in the case example 13.

Data selection is clearly important to the success of this method. Data should be abundant, consistent, and of high quality. Quality assurance is essential. Filtering, truncating, and other manipulations of the data are important to avoid bias, reduce confounding, minimize extraneous variance, and so on. However, it is important to justify such data set manipulations. In all cases, the results for truncated data sets should be compared to those for the full data set so that the implications are understood.
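The comparison of truncated and full data sets can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure used in the assessment: the XC95 here is an unweighted 95th percentile of conductivity at sites where a genus was observed, the plotting position rank / (N + 1) is an assumed convention for the two-point interpolation, and all function names and the sample layout are hypothetical.

```python
from collections import defaultdict

def percentile_interp(sorted_vals, p):
    """Interpolate between the two ranked values that bracket proportion p,
    using plotting positions rank / (N + 1) (an assumed convention)."""
    n = len(sorted_vals)
    pos = p * (n + 1)              # fractional 1-based rank
    if pos <= 1:
        return sorted_vals[0]      # below the lowest plotting position
    if pos >= n:
        return sorted_vals[-1]     # above the highest plotting position
    lo = int(pos)                  # rank of the lower bracketing point
    frac = pos - lo
    return sorted_vals[lo - 1] + frac * (sorted_vals[lo] - sorted_vals[lo - 1])

def xc95_by_genus(samples):
    """samples: iterable of (conductivity, set_of_genera_observed).
    Returns an unweighted XC95 for each genus (a simplification)."""
    exposures = defaultdict(list)
    for conductivity, genera in samples:
        for genus in genera:
            exposures[genus].append(conductivity)
    return {g: percentile_interp(sorted(v), 0.95) for g, v in exposures.items()}

def hc05(samples):
    """5th percentile of the rank-ordered XC95 values."""
    xc_values = sorted(xc95_by_genus(samples).values())
    return percentile_interp(xc_values, 0.05)

def compare_full_vs_truncated(samples, keep):
    """Return (HC05 from all samples, HC05 from samples passing `keep`)."""
    samples = list(samples)
    return hc05(samples), hc05([s for s in samples if keep(s)])

# Hypothetical toy data: (conductivity in µS/cm, genera observed)
toy = [(100, {"A", "B"}), (200, {"A"}), (300, {"A"}), (400, {"B"})]
full, truncated = compare_full_vs_truncated(toy, keep=lambda s: s[0] <= 300)
```

In this toy case, truncating the exposure range removes the only high-conductivity observation of genus B, which lowers its XC95 and therefore the HC05. With fewer than 19 genera the 5th percentile falls below the lowest plotting position, so the sketch simply returns the lowest XC95; a real application would need many more genera.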

The field-derived form of the SSD method has different limitations from the toxicity test method for developing benchmarks or other protective thresholds. This method requires a different set of analytical skills and greater application of judgment. As with epidemiology, this method requires judgments concerning data selection, treatment of confounding variables, and characterization of cumulative exposures. Validation with an independent data set increases confidence in the model. Field-derived and toxicity test SSDs may provide complementary information. If both laboratory and field methods can be performed, this may provide a check for the protectiveness of benchmarks derived by either method. The field method incidentally identifies particularly sensitive taxa that may be useful for elucidating mechanisms and for targeted laboratory testing or field monitoring.

As is the case for developing criteria from laboratory toxicity tests, the sensitivity distribution is a model of how species in general respond to an agent and does not require that the species or genera be the same in all applications or at all locations. Therefore, a benchmark may be applicable outside of the region for which it was derived if there is no contradictory information, such as evidence that undisturbed background is naturally greater than the benchmark or the characteristics of the agent are dissimilar.

The methodology may be practical for issues besides ionic strength. It is most likely to be useful for agents that are commonly measured and commonly cause effects. We expect that it would be applicable to dissolved oxygen, suspended and deposited sediment 10, 27, nutrients, temperature 29, organic matter, metals, and other naturally occurring agents. It is less likely to be useful for new anthropogenic agents because a full range of exposures and effects might not yet have occurred. De novo field-based benchmarks require that the ranges of exposures and effects are sufficient to model the causal relationship, that causation is determined, that confounding factors can be reasonably controlled, and that appropriate exposure and response data are available or obtainable.

We have presented a method for developing benchmarks from field data. We believe it offers a way to set achievable management goals that are directly relevant to real biotic communities. We encourage further development of methods for modeling exposure–response relationships in the field and application of the method to other agents. As more difficult cases are addressed, more complex methods may be required. Nevertheless, we expect that causal relationships observed in the natural environment will create a stronger connection between what we know about the environment and what we do to protect it.


We appreciated comments from our many reviewers. In particular, we thank C. Delos, M. Passmore, J. VanSickle, P. White, C. Schmitt, C. Menzie, C. Hawkins, and members of the U.S. EPA Biological Advisory Committee. We also thank the following members of the U.S. EPA Science Advisory Board for their careful review, interdisciplinary insights, and encouragement: D. Patten, E. Boyer, W. Clements, J. Dinger, G. Geidel, K. Hartman, R. Hilderbrand, A. Huryn, L. Johnson, T.W. La Point, S.N. Luoma, D. McLaughlin, M.C. Newman, T. Petty, E. Rankin, D. Soucek, B. Sweeney, P. Townsend, and R. Warner. The article is based on work supported by the U.S. EPA. The views expressed in the present study are those of the authors and do not necessarily represent the views or policies of the U.S. EPA.