We compiled a database of 250 dischargers across the United States and examined relationships between standardized Ceriodaphnia dubia and Pimephales promelas (fathead minnow) whole effluent toxicity (WET) test endpoints and instream biological condition as measured by benthic macroinvertebrate assessments. Sites were included in the analysis if the effluents were not manipulated before testing (e.g., dechlorination), and standardized biological and physical habitat assessment methods were used upstream and directly downstream of the discharge. Several analyses indicated that fish endpoints were more related to instream biological condition than Ceriodaphnia WET endpoints. Dischargers that failed <25% of their tests had ≤15% chance of exhibiting instream impairment. Effluent dilution was the strongest factor affecting relationships between WET and observed biological conditions. Effluents that comprised >80% of the stream under low-flow conditions exhibited better relationships between WET and instream condition than effluents with greater dilution. Effluents that comprised <20% of the stream had a low probability of exhibiting impairment, even if several WET test failures were observed over a 1-year period. Fish acute and chronic WET information could predict instream biological conditions; however, WET compliance, based on 7Q10 stream flow, was consistently conservative. Our results indicate that WET was more predictive of instream biological condition if several tests were conducted, more than one type of test was conducted, and endpoints within a test were relatively consistent over time.
As part of the U.S. Environmental Protection Agency's (U.S. EPA) goal to ensure that designated beneficial uses of the nation's surface waters are met, whole effluent toxicity (WET) tests have been incorporated within the National Pollutant Discharge Elimination System permit system for many industrial and municipal dischargers. Incorporating WET tests into the permit process implies that the results obtained from such tests are an acceptable means of determining compliance with designated uses. One of the underlying issues is the ability of laboratory WET endpoints to efficiently predict effluent effects on receiving-stream biota.
The degree of relationship between WET results and actual instream ecological effects of an effluent discharge is a critical question that cuts to the very heart of the debate concerning the U.S. EPA's policy of independent application. Under this policy, wastewater discharge compliance is judged by the independent results of each form of monitoring (chemical, biological, and toxicological) against its own standards or criteria. If any one form of monitoring suggests noncompliance with its standard or compliance criteria, the discharge is typically considered noncompliant with federal and state regulations. Many have suggested that a weight-of-evidence approach, integrating results of all monitoring information, may be a more appropriate way to judge wastewater (or ambient water) compliance with the Clean Water Act. One obstacle in implementing a weight-of-evidence approach in permit compliance evaluation is the uncertainty as to when (i.e., under what conditions) certain WET endpoints or WET compliance limits are effective predictors of effluent effects on receiving stream resources. To help address this issue, the Water Environment Research Foundation supported this project to examine relationships between standard freshwater end-of-pipe WET tests typically required of permit holders and actual receiving-stream effects as judged by standard measures of biological condition.
1. Effluents that were subjected to more WET tests have stronger relationships between WET and instream conditions than effluents subjected to fewer tests. State agencies currently require different monitoring schedules ranging from annual or semiannual to quarterly or monthly. It is not clear whether these different monitoring frequencies yield similar predictions of impact potential for a water body, even for the same facility. Furthermore, WET tests are inevitably a snapshot in time, whereas instream biological assessments generally integrate results over longer exposures (weeks to months [4–6]). Therefore, if WET tests are performed relatively infrequently, or actual effluent toxicity varies considerably over time, a skewed interpretation of potential instream effects may result. We assumed that better-characterized effluents (via more frequent WET testing) should exhibit stronger relationships between WET and instream conditions if the test endpoints are ecologically relevant.
2. The relationship between WET and stream biota will be strongest for effluent-dominated systems and weakest for systems that greatly dilute the effluent. Biota in effluent-dominated sites were assumed to be more prone to toxicity effects, if present, because there is greater exposure to the effluent. Also, effluent-dominated sites typically have a restricted species pool available, which would result in slower recovery, less resilience, and, thus, a higher likelihood of biological impairment in the presence of effluent toxicity.
3. The relationship between WET and stream biota will be strongest for acute test endpoints and weakest for chronic test endpoints, particularly sublethal endpoints. We assumed that an effluent that is acutely toxic will be more likely to exhibit effects instream than a chronically toxic effluent, depending on the effluent dilution available, because acutely toxic effluents are likely to have greater effects on biota.
4. The relationship between WET and instream biota will be weak or nonexistent when there is high variability in the WET endpoint value among tests over time for a particular effluent. We assumed that an effluent with consistent results over time probably has a more consistent toxicity exposure instream and should, therefore, have more predictable effects than an effluent that was reportedly variable. Consistently non-toxic effluents should be associated with nonimpairment, whereas consistently toxic effluents should be associated with instream impairment, assuming there is appropriate effluent exposure.
This study focused on freshwater systems only. We examined U.S. EPA acute and chronic (7-d) fathead minnow and Ceriodaphnia dubia WET tests because these are the most commonly used freshwater WET tests in the United States. Furthermore, because these tests have been conducted by many laboratories for several years, quality assurance procedures are well documented, thus enabling us to use high-quality WET test data in this study. Most of the WET data used in this study were based on definitive multiple concentration tests rather than pass/fail screening tests. However, we made no attempt to gather and evaluate actual dose–response data for each test in this study but rather relied on the test endpoints reported by the state for a given test.
Instream biological condition was characterized using data from the benthic macroinvertebrate assemblage, as was done in previous comparative studies [8–10]. Benthic invertebrates are recognized as sensitive, relatively nonmobile, ecologically important members of aquatic systems and are therefore generally regarded as appropriate indicators of aquatic ecological condition as a whole [5,11,12]. Collection and data analysis methods for this assemblage have been standardized and routinely used by more regulatory organizations in the United States than methods for most other types of fauna. The U.S. EPA rapid bioassessment protocols (1989 and more recent modifications) and similarly rigorous sampling and analysis methods used by the U.S. Geological Survey National Water-Quality Assessment Program [14,15], Ohio Environmental Protection Agency, and Florida Department of Environmental Protection [17,18], for example, have been shown to be relatively precise and sensitive to a number of types of stressors. In addition, these methods use habitat evaluation procedures designed to separate habitat effects (such as stream channelization or livestock impacts) from water quality effects caused by a point source, such as an effluent [13,19]. Furthermore, quality assurance procedures for these methods are well established, which ensured the quality of instream aquatic data used in this project. Whenever possible, biological data were obtained only from sources using well-established, standardized methods.
Because this was a retrospective study, no new data were collected. The vast majority of data (97% of all dischargers identified) were obtained directly from state agencies to ensure that valid WET test and acceptable biological assessment data were used. Several types of supporting data were also collected in this study to help characterize effluents and the discharge conditions, including design and average dilution instream, types of contaminants of concern in the effluent, and instream habitat quality. In many cases, these data were also provided by state agencies. In some cases, information was obtained through other databases (e.g., types of discharge contaminants from the U.S. EPA's Permit Compliance System or Publicly Owned Treatment Works facility information from the U.S. EPA's NEEDS Survey).
Data were evaluated using several predetermined criteria to ensure that results derived from our analyses were robust: (1) WET tests used effluent representative of what was discharged (i.e., no laboratory effluent dechlorination or pH adjustment before or during WET testing); (2) WET tests met current U.S. EPA quality assurance requirements; (3) WET tests were performed before but no more than 1 year before the bioassessment; (4) upstream and downstream bioassessment data were available, and the downstream collection site was within the mixing zone of the effluent; (5) stream habitat quality was similar up- and downstream of the discharge; and (6) no point or nonpoint sources, other than the discharge of interest, were present between the up- and downstream sites.
These criteria were intended to minimize many of the limitations associated with previous studies that potentially confused or masked relationships between WET and instream biological condition. Of the 250 municipal and industrial wastewater dischargers from which information was initially obtained, 92 dischargers met all of the criteria listed above and had sufficient supporting information for in-depth statistical analyses of the relationship between WET and instream conditions.
Data manipulation and statistical analyses
Although the raw data were useful for some analyses in this study, most of the data needed to be modified in some way (e.g., categorized, averaged, or compared with some other data) to address our hypotheses. Several WET variables were derived on the basis of the raw data to formulate expressions of cumulative toxicity (either by WET endpoint or for all endpoints combined) for a given facility. On the basis of the characteristics of all dischargers represented in the database, we empirically derived classifications or categories for effluent dilution, pollutant type, and a variety of other potential non-WET factors that were used in statistical analyses in this study.
A key component of this study was to define the degree of effluent toxicity for each facility. To accomplish this, we often needed to integrate results of several tests for a given facility, both for a given WET endpoint and among endpoints. Whole effluent toxicity results were integrated for each facility in two general ways. First, we computed an average value for each WET endpoint reported and then compared the average value to certain pass/fail criteria commonly used by state agencies (Table 1). Second, for each facility, we determined whether each test endpoint passed or failed its criterion and then summed the number of failed tests. This sum was then compared with the total number of tests reported by that facility for that endpoint and expressed as the percentage of tests failed for each endpoint. We also summed all acute or chronic test failures (both species) for each facility to derive a total percentage of failure for all acute and chronic tests. In addition, a final sum and percentage of all test failures was computed for each facility.
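The two integration approaches described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function names and the example facility are hypothetical, and the pass/fail rules are limited to the two criteria stated elsewhere in the text (a chronic test fails when NOEC/IWC < 1, and an acute test fails when the LC50 is <100% effluent).

```python
from statistics import mean

def chronic_fails(noec_pct, iwc_pct):
    """Chronic criterion: the test fails when NOEC/IWC < 1."""
    return noec_pct / iwc_pct < 1.0

def acute_fails(lc50_pct):
    """Acute criterion: the test fails when the LC50 is below 100% effluent."""
    return lc50_pct < 100.0

def summarize_endpoint(values, fails):
    """Return (average endpoint value, percentage of tests failed)."""
    n_failed = sum(1 for v in values if fails(v))
    return mean(values), 100.0 * n_failed / len(values)

def overall_percent_failed(results_by_endpoint):
    """Percentage of failed tests pooled over all endpoints of a facility.

    results_by_endpoint maps an endpoint name to a list of booleans
    (True = test failed).
    """
    all_results = [r for results in results_by_endpoint.values() for r in results]
    return 100.0 * sum(all_results) / len(all_results)

# Hypothetical facility: design IWC of 50% effluent, four chronic NOECs (% effluent)
iwc = 50.0
noecs = [100.0, 25.0, 50.0, 100.0]
avg_noec, pct_failed = summarize_endpoint(noecs, lambda v: chronic_fails(v, iwc))

# Approach 1: compare the average endpoint value with the criterion
average_passes = not chronic_fails(avg_noec, iwc)

# Pooled failure rate across endpoints (hypothetical acute LC50 values)
lc50s = [100.0, 80.0]
pooled = overall_percent_failed({
    "chronic NOEC": [chronic_fails(v, iwc) for v in noecs],
    "acute LC50": [acute_fails(v) for v in lc50s],
})
```

Note that the two approaches can disagree: in this example the facility passes on its average NOEC even though one of its four chronic tests failed.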
Table 1. Summary of whole effluent toxicity test endpoints and test pass/fail criteria used in this research (a)
Test pass/fail criteria
a IWC = instream waste concentration; LC50 = median lethal concentration; NOEC = no-observed-effect concentration.
Average relative difference between up- and downstream ≥0.15, fail
To test our hypotheses, we statistically compared data for many dischargers to properly evaluate effects of various factors, such as effluent dilution, type of WET endpoint, and types of WET results on relationships between WET and instream biological condition. In a statistical sense, then, the various dischargers in the database were treated as samples, and those having similar characteristics for the particular analysis under investigation were treated as replicates. Whole effluent toxicity data for a given endpoint were comparable across sites because standardized test procedures were used.
For instream biological data, data standardization was necessary because overall assessment scores (e.g., rapid bioassessment protocol score) for instream biological condition may have used different metrics and/or scoring criteria in different states and because receiving-stream fauna were not likely to be similar across sites. To address this challenge, we established an objective scoring system to define the relative measure of biological condition up- and downstream of a discharge point that could then provide a standardized measure of relative impairment for each facility.
We determined that the most reliable measure of biological effects due to an effluent was a relative difference between up- and downstream biological metrics or scores for each site. The relative difference measure is commonly used by analytical chemists to express the precision (or, alternatively, the disparity) between replicate samples. This was a reasonable approach for comparing different sites because, for any given facility, identical instream assessment methods were used in the up- and downstream sites. Therefore, a measure of the relative change in biological condition between up- and downstream should be comparable across dischargers and should be a measure of relative biological impairment due to a discharge. If only individual metrics were available for a facility, the average relative difference was calculated as follows:
where N is the total number of metrics measured at each site. All macroinvertebrate assessments used in this study measured several sensitive community metrics, including number of taxa present; number of Ephemeroptera (mayfly), Plecoptera (stonefly), and Trichoptera (caddisfly) taxa; some form of pollution tolerance biotic index (e.g., Hilsenhoff's Biotic Index); and dominance (based on percentage) of a single taxon. If a biological score was available (e.g., invertebrate community index or rapid bioassessment protocol score), we then computed the relative difference on the basis of the up- and downstream scores, and no summing or averaging was necessary.
Using the above formula, the relative difference ranged between −1.0 and +1.0, where a positive value indicated a higher degree of downstream impairment. A value near 0.0 indicated little or no difference in faunal composition up- and downstream of the discharge.
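A minimal sketch of an average relative difference calculation follows. It assumes a per-metric relative difference of (U − D)/(U + D), chosen here only because it is bounded by −1 and +1 as described above; this form is an assumption and may not match the formula actually used. The sketch also assumes that every metric is non-negative and oriented so that larger values mean better condition (a pollution tolerance index would need to be inverted first). All function names are hypothetical.

```python
def relative_difference(up, down):
    """Assumed form: (U - D) / (U + D), bounded by -1 and +1 for non-negative inputs."""
    if up + down == 0:
        return 0.0  # both values are zero: no measurable difference
    return (up - down) / (up + down)

def average_relative_difference(upstream, downstream):
    """Average the per-metric relative differences over the N paired metrics."""
    pairs = list(zip(upstream, downstream))
    return sum(relative_difference(u, d) for u, d in pairs) / len(pairs)

def is_impaired(avg_rd, threshold=0.15):
    """A site is treated as impaired when the average relative difference is >= 0.15."""
    return avg_rd >= threshold

# Hypothetical site with three metrics (e.g., total taxa, EPT taxa, a condition score)
upstream = [20, 10, 6]
downstream = [10, 10, 2]
avg_rd = average_relative_difference(upstream, downstream)
impaired = is_impaired(avg_rd)
```

When a single overall score is available instead of individual metrics, the same `relative_difference` function can be applied directly to the up- and downstream scores.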
Both categorical and continuous WET endpoints and biological condition endpoints were used in our analysis. Standard WET endpoints (e.g., median lethal concentration [LC50] and no-observed-effect concentration [NOEC]) are continuous variables in the sense that they could theoretically be any value between zero and 100% effluent. Other continuous variables were the biological relative difference values and percentage and number of failed tests per site. These variables were used in linear regression analyses (Statistica, version 5.0, Statsoft, Tulsa, OK, USA). A significance level of 0.05 was used in all analyses, and the number of independent factors in regression models was limited to those that increased the R² value of the model to decrease the chance of a type I error. Categorical variables were pass/fail interpretations of WET endpoints or biological condition that were used in χ², log-linear analysis of variance, and multiple discriminant analysis (Statistica). Other categorical variables used in this study included habitat quality and facility type. These analyses also used a significance level of 0.05. Both categorical and continuous variables were analyzed in this study because relationships between WET and instream biological condition might take the form of a continuous response in some cases but could involve threshold or noncontinuous responses in other cases.
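To illustrate the kind of categorical test described above, the following sketch computes a Pearson χ² statistic for a 2 × 2 table of WET result (pass/fail) against biological condition (impaired/unimpaired). It is not the Statistica procedure used in the study: no continuity correction is applied, the counts are invented for illustration, and the function name is hypothetical. The p value uses the identity that, for 1 degree of freedom, the upper-tail probability of χ² equals erfc(√(χ²/2)).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table

                    impaired   unimpaired
        WET fail        a           b
        WET pass        c           d

    Returns (chi-square statistic, p value for 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one site")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts of dischargers in each cell
chi2, p_value = chi_square_2x2(12, 8, 5, 15)
significant = p_value < 0.05  # significance level used throughout the study
```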
RESULTS AND DISCUSSION
Characteristics of dischargers in the database
The initial database included 250 dischargers from 15 states (Fig. 1), representing a total of 1,311 WET tests (acute and chronic) and 304 instream macroinvertebrate assessments. A large proportion (47.5%) of the dischargers had a design instream waste concentration (IWC; based on stream 7Q10 and average effluent flow values reported by the state) of <20% effluent, and many of the remaining dischargers (27%) had a design IWC >80%. Nearly half of the facilities examined (50.6%) had effluent flow of <1.0 million gal/d; 78.2% had flows of <5.0 million gal/d, and 5.7% had flows of >20 million gal/d. The largest facility in this database had an effluent flow of 150 million gal/d.
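The design IWC and the dilution groupings used in this section can be sketched as below. The formula is an assumption: it is the common mass-balance definition, in which effluent flow is divided by the sum of effluent flow and 7Q10 stream flow, and the text does not confirm that the states computed IWC in exactly this way. Function names are hypothetical, and both flows must be in the same units.

```python
def design_iwc(effluent_flow, stream_7q10):
    """Assumed definition: percentage of effluent in the stream at 7Q10 low flow."""
    return 100.0 * effluent_flow / (effluent_flow + stream_7q10)

def dilution_category(iwc_pct):
    """Group a discharger using the <20% and >80% effluent cutoffs cited in the text."""
    if iwc_pct < 20.0:
        return "<20% effluent"
    if iwc_pct > 80.0:
        return ">80% effluent"
    return "20-80% effluent"

# Hypothetical discharger: 9 million gal/d effluent into a 1 million gal/d 7Q10 flow
iwc = design_iwc(9.0, 1.0)
category = dilution_category(iwc)
```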
Type of WET tests
Ceriodaphnia dubia acute and chronic tests were more commonly performed than the minnow tests (Table 2). In fact, most dischargers (64%) reported data for only one or two types of WET test. Only 16% of dischargers reported data for all four types of WET test. Thus, the available data were limited in the sense that we could not achieve a balanced statistical design in this study. For this reason, analyses are presented for individual WET endpoints as well as for WET test results as a whole. The large percentage of dischargers conducting Ceriodaphnia acute or chronic tests in our database may be a reflection of increased reliance by states on Ceriodaphnia as a test species (in some state programs, Ceriodaphnia is the only test species required). This result may also be due to the greater sensitivity of Ceriodaphnia to a number of physico-chemical constituents as compared with the fathead minnow, which could lead to more frequent testing with Ceriodaphnia by National Pollutant Discharge Elimination System permit holders.
For the majority (83%) of those dischargers conducting Ceriodaphnia chronic tests, the reproduction NOEC was lower than the survival NOEC and was, therefore, the most sensitive NOEC endpoint used in our analyses. The fish chronic sublethal endpoint (growth) was often less sensitive than survival; growth was more sensitive than survival in only 18% of cases. Thus, the most sensitive fish chronic NOEC for a given facility was mostly based on survival.
Table 2. Summary of pass/fail percentages by whole effluent toxicity (WET) test endpoint for all facilities
% Facilities passing on average
% Facilities failing on average
Total no. of facilities
a IWC = instream waste concentration; LC50 = median lethal concentration; NOEC = no-observed-effect concentration.
c Pass/fail criteria for chronic endpoint (NOEC): pass = NOEC/IWC > 1; fail = NOEC/IWC < 1.
Biological assessment interpretation
Our analyses required distinguishing impairment from no impairment instream among dischargers. After much investigation, we determined that a relative difference of 0.15 was the most efficient threshold value. We confirmed the use of the relative difference approach and this impairment threshold value in two ways. First, we compared biological results for the 43 sites examined by Eagleson et al. with results using the relative difference approach. Excellent agreement was demonstrated between the C. dubia chronic test and benthic macroinvertebrate results. Figure 2 shows the range of relative difference values we calculated in relation to the assessments previously made for the same dischargers by Eagleson et al. Of the sites assessed as unimpaired (based on a relative difference value of <0.15), 2.5% were reported as impaired by the state. Chi-square analyses indicated excellent agreement (p < 0.01) using a relative difference impairment threshold of 0.15; substantially less agreement was observed when a relative difference threshold value of either 0.2 or 0.1 or values outside this range were used (p > 0.3). Thus, overall agreement between our classification of these sites (based on a relative difference threshold value of 0.15) and the state's assessment was 97.5%. Similar agreement was observed using data from Ohio and Virginia.
Second, we examined the sensitivity of the relative difference calculation by comparing the effect of relative difference threshold values of 0.12, 0.15, and 0.17 on the resulting assessment using data from sites in North Carolina, Ohio, and Virginia (N = 100 sites). We observed slight differences in the percentage of sites judged as impaired (4–12%) depending on whether we used 0.12 or 0.17 as the threshold value as compared to 0.15, which indicated that the potential misclassification rate was very small. Therefore, results of analyses, using 0.15 as the biological impairment threshold value, appeared to be appropriate.
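A threshold sensitivity check of this kind can be sketched in a few lines. The relative difference values below are invented for illustration and are not data from the study; the function name is hypothetical.

```python
def percent_impaired(rd_values, threshold):
    """Percentage of sites whose relative difference meets or exceeds the threshold."""
    n_impaired = sum(1 for rd in rd_values if rd >= threshold)
    return 100.0 * n_impaired / len(rd_values)

# Hypothetical relative difference values for ten sites
rd_values = [0.02, 0.10, 0.13, 0.16, 0.30, -0.05, 0.14, 0.18, 0.40, 0.00]

# Compare the candidate thresholds examined in the text
sensitivity = {t: percent_impaired(rd_values, t) for t in (0.12, 0.15, 0.17)}
```

Sites whose relative difference falls between the lowest and highest thresholds are the only ones whose classification can change, so the spread in `sensitivity` bounds the potential misclassification rate.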
Hypothesis 1: Is there an overall relationship between WET and instream biological condition?
Our initial examination of WET and instream relationships examined the percentage of agreement between overall WET results and instream biological condition for each facility. Similar to what was presented by the U.S. EPA and Eagleson et al., four possible outcomes were considered: (1) WET information for a facility indicated unacceptable toxicity and the stream was classified as impaired (agreement on detecting impairment); (2) WET information indicated no toxicity and the stream was classified as unimpaired (agreement on detecting nonimpairment); (3) WET indicated unacceptable toxicity and the stream was not impaired (disagreement on detecting impairment); and (4) WET indicated nontoxic conditions and the receiving stream was impaired (disagreement on detecting nonimpairment).
Whole effluent toxicity can consist of multiple endpoints for a given facility depending on what types of tests are conducted. Therefore, to determine the percentage of agreement between WET and instream condition, we initially chose a test failure rate of 20% as our threshold for assessing a given effluent as having unacceptable toxicity: Dischargers that failed >20% of their tests (based on the standard pass/fail criteria listed in Table 1) were judged to have unacceptable toxicity. This threshold is consistent with the way in which many states currently regulate WET testing. We observed that nearly all 92 dischargers in the selected database had conducted at least five WET tests, which suggested that the 20% WET test fail rate would be appropriate.
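The four outcomes and the 20% test failure threshold can be combined into a simple classification, sketched below. The facility records are hypothetical, as are the function names; the 0.15 relative difference threshold for impairment is the value adopted earlier in the text.

```python
def wet_is_toxic(n_failed, n_tests, fail_rate_threshold=20.0):
    """Unacceptable toxicity: more than 20% of a facility's WET tests failed."""
    return 100.0 * n_failed / n_tests > fail_rate_threshold

def outcome(toxic, impaired):
    """Map a facility to one of the four outcomes considered in the text."""
    if toxic and impaired:
        return "agreement on impairment"
    if not toxic and not impaired:
        return "agreement on nonimpairment"
    if toxic and not impaired:
        return "disagreement on impairment"
    return "disagreement on nonimpairment"

def percent_agreement(facilities, rd_threshold=0.15):
    """facilities: list of (tests failed, tests conducted, relative difference)."""
    outcomes = [
        outcome(wet_is_toxic(failed, total), rd >= rd_threshold)
        for failed, total, rd in facilities
    ]
    n_agree = sum(1 for o in outcomes if o.startswith("agreement"))
    return outcomes, 100.0 * n_agree / len(outcomes)

# Hypothetical facilities
facilities = [(3, 5, 0.30), (0, 6, 0.02), (2, 5, 0.05), (0, 8, 0.25), (1, 5, 0.20)]
outcomes, agreement = percent_agreement(facilities)
```

A facility that fails exactly 20% of its tests (the last record) is not counted as toxic here, because the text defines unacceptable toxicity as failing more than 20% of tests.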
Unlike previous results, we observed relatively poor agreement between WET and instream results overall using a simplistic analysis (Fig. 3a). To test the uncertainty surrounding this result, we conducted several additional analyses using slightly higher or lower relative difference impairment threshold values (0.12 and 0.18 instead of 0.15) to categorize impaired versus unimpaired sites, and we used more conservative and more lenient test fail rates (10 and 30% instead of 20%) to determine whether an effluent was classified as having unacceptable toxicity. Results of these additional analyses indicated that, at most, there was an 8% difference in our original percentage agreement estimates. Similar results were obtained when we based WET results on only acute or chronic toxicity test results (Fig. 3b).
One factor affecting this analysis was the preponderance of dischargers with little or no observed toxicity in WET tests, unlike many of the effluents tested in previous studies. Another factor affecting this analysis was that most of the dischargers in our database did not perform all four types of WET tests, as mentioned previously. Of dischargers that passed their WET requirements and that exhibited instream impairment, more than half (55%) did not report chronic test data of any kind. In contrast, of dischargers that failed their WET requirements and that exhibited no apparent change in biological condition, most (62%) conducted Ceriodaphnia chronic and acute testing but not fathead minnow testing.
The above results suggest two important ramifications for WET programs as currently practiced. First, there is nearly a 50% probability that toxicity exhibited in WET tests may not be reflected instream, even for those effluents exhibiting a relatively high test failure rate (>90%). Second, there is roughly a 20% probability that impairment may be observed instream even though WET did not indicate reasonable toxicity potential, depending on which type of tests were conducted.
Hypothesis 2: Are relationships between WET and instream condition stronger with increased WET test frequency?
The proportion of dischargers that demonstrated agreement between WET results and instream biological condition generally increased with WET test frequency as hypothesized (Fig. 4). Nearly 50% of dischargers conducting more than seven WET tests over a one-year period exhibited agreement between WET and instream results, whereas <10% of dischargers that conducted only one test exhibited agreement. These results, however, also indicate that there is a significant amount of disagreement between WET and instream results even for effluents that were subject to a relatively high frequency of WET testing. Again, because this was not a balanced statistical design with all facilities conducting all types of tests, these results suggest that it may be inappropriate to rely on results from a single type of test in WET monitoring.
To further examine the effect of test frequency on relationships between WET and instream condition, we compared various effluent compliance scenarios on the basis of different percentages of WET tests passed (using the pass/fail criteria in Table 1). One-way analysis of variance, examining test frequency (log-transformed to meet assumptions of normality) in relation to biological condition (impairment vs no impairment) indicated a significant effect (p = 0.049, F = 4.69) for dischargers that failed between 5 and 25% of their tests. Dischargers that consistently passed their WET tests and had conducted at least four tests had a ≤15% chance of being associated with instream biological impairment. Although not significant, the trend was reversed for sites failing >50% of their tests (p = 0.123; F = 2.87). Thus, relationships between WET and instream biota appeared to be stronger if dischargers either frequently passed or frequently failed commonly used WET compliance criteria, consistent with our hypothesis. These results also indicate that instream biological condition was more predictable if a facility had conducted more than one WET test in a year. A similar idea was expressed by Mount.
Linear regression analysis indicated a direct relationship between WET test frequency (number of tests conducted overall or number of tests conducted for a specific WET endpoint for each facility) and relative difference value (R² = 0.44, p < 0.001, F = 8.52). Dischargers that tended to fail more WET tests also tended to exhibit instream biological impairment. This result may be a reflection of the fact that effluents that show unacceptable toxicity in WET testing may be subject to more retesting, and, therefore, a greater number of tests, than an effluent that passes toxicity limits or monitoring criteria. Indeed, many states require follow-up testing in National Pollutant Discharge Elimination System permit monitoring if a given test yields unacceptable toxicity results.
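For readers who wish to reproduce this type of analysis, a simple least-squares fit with its R² value can be sketched as follows. This is a single-predictor illustration only; the study's regression models and data are not reproduced here, and the function name and example values are hypothetical.

```python
def linear_fit_r2(x, y):
    """Ordinary least squares for y = intercept + slope * x, with R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical data: number of WET tests per facility vs. relative difference value
n_tests = [1, 2, 4, 6, 8, 12]
rel_diff = [0.02, 0.05, 0.04, 0.12, 0.20, 0.22]
slope, intercept, r_squared = linear_fit_r2(n_tests, rel_diff)
```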
Hypothesis 3: Is the relationship between WET and instream biological condition strongest for effluent-dominated systems and weakest for systems that greatly dilute the effluent?
Forty percent of sites with effluents making up >80% of the stream under design flow conditions exhibited agreement between WET and instream condition. More dilute effluents exhibited substantially less agreement between WET and instream condition, as hypothesized. Effluents that made up >80% of the stream under reported 7Q10 conditions had a significant relationship between WET and instream biological condition if WET compliance was defined as passing between 75 and 85% of all WET tests conducted (χ², p < 0.05). These results suggest that even in effluent-dominated systems, WET noncompliance based on a single WET test failure, or even a 15% WET test failure rate, may not be associated with an impaired benthic community depending on the type of WET endpoint and habitat features (see below).
Hypothesis 4: Are relationships between WET and instream biological condition stronger for acute and weakest for chronic WET endpoints?
Log-linear analysis indicated that the fathead minnow chronic growth endpoint (NOEC/IWC) was significantly related to biological condition (p < 0.05, Fig. 5). The fish chronic survival endpoint (NOEC/IWC) and the fish LC50 also exhibited higher test fail rates with stream impairment but were significant at p < 0.10. Ceriodaphnia LC50 and chronic survival endpoint (NOEC/IWC) exhibited a greater fail rate in association with nonimpairment (Fig. 5), opposite of what was expected if these endpoints were predictive of stream biological conditions. No improvement in relationships was observed using the LC50/IWC for either species, indicating that accounting for dilution in acute testing was not more reliable than using an LC50 <100% effluent as a pass/fail criterion. The Ceriodaphnia chronic reproduction endpoint (NOEC/IWC) showed no relationship with instream biological condition. Thus, fish acute and chronic endpoints were more related to instream condition than Ceriodaphnia endpoints in this analysis. Fish acute endpoints did not display stronger relationships with instream condition than fish chronic endpoints.
For each endpoint, we also examined the relationship between dischargers that would be judged as noncompliant for a given WET endpoint (using the pass/fail criteria in Table 1) and resulting biological condition. Generally speaking, relationships were poor, because most endpoints had between 25 and 40% agreement with the instream assessment and p values of >0.2 (Table 3). Again, only the fathead minnow chronic endpoint (with noncompliance based on either an average NOEC/IWC of < 1.0 or a >25% failure rate) and the fathead minnow LC50 (using a 25% failure rate) were most related to instream condition, with each exceeding 50% agreement. Adding categorical expressions of either effluent dilution or habitat quality to this analysis did not improve the relationship between WET failures and instream impairment. However, we did observe that in all cases, the effluent comprised at least 20% of the design stream flow and habitat quality was at least satisfactory for aquatic life production.
We used forward stepwise multiple discriminant analyses to determine which complement of WET variables best explained the observed variability in stream conditions across sites. Ceriodaphnia acute WET test endpoints were deleted from these analyses because there was an inverse relationship between this endpoint and stream biological condition in this data set (Fig. 5). Results of this analysis indicated that the fish chronic endpoint (NOEC/IWC) was the most significant variable in the model (Table 4). Number of failed fish acute tests, design effluent dilution, and percentage of WET tests failed overall were also included in the model but had p values >0.18. Thus, fish chronic endpoints were as or more related to instream biological conditions than fish acute endpoints. The discriminant model correctly classified sites 63% of the time in terms of whether they were judged as impaired or nonimpaired, but predictive accuracy for impaired sites was relatively poor (Table 4).
In a final test of this hypothesis, we conducted analyses on a subset of dischargers that reported few (<10%) or no failed acute WET tests (defined as an LC50 of <100%) and that had conducted more than one chronic WET test. A total of 36 dischargers met these criteria. Log-linear analysis of these data indicated no significant relationships between either species' chronic NOEC endpoint and instream biological condition downstream of the discharge (p > 0.20 for both species). Similar results were obtained using the Ceriodaphnia chronic lowest-observed-effect concentration (LOEC)/IWC and the fish growth LOEC/IWC for each facility (Fig. 6). Average fish survival LOEC/IWC was the only WET endpoint that was related to biological condition (p < 0.05). Thus, chronic fish survival appeared to be a reliable predictor of instream condition, independent of acute WET results. These results also suggest that the chronic WET failure criterion of NOEC/IWC < 1.0 may be conservative in many instances.
Table 3. Summary of percentage of agreement observed between whole effluent toxicity test endpoint failure and instream impairment
C. dubia 48-h LC50
    Based on average endpoint
    Based on 25% failure rate
P. promelas 96-h LC50
    Based on average endpoint
    Based on 25% failure rate
C. dubia 7-d NOEC
    Based on average endpoint
    Based on 25% failure rate
    Based on 15% failure rate
P. promelas 7-d NOEC
    Based on average endpoint
    Based on 25% failure rate
a Test species: C. dubia = Ceriodaphnia dubia; P. promelas = Pimephales promelas. Endpoints: LC50 = median lethal concentration; NOEC = no-observed-effect concentration.
Hypothesis 5: Are relationships between WET and instream biological conditions stronger when WET intertest variability is low for a given facility?
Several analyses presented previously suggested that WET endpoint variability over time affects observed relationships between WET and instream conditions as hypothesized. As a further test of this hypothesis, we analyzed the effect of intertest WET variability for the fathead minnow chronic LOEC endpoint because this endpoint exhibited a relatively strong relationship with instream biological condition as compared with other WET endpoints examined (Fig. 6). Sites that exhibited low intertest endpoint variability (defined as the intertest coefficient of variation <20%) had more agreement between WET and instream conditions than when WET intertest variability was high (intertest coefficient of variation >20%; Fig. 7). Of dischargers that exhibited compliance with the fish chronic test and had low intertest endpoint variability, nearly 70% were associated with unimpaired instream conditions. Conversely, of dischargers for which this endpoint indicated noncompliance and variability was low, stream impairment was observed in all cases. In contrast, WET results were unrelated to instream condition when intertest endpoint variability was high. These results suggest that in addition to WET test frequency, the variability observed in a test endpoint over time also affects the ability of a WET endpoint to accurately predict instream conditions.
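The variability screen used for this hypothesis can be sketched as below. The code assumes the intertest coefficient of variation is the sample standard deviation of a facility's endpoint values divided by their mean, expressed as a percentage; the text does not state which standard deviation was used, and it does not say how a value of exactly 20% was treated (here it is counted as high). The function names and the LOEC series are hypothetical.

```python
from statistics import mean, stdev

CV_THRESHOLD_PERCENT = 20.0  # cutoff between low and high variability, from the text


def intertest_cv(endpoints):
    """Coefficient of variation (%) of one facility's endpoint values across tests.

    Assumption: sample standard deviation (n - 1 denominator) divided by the mean.
    """
    if len(endpoints) < 2:
        raise ValueError("at least two tests are needed to estimate variability")
    average = mean(endpoints)
    if average == 0:
        raise ValueError("mean endpoint is zero; CV is undefined")
    return 100.0 * stdev(endpoints) / average


def variability_class(endpoints):
    """Label a facility 'low' (CV < 20%) or 'high' (CV >= 20%, an assumed tie rule)."""
    return "low" if intertest_cv(endpoints) < CV_THRESHOLD_PERCENT else "high"


# Hypothetical chronic LOEC values (% effluent) from four tests at two facilities.
steady_facility = [50, 50, 45, 55]
erratic_facility = [12.5, 100, 25, 50]

for name, series in [("steady", steady_facility), ("erratic", erratic_facility)]:
    print(f"{name}: CV = {intertest_cv(series):.1f}% ({variability_class(series)})")
```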
Several studies have attempted to verify relationships between WET and instream biological condition, including the following: (1) the eight complex effluent toxicity testing program studies performed by the U.S. EPA in the early 1980s, (2) the Trinity River (TX, USA) studies performed by Dickson et al. [10,22], (3) the Elkhorn Creek (KY, USA) study performed by Birge et al., (4) the North Carolina (USA) study performed by Eagleson et al., (5) a study performed by the Ohio EPA at several sites in that state, and (6) a study of 107 dischargers in Florida by the Florida Department of Environmental Protection.
Review of the U.S. EPA studies and those by Birge et al. and Dickson et al. indicated that relationships were stronger between instream biological condition and ambient toxicity test results than between instream condition and WET. Furthermore, of the U.S. EPA studies in which strong relationships were observed between WET and instream assessments, many of the sites showed substantial acute toxicity. A similar condition was evident in the studies conducted by Dickson et al. and Birge et al. The studies conducted in North Carolina, Ohio, and Florida had similar limitations, although the first two examined several sites exhibiting chronic toxicity only. Given that most of these studies used very limited WET information (most effluents were tested only once or twice), it is remarkable that good correlations with instream effects were observed. Because study design and analysis were site specific in most of these studies, and because site selection was nonrandom in most instances, past studies were unable to establish predictive relationships that could be applied elsewhere [23,24].
The recent Pellston Conference on WET summarized several reasons why WET tests may or may not be effective predictors of effluent effects on instream biota. Of particular importance is the actual exposure of instream fauna to contaminants in the effluent in comparison with the WET compliance (dilution) criteria actually used and the degree to which this actual exposure is mimicked in standard WET tests.
Using commonly prescribed WET test pass/fail and facility WET compliance criteria, we observed that exposure and associated biological effects of effluents were discernible if the effluent comprised ≥80% of the stream flow under design conditions, if instream habitat quality was at least satisfactory and preferably good, if at least three WET tests and preferably more than four had been conducted by the facility, and if the facility had fairly consistent WET test results over time, particularly for fish acute or chronic tests.
Table 4. Results of forward stepwise multiple discriminant analysis with Ceriodaphnia dubia acute whole effluent toxicity deleted from analysis (see text for explanation)
1. Log (P. promelas NOEC/IWC)
2. No. of failed P. promelas acute tests
3. Log (effluent design concentration)
4. Percentage of failed WET tests
% Correct classification
a IWC = instream waste concentration; NOEC = no-observed-effect concentration; P. promelas = Pimephales promelas; WET = whole effluent toxicity.
The effect of effluent dilution or exposure on the predictive value of WET results has been acknowledged [8,10,25,26] but not quantified. Effluents discharging to an effluent-dominated stream are more likely to have flow conditions approaching the design (7Q10) situation than are effluents discharging under more riverine conditions. Therefore, in effluent-dominated streams, WET limits or pass/fail criteria that rely on the stream 7Q10 as the design effluent dilution condition are more likely to be effective predictors of instream effects because the effluent comprises more of the stream flow. However, in general, the 7Q10 flow condition appeared to be conservative as a chronic WET pass/fail criterion. At least for the fathead minnow, a chronic criterion of NOEC/IWC < 2.0 or LOEC/IWC < 1.0 was more significantly related to instream impairment. It is not known whether this conclusion is specific to our database. Work by Parkhurst and discussions by Waller et al. also point out the potential conservatism of using a 7Q10 as the design criterion in developing WET limits. Multiyear data obtained for several dischargers in this study also suggested that WET results based on average or actual flow conditions, rather than a 7Q10 flow, more efficiently predicted instream biological condition.
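The influence of the design flow on a chronic pass/fail decision can be illustrated with a short calculation. The sketch assumes the common definition of instream waste concentration (IWC) as the effluent flow divided by the combined effluent and upstream flow under complete mixing; this definition, the function names, and all flow and NOEC values are assumptions made for illustration and are not taken from the study.

```python
def instream_waste_concentration(effluent_flow, stream_flow):
    """IWC as percent effluent in the mixed stream (assumes complete mixing).

    Both flows must be in the same units (for example, m^3/s).
    """
    if effluent_flow <= 0 or stream_flow < 0:
        raise ValueError("effluent flow must be positive and stream flow non-negative")
    return 100.0 * effluent_flow / (effluent_flow + stream_flow)


def noec_iwc_ratio(noec_percent, iwc_percent):
    """Ratio of the chronic NOEC (% effluent) to the IWC (% effluent)."""
    return noec_percent / iwc_percent


# Hypothetical discharger: same effluent flow, two choices of upstream flow.
effluent_flow = 2.0        # assumed effluent flow
low_flow_7q10 = 0.5        # assumed 7Q10 upstream flow
average_stream_flow = 8.0  # assumed average upstream flow
noec = 50.0                # assumed chronic NOEC, % effluent

iwc_design = instream_waste_concentration(effluent_flow, low_flow_7q10)
iwc_average = instream_waste_concentration(effluent_flow, average_stream_flow)

ratio_design = noec_iwc_ratio(noec, iwc_design)
ratio_average = noec_iwc_ratio(noec, iwc_average)

print(f"7Q10 basis:    IWC = {iwc_design:.0f}%, NOEC/IWC = {ratio_design:.3f}")
print(f"Average basis: IWC = {iwc_average:.0f}%, NOEC/IWC = {ratio_average:.3f}")
```

Under these assumed flows the same NOEC fails the NOEC/IWC < 1.0 criterion on a 7Q10 basis but passes on an average-flow basis, which is the kind of conservatism discussed above.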
In all cases, our analysis indicated that use of a specific test fail rate was a more effective predictor of instream condition than a single WET test failure or an average test value. A similar idea was expressed by Mount and Groethe et al., indicating that effluent compliance evaluations should include a more careful examination of the pattern of WET test results over time. Implicit in this idea is the recommendation that effluent evaluations should be based on several tests over time.
Based on all analyses in this study, we observed that fish acute and chronic endpoints were most related to instream condition; however, no one endpoint was capable of accurately portraying instream conditions for all dischargers. Furthermore, fish chronic survival may be as reliable as, or more reliable than, acute fish endpoints in predicting instream effects in some cases. However, without some form of ambient or instream monitoring (using biological methods, toxicological methods, or both), it may be difficult to determine which WET test endpoint is providing the most accurate information regarding potential or ongoing instream effects.
A surprising result of this study was the lack of relationship between Ceriodaphnia acute or chronic WET endpoints and instream biological results. Previous results by Eagleson et al. indicated an excellent relationship between Ceriodaphnia chronic pass/fail WET results and stream macroinvertebrate results, and this species is now widely used in WET testing programs throughout the United States. Reanalysis of these data, using our methodology for expressing impaired versus nonimpaired conditions, also indicated a significant relationship between Ceriodaphnia pass/fail results and instream biological condition (χ² test, p < 0.01).
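A χ² test of association between WET pass/fail results and instream condition, of the kind cited above, can be sketched for a 2 × 2 table as follows. The counts are hypothetical and are not the North Carolina data; the code assumes a Pearson χ² statistic without continuity correction and uses the fact that, for one degree of freedom, the upper-tail probability equals erfc(√(χ²/2)).

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one site")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Chi-square with 1 df is the square of a standard normal deviate,
    # so the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical site counts (illustrative only):
#                 impaired   nonimpaired
# WET fail           12           3
# WET pass            4          16
chi2, p = chi_square_2x2(12, 3, 4, 16)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```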
There were several differences between the two studies that might account for this disparity in results. First, the North Carolina study used a modified WET testing procedure for Ceriodaphnia that included a different water renewal schedule, greater sample holding times, and different statistical analysis procedures than tests used in our database. Second, the North Carolina WET results were based on single-concentration pass/fail tests, whereas the chronic WET data in our study were almost exclusively based on multiple-concentration tests. Third, North Carolina used a qualitative biological assessment procedure that was somewhat different from other assessment methods represented in our database, both in terms of collection and analysis methods. Fourth, we combined facilities from several states, which may have masked local differences in relationships between WET and biological data. Supporting information for the individual facilities tested by Eagleson et al. (e.g., habitat quality, type of pollutants discharged, and effluent flow) was unavailable, so we could not further resolve this question.
The present study design attempted to reduce the effects of extraneous factors by examining biological condition directly up- and downstream of each wastewater treatment facility and by including only those sites with similar habitat quality up- and downstream of the discharge. Thus, the major sources of uncertainty remaining in our study were WET endpoint relevance and variability, effluent exposure, and bioassessment variability resulting from unknown insensitivity or biases of the bioassessment methods used. All of these sources of uncertainty reflect our present state of knowledge regarding how WET results are interpreted and how a weight-of-evidence approach might be implemented in assessing effluent compliance.
Some of the cases in which biological impairment was observed in our study, without apparent effluent toxicity, may be explained by more long-term effects of the effluent that could not be measured in WET testing. Whole effluent toxicity tests were not designed to measure subtle chronic or bioaccumulation effects of effluents on aquatic biota. Unfortunately, funding and other resources were not available to obtain detailed chemical information for most of the sites used in this study, nor were we able to obtain actual exposure information (frequency and duration of chemicals discharged) and effluent dilution at the time of testing. This information would help account for discrepancies between biological impairment and lack of observed effluent toxicity. Results of our study suggest that such information should be incorporated into WET programs.
Unknown ecological precision and accuracy of the biological assessment methods used in this study are also major factors affecting our ability to observe relationships between WET and instream biological condition. We explicitly limited our assessment of instream biological condition to the benthic macroinvertebrate fauna. Although invertebrates are sensitive to a number of contaminants and other stressors, they may not always yield an accurate assessment of instream condition by themselves. Furthermore, results from three or four different biological methods were combined in this study, which could add imprecision to our analysis despite our attempts to rely on data from only standardized, rigorous methods. Some have suggested that more accurate assessments of biological condition are achieved if more than one faunal group is monitored (e.g., fish and invertebrates or algae and invertebrates). It is possible that some of our sites were misclassified with respect to biological condition because of the potential limitations in using only benthic macroinvertebrate data. We note, however, that at least for the Ohio sites in our database, in which fish as well as invertebrates were monitored, there was extensive redundancy in the assessments obtained between the two faunal groups (84% redundancy).
Bioassessment methods may not have been sensitive enough in cases where frequent WET test failures were observed but not accompanied by biological impairment. Similar criticisms of biological assessment methods have been raised by others [26,28]. We do not ignore the fact that relationships between WET and biological condition are limited by the current methodologies used to collect both types of data and their inherent uncertainties in ecological representativeness. Such uncertainties also place constraints on the accuracy of interpretations using a weight-of-evidence approach. The accuracy of biological assessments or WET tests will always be a function of the methodologies used. Until specific data quality objectives are defined for WET testing and instream assessments and methods are used that consistently meet those data quality objectives, relationships between WET and instream condition and the use of a weight-of-evidence approach to assess effluent compliance will remain elusive.
This study was supported by the Water Environment Research Foundation, Alexandria, Virginia, USA. The peer-review panel (P. Firth, I. Polls, T. La Point, B. Parkhurst, and R. Cruz) provided invaluable comments and suggestions on all aspects of the study. J. White and E. Leppo developed and managed the database. Helpful criticisms and comments were provided by several people during the course of this project, including T. Moore, G. Chapman, V. deVlaming, M. Barbour, J. Gerritsen, J. Stribling, and two anonymous reviewers.