Although the research described in this article was funded wholly or in part by the U.S. Environmental Protection Agency (contract 68-C4-0034) and Science Applications International Corporation (subcontract 4500145731), it was not reviewed by either entity and therefore does not necessarily reflect their views. No official endorsement should be inferred.
An empirical comparison of effective concentration estimators for evaluating aquatic toxicity test responses†
Article first published online: 2 NOV 2009
Copyright © 2000 SETAC
Environmental Toxicology and Chemistry
Volume 19, Issue 1, pages 141–150, January 2000
How to Cite
Bailer, A. J., Hughes, M. R., Denton, D. L. and Oris, J. T. (2000), An empirical comparison of effective concentration estimators for evaluating aquatic toxicity test responses. Environmental Toxicology and Chemistry, 19: 141–150. doi: 10.1002/etc.5620190117
- Issue published online: 2 NOV 2009
- Article first published online: 2 NOV 2009
- Manuscript Accepted: 20 JUN 1999
- Manuscript Received: 10 FEB 1999
- Relative inhibition estimation;
- Inhibition concentration estimation;
- Ceriodaphnia dubia;
- Macrocystis pyrifera;
- Regression modeling
Aquatic toxicity tests are statistically evaluated by either hypothesis testing procedures to derive a no-observed-effect concentration or by inverting regression models to calculate the concentration associated with a specific reduction from the control response. These latter methods can be described as potency estimation methods. Standard U.S. Environmental Protection Agency (U.S. EPA) potency estimation methods are based on two different techniques. For continuous or count response data, a nominally nonparametric method that assumes monotonic decreasing responses and piecewise linear patterns between successive concentration groups is used. For quantal responses, a probit regression model with a linear dose term is fit. These techniques were compared with a recently developed parametric regression-based estimator, the relative inhibition estimator, RIp. This method is based on fitting generalized linear models, followed by estimation of the concentration associated with a particular decrement relative to control responses. These estimators, with levels of inhibition (p) of 25 and 50%, were applied to a series of chronic toxicity tests in a U.S. EPA region 9 database of reference toxicity tests. Biological responses evaluated in these toxicity tests included the number of young produced in three broods by the water flea (Ceriodaphnia dubia) and germination success and tube length data from the giant kelp (Macrocystis pyrifera). The greatest discrepancy between the RIp and standard U.S. EPA estimators was observed for C. dubia. The concentration–response pattern for this biological endpoint exhibited nonmonotonicity more frequently than for any of the other endpoints. Future work should consider optimal experimental designs to estimate these quantities, methods for constructing confidence intervals, and simulation studies to explore the behavior of these estimators under known conditions.
As a follow-up to a recent Pellston workshop addressing toxicity testing, we compared the current U.S. Environmental Protection Agency (U.S. EPA) potency estimation techniques to an alternative method based on a parametric concentration–response model. In doing so, we acted on the workshop recommendation to evaluate improvements in the statistical analysis of toxicity test data. In particular, effective concentration estimates based on a flexible model for the relationship between exposure concentration and adverse response can be used to generate potency estimates. One construction of the effective concentration estimates is to consider the concentration that leads to some specified level of inhibition relative to the control group. Both parametric and nonparametric methods have been proposed for constructing these potency endpoints. The linear interpolation inhibition concentration estimator [2,3] is a nonparametric point estimator that uses assumptions of monotonic decline along with piecewise linearity to estimate the concentration associated with a specified level of decrement from the control responses. The U.S. EPA commonly applies this method and a probit regression–based estimator in the analysis of whole effluent toxicity tests. Bailer and Oris [4,5] proposed an estimator of reproductive inhibition based on a parametric concentration–response model. This model assumed that the count response, number of young in three broods, is a Poisson-distributed random variable and that the mean number of young can be modeled using an exponential term involving a polynomial in the test concentrations. Recently, Bailer and Oris generalized this estimator to consider dichotomous (e.g., germination or mortality) and continuous (e.g., tube length or weight) responses. Their framework involves the use of generalized linear models [7,8] as generic concentration–response models.
The estimated values of parameters from these generalized linear models are used to derive potency estimators, general estimators of effective concentrations, which were labeled relative inhibition concentration (RIp) estimators.
In this article, standard U.S. EPA estimators and the RIp estimator are compared in the context of a large database of aquatic toxicity tests using reference toxicants. This provides a comparison of a recently proposed regression-based potency estimator with methods currently used by the U.S. EPA.
Data have been collected over time by U.S. EPA region 9. The database with raw data for both the cladoceran Ceriodaphnia dubia and the giant kelp Macrocystis pyrifera was provided in electronic (machine-readable) form. In addition, summary data sets containing U.S. EPA method-based estimates and other calculations were provided. The responses provided for C. dubia, a freshwater organism, were number of young (a count of the number of young in three broods) and survival after exposure to a reference toxicant (sodium chloride) as described in the U.S. EPA test method. Only the number of young response was analyzed for C. dubia. These tests were conducted at six laboratories between July 18, 1989, and June 16, 1993.
|Organism||Type of measured response||No. of laboratories||Average no. experiments per laboratory (minimum, maximum)||Date of initial experiment||Date of final experiment|
|Ceriodaphnia dubia||No. of young||6||26.2 (8, 46)||July 18, 1989||June 16, 1993|
|Macrocystis pyrifera||Germination proportion||11||24.3 (9, 80)||August 10, 1988||May 25, 1997|
|Macrocystis pyrifera||Mean tube length||11||24.1 (9, 74)||August 10, 1988||March 25, 1997|
The responses provided for M. pyrifera, a marine test, were the germination proportion (of 100 spores) and tube length (the average of 10 measurements) [10,11]. These tests were conducted at 11 laboratories between August 10, 1988, and March 25, 1997. Table 1 contains a summary of the experiments that are part of this database.
Description of estimation methods
The objective of many aquatic toxicology studies is to estimate the concentration of a particular chemical that leads to a specified level of inhibition in response relative to responses in control or unexposed organisms. We compared the current U.S. EPA point estimate methods, the linear interpolation procedure or a probit model-based procedure, with a recently proposed regression-based potency estimator. The next few sections provide a general review of the potency estimators that were applied to the data sets described above.
Experimental data and notation
For the methods that follow, we assume that Yij corresponds to the response of the jth organism to the ith concentration (Ci), where j = 1,…, ni and i = 0,…, G. Furthermore, we assume that N = n0 +… + nG organisms have been randomly assigned to these G + 1 groups. Finally, let μi = E(Yij) correspond to the average response in the population of organisms exposed to the toxicant at concentration Ci, with Yi = (ΣjYij)/ni corresponding to the sample estimate of μi.
Standard U.S. EPA potency estimation method I—Linear interpolation procedure
A nonparametric procedure was developed as an alternative for potency estimation in aquatic toxicology studies [2,3]. This particular estimator is applied to continuous data (e.g., tube length) or count data (e.g., number of young produced in three broods). The derivation of the linear interpolation estimator, the inhibition concentration estimator associated with proportion p inhibition relative to the control response, that is, Ci such that μi = (1 − p)μ0, was based on two assumptions. First, it was assumed that the responses were monotonically nonincreasing (μ0 ≥ μ1 ≥ … ≥ μG), and, second, it was assumed that the concentration-response pattern between adjacent concentrations was linear. If monotonicity was violated, adjacent groups exhibiting this violation were pooled until the monotonicity assumption was met. As an example, if Y0 < Y1, then responses from groups 0 and 1 would be pooled together, and this average would be assigned to both concentration groups, say M0 = M1 = (ΣjY0j + ΣjY1j)/(n0 + n1). If Y2 > M1, then M2 would be the average of responses in the first three concentration groups; otherwise, M2 = Y2. This process would continue until a set of monotonically nonincreasing means was constructed, that is, M0 ≥ M1 ≥ … ≥ MG. To find the inhibition concentration estimator, ICp, the two means that bracket a proportion p decrement from the control mean were identified. In other words, the group indices j and j + 1 such that

$$M_j \ge (1 - p)\,M_0 \ge M_{j+1}$$

with j ranging from 0 to G were identified. This linear interpolation estimate was then defined as the linear interpolant between Cj and Cj+1. Confidence intervals (CIs) for this estimator were constructed from a nonparametric bootstrap percentile-based method.
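The pooling and interpolation steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the U.S. EPA software: the function names `pool_to_monotone` and `icp` are hypothetical, the pooling is written as a standard pool-adjacent-violators pass, and the example concentrations and means are made up.

```python
def pool_to_monotone(means, sizes):
    """Pool adjacent groups until the means are nonincreasing.

    means[i] is the sample mean at concentration i and sizes[i] its sample
    size. Returns one smoothed mean per original group (M0 >= ... >= MG).
    """
    blocks = []  # each block: [sum of responses, number of organisms, number of groups]
    for m, n in zip(means, sizes):
        blocks.append([m * n, n, 1])
        # Merge backwards while a later block's mean exceeds the earlier one.
        while (len(blocks) > 1
               and blocks[-1][0] / blocks[-1][1] > blocks[-2][0] / blocks[-2][1]):
            s, n_tot, k = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n_tot
            blocks[-1][2] += k
    smoothed = []
    for s, n_tot, k in blocks:
        smoothed.extend([s / n_tot] * k)
    return smoothed


def icp(concs, means, sizes, p):
    """Linearly interpolate the concentration giving a proportion p decrement
    from the pooled control mean. Returns None if no bracketing pair exists."""
    m = pool_to_monotone(means, sizes)
    target = (1.0 - p) * m[0]
    for j in range(len(m) - 1):
        if m[j] >= target >= m[j + 1] and m[j] > m[j + 1]:
            frac = (m[j] - target) / (m[j] - m[j + 1])
            return concs[j] + frac * (concs[j + 1] - concs[j])
    return None


# Made-up example: the second group exceeds the control, so groups 0 and 1 are pooled.
concs = [0.0, 1.0, 2.0, 4.0]
means = [20.0, 24.0, 15.0, 5.0]
sizes = [10, 10, 10, 10]
ic25 = icp(concs, means, sizes, 0.25)
ic50 = icp(concs, means, sizes, 0.50)
```

In this example the pooled baseline is 22 rather than the observed control mean of 20, which illustrates how an enhanced low-concentration response shifts the reference level used by the interpolation.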
Standard U.S. EPA potency estimation method II—Probit-based inhibition concentrations
The linear interpolation method described above is not applied to quantal response data, defined as the number of successes (e.g., survivors) in a certain number of trials (e.g., number of organisms on test). For quantal response data, the U.S. EPA recommends fitting a probit regression model and then solving for the concentration that is associated with a certain response proportion. In essence, this model is equivalent to fitting a special case of the model we describe in the next section, where a generalized linear model is fit with a probit link function and binomial response distribution. Thus, the probability of response at a particular concentration C, say πc, is modeled as πc = Φ(β0 + β1C), where Φ() is the cumulative normal distribution function. In aquatic toxicology, this model is applied to survival or germination data, in which inhibition of response is expected with higher concentrations. Thus, a common goal is to estimate the concentration at which the response probability is at a certain specified level. For example, a 25% inhibition concentration would correspond to C where πc = 0.75, assuming nearly 100% response in the control conditions.
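Inverting the probit model amounts to solving Φ(β0 + β1C) = 1 − p for C, which gives C = (Φ⁻¹(1 − p) − β0)/β1. The following is a minimal sketch of that inversion, assuming nearly 100% response in controls as in the text; the function name `probit_icp` and the coefficient values are hypothetical and are not estimates from the region 9 data.

```python
from statistics import NormalDist


def probit_icp(beta0, beta1, p):
    """Concentration C at which Phi(beta0 + beta1 * C) equals 1 - p.

    beta0 and beta1 are fitted probit coefficients (supplied by the caller);
    p is the inhibition level, with 0 < p < 1.
    """
    z = NormalDist().inv_cdf(1.0 - p)  # probit of the target response probability
    return (z - beta0) / beta1


# Hypothetical coefficients: response probability declines with concentration.
c25 = probit_icp(2.0, -0.01, 0.25)
c50 = probit_icp(2.0, -0.01, 0.50)
```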
|Organism||Type of measured response||β1 < 0, β2 < 0||β1 > 0, β2 < 0||β1 < 0, β2 > 0||β1 > 0, β2 > 0|
|Ceriodaphnia dubia||No. of young||46 (30%)||99 (65%)||8 (5%)||0 (0%)|
|Macrocystis pyrifera||Germination proportion||123 (46%)||16 (6%)||128 (48%)||0 (0%)|
|Macrocystis pyrifera||Mean tube length||30 (12%)||1 (<1%)||222 (88%)||0 (0%)|
As a notational convenience, we denote the potency estimate derived from either of the U.S. EPA potency estimation procedures (i.e., the linear interpolation method or the probit regression estimator) as an ICp.
Recent parametric approach—RIp
The parametric alternative for potency estimation used a model of adverse response as a function of toxicant concentration, followed by inverting this model at a specified inhibition level to estimate the concentration associated with that response level. Regression methods have been advocated for use in aquatic toxicology. Recent related work has continued to investigate the evaluation of toxicant effects on continuous responses [14,15]. Most of this recent work focused on normal or log-normal responses in which mean responses were parameterized as some function of toxicant concentration. More general techniques have been proposed for cases of non-normal responses, which frequently arise in the context of aquatic toxicity testing. In particular, Maul proposed the use of a generalized linear model framework for analysis of toxicity test data. Bailer and Oris [4–6] suggested a regression-based estimator of inhibition concentrations in this context.
For example, consider count responses that might arise from counting the number of young from C. dubia reproduction tests. Assuming a Poisson response distribution and a log link function that was linear in a polynomial of concentrations, Bailer and Oris [4,5] derived an estimate of the concentration associated with a specified level of reproductive inhibition, the so-called relative inhibition concentration p, or RIp. In this approach, a generalized linear model assuming a Poisson distribution and a mean response that was exponential in a polynomial of concentrations, that is,

$$\mu_C = \exp\left(\beta_0 + \sum_{k=1}^{K} \beta_k C^k\right) \tag{1}$$

was proposed. It has been argued that a second-degree polynomial, K = 2, provides a range of possible patterns of concentration response [4–6]. As an aside, Kodell and West advocated both a polynomial response and use of a quadratic polynomial in recent work focusing on risk estimation for continuous responses. The RIp was the value of the toxicant concentration C such that μC = (1 − p) μ0, where 0 < p < 1. This implied

$$\exp\left(\beta_0 + \sum_{k=1}^{K} \beta_k C^k\right) = (1 - p)\exp(\beta_0)$$

Thus, the RIp is the value of C solving

$$\sum_{k=1}^{K} \beta_k C^k - \ln(1 - p) = 0 \tag{2}$$

The parameter estimates of the log-linear model for μC are obtained by maximum likelihood and substituted into Equation 2, which can then be solved to yield an estimate of RIp. As an example, if a 50% inhibition concentration is to be estimated using a log-linear model with K = 2, the quadratic equation, β2C² + β1C + 0.693 = 0, is solved to find RI50.
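For K = 2 the RIp calculation reduces to finding a root of a quadratic in C. The sketch below illustrates that step only; it takes the coefficient estimates as given rather than fitting the model. The function name `rip_quadratic`, the rule of reporting the smallest positive root, and the coefficient values are assumptions made for illustration and are not taken from the SAS macros described in the article.

```python
import math


def rip_quadratic(beta1, beta2, p):
    """Smallest positive C solving beta2*C**2 + beta1*C - ln(1 - p) = 0.

    beta1 and beta2 are the fitted linear and quadratic coefficients of the
    log-linear model. Returns None if no positive real root exists.
    Choosing the smallest positive root is an assumption of this sketch.
    """
    c0 = -math.log(1.0 - p)  # constant term, e.g. 0.693 when p = 0.5
    if beta2 == 0.0:
        # Model is linear in concentration on the log scale.
        roots = [] if beta1 == 0.0 else [-c0 / beta1]
    else:
        disc = beta1 ** 2 - 4.0 * beta2 * c0
        if disc < 0.0:
            return None
        sq = math.sqrt(disc)
        roots = [(-beta1 - sq) / (2.0 * beta2), (-beta1 + sq) / (2.0 * beta2)]
    positive = [r for r in roots if r > 0.0]
    return min(positive) if positive else None


# Hypothetical coefficients with beta1 > 0 and beta2 < 0:
# response rises at low concentrations, then declines.
ri50 = rip_quadratic(0.1, -0.05, 0.5)
```

With β1 > 0 and β2 < 0 the quadratic has exactly one positive root, so the estimate is well defined even though the fitted curve is not monotone.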
Bailer and Oris generalized the idea from the specifics of the C. dubia reproduction toxicity tests to other response scales encountered in aquatic toxicity tests. For survival and germination (dichotomous) data, a generalized linear model with a logit link function and a binomial response distribution was used. For tube length (continuous) data, a generalized linear model with a log or identity link function and a normal or gamma response distribution could be used.
Confidence intervals for RIp were constructed via nonparametric bootstrapping, as was suggested for the linear interpolation inhibition estimator. In addition, Bailer and Oris suggested a delta-method confidence interval for RIp. An alternative strategy for generating bootstrap-based CIs for RIp is to use a parametric bootstrap in which the bootstrap samples are generated with mean determined by the maximum likelihood parameter estimates and the response distribution determined by the nature of the measurements (e.g., binomial for germination data, Poisson or negative binomial for number of young, and normal or gamma for tube length data).
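A nonparametric percentile bootstrap of the kind mentioned here can be sketched generically: resample organisms with replacement within each concentration group, recompute the point estimate, and read off percentiles of the resulting estimates. The function name `bootstrap_percentile_ci`, its default settings, and the simple index-based percentile rule are assumptions of this sketch; the estimator is passed in as a function, so any ICp or RIp routine could be plugged in.

```python
import random


def bootstrap_percentile_ci(groups, estimator, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for a potency estimate.

    groups: list of lists, the responses observed at each concentration.
    estimator: function mapping such a list of lists to a number, or to None
        when no estimate can be computed for that resample.
    Returns (lower, upper), or None if every resample failed.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample organisms with replacement within each concentration group.
        resampled = [rng.choices(g, k=len(g)) for g in groups]
        est = estimator(resampled)
        if est is not None:
            estimates.append(est)
    if not estimates:
        return None
    estimates.sort()
    alpha = 1.0 - level
    lo_idx = int((alpha / 2.0) * (len(estimates) - 1))
    hi_idx = int((1.0 - alpha / 2.0) * (len(estimates) - 1))
    return estimates[lo_idx], estimates[hi_idx]


# Toy usage with a stand-in estimator (the control-group mean).
toy_groups = [[1, 2, 3, 4, 5], [2, 3, 4]]
toy_ci = bootstrap_percentile_ci(
    toy_groups, lambda gs: sum(gs[0]) / len(gs[0]), n_boot=500, seed=1
)
```

A parametric bootstrap would replace the resampling line with draws from the fitted response distribution.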
The SAS® macro programs [16,17] were developed using generalized linear model methodology for estimating RIp and associated confidence intervals for count, dichotomous, and continuous response types. The program is documented with extensive commenting of the source code and is available from the authors. The program models a particular type of response by fitting a generalized linear regression model involving a polynomial of degree K = 1 or 2 in the test concentrations.
Analysis methods for comparing ICp and RIp estimates
The ICp point estimates were obtained for all response types using the methods described previously. These were provided with the electronic versions of the data. The RIp point estimates were obtained for the count responses provided for the C. dubia data using a generalized linear model assuming a log link function and Poisson error distribution. For the M. pyrifera germination proportion data, a logit link function and binomial error distribution were assumed (essentially performing logistic regression). For average tube length responses, a log link and a normal response distribution were used to estimate RIp.
All data, regardless of response type, were initially modeled using a quadratic polynomial in the test concentrations. The pattern of estimated model coefficient signs was then investigated to assess model validity. In particular, it was of interest to determine whether any data set produced a positive quadratic coefficient (i.e., β2 > 0). Such an occurrence would indicate a modeled enhancement of response at high concentration levels, a result suggestive of model misspecification (i.e., overparameterization). The frequency distributions of coefficient signs in the quadratic models are given in Table 2. All models were refit using a regression model linear in the test concentrations (on the scale of the link function), and RIp point estimates were revised in cases where β2 > 0 in the quadratic model. Approximately two-thirds of C. dubia toxicity tests exhibited concentration-response patterns with β1 > 0 and β2 < 0, a pattern of initially increasing responses followed by decreasing responses, whereas the M. pyrifera responses seldom, if ever, demonstrated this pattern. In contrast, approximately 88% of the tube length data sets exhibited β1 < 0 and β2 > 0. Looking more closely at example data sets, the tube length concentration-response data could often be described as almost exponential decay with immediate and often extreme toxicity occurring at low concentration levels.
Data sets were characterized by the nature of the regression model that best fit observed concentration-response patterns. Displays of side-by-side box plots of control responses across laboratories were constructed to highlight potential laboratory differences (Figs. 1 to 3). Summary statistics (mean, standard deviation, minimum, maximum, and coefficient of variation) were generated for each ICp and RIp endpoint. This summary was stratified by laboratory (Tables 3 to 5 and Figs. 4 to 6).
A random effects model with laboratory as a random effect was also fit to explore the component of variation associated with different laboratories relative to the error source variability (Table 6).
The RIp and ICp estimates were compared using a variety of summaries (Table 7). First, concordance and discordance of the 25% and 50% inhibition levels were evaluated. Concordance was defined as the proportion of cases with either RI25 < IC25 and RI50 < IC50, or RI25 ≥ IC25 and RI50 ≥ IC50. In addition, plots of ICp versus RIp were generated with lines of unit slope and zero intercept provided for reference (Figs. 7 to 9). Large departures from this reference line were used to identify data sets leading to the most discrepant ICp and RIp estimates.
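The concordance rule can be stated compactly: a data set is concordant when the ordering of RIp and ICp is the same at both inhibition levels. A minimal sketch follows; the function names and the example values are hypothetical.

```python
def classify(ri25, ic25, ri50, ic50):
    """Label one data set as 'concordant' or 'discordant'.

    Concordant: RI25 < IC25 and RI50 < IC50, or RI25 >= IC25 and RI50 >= IC50.
    """
    below_at_25 = ri25 < ic25
    below_at_50 = ri50 < ic50
    return "concordant" if below_at_25 == below_at_50 else "discordant"


def concordance_proportion(rows):
    """Proportion of (ri25, ic25, ri50, ic50) tuples that are concordant."""
    labels = [classify(*row) for row in rows]
    return labels.count("concordant") / len(labels)


# Made-up estimates for four data sets.
example_rows = [
    (1.0, 2.0, 3.0, 4.0),  # RI below IC at both levels
    (2.0, 1.0, 4.0, 3.0),  # RI at or above IC at both levels
    (2.0, 1.0, 3.0, 4.0),  # ordering reverses between levels
    (1.5, 1.5, 2.5, 2.5),  # ties count as RI >= IC
]
example_concordance = concordance_proportion(example_rows)
```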
Laboratory and time comparisons
Laboratory differences were first explored by examining the range of control responses observed across the laboratories included in this data set. Figures 1 to 3 present side-by-side box plots in which the control responses for number of young (Fig. 1), germination proportion (Fig. 2), and tube length (Fig. 3) are presented by laboratory. Median values for the number of young ranged from 18 (laboratory C. dubia02) to nearly 30 (laboratory C. dubia01). The greatest variability in this response was also associated with the highest response (laboratory C. dubia01). Median germination proportions ranged from 80 to 95% across laboratories, and average tube length ranged from 13 to 17 μm.
A comparison of laboratories by response type was performed to investigate the possible presence of a systemic laboratory-based difference in RI and/or IC point estimations. Tables 3 to 5 and Figures 4 to 6 provide descriptive summaries of these comparisons.
In the case of C. dubia total young estimation, the RI method appeared to produce more consistent estimation than the IC method, as illustrated by the observed coefficients of variation, which were consistently lower in RI estimation, although the difference was notably smaller for 50% inhibition levels as opposed to 25% inhibition levels. The accompanying box plots seemed to indicate systematically smaller point estimates at laboratory C. dubia03 regardless of estimation method or inhibition level.
A significant laboratory effect in RI or IC point estimation appeared to occur for M. pyrifera germination proportions. The RI procedure seemed to be producing a few comparably large point estimates for data generated by laboratories M. pyrifera01 and C. dubia04 (Fig. 5). However, overall performance between the IC and RI methods for this type of response was largely consistent, as illustrated by the accompanying scatter plots of ICp versus RIp (Fig. 8). Generally, the methods yielded similar estimates for lower inhibition levels, with a slight tendency for RIp to exceed ICp when RI25 was <100. However, at p = 50%, the methods appeared to produce somewhat divergent results above a concentration level of 150 units (Fig. 8).
All RI and IC point estimates were also indexed by date of experiment to investigate the possible presence of a time effect in the data. No substantial time effect appeared to exist in either the C. dubia data or M. pyrifera germination or tube length data (figures not shown).
The final analysis of laboratory variability involved fitting analysis of variance random effects models to evaluate the laboratory component of overall variability in the responses (Table 6). A random effects model is appropriate for such a purpose because one may view the laboratories involved with the present experimental data as a sample from a larger population of potential laboratories. The estimate of total variance associated with a random effect is known as the variance component because it is measuring the part of the overall variance in the response that is contributed by that effect. Generally, 13 to 16% of the ICp/RIp point estimation variation in C. dubia reproduction could be attributed to a laboratory effect, 14 to 29% of the ICp/RIp variation in M. pyrifera tube lengths could be attributed to laboratory effects, and 35 to 37% of the ICp/RIp variation in M. pyrifera germination proportions could be attributed to laboratory effects. No clear difference between ICp and RIp with respect to the laboratory component of variation was observed.
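The percentages reported with the variance components are each component's share of the total, between/(between + within). The sketch below shows that arithmetic together with one common way of estimating the components, a method-of-moments one-way random effects ANOVA. The article does not state which estimation method was used, so the ANOVA estimator here is an assumption, and the function names and toy data are hypothetical.

```python
def variance_components(groups):
    """Method-of-moments estimates for a one-way random effects model.

    groups: list of lists, the point estimates obtained at each laboratory.
    Returns (between_laboratory, within_laboratory) variance components.
    """
    k = len(groups)
    n_i = [len(g) for g in groups]
    n_total = sum(n_i)
    grand_mean = sum(sum(g) for g in groups) / n_total
    lab_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(n_i, lab_means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, lab_means) for y in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective group size; equals the common size when the design is balanced.
    n0 = (n_total - sum(n * n for n in n_i) / n_total) / (k - 1)
    between = max(0.0, (ms_between - ms_within) / n0)  # truncate negatives at zero
    return between, ms_within


def laboratory_share(between, within):
    """Fraction of total variance attributable to laboratories."""
    return between / (between + within)


# Toy data: three laboratories with three estimates each.
toy_between, toy_within = variance_components([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Shares recomputed from the components listed in Table 6.
share_cdubia_ic25 = laboratory_share(0.0211, 0.1172)
share_tube_ic50 = laboratory_share(455.90, 1101.02)
```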
Descriptive analysis of application of RIp and ICp to region 9 data
The concordance or discordance between RI and IC estimates was assessed after all data sets were fit using an appropriate model. Concordance was defined as a case where similar behavior was observed between the RI and IC across the 25 and 50% inhibition levels. The first two columns in Table 7 exhibit concordant behavior between RIp and ICp; the last two columns exhibit discordant behavior.
Data sets exhibiting concordant performance between the RI and IC procedures comprised 56% of the C. dubia experiments, 64% of the M. pyrifera germination experiments, and 83% of the M. pyrifera mean tube length experiments. As noted above, the C. dubia responses exhibit enhanced responses at low concentrations more frequently than either of the M. pyrifera responses. The most common pattern of discordance for this response was RI25 ≥ IC25 with RI50 < IC50 (33% of cases). With enhanced responses at low concentration levels, the IC estimator is essentially an inhibition concentration relative to a pooling of control plus enhanced responses. This pooled baseline response will be larger than the control responses. Thus, although 50% inhibition relative to the baseline response may lead to an IC50 larger than the RI50, the 25% inhibition level relative to the baseline level may lead to an IC25 that is less than the RI25.
Scatter plots of IC versus RI point estimates were provided with a line of unit slope for comparison of the two resulting point estimators (Figs. 7 to 9). For C. dubia, it appears that IC and RI results were positively related with the exception of a few data sets, especially at the p = 25% inhibition level. Similar RIp and ICp estimates were obtained for the M. pyrifera germination data sets.
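|Database||Response||Point estimator||Between laboratories, variance component (% of total)||Within laboratories, variance component (% of total)|
|Ceriodaphnia dubia||Total young||IC25||0.0211 (15%)||0.1172 (85%)|
|Ceriodaphnia dubia||Total young||RI25||0.0148 (13%)||0.0997 (87%)|
|Ceriodaphnia dubia||Total young||IC50||0.0261 (16%)||0.1407 (84%)|
|Ceriodaphnia dubia||Total young||RI50||0.0164 (14%)||0.0982 (86%)|
|Macrocystis pyrifera||Proportion germinated||IC25||560.54 (35%)||1,043.04 (65%)|
|Macrocystis pyrifera||Proportion germinated||RI25||956.33 (37%)||1,608.08 (63%)|
|Macrocystis pyrifera||Proportion germinated||IC50||2,233.24 (35%)||4,130.13 (65%)|
|Macrocystis pyrifera||Proportion germinated||RI50||2,085.62 (36%)||3,655.36 (64%)|
|Macrocystis pyrifera||Average tube length||IC25||72.46 (14%)||450.22 (86%)|
|Macrocystis pyrifera||Average tube length||RI25||184.24 (25%)||561.62 (75%)|
|Macrocystis pyrifera||Average tube length||IC50||455.90 (29%)||1,101.02 (71%)|
|Macrocystis pyrifera||Average tube length||RI50||1,069.56 (25%)||3,260.37 (75%)|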
Comparison of the IC and RI estimation methods on M. pyrifera average tube length estimation produced less consistent results than for the other response types. Initially, one may observe discrepancies by laboratory in average tube lengths in the control group (zero concentration), which may impact the estimation procedure. Scatter plots of ICp versus RIp showed a strong general tendency for RI estimates to exceed those produced by the IC method. The relationship between the point estimates appeared linear, although with a slope less than 1. The linearity of this pattern explains why this response leads to the most concordant result between RI and IC estimates.
Data sets with the greatest departure between RIp and ICp estimates
Two issues should be addressed regarding discrepancy between RIp and ICp estimates. First, there were data sets with RIp estimates but no ICp estimates and vice versa. After exploring these data sets, we concluded that this was due to a miscoding of laboratory identification values between the raw data sets and the summary data sets. The second discrepancy is more interesting. We considered data sets in which the RIp and ICp estimates differed most dramatically. These can be seen graphically by examining Figures 7 to 9. Points that are farthest from the reference line with respect to vertical distance are points with the largest RIp and ICp differences. For the three responses (young, germination, and tube length), number of young exhibited the most distinctive pattern of departure, which appeared associated with differences in the point estimates. Two patterns were associated with large differences in the number of young responses: enhanced responses at low concentrations with toxicity exhibited at higher concentrations and unusual patterns in which the lowest tested concentration had significantly decreased responses relative to controls followed by increases in the next couple of concentrations before ultimate declines in responses. The second pattern was more common. Differences in the RIp and ICp estimates were not as extreme for the kelp responses, and broad generalization is not warranted.
A recommendation for the RIp estimation method arises from the observation that it can be applied to any response endpoint encountered in whole effluent toxicity tests. The generalized linear model provides a framework for the analysis of these different response scales, and the RIp estimator arises naturally from this construction. The linear interpolation procedure can also be applied to different response scales; however, the monotonicity requirement of this procedure may make it inappropriate for certain experimental results.
The results told a complicated story for the C. dubia reproduction data. The relationship between ICp and RIp was seen to vary with the choice of p, with estimated RI25 > IC25 and estimated RI50 < IC50. One conjecture for this pattern is the possible effects of hormesis biasing the estimate arising from the linear interpolation ICp method. This ICp imposes a monotonically decreasing structure to the estimation of inhibition concentration. The RIp method does not share this restriction. Thus, one expects a potential bias in the ICp method when the data truly follow a pattern in which stimulation of response occurs at low levels of toxicant exposure (i.e., when hormesis is present). This observation would also explain the apparent reversal in the RIp/ICp relationships when different levels of p are considered. More investigation is required to make a definitive statement about this result. The RI25 may be slightly larger than the IC25, but this type of comparison is inconclusive; we do not know the true value of the 25% inhibition concentration, so we cannot claim that one estimator is superior to the other. In comparison, M. pyrifera germination led to very similar point estimates between the probit-based IC and RI estimators, which may not be surprising because both estimators are based on regression models applied to binomial responses. The average tube length response yielded RI > IC regardless of the value of p.
|Organism||Type of measured response||RI25 < IC25 and RI50 < IC50||RI25 ≥ IC25 and RI50 ≥ IC50||RI25 < IC25 and RI50 ≥ IC50||RI25 ≥ IC25 and RI50 < IC50|
|Ceriodaphnia dubia||No. of young||44 (28%)||44 (28%)||18 (11%)||51 (33%)|
|Macrocystis pyrifera||Germination proportion||26 (10%)||145 (54%)||58 (22%)||38 (14%)|
|Macrocystis pyrifera||Mean tube length||27 (10%)||193 (73%)||24 (9%)||21 (8%)|
To use the RIp estimator, one needs to have a good model fit; the regression relationship must provide a reasonable description of the concentration-response relationship. This should be a prerequisite for the application of any estimation procedure. The ICp procedure requires monotonic decline to yield unbiased estimators. Application of this method without evaluating this assumption is suspect. We view a significant concentration-response relationship as a sensible prerequisite for any effective concentration estimation exercise. Nonmonotonic concentration-response models were required for C. dubia number of young modeling, whereas monotonic concentration-response models were reasonable for either M. pyrifera germination or tube length responses.
We believe that the RIp estimator is preferable to either ICp estimator because of its broad applicability to different response levels and its applicability to nonmonotonic concentration-response patterns. In addition, the linear interpolation ICp is acknowledged as having coverage probability problems, yielding CIs that do not achieve the nominally stated confidence coefficients. Note that the nonparametric bootstrap-based confidence interval method applied to RIp would be anticipated to lead to confidence intervals with poor coverage probability properties. We think that more work is needed to determine the best method of constructing intervals for this estimator. In fact, we have conducted some small-scale simulations that suggest that there are consistent undercoverage problems associated with nonparametric bootstrap CIs for either ICp or RIp estimators. Note that a delta-method construction has already been suggested as an alternative. In addition, parametric bootstrapping from a negative binomial distribution may provide a promising alternative for the count responses. Other alternatives, such as the bias-corrected or bias-corrected and accelerated bootstrap method, should be explored.
The selection of p is a scientifically based biological decision. Given the choice of p, the experimental design (spacing of doses and number of organisms) should reflect this decision. The statistical experimental design implication of a low level of p is the need to include multiple concentrations that bracket the desired level of inhibitory response. Optimal experimental design principles might be used to address this concern. In addition, unequally replicated designs might be considered (if logistically feasible). For example, one might perform twice the number of replicates in the low concentration region as at the highest concentration conditions (when responses are often inhibited to zero). An empirical comparison such as that conducted in this study is not adequate for making recommendations on this question.
In conclusion, it appears reasonable to incorporate parametric estimation methods into whole effluent toxicity testing. These methods are appropriate for all response scales (dichotomous, count, and continuous) and can incorporate nonmonotonicity without difficulty or bias. However, questions remain when using these methods. In particular, we believe that studies should be initiated to determine optimal methods for constructing CIs, because neither the linear interpolation ICp nor the RIp CIs meet the nominally specified coverage probabilities. Some recent work has suggested promising alternatives, but more work is needed (A.J. Bailer, R.T. Elmore, B.J. Shumate, and J.T. Oris, unpublished data).
The authors thank Kathleen Stralka for her comments on a previous version of this manuscript. In addition, the authors thank Ruth Much for her assistance in managing the details of this contract.
- 1. Grothe DR, Dickson KL, Reed-Judkins DK, eds. 1996. Whole Effluent Toxicity Testing: An Evaluation of Methods and Prediction of Receiving System Impacts. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA.
- 2. 1988. A robust statistical method for estimating effects concentrations in short-term fathead minnow toxicity tests. Contract 69–03–3534. Office of Water, U.S. Environmental Protection Agency, Washington, DC.
- 3. 1993. A linear interpolation method for sublethal toxicity: The inhibition concentration (ICp) approach. National Effluent Toxicity Assessment Center Technical Report 03–93. U.S. Environmental Protection Agency, Duluth, MN.
- 4. 1993. Modeling reproductive toxicity in Ceriodaphnia tests. Environ Toxicol Chem 12: 787–791.
- 5. 1994. Assessing the toxicity of pollutants for aquatic systems. In Lange N, Ryan L, Billard L, Brillinger D, Conquest L, Greenhouse J, eds, Case Studies in Biometry. John Wiley & Sons, New York, NY, USA, pp 25–40.
- 6. 1997. Estimating inhibition concentrations for different response scales using generalized linear models. Environ Toxicol Chem 16: 1554–1559.
- 7. 1989. Generalized Linear Models, 2nd ed. Chapman & Hall, London, UK.
- 8. 1992. Application of generalized linear models to the analysis of toxicity test data. Environ Monit Assess 23: 153–163.
- 9. 1989. Short-term methods for estimating the chronic toxicity of effluents and receiving waters to freshwater organisms, 2nd ed. EPA/600/4–89/001A. U.S. Environmental Protection Agency, Cincinnati, OH.
- 10. 1995. Short-term methods for estimating the chronic toxicity of effluents and receiving waters to west coast marine and estuarine organisms. EPA/600/R-95–136. U.S. Environmental Protection Agency, Cincinnati, OH.
- 11. 1990. Marine Bioassay Project fifth report: Protocol development and interlaboratory toxicity testing with complex effluents. Report 90–13WQ. State Water Resources Control Board, Sacramento, CA, USA.
- 12. 1994. An Introduction to the Bootstrap. Chapman & Hall, London, UK.
- 13. 1985. Advantages of using regression analysis to calculate results of chronic toxicity tests. In Bahner R, Hansen D, eds, Aquatic Toxicology and Hazard Assessment (8th Symposium). STP 891. American Society for Testing and Materials, Philadelphia, PA, pp 328–338.
- 14. 1993. Upper confidence limits on excess risk for quantitative responses. Risk Anal 13: 177–182.
- 15. 1995. Calculation of benchmark doses from continuous data. Risk Anal 15: 79–89.
- 16. SAS Institute. 1993. SAS® Technical Report P-243, SAS/STAT Software: The GENMOD Procedure, Release 6.09. Cary, NC, USA.
- 17. SAS Institute. 1997. SAS® Macro Language: Reference, 1st ed. Cary, NC, USA.
- 18. 1996. Applied Linear Statistical Models, 4th ed. Irwin, Chicago, IL, USA.
- 19. 1996. Session 3: Methods and appropriate endpoints. In Grothe DR, Dickson KL, Reed-Judkins DK, eds, Whole Effluent Toxicity Testing: An Evaluation of Methods and Predictability of Receiving System Impacts. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, pp 51–82.