Recommendations for the development and application of wildlife toxicity reference values



Toxicity reference values (TRVs) are essential in models used in the prediction of the potential for adverse impacts of environmental contaminants to avian and mammalian wildlife; however, issues in their derivation and application continue to result in inconsistent hazard and risk assessments that present a challenge to site managers and regulatory agencies. Currently, the available science does not support several common practices in TRV derivation and application. Key issues include inappropriate use of hazard quotients and the inability to define the probability of adverse outcomes. Other common problems include the continued use of no-observed- and lowest-observed-adverse-effect levels (NOAELs and LOAELs), the use of allometric scaling for interspecific extrapolation of chronic TRVs, inappropriate extrapolation across classes when data are limited, and extrapolation of chronic TRVs from acute data without scientific basis. Recommendations for future TRV derivation focus on using all available qualified toxicity data to include measures of variation associated with those data. This can be achieved by deriving effective dose (EDx)-based TRVs where x refers to an acceptable (as defined in a problem formulation) reduction in endpoint performance relative to the negative control instead of relying on NOAELs and LOAELs. Recommendations for moving past the use of hazard quotients and dealing with the uncertainty in the TRVs are also provided. Integr Environ Assess Manag 2010; 6:28–37. © 2009 SETAC


Toxicity reference values (TRVs) are point estimates of chemical doses or concentrations that are used in conjunction with exposure estimates of similar units to ascertain whether wildlife species may be adversely affected due to exposure to a chemical. TRVs are mostly derived from and compared with oral exposure data to predict the potential for impacts of environmental contaminants on avian and mammalian wildlife. TRVs have been developed to assess inhalation exposures (Johnson and Jay 1999; Gallegos et al. 2007) as well; however, the oral exposure route is the most frequently evaluated exposure route in a wildlife assessment. Assessments requiring TRVs range from the initial hazard (or screening-level) evaluation, in which point estimates of exposure are compared with point-estimate TRVs resulting in a simple ratio (i.e., a hazard quotient), to more complex risk assessments involving the use of full distributions of exposure and effect concentrations to calculate the range and probability of results. In all instances, the TRV derivation can have a significant influence on the resulting risk estimate.

Hazard versus Risk Assessment

A clear distinction between hazard (or screening) assessments and risk assessments is required. A hazard assessment typically compares point estimates of exposure with a toxicity metric that is considered likely not to result in an adverse effect (i.e., the TRV). This is to provide decision-makers with a single point estimate known as a hazard quotient. Interpretation is based on whether exposure is above, at, or below the TRV (i.e., whether the hazard quotient is above, at, or below 1). Given the potential for uncertainty in the estimation of the exposure and toxicity metrics, liberal interpretation is used when quotients are close to a value of 1. Conversely, a risk assessment combines distributions of exposure and effect (e.g., dose–response curves or species sensitivity distributions) to provide decision-makers with information about the magnitude and probability of a range of outcomes. Interpretation is based on differing probabilities of an adverse outcome. What distinguishes a screening assessment from a risk assessment is that the former does not explicitly and quantitatively address the probability of an outcome, while a risk assessment does. In practice, most assessments will move through one or more levels (“tiers”) of increasing complexity (i.e., from hazard assessments toward true risk assessments) depending on the complexity of the project and the level of certainty needed for risk management decisions. This study focuses primarily on the derivation and application of point-estimate TRVs for use in screening assessments; however, some recommendations are also relevant for risk assessment applications.
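The probabilistic side of this distinction can be illustrated with a minimal sketch. It assumes, purely for illustration, that both exposure and the effect threshold follow log-normal distributions; the function name, the distribution choice, and every parameter value below are hypothetical and are not taken from any study cited here.

```python
import math
import random


def probability_of_exceedance(exposure_mu, exposure_sigma,
                              threshold_mu, threshold_sigma,
                              n=20000, seed=1):
    """Fraction of Monte Carlo draws in which exposure exceeds the effect threshold.

    Both quantities are assumed (hypothetically) to be log-normal; mu and sigma
    are the mean and standard deviation on the natural-log scale.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n):
        exposure = rng.lognormvariate(exposure_mu, exposure_sigma)
        threshold = rng.lognormvariate(threshold_mu, threshold_sigma)
        if exposure > threshold:
            exceed += 1
    return exceed / n


# Hypothetical medians of 2 and 8 mg/kg body weight/d for exposure and threshold.
p_exceed = probability_of_exceedance(
    exposure_mu=math.log(2.0), exposure_sigma=0.6,
    threshold_mu=math.log(8.0), threshold_sigma=0.4,
)
print(f"Probability that exposure exceeds the threshold: {p_exceed:.3f}")
```

With these made-up inputs, the ratio of the two medians is 0.25, which a screening assessment would simply read as below 1. The simulation instead reports how often exposure would exceed the threshold once the spread in both quantities is considered.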

Challenges in wildlife TRV derivation are not new. A general absence of toxicity data for wildlife species in the literature, combined with the low probability that new wildlife data will be forthcoming, leads to questions about how to develop a TRV in a data-sparse environment. Extrapolation from common laboratory test genera (e.g., Mus, Rattus, Colinus, Gallus spp.) to wildlife species has been widely used, but both methods for extrapolation (allometric dose-scaling—Opresko et al. 1994; Sample et al. 1996; application of default uncertainty factors—Chapman et al. 1998) lack a robust technical basis. Although several compendia of TRVs are available (e.g., Sample et al. 1996), detailed guidance on the scientific aspects of TRV development and application in screening and risk assessments is not currently available. Inconsistencies in derivation, level of protection sought, and degree of conservatism in TRVs are widespread (McDonald and Wilcockson 2003). Risk assessors, risk managers, and regulators have expressed a desire for greater consistency to reduce the overall uncertainty in toxicity thresholds and thus improve the utility of risk management recommendations.

This study presents recommendations on the derivation and application of wildlife TRVs (Table 1) to promote consistency within the ecological risk assessment (ERA) community, with a focus on the scientific aspects of wildlife TRV development and application. Recommendations arose from several meetings among the authors as part of a subcommittee of the Ecological Risk Assessment Advisory Group (ERAAG) of the Society of Environmental Toxicology and Chemistry (SETAC) North America. Preliminary recommendations were presented in two fora: an interactive poster session at the 28th SETAC North America annual meeting (November 2007), and the US Environmental Protection Agency (USEPA) Ecological Risk Assessment Forum/Tri-Services Environmental Risk Assessment Work Group meeting (January 2008). Questionnaires were distributed to more than 150 people during these meetings, and the responses received from approximately 20 people were considered in the preparation of this study. These recommendations are jurisdiction-neutral: they are not intended to reflect the specific policies or preferences of any one jurisdiction; rather, they are meant to reflect the state of the science. Recommendations focus on technical (not policy) issues, including aspects of data extraction and interpretation, selection of endpoints that relate to survival or fitness of organisms, extrapolation between species, and derivation of TRVs in the context of variability in chemical-specific toxicological data sets and species-specific variations in response. These recommendations focus on dose-based (typically derived from administered concentrations) TRVs rather than tissue-based TRVs (e.g., Johnson, McFarland, et al. 2007).

Table 1. Summary of recommendations for wildlife TRV development and use
Wildlife TRV Development
 Link TRVs to the problem formulation and assessment endpoints.
 Don't limit toxicity endpoints to mortality, reproduction, and growth.
 Consider data quality and differences between the study and the assessment, such as a test species that differs from the assessed species or a test exposure duration that differs substantially from that of the site-exposed receptor under evaluation.
 Consider relative species sensitivity when conducting interspecies extrapolation. Do not apply uncertainty factors for interspecies extrapolation without scientific justification.
 Don't assume TRVs are protective of any or all species within a vertebrate class.
 Don't use allometric dose-scaling with body mass when assessing chronic/subchronic toxicity between species.
 Don't extrapolate toxicity data between receptor classes when deriving TRVs for chronic exposures unless there is scientific justification to do so.
 Don't extrapolate chronic TRVs from acute exposure data unless data justify the extrapolation.
 Endeavor to use the more predictive dose–response distributions or EDx instead of NOAELs and LOAELs.
 Be transparent with all data, calculations, and assumptions used; use “confidence qualifiers” for data sets. Provide supporting data, including methods used and the basis for assumptions.
 Explicitly represent the level of uncertainty and use qualitative or quantitative methods to illustrate confidence in the TRV.
Wildlife TRV Use
 Apply point-estimate TRVs only for the purpose of screening chemicals (to determine which chemicals are to be retained for further evaluation).
 Hazard quotients (HQs) are not expressions of risk. Don't interpret HQ magnitude as a magnitude of effect.
 Don't conclude that an HQ>1 indicates unacceptable risk, or that a site with HQ>1 must have risk management or remediation.


TRV must relate to problem formulation

Wildlife risk assessments are initiated to assist environmental managers in dealing with generic questions such as “What will happen to the wildlife if these chemicals remain in (or are introduced to) the area?” However, generic protection goals (e.g., “protect wildlife from contaminants”) are not sufficient for directing TRV selection; dialogue among risk assessors, environmental managers, and, in many cases, other stakeholders during the problem formulation is required to more clearly link TRVs with assessment and measurement endpoints. Factors such as existing and proposed land uses, geographic location, regulatory requirements, societal concerns, and the degree of risk aversion of the manager, their organization, and other interested parties (i.e., the public and other stakeholders) will influence the level of protection to be afforded to wildlife species. Risk assessors should formulate assessment endpoints that capture the risk managers' concerns in terms of an attribute and entity to be protected (Suter et al. 2004).

Clarity in the assessment and measurement endpoints is required so the applicable toxicity data can be accessed to derive a TRV. Often, environmental protection goals are directed at population or community level protection, but TRVs are not available for these levels of ecological organization. Because TRVs are thresholds for effects to individual animals, measurement endpoints that are clearly related to the higher level assessment goals (e.g., survival, growth, reproductive output) are typically used to formulate a toxicity threshold. Data describing other responses to chemicals (e.g., behavior or physiological changes) may also be available and can have profound influences (e.g., toxicant-induced lethargy will alter survival when predator vigilance is critical for adults and offspring), but are often not clearly linked to most environmental protection goals. Data should be analyzed, corroborated, and endpoints evaluated without bias of a priori assumptions that survival, growth and reproduction are the only endpoints that lead to adverse population outcomes. Rather, one should assume that any physiological responses that result in direct or indirect changes to the survival, growth, reproduction, or immigration of organisms may have the potential to result in adverse consequences to a population when many organisms are exposed. Although most risk assessments have the inherent assumption that population-level effects will be absent if no effects are predicted for individual organisms, it is incorrect to suggest that the converse is true; that is, effects on individuals do not always result in changes in population density or age/sex structure due to many compensatory mechanisms that are present in ecological systems (Fairbrother 2001). 
Food chain models that consider exposure levels relative to toxicity thresholds for individuals can be combined with population models to translate effects on individual organisms into estimates of changes in population growth rates as a result of exposure to chemicals (Fairbrother 2001).

Thus, selection of appropriate TRVs will differ with each risk assessment as they are ultimately dependent upon the stated environmental protection goals and assessment endpoints. Risk assessors must be able to articulate this connection clearly, so the results of the assessment can be presented to the environmental managers in terms that are compatible with the decisions to be made.

Historical and ongoing use of NOAELs or LOAELs as TRVs

EDx-based TRVs (see further discussion below) are clearly preferable; however, many TRVs are based on no-observed-adverse-effect levels (NOAELs) or lowest-observed-adverse-effect levels (LOAELs), despite consensus that NOAELs and LOAELs have significant shortcomings. (For the purpose of this study, EDx is defined as a dose resulting in an x% reduction in an endpoint relative to a control group.) Note that the ASTM-I E47 Committee on Biological Effects and Chemical Fate has recently defined these terms as concentrations (NOAECs and LOAECs) and not levels; however, we have opted to use the acronyms NOAEL and LOAEL to reflect common usage among practitioners. The continued use of NOAEL- and LOAEL-based TRVs has been facilitated by their broad availability in easy-to-access compendia (e.g., Sample et al. 1996; USEPA 2005 ecological soil screening levels [Eco-SSLs]; USACHPPM 2000; Los Alamos National Laboratory [LANL] ecorisk database); however, this practice has little technical merit for screening assessments and no merit for risk assessment purposes. NOAEL and LOAEL values are not innately related to biologically relevant thresholds and do not provide information about the actual magnitude of effects in the reported studies. NOAELs do not necessarily equate to a “no effect” dose; they reflect only the test concentrations used in the study and are strongly influenced by factors related to statistical power (e.g., study design, replication). NOAELs and LOAELs (or equivalent terms) in ERA have been criticized elsewhere (Hoekstra and van Ewijk 1993; Laskowski 1995; Chapman et al. 1996; Bailer and Oris 1997; OECD 1998); however, their use continues to be widespread, in part due to policy decisions that the NOAEL- and LOAEL-based TRVs provide an adequate basis for evaluating hazards to wildlife and complicity by practitioners who continue to emphasize policy over science (Kapustka 2008). 
Notwithstanding this policy decision, this study provides a spectrum of alternatives to NOAELs and LOAELs for consideration by risk assessors and policy makers alike.

Selecting appropriate toxicological data

The reliability of the TRV depends on the quality and quantity of data used. A comprehensive literature search with careful evaluation of all data retrieved is the essential foundation of TRV development and not a trivial exercise. Guidance on the literature search and evaluation process (USACHPPM 2000; USEPA 2005) is available and is not duplicated here.

Extrapolating TRVs

Extrapolating between species

The majority of TRVs are based on common laboratory test species. The near-complete absence of toxicity data for most wildlife species means that extrapolation of toxic responses observed in laboratory test species to species of interest is necessary.

Allometric scaling (e.g., as was used in Sample et al. 1996) is one extrapolation approach that is widely applied in human toxicology and that has been used for wildlife risk evaluations despite its multiple limitations. However, it is no longer recommended for use in wildlife risk assessment (USEPA 2005). First, supporting data are limited. Much of the mammalian data are based on anticancer drugs evaluated in Freireich et al. (1966) rather than contaminants typically evaluated in wildlife risk assessments. Second, the allometric scaling models developed for both human and wildlife risk assessment are all based on acute toxicity data. Their applicability to chronic toxicity data is unknown.

Recently, Raimondo et al. (2007) developed interspecies correlation estimation (ICE) as an alternative approach for quantifying interspecies toxicity relationships. ICE is based on log-linear regression models that describe acute toxicity relationships between pairs of species over a range of chemicals. ICE models were developed for all chemicals for which adequate data were available, and for chemicals grouped by similar mode of action. Consideration of modes of action improved regression models for some, but not all, groups (e.g., neurotoxicants, carbamates, and organophosphates). This indicates that mode of action can be an important determinant in interspecies toxicity extrapolation. Ultimately, although ICE models are a step forward, they are currently similar to allometric models in that they are based solely on data from acute studies.
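The form of such a pairwise regression can be sketched as follows. This is an illustrative least-squares fit, assuming both species' acute values are log10-transformed; it is not the published ICE implementation, and the function names and paired LD50 values are hypothetical.

```python
import math


def fit_log_linear(surrogate_values, target_values):
    """Least-squares fit of log10(target) = intercept + slope * log10(surrogate)."""
    x = [math.log10(v) for v in surrogate_values]
    y = [math.log10(v) for v in target_values]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_target(surrogate_value, intercept, slope):
    """Back-transform the fitted line to predict the target species' acute value."""
    return 10 ** (intercept + slope * math.log10(surrogate_value))


# Hypothetical paired acute LD50s (mg/kg), one pair per chemical.
surrogate_ld50 = [1.0, 10.0, 100.0, 1000.0]
target_ld50 = [2.0, 20.0, 200.0, 2000.0]

intercept, slope = fit_log_linear(surrogate_ld50, target_ld50)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"predicted target LD50 at 50 mg/kg: {predict_target(50.0, intercept, slope):.0f}")
```

Because every input pair here is an acute value, the fitted line says nothing about chronic relationships, which is the limitation noted above.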

Because modes of action can vary dramatically for the same chemical over acute and chronic exposures (discussed in more detail below), it is likely that interspecific scaling factors based on chronic toxicity data also will differ from those based on acute toxicity data. Additionally, given the variation in cross-species physiological responses in different organ systems, it is reasonable to expect multiple chronic scaling factors for a given chemical, depending on the mode of action considered. In their current forms, neither allometric scaling nor ICE models represent chronic toxicity, and, therefore, their application to chronic data is not recommended. In the absence of suitable models, we favor the use of toxicity information as reported, because it is often unknown whether target species would be more resistant or more sensitive. To take a biased approach without sufficient information, in our view, is unwarranted. Alternatively, uncertainty factors may be applied to adjust toxicity values. However, generic uncertainty factors (e.g., 10-fold for any uncertainty) should not be used. Rather, if uncertainty factors are used, there should be a scientific basis for their application (Chapman et al. 1998). Whether applying uncertainty factors or not, uncertainty can be minimized by selecting test species that are as taxonomically or physiologically related to the wildlife species of interest as possible.

Extrapolating across taxonomic classes is not acceptable

Cross-class extrapolation of toxicity data has been done when data to support TRV derivation are extremely limited. However, extrapolation between classes is not recommended for development of chronic TRVs. These extrapolations are highly uncertain under the best of circumstances, with uncertainty increasing with greater taxonomic distance. Examples from acute toxicity data for aquatic taxa show that uncertainty increased as taxonomic relatedness decreased (Suter et al. 1986; Suter and Rosen 1988). Few studies have directly investigated cross-class extrapolations for wildlife; however, Luttik and Aldenberg (1997) concluded that differences between birds and mammals preclude extrapolations between these two classes based on LD50 values. Sample and Arenal (1999) observed similar LD50-based allometric scaling factors for birds and mammals for a majority of the chemicals evaluated, but because no clear pattern was observed for differences based on chemical categories, they concluded that extrapolations between birds and mammals should be approached with extreme caution. More recently, Raimondo et al. (2007) observed that uncertainty in LD50-based ICE models increased as taxonomic distance between surrogate taxa and the target taxon increased. Similar findings were observed for chronic wildlife toxicity data. Conversely, Johnson, Quinn, et al. (2007) found onset of central nervous system effects (i.e., convulsions) to be remarkably similar between species of 3 classes of vertebrates (reptiles, birds, and mammals) from daily oral exposures of RDX (hexahydro-1,3,5-trinitro-1,3,5-triazine) for 14, 60, and 90 d, respectively. This relationship was not true for two other energetic compounds tested using the same species. No clear pattern of cross-class sensitivity to these other energetic compounds was apparent, and, therefore, different conclusions on relative toxicity for each taxonomic class could be drawn depending upon which compound was considered. 
Although data are limited, clear patterns in relative cross-class sensitivity are lacking for both acute and chronic toxicity data and, therefore, extrapolations across classes should be avoided.

Extrapolating chronic TRVs from acute data is not acceptable without scientific support

Most wildlife risk assessments are intended to evaluate long-term exposure to low concentrations of chemicals (although there are exceptions such as spills and pesticide applications). However, a significant fraction of the mammalian and avian toxicity data are based on acute (short-term exposure) studies. This is particularly true for mortality studies involving a single dose delivered orally in a highly absorbable form (e.g., in water or vegetable oil). Reproduction studies may involve longer exposure periods; however, these are still typically shorter than an organism's life span. In the past, many risk assessments have extrapolated chronic TRVs from studies with acute or subacute exposure durations. However, these extrapolations are uncertain because the relationships between acute and chronic responses are not known for most species–chemical combinations. Consequently, extrapolation of chronic effects from acute data is not recommended, unless there are data to support the extrapolation.

There is no generic acute-to-chronic extrapolation factor that can be applied for wildlife TRV derivation. Hill (1994) compared responses of mallards (Anas platyrhynchos) from studies with single dose versus 5-d exposure to a variety of pesticides (organophosphorus, carbamate, and organochlorine compounds) and showed very different responses between the two exposure regimes. Most notably, the 5-d exposure values were much more variable among chemicals than were the single dose values; however, this may be due to differences in concentration-to-dose conversions. Overall, Hill (1994) found no statistical relationship between the two sets of values. Studies compiled in the aquatic organism database show that acute-to-chronic ratios (ACRs) vary considerably by species and by substance class (Länge et al. 1998). Metals have the largest ACRs, and other inorganics also show considerably different responses between acute and chronic exposures (inorganic substance ACRs vary from 20 to nearly 200). ACRs for organic substances are lower, but still vary by an order of magnitude (range: 2 to 28). Thus, there is no empirical basis for a universal ACR to extrapolate chronic exposure TRVs from acute exposure toxicity data.
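The practical consequence of this spread can be shown with simple arithmetic. The sketch below assumes the usual definition of an ACR as the acute value divided by the chronic value, and applies the ACR ranges quoted above to a hypothetical acute value of 100 mg/kg; the function names are illustrative only.

```python
def chronic_from_acute(acute_value, acr):
    """Chronic value implied by an ACR, assuming ACR = acute value / chronic value."""
    return acute_value / acr


def implied_chronic_range(acute_value, acr_range):
    """Lowest and highest chronic values implied by a (low, high) range of ACRs."""
    low_acr, high_acr = acr_range
    return (chronic_from_acute(acute_value, high_acr),
            chronic_from_acute(acute_value, low_acr))


acute_value = 100.0  # hypothetical acute toxicity value, mg/kg
acr_ranges = {
    "organic substances": (2.0, 28.0),      # range quoted in the text
    "inorganic substances": (20.0, 200.0),  # "20 to nearly 200" in the text
}

for label, acr_range in acr_ranges.items():
    low, high = implied_chronic_range(acute_value, acr_range)
    print(f"{label}: implied chronic value between {low:.1f} and {high:.1f} mg/kg")
```

For the same acute value, the implied chronic value differs 10-fold or more within each substance group, depending only on which ACR is chosen.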

Acute and chronic exposures result in significantly different physiological effects due to species- and exposure-specific variation in absorption, distribution, metabolism, and excretion (ADME) rates. For example, after administration of a single oral dose of DDT to dogs, the highest concentration of DDT was found in the bile, with moderate amounts in the central nervous system (CNS) and blood, and small amounts in the kidney and liver (St. Omer 1970). However, after 2 weeks of feeding, DDT was found in fat, skin, muscle, and kidney; it did not begin to show up in other organs until at least 4 weeks of exposure. Cook and Trainer (1966) showed that lead poisoning from acute exposures in mallards results in a rapid increase in blood lead levels and subsequent mortality due to peripheral nervous system effects, while a lower dose chronic exposure has much slower uptake, as evidenced by lower blood lead levels and neurotoxic effects to the central nervous system. Acute poisoning by DDT is evidenced by CNS signs quickly followed by death (St. Omer 1970), whereas chronic toxicity affects endocrine functions (e.g., prostaglandin synthesis in the eggshell gland mucosa by p,p′-DDE, resulting in eggshell thinning; Lundholm 1997) and acts as a potent androgen receptor antagonist (Kelce et al. 1996). Therefore, differences in ADME for acute and chronic exposures may prevent the development of chronic TRVs from acute toxicity data.

Given the likelihood of different ADME rates, toxicological endpoints, and dose–response curves between acute and chronic exposures to the same chemical, it seems unwise to use TRVs based on acute studies when assessing risks from chronic exposures. Chronic TRVs based on acute data will likely be incorrect and targeted on inappropriate toxicological endpoints. Dividing the acute TRV by an uncertainty factor will not correct this misalignment or necessarily result in a more conservative estimate. Rather, the lack of a chronic TRV should be identified as a data gap and discussed as part of the uncertainty analysis. If there are data to support an extrapolation from acute exposures to develop a chronic TRV, then these data should be documented and the extrapolation can be done.

Data-dependent options for deriving TRVs

Ideally, all TRVs should be derived with a thorough understanding of the underlying mechanism of toxicity and physiological differences between species. Dose–response relationships are best used to illustrate these points. In screening assessments, this information can be used to derive single point estimates (e.g., EDx values), while in a risk assessment context (see text box), the underlying dose–response distribution can be used directly for understanding the likelihood and magnitude of potential effects as well as the response to incremental increases in exposure.

The use of EDx-based TRVs (which still leads to the calculation of hazard quotients [HQs]; see next section of the study) is flexible in that different protection goals (i.e., allowable magnitude of effects) can be tailored for each assessment endpoint, which in turn may reflect different land uses or different target species (e.g., rare vs common species). Practically, the development of dose–response relationships for many chemicals and wildlife species often is constrained by data limitations. These challenges have led to the use of TRVs based on NOAELs and/or LOAELs. However, even in data-poor situations, we recommend that TRVs be derived by extracting dose–response information (e.g., dose and effects level for each treatment) from the study reports or publications, rather than relying on the reported NOAEL or LOAEL. We recognize that moving away from NOAEL- and LOAEL-based TRVs is challenging for various reasons, not the least of which is the existing regulatory precedent (cf. Hope 2009). To bridge this practice with that recommended herein, risk practitioners may wish to show where NOAELs and LOAELs fall on the dose–response relationship to provide context to the EDx-based screening assessment.

Because the selection of a TRV for use in wildlife assessment is inherently a data-dependent process, TRV derivation options will vary according to the quantity and specificity of toxicity data used. All options require extraction of dose and response data from pertinent wildlife toxicological studies, when possible. For example, if a study has 5 treatments, individual doses and effect sizes can be extracted for each treatment, instead of simply determining NOAEL/LOAEL values. In some cases, this is relatively straightforward, because the studies were designed for the purpose of establishing dose–response relationships. Unfortunately, this is not typically the case and published studies often do not include enough data to reconstruct a dose–response curve. Thus, “data mining” of the scientific literature needs to be conducted to build a relevant data set. Once a dose–response dataset has been assembled, the decision on which TRV derivation approach is appropriate to follow is dependent on data quantity and receptor specificity, but mostly is driven by the data quality objectives from the problem formulation. Three different potential approaches are described below. (When no toxicity data are available [e.g., evaluation of proposed compounds], QSARs may provide a method of estimating the toxic properties of a compound, using the physical and structural characteristics of this compound relative to a toxicity data set from similarly structured compounds. However, options for QSAR derivation are beyond the scope of this study.)

Ideally, enough data are available to fit species-specific dose–response curves for many species. Point estimate (e.g., EDx) values from each curve could be combined to build a species sensitivity distribution (SSD) (e.g., to estimate an EDx-based TRV protective of y% of species) or variability among the curves could be used to predict a range of possible dose–response relationships for any species. However, because toxicity studies for vertebrates collect a wide range of continuous data and integrate many different methods, all such data are rarely equivalent (an important criterion in developing SSDs). While the SSD approach is commonly used in aquatic risk assessment (e.g., Newman et al. 2000; Baird and Van den Brink 2007), SSDs have rarely been developed for vertebrate wildlife species (e.g., Moore et al. 2006) because of their substantial data needs and differences between study designs and reported results.
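The SSD idea can be sketched in a few lines. The example below assumes, for illustration only, that species' EDx values follow a log-normal distribution and reads off the dose expected to lie below the EDx of all but a chosen fraction of species; the species labels, ED20 values, and function name are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def hazardous_dose(edx_values, fraction_affected=0.05):
    """Dose below which `fraction_affected` of species' EDx values are expected to fall.

    Assumes (hypothetically) a log-normal species sensitivity distribution
    fitted by the mean and sample standard deviation of the log10 EDx values.
    """
    logs = [math.log10(v) for v in edx_values]
    ssd = NormalDist(mu=mean(logs), sigma=stdev(logs))
    return 10 ** ssd.inv_cdf(fraction_affected)


# Hypothetical ED20 values (mg/kg body weight/d) for five species.
ed20_by_species = {
    "species_A": 5.0,
    "species_B": 12.0,
    "species_C": 20.0,
    "species_D": 48.0,
    "species_E": 110.0,
}

trv_95 = hazardous_dose(ed20_by_species.values(), fraction_affected=0.05)
print(f"ED20-based TRV intended to protect 95% of species: {trv_95:.1f} mg/kg/d")
```

Five values are far too few for a defensible SSD; the small data set is used only to keep the arithmetic visible, and it mirrors the data limitations described above.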

It is more likely that TRVs will be generated using single dose–response curves. Receptor-specific models can be developed when sufficient data exist (e.g., Kerr and Meador 1996; Moore et al. 1997, 1999, 2003; Wayland et al. 2007). In cases where data are more limited, it may be possible to combine dose and response data from different species or endpoints and use the whole data set for deriving the model. EDx or benchmark dose (BMD) methods can be used to derive TRVs (Caux and Moore 1997; Moore and Caux 1997; USACHPPM 2000). A BMD corresponds to the statistical lower confidence limit on the dose producing a predetermined level of change in adverse response compared with the response in untreated animals, and it considers the whole dose–response relationship.
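As one illustration of an EDx calculation, the sketch below fits a two-parameter log-logistic curve to treatment-level data by ordinary least squares on the logit scale and then inverts the curve for a chosen reduction x. The model form, the fitting shortcut, and the five-treatment data set are assumptions made for the example; dedicated dose–response software would normally be used, and a BMD would additionally require a lower confidence limit that is not computed here.

```python
import math


def fit_log_logistic(doses, relative_responses):
    """Fit r = 1 / (1 + (dose / ed50) ** slope) by least squares on the logit scale.

    relative_responses: endpoint performance as a fraction of the control mean,
    each strictly between 0 and 1.
    """
    x = [math.log(d) for d in doses]
    y = [math.log((1.0 - r) / r) for r in relative_responses]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    ed50 = math.exp(-intercept / slope)
    return ed50, slope


def effective_dose(ed50, slope, reduction):
    """Dose giving the stated fractional reduction relative to control (e.g., 0.2 for ED20)."""
    return ed50 * (reduction / (1.0 - reduction)) ** (1.0 / slope)


# Hypothetical study with 5 treatments: dose (mg/kg/d) and response relative to control.
doses = [2.5, 5.0, 10.0, 20.0, 40.0]
relative_responses = [0.941176, 0.8, 0.5, 0.2, 0.058824]

ed50, slope = fit_log_logistic(doses, relative_responses)
ed20 = effective_dose(ed50, slope, 0.20)
print(f"ED50 = {ed50:.1f}, slope = {slope:.2f}, ED20 = {ed20:.1f} mg/kg/d")
```

The value of x passed to `effective_dose` is where the acceptable reduction defined in the problem formulation enters the calculation.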

Finally, when available data do not support the formal derivation of dose–response curves, the assessor still has better options for TRV derivation than relying on NOAEL/LOAEL estimates. For example, the dose–response data can be plotted (e.g., a scatterplot) and examined visually; the underlying relationship can be used to select a TRV. The dose–response relationship (strong or weak) can also provide insights into uncertainty and the possible implications of exposure to doses exceeding the TRV.


Hazard quotient methodology

Hazard quotients (HQs) are the primary method used in screening (or hazard) assessments to compare the expected magnitude of chemical exposure with a TRV, with an HQ <1 generally being used as a default de minimis risk value. Conservative assumptions are used to estimate the exposure and TRV to avoid type II error (i.e., to conclude that there is no significant or unacceptable hazard when in fact there is). Consequently, there is a high degree of confidence that chemicals with HQs <1 do not pose unacceptable hazard. However, given the high probability of a type I error (i.e., to conclude unacceptable risks when in fact there are none), HQs ≥1 typically require further studies to refine uncertainties before making risk management decisions. Interpretation of the hazard assessment is always influenced by the precision and the accuracy of both the exposure concentration and the TRV, which dictates the extent of confidence one can have in the resulting HQ.
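The screening step described here reduces to a ratio and a binary decision, as in the following minimal sketch. The chemical names, exposure estimates, and TRVs are hypothetical placeholders.

```python
def hazard_quotient(exposure, trv):
    """Ratio of the exposure estimate to the TRV; both must be in the same units."""
    if trv <= 0:
        raise ValueError("TRV must be positive")
    return exposure / trv


def screen(chemicals):
    """Return the chemicals retained for further evaluation (HQ >= 1)."""
    return [name for name, (exposure, trv) in chemicals.items()
            if hazard_quotient(exposure, trv) >= 1]


# Hypothetical (exposure, TRV) pairs in mg/kg body weight/d.
chemicals = {
    "chemical_A": (0.5, 1.0),
    "chemical_B": (3.0, 2.0),
    "chemical_C": (2.0, 2.0),
}

retained = screen(chemicals)
print("Retained for further evaluation:", retained)
```

The output is only a list of chemicals to carry forward; it does not describe the probability or magnitude of effects.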

HQs have been misused in many ways, irrespective of whether the denominator is an EDx-based TRV or not. Three common errors are as follows: 1) assignment of false precision to HQ values; 2) classification of HQ values as low, medium, and high risk; and 3) addition of HQ values to obtain a composite HQ (or Hazard Index; HI).

False precision

The number of significant figures in the HQ is dictated by the confidence or certainty one has in the input data. Moreover, the least certain datum determines the number of significant figures in the resulting calculations. For example, the sum of 1 (with 1 significant figure) and 0.02001 (with 4 significant figures; note that leading zeros do not count) becomes 1, with 1 significant figure. If these were chemical concentrations in some medium and if greater precision were used in determining the first value to be 1.00, then the sum would be 1.02, with 3 significant figures. Given that diets of wildlife at best are expressed with 2 significant figures (0.50 of item A, 0.30 of item B, and 0.20 of item C), the best a resulting HQ can have is 2 significant figures. Note, however, that the confidence one has in the dietary splits is likely to support only 0.5, 0.3, and 0.2, respectively, which reduces the resulting HQ to 1 significant figure; in other words, possible HQ values are 0.1 to 0.9, 1, 10, etc.
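
The arithmetic above can be checked with a small helper that rounds a value to a chosen number of significant figures. The helper name `round_sig` and the exposure and TRV values are assumptions for illustration.

```python
import math

def round_sig(value, figures):
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, figures - 1 - exponent)

# The sums from the text: the least certain term limits the result.
sum_one_fig = round_sig(1 + 0.02001, 1)       # first term known to 1 figure
sum_three_figs = round_sig(1.00 + 0.02001, 3)  # first term known to 3 figures

# A hypothetical HQ reported to the 1 significant figure the inputs support.
hq_reported = round_sig(0.7482 / 2.0, 1)
```

Reporting the last value as 0.3741 would imply a precision that the dietary assumptions cannot support.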


Classification of HQ values

There has been great temptation to assign degrees of severity to ranges of HQ values. A central problem with the classification of HQs is that the underlying toxicity information used to identify a TRV is almost always nonlinear. A 10-fold higher concentration of a particular chemical exposure (e.g., Pb) may bring about only a minor increase in toxic response, whereas a 2-fold higher exposure concentration of another chemical (e.g., Se) may be lethal. Because the primary use of the HQ is to determine if further work is warranted, it is sufficient to use the HQ to make binary decisions: screened out versus screened in for subsequent analysis. A more appropriate option for evaluating the severity of effects involves the use of multiple TRVs (see below; for example, several EDx values covering a range of effect severity). This “multiple HQ option” would likely facilitate discussions with risk managers by showing where the predicted dose falls relative to several discrete x values.
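
The multiple-TRV option described above can be sketched as follows. The EDx values, the predicted dose, and the function names are hypothetical, and the sketch assumes that EDx increases with x.

```python
# Hypothetical EDx-based TRVs (mg/kg bw/day) for one chemical and receptor,
# keyed by effect size x in percent.
edx_trvs = {10: 1.7, 20: 2.8, 50: 6.8}
predicted_dose = 3.5

def multiple_hqs(dose, trvs):
    """HQ of the predicted dose against each EDx-based TRV."""
    return {x: dose / trv for x, trv in sorted(trvs.items())}

def highest_exceeded(dose, trvs):
    """Largest effect size x whose EDx is met or exceeded, or None."""
    exceeded = [x for x, trv in trvs.items() if dose >= trv]
    return max(exceeded) if exceeded else None

hqs = multiple_hqs(predicted_dose, edx_trvs)
bracket = highest_exceeded(predicted_dose, edx_trvs)
```

With these inputs the predicted dose falls between the ED20 and the ED50, which says more about potential severity than a single HQ labeled "medium".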


Addition of HQ values

The same reasons against using the HQs for classification apply here, plus others. Varying dose–response relationships (i.e., slopes and intercepts) and modes of action make summing HQ values (i.e., calculating a hazard index, HI) inappropriate. There are some situations in which one legitimately may have concern about additive responses (i.e., when chemicals “act together” to produce effects without enhancing or diminishing each other's action), for example with certain metals or PCB congeners. In such cases, the first step is to identify the substances with similar modes of action. Next, one would determine a toxicity equivalency among the substances (i.e., X = 1.5Y, X = 4Z). The final step would be to convert the environmental exposures into the common currency X, so that a truly additive response would be obtained. However, this is very difficult to do with much certainty and, therefore, such manipulations of data should be used sparingly. This becomes particularly difficult when attempting to extrapolate toxicity data from one biological species to another, because most species will respond differently to the different substances. Notwithstanding, some regulatory schemes allow for either simple mathematical additivity (as a simple screening step, despite its weaknesses; cf. Kortenkamp 2007) or a target-organ-specific hazard index (TOSHI). The latter is often hard to accomplish, and even harder to apply in a population or community effect context. Thus, the default approach becomes chemical effects addition so that cumulative impacts are not completely ignored. This is a policy-driven approach meant to fill in a data gap and as such invites legitimate skepticism regarding the meaning of an HI ≥ 1.
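
The three-step conversion to a common currency can be sketched as below. The sketch assumes one particular reading of "X = 1.5Y, X = 4Z", namely that one unit of X is as toxic as 1.5 units of Y or 4 units of Z; the opposite reading would multiply rather than divide. The doses and the TRV for X are hypothetical.

```python
# Assumed reading of the equivalencies: units of each substance that match
# the toxicity of one unit of X.
UNITS_PER_UNIT_X = {"X": 1.0, "Y": 1.5, "Z": 4.0}

def x_equivalent_dose(doses):
    """Sum the doses after converting each one to X-equivalents."""
    return sum(dose / UNITS_PER_UNIT_X[name] for name, dose in doses.items())

# Hypothetical exposure doses (mg/kg bw/day) for substances sharing a mode
# of action, and a hypothetical TRV for X in the same units.
exposure = {"X": 2.0, "Y": 3.0, "Z": 8.0}
trv_x = 12.0

total_x_eq = x_equivalent_dose(exposure)  # 2.0 + 2.0 + 2.0
hq_x_eq = total_x_eq / trv_x
```

The result is a single HQ in X-equivalents rather than a sum of separate HQs, and it is only as reliable as the equivalency factors for the species being assessed.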

Moving beyond hazard quotients

As described in the previous section, the HQ is essentially a binary output: chemical/receptor combinations are either screened out or retained for subsequent assessment tiers, up to and including a risk assessment. A key improvement in wildlife assessment involves the replacement of NOAEL- or LOAEL-based TRVs with EDx-based TRVs; however, moving beyond HQs requires abandoning all single point expressions of exposure and effect. For the purpose of this study, we do not review options to refine the exposure assessment and assume that exposure is either represented as a single point estimate or as a distribution. The options available for the effects assessment all involve incorporating the dose–response distribution underlying the TRV, but are data-dependent (e.g., SSD, dose–response curve, or scatter plots).

Three examples are presented (Figures 1 to 3) to illustrate the most common approaches for using dose–response information in wildlife assessment; a detailed discussion of the specific methods involved in each example is not included because the goal of this study was only to provide an overview of potential options. When a dose–response model is available, it is possible to determine the specific effects level associated with a particular dose, such as reproductive effects of methyl mercury to the common loon (Gavia immer) (Figure 1). When data are limited, a dose–response model may be developed based on combined data from multiple species (Figure 2 displays potential reproductive effects of methyl mercury to a range of avian species for which mercury-specific toxicity data were unavailable). Finally, in data-poor situations (e.g., avian cobalt toxicology), scatter plots can be used to place the predicted dose into context from an effects perspective, but with uncertainty being less quantifiable (Figure 3). No example is provided for the SSD approach, as it would be rare to have sufficient wildlife toxicity data to pursue this option.

Figure 1.

Dose–response model for a single species. In this example, sufficient data were available to develop a model for the reproductive effects of methyl mercury to the common loon, Gavia immer. The magnitude-of-response y-axis scale extends from 0 to 1 (e.g., 0.7 corresponds to a 70% effect size such as reduction in reproductive output relative to control). Data points are labeled as follows: loon (L).

Figure 2.

Dose–response model for multiple species. In this example, data were combined from several avian species to develop a general model for the reproductive effects of methyl mercury to avian receptors. The magnitude-of-response y-axis scale extends from 0 to 1 (e.g., 0.5 corresponds to a 50% effect size such as reduction in reproductive output relative to control). Data points are labeled as follows: chicken (C), duck (D), heron (H), loon (L), osprey (Y), pheasant (P), quail (Q), tern (T).

Figure 3.

Scatter plots of toxicological data for multiple species with sublethal (A) and lethal (B) endpoints. In this example, there were insufficient cobalt data for developing a dose–response model. However, available data were plotted to show the general trends in effect sizes across a range of doses for sublethal (A) and lethal (B) endpoints. For plot A, the magnitude-of-response y-axis scale corresponds to an effect size such as reduction in reproductive output relative to control. For plot B, the magnitude-of-response y-axis scale corresponds to mortality relative to the control. Data points are labeled as follows: chicken (C) and duck (D).

Note that these examples have potential application for risk assessments as well as screening assessments. If a range of exposure doses has been calculated (e.g., a probabilistic distribution), then comparisons can be made to either dose–response models or to the dose–response data set (i.e., scatter plot). In the first situation, there are several “joint probability” methods (e.g., Jacobs and Vesilind 1992; Solomon et al. 2000; Suter 2007) that consider both the exposure and effects distributions to develop true risk estimates (i.e., both probability and magnitude of adverse effects). Some examples of studies where these techniques have been applied in wildlife ecological risk assessments include Moore et al. (1997, 1999, 2003) and Wayland et al. (2007). In cases where dose–response relationships were not modeled, the exposure distribution could be used to estimate the probability of exceeding one (or several) EDx values (i.e., derived using scatter plots). In addition, the potential severity of effects associated with the portion of the dose distribution exceeding the EDx can be put into context using the underlying dose–response information.
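
A minimal sketch of the exceedance calculation follows, assuming that exposure dose is lognormally distributed. The distribution parameters, the EDx values, and the function name are hypothetical. This is a simplified illustration of pairing an exposure distribution with several effect levels, not an implementation of any of the cited joint probability methods.

```python
import math
from statistics import NormalDist

def exceedance_probability(edx, mu_log, sigma_log):
    """P(dose > edx) when ln(dose) is Normal(mu_log, sigma_log)."""
    return 1.0 - NormalDist(mu_log, sigma_log).cdf(math.log(edx))

# Hypothetical exposure distribution: median dose 2.0 mg/kg bw/day,
# standard deviation of ln(dose) equal to 0.5.
mu_log = math.log(2.0)
sigma_log = 0.5

# Hypothetical EDx values (mg/kg bw/day), keyed by effect size x in percent.
edx_values = {10: 1.7, 20: 2.8, 50: 6.8}

# Probability of exceeding each effect level: a coarse risk curve that
# pairs magnitude of effect (x) with probability of occurrence.
risk_curve = {
    x: exceedance_probability(edx, mu_log, sigma_log)
    for x, edx in edx_values.items()
}
```

Each entry pairs a magnitude of effect with the probability that exposure is high enough to produce it, which is the information a single HQ cannot convey.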

Uncertainty in the TRV

It is easy to think of a TRV as a “stand-alone” number. In reality, every TRV derived from a dose–response relationship has an associated probability (p) of that amount of toxicant eliciting a specified response (i.e., effect size x) for a specified assessment endpoint, as well as an associated uncertainty. When p is not known or expressed, risk managers (and stakeholders) must simply be assured qualitatively, with some certainty, that p is at or near an acceptable level (i.e., to reduce the likelihood of type II errors). This need for assurance that the threat is not being underestimated has encouraged the use of numerous cascading uncertainty factors (Baron and Wharton 2005; Duke and Taggart 2000) and precautionary assumptions when deriving the TRV (and in the assessment itself). The lowest possible TRV becomes increasingly attractive, even if it actually (but unknowably) represents some vanishingly small value of p.

If data underlying the TRV selection process are sufficiently robust, it may be possible to estimate the p associated with the TRV. Similarly, given sufficient data, an upper 1-sided tolerance interval could be used to give a TRV with a specified certainty of a specified percentage of the responses being below it. Either of these refinements may give risk managers a better sense of whether they are likely to be over- or underestimating the threat.

When uncertainty associated with a TRV is illustrated with a dose–response relationship (in this case, a hypothetical one; Figure 4) for a single species, it becomes apparent that different p values for each possible EDx-based TRV could be derived from such a curve. Furthermore, there is also a range of TRVs at a given p. These are manifestations of uncertainty, a combination of stochastic variability (natural heterogeneity) and incertitude (lack of knowledge). This uncertainty has numerous sources, including (but not limited to): extrapolation from high-dose, short-term exposures used in tests to lower, long-term exposures typically encountered in the field; extrapolation of results in common test species to those anticipated in free-living species; extrapolation from results in homogeneous test populations to those in more variable wild populations; effects of multiple stressors (including, but not limited to, chemical stressors) on free-living populations; interspecies variation in response; and difficulties in measuring changes in the desired toxicity endpoint.

Figure 4.

A hypothetical dose–response relationship. The data points (□, 1 to 5) are the results of five individual toxicity tests on the same species (or a species sensitivity distribution (SSD) if these data points were test results from five different species). Here, these data are fit to a lognormal distribution, with uncertainty in the response dose at a specific percentile expressed as a prediction interval; however, different distributions and intervals are possible.


New toxicological studies

Most of the current challenges in TRV derivation can be traced to the fact that the majority of data extracted for use in TRV derivation are from older studies that were not intended for this use. The limited toxicological data regarding the effects of common pollutants on relevant and appropriate wildlife species continue to be a critical data gap. Future studies should be designed explicitly to facilitate TRV derivation. Key issues include the need for accurate measurement of food and water consumption rates and changes in test organism body mass during the study; inclusion of sufficient test treatments to support the fitting of dose–response curves; and collection of biological tissues during the study to provide a means of connecting toxicological endpoints to tissue concentration. Examples of this last issue include TRVs calculated for blood lead concentrations for birds (Pain 1996; Johnson, Wickwire, et al. 2007) and for RDX (hexahydro-1,3,5-trinitro-1,3,5-triazine) in quail (Johnson, McFarland, et al. 2007), which can support nonlethal field sampling techniques that generate empirical exposure doses (e.g., Johnson, McFarland, et al. 2007) rather than focusing exclusively on estimated exposures. Developments in wildlife-specific, physiologically based toxicokinetic models would also help in this regard.

There is a need for data from wildlife species beyond the common test species. Most toxicological data used in TRV derivation involve common laboratory species; few wildlife species are used for controlled toxicological investigations and more are needed. Wildlife laboratory studies for birds are largely limited to northern bobwhite, Japanese quail, mallard, zebra finch, pigeon, and other granivorous species; however, other predatory models have been developed in outdoor applications (e.g., American kestrel, eastern screech owl). Laboratory studies for mammals include primarily rodents (e.g., voles [Microtus], New World mice [Peromyscus]) and shrews, though some studies for outdoor applications have also been developed (e.g., mink). We recognize that proper husbandry and care of wildlife species in the controlled laboratory environment can be challenging, and, given these limitations, investigators should focus on wildlife that can provide information regarding the physiological differences that may affect absorption. Few interspecific comparisons have been conducted due to these data limitations; however, it would be prudent to focus on a few selected chemicals in a collaborative effort. An initial goal should be to determine whether or not general recommendations for interspecies variation can be developed, and, if so, to ascertain the likely magnitude of interspecific variation to assist in developing appropriate uncertainty bounds for other chemicals that will likely continue to have limited data.

Existing toxicological data

Although toxicological data are limited for many chemicals, these data have not been used to their fullest advantage. Literature reviews have not emphasized extraction of the entire dose–response relationship, and returning to the original literature to do so will require a great amount of effort. The ECOTOX database used in the USEPA's Eco-SSL derivation process provides a logical starting point for this exercise given the level of effort already expended on literature identification and evaluation. We are aware that TRVs have been repetitively derived for the same chemical/receptor combinations in individual risk assessments, but these are rarely published in the peer-reviewed literature and are, therefore, lost to general use. We encourage the publication of these important works; alternatively, there may be interest on the part of one or more regulatory agencies to provide support in compiling and sustaining these efforts as a central repository of data.


The issues related to the derivation and application of TRVs as described in this study have resulted in significant inconsistencies in wildlife assessments. These inconsistencies are due, in part, to a lack of clarity on the part of both risk assessors and risk managers regarding the critical need to distinguish between hazard and risk assessments. The understandable desire to avoid type II errors (i.e., to conclude that there is not unacceptable hazard when in fact there is) at the hazard assessment stage has resulted in many procedures leading to ultra-conservative generic TRVs. The so-called refinement of these generic TRVs for risk assessment purposes has led to questionable practices that fail to recognize that HQs can only measure hazard, irrespective of the illusory sophistication of the refinement. We recommend that all assessors recognize this dichotomy and its implications for wildlife assessments. Tabular summaries of NOAELs and LOAELs have no biological relevance despite their widespread regulatory acceptance; instead, at a minimum, data should be extracted from the underlying studies and examined in detail to generate appropriate EDx-based TRVs for hazard assessment purposes. Hazard assessments should not overstate their precision, and thus their value for site management purposes, by portraying hazard quotients as a quantitative measure of risk. In the event that a risk assessment is warranted, the assessor should be prepared to invest considerable resources to pursue options that move beyond hazard quotients. Achieving this ultimate goal of providing genuine risk estimates for wildlife exposed to chemical contaminants will require both regulatory consensus and a level of cooperation in research and compilation activities that is not currently available.


The authors thank the British Columbia Ministry of Environment for providing funding for two workshops to develop this study and the Pacific Northwest Chapter of the Society of Environmental Toxicology and Chemistry for administrative support, as well as all survey respondents who provided constructive comments on earlier presentations of our recommendations. All views and opinions expressed herein are those of the authors and do not necessarily represent those of any other government, public, or private entity. No official endorsement is suggested or is to be inferred. Note that authors are listed in alphabetical order and that all made substantive contributions to this study.