Expert opinion on toxicity profiling—report from a NORMAN expert group meeting



This article describes the outcome and follow-up discussions of an expert group meeting (Amsterdam, October 9, 2009) on the applicability of toxicity profiling for diagnostic environmental risk assessment. A toxicity profile was defined as a toxicological “fingerprint” of a sample, ranging from a pure compound to a complex mixture, obtained by testing the sample or its extract for its activity toward a battery of biological endpoints. The expert group concluded that toxicity profiling is an effective first tier tool for screening the integrated hazard of complex environmental mixtures with known and unknown toxicologically active constituents. In addition, toxicity profiles can be used for prioritization of sampling locations, for identification of hot spots, and—in combination with effect-directed analysis (EDA) or toxicity identification and evaluation (TIE) approaches—for establishing cause–effect relationships by identifying emerging pollutants responsible for the observed toxic potency. Small volume in vitro bioassays are especially applicable for these purposes, as they are relatively cheap and fast with costs comparable to chemical analyses, and the results are toxicologically more relevant and more suitable for realistic risk assessment. For regulatory acceptance in the European Union, toxicity profiling terminology should keep as close as possible to the European Water Framework Directive (WFD) terminology, and validation, standardization, statistical analyses, and other quality aspects of toxicity profiling should be further elaborated. Integr Environ Assess Manag 2013; 9: 185–191. © 2013 SETAC


Chemical quality status of the environment is usually assessed by comparing measured contaminant levels in biotic and abiotic samples to compound-specific quality standards that are defined in legal frameworks such as the European Water Framework Directive (WFD) (EC 2000, 2008) and the US Clean Water Act (1977). In addition to these targeted chemical analyses, many research groups have performed biomarker measurements to determine environmental quality. A biomarker is defined as “a change in a biological response that can be related to an exposure to, or toxic effect of, an environmental chemical or chemicals” (Peakall and Shugart 1993). Biomarkers often represent subtle and reversible effects that reflect a specific toxicological mechanism that may be linked to effects at whole organism levels such as survival, growth, and reproduction. Biomarker responses can thus be considered as early warning signals for irreversible adverse effects at higher exposure levels. Biomarkers can be measured in feral or caged organisms exposed in situ, but also in bioassays, which are biological test systems in which whole organisms or parts of organisms (e.g., tissues, cells, proteins) show a quantitative response when exposed to individual chemicals or complex environmental mixtures thereof.

The main advantage of using bioassays for environmental monitoring purposes is that the integrated toxic potency of complex environmental mixtures can be determined as a whole (including chemical interactions such as additivity, synergism, or antagonism), even when the chemical composition of the mixture is unknown and the active constituents of the mixture are not identified. Despite this advantage, effect-based toxicological monitoring has so far been restricted mainly to research applications and has been incorporated in regulatory monitoring programs only to a limited extent. Most likely, this is because many bioassay and biomarker methods are still in an early phase of standardization. In addition, no environmental quality criteria have been derived for the different endpoints used in effect-based methods, to which the obtained monitoring results can be compared. Finally, the acceptance of effect-based monitoring techniques requires a paradigm shift from measuring individual compounds that may have regulatory criteria toward measuring integrated toxicity that may be indicative of possible effects at the population level.

Many environmental scientists, however, recognize that panels of bioassays and biomarkers have added value for environmental monitoring (Eggen and Segner 2003; Houtman et al. 2004; Escher et al. 2008; ICES 2011). In a recent Integrated Environmental Assessment and Management (IEAM) publication, we explored the applicability of using a battery of effect-based bioanalytical techniques for the purpose of sediment quality assessment (Hamers et al. 2010). Ideally, such a battery will be able to quantify multiple specific modes of action, which when taken together will cover the spectrum of toxicological “syndromes” as comprehensively as possible. For each sample tested, the combined results from the test battery form a unique toxicological profile indicating which endpoints are most or least affected by the contaminants in the sample. Effect-based measurements with such a battery can thus be regarded as a “safety net” for signaling the presence of toxic potency in an environmental sample.

To discuss and achieve a common position on the concept of toxicity profiling (Hamers et al. 2010) and its possible applications for environmental pollution monitoring purposes, the NORMAN network of reference laboratories for monitoring of emerging environmental pollutants organized an expert group meeting (Amsterdam, October 9, 2009) with peer European scientists working on effect-based monitoring strategies and techniques in marine, freshwater, and soil environments. Items that were addressed included a common definition of toxicity profiling, its application in environmental risk assessment, sampling strategies and sample pretreatment methods, bioassay methods, and uncertainties in toxicity profiling. The current article describes the outcome of the meeting and the consequent follow-up discussions within the expert group.


The expert meeting agreed on the following common definition of a toxicity profile:

“A toxicity profile is a toxicological “fingerprint” of a sample, ranging from a pure compound to a complex mixture, obtained by testing the sample or its extract for its activity toward a battery of biological endpoints.”


When testing individual compounds, toxicity profiling can be used as a hazard characterization in prospective risk assessment to make a prognosis of possible environmental effects of new or existing chemicals. For instance, Hamers et al. (2006) determined the toxicity profiles of 27 individual brominated flame retardants regarding their endocrine disrupting potency using bioassays covering 12 different modes of action.

When testing complex mixtures in environmental samples, toxicity profiling can be used for environmental monitoring in retrospective risk assessment to make a diagnosis of the environmental quality of existing situations in the field. For instance, Houtman et al. (2004) determined the toxicity profiles of sediments from 15 different sampling locations in the Dutch Rhine-Meuse estuary in 5 different in vitro bioassays. It is this latter diagnostic application of toxicity profiles for complex mixtures in environmental samples that was further discussed in the expert group meeting. Three approaches were discussed to apply toxicity profiling for environmental quality assessment (see Hamers et al. [2010] for an illustrative diagram), i.e., (1) translating toxicity profiles into hazard profiles indicating the relative distance to a baseline toxicity profile, reflecting desired or acceptable environmental quality; (2) translating the toxicity profiles into risk profiles indicating the ratio between the actual bioassay response and a bioassay response considered to be safe for environmental health; and (3) selecting samples with relatively high toxic potency for further identification of causative compounds using in depth effect-directed analysis (EDA) strategies.

Approach 1: Translating toxicity profiles into hazard profiles

By translating toxicity profiles to hazard profiles, each measured response in the battery can be classified as similar or as slightly, moderately, or highly increased compared to a reference situation. A moderate to high response can be considered as an early warning asking for further investigation. Such an early warning signal can thus be considered as an alarm bell that impairment may be occurring, which demonstrates the safety net function of the toxicity profiling approach as a first tier in environmental monitoring strategies that ultimately may lead to further toxicological, chemical, and ecological studies. To fulfill this safety net function, fast and cost-effective bioassays are required. In contrast to the other approaches described below, the expert group considered levels of standardization and validation of the bioassays to be less important in this safety net context, as long as false negatives are avoided.

To classify a given bioassay response as high or low, Hamers et al. (2010) suggested using the ratio between each toxicity profile and an adequate baseline toxicity profile. In this context, toxicity profiles from reference locations could be used as baseline toxicity profile, whereas the choice of reference locations could be based on expert judgment or on good chemical quality, but preferably on good ecological quality (Hamers et al. 2010). It should be realized that such a baseline toxicity profile has temporal and spatial variability that should be taken into account when translating toxicity profiles into hazard profiles. The expert group suggested alternative ways to discriminate bioassay responses associated with high or low likelihood of good environmental quality, such as the use of z-scores indicating which responses deviate from average.
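The ratio-to-baseline and z-score ideas can be illustrated with a minimal sketch. The endpoint names, response values, and class cutoffs below are hypothetical placeholders chosen for illustration only; they are not values proposed by the expert group or by Hamers et al. (2010).

```python
from statistics import mean, stdev

LABELS = ("similar", "slightly increased", "moderately increased", "highly increased")


def hazard_profile(sample, baseline):
    """Ratio of each bioassay response to the response in the baseline profile."""
    return {endpoint: sample[endpoint] / baseline[endpoint] for endpoint in sample}


def classify(ratio, cutoffs=(2.0, 5.0, 10.0)):
    """Map a ratio onto one of four classes.

    The cutoffs are illustrative assumptions, not recommended thresholds.
    """
    return LABELS[sum(ratio >= cutoff for cutoff in cutoffs)]


def z_scores(values):
    """Standardize responses from several locations for one endpoint."""
    m, s = mean(values), stdev(values)
    return [(value - m) / s for value in values]


# Hypothetical responses, each in its own bioassay-specific unit.
baseline = {"estrogenicity": 0.5, "dioxin_like": 10.0, "mutagenicity": 1.0}
sample = {"estrogenicity": 3.0, "dioxin_like": 12.0, "mutagenicity": 12.0}

ratios = hazard_profile(sample, baseline)
classes = {endpoint: classify(ratio) for endpoint, ratio in ratios.items()}
```

In practice the baseline values would themselves carry temporal and spatial variability, so a single fixed ratio per endpoint is a simplification.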

In addition, the natural break algorithm (Jenks 1977) was proposed to derive threshold levels. This classification method determines at what value a biological response measured in the field (e.g., ecological index, species composition) is most responsive to a steering parameter (i.e., the measured toxic potency). It yields a binary decision: a steering parameter higher than the determined value leads to an impaired biological response and a steering parameter lower than the determined value does not. Kapo and Burton (2006) used this method to establish spatial associations between discrete biological conditions (impaired or nonimpaired) and multiple steering stressor parameters in a multidimensional application of an epidemiological-based approach combining GIS-based, weight-of-evidence (WoE), and weighted logistic regression (WLR) techniques. Kapo and Burton (2006) quantified “impairment thresholds” for different stress factors, including a concentration-derived prediction of mixture toxicity expressed as the multisubstance potentially affected fraction of species (msPAF) (De Zwart and Posthuma, 2005). In principle, the same method may be applicable for classification of bioassay responses to complex environmental mixtures of pollutants into effective and ineffective toxic potencies.
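As a rough illustration of the natural-break idea, the sketch below splits a set of one-dimensional values into two classes by minimizing the summed squared deviation within each class. This is a simplified, one-variable, two-class version written for this example; it does not reproduce the spatial weight-of-evidence procedure of Kapo and Burton (2006), and the toxic potency values are hypothetical.

```python
def sum_squared_deviation(values):
    """Sum of squared deviations from the class mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((value - m) ** 2 for value in values)


def two_class_natural_break(values):
    """Find the split of sorted values that minimizes within-class variation.

    Returns (threshold, lower_class, upper_class); the threshold is placed
    midway between the two values adjacent to the break.
    """
    if len(values) < 2:
        raise ValueError("at least two values are needed to find a break")
    ordered = sorted(values)
    best_cost, best_index = None, None
    for i in range(1, len(ordered)):
        cost = sum_squared_deviation(ordered[:i]) + sum_squared_deviation(ordered[i:])
        if best_cost is None or cost < best_cost:
            best_cost, best_index = cost, i
    lower, upper = ordered[:best_index], ordered[best_index:]
    return (lower[-1] + upper[0]) / 2, lower, upper


# Hypothetical toxic potencies measured at seven locations.
potencies = [0.2, 0.3, 0.25, 0.4, 5.0, 6.5, 5.8]
threshold, lower, upper = two_class_natural_break(potencies)
```

A threshold derived this way only separates the measured potencies; linking it to an impaired or nonimpaired biological response would still require the field data described above.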

Alternatively, fuzzy logic data processing (Zadeh 1965) was discussed as a method for classification of environmental samples. Instead of classifying a single bioassay measurement as belonging to one single class (yes/no membership, e.g., “moderate estrogenicity”), fuzzy logic accounts for uncertainty in the classification by allowing a single measurement to belong partly to one class and partly to another class. For example, in fuzzy logic an estrogenic bioassay response is not just qualified as “moderately estrogenic,” but can be qualified as “30% low estrogenic” and “70% moderately estrogenic.” Taking into account the gradual membership of each individual bioassay measurement (e.g., estrogenicity, dioxin-like activity, mutagenicity), the multiple bioassay measurements are then classified into an overall quality classification (e.g., “moderately polluted”) based on rule-based IF X AND Y THEN Z decisions. Keiter et al. (2009) used this method for classification of contaminated sediments based on in vitro toxicity profiles. Although the authors stressed that the inclusion of ecological relevance should be further implemented as a weighing factor in the rule-based classification, they concluded that fuzzy logic may be a promising step in the classification of sediment hazard, for instance in the context of the Water Framework Directive. A similar rule-based approach was developed to derive the Ecosystem Health Condition Chart (EHCC) as an integrative biomarker index for feral mussels exposed to the Prestige oil spill (Marigomez et al. 2013). In this approach, individual biomarker responses have been classified according to predefined ranges of biomarker responses taking into account both reference and critical values for each biomarker. The overall indicator class value is then determined according to predefined rules based on the combination of biomarker responses and the weight of each individual biomarker. 
The EHCC approach is not restricted to biomarker responses determined in feral animals but can also be applied to bioassay responses determined in laboratory tests. By using different colors for different classes of the individual bioassay responses and the overall indicator value, the EHCC method is very suitable for a graphic representation (Marigomez et al. 2013).
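The gradual class membership and the rule-based combination can be sketched as follows. The class centers, the response scale, and the single rule are assumptions made for this illustration; they are not the membership functions or rule base used by Keiter et al. (2009). The AND operator is taken as the minimum of the membership degrees, as in Zadeh's (1965) definition of fuzzy intersection.

```python
def memberships(value, centers):
    """Degree of membership in each class, interpolated between class centers.

    `centers` maps a class label to the response value at which membership
    in that class is complete (hypothetical values in this sketch).
    """
    items = sorted(centers.items(), key=lambda item: item[1])
    result = {label: 0.0 for label, _ in items}
    if value <= items[0][1]:
        result[items[0][0]] = 1.0
        return result
    if value >= items[-1][1]:
        result[items[-1][0]] = 1.0
        return result
    for (low_label, low), (high_label, high) in zip(items, items[1:]):
        if low <= value <= high:
            upper_share = (value - low) / (high - low)
            result[low_label] = 1.0 - upper_share
            result[high_label] = upper_share
            break
    return result


def rule_strength(*degrees):
    """Strength of an IF X AND Y THEN Z rule: fuzzy AND as the minimum."""
    return min(degrees)


centers = {"low": 1.0, "moderate": 2.0, "high": 3.0}  # arbitrary response scale
estrogenicity = memberships(1.7, centers)  # about 30% low, 70% moderate
dioxin_like = memberships(2.5, centers)    # 50% moderate, 50% high

# IF estrogenicity is moderate AND dioxin-like activity is high
# THEN the sample is "moderately polluted" (hypothetical rule).
moderately_polluted = rule_strength(estrogenicity["moderate"], dioxin_like["high"])
```

A full classification would evaluate many such rules and combine their strengths, possibly with weighing factors for ecological relevance.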

Rather than classifying the toxicity profile into a discrete class (e.g., “moderately polluted”), the experts further discussed the advantages and disadvantages of aggregating the information present in the hazard profiles into a single continuous indicator value (Figure 1). The indicator value can be calculated as the area of a star plot, in which each axis represents the normalized value of a bioassay response, similar to the Integrated Biomarker Response (IBR) approach developed by Beliaeff and Burgeot (2002). Alternatively, a continuous indicator value can also be obtained from the cumulative distribution function of ECx values determined in the different bioassays, in terms of the concentration factor of the original sample required to reach the x% effect level. The indicator value pT is then determined as the fraction of bioassays for which the actual concentration in the field (i.e., concentration factor = 1) exceeds the ECx value (De Zwart and Sterkenburg 2002).
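Both aggregation routes can be sketched in a few lines. The star plot function computes the plain geometric area of a polygon on equally spaced axes, which is similar in spirit to, but not necessarily identical with, the published IBR calculation; the second function counts the fraction of bioassays whose ECx lies below the field concentration factor. Function names and input values are hypothetical.

```python
import math


def star_plot_area(normalized_responses):
    """Area of the polygon spanned by responses plotted on equally spaced axes.

    Note that the area depends on the order in which the axes are arranged.
    """
    n = len(normalized_responses)
    if n < 3:
        raise ValueError("a star plot needs at least three axes")
    angle = 2 * math.pi / n
    products = sum(
        normalized_responses[i] * normalized_responses[(i + 1) % n] for i in range(n)
    )
    return 0.5 * math.sin(angle) * products


def fraction_exceeded(ecx_concentration_factors, field_factor=1.0):
    """Fraction of bioassays for which the field concentration exceeds the ECx."""
    exceeded = sum(ecx < field_factor for ecx in ecx_concentration_factors)
    return exceeded / len(ecx_concentration_factors)


area = star_plot_area([1.0, 1.0, 1.0, 1.0])
# Hypothetical ECx values, expressed as concentration factors of the sample.
p_t = fraction_exceeded([0.5, 2.0, 10.0, 0.8])
```

Either value condenses the profile into one number, so the per-endpoint responses should be kept alongside it.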

Figure 1.

Schematic overview of the different sample types, test matrices, bioassays and endpoints that may be used in toxicity profiling. In case of biota sampling, the organisms should preferably not only be used for toxicity profiling, but also be examined for their general health status.

An aggregated indicator for toxicity allows the use of toxicity profiles in the context of existing GIS-models, so that chemical environmental quality can be mapped. In addition, single indicator values are easy to explain as a useful and objective tool to the general public and to policy makers. A disadvantage of such a simplification is that information may be lost, because the underlying toxic endpoints, which are most affected in case of a “bad” indicator value, are cloaked. In conclusion, the experts had no objection to deriving indicator values, but they stressed that the method by which the indicator is derived should be transparent, and that the underlying information should remain available.

Approach 2: Translating toxicity profiles into risk profiles

To translate bioassay responses into terms of risks, critical bioassay response levels are required at the whole-organism level (in vivo) or at the cellular or molecular level (in vitro), above which chemicals are expected to cause effects at the population or community level (Hamers et al. 2010). For dioxin-like compounds, for instance, Traas et al. (2001) derived critical sediment quality criteria ranging from 1 to 12 pg TEQ/g OC for vitamin A depletion and reduced litter size in European otter (Lutra lutra). Based on this critical level, the authors conclude that TEQ levels in sediments, which can be determined by in vitro bioassays (Houtman et al. 2004), are too high for a successful reintroduction of the otter in the Biesbosch Lake area. For the observed effects of estrogenic compounds on fish reproduction, Hamers et al. (2010) made a straightforward but conservative attempt to derive a corresponding critical bioassay response level in sediment, ranging from 0.1 to 2 ng estradiol equivalent (E2EQ) per g dw. Based on knowledge of the underlying mechanisms of action (MoA) of endocrine disruption, Ankley et al. (2009) used a systems-based approach to develop predictive toxicological tools for endocrine disrupting compounds. Using results from models, in vitro and in vivo bioassays, the authors made an effort to extrapolate molecular indicators to effects at the level of an individual fish. For most MoA other than dioxin-like activity and endocrine disruption, however, the expert group recognized that lack of toxicokinetic and toxicodynamic knowledge prevents such an extrapolation. Translation of bioassay derived toxicity profiles into risk profiles can be further facilitated by establishing in vitro–in vivo relationships using physiologically-based pharmacokinetic (PBPK) and pharmacodynamic (PBPD) modeling, as has been done in human risk assessment for decades (Clewell 1993; Louisse et al. 2010). 
Compared to human risk assessment, however, additional challenges need to be overcome in ecological risk assessment when using results from short-term bioassays, such as extrapolation from the individual to the population level and from one (or few) to many species. Interspecies extrapolation may benefit from retrospective interspecies correlation estimates (ICE) of the toxicity of chemicals determined in 2 species (USEPA 2010) or from prospective traits-based approaches (TBA) that may in the long run lead to quantitative trait sensitivity relationships (QTSR) (Van den Brink et al. 2011) similar to the well known quantitative structure activity relationships (QSARs).
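The ratio underlying a risk profile (approach 2) can be illustrated with the two critical ranges quoted above. Because each critical level is given as a range, the sketch returns a lower and an upper risk quotient. The measured values and the three-way interpretation are hypothetical and serve only to show the arithmetic.

```python
def risk_quotient_range(measured, critical_low, critical_high):
    """Lowest and highest ratio of a measured response to its critical level."""
    return measured / critical_high, measured / critical_low


def interpret(quotients):
    """Illustrative reading of a quotient range (an assumption of this sketch)."""
    lowest, highest = quotients
    if lowest > 1.0:
        return "exceeds critical level"
    if highest < 1.0:
        return "below critical level"
    return "inconclusive"


# Critical ranges as quoted in the text; measured values are hypothetical.
critical_levels = {
    "E2EQ_ng_per_g_dw": (0.1, 2.0),   # Hamers et al. (2010)
    "TEQ_pg_per_g_OC": (1.0, 12.0),   # Traas et al. (2001)
}
measured = {"E2EQ_ng_per_g_dw": 0.5, "TEQ_pg_per_g_OC": 24.0}

risk_profile = {
    endpoint: risk_quotient_range(measured[endpoint], *critical_levels[endpoint])
    for endpoint in measured
}
verdicts = {endpoint: interpret(q) for endpoint, q in risk_profile.items()}
```

The width of each quotient range reflects the uncertainty in the critical level itself, before any extrapolation between species or levels of biological organization.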

Approach 3: Identification of responsible compounds

Bioassay responses deviating from the reference situation may be further investigated for identification and sourcing of the responsible compounds, so that effective measures can be taken. Effect-directed analysis (EDA) is often used to determine which compounds in an extract are responsible for the observed effects in an in vitro bioassay. EDA implies an iterative procedure of bioassay testing and sample fractionation resulting in a relatively clean responsive fraction, which can be chemically identified (Brack 2003). Toxicity identification and evaluation (TIE) is often used to determine which compounds in an original, nonextracted sample are responsible for the effects observed in an in vivo bioassay (see also below). Phase I of a TIE approach implies an iterative process of toxicity testing and chemical manipulations that make classes of contaminants biologically unavailable for the test organism. After Phase I, suspected compounds are identified in Phase II and their toxicity is confirmed in Phase III of the TIE procedure (Burgess 2000). Although no examples are currently available, in principle it should be possible to trace back the emerging compound identified (by EDA or TIE) and its corresponding toxicity profile to its source. Source tracing requires an elaborate sampling strategy through different systems in space or in time, for instance through a river basin or through the food chain.

Toxicity profiling and the Water Framework Directive

The expert group further discussed the use of toxicity profiles in the context of the European Water Framework Directive (WFD). The WFD discriminates between 3 types of monitoring, i.e., surveillance monitoring, operational monitoring, and investigative monitoring (EC 2000, 2003). Surveillance monitoring is carried out to supplement and validate procedures for impact assessment of water bodies, to design efficient and effective future monitoring programs, and to assess long-term changes in natural conditions and long-term changes resulting from widespread anthropogenic activity. Operational monitoring is carried out to establish the status of water bodies identified as being at risk of failing to meet their environmental objectives and to assess any changes in the status of such bodies resulting from programs of measures undertaken to reduce relevant pressures. Investigative monitoring is performed to ascertain the causes of failing to achieve the environmental objectives, or to ascertain the magnitude and impacts of accidental pollution. So far, possible bioassay applications within the WFD have been restricted to investigative monitoring and the EDA approach described above is a good example of bioassay use within this type of monitoring. The expert group, however, is convinced that the hazard profiling approach also deserves a place within the surveillance monitoring programs carried out within the WFD. With respect to chemical pollution, surveillance monitoring is currently restricted to the chemical analysis of 33 priority pollutants discharged into the river basin or sub-basin and other pollutants known to be discharged in significant quantities. To improve impact assessment, toxicity profiling is proposed as an extension to the surveillance monitoring toolbox. 
Even if analyzed individual pollutants meet environmental quality standards (EC 2008), toxicity profiling can serve as an effective first tier tool to indicate failure of good chemical status, which may be caused by nonanalyzed emerging pollutants not known to be discharged to the water body under surveillance and/or by mixtures of known and unknown pollutants. Finally, operational monitoring may benefit from bioassay techniques, for instance to enhance assessment of changes in water quality and link these to changes in chemical pressure.


Sample collection and pretreatment

Toxicity profiles can be determined for “spot samples” representing the pollution status at the moment of sampling, such as water and suspended particulate matter (SPM), but also for “continuous samples” representing the time-integrated exposure to contaminants, such as sediments, passive samples, and biota. For the purpose of hazard profiling (approach 1) and EDA (approach 3), experts agreed that testing of sample extracts is preferred over testing of original nonextracted samples, because in many cases extraction prevents matrix effects that may occur in the bioassays and results can be compared directly to chemically analyzed data. In addition, preconcentration of the samples increases the probability of finding an effect, preferably in a dose-dependent way when low cost acute toxicity or fast screening assays are applied.

To obtain a worst case “total extract” for hazard profiling or EDA, the most exhaustive method possible is recommended (Figure 1). Nevertheless, some compounds will escape such an exhaustive extraction procedure and their contribution to the overall toxic potency of the sample remains unknown. Metals, for instance, will not be extracted using organic solvents, and volatile compounds will evaporate from the extracts during volume reduction. This uncertainty, which even may lead to false negative test results, is in conflict with the principle of using toxicity profiling as a “safety net.” On the other hand, the expert group confirmed that no uniform extraction method exists with equal affinity for all possible contaminants in the environmental samples. Eventually, extraction methods using multiple solvents and/or steps could be used to obtain a true worst case exposure scenario.

It is important, however, to know the limitations of the extraction methods, so that the corresponding uncertainty can be qualified. Especially for abiotic samples, uncertainty is introduced by the fact that physicochemical conditions of the samples may change after sampling (e.g., redox status, pH, surface characteristics), which may lead to (rapid) degradation or other modification of toxic contaminants in the sample. Therefore, transport and storage of the samples should be restricted to a minimum, and extraction should be performed as soon as possible.

For the purpose of risk profiling (approach 2), it is important to realize that bioavailability is not taken into account when toxicity profiles are determined on exhaustive extracts from water, SPM, and sediment samples. However, toxicity profiling can be extended with other types of samples that represent more realistic exposure scenarios taking bioavailability into account. For some bioassays, sample extraction is not necessary and nonconcentrated, original sample material, representing the abiotic environment as it is in situ, can be tested directly for its toxic potency (Figure 1), sometimes after enrichment of the sample with bioassay-specific critical agents (e.g., medium, buffer, ions). To determine the potency of pollutants to decrease bioluminescence by marine Vibrio fischeri bacteria, for instance, nonconcentrated freshwater samples only need to be adjusted for salinity by adding 20 g/L NaCl (Perez et al. 2010). Similarly, sediment samples may be used for direct contact assays with bacteria (Rönnpagel et al. 1995), invertebrates (Ingersoll et al. 1995), or fish (Hollert et al. 2003). Alternatively, passive samplers for hydrophobic compounds can be used to collect time-weighted average concentrations of the freely dissolved organic pollutants that are available for passive diffusion (Vrana et al. 2005). Extracts from such biomimetic substrates, which are considered to be a more realistic reflection of the bioavailable fraction of pollutants, can be tested in both in vitro (Villeneuve et al. 1997) and in vivo bioassays (Petty et al. 2000). The most realistic exposure scenario accounting not only for passive diffusion of the available fraction but also for active uptake and biotransformation is obtained by testing extracts of biota samples from organisms living in the field. Such bioavailable and biotransformed extracts (Figure 1) should preferably be prepared from nondestructively collected biota samples such as blood, fat pads, or hair. 
For instance, Simon et al. (2011) determined the potency of blood accumulative pollutants extracted from polar bear plasma samples to interfere with thyroid hormone transport proteins. In case of such biota sampling for bioassay testing, the experts stress the importance of determining in parallel the general health status of the sampled organism using simple morphometrics and possible (nondestructive) biomarkers. Thus, the toxicity profile based on bioassay responses to the (internal) exposure to toxic compounds can be validated by the health status of the very same organisms determined as (early warning) effects (Figure 1).

Bioassay methods

For each extract, a toxicity profile is obtained by testing its activity toward a battery of toxicological endpoints, which can be determined in either in vivo bioassays using whole organisms or in vitro bioassays using parts of organisms (cells, tissues, proteins). Both in vivo and in vitro bioassays show a measurable and potentially biologically relevant response after exposure to the complex mixture of contaminants (Figure 1).

In vitro bioassays are often used to determine effects on generic endpoints, such as respiration, growth, or cytotoxicity, or on specific endpoints, such as specific enzyme induction and inhibition, DNA binding and damage, or gene expression. In vivo bioassays often require bigger test volumes and are often performed to determine generic endpoints, such as survival, growth, or reproduction of the tested organism. In vivo bioassays, however, can also be used to determine specific effects on functioning organisms that cannot be determined in vitro (e.g., effects on the immune system or the nervous system), or to determine biomarker responses that are specific for a well-known mode of action (e.g., Cytochrome P450 (CYP) induction or increased levels of DNA adducts).

The experts underline the advantage that in vitro bioassays require relatively small volumes, which result in higher test concentrations and consequently an increased chance of finding a response (“early warning”). In addition, in vitro bioassays generally have shorter test durations, making them more cost-effective. Therefore, the expert group supports the use of a strategically selected battery of specific in vitro bioassays covering the spectrum of toxic and ecologically relevant syndromes as well as possible. On the other hand, the expert group stresses that in vivo tests should not be excluded beforehand from the toxicity profiling test battery because intact organisms yield information on functions that cannot be determined in vitro (e.g., growth, condition, reproduction). In addition, in vivo effects at the organism level of biological organization have a higher predictive value and a higher relevance for ecological effects at the population level or higher than in vitro effects at the molecular or cellular level.

In vivo bioassays can make use of organisms from well-characterized laboratory cultures or from feral animals collected in the field. Although the latter have higher ecological relevance, the experts agreed that field organisms should not be used for toxicity profiling purposes, because their populations are often too heterogeneous (distribution in age, size, and development, suffering from multiple stresses) for toxicity testing. Of course, biomarker measurements in feral organisms can be used for in situ risk assessment purposes, but such monitoring approaches fall outside the scope of the current article.

Quality aspects and dealing with uncertainties

The reliability of the toxicity profiling depends on the level of validation and standardization of the experimental procedures, ranging from sample collection, transport, and storage, to extraction and ultimate bioassay testing. Method validation is the process of verifying that a method is fit for its intended purpose. A protocol for the validation of chemical and biological monitoring methods, including sampling methods, has been developed within the NORMAN network (Schwesig et al. 2009). Standardization implies that the test preparation and procedure is well defined and comprehensible, allowing comparability and reproducibility of test results. Standardization is especially important when bioassays are used for regulatory purposes and several standardization bodies (e.g., OECD, ISO, ICCVAM) have evaluated protocols for in vitro bioassays. The vast majority of bioassays described in literature, however, are carried out according to in-house developed protocols. According to the expert group, at least the following quality aspects should be taken into account when using such nonstandardized bioassay methods, i.e., testing of dilution series, testing of a positive reference compound, testing of a reference sample, testing of process controls, checking for cytotoxicity, replication of measurements, assessment of test power, and adequate statistics.

Bioassays that have not been internationally standardized may still generate useful information. In the ToxCast program, for instance, the US Environmental Protection Agency (USEPA) makes preliminary toxicity evaluations of more than 1000 environmental chemicals using approximately 600 high throughput screening (HTS) bioassays (Kavlock et al. 2012). The majority of these assays have not been internationally standardized, but the large amount of test results can contribute to the weight of evidence in a toxicity evaluation. A similar approach can be adopted for toxicity profiling purposes, using similar criteria for inclusion in the test battery as in ToxCast, i.e., (except for the HTS criteria) linkage to known MoA, ability to test concentration response, and minimal false negatives (Dix et al. 2007).

Finally, additional uncertainties and assumptions should be explicitly mentioned when performing toxicity profiling on environmental samples, such as limitations of pretreatment procedures, possible changes in bioavailability during testing (e.g., due to absorption of the extracted compounds to the plastic wall of the test container), or the validity of the concentration addition principle for the complex mixture of unidentified contaminants present in the samples.
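The concentration addition assumption mentioned above can be made concrete with a toxic unit calculation: under concentration addition, a mixture is expected to reach the x% effect level when the sum of each component's concentration divided by its own ECx equals 1. The compound names and values below are hypothetical, and the sketch presumes that all components and their ECx values are known, which is precisely what cannot be taken for granted for unidentified contaminants.

```python
def toxic_unit_sum(concentrations, ecx_values):
    """Sum of toxic units (concentration / ECx) over all mixture components."""
    return sum(
        concentrations[name] / ecx_values[name] for name in concentrations
    )


# Hypothetical two-component mixture; concentrations and ECx share one unit.
concentrations = {"compound_A": 1.0, "compound_B": 4.0}
ecx_values = {"compound_A": 4.0, "compound_B": 8.0}

total_toxic_units = toxic_unit_sum(concentrations, ecx_values)
# A sum below 1 means the x% effect level is not expected to be reached
# if concentration addition holds.
effect_level_reached = total_toxic_units >= 1.0
```

Comparing such a prediction with the response actually measured in the bioassay is one way to judge how much of the observed potency remains unexplained.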


The authors acknowledge the NORMAN network and especially its Executive Secretary Dr. Valeria Dulio (INERIS, France) for supporting the organization of the expert meeting.