Statistical analysis of cytochrome P4501A biomarker measurements in fish



Induction of the cytochrome P4501A (CYP1A) enzyme system in fish is a common biomarker of exposure to aromatic hydrocarbons. Induction of CYP1A can be measured at a number of steps in the transcription-translation-functional protein pathway using a variety of techniques. The present study examined the range of these measurements from 94 published papers in an attempt to examine the statistical characteristics of each method. Cytochrome P4501A induction, as measured by catalytic ethoxyresorufin-O-deethylase (EROD) activity, protein levels (enzyme-linked immunosorbent assay, Western blot analysis, and immunohistochemistry), and mRNA levels (Northern blot analysis and reverse transcription-polymerase chain reaction), was analyzed. When possible, the variance structure, effect size determination, and dose-response modeling of each method of measurement in the laboratory and field were examined. Conclusions from this analysis include: 1) Because of interlaboratory and interspecies variability, general end-point determinations will need to be defined in terms of the statistically detectable fold-change of measurements relative to control or reference values, and 2) fold-change in EROD activity provides the most robust measure of the dose responsiveness of aromatic hydrocarbons within specific chemical classes (e.g., polycyclic aromatic hydrocarbons). The relationship between the ability to measure statistical differences in induction level and the biological significance of those measurements has yet to be defined. To utilize these biomarkers in a risk assessment context, this relationship must be addressed at the scientific and management levels.


The cytochrome P450 (CYP) gene family encodes a large group of enzymes organized into families and subfamilies based on the homology of their amino acid sequences that catalyze a wide variety of monooxidation reactions, including epoxidation, hydroxylation, and dealkylation [1]. Cytochrome P450 proteins are involved in a number of normal cellular activities, including fatty acid metabolism and steroid synthesis [2]. Also, CYP proteins, particularly the CYP1A proteins, play an important role in the biotransformation of planar aromatic hydrocarbons, such as polycyclic aromatic hydrocarbons (PAHs) [1–5].

Induction of the CYP1A enzyme system is probably the most well-studied biomarker of exposure to planar aromatic hydrocarbons [1–5]. Induction of CYP1A can be measured at discrete steps in the transcription-translation-functional protein pathway (e.g., mRNA, protein, and catalytic function) using a number of techniques (e.g., polymerase chain reaction, antibodies, and enzyme activity). A general search of the literature reveals that more than 7,100 papers were published between 1981 and 2007 using the CYP1A system to detect exposure to or suggest toxicity by aromatic hydrocarbons. More than 1,200 of these papers were either in vivo or in vitro studies with fish. Several extensive reviews of the literature on the use of CYP1A as a biomarker in aquatic systems have been published [3–5]. All of these reviews discuss the utility of using CYP1A end points as measures of exposure to a wide variety of planar aromatic hydrocarbons and, perhaps, the potential for using these end points as indicators of both exposure and effect.

A review by Whyte et al. [5] focused on one end point, the ethoxyresorufin-O-deethylase (EROD) assay in fish, and listed 619 citations. A conclusion of Whyte et al. [5] was that a performance-based set of standards should be developed for the use of EROD (and, presumably, other CYP1A assays) as an index of exposure to and possible effect of aromatic hydrocarbons in fish. Since the publication of that study, more than 145 additional papers have been published that discuss the use of EROD, but the trend in the literature during the past several years has been to include the analysis of gene induction using molecular or immunochemical techniques. Regardless, we concur with the conclusions of Whyte et al. [5] and promote the use of performance-based standards for all types of CYP1A biomarker assays.

To develop such standards, it is important to study typical characteristics of the different biomarker assays. Performance-based characteristics that should be examined include constitutive levels of CYP1A activity or expression in control or reference animals, within- and among-animal variability in these measurements, commonly used sample sizes per treatment group, ability to detect differences in CYP1A activity or expression between control/reference and treatment groups, and the slope, linearity, and linear range of dose responsiveness of CYP1A assays across classes of chemical compounds. The question of whether constitutive levels and inducibility in the laboratory are comparable to field situations needs to be elucidated, laboratory-to-laboratory variability needs to be accounted for, and the best way to report bioassay results to make comparisons among treatments or across studies needs to be determined. The reviews concerning the use of CYP1A biomarkers in the aquatic environment published thus far have provided strong summaries of the literature, but to our knowledge, no study has included a systematic analysis of the statistical properties of the various assays across species, location (e.g., laboratory or field), and investigators.

In an attempt to begin the discussion of performance-based standards of CYP1A biomarker assays, we conducted an analysis of data summarized from the peer-reviewed literature (i.e., a meta-analysis). Our focus in this initial study was on fish and on the response of the CYP1A system to PAH exposure. Thus, we did not look at the biomarker assays in relation to dioxin-like compounds or polyhalogenated compounds. The goal of the present study, therefore, was to examine the peer-reviewed literature and determine the statistical characteristics of CYP1A biomarker assays in fish in response to PAH exposure.


A search of the peer-reviewed literature was conducted using keywords selected to target studies of CYP1A induction in the liver of teleost fish by aromatic hydrocarbons. This search identified 227 studies for the period from 1981 to 2006. We searched the literature using broad keywords and multiple databases, but we do not claim that the search uncovered an exhaustive list of studies published during this period. The search was intended to provide a representative cross-section of the literature, not to discover all published papers. Data were extracted from studies analyzing the induction of CYP1A in liver by the measurement of catalytic activity (EROD), protein levels (enzyme-linked immunosorbent assay, Western blot analysis, and immunohistochemistry [IHC]), and mRNA levels (Northern blot analysis and reverse transcription-polymerase chain reaction [RT-PCR]). To be retained in the meta-analysis data set, a study needed to present, in tabular or graphical form, measurements from livers of animals with clearly identified units of measure, sample size per treatment, and a measure of variability (i.e., standard deviation or standard error). To examine constitutive levels of CYP1A, control (unexposed treatments from laboratory studies) or reference (clean or reference site treatments from field studies) data taken from studies of aromatic hydrocarbons in general (e.g., PAHs, polychlorinated biphenyls, and dioxins) were retained. To examine dose-response relationships, only studies conducted in the laboratory that used intraperitoneal (IP) injections of known doses of PAHs were retained in the meta-analysis data set.

All studies were categorized and coded by assay type, study type (laboratory or field), taxonomic group (family and genus), and author group. Author groups were determined by examination of author lists on each paper and cross-referencing to common authors. This category was used to determine interlaboratory differences in performance characteristics of control/reference data, assuming that the same standard operating procedures were used among papers with the same author list or with an investigator determined to be a laboratory leader or mentor common to those papers. Based on the information available in the papers retained for analysis, EROD activity and mRNA levels (as measured by RT-PCR) were used to explore constitutive CYP1A levels in control/reference samples. Ethoxyresorufin-O-deethylase activity and protein levels measured by Western blot and IHC were used to explore dose-response relationships.

Statistics were calculated to analyze basic univariate features of control and reference background, or constitutive, expression [6]. The unit of replication was individual studies on independent groups of fish. In the case of a published paper that examined more than one species of fish or conducted more than one experiment, these were considered to be independent replicates. A standard power analysis [7] using α = 0.05 and β = 0.2 (power = 1 − β = 0.8) was applied to control and reference site data to determine the minimum difference needed to detect statistically significant, two-group comparisons of CYP1A induction. For EROD, the analysis end points examined were catalytic activity (pmol/min/mg protein), minimum detectable increase in catalytic activity, and minimum detectable fold-change in activity. For mRNA levels measured by RT-PCR and for protein levels measured by Western blot analysis or IHC, only the fold-change detectable difference was analyzed. Mixed-model analysis of variance (ANOVA; Proc Mixed in SAS® 9.1 [8]) was used to partition variance for analysis end points for constitutive EROD expression among laboratory controls versus field reference sites (fixed effect), taxonomic group (fixed effect), and author group (random effect). For dose-response relationship analysis, PAH levels were converted to molar values (μM/kg) and scaled to inducibility equivalency factors (IEFs) to provide a common dosing scalar to compare across studies. The IEFs were derived from comparative studies in the literature with more than one PAH examined and were based on the induction by 3-methylcholanthrene (3MC) set to unity [9–12]. Thus scaled, laboratory dose-response studies were analyzed using least-squares regression analysis [8] and multifactor ANOVA [8] to examine generalized responses of different CYP1A measurements to PAHs across species and author groups.
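The power analysis described above can be illustrated with a minimal sketch. The source cites a standard method [7] without giving the formula, so the normal approximation used here is an assumption, as are the function names and the example values; published summaries that report a standard error are first converted to a standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def sd_from_se(se: float, n: int) -> float:
    """Convert a reported standard error of the mean to a standard deviation."""
    return se * sqrt(n)


def min_detectable_difference(sd: float, n: int,
                              alpha: float = 0.05, beta: float = 0.2) -> float:
    """Smallest difference in means detectable in a two-sided, two-group
    comparison with n animals per group (normal approximation, equal SDs)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(1 - beta)        # quantile corresponding to the power
    return (z_alpha + z_beta) * sd * sqrt(2 / n)


# Hypothetical control group (illustrative values, not from any cited paper):
# mean EROD 25 pmol/min/mg protein, SE 4.5, n = 5 fish.
sd = sd_from_se(4.5, 5)
mdd = min_detectable_difference(sd, 5)
print(f"SD = {sd:.1f}, minimum detectable difference = {mdd:.1f} pmol/min/mg protein")
```

A t-based calculation would give somewhat larger detectable differences at the small sample sizes typical of these studies, so the approximation should be read as a lower bound.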


Using the above selection criteria, 94 papers were retained for the analysis of EROD control/reference data [12–105], and 27 of these papers were selected for further analysis of EROD dose-response relationships [12–38]. The papers chosen for analysis of the EROD control/reference data set had a total of 78 unique author groups (which we equated to individual laboratories), 42 different genera of fish, and 21 different families of fish. Based on the criteria for defining a study, the final analysis data set for EROD contained a total of 174 studies, including 92 laboratory studies [12–76] and 82 field studies [77–105]. Author groups originated from North America, South America, Europe, Asia, and Australia. The number of studies with replication was insufficient to examine the data at the level of genus or species; thus, we conducted all further analysis at the family level. Seven families of fish, including both marine and freshwater fish, comprised 80% of all studies examined. The most commonly studied fish (number of studies in parentheses) were in the Salmonidae (28), Cyprinidae (18), Ictaluridae (13), Anguillidae (11), Pleuronectidae (11), Gadidae (6), and Mugilidae (4). Apogonidae, Centrarchidae, Clupeidae, Esocidae, Percidae, and Sparidae had two studies each. Eight other families were included that had one study each.

Fifteen papers equating to 44 studies were retained for the analysis of RT-PCR control/reference data [106–120], but none met the requirements to be included in the dose-response analysis. No data from Western blot analysis or IHC were used in the control/reference examination, but two papers equating to two studies using Western blot analysis [33,38] and two papers equating to three studies using IHC [26,33] met the criteria for inclusion in the dose-response examination. Thus, the total number of studies among all analysis end points was 223. Per-treatment sample sizes used in these studies across all end points analyzed had a range of 2 to 46, with a median of five fish per treatment group and the 25 to 75% quartiles ranging from eight to nine fish per treatment.

Figure 1. Overlay scatterplot and boxplot of ethoxyresorufin-O-deethylase (EROD) activity for laboratory controls and field reference sites. Individual data points are jittered within the boxplot boundaries to avoid overlapping points. Boxplots show the median (thick line within boxes) and the range of the 25th to the 75th percentiles of data (boxes). Lines extend above and below the boxes to ± 1.5-fold of the interquartile range, a typical measurement to illustrate potential outliers [6].
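The boxplot elements described in the caption can be computed directly. The sketch below is illustrative only: the function name and the example values are hypothetical, and the "inclusive" quartile convention is an assumption, because plotting packages differ in how they interpolate quartiles.

```python
from statistics import median, quantiles


def boxplot_summary(values):
    """Median, quartiles, and 1.5 x IQR fences used to flag potential outliers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Hypothetical control EROD activities (pmol/min/mg protein)
erod = [8, 10, 10, 14, 22, 25, 31, 40, 46, 180]
print(boxplot_summary(erod))
```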

Control/reference analysis

Data for constitutive expression of control or reference EROD data were examined for general statistical characteristics and for possible differences between laboratory and field studies. When EROD was expressed as catalytic activity, control and reference values averaged approximately 46 pmol/min/mg protein (median, 25 pmol/min/mg protein; mode, 10 pmol/min/mg protein) and were highly variable. Laboratory controls expressed a slightly higher, but not significantly different (α = 0.05), median catalytic activity with greater overall variability compared to field reference site samples (Fig. 1). Sample size per treatment also was slightly lower, but again not significantly different, in laboratory studies (median sample size, n = 5) compared to field studies (median sample size, n = 7) (Fig. 2). The greater variability and smaller sample sizes in laboratory studies resulted in a lower power to detect differences among treatments (median minimum detectable difference, 56 pmol/min/mg protein) compared to field studies (median minimum detectable difference, 35 pmol/min/mg protein) (Fig. 3). The ability to detect differences among treatments between laboratory and field studies was normalized and the variance stabilized when EROD was expressed as detectable difference in the fold-change of EROD activity relative to control (median minimum detectable difference, 1.79-fold) or reference organisms (median minimum detectable difference, 1.67-fold) (Fig. 4).
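One way to see why the fold-change expression stabilizes the comparison is that, under a normal-approximation power calculation (an assumption here, since the source does not give its formula), the detectable fold-change depends only on the coefficient of variation and the sample size, not on the absolute baseline activity. The sketch below uses hypothetical values and a hypothetical function name.

```python
from math import sqrt
from statistics import NormalDist


def detectable_fold_change(mean: float, sd: float, n: int,
                           alpha: float = 0.05, beta: float = 0.2) -> float:
    """Minimum detectable fold-change relative to the control mean for a
    two-sided, two-group comparison (normal approximation)."""
    z = NormalDist()
    k = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(1 - beta)
    min_detectable_difference = k * sd * sqrt(2 / n)
    return 1 + min_detectable_difference / mean


# Two hypothetical laboratories with a tenfold difference in baseline EROD
# activity but the same coefficient of variation (SD / mean = 0.4), n = 5.
low_baseline = detectable_fold_change(mean=10, sd=4, n=5)
high_baseline = detectable_fold_change(mean=100, sd=40, n=5)
print(f"{low_baseline:.2f}-fold vs {high_baseline:.2f}-fold")
```

The absolute detectable differences for these two laboratories differ by a factor of ten, whereas the detectable fold-changes are identical.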

Figure 2. Overlay scatterplot and boxplot of laboratory control and field reference site sample sizes. Information is presented in the same format as described for Figure 1.

Figure 3. Overlay scatterplot and boxplot of the detectable difference in ethoxyresorufin-O-deethylase (EROD) activity for laboratory controls and field reference sites. Information is presented in the same format as described for Figure 1.

Considering that the variation in EROD activity could have resulted from interlaboratory variation (i.e., author group), taxonomic group, or location (i.e., laboratory vs field), mixed-model ANOVAs were performed using author group as a random variable and taxonomic group at the family level and study location (field vs laboratory) as fixed variables. Analyses of variance were run for EROD analysis end points of EROD activity, detectable increase in EROD activity, and detectable fold-change in activity relative to control/reference treatments. When EROD was expressed as catalytic activity, the analysis indicated significant effects caused by author group, taxonomic group, and location (Table 1). When EROD was expressed as detectable change in activity, significant effects were indicated for author group and taxonomic group, but location was not significant in the model. When EROD was expressed as detectable fold-change, none of the factors contributed significantly to the model, indicating that a broad range of EROD biomarker studies are comparable across analysis laboratories, taxonomic groupings, and laboratory and field studies when detectable levels of inducibility are used to index the outcome of the EROD assay.

Figure 4. Overlay scatterplot and boxplot of the detectable difference in fold-change ethoxyresorufin-O-deethylase (EROD) activity for laboratory controls and field reference sites. Information is presented in the same format as described for Figure 1.

The only meaningful comparison for control/reference data concerning mRNA expression measured by RT-PCR was detectable fold-change, and no apparent differences were found in this measure between laboratory and field studies for RT-PCR. A comparison of detectable fold-change for EROD and RT-PCR data (Fig. 5), however, indicates that RT-PCR may be a more sensitive measure of CYP1A induction. For laboratory and field studies combined, the median detectable fold-increase in EROD (1.73-fold) was greater than the median detectable fold-increase in RT-PCR (1.25-fold).

Table 1. Summary of mixed-model analysis of variance for control and reference ethoxyresorufin-O-deethylase (EROD) laboratory and field studies (a)

EROD analysis end point     Laboratory (common authors)    Taxonomic group (family)    Location (reference sites vs laboratory controls)
Detectable increase         0.013                          0.023                       0.305
Detectable fold-increase    0.185                          0.920                       0.525

(a) Laboratory (determined based on common authors) was used as a random variable, and taxonomic group and location (reference sites or laboratory controls) were used as fixed variables. Models were tested using EROD activity, detectable increase in activity, and detectable fold-increase in activity as the dependent variables. Values are presented as the p values for significance of effect in the model for the indicated variable.

Figure 5. Overlay scatterplot and boxplot of the detectable difference in fold-change in measurement end point for ethoxyresorufin-O-deethylase (EROD) compared to reverse transcription-polymerase chain reaction (RT-PCR) mRNA analysis across all studies (laboratory and field). Information is presented in the same format as described for Figure 1.

Dose-response relationships

Two types of dose-response relationships were examined. The first type modeled the relationship between EROD end points and laboratory-based, IP injection studies with PAH. The modeled relationship included 27 papers equating to 38 studies on 11 species with three PAHs (3MC: 12 studies, three species; benzo[a]pyrene [BaP]: 12 studies, seven species; β-naphthoflavone [βNF]: 14 studies, seven species) [12–38]. The best-fit, single-parameter, linear-regression model was found for the log of fold-change EROD relative to controls versus the log of PAH dose (converted to molar concentrations and scaled to IEFs relative to 3MC). This model was able to explain 68% of the variation in the dose-response of EROD (Fig. 6). A multifactor ANOVA was used to examine additional sources of variation in the EROD versus PAH dose-response relationship (Table 2). Factors considered in this analysis included log(IEF dose), species of fish, chemical, and interaction terms between dose and chemical and between dose and species. As expected, log(IEF dose) was highly significant in the model. In addition, species of fish was highly significant. Chemical also was significant despite the fact that the dose scale attempted to account for chemical differences through conversion to IEFs and use of molar concentration values. The addition of the factors species and chemical to the single-parameter model resulted in a highly significant model that accounted for 92% of the variation in the dose-response of EROD with these three PAHs (Table 2).
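The dose scaling and single-parameter log-log fit described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' SAS analysis: the IEF values for BaP and βNF are placeholders (only 3MC = 1 comes from the text), the observations are invented, the molar unit is taken to be µmol/kg, and all function names are hypothetical.

```python
from math import log10

# Molecular weights in g/mol
MOLECULAR_WEIGHT = {"3MC": 268.35, "BaP": 252.31, "bNF": 272.30}

# IEFs relative to 3MC (set to 1 in the text). The BaP and bNF values are
# placeholders for illustration, NOT the values derived in the paper.
IEF = {"3MC": 1.0, "BaP": 0.5, "bNF": 0.2}


def ief_dose(chemical: str, dose_mg_per_kg: float) -> float:
    """Convert mg/kg to umol/kg, then scale to 3MC-equivalent dose."""
    umol_per_kg = dose_mg_per_kg / MOLECULAR_WEIGHT[chemical] * 1000
    return umol_per_kg * IEF[chemical]


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x; returns
    (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


# Hypothetical observations: (chemical, dose in mg/kg, fold-change in EROD)
observations = [
    ("3MC", 1, 3.0), ("3MC", 10, 9.0), ("3MC", 50, 20.0),
    ("BaP", 5, 4.0), ("BaP", 25, 11.0), ("BaP", 100, 18.0),
    ("bNF", 10, 2.5), ("bNF", 50, 7.0), ("bNF", 100, 14.0),
]
x = [log10(ief_dose(chem, dose)) for chem, dose, _ in observations]
y = [log10(fold) for _, _, fold in observations]
slope, intercept, r_squared = fit_line(x, y)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r2 = {r_squared:.2f}")
```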

Figure 6. Best-fit, single-parameter model of laboratory-based, intraperitoneal injection dose-response studies with polycyclic aromatic hydrocarbons (PAHs) on ethoxyresorufin-O-deethylase (EROD) activity in fish. Dose scale is expressed in terms of inducibility equivalency factors (IEFs) based on 3-methylcholanthrene as derived from comparative studies in the literature.

The second type of dose-response analysis was an examination of laboratory-based, IP injection studies to compare three different CYP1A measurement end points (EROD, Western blot analysis, and IHC). The modeled relationships included three papers equating to 11 studies on three species conducted with BaP and βNF (Western blot analysis: BaP, one study; βNF, one study; EROD: BaP, three studies; βNF, three studies; IHC: BaP, one study; βNF, two studies) [26,33,38]. Dose levels were scaled to IEFs for 3MC to remain consistent with the first dose-response analysis above. The resulting linear models (Fig. 7) indicated that Western blot analysis had the steepest dose-response curve and explained the highest amount of variation in the data (r² = 0.98), followed by EROD (r² = 0.61) and IHC (r² = 0.36). The slope of the IHC dose-response curve was not significantly different from zero (α = 0.05), indicating that the dose responsiveness of IHC methods may not be comparable outside of a single study.

Figure 7. Comparison of laboratory-based, intraperitoneal injection dose-response studies with polycyclic aromatic hydrocarbons on cytochrome P4501A (CYP1A) protein in fish for different analysis end points: Western blot analysis (WBA), ethoxyresorufin-O-deethylase (EROD) activity, and immunohistochemical staining (IHC). Dose scale is expressed in terms of inducibility equivalency factors (IEFs) based on 3-methylcholanthrene as derived from comparative studies in the literature.

Table 2. Analysis of variance of fold-change ethoxyresorufin-O-deethylase (EROD) activity in the dose-response studies shown in Figure 6 (a)

Dependent variable: Fold-change EROD

Source             df    Sum of squares    Mean square    F        Pr > F
Model              14    3,226.496768      230.464055     18.59    <.0001
Error              24    297.457821        12.394076
Corrected total    38    3,523.954590

r²          Coefficient of variation    Root MSE
0.915590    34.18919                    3.520522

Source                       df    Type I SS       Mean square     F        Pr > F
Log (IEF dose)               1     1,082.038429    1,082.038429    87.30    <.0001
Species                      8     1,988.184057    248.523007      20.05    <.0001
Chemical                     1     84.068878       84.068878       6.78     0.0155
Log (IEF dose) * Chemical    1     5.980334        5.980334        0.48     0.4940
Log (IEF dose) * Species     3     66.225070       22.075023       1.78     0.1776

(a) Dose of chemical scaled to inducibility equivalency factor (IEF dose) and fish species were highly significant, and chemical was significant in the model. No significant interactions were observed. This model explains more than 91% of the variation in the dose-response studies examined. df = degrees of freedom; MSE = mean squared error; Pr = probability; SS = sum of squares.
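The summary statistics in Table 2 follow arithmetically from the sums of squares and degrees of freedom. The short check below uses only values printed in the table; the variable names are ours, and the implied mean assumes the usual definition of the coefficient of variation as 100 × root MSE / mean of the dependent variable.

```python
from math import sqrt

# Values as printed in Table 2
ss_model, df_model = 3226.496768, 14
ss_error, df_error = 297.457821, 24
ss_total = 3523.954590
coefficient_of_variation = 34.18919  # percent

ms_model = ss_model / df_model       # mean square for the model
ms_error = ss_error / df_error       # mean squared error (MSE)
f_statistic = ms_model / ms_error    # overall F test for the model
r_squared = ss_model / ss_total      # proportion of variation explained
root_mse = sqrt(ms_error)

# Mean of the dependent variable implied by the coefficient of variation
implied_mean = root_mse / (coefficient_of_variation / 100)

print(f"F = {f_statistic:.2f}, r2 = {r_squared:.4f}, "
      f"root MSE = {root_mse:.4f}, implied mean = {implied_mean:.2f}")
```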


A fairly large, representative group of studies published during the past 25 years was used to explore the general characteristics of different measures of CYP1A expression as a biomarker of exposure to PAHs. The results of this analysis indicate that the majority of studies conducted during this period used catalytic activity, as measured by EROD activity, as an end point. Across all measurement end points, the sample sizes used per treatment were variable, but most studies used five organisms per treatment. Field studies used larger sample sizes (median sample size, n = 7), reflecting a common perception that field situations may give rise to higher within-treatment variation compared to that in controlled, laboratory experiments. Field studies, however, tended to have slightly lower amounts of variation for EROD compared to laboratory studies. Suggesting reasons for the slightly lower constitutive expression of EROD in the field compared to the laboratory (e.g., field stress or presence of CYP1A inhibitors or antagonists) would be highly speculative. Regardless of the reasons, one could recommend that to optimize experimental design between laboratory and field experiments, the sample sizes per treatment should be increased in the laboratory. If one were interested in equalizing the ability to detect CYP1A induction between EROD measurements and RT-PCR measurements, power analysis indicates that sample sizes for EROD measurements would need to be increased to more than 40 organisms per treatment—clearly an unreasonable recommendation. Thus, in terms of cross-study comparisons in a broad range of studies, the use of RT-PCR as a measure of CYP1A induction is more sensitive compared to EROD within the realm of realistic experimental designs.
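The sample-size statement above can be approximated with a simple scaling argument. This is a sketch under assumptions rather than the authors' calculation: it supposes that the detectable increase (fold-change minus one) shrinks in proportion to one over the square root of n at constant coefficient of variation, and it takes the median of five fish per treatment and the median detectable fold-changes (1.73 for EROD, 1.25 for RT-PCR) from the Results. The function name is hypothetical.

```python
from math import ceil


def n_to_match(n_current: int, fold_current: float, fold_target: float) -> int:
    """Per-treatment sample size needed to lower the detectable fold-change
    from fold_current to fold_target, assuming (fold - 1) scales as 1/sqrt(n)."""
    ratio = (fold_current - 1) / (fold_target - 1)
    return ceil(n_current * ratio ** 2)


# EROD at the median design (n = 5, 1.73-fold) matched to RT-PCR (1.25-fold)
n_needed = n_to_match(5, 1.73, 1.25)
print(n_needed)
```

Under these assumptions the result is 43 fish per treatment, in line with the "more than 40" stated in the text.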

Expressing EROD activity in terms of detectable fold-change difference relative to control or reference fish decreased and stabilized variability in the measurements and allowed comparisons across studies of EROD activity regardless of fish taxon, author group, or study location (Figs. 2–4 and Table 1). In general, the use of detectable fold-change differences in all measurement end points allowed the comparison of data among studies and the development of general dose-response relationships for PAH and CYP1A induction. The use of CYP1A inducibility of fish in studies as an index of PAH exposure would allow the development of performance-based standards for CYP1A biomarker studies. For example, individual scientists could use the analysis presented here to establish performance objectives for laboratory or field studies to maintain alignment with the general scientific community, or the scientific community could use this analysis to set performance-based standards for CYP1A biomarker assays. Expressing results in terms of fold-change relative to control or reference fish also could lead to the development of quantitative models of dose-response relationships, allowing these biomarkers of PAH exposure to be used as more quantitative bioassays of exposure and exposure level. Based on our analysis, the dose-response relationships presented herein are promising, but these relationships will need further, specific study and validation before they can be established as true exposure bioassays. If one is interested in comparing the results of studies in a quantitative way, it is important to choose end points that are comparable across studies. All of the end points examined in the present analysis met this objective except for IHC. Because the typical method of analysis involves experimenter-specific judgment about staining intensity and is scored on an ordinal scale, IHC, in our opinion, does not currently allow quantitative comparisons across or among studies. This is not to say, however, that IHC is not useful or valid or that it could not be adapted for use in these comparisons (e.g., if one were to use optical density instead of ordinal scoring).

A common question from managers to scientists in recent years has been, “What is significant induction?” This question can have multiple meanings, only some of which can be addressed with the results presented in this paper. The meta-analysis conducted herein can address the question, “Across a broad range of studies, species, laboratories, and locations, what is the ability to detect a statistically significant induction in CYP1A?” Our analysis also can begin to address the question, “Can CYP1A be used as a quantitative bioassay of PAH exposure?” We caution, however, that the dose-response relationships presented in this analysis were all derived from laboratory studies with known amounts of IP-injected PAH. Conditions in the field will certainly alter the bioavailability of PAHs and the metabolic activity of fish. What the present study cannot address (and what no other study to date can address) is the question, “At what point does CYP1A induction indicate that toxicity in the form of mortality or decreased fecundity is going to occur?” With current methods and technology, statistically significant induction of CYP1A indicates exposure to planar aromatic hydrocarbons, including dioxins, PCBs, and PAHs. Many scientists conclude that the induction of CYP1A indicates an increased risk of toxicity [5], and evidence suggests that the toxicity of dioxin-like chemicals is mediated through CYP1A-dependent pathways (see, e.g., [121]). Even for those chemicals, however, it is not clear what level of CYP1A induction indicates a significant risk of toxicity. Thus, a natural question is whether a level of CYP1A induction resulting from PAH exposure exists that is predictive of toxicity. One paper has attempted to establish CYP1A induction by PAHs in weathered oil as a bioindicator of population-level harm in fish [122]. These results have largely been refuted, however, based on recent evidence that toxicity from PAHs in fish and mammals is mediated by both CYP1A-dependent and -independent pathways [111,121,123,124]. These issues will continue to make the incorporation of CYP1A data into risk assessments difficult. The development of performance-based standards and quantitative exposure bioassays using CYP1A biomarker measurements will lead to an improvement in this incorporation, but the use of these measurements in risk assessment (especially in the effects-characterization phase) will need to be addressed at both the scientific and management levels.


The authors would like to thank A.J. Bailer (Miami University) for his helpful discussions, consultation on statistical approaches, and assistance with the “R” programming language during preparation of this manuscript. These studies were supported in part by Miami University, the University of North Texas, and the ExxonMobil Corporation. The content, discussion, and conclusions presented herein constitute the opinions of the authors and may not represent the opinions or positions of the supporting agencies.