• Open Access

Relevance Weighting of Tier 1 Endocrine Screening Endpoints by Rank Order

Abstract

Weight of evidence (WoE) approaches are recommended for interpreting various toxicological data, but few systematic and transparent procedures exist. A hypothesis-based WoE framework was recently published focusing on the U.S. EPA's Tier 1 Endocrine Screening Battery (ESB) as an example. The framework recommends weighting each experimental endpoint according to its relevance for deciding eight hypotheses addressed by the ESB. Here we present detailed rationale for weighting the ESB endpoints according to three rank ordered categories and an interpretive process for using the rankings to reach WoE determinations. Rank 1 was assigned to in vivo endpoints that characterize the fundamental physiological actions for androgen, estrogen, and thyroid activities. Rank 1 endpoints are specific and sensitive for the hypothesis, interpretable without ancillary data, and rarely confounded by artifacts or nonspecific activity. Rank 2 endpoints are specific and interpretable for the hypothesis but less informative than Rank 1, often due to oversensitivity, inclusion of narrowly context-dependent components of the hormonal system (e.g., in vitro endpoints), or confounding by nonspecific activity. Rank 3 endpoints are relevant for the hypothesis but only corroborative of Ranks 1 and 2 endpoints. Rank 3 includes many apical in vivo endpoints that can be affected by systemic toxicity and nonhormonal activity. Although these relevance weight rankings (WREL) necessarily involve professional judgment, their a priori derivation enhances transparency and renders WoE determinations amenable to methodological scrutiny according to basic scientific premises, characteristics that cannot be assured by processes in which the rationale for decisions is provided post hoc.

Abbreviations

ADME: absorption, distribution, metabolism, and excretion
AMA: amphibian metamorphosis assay
AR: androgen receptor
ARBA: androgen receptor binding assay
CYP: cytochrome P450 enzyme
DHT: dihydrotestosterone
EDSP: Endocrine Disruptor Screening Program
EPF: Endocrine Policy Forum
ER: estrogen receptor
ERBA: estrogen receptor binding assay
ERTA: estrogen receptor transcriptional activation assay
ESB: Endocrine Screening Battery
FSTRA: fish short-term reproduction assay
GSI: gonad-somatic index
HPG: hypothalamic–pituitary–gonadal
HPT: hypothalamic–pituitary–thyroid
Kd: dissociation constant
LABC: levator ani-bulbocavernosus muscle complex
LH: luteinizing hormone
MoA: mode of action
MTD: maximum tolerated dose
PPS: preputial separation
T3: 3,5,3′-triiodothyronine, the active thyroid hormone
T4: thyroxine, the thyroid prohormone
the Panel: expert panel of scientists from the Endocrine Policy Forum
TR: thyroid receptor
TSH: thyroid-stimulating hormone
VTG: vitellogenin
WoE: weight of evidence
WREL: relevance weight
WRES: response weight

INTRODUCTION

Regulatory agencies have consistently agreed that endocrine disruption must be evaluated by weight of evidence (WoE) procedures (U.S. EPA, 2011a; Organization for Economic Cooperation and Development [OECD], 2012). An objective, transparent framework for hypothesis-based WoE evaluations of endocrine screening and testing data has been published (Borgert et al., 2011a), referred to hereafter as “the Framework.” Although the focus of that publication was evaluation of data from Tier 1 of the U.S. EPA's Endocrine Disruptor Screening Program (EDSP), the proposed approach and principles apply to all toxicological and pharmacological data irrespective of the purpose or program for which data were generated.

Within the context of the U.S. EPA's Tier 1 EDSP, a broad array of data are to be considered and several separate WoE evaluations are implied (Borgert et al., 2011a). For each WoE evaluation, the Framework recommends that “weight” and “evidence” be clearly defined to enhance transparency, consistency, and credibility. Scientific “evidence” has been defined (Gori, 2010) and is evaluated by the Framework according to primary, secondary, and tertiary validity of the data. Primary validity pertains to the soundness and quality of the measurements made in a scientific investigation. Secondary validity pertains to the thoroughness and transparency of reporting. Tertiary validity addresses the probative power of the study design and its relevance to the questions posed. These concepts are well established and, when combined with recommendations for transparent reporting of literature search and selection procedures similar to those used for systematic clinical reviews, can be used to evaluate all toxicological studies (human epidemiology results, laboratory animal investigations, and mechanistic studies) used in regulatory decision making (McCarty et al., 2012). It is important to keep in mind that in the context of the Tier 1 Endocrine Screening Battery (ESB), “evidence” pertains to the potential of a substance to operate via specific modes of action (MoAs); it does not pertain to or predict an ability to cause any particular type of effect, including adverse effects.

“Weight,” on the other hand, implies that not all data contribute equally to answering the question posed. Thus, “weighting” involves a careful consideration of the specific hypothesis to be evaluated and how each particular measurement (data) informs that hypothesis. Ideally, weight would be assigned quantitatively (Borgert et al., 2011a) based on objective measurements of predictive power, false-positive and false-negative detection rates, and potency or strength of the response. This would avoid the biases inherent in professional judgments. In practice, however, quantitative rankings are not possible because the predictive capacity of most toxicological assays for potential adverse human health effects is unknown, and this is particularly the case for endocrine-mediated toxicity. Therefore, qualitative rankings are necessary and appropriate, acknowledging that some reliance on professional judgment is unavoidable. Nonetheless, objectivity and transparency are the overriding goals.
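To make the "ideal" quantitative route concrete, the following minimal Python sketch derives sensitivity, specificity, false-positive and false-negative rates, and a positive likelihood ratio from a 2x2 validation table for a single endpoint. It assumes a hypothetical set of reference chemicals whose true activity for the hypothesis is already known; the counts are invented for illustration, and `ValidationCounts` and `performance` are hypothetical names, not part of the Framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCounts:
    """Hypothetical 2x2 outcome for one endpoint against reference chemicals.

    The reference classification (truly active or inactive for the hypothesis)
    is assumed to be known; real counts of this kind are largely unavailable.
    """
    true_pos: int   # known actives the endpoint flagged
    false_neg: int  # known actives the endpoint missed
    false_pos: int  # known inactives the endpoint flagged
    true_neg: int   # known inactives the endpoint cleared


def performance(c: ValidationCounts) -> dict[str, float]:
    """Metrics on which a quantitative relevance weight could be based."""
    sensitivity = c.true_pos / (c.true_pos + c.false_neg)
    specificity = c.true_neg / (c.true_neg + c.false_pos)
    fpr = 1.0 - specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": fpr,
        "false_negative_rate": 1.0 - sensitivity,
        # Positive likelihood ratio: factor by which a positive call raises
        # the odds that the chemical is truly active for the hypothesis.
        "lr_positive": sensitivity / fpr if fpr > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Invented counts for illustration only.
    counts = ValidationCounts(true_pos=18, false_neg=2, false_pos=4, true_neg=16)
    for name, value in performance(counts).items():
        print(f"{name}: {value:.2f}")
```

The likelihood ratio is only one way such metrics could be condensed into a weight; as the paragraph notes, the reference data needed to fill a table like this are lacking for most endocrine endpoints, which is why qualitative ranks are used instead.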

To be of practical utility and fully transparent, a WoE methodology must satisfactorily answer two questions: (1) Is sufficient methodological detail provided so that independent analysts can apply the WoE evaluation method reliably and consistently? and (2) If independent analysts were to apply the WoE method and reach different conclusions, is the methodological detail sufficiently clear to allow a determination of why different conclusions were reached, for example, whether they resulted from different applications of the WoE method or from different interpretations of the available data under the same application of the method? According to McCarty et al. (2012), two factors enable a WoE methodology to satisfy these questions: (1) the process used to weight various types of data, including its literature basis, must be clearly articulated; and (2) the weightings themselves must be derived a priori and applied consistently. The Framework employs two measures of “weight.” The relevance of each endpoint is assigned a weight, WREL, according to its importance for evaluating a specific hypothesis. The strength of response produced by the test chemical in a particular assay or endpoint is also given a weight, termed the response weight, “WRES” (Borgert et al., 2011a).
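The two weights can be pictured as two fields on each endpoint record: one fixed in advance (WREL), one filled in from the study (WRES). The Python sketch below is illustrative only; the class names are hypothetical, and the numeric `w_res` field is an assumed placeholder, since the Framework's own WRES scale is not reproduced here. Freezing the record is a design choice that mirrors the requirement that weights be derived a priori rather than adjusted post hoc.

```python
from dataclasses import dataclass
from enum import IntEnum


class WRel(IntEnum):
    """Relevance weight: fixed per hypothesis before any test data are examined."""
    RANK_1 = 1  # most relevant
    RANK_2 = 2
    RANK_3 = 3  # corroborative only


@dataclass(frozen=True)  # frozen: weights cannot be reassigned after the fact
class EndpointEvidence:
    hypothesis: str
    assay: str
    endpoint: str
    w_rel: WRel     # a priori relevance weight (WREL)
    w_res: float    # hypothetical numeric strength-of-response score (WRES)

    def summary(self) -> str:
        return (f"{self.assay} / {self.endpoint} ({self.hypothesis}): "
                f"WREL = Rank {int(self.w_rel)}, WRES = {self.w_res:g}")


if __name__ == "__main__":
    record = EndpointEvidence("estrogen agonist", "FSTRA",
                              "vitellogenin, males", WRel.RANK_1, 2.0)
    print(record.summary())
```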

This article addresses the assignment of relevance weighting values (WRELs) and reports the recommendations of an expert panel of scientists from the Endocrine Policy Forum (EPF) (the Panel) convened to derive a consensus rank ordering of endpoints for evaluating each of the eight hypotheses (Borgert et al., 2011a) addressed by the U.S. EPA's Tier 1 ESB. The Panel's recommendations are their collective scientific judgment as informed by their experience with the assays and the relevant scientific literature cited herein. These recommendations represent a transparent set of assumptions that can be used to conduct a WoE evaluation for any particular chemical and that will produce clear and consistent WoE evaluations for substances subjected to the Tier 1 EDSP ESB.

These proposed rankings are not envisioned as the final word on weighting of the ESB endpoints for endocrine screening or as a set of definitive rankings for endocrine endpoints. Rather, it is hoped that these recommendations will encourage broader and deeper scientific discussion about the relative strengths of the specific endpoints used for evaluating and interpreting potential endocrine activity. These proposed rankings may serve as a useful starting point for a full evaluation of Tier 1 battery performance based on data from the initial round of screening under Tier 1 Test Orders (for a review, see Borgert et al., 2011b).

DEVELOPMENT OF RELEVANCE WEIGHT (WREL) RANKS

The consensus “Relevance Weights” (WREL) for endpoints evaluated in the 11 Tier 1 endocrine screening assays were assigned by rank ordering for each of the eight hypotheses, as summarized above. These are listed in Tables 1 through 8 as Ranks 1, 2, or 3 by hypothesis, with Rank 1 being the most relevant to the specific hypothesis and Rank 3 the least relevant. Endpoints not listed for a particular hypothesis were considered not relevant to its evaluation. In determining the WREL rankings, the Panel developed several principles and the following definitions:

  • Rank 1: The endpoints are specific and sensitive for the hypothesis, are interpretable without knowing the response of other endpoints, and are in vivo measurements rarely confounded by artifacts or nonspecific activity.
  • Rank 2: The endpoints are specific and sensitive for the hypothesis, are interpretable without knowing the response of other endpoints, but are less informative than Rank 1, often due to potential confounding influences; includes in vitro and in vivo endpoints.
  • Rank 3: The endpoints are relevant for the hypothesis, but only when corroborative of Ranks 1 and 2 endpoints; includes some in vitro and many apical in vivo endpoints.
Table 1. Estrogen Agonist Hypothesis

Rank 1 endpoints
  • FSTRA
    • Vitellogenin: increased in males
  • Uterotrophic assay
    • Uterus weight (wet/blotted): increased

Rank 2 endpoints
  • ERTA
    • ER agonism
  • FSTRA
    • Secondary sexual characteristics: reduced tubercle score: males
    • Gonad histopathology: males
    • Behavior: males
  • Pubertal female assay
    • Age and weight at vaginal opening: reduced
    • Age at first estrus: reduced
    • Ovary weight: reduced
    • Ovary histopathology: altered
  • Pubertal male assay
    • Testes weight
    • Testes histopathology: atrophy
  • Uterotrophic assay
    • Conversion to estrus (supplemental)

Rank 3 endpoints
  • ERBA
    • ER competitive binding affinity
  • FSTRA
    • Fecundity
    • Estradiol
    • Testosterone
    • Behavior: females
    • Gonad-somatic index: reduced in males, increased in females
    • Gonad histopathology: follicular atresia
    • Fertilization success
  • Pubertal female assay
    • Growth
    • Estrous cyclicity
  • Pubertal male assay
    • Growth
    • Ventral prostate weight
    • Epididymides histopathology
  • Steroidogenesis assay
    • Estradiol levels

In the definitions above, “interpretable” means that the results for an endpoint provide information relevant to the hypothesis, without clarification from other endpoints. However, consistent with EPA policy (U.S. EPA, 2011a), a fundamental principle and requirement of the Framework is that no hypothesis can be decided on the results of a single assay (Borgert et al., 2011a). Whether a hypothesis is supported requires consideration of results from all relevant assays and endpoints (Ranks 1, 2, and 3).
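Because endpoints are ranked per hypothesis and unlisted endpoints are treated as not relevant, the rankings behave like a lookup table. The Python sketch below encodes only the unambiguous first rows of Table 1 (the estrogen agonist hypothesis). `relevance_rank` and `may_decide` are hypothetical helper names. `may_decide` checks only the minimal rule that no hypothesis is decided on the results of a single assay; it says nothing about whether the hypothesis is supported.

```python
from typing import Optional

# rank -> assay -> endpoints
WrelTable = dict[int, dict[str, list[str]]]

# Partial encoding of Table 1 (estrogen agonist hypothesis), for illustration.
ESTROGEN_AGONIST_WREL: WrelTable = {
    1: {
        "FSTRA": ["Vitellogenin: increased in males"],
        "Uterotrophic assay": ["Uterus weight (wet/blotted): increased"],
    },
    2: {"ERTA": ["ER agonism"]},
    3: {"ERBA": ["ER competitive binding affinity"]},
}


def relevance_rank(table: WrelTable, assay: str, endpoint: str) -> Optional[int]:
    """Return the WREL rank, or None if the endpoint is not listed.

    Unlisted endpoints are treated as not relevant to the hypothesis.
    """
    for rank, assays in table.items():
        if endpoint in assays.get(assay, []):
            return rank
    return None


def assays_consulted(table: WrelTable, observed: list[tuple[str, str]]) -> set[str]:
    """Distinct assays that contribute at least one relevant endpoint."""
    return {a for a, e in observed if relevance_rank(table, a, e) is not None}


def may_decide(table: WrelTable, observed: list[tuple[str, str]]) -> bool:
    """A hypothesis is never decided on the results of a single assay."""
    return len(assays_consulted(table, observed)) > 1
```

For example, results from the FSTRA alone would fail `may_decide`, while FSTRA plus ERTA results would pass this necessary (but not sufficient) check.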

Strength of Response

If an assay or endpoint is specific for a particular MoA (stated as hypotheses in the Framework), then strength of response reflects the potential potency of the test substance for that MoA. In fact, whether a substance is capable of interacting with the endocrine system via any particular hormonal MoA depends on its ability to exhibit sufficient potency in vivo (Borgert et al., 2013). Thus, the relevance of an endpoint for deciding any particular hypothesis, that is, its WREL, depends to some extent on its ability to provide information on strength of response. The Framework assigns the strength of response a specific weighting (WRES), but regardless of the means by which it is considered, the strength of the response should be taken into account wherever possible. Some assays in Tier 1, such as the estrogen receptor transcriptional activation assay (ERTA), provide a means of assessing relative potencies by comparison to a positive control, and all assays allow a comparison of the dose required to elicit a response with known circulating levels of endogenous ligands or endocrine-active pharmaceuticals (Borgert et al., 2012).
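A comparison with a positive control can be expressed as a relative potency. The minimal Python sketch below assumes that half-maximal effective concentrations (EC50) have already been estimated for the test substance and the positive control in the same units. The Tier 1 guidelines may use their own potency metrics, and the function names are hypothetical.

```python
import math


def relative_potency(ec50_reference: float, ec50_test: float) -> float:
    """Potency of a test substance relative to the positive control.

    Both EC50 values must be in the same units. A result of 1e-4 means the
    test substance needs ~10,000-fold more material for a half-maximal response.
    """
    if ec50_reference <= 0 or ec50_test <= 0:
        raise ValueError("EC50 values must be positive")
    return ec50_reference / ec50_test


def orders_of_magnitude_weaker(rp: float) -> float:
    """Log10 units by which the test substance is less potent (negative = more potent)."""
    return -math.log10(rp)


if __name__ == "__main__":
    # Hypothetical values: control EC50 of 1e-10 M, test substance EC50 of 1e-6 M.
    rp = relative_potency(1e-10, 1e-6)
    print(f"relative potency = {rp:.1e}; {orders_of_magnitude_weaker(rp):.1f} log units weaker")
```

The same ratio form could be applied to the second comparison mentioned in the text, dividing the concentration or dose that elicits a response by a known circulating level of the endogenous ligand.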

In Vivo versus In Vitro Endpoints

In vitro bioassays typically detect upstream events within hormonal pathways and exhibit a high degree of specificity and sensitivity for detecting interactions with specific components of the pathway, for example, interactions of the test article with a receptor or enzyme. Such in vitro assays can provide “potency” data, based on binding affinity, enzyme inhibition kinetics, or similar measures. In vitro assays cannot, however, reliably predict the overall biological effects (i.e., downstream events) induced by a compound in vivo (Bovee and Pikkemaat, 2009). In vitro assays are also, in most cases, deliberately over-responsive (compared with many in vivo systems) toward chemicals that bind to a particular receptor (OECD, 2012).

Whereas in vivo assays allow for the evaluation of the total potential response resulting from direct and indirect mechanisms of hormonal action, in vitro assays allow evaluation only of responses mediated by the individual system under study and do not account for in vivo processes such as absorption, distribution, binding to serum proteins, metabolism, elimination, and other pharmacokinetic processes (Gray et al., 2004; Rozman et al., 2006; Marty et al., 2011; Vitale et al., 2012). Although in vitro assays are aimed at mechanistic specificity, they can nonetheless be confounded by cytotoxicity, solubility limits, oxidative stress that compromises cell membrane integrity or transport, and similar factors. These limitations of in vitro assays can produce false-negative results for substances that require metabolic activation, and false-positive results both for low-potency substances tested at high concentrations and for compounds that in vivo undergo rapid metabolism to inactive moieties (Gray et al., 2004). Of note, EDSP Tier 1 screening protocols typically require testing at high concentrations in the in vitro assays, consistent with the intent of the ESB to minimize false negatives at the expense of false positives. For these reasons, Rank 1 includes only in vivo endpoints.

Ambiguity and Confounding Influences in Specific Assays that May Affect Several Hypotheses

Ambiguities in assay interpretation that affect only specific hypotheses are mentioned under the appropriate headings, but some general ambiguities and potential confounding influences are broadly applicable. These are discussed here for efficiency.

The male (EPA 890.1500) and female (EPA 890.1450) pubertal assays can detect a variety of hormonal mechanisms and may be particularly useful in cases where the hypothalamic–pituitary–gonadal (HPG) or hypothalamic–pituitary–thyroidal (HPT) axes are affected (Ankley and Gray, 2013). However, the endpoints measured in these assays are not mechanistically specific, so their interpretation is ambiguous. This ambiguity is exacerbated by the potential for confounding by systemic toxicity. For example, many of the endpoints can be influenced by reduced body weight gain, which is likely to occur because the EPA test guidelines specify that the highest dose level should be at or just below the maximum tolerated dose (MTD), up to a limit of 1000 mg/kg/day. These factors may explain why EPA validation studies for the pubertal assays were unable to demonstrate a negative response using a true negative control agent. Thus, it is unclear how often these assays will identify non–endocrine-active chemicals as positive based on systemic toxicity (Borgert et al., 2011b; Marty et al., 2011). Moreover, some of the endpoints measured in the male pubertal assay may not respond consistently to the same test chemical (Marty et al., 1999, 2011; Stoker and Zorrilla, 2010). The biological significance of small changes can be difficult to interpret for several endpoints evaluated in the pubertal assays, and it is unclear to what extent some changes depend on body weight (Ashby and Lefevre, 2000; Marty et al., 2003; Stoker and Zorrilla, 2010), which makes a maximal tolerated change difficult to establish. Furthermore, several of the endpoints evaluated in the pubertal assays are observational (e.g., histopathological evaluations, vaginal opening [VO], and preputial separation [PPS]) and therefore not as amenable to quantitative assessment as are organ weights, receptor binding and activation, hormone levels, etc.
However, some organ weights, such as uterus weights in the female pubertal assay, are confounded by the onset of estrous cyclicity and are too variable to be interpreted reliably. This limits the ability of the assay to provide information on the strength of response.

The androgen receptor binding assay (ARBA; EPA 890.1150) and estrogen receptor binding assay (ERBA; EPA 890.1250) provide specific, quantitative information regarding the affinity of a test substance for a specific receptor. Although this information is quite useful for evaluating affinity, it provides no information about whether a substance might act as an agonist, partial agonist, or antagonist because it provides no information about efficacy (Borgert et al., 2013). This limits the ability of these assays to inform specific hypotheses or to provide information about the strength of the response in a particular direction. Nonetheless, the strength of the molecular interaction between test substance and the specific receptor—that is, its affinity—can assist the interpretation of results from other assays. False-positive responses in these assays can be observed for test articles that have physical–chemical properties that denature the binding site of the receptors.

Systemic toxicity can confound endpoints in several Tier 1 screening assays and needs to be differentiated in the overall interpretation of a test chemical's potential for specific endocrine activity (Marty et al., 2011). Signs of decreased body weight, decreased survival, clinical observations, and abnormal behavior indicate overt toxicity. Altered histopathology in liver, kidney, or other tissues, and alterations in certain biochemical biomarkers are also informative regarding systemic toxicity of the test material.

As discussed above, the EPA test guidelines for the pubertal assays specify a 10% body weight decrement as indicating that an MTD has been reached. However, body weight decrements in the 6 to 10% range may also affect endpoints typically considered to be potential signs of endocrine activity. Whereas an MTD based on a 10% body weight reduction was recommended for the female pubertal assay, an MTD based on a 6% body weight reduction may be necessary in the male pubertal assay to preclude potential effects on levels of the active thyroid hormone 3,5,3′-triiodothyronine (T3) and the prohormone thyroxine (T4) (Laws et al., 2007). Furthermore, Marty et al. (2003) showed a 1.8-day delay in PPS from feed restriction that produced only a 10% body weight decrement, and Chapin et al. (1993) reported decreased accessory sex organ weights resulting from a 10% difference in body weights between feed-restricted and control animals. Altered nutrition may also affect reproductive maturation, reproductive function, and lactation.
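The body weight criteria above reduce to simple arithmetic. The Python sketch below assumes the decrement is computed from group mean body weights relative to the concurrent control (the guidelines' exact reference point may differ). It uses the 10% and 6% values discussed for the female and male assays and the 1000 mg/kg/day limit dose. The function names are hypothetical, and the sketch places the top dose at the MTD itself, whereas the guidelines allow "at or just below".

```python
# Percent body weight decrement (vs. concurrent control) taken to indicate the MTD.
MTD_BW_THRESHOLD_PCT = {"female_pubertal": 10.0, "male_pubertal": 6.0}
CAUTION_FLOOR_PCT = 6.0          # decrements of 6 to 10% may already affect endpoints
LIMIT_DOSE_MG_KG_DAY = 1000.0


def percent_bw_decrement(control_mean_g: float, treated_mean_g: float) -> float:
    """Percent reduction of the treated-group mean body weight versus control."""
    return 100.0 * (control_mean_g - treated_mean_g) / control_mean_g


def classify_bw(assay: str, control_mean_g: float, treated_mean_g: float) -> str:
    """Flag whether a body weight decrement reaches the MTD or the caution range."""
    pct = percent_bw_decrement(control_mean_g, treated_mean_g)
    if pct >= MTD_BW_THRESHOLD_PCT[assay]:
        return "MTD criterion reached"
    if pct >= CAUTION_FLOOR_PCT:
        return "possible confounding by reduced body weight"
    return "below caution range"


def top_dose(mtd_mg_kg_day: float) -> float:
    """Highest dose at the MTD, but never above the limit dose."""
    return min(mtd_mg_kg_day, LIMIT_DOSE_MG_KG_DAY)


if __name__ == "__main__":
    # Hypothetical group means: control 200 g, treated 186 g (a 7% decrement).
    for assay in ("female_pubertal", "male_pubertal"):
        print(assay, classify_bw(assay, 200.0, 186.0))
```

With these assumptions, the same 7% decrement is flagged only as possible confounding in the female assay but reaches the MTD criterion in the male assay, which is the asymmetry the paragraph describes.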

EPA indicates that the MTD criteria in the pubertal assays may also be reached when hepatic or renal toxicity occurs, with or without altered clinical chemistry parameters. Hepatic toxicity and associated liver enzyme induction are important confounders that can alter the clearance of thyroid hormones and testosterone, resulting in compensatory homeostatic changes. In the male rat pubertal assay, for example, adaptive responses can be mistaken for antiandrogenic effects. The large bolus doses administered for 30 days from postnatal day (PND) 23 to 53 can induce hepatic enzymes and produce hepatocellular hypertrophy and increased liver weights, which in turn increase testosterone metabolism (both hydroxylation and conjugation). Except for conversion to dihydrotestosterone (DHT), testosterone metabolism markedly decreases or inactivates its androgenic activity, thereby producing an apparent antiandrogenic response that has no direct endocrine mechanism. As with the well-characterized MoA in which enhanced hepatic metabolism and clearance of T4 perturb the HPT axis in rats, humans are expected to be markedly less sensitive than rats to enhanced metabolism of testosterone. Although chemical exposures can induce hepatic enzymes in both humans and rodents, rodents rely heavily on albumin to transport steroids in blood, whereas sex hormone binding globulin (SHBG) transports steroids in human blood. The dissociation constant (Kd) of steroids from albumin is 10⁻⁶ to 10⁻⁴ M, whereas the Kd from SHBG is 10⁻¹⁰ to 10⁻⁸ M (Baker, 2002). The low-affinity binding of steroids to albumin allows them to dissociate more easily and undergo metabolism. Thus, the relevance of this MoA is questionable owing to substantive species differences in sensitivity to testosterone metabolism and the repeated high-dose exposures required for sufficient enzyme induction.
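The Kd ranges quoted above imply that SHBG binds steroids roughly 10² to 10⁶ times more tightly than albumin. The Python sketch below works through that arithmetic. It also illustrates, under an assumption made only for illustration, why weaker binding means easier dissociation: if both proteins had the same association rate constant kon (no measured value is implied), then koff = Kd × kon, and complex half-lives scale inversely with Kd. The function names and the kon value in the usage example are hypothetical.

```python
import math

# Kd ranges quoted in the text, in molar units: (tightest, weakest).
ALBUMIN_KD_M = (1e-6, 1e-4)
SHBG_KD_M = (1e-10, 1e-8)


def kd_ratio_range(weak: tuple[float, float],
                   strong: tuple[float, float]) -> tuple[float, float]:
    """Smallest and largest ratio Kd(weak binder) / Kd(strong binder)."""
    return weak[0] / strong[1], weak[1] / strong[0]


def complex_half_life_s(kd_m: float, kon_per_m_s: float) -> float:
    """Half-life of a ligand-protein complex, using koff = Kd * kon.

    kon must be supplied by the caller; no measured value is assumed here.
    """
    koff = kd_m * kon_per_m_s        # dissociation rate constant, s^-1
    return math.log(2) / koff


if __name__ == "__main__":
    lo, hi = kd_ratio_range(ALBUMIN_KD_M, SHBG_KD_M)
    print(f"SHBG binds {lo:.0e} to {hi:.0e} times more tightly than albumin")
    kon = 1e7  # hypothetical association rate constant (M^-1 s^-1), same for both
    print(f"albumin (Kd 1e-5 M): t1/2 = {complex_half_life_s(1e-5, kon):.4f} s")
    print(f"SHBG (Kd 1e-9 M):    t1/2 = {complex_half_life_s(1e-9, kon):.1f} s")
```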

Clearance mechanisms may be overwhelmed by hepatic or renal toxicity or simply by excessively high dose levels. Altered clearance can in turn affect other toxicokinetic parameters and can increase systemic blood levels markedly above those that would otherwise be expected from a particular dose. Whenever possible, existing toxicokinetic information should be considered when interpreting potential endocrine activity in Tier 1 assays.

Similarly, stress and environmental conditions may change reproductive function irrespective of endocrine activity. For example, Holson et al. (1995) found altered reproductive performance in male rats previously restrained under bright lights, and Stromborg (1986) reported that avian egg production is exquisitely sensitive to food ingestion and adequate nutrition. Of particular concern are potent MoAs, such as cholinesterase inhibition, which can produce specific toxicity that indirectly affects endocrine function at doses well below those at which an endocrine mechanism could operate (Juberg et al., 2013). Care must be taken not to interpret nonspecific responses in the fish short-term reproduction assay (FSTRA; EPA 890.1300) as specific to perturbations in the estrogen, androgen, or steroidogenesis pathways, particularly when they coincide with signs of overt toxicity or with diagnostic indications of MoAs other than endocrine activity (details are explained under specific hypotheses below).

USE OF WRELs IN A WoE INTERPRETATION

A WoE evaluation performed according to the Framework considers the relevance of the Tier 1 ESB data for deciding eight individual hypotheses that the battery was intended to address. Data for a test chemical are evaluated hypothesis by hypothesis. The first step in considering any particular hypothesis is to evaluate the test chemical's responses in the Rank 1 endpoints for that hypothesis. These provide the most direct information about how strongly the hypothesis might be supported for the test substance, or conversely, how limited the support might be. Because the Tier 1 ESB is to be interpreted as a whole, with weaknesses in some assays counterbalanced by the strengths of others, it is important that the evaluation also consider information from Ranks 2 and 3 endpoints and the patterns of response expected for prototype positive and negative controls. However, responses in Rank 1 endpoints should guide the evaluation and interpretation of information from lower ranked endpoints.

Positive responses in Rank 1 are a preliminary indication that the hypothesis is supported. Rank 2 endpoints are then evaluated to determine consistency. Consistent positive responses among Ranks 1 and 2 endpoints can be considered sufficient support for the hypothesis, that is, that the chemical possesses the potential to interact with the specific components of the endocrine system addressed by the hypothesis. Rank 3 endpoints would then be consulted for consistency, and together with the strength of response (WRES) in Ranks 1 and 2 endpoints, temper or strengthen support for the hypothesis. Inconsistent (i.e., negative) responses among Rank 2 endpoints would reduce support for the hypothesis, and depending on the strength of the response in Rank 1 endpoints, could render a conclusion ambiguous in some cases.

A lack of response in Rank 1 endpoints for a particular hypothesis is a preliminary indication that the hypothesis is not supportable. Rank 2 endpoints are then evaluated to determine consistency and completeness. This is particularly important in instances where potential activity might not be explicitly captured by the Rank 1 endpoints. If Rank 2 endpoints are consistent with negative (no response) results in Rank 1 endpoints, the null hypothesis can be concluded with confidence, that is, that the chemical lacks the potential to interact with the specific components of the endocrine system addressed by the hypothesis. Rank 3 endpoints are superfluous for interpretation when findings in both Ranks 1 and 2 endpoints are negative. In this case, positive Rank 3 responses must be considered anomalous or potentially attributable to some other MoA.

The interpretation is more complex if Rank 2 endpoints are inconsistent with negative results in Rank 1 endpoints. In this case, the strength of the response in Rank 2 endpoints becomes even more critical, as does an evaluation of Rank 3 endpoints and of potential reasons that Rank 1 endpoints may not have responded. It is not possible to discuss every way in which the myriad inconsistencies between Rank 1 and Rank 2 endpoints might be interpreted, but some overarching themes can be stated. First, Rank 1 endpoints cannot be dismissed for inconsistency with Rank 2. Second, Rank 3 endpoints provide only potentially corroborating information for findings in Ranks 1 and 2 and should not be used alone to evaluate any hypothesis. Situations in which Ranks 2 and 3 endpoints are consistent with each other but inconsistent with Rank 1 endpoints present the greatest challenge, and no general statements can be made about them. These situations must be considered on a case-by-case basis, taking into account the strength of the responses, known confounding influences, and normal biological variation within the various endpoints involved, among other factors.
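The reading order described in this section can be sketched as a small decision function. This is a simplification under stated assumptions: each endpoint result is reduced to a positive/negative call, a rank counts as positive only if all of its endpoints responded, and strength of response (WRES) is ignored even though the text treats it as essential. The returned strings paraphrase the outcomes described above. Situations the text leaves to case-by-case judgment are returned as such rather than resolved, and `summarize` and `interpret` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Summary(Enum):
    POSITIVE = "positive"   # every endpoint at this rank responded
    NEGATIVE = "negative"   # no endpoint at this rank responded
    MIXED = "mixed"         # some responded, some did not


def summarize(responses: list[bool]) -> Optional[Summary]:
    """Collapse one rank's endpoint calls; None means no data at that rank."""
    if not responses:
        return None
    if all(responses):
        return Summary.POSITIVE
    if not any(responses):
        return Summary.NEGATIVE
    return Summary.MIXED


def interpret(results: dict[int, list[bool]]) -> str:
    """Rank-ordered reading for one hypothesis; results maps rank -> calls."""
    r1 = summarize(results.get(1, []))
    r2 = summarize(results.get(2, []))
    r3 = summarize(results.get(3, []))

    if r1 is None and r2 is None:
        return "undecidable: Rank 3 endpoints cannot decide a hypothesis alone"
    if r1 is None or r1 is Summary.MIXED:
        return "case-by-case evaluation required"
    if r2 is None:
        return "preliminary only: Rank 2 endpoints needed to check consistency"
    if r1 is Summary.POSITIVE:
        if r2 is Summary.POSITIVE:
            return "supported; consult Rank 3 and WRES to temper or strengthen"
        return "reduced support; possibly ambiguous depending on Rank 1 strength"
    # From here on, Rank 1 is negative.
    if r2 is Summary.NEGATIVE:
        if r3 in (Summary.POSITIVE, Summary.MIXED):
            return "not supported; Rank 3 positives treated as anomalous or another MoA"
        return "not supported"
    if r3 is Summary.POSITIVE:
        return "Ranks 2 and 3 conflict with Rank 1: case-by-case evaluation"
    return "inconsistent: weigh Rank 2 strength, Rank 3, and why Rank 1 did not respond"
```

For instance, `interpret({1: [False], 2: [False], 3: [True]})` follows the path in which negative Ranks 1 and 2 outweigh a positive Rank 3 finding.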

The Framework does not specifically address hypotheses regarding interactions with the HPT and HPG axes, as no Tier 1 assay is sufficiently specific to those axes. Clearly, the pubertal onset assays as well as the FSTRA and the amphibian metamorphosis assay (AMA; EPA 890.1100; Lutz et al., 2008) would be expected to respond to agents with such potential, but the specific pattern of results expected is not clear and is subject to variable interpretation. For this reason, the Framework describes a set of hypotheses encompassing apparent thyroid agonism or antagonism (the manifestations of interactions with the HPT axis) as enhanced or reduced thyroid activity. To be sure, there are legitimate reasons to define these hypotheses by different approaches, but irrespective of the approach taken, transparency of the evidence evaluated and of the decision logic used to make the determination is crucial.

RATIONALE FOR WRELs RANKINGS BY HYPOTHESES

The eight hypotheses evaluated by the Tier 1 ESB assays pertain to the potential for a chemical to interact as an (1) agonist with components of estrogen pathways, (2) antagonist with components of estrogen pathways, (3) agonist with components of androgen pathways, (4) antagonist with components of androgen pathways, (5) agonist with components of thyroid pathways, (6) antagonist with components of thyroid pathways, (7) inducer of steroidogenesis enzymes, or (8) inhibitor of steroidogenesis enzymes.
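The eight hypotheses have a regular structure: agonist and antagonist hypotheses for three hormonal pathways, plus inducer and inhibitor hypotheses for steroidogenesis enzymes. The short Python sketch below generates them in the order listed above. The `Hypothesis` type is a hypothetical illustration, not a structure defined by the Framework.

```python
from typing import NamedTuple


class Hypothesis(NamedTuple):
    role: str      # how the chemical might act
    target: str    # which hormonal components it might act on

    def __str__(self) -> str:
        joiner = "of" if self.target == "steroidogenesis enzymes" else "with components of"
        return f"{self.role} {joiner} {self.target}"


RECEPTOR_PATHWAYS = ("estrogen pathways", "androgen pathways", "thyroid pathways")

# Agonist/antagonist pairs for three pathways, then inducer/inhibitor for steroidogenesis.
HYPOTHESES = [
    Hypothesis(role, pathway)
    for pathway in RECEPTOR_PATHWAYS
    for role in ("agonist", "antagonist")
] + [
    Hypothesis("inducer", "steroidogenesis enzymes"),
    Hypothesis("inhibitor", "steroidogenesis enzymes"),
]

if __name__ == "__main__":
    for number, hypothesis in enumerate(HYPOTHESES, start=1):
        print(f"({number}) {hypothesis}")
```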

HYPOTHESIS: THE CHEMICAL EXHIBITS THE POTENTIAL TO INTERACT AS AN AGONIST WITH COMPONENTS OF ESTROGEN PATHWAYS

Estrogenicity is a property that is defined based on a biological response. Estrogens are a class of steroid hormones linked principally with the control of female sex organ responsiveness and of reproduction (Korach et al., 1995). The American Heritage Medical Dictionary defines “estrogenic” as “causing estrus in animals” or “having an action similar to that of an estrogen.” This definition guides our evaluation of endpoints for the estrogen agonist hypothesis. Although “estrogens” generally refer to several substances that share a common feminizing activity (Armstrong and Federman, 2005), estrogens influence the growth and functioning of both female and male reproductive tissues (Ososki and Kennelly, 2003). Endpoints from both sexes are therefore included in our proposed Rank 1 endpoints for estrogen agonism. The term “estrogenic” has been used variably since the introduction of in vitro assays capable of probing particular aspects or components of estrogen action, but these assays are context-dependent (Reel et al., 1996). Here, assays that involve transfer of the test agent via the bloodstream are given more weight than those in which the compound reaches molecular or tissue targets via an artificial fluid or medium. Because endocrine hormones are produced by one tissue and conveyed, usually via the bloodstream, to other organs or tissues where the cellular and physiological effects are produced, and because the purpose of the Tier 1 ESB is to determine potential endocrine activity, in vivo assays are given higher priority for this hypothesis. Table 1 lists proposed Ranks 1, 2, and 3 endpoints for the estrogen agonist hypothesis.

Rank 1 Endpoints

In the FSTRA, increased vitellogenesis in males is a specific and sensitive indicator of estrogen agonist potential. During vitellogenesis, the oocyte incorporates vitellogenin (VTG) proteins, lipids, vitamins, and other nutrients into egg yolk proteins (Lubzens et al., 2010). This culminates in an oocyte competent to undergo fertilization that contains the maternal mRNAs, proteins, lipids, carbohydrates, vitamins, and hormones necessary for development of the embryo. VTGs are phosphoglycoproteins synthesized primarily in the liver under the receptor-mediated regulation of 17β-estradiol and are found in the blood of females of all oviparous vertebrate species. Although several hormones can apparently induce VTG production in fish, many appear to work through or in cooperation with 17β-estradiol (Lubzens et al., 2010). Vitellogenesis is thus one of the primary reproductive processes under direct estrogenic control. VTG production can also be induced in male fish by administration of exogenous 17β-estradiol, but in control males, VTG is either absent or present below the detection limits of most assays (Parks et al., 1999). For this reason, VTG production in males may be a more sensitive indicator of estrogenic activity than increased production in females. A clear, unambiguous induction of VTG in male fish is a well-established response to estrogen receptor (ER) agonists (Schmid et al., 2002; U.S. EPA, 2007a).

VTG in male fish does not appear to interfere with reproductive success or survival except in rare instances where excessive production leads to kidney failure (Hutchinson et al., 2006; Mills and Chichester, 2008; Caldwell et al., 2012). In vivo assays, rather than in vitro assays or structure–activity relationship models, are currently considered the most defensible screening tools for wildlife, in part because of long-standing uncertainties related to xenobiotic metabolism, bioavailability, and toxicokinetics that may not be replicated outside of the intact organism (Ankley et al., 1998). The Panel concluded that an increase in VTG levels in male fish is a Rank 1 endpoint for evaluating the estrogen agonist potential of a substance.

A statistically and biologically significant increase in wet and blotted uterine weights in the uterotrophic assay (EPA 890.1600; OECD, 2007b) in female rats is a sensitive and specific indicator of estrogen agonist potential. Justification for the Rank 1 status of the uterotrophic assay comes both from its biological context and from its historical use. Biologically, estrogens coordinate systemic responses during the ovulatory cycle, including regulation of the reproductive tract, pituitary, breasts, and other tissues (Fritsch and Murdoch, 1998). Because the principal function of estrogen is to cause cellular proliferation and growth of the sex organs and other reproduction-related tissues (Guyton and Hall, 1996a), an in vivo proliferative response is ranked higher than other endpoints. Historically, estrogenicity was defined by the ability to induce uterine growth in immature or ovariectomized rodents (Rozman et al., 2006). Indeed, the crucial evidence establishing the potential estrogenicity of DDT (dichlorodiphenyltrichloroethane) analogues and other environmental estrogens lies in the observation that their administration to female animals evokes uterine or oviduct responses similar to those observed after administration of classical pharmaceutical estrogens such as 17β-estradiol (Bulger and Kupfer, 1985). The uterotrophic effects of estrogen agonists are completely blocked by a pure ER antagonist (Wade et al., 1993). Results of OECD validation studies for the uterotrophic assay have been published, establishing its ability to discriminate true positives from negatives with respect to potential for estrogenic activity in vivo (Odum et al., 1997; Kanno et al., 2001, 2003; Owens and Ashby, 2002; Owens et al., 2003).
The main advantage of the uterotrophic assay is that it measures an overall biological effect that can account for interactions between cells and between components of the endocrine system (Wang et al., 2012), as well as for absorption, distribution, metabolism, and excretion (ADME).

For these reasons, the uterine weight (wet and blotted) endpoint in the rat uterotrophic assay is assigned a Rank 1 relevance weighting. Despite the strength of the information the uterotrophic assay provides for deciding the estrogen agonist hypothesis, uterotrophic responses can also arise through other mechanisms, such as the anabolic uterotrophic action of androgens, which is blocked by antiandrogens but not by antiestrogens (Schmidt and Katzenellenbogen, 1979; Beri et al., 1998; Wang et al., 2012).

Rank 2 Endpoints

The agonist arm of the ERTA assay (EPA 890.1300) provides a measure of activation by a test substance that is highly specific for the alpha (α) form of the human ER (hERα). Responses in the assay are expressed relative to the activation produced by a prototypical full ER agonist, such as 17β-estradiol, and so can provide information about strength of response. The assay can also detect the ability of a test substance to inhibit the activity of an agonist and so can distinguish among agonists, partial agonists, and antagonists. Although the ERTA assay is highly specific and sensitive for hERα, it does not measure responses mediated through the beta form, hERβ. It is also subject to the general limitations of in vitro assays (Rozman et al., 2006) and is therefore less informative for the estrogen agonist hypothesis than Rank 1 endpoints.
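Because ERTA responses are compared with the activation produced by a full agonist such as 17β-estradiol, the underlying arithmetic can be pictured as two steps: scale each reporter signal so that the vehicle control is 0% and the 17β-estradiol positive control is 100%, then find the lowest concentration at which that scaled response reaches a small threshold. The sketch below is illustrative only. The 10% threshold (a PC10-style value) is an assumption borrowed from the general convention of OECD Test Guideline 455, not something stated in this section; the function names are hypothetical; and interpolation is linear on the concentration scale, a simplification.

```python
from typing import Optional, Sequence


def percent_of_positive_control(signal: float, vehicle: float, e2_control: float) -> float:
    """Scale a reporter signal so vehicle = 0% and the 17beta-estradiol control = 100%.

    Hypothetical helper; assumes background-corrected readings from the same run.
    """
    if e2_control <= vehicle:
        raise ValueError("positive control must exceed the vehicle control")
    return 100.0 * (signal - vehicle) / (e2_control - vehicle)


def threshold_concentration(
    concentrations: Sequence[float],
    responses_pct: Sequence[float],
    threshold_pct: float = 10.0,  # assumed PC10-style threshold
) -> Optional[float]:
    """Lowest concentration at which the scaled response first reaches threshold_pct.

    Linearly interpolates between the two bracketing test concentrations.
    Returns None if the threshold is never reached (no detectable agonism).
    """
    points = sorted(zip(concentrations, responses_pct))
    if not points:
        return None
    if points[0][1] >= threshold_pct:
        return points[0][0]
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_hi >= threshold_pct:
            # Here r_lo < threshold_pct <= r_hi, so r_hi - r_lo > 0.
            return c_lo + (threshold_pct - r_lo) * (c_hi - c_lo) / (r_hi - r_lo)
    return None
```

A concentration–response curve whose scaled maximum plateaus well below 100% is the kind of pattern one would associate with partial agonism, which is how expressing responses against a full agonist conveys strength of response.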

Several endpoints in male fish (FSTRA) are also considered interpretable for estrogen agonist potential, including decreased tubercle scores, specific alterations in male gonad histopathology (e.g., degenerate spermatozoa and proliferation of Sertoli cells), and altered male behavior, specifically an expected significant reduction in aggressive behavior and nest-guarding activity in males exposed to estrogen agonists (Shappell et al., 2010). These characteristics are relevant and reliable and can be indicative of estrogenic activity (OECD, 2007a). However, these endpoints can also be confounded by other hormonal activities or systemic toxicity and are thus less specific than VTG production (Clearwater and Pankhurst, 1997; Pankhurst and Van Der Kraak, 2000). Furthermore, these endpoints are largely subjective, requiring expert judgments that make them more susceptible to methodological variability and less useful for measuring strength of response. For these reasons, the Panel assigned these endpoints Rank 2 status for evaluating estrogen agonist potential.

Several endpoints measured in the pubertal female assay are proposed as Rank 2 for evaluation of estrogen agonist activity because of their direct dependence on 17β-estradiol. These include biologically relevant advancement in age (≥2 days younger) and decreased body weight at VO, biologically significant decreases in ovarian weight, and altered ovarian histopathology (Marty et al., 1999; Kim et al., 2002; Stoker and Zorrilla, 2010). However, these endpoints may not respond consistently to the same test chemical (Marty et al., 1999; Stoker and Zorrilla, 2010) and may also respond to chemicals with other hormonal activity (Kim et al., 2002). In addition, the lack of clear negative results for a set of test compounds known to lack hormonal activity decreases confidence in the interpretation of positive results, and some endpoints may respond generally to stress. Thus, these endpoints may not be as clearly interpretable or as useful for assessing the presence of estrogenic activity or measuring the strength of estrogen-like responses, and they are therefore proposed as Rank 2 rather than Rank 1 endpoints for evaluation of estrogen agonist potential.

In the pubertal male assay, testes weight and testicular atrophy (demonstrated by testes histopathology) are proposed as Rank 2 for evaluation of estrogen agonist activity. Although estrogens such as 17β-estradiol interfere with testes development by causing testicular atrophy, this endpoint is not specific for estrogens; other MoAs, such as oxidative stress, can also cause testicular atrophy. Histological evaluation may help in interpreting a response, but, as with the female pubertal assay, a consistent response to estrogens in the male pubertal assay is not expected, even for the same compound (Marty et al., 1999; Ashby and Lefevre, 2000; Stoker and Zorrilla, 2010).

OECD's validation of the uterotrophic assay considered several optional endpoints, but none were found to be more sensitive or specific than an increase in uterine weight for evaluation of estrogen agonist potential. Although conversion to estrus is an endpoint specifically under estrogenic control, its assessment is not as quantitative or objective as measurement of uterine weight, and therefore, may be more susceptible to methodological confounding than the Rank 1 endpoint.

Rank 3 Endpoints

The ERBA is an in vitro assay that determines the ability of the test chemical to bind to rat uterine cytosolic ER, and it is very sensitive and specific for this purpose (Kuiper et al., 1997). However, it is considered a Rank 3 endpoint rather than Rank 2 because it does not distinguish agonist from antagonist properties (Black et al., 1981; Kuiper et al., 1997). Estrogen agonist potential requires both binding affinity and intrinsic activity (efficacy) at the receptor. Because the ER is highly conserved across mammals, compounds that bind to rat ER are assumed to be capable of binding to human ER (EPA 890.1250). ER binding alone does not reflect the complex biological systems of a whole organism in determining a positive response: metabolism of the test article in vivo and ER interaction with other cellular factors are not evaluated in the assay. Therefore, the assay cannot determine whether binding will result in biological activity. In addition, a range of compounds can affect the results of the assay by altering buffer pH, denaturing the ER, or disrupting ER binding kinetics, none of which is relevant to potential endocrine activity. Several conditions may thus confound interpretation of ERBA data and require secondary diagnostic analyses to confirm competitive inhibition (Laws et al., 2006). Although interpretation of the assay can be technically difficult, and corroboration from other assays is required to determine the activity of a substance, the ERBA is considered a strong corroborative assay for substances that exhibit estrogen agonist (or antagonist) potential in Ranks 1 and 2 endpoints.

Several endpoints in the FSTRA may respond to estrogenic substances, albeit with lower specificity and sensitivity than Ranks 1 and 2 endpoints. The FSTRA endpoints proposed as Rank 3 include gonad-somatic index (GSI), fecundity, estradiol and testosterone levels, behavior in females, some gonadal histopathology findings (i.e., follicular atresia), and fertilization success. The mechanisms underlying these responses are not well characterized and may not be specific to estrogens. Reductions in sex steroid concentrations, GSI, and reproduction, as well as altered gonadal histopathology (particularly when seen as isolated or nearly isolated “positive” findings), can result from general systemic toxicity unrelated to estrogenic activity or from perturbation of the HPG axis or of androgenic or steroidogenic pathways (Foo and Lam, 1993; Clearwater and Pankhurst, 1997; Haddy and Pankhurst, 1999; Lethimonier et al., 2000; Pankhurst and Van Der Kraak, 2000; Wu et al., 2003; Aluru and Vijayan, 2009; Milla et al., 2009). Fecundity is particularly sensitive to systemic toxicity, decreased feeding, or stress and, when seen in isolation, is a poor predictor of potential endocrine activity. Therefore, the Panel considered responses in these endpoints interpretable only in the context of responses in other endpoints, and thus of corroborative use only.

In the pubertal female assay, estrogen agonists promote growth and estrous cyclicity. These endpoints, however, were considered to be insufficiently specific to be interpretable without the context of other assays, may require high doses of potent estrogens to elicit responses, and may be susceptible to nonendocrine MoAs (Marty et al., 1999; Stoker and Zorrilla, 2010). They were therefore categorized as Rank 3 for the estrogen agonist hypothesis.

Similarly, growth, ventral prostate weight, and histopathology of the epididymides in the pubertal male assay respond to estrogen agonists, but the responses lack specificity (Stoker and Zorrilla, 2010) and may be susceptible to nonendocrine MoAs. These were therefore also given Rank 3 weightings for the estrogen agonist hypothesis.

It is important to appreciate that most of the endpoints measured in the male and female pubertal onset assays are considered Rank 3 rather than Rank 2. The pubertal assays are multimodal assays that rely on apical endpoints, such as tissue and organ weights, that respond to more than one hormone. Although this is a strength in that the assays may detect endocrine signals missed by other assays, it also makes the MoA difficult to determine because the endpoints are not specifically modulated by a single hormone. Furthermore, responses in these endpoints may reflect secondary, nonspecific effects of systemic toxicity, stress, or other factors unrelated to hormonal signals. For these reasons, several endpoints in the pubertal assays are given Rank 3, and their interpretation depends on more specific responses from other assays.
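The corroborative-only status of Rank 3 endpoints can be pictured as a simple gate. The following is a deliberately simplified, hypothetical sketch, not the paper's interpretive WoE process: it assumes each responding endpoint has already been assigned a rank for the hypothesis under consideration, and it reports only which rank anchors the evidence, ignoring magnitude, dose, and consistency of response.

```python
from typing import Sequence


def summarize_by_rank(responding_ranks: Sequence[int]) -> str:
    """Summarize responding endpoints by relevance rank (hypothetical helper).

    Each element is the rank (1, 2, or 3) of one endpoint that responded.
    Rank 3 responses are treated as corroborative only, so they cannot
    anchor a conclusion without a Rank 1 or Rank 2 response.
    """
    if any(rank not in (1, 2, 3) for rank in responding_ranks):
        raise ValueError("ranks must be 1, 2, or 3")
    if not responding_ranks:
        return "no responding endpoints"
    best = min(responding_ranks)  # Rank 1 is the most informative
    if best == 3:
        return "Rank 3 only: corroborative, not interpretable alone"
    # Any responding endpoint of a lower weight than the anchor counts as corroboration.
    has_support = any(rank > best for rank in responding_ranks)
    suffix = " with lower-weight corroboration" if has_support else ""
    return f"anchored by Rank {best}{suffix}"
```

For example, for the estrogen agonist hypothesis, increased VTG in male fish (Rank 1) together with reduced fecundity (Rank 3) would be passed as `[1, 3]`, whereas a pubertal-assay growth change alone would be `[3]` and would not stand on its own.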

Compounds that increase estradiol levels in the steroidogenesis assay may produce responses in other assays that suggest a potential for estrogen agonism due to an increased level of the endogenous hormone rather than a direct agonist effect of the substance. Because circulating estradiol levels are controlled by a variety of factors, results in this assay can only provide corroboration and perhaps clarification for Ranks 1 and 2 endpoint responses.

HYPOTHESIS: THE CHEMICAL EXHIBITS THE POTENTIAL TO INTERACT AS AN ANTAGONIST WITH COMPONENTS OF ESTROGEN PATHWAYS

The biological properties of estrogens are summarized under the estrogen agonist hypothesis. Antiestrogens were classically defined as substances that inhibit the growth-promoting effect of estradiol on target tissues (Black et al., 1981). Antagonist activity can occur through direct interaction with ERs or through indirect mechanisms, such as reduction of circulating estradiol levels or enhancement of the activity of a hormone or other regulatory molecule whose biological actions oppose those of endogenous estrogens (MacGregor and Jordan, 1998). As with agonists, receptor-mediated antagonist potential is directly related to receptor affinity. In contrast with agonist potential, however, estrogen antagonist potential is inversely related to the intrinsic efficacy of the substance to activate the ER and produce cellular responses (Black et al., 1981). Such substances may be “pure” receptor antagonists, which oppose the activity of 17β-estradiol at the ER but produce no response alone, or “partial” or “inverse” agonists, both of which can reduce “estrogenic tone,” that is, background hormonal or constitutive ER activity. The extent to which estrogenic tone is reduced depends on the properties of the substance and the conditions of the assay used to measure the activity (reviewed by Negus, 2006). Regardless of these nuances, chemicals with sufficient ER affinity but with intrinsic efficacy lower than that of 17β-estradiol have the potential to reduce estrogenic activity, that is, to act as estrogen antagonists, under some conditions. Table 2 lists proposed Ranks 1, 2, and 3 endpoints for the estrogen antagonist hypothesis.

Table 2. Estrogen Antagonist Hypothesis

Rank 1 endpoints
  • None assigned (see Rank 1 Endpoints)

Rank 2 endpoints
  • ERBA
      • ER competitive binding affinity
  • Pubertal female assay
      • Age and weight at vaginal opening: increased
      • Age at first estrus: increased
  • FSTRA
      • Vitellogenin: reduced in females
      • Gonad histopathology: in females

Rank 3 endpoints
  • Aromatase assay
      • Aromatase inhibition
  • Steroidogenesis assay
      • Estradiol levels
  • Pubertal female assay
      • Estrous cyclicity
      • Ovary histopathology (atrophy)
      • Ovary weight: reduced, with atrophy
  • FSTRA
      • Fecundity
      • Estradiol
      • Testosterone
      • Gonad-somatic index
      • Behavior
      • Fertilization success

Rank 1 Endpoints

The Panel did not consider any of the current Tier 1 endpoints sufficiently specific and reliable to warrant a Rank 1 WREL weighting for potential antiestrogenic activity. If fully validated, the antiestrogenic mode of the uterotrophic assay would qualify as a Rank 1 endpoint for potential antiestrogenic effects. In this procedure, physiological estrogen in ovariectomized female rats is replaced by administration of a uterotrophic dose of 17β-estradiol (estrogen control), and treatment groups additionally receive the test chemical (treated). Attenuation of the uterotrophic effect of the exogenous 17β-estradiol by the test chemical (treated vs. estrogen control) indicates potential antiestrogenic activity.
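The treated versus estrogen-control comparison can be summarized descriptively as the fraction of the 17β-estradiol-induced uterine weight gain that the test chemical removes. This sketch assumes, beyond what the text states, a vehicle-only ovariectomized group that defines baseline uterine weight; the function name is hypothetical, and a percentage is only a descriptive summary, not a substitute for the statistical comparison an assay guideline would require.

```python
from statistics import mean
from typing import Sequence


def percent_attenuation(
    vehicle: Sequence[float],
    estrogen_control: Sequence[float],
    treated: Sequence[float],
) -> float:
    """Percent of the 17beta-estradiol-induced uterine weight gain removed by the test chemical.

    0% means treated animals match the estrogen control (no attenuation);
    100% means uterine weight is back to the assumed vehicle baseline.
    Inputs are per-animal uterine weights in consistent units (e.g., mg).
    """
    baseline = mean(vehicle)
    induced_gain = mean(estrogen_control) - baseline
    if induced_gain <= 0:
        raise ValueError("estrogen control shows no uterotrophic response over vehicle")
    return 100.0 * (mean(estrogen_control) - mean(treated)) / induced_gain
```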

Rank 2 Endpoints

The ERBA is an in vitro assay that determines the ability of the test chemical to bind to rat uterine cytosolic ER, and it is very sensitive and specific for this purpose (Kuiper et al., 1997). The ERBA determines the specific binding affinity of a chemical for the ER under the conditions of the assay but does not distinguish agonist from antagonist properties (Black et al., 1981; Kuiper et al., 1997). Because intrinsic activity is not required for antiestrogenic potential, specific receptor binding affinity is more informative for estrogen antagonism than for agonism; the ERBA is therefore ranked higher for the estrogen antagonist hypothesis than for the estrogen agonist hypothesis. Because the ER is highly conserved across mammals, compounds that bind to rat ER are assumed to be capable of binding to human ER (EPA 890.1250). ER binding is categorized as a Rank 2 endpoint rather than Rank 1 because it does not reflect the complex biological systems of a whole organism in determining a positive response: metabolism of the compound in vivo and ER interaction with other cellular factors are not considered in the result. Therefore, the assay cannot determine whether binding will result in biological activity. Additionally, without corroboration from other assays, it is not possible to determine whether the compound is capable of agonistic or antagonistic activity.

In the pubertal female assay, substances with potential antiestrogenic activity increase the age and weight at VO and the age at which the first estrus occurs (≥2 days older). Limitations of these endpoints are discussed under Rank 2 endpoints for the estrogen agonist hypothesis (Section 1.2). Although these endpoints respond to a generalized decrease in estrogenic tone, this can occur via many different mechanisms, including antagonism at the ER and inhibition of aromatase or gonadotropin, leading to decreased estradiol levels (Marty et al., 1999; Ashby et al., 2002; Stoker and Zorrilla, 2010). Responses may require relatively potent compounds or high doses, the latter of which increases the chance of generating confounding results due to systemic toxicity.
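The ≥2-day criteria for biologically relevant shifts in age at VO (advancement under the estrogen agonist hypothesis, delay under this one) reduce to a comparison of group mean ages. The sketch below is hypothetical (the function name and `threshold_days` parameter are illustrative): it assumes that statistical significance, body weight at VO, and signs of systemic toxicity are assessed separately, and, as the text stresses, a "delayed" result is consistent with several mechanisms rather than with ER antagonism specifically.

```python
def classify_vo_age_shift(
    control_mean_days: float,
    treated_mean_days: float,
    threshold_days: float = 2.0,
) -> str:
    """Classify a shift in mean age at vaginal opening, in postnatal days.

    "advanced": treated animals >= threshold_days younger
                (consistent with estrogen agonism)
    "delayed":  treated animals >= threshold_days older
                (consistent with reduced estrogenic tone by any mechanism)
    """
    shift = treated_mean_days - control_mean_days
    if shift <= -threshold_days:
        return "advanced"
    if shift >= threshold_days:
        return "delayed"
    return "below threshold"
```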

Chemicals with potential antiestrogenic activity can affect the FSTRA by decreasing female VTG (U.S. EPA, 2007a). Antiestrogenic reduction of VTG in female fish is not as specific, reliable, or robust as the agonist-induced VTG increase in male fish. VTG reduction in female fish can also result from stress and, conceivably, from hepatotoxicity and crosstalk with the aryl hydrocarbon receptor (Anderson et al., 1996a, 1996b; Lethimonier et al., 2000; Celander, 2011; Bugel et al., 2013). Thus, reduced VTG in female fish is placed in Rank 2 for the antiestrogenic hypothesis. Gonad histopathology was also given Rank 2 status because, as the primary reproductive organ, the gonad produces physiologic responses to endocrine-active substances (Ankley et al., 2001; U.S. EPA, 2006). Although there are few data on the expected profile of an antiestrogen in the FSTRA (U.S. EPA, 2007a), decreases in estrogenic activity that lower VTG levels are assumed to also produce observable alterations in ovarian histopathology in female fish. Thus, given the scarcity of sensitive endpoints for antiestrogenic activity in the FSTRA, decreases in female VTG and alterations in ovarian histopathology consistent with decreased estrogenic activity are designated Rank 2 endpoints.

Rank 3 Endpoints

Inhibition of aromatase (aromatase assay; EPA 890.1200) prevents conversion of C19 androgens to C18 estrogens, and hence conversion of testosterone to estradiol, resulting in lower circulating 17β-estradiol levels (Marty et al., 2001b). If significant and persistent, lower circulating levels of 17β-estradiol might reduce estrogenic tone. On the other hand, because conversion of androgens to estrogens via aromatase is not the primary source of estrogens, especially in humans, and because estrogen sulfates may be converted to free estrogens to compensate for reduced estrogen levels, a modest decrease in aromatase activity would have little if any effect on steady-state estrogen levels. Thus, inhibition of aromatase may suggest that a chemical has the potential to exhibit antiestrogenic effects, but the response in this assay can only corroborate antiestrogenic activity in Ranks 1 and 2 assays.

Similarly, compounds that reduce estradiol levels in the steroidogenesis assay may produce responses in other assays that suggest a potential for estrogen antagonism due to reduced levels of the endogenous hormone rather than a direct receptor antagonist effect of the substance. Because circulating estradiol levels are controlled by a variety of factors, however, results in this assay can only provide corroboration and perhaps clarification for Ranks 1 and 2 endpoint responses.

Antiestrogens may alter estrous cyclicity in the pubertal female assay and may induce ovarian atrophy, as evidenced by concordant atrophic changes in histopathology and reduced organ weight. Although these endpoints are not dispositive as isolated responses, they may be used to corroborate more specific responses from other assays and are therefore assigned Rank 3 priority.

Several endpoints in the FSTRA (fecundity, estradiol and testosterone levels, GSI, behavior, and fertilization success) may respond to potential antiestrogens. However, as noted under the estrogen agonist hypothesis, these endpoints are difficult to interpret because they respond to different hormonal activities and have significant potential to be confounded by general systemic toxicity that also affects endocrine-sensitive tissues. For these reasons, these endpoints are assigned Rank 3 priority and serve only as corroborative endpoints for antiestrogenic potential.

HYPOTHESIS: THE CHEMICAL EXHIBITS THE POTENTIAL TO INTERACT AS AN AGONIST WITH COMPONENTS OF ANDROGEN PATHWAYS

An androgen is defined as any steroid hormone that controls the development and maintenance of male sexual organs and male secondary sex characteristics (American Heritage Medical Dictionary, 2007). Like estrogens, androgens are present in both males and females. In females, androgens promote the development of secondary sex characteristics that occur during puberty (Hiipakka and Liao, 1995; Guyton and Hall, 1996b). Typically, androgens are released into the bloodstream, where they bind tightly to a serum glycoprotein of hepatic origin, sex hormone-binding globulin (SHBG) in humans, or weakly to albumin in humans and rats (Winters, 1998; Kicman, 2010; Hammond, 2011). The bound hormone is delivered to and enters target cells, where it binds to the androgen receptor (AR) and stimulates the production of androgen-dependent proteins (Winters, 1998; Kicman, 2010). Most steroid-induced androgenic effects result from an increased rate of protein formation in these target cells. For example, testosterone induces DNA-to-RNA transcription in prostate tissue, and after several days a simultaneous increase in the number of prostatic cells and the quantity of DNA can be identified in the gland (Guyton and Hall, 1996b). The effects of androgens on target tissues are not always inductive: androgens inhibit pituitary alpha-subunit gene expression and the release of gonadotropin-releasing hormone by the hypothalamus (Winters, 1998). Table 3 lists proposed Ranks 1, 2, and 3 endpoints for the androgen agonist hypothesis.

Table 3. Androgen Agonist Hypothesis

Rank 1 endpoints
  • Hershberger assay
      • Concordance of all five endpoints:
          • Cowper's gland weight
          • Seminal vesicle weight
          • LABC weight
          • Glans penis weight
          • Ventral prostate weight
  • FSTRA
      • Secondary sexual characteristics: tubercles in females

Rank 2 endpoints
  • ARBA
      • AR competitive binding affinity
  • Pubertal male assay
      • Age and weight at preputial separation: if accelerated
      • Seminal vesicle + coagulating gland weight (wet/blotted)
      • Ventral prostate weight
      • Dorsolateral prostate weight
      • Levator ani-bulbocavernosus muscle complex weight
      • Epididymis weight
      • Testes weight
      • Testes histopathology
      • Epididymides histopathology
  • FSTRA
      • Vitellogenin: reduced in females
      • Gonad histopathology
  • Hershberger assay
      • Concordance of two or more endpoints (see Rank 1)

Rank 3 endpoints
  • Aromatase assay
      • Aromatase inhibition
  • Steroidogenesis assay
      • Testosterone levels
  • Pubertal male assay
      • Growth
      • Testosterone levels
  • Pubertal female assay
      • Growth
      • Age and weight at vaginal opening
      • Uterus weight
      • Ovary weight
      • Adrenals weight
      • Uterus histopathology
      • Ovary histopathology
  • FSTRA
      • Fecundity
      • Testosterone and estradiol levels
      • Gonad-somatic index
      • Behavior
      • Fertilization success
  • Hershberger assay
      • Only one of five endpoints responds (see Rank 1)

Rank 1 Endpoints

The androgenic mode of the Hershberger assay (EPA 890.1400) measures responses to the test article in castrated male rats. Several endpoints are considered Rank 1 because this in vivo assay measures specific tissue responses that are hallmarks of androgenic action and incorporates the ADME of the chemical. In the castrated male rat model, a concordant increase in the absolute weights of five androgen-dependent tissues—Cowper's gland, seminal vesicles, levator ani-bulbocavernosus muscle (LABC), glans penis, and ventral prostate—is considered a clear indication of potential androgenic activity (OECD, 2009a). Although androgens are known to increase the weight of these tissues, tissue growth can be altered by MoAs other than androgen agonism. Therefore, a significant growth response from multiple androgen-dependent tissues is necessary to suggest an androgenic response (OECD, 2009a). A statistically significant increase in organ weight for two or more of the assay tissues is required for a positive result according to the EPA test guideline (U.S. EPA, 2011b). Confidence that the chemical has androgen agonist potential increases as the number of significantly affected organs increases, and decreases as fewer endpoints respond. The Panel therefore assigned a concordant response in all five endpoints Rank 1 for the androgen agonist hypothesis.

In the FSTRA, the appearance of male secondary sex characteristics in female fish can provide a clear indication of androgenic potential (Ankley et al., 2003; U.S. EPA, 2007a; Ankley and Gray, 2013). As in mammals, secondary sex characteristics in fish are controlled by sex steroids and can therefore be altered by exposure to endocrine-active compounds. In female fish of some species, androgen agonists induce masculinization of secondary sex characteristics (U.S. EPA, 2007a). Exposure to an androgen agonist results in abnormal sexual differentiation and function in female fathead minnows, including the development of nuptial breeding tubercles, which are normally expressed only in male fathead minnows (Ankley and Villeneuve, 2006). In fathead minnows, the development of nuptial tubercles in females is considered diagnostic of androgen exposure (OECD, 2009b). Therefore, tubercle development in female fathead minnows was assigned Rank 1 because it is reproducible and biologically relevant (U.S. EPA, 2006).

Rank 2 Endpoints

The ARBA measures the competitive binding affinity of the test article for ARs in rat prostate cytosol in vitro. In vivo, this molecular interaction initiates the cellular response to endogenous androgens. Because the AR is highly conserved across mammals, compounds that bind to rat AR are assumed to be capable of binding to human AR (U.S. EPA, 2011c). Although receptor binding can indicate a potential interaction with androgenic pathways, binding alone cannot distinguish potential agonist from antagonist properties; without corroboration from other assays, it is not possible to determine whether the compound has potential for agonistic or antagonistic responses. Furthermore, the assay cannot determine whether binding will result in a biologically significant endocrine effect, because in vivo ADME of the compound and AR interactions with other cellular factors are not captured by the in vitro result. The Panel would have assigned the ARBA Rank 3 for the androgen agonist hypothesis, consistent with the ranking of the ERBA for estrogen agonism, if a validated AR transactivation assay were part of the Tier 1 battery. However, lacking another assay that allows this determination at the molecular level, the Panel assigned the ARBA Rank 2 status until it can be replaced by a fully validated in vitro assay more specific for agonist potential.

As stated above, significant growth responses in multiple androgen-dependent tissues (Cowper's gland, seminal vesicles, LABC, glans penis, and ventral prostate) in the androgenic mode of the Hershberger assay suggest androgenic potential (OECD, 2009a). The interpretation is clearest when several tissues respond concordantly. The EPA requires a statistically significant response from two or more of the tissues to interpret a response as positive (U.S. EPA, 2011b); the Panel therefore assigned concordance of only two of the five endpoints Rank 2 for the androgen agonist hypothesis.

The pubertal male assay is used to identify potential endocrine activity underlying pubertal development and thyroid function in male rats. It consists of collections of in vivo endpoints that are interpreted according to expected patterns of response that together indicate a specific hormonal potential, rather than as individual endpoints. The response profile from the endpoints in the pubertal male assay reflects a potential for specific endocrine modalities (e.g., androgen agonism or antagonism, alteration of steroidogenesis). However, a chemical may not alter all endpoints indicative of a particular hormonal mechanism or may alter some endpoints but not others, depending on the dose. Due to variation of response profiles for some chemicals, U.S. EPA (2009) does not require consistency among endpoints (even redundant ones) to make interpretations. However, inconsistency in the response profile for a given chemical decreases confidence in the interpretation. Nonetheless, certain responses in the pubertal male assay are sufficiently interpretable (described below) to warrant placement in Rank 2.

In the pubertal male assay, age and weight at PPS are considered markers for the onset of puberty. Because 5α-DHT initiates PPS, the endpoint is androgen-dependent and specific for identifying endocrine activity (Stoker et al., 2000; U.S. EPA, 2011b). A decrease in age at PPS has been observed after administration of exogenous methyl testosterone to male rats, indicating that acceleration of PPS is a sensitive and specific marker of androgen agonism (Stoker et al., 2000). Androgens are necessary for the development and maintenance of the organs of the male reproductive tract; hence, the weights of several glands and muscles within this tract are considered markers of androgenic action. These include the seminal vesicle and coagulating gland (wet and blotted), ventral prostate, dorsolateral prostate, LABC muscle complex, epididymis, and testes. Exposure to methyl testosterone in the pubertal male rat usually causes a significant increase in reproductive organ weights (Stoker et al., 2000; Marty et al., 2001b). However, in the validation study conducted by U.S. EPA (2007b), administration of 80 mg/kg methyl testosterone significantly increased the weights of the ventral prostate and seminal vesicle but decreased the weights of the testes and epididymides. The decrease in weight of these organs is hypothesized to result from hypothalamic–pituitary feedback that downregulates gonadotropins, including luteinizing hormone (LH) (Stoker et al., 2000). According to this hypothesis, increased male reproductive organ weight in the pubertal male rat is specific to androgen agonists, but decreased testis and epididymis weights may also indicate androgen agonist potential. On the other hand, aromatization of methyl testosterone to estradiol has not been ruled out, because significant aromatase activity has been found in the testes and epididymides of rhesus monkeys (Pereyra-Martinez et al., 2001; Shayu and Rao, 2006). This dichotomy in response confounds interpretation of the results and supports a Rank 2 classification.

In a prevalidation study for the male pubertal assay (Rocca and Pepperl, 2000), rats exposed to methyl testosterone showed significant histopathological changes in the testes and epididymides. The testes displayed hypospermatogenesis and interstitial cellular atrophy. The epididymides displayed hypospermia secondary to decreased production in the testes, and degenerative germ cells resulting from release of degenerated cells from the testes. These findings are consistent with exposure to methyl testosterone and indicate androgen agonist potential. However, thyroid antagonists also produce histological changes in the testes and epididymides, which confounds interpretation of the results (Rocca and Pepperl, 2000).

In the FSTRA, the Panel assigned reduced VTG in females Rank 2 status. Exposure to androgenic chemicals decreases the production of endogenous androgens, which are the precursors converted to estradiol (U.S. EPA, 2006). In female fathead minnows, VTG is produced in response to estrogenic compounds, including estradiol (Ankley et al., 2001). Therefore, lower levels of estradiol result in decreased levels of VTG, as has been observed in response to 17β-trenbolone exposure (Ankley and Gray, 2013). While decreased VTG can indicate a specific endocrine activity, confounding factors complicate interpretation: production of VTG in female fish can also be decreased by hepatotoxicity, stress, and other nonendocrine modes of toxicity (Anderson et al., 1996a, 1996b; Lethimonier et al., 2000; U.S. EPA, 2006; OECD, 2009b; Celander, 2011; Bugel et al., 2013). Gonad histopathology was also given Rank 2 status because the gonad, as the primary reproductive organ, produces physiologic responses to endocrine-active substances (Ankley et al., 2001; U.S. EPA, 2006). In response to androgenic chemicals, hyperproduction of sperm in male fish and decreased yolk deposition in female fish have been observed (U.S. EPA, 2007a). These alterations in gonadal histopathology were classified as Rank 2 endpoints because, although they may result from exposure to endocrine-active substances, they are not specific to these substances, which confounds interpretation of the results. Additionally, despite standardization of histopathological primary diagnoses, these diagnoses remain subject to individual interpretation and may reflect observation bias (U.S. EPA, 2006).

Rank 3 Endpoints

As stated above, significant growth responses in multiple androgen-dependent tissues (Cowper's gland, seminal vesicles, LABC, glans penis, and ventral prostate) in the androgenic mode of the Hershberger assay suggest androgenic potential (OECD, 2009a). The interpretation is clearest when several tissues respond concordantly and much less clear when only one tissue shows a growth response. Because the EPA requires a statistically significant response in two or more of the tissues to interpret a result as positive, a single responding tissue is not considered sufficient evidence of androgenic activity (U.S. EPA, 2011b). A single responding tissue was therefore classified as a Rank 3 endpoint because it provides only qualitative evidence of androgenic potential and should be evaluated together with additional endpoints to determine the relevance of the response.

Inhibition of aromatase in the aromatase assay is reflected by decreased estradiol and increased testosterone levels. The aromatase assay utilizes human placental recombinant microsomal aromatase to measure aromatase enzyme activity in vitro. Aromatase is a cytochrome P450 (CYP) enzyme complex that converts androgens to estrogens. Therefore, changes in aromatase activity affect the amount of estrogen present in tissues and, subsequently, alter endogenous hormone levels and activity. Androgen agonists downregulate aromatase gene expression and thereby reduce aromatase activity (Lanzino et al., 2013). While a significant decrease in aromatase activity is characteristic of androgen agonism, it is not definitive for this hypothesis, because other mechanisms, including cytotoxicity, will produce an apparent decrease in aromatase activity (Battelle, 2005). Inhibition of aromatase is classified as a Rank 3 endpoint because the assay neither determines the mechanism of action nor predicts effects on the enzyme at the protein level after post-translational processing. As a result, additional lines of evidence are needed to corroborate androgen agonism.

In the steroidogenesis assay, testosterone levels were assigned Rank 3 status for the androgen agonist hypothesis. This in vitro assay detects effects on the synthesis of steroid hormones by measuring their concentrations in a human adrenocortical carcinoma cell line, H295R. As with all in vitro assays, the steroidogenesis assay has limited metabolic capacity and cannot predict activity at the organism level. While it can corroborate the results of other assays, it cannot establish whether the chemical alters activity at the AR. Androgen agonists, such as trenbolone acetate, significantly decrease testosterone concentrations in H295R cells (Gracia et al., 2007). The mechanism is unclear but may involve downregulation of 17β-hydroxysteroid dehydrogenase, an enzyme responsible for the conversion of androstenedione to testosterone (Hilscherova et al., 2004).

In the pubertal male assay, growth and circulating testosterone levels were assigned Rank 3 status because, although androgen agonists are known to significantly decrease body weight and serum testosterone and DHT levels, several factors confound these endpoints. The decrease in serum testosterone is an indirect response: androgen agonists cause feedback inhibition of gonadotropin release and a subsequent decrease in testosterone production by the Leydig cells (O'Connor et al., 2000). Confounding is especially pronounced for growth, which is affected by a variety of factors, including overt toxicity, food intake, and onset of puberty (Laws et al., 2007; Marty, 2013). Without corroboration from other assays, these effects cannot confidently be attributed to androgen agonism.

Numerous endpoints in the pubertal female assay were assigned Rank 3 status, including growth; age and weight at VO; uterine, ovarian, and adrenal gland weights; and uterine and ovarian histopathology. Like the pubertal male assay, the pubertal female assay is used to identify potential endocrine activity affecting pubertal development and thyroid function. It comprises in vivo endpoints that are interpreted according to expected response profiles that together indicate a potential for endocrine activity (e.g., estrogen agonist, antiestrogen). The pubertal female assay is rarely used to assess androgens, and very little information is available on the subject. Additionally, the U.S. EPA did not test an androgen agonist during the pubertal female validation studies, making interpretation of the assay more difficult for this group of chemicals (U.S. EPA, 2007c). Chemical-specific variations in response also confound the analysis. The endpoints are classified as Rank 3 because they cannot indicate androgen agonism without corroboration from other assays.

Body weight is significantly increased in female pubertal rats exposed to androgen agonists (Kim et al., 2002; Clark et al., 2003). Body weight is classified as a Rank 3 endpoint because it can be affected by food intake and overt toxicity, confounding interpretation. At low concentrations (near physiological levels), exposure to aromatizable androgen agonists decreases the age and weight at VO, whereas nonaromatizable androgen agonists have no effect on this endpoint. Because serum estradiol concentrations are not increased after exposure to aromatizable androgens, early VO is hypothesized to result from local aromatization to estrogen and direct action on the vaginal epithelium (Goldman et al., 2000). Higher concentrations of aromatizable androgens have the opposite effect: administration of 1 mg/kg/day testosterone delayed VO and increased body weight at VO (Kim et al., 2002). This endpoint was classified as Rank 3 because the direction of the response changes with dose rather than varying monotonically with it, and because the response can be affected by changes in growth. Additionally, other mechanisms (e.g., estrogen agonists) produce similar results, that is, early VO, and may confound interpretation of the response (Goldman et al., 2000).

Both uterine and ovarian weights were reduced by administration of testosterone at 1 mg/kg/day (Kim et al., 2002). However, the effects on the uterus appear to be dose dependent, since uterine weight was significantly increased at testosterone doses above 20 mg/kg/day as a result of uterine stromal cell proliferation (O'Connor et al., 2000; Wason et al., 2003), an effect that may also be mediated via ARs (Schmidt and Katzenellenbogen, 1979; Beri et al., 1998; Wang et al., 2012). Because the pubertal female assay uses intact rats, cyclic changes in uterine weight also increase the variability of this endpoint, confounding interpretation. Histopathology results, particularly regarding stromal proliferation, may be necessary to draw conclusions. The adrenal-to-brain weight ratio is significantly increased by exposure to the androgen agonist 17α-methyltestosterone at high doses (Wason et al., 2003); however, no change in adrenal weight is noted at lower exposure concentrations (Kim et al., 2002). Interpretation of this endpoint is confounded by its dependence on high doses, which could also conceivably produce a stress response. Exposure to androgen agonists results in atrophy of the ovary and inactivation of the interstitial glands, and the number of follicles and corpora lutea may also be reduced. In contrast, the uterus displays hyperplasia and hypertrophy of the epithelium (Wason et al., 2003).

In the FSTRA, fecundity, testosterone and estradiol levels, GSI, behavior, and fertilization success were classified as Rank 3 because they are relevant to, but not specific for, androgen agonism. These endpoints are not considered indicative of endocrine action unless Ranks 1 and 2 endpoints respond. Fecundity (the number of eggs spawned) is regulated by the endocrine system and is affected by changes in hormonal equilibrium, gonadal development, and vitellogenesis (U.S. EPA, 2006). If higher ranked endpoints are not positive for endocrine MoAs, effects on fecundity cannot be definitively identified as evidence of androgen agonism (OECD, 2004a). Androgen agonists may increase, decrease, or have no effect on testosterone and estradiol concentrations in fish (U.S. EPA, 2007a), and circulating levels of sex steroids can vary seasonally or even daily, confounding interpretation of the results. Because of these multiple sources of variation, changes in sex steroid concentrations can support Ranks 1 and 2 endpoints but have little value on their own (U.S. EPA, 2006). The GSI response to androgen agonists was similarly unpredictable, with no effect, increases, or decreases in GSI observed among male and female fish (U.S. EPA, 2007a). A primary assumption for this endpoint is that the relationship between gonad and body weight is constant. This assumption does not hold during spawning, when female GSI can vary by as much as 45% over a 2-day period, confounding interpretation of the results (OECD, 2004a). Territorial aggressiveness and changes in reproductive behavior have been observed in female fathead minnows exposed to androgens (Ankley et al., 2001; OECD, 2009b). However, changes in behavior have a variety of causes, including exposure to non–endocrine-active chemicals and systemic toxicity. Additionally, behavior is not routinely quantified or systematically observed, and its interpretation involves subjectivity (OECD, 2004a). Because of the nonspecific nature of this endpoint, behavior was classified as Rank 3 and should not be used as the only indicator of potential endocrine pathway interaction. Decreased VTG and disturbances in the normal levels of sex hormones decrease fertilization success. Because changes in sex hormones and decreases in VTG may be caused by nonendocrine MoAs, fertilization success is not specific for endocrine-active compounds. Additionally, fertilization success reflects the combined effects on both sexes, making it difficult to assess sex-specific sensitivities (OECD, 2004a).
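The constancy assumption behind GSI can be illustrated numerically. GSI is commonly expressed as gonad weight as a percentage of total body weight (the convention assumed here). The Python sketch below uses hypothetical weights for a single female, not data from any cited study, and shows that releasing one clutch lowers GSI by roughly a third with no chemical exposure at all, which is why spawning-related swings can mask or mimic a treatment effect.

```python
def gonadosomatic_index(gonad_weight_g: float, body_weight_g: float) -> float:
    """GSI: gonad weight as a percentage of total body weight."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    if not 0 <= gonad_weight_g <= body_weight_g:
        raise ValueError("gonad weight must lie between 0 and body weight")
    return 100.0 * gonad_weight_g / body_weight_g


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Hypothetical female fish (illustrative values, not study data):
# the same animal weighed just before and just after releasing a clutch.
body_before, ovary_before = 2.00, 0.30   # grams
clutch = 0.12                            # grams of eggs released at spawning
ovary_after = ovary_before - clutch
body_after = body_before - clutch

gsi_before = gonadosomatic_index(ovary_before, body_before)  # 15.0
gsi_after = gonadosomatic_index(ovary_after, body_after)     # about 9.57
print(f"GSI before: {gsi_before:.1f}%  after: {gsi_after:.1f}%  "
      f"change: {percent_change(gsi_before, gsi_after):.0f}%")
```

Because the denominator also shrinks when eggs are released, the drop in GSI is slightly smaller than the drop in ovary weight alone, but it remains far larger than many treatment effects of interest.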

HYPOTHESIS: THE CHEMICAL EXHIBITS THE POTENTIAL TO INTERACT AS AN ANTAGONIST WITH COMPONENTS OF ANDROGEN PATHWAYS

The U.S. National Library of Medicine (2013) defines androgen antagonists as compounds that inhibit or antagonize the biosynthesis or actions of androgens. Direct-acting androgen antagonists bind to the AR in target tissues and block androgen-induced receptor activation and transcription of androgen-dependent genes (Winters, 1998). Indirect-acting androgen antagonists reduce intracellular steroid levels by mechanisms other than binding to the receptor, including decreasing the rate of steroid synthesis, reducing the plasma-free fraction, and increasing the rate of androgen elimination (OECD, 2004a). Table 4 lists proposed Ranks 1, 2, and 3 endpoints for the androgen antagonist hypothesis.

Table 4. Androgen Antagonist Hypothesis

Rank 1 endpoints
  • Hershberger assay
    • Concordance of 5 endpoints:
      • Cowper's gland weight
      • Seminal vesicle weight
      • LABC weights
      • Glans penis weight
      • Ventral prostate weight
Rank 2 endpoints
  • ARBA
    • AR competitive binding affinity
  • Pubertal male assay
    • Age and weight at preputial separation: if delayed (age and weight increased)
    • Seminal vesicle + coagulating gland weight (wet/blotted)
    • Ventral prostate weight
    • Dorsolateral prostate weight
    • Levator ani-bulbocavernosus muscle complex weight
    • Epididymis weight
    • Testes weight
    • Testes histopathology
    • Epididymides histopathology
  • FSTRA
    • Vitellogenin: increase in females
    • Secondary sexual characteristics: reduced in males
    • Gonad histopathology
  • Hershberger assay
    • Concordance of two or more endpoints (see Rank 1)
Rank 3 endpoints
  • Steroidogenesis assay
    • Testosterone levels
  • Pubertal male assay
    • Testosterone levels
  • FSTRA
    • Fecundity
    • Estradiol
    • Testosterone
    • Gonad-somatic index
    • Behavior
    • Fertilization success
  • Hershberger assay
    • Only one of five endpoints responds (see Rank 1)

Rank 1 Endpoints

The antiandrogenic mode of the Hershberger assay measures responses to the test article in castrated male rats whose androgen is replaced by administration of testosterone propionate. Several organ weight endpoints measured in this mode (Cowper's gland, seminal vesicles, LABC, glans penis, and ventral prostate) are given Rank 1 status for the androgen antagonist hypothesis when a concordant response is produced. The reasons are the same as those explained for the androgen agonist hypothesis: the Hershberger is an in vivo assay that measures responses in tissues that define the subject hormonal activity, and it is consistent and reproducible. Because the test is performed in vivo, it incorporates the ADME of the chemical into the response, increasing confidence that this assay is representative of organism-level effects. Antiandrogenic chemicals either bind to the AR and inhibit gene transcription or inhibit 5α-reductase, and these two MoAs differ in their effects on androgen-dependent tissues. For some chemicals, the pattern of tissue effects in the Hershberger assay indicates the MoA (U.S. EPA, 2011b). In the castrated peripubertal male rat, these tissues all respond to antiandrogens with a decrease in absolute weight (OECD, 2009a), although chemicals that increase steroid metabolism may also decrease the absolute weight of these organs. The EPA requires a statistically significant decrease in two or more endpoints to suggest antiandrogenic activity (U.S. EPA, 2011b). Confidence that the chemical has antiandrogenic potential increases as the number of significantly affected organs increases and decreases as fewer endpoints respond. The Panel therefore assigned a concordant response in all five Hershberger endpoints Rank 1 for the androgen antagonist hypothesis.

Rank 2 Endpoints

Binding affinity, as measured by competitive binding to the AR in the ARBA, was assigned Rank 2 status for the androgen antagonist hypothesis. AR binding in vitro suggests the potential for activity via this receptor. However, binding cannot predict the activation of transcription or distinguish whether the chemical has agonist, antagonist, or other effects via the receptor. Additional endocrine assays are required to determine whether the compound is an androgen antagonist (U.S. EPA, 2011c). The general limitations of in vitro assays, explained above, apply to the ARBA as well.

As stated above, significant decreases in growth of multiple androgen-dependent tissues (Cowper's gland, seminal vesicles, LABC, glans penis, and ventral prostate) in the antiandrogenic mode of the Hershberger assay suggest antiandrogenic potential (OECD, 2009a). The interpretation is clearest when several tissues respond concordantly. The EPA requires a statistically significant response in two or more of the tissues to interpret a result as positive (U.S. EPA, 2011b); therefore, the Panel assigned concordance of two or more, but not all five, of the endpoints Rank 2 for the androgen antagonist hypothesis.

In the pubertal male assay, delayed PPS, indicated by increased age and weight at PPS, was given Rank 2 status. Significantly increased age at PPS, or increased weight at PPS (which follows because the animals are older), can be a sensitive indicator of androgen antagonism in pubertal male rats because PPS is androgen-dependent. However, it is not exclusive to androgen antagonism: chemicals that disrupt hypothalamic–pituitary function or steroid hormone synthesis or metabolism may also delay PPS (Stoker et al., 2000). Therefore, it is important to corroborate designation of an androgen antagonist through responses in other assays.

Reproductive organ weights, also assigned Rank 2 for androgen antagonism in the pubertal male assay, include the seminal vesicle and coagulating gland (wet and blotted), ventral prostate, dorsolateral prostate, LABC muscle complex, epididymis, and testes, all of which depend on androgens for growth (Shin et al., 2002). Although reduced reproductive organ weights are sensitive to androgen antagonism, these responses are not specific, and not all organs may respond as expected. Androgen antagonists that chronically elevate serum LH concentrations (such as flutamide) increase testicular interstitial fluid volume and thereby increase testes weight (Stoker et al., 2000; U.S. EPA, 2007b). Differences in reproductive organ weight profiles among antiandrogens indicate that some androgen antagonists will not decrease weight in all of the reproductive organs used as endpoints in this assay. This confounds interpretation of the test and necessitates the use of other assays to determine whether the chemical is an androgen antagonist.

Histopathology of the testes and epididymides in the pubertal male assay was also assigned Rank 2 status for the androgen antagonist hypothesis. After treatment with the AR antagonist flutamide, male pubertal rats displayed hyperplasia/hypertrophy of the interstitial cells in the testes and atrophy of the epididymides, with no detectable decrease in spermatogenesis (Rocca and Pepperl, 2000). These effects are considered consistent with the activity of androgen antagonists that elevate serum LH concentrations. Testicular and epididymal histopathology is classified as a Rank 2 endpoint because it may be confounded by disease or overt systemic toxicity. The histopathology results also depend on secondary mechanisms of the androgen antagonist; for example, hyperplasia of interstitial cells in the testes may not be seen if the antagonist does not increase serum LH concentrations.

A strength of the FSTRA is that it is an in vivo test, which incorporates the ADME of the chemical in the results and provides some direct measurements of reproductive organ physiology and function. Several endpoints in the FSTRA were assigned Rank 2 for the androgen antagonist hypothesis, including increased VTG in females, reduced secondary sex characteristics in males, and gonad histopathology. Plasma VTG concentrations may be, but are not always, significantly increased in female fathead minnows exposed to antiandrogens (U.S. EPA, 2007a; Martinović et al., 2008). This increase in female VTG likely reflects a variety of mechanisms, including direct interaction with and antagonism of the AR; however, it may also be an indirect effect of reduced oocyte maturation, increased estradiol, or other mechanisms (Martinović et al., 2008). Therefore, while this endpoint responds to androgen antagonist exposure, overt toxicity or other nonendocrine effects confound its interpretation. After a 30-day exposure to the androgen antagonist vinclozolin, male guppies exhibited a partial loss of orange-yellow coloration (a secondary sex characteristic of the species). Similarly, concentration-dependent demasculinization of males (e.g., reduced expression of tubercles) has been observed in fathead minnows treated with vinclozolin (Martinović et al., 2008). Although reduced secondary sex characteristics are suggestive of androgen antagonism, they are not diagnostic, and this endpoint is therefore categorized as Rank 2. Alterations in gonad histopathology can be a sensitive indicator of endocrine dysfunction (U.S. EPA, 2007a). Exposure to vinclozolin altered ovarian histopathology, affecting oocyte maturation and ovarian atresia in the early stages of vitellogenesis (U.S. EPA, 2007a; Martinović et al., 2008). In studies with the antiandrogen flutamide, proliferation of Leydig cells, an increased number of spermatogonia, and an overall increase in testicular degeneration were observed in the testes (Jensen et al., 2004; U.S. EPA, 2007a). Gonad histopathology was classified as Rank 2 because such changes are not exclusively endocrine-mediated and may result from other modes of action or overt toxicity.

Rank 3 Endpoints

In the Hershberger assay, isolated responses (a significant change in only one of the weights of Cowper's gland, seminal vesicles, LABC, glans penis, and ventral prostate) were assigned Rank 3 status. As stated above, a significant decrease in growth of two or more androgen-dependent tissues suggests a potential antiandrogenic MoA (U.S. EPA, 2011b). However, the situation is less clear if only one of the tissues shows a significant response. Because the EPA requires a statistically significant response in two or more of the tissues, a significant decrease in a single tissue is not considered sufficient evidence of antiandrogenic activity (U.S. EPA, 2011b). Such a response is classified as a Rank 3 endpoint because it provides only qualitative evidence of antiandrogenic potential, and it should be evaluated with additional endpoints to determine its relevance.
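The concordance logic for the Hershberger endpoints in Table 4 reduces to a counting rule. The Python sketch below encodes it under stated assumptions: statistical testing of each tissue against controls is done upstream and supplied as booleans, and three or four responding tissues are treated as Rank 2, which is our reading of the table's "two or more" row given that Rank 1 requires all five. The tissue keys and the function name `hershberger_rank` are hypothetical.

```python
from typing import Optional

# The five androgen-dependent tissues weighed in the Hershberger assay.
HERSHBERGER_TISSUES = (
    "cowpers_gland",
    "seminal_vesicles",
    "labc",
    "glans_penis",
    "ventral_prostate",
)


def hershberger_rank(significant: dict[str, bool]) -> Optional[int]:
    """Map the number of concordantly responding tissues to a relevance rank.

    `significant[tissue]` is True when that tissue's weight changed
    significantly in the direction expected for the hypothesis
    (a decrease, for the androgen antagonist hypothesis).
    Returns 1, 2, or 3, or None when no tissue responded.
    """
    missing = set(HERSHBERGER_TISSUES) - set(significant)
    if missing:
        raise ValueError(f"missing tissues: {sorted(missing)}")
    n = sum(bool(significant[t]) for t in HERSHBERGER_TISSUES)
    if n == len(HERSHBERGER_TISSUES):
        return 1      # concordant response in all five tissues
    if n >= 2:
        return 2      # meets the two-or-more criterion, but not all five
    if n == 1:
        return 3      # single responding tissue: corroborative only
    return None       # no tissue responded


# Hypothetical outcome: only seminal vesicles and ventral prostate decreased.
outcome = {t: t in {"seminal_vesicles", "ventral_prostate"} for t in HERSHBERGER_TISSUES}
print(hershberger_rank(outcome))  # 2
```

The same rule applies to the androgenic mode, with significant increases rather than decreases supplied as the input.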

In the steroidogenesis assay, testosterone levels are assigned Rank 3 status for the androgen antagonist hypothesis. As stated above, the steroidogenesis assay is an in vitro assay used to detect chemical effects on steroid hormone synthesis. It is classified as a Rank 3 endpoint because it is not specific to androgen effects. Contrary to in vivo results in rats (see below), the androgen antagonist vinclozolin significantly decreased testosterone levels in the steroidogenesis assay (Hecker et al., 2006). This suggests that effects on steroidogenesis do not mediate the antiandrogenic effects of vinclozolin. Because this assay is not specific for androgen antagonists, it should be used only to corroborate results from Ranks 1 and 2 endpoints and should not be the determining factor for confirming or refuting the hypothesis.

Testosterone levels in the pubertal male assay are also assigned Rank 3 status for the androgen antagonist hypothesis. Androgen antagonists increase serum testosterone levels. The increase in serum testosterone is hypothesized to be a result of increased serum LH concentrations. Androgen antagonists that bind to the AR competitively inhibit the binding of endogenous androgens. This reduces the androgenic signal to the hypothalamus and increases LH secretion, which stimulates testosterone production in the Leydig cells (Shin et al., 2002). An increase in testosterone levels has several possible causes and is not interpretable in isolation. However, this endpoint is helpful when assessed with results from higher ranked endpoints.

Fecundity, estradiol and testosterone levels, GSI, behavior, and fertilization success in the FSTRA were given Rank 3 status for the androgen antagonism hypothesis. Reduced fecundity is observed after exposure to androgen antagonists. While this endpoint is consistent and reproducible, it is not specific: because fecundity involves both sexes, it may decrease whether the chemical is an estrogen agonist, estrogen antagonist, androgen agonist, or androgen antagonist (U.S. EPA, 2007a). Additionally, fecundity can be influenced by a variety of nonendocrine factors. Therefore, it can support a determination but does not indicate androgen antagonism by itself. Androgen antagonists usually increase circulating levels of estradiol and testosterone; however, the response of sex steroid levels is not entirely predictable in the antiandrogen pathway (OECD, 2004a; U.S. EPA, 2007a). As with other endpoints, changes in sex steroid levels may be caused by non–endocrine-mediated MoAs. They may be useful, however, in providing additional evidence that observed effects are endocrine-mediated and are often used to support an endocrine-mediated MoA for decreases in fecundity (U.S. EPA, 2007a). GSI is an indirect measure of reproductive readiness. The primary assumption for this endpoint is that the relationship between gonad and body weight is constant. However, this assumption does not hold during spawning, so this endpoint cannot be interpreted without corroboration from Ranks 1 and 2 endpoints (OECD, 2004a). After a 30-day exposure to the androgen antagonist vinclozolin, male guppies exhibited a decreased GSI (OECD, 2004a), but alterations in GSI were variable following different antiandrogenic exposures in the fathead minnow (U.S. EPA, 2007a). In zebrafish, antiandrogenic exposure reduces postdawn spawning behavior (OECD, 2009b). As stated previously, behavioral endpoints are subjective and may be caused by a variety of nonendocrine mechanisms; therefore, this endpoint should not be used as the sole means of determining potential interaction with an endocrine pathway. Finally, exposure to androgen antagonists can significantly decrease the number of embryos that hatch, even in the absence of overt toxicity (Jensen et al., 2004). An endocrine-related reduction in fertilization could be due to elevated levels of atresia in the female gonad and decreased spermatogenesis in males. Because fertilization success is not specific to androgen antagonist exposures, this endpoint was classified as Rank 3.
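The recurring rule that Rank 3 endpoints are only corroborative can be expressed as a filter over endpoint results. In this minimal Python sketch, `EndpointResult` and `usable_evidence` are hypothetical names, and we assume that a single positive Rank 1 or Rank 2 endpoint is enough to let positive Rank 3 results count as supporting evidence. The sketch does not weigh or combine evidence, which remains a matter of professional judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointResult:
    name: str
    rank: int        # relevance weight for the hypothesis: 1, 2, or 3
    positive: bool   # response consistent with the hypothesis


def usable_evidence(results: list[EndpointResult]) -> list[EndpointResult]:
    """Return the positive endpoints that may count toward the hypothesis.

    Rank 3 endpoints are treated as corroborative only: they are kept
    only when at least one Rank 1 or Rank 2 endpoint is also positive.
    """
    for r in results:
        if r.rank not in (1, 2, 3):
            raise ValueError(f"{r.name}: rank must be 1, 2, or 3")
    positives = [r for r in results if r.positive]
    anchored = any(r.rank in (1, 2) for r in positives)
    return [r for r in positives if r.rank in (1, 2) or anchored]


# Hypothetical FSTRA outcome: reduced fecundity and fertilization success,
# but no change in the Rank 2 vitellogenin endpoint.
fstra = [
    EndpointResult("fecundity", 3, True),
    EndpointResult("fertilization success", 3, True),
    EndpointResult("vitellogenin (females)", 2, False),
]
print([r.name for r in usable_evidence(fstra)])  # []
```

Under this rule, the two Rank 3 responses in the example carry no weight on their own; they would become usable only if a higher ranked endpoint also responded.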

HYPOTHESIS: THE CHEMICAL EXHIBITS THE POTENTIAL TO INTERACT AS AN AGONIST WITH COMPONENTS OF THYROID PATHWAYS

Thyroid hormone is essential for normal development, growth, neural differentiation, and metabolic regulation in mammals and for metamorphosis in amphibians (Yen, 2001; Brent, 2012). The requirement for thyroid hormone is most apparent when the hormone is deficient during development, such as in maternal iodine deficiency or untreated congenital hypothyroidism. While developmental deficiencies produce profound neurologic deficits and growth retardation (Yen, 2001), more subtle and reversible defects may develop when hormone deficiency occurs in the adult (Brent, 2012). The effects of thyroid hormone are complex, involving interactions among T4, T3, the two main thyroid hormone receptor isoforms (TRα and TRβ, each of which has two subtypes), and thyroid-stimulating hormone (TSH) (Brent, 2012). Hepatic metabolism and clearance of thyroid hormones are also critical for control of this pathway. Interaction of thyroid hormone with specific receptors in various tissues regulates transcription of a wide variety of genes, initiates nongenomic cellular effects, and modulates cellular responses to a variety of other hormones (Yen, 2001; Brent, 2012). Because of its role in growth and metabolism and its involvement in the activities of other hormones, particularly the sex steroid hormones, identifying specific hallmark mammalian responses, as has been done for androgens and estrogens, is very difficult for thyroid hormone. The metamorphic response to thyroid hormone is somewhat more specific in amphibians (Galton, 1992; Tata, 2006). Table 5 lists proposed Ranks 1, 2, and 3 endpoints for the thyroid agonist hypothesis.

Table 5. Thyroid Agonist Hypothesis

Rank 1 endpoints
  • AMA
    • Asynchronous development
    • Thyroid histopathology
Rank 2 endpoints
  • Pubertal male assay
    • Thyroid weight
    • TSH levels
    • T4 levels
  • Pubertal female assay
    • TSH levels
    • T4 levels
  • AMA
    • Advanced developmental stage
    • Hind limb length: increased
Rank 3 endpoints
  • Pubertal male assay
    • Growth
    • Age and weight at preputial separation
    • Pituitary weight
  • Pubertal female assay
    • Growth
    • Age and weight at vaginal opening
    • Blood chemistry
  • AMA
    • Snout-vent length: reduced
    • Wet weight: reduced

Rank 1 Endpoints

Rank 1 endpoints for the thyroid agonist hypothesis are asynchronous development and thyroid histopathology assessed in the AMA. Thyroid hormone is an obligatory signal for the initiation and completion of amphibian metamorphosis, and its effects on amphibian tissue are direct, local, and independent of the location of the target cells (Tata, 2006). Normal metamorphic development requires precisely timed increases in thyroid hormone production and release (Galton, 1992; Tata, 2006); hence, asynchronous metamorphic development is a reliable indication of potential thyroid agonist activity and is given Rank 1 priority. Furthermore, in the Panel's experience, asynchronous development occurs rarely, if at all, in control tadpoles in the AMA, supporting the view that this observation is likely specific to alterations in the HPT axis. Xenobiotic chemicals capable of thyroid hormone activity appear to act primarily on processes related to thyroid hormone synthesis, iodine uptake, and iodine incorporation into thyroid hormone, and less prominently through interaction with thyroid hormone receptors (Degitz et al., 2005). Therefore, an in vivo assay is required to capture the predominant modes of potential thyroid agonist (or antagonist) activity.

Unlike most other endocrine glands, the thyroid gland stores the hormone it produces extracellularly, in follicles. Thyroid hormone production during development and metamorphosis results in changes in thyroid histology, which is the basis for prioritizing this endpoint as Rank 1. The histological features most frequently examined to measure thyroid activity in metamorphosing taxa are gland volume, follicle volume, number of colloid-filled follicles, and epithelial cell height (Jennings and Hanken, 1998). Thyroid gland hypertrophy, thyroid gland atrophy, follicular cell hypertrophy, and follicular cell hyperplasia are specified as core thyroid histopathology parameters examined in the AMA (Grim et al., 2009). Importantly, the normal histomorphology of the thyroid gland changes as the tadpole develops, so these normal changes must be considered when thyroid histology of tadpoles at different developmental stages is compared (Grim et al., 2009). Exposure to the endogenous thyroid agonist T4 resulted in thyroid gland atrophy in African clawed frog tadpoles in the AMA (Coady et al., 2010), indicating that this response is specific to agonists of the thyroid pathway.

Rank 2 Endpoints

In the pubertal male assay, thyroid weights and circulating T4 and TSH levels are proposed as Rank 2 endpoints. Although these endpoints are interpretable without corroboration from other endpoints in the battery, they are placed in Rank 2 rather than Rank 1 because of normal variability in thyroid weights and T4 and TSH levels and because these endpoints can be affected by systemic toxicity and stress. Thyroid hormone measurements have large coefficients of variation (27% for T4 and 58% for TSH; U.S. EPA, 2009) and are subject to rapid, stress-induced changes, such as may occur during necropsy. Thyroid weights have been found to be highly variable and often outside the maximum and minimum acceptable control limits set forth in the test guideline (EPF, 2013). Hepatic toxicity and associated liver enzyme induction may also affect thyroid hormones indirectly through altered clearance and compensatory homeostatic changes.
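To show why coefficients of variation of this size limit interpretability, the Python sketch below approximates the smallest change in a group mean that a two-sided, two-sample comparison could reliably detect, expressed as a fraction of the control mean. It uses a normal approximation with α = 0.05 and 80% power and assumes equal CVs in both groups and 15 animals per group; these are illustrative assumptions, not the statistical procedure of the test guideline.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_change(cv: float, n_per_group: int,
                          alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate smallest detectable difference in means, as a fraction
    of the control mean, for a two-sided, two-sample comparison.

    Assumes a normal approximation, equal group sizes, and the same CV
    in treated and control groups.
    """
    if cv <= 0 or n_per_group < 2:
        raise ValueError("cv must be positive and n_per_group at least 2")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_beta = z.inv_cdf(power)            # quantile giving the desired power
    return (z_alpha + z_beta) * cv * sqrt(2 / n_per_group)


# CVs reported above for the pubertal male assay; the group size of 15
# is an assumed value for illustration.
for hormone, cv in (("T4", 0.27), ("TSH", 0.58)):
    print(f"{hormone}: about {100 * min_detectable_change(cv, 15):.0f}% "
          f"change needed")
```

Under these assumptions, a TSH change of roughly 59% would be needed for reliable detection, compared with roughly 28% for T4, so a modest but real effect on TSH could easily go undetected.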

In the pubertal female assay, T4 and TSH levels are proposed as Rank 2 endpoints. As with thyroid hormone measurements in the pubertal male assay, TSH and T4 levels can be variable and influenced by nonendocrine factors; thus, the interpretation is not as clear as required for Rank 1 status.

Advanced development in the AMA, as measured by advanced developmental stage or increased normalized hind limb length relative to controls, is also assigned Rank 2. Although advanced development is an expected effect of thyroid agonists, experience among the Panel indicates that its interpretation requires information on thyroid histopathology for both thyroid agonists and antagonists. Furthermore, advanced development is defined by either increased normalized hind limb length or advanced developmental stage relative to controls; both need not be present to identify a thyroid agonist.

Rank 3 Endpoints

Rank 3 endpoints in the pubertal male assay for the thyroid agonist hypothesis include growth, age and weight at PPS, and pituitary weight. The interpretability of growth and of age at PPS has been addressed above, both generally and for other hypotheses. Pituitary weights have been found to be highly variable (EPF, 2013) and are not interpretable without other markers of thyroidal effects. Rank 3 endpoints in the pubertal female assay include growth, age and weight at VO, and blood chemistry. The interpretability of these endpoints has also been discussed above, both generally and for other hypotheses.

Alterations in growth in the AMA are considered Rank 3 endpoints for the thyroid agonist hypothesis. Reductions in tadpole growth (measured as decreased wet weight and snout-vent length) have been associated with an accelerated developmental rate, because the growth phase of tadpole development is truncated. However, reduced tadpole growth by itself should not be relied upon to determine thyroid-specific agonism, because it is a response to various modes of toxic action and ecological influences that are not specific to interactions with the HPT axis (U.S. EPA, 2007b).

HYPOTHESIS: THE CHEMICAL EXHIBITS THE POTENTIAL TO INTERACT AS AN ANTAGONIST WITH COMPONENTS OF THYROID PATHWAYS

Antagonists of thyroid hormone can profoundly retard normal metabolism and development, primarily via changes in thyroid hormone production or clearance (Degitz et al., 2005). Altered production and clearance of thyroid hormones are theoretically most evident as direct pathological changes or compensatory responses of the thyroid gland. As such, mammalian assays are relatively more useful for detecting thyroid antagonists than agonists. Table 6 lists proposed Ranks 1, 2, and 3 endpoints for the thyroid antagonist hypothesis.

Table 6. Thyroid Antagonist Hypothesis
Rank 1 endpoints
  • Pubertal male assay: thyroid weight; thyroid histopathology
  • Pubertal female assay: thyroid weight; thyroid (colloid area and follicular cell height)
  • AMA: asynchronous development; thyroid histopathology
Rank 2 endpoints
  • Pubertal male assay: liver weight; T4 levels; TSH levels
  • Pubertal female assay: age and weight at vaginal opening; T4 levels; TSH levels
Rank 3 endpoints
  • Pubertal male assay: growth; age and weight at preputial separation; pituitary weight
  • AMA: delayed development; snout-vent length; hind limb length; wet weight
  • Pubertal female assay: estrous cyclicity (diestrus); ovary histopathology: reduced ovary weight or atrophy

Rank 1 Endpoints

As for the thyroid agonist hypothesis, asynchronous development (especially for peripheral alterations in deiodinase activity) and thyroid histopathology in the AMA are Rank 1 endpoints for the thyroid antagonist hypothesis. Previous research has indicated that histopathological examination of the thyroid gland in the AMA is a sensitive and specific indicator of thyroid activity, particularly for compounds that inhibit thyroid hormone synthesis (Degitz et al., 2005; Coady et al., 2010; Pickford, 2010). Thyroid weight (which increases) and thyroid histopathology in the pubertal male assay (Marty et al., 2001a), and thyroid weight, colloid area, and follicular cell height in the pubertal female assay, are reliably affected by thyroid antagonists. In mammals, alterations in thyroid histopathology are quantified as changes in follicular cell height and volume, with follicular cell hypertrophy and hyperplasia being the most common responses to decreased circulating T4 and increased TSH. Although thyroid histopathology is considered a Rank 1 endpoint, the rat overpredicts thyroid toxicity in humans because of its chronically “stimulated” thyroid, which results from the relatively weak binding of circulating thyroid hormones to serum proteins (Jahnke et al., 2004).

Rank 2 Endpoints

For the thyroid antagonist hypothesis, liver weight and T4 and TSH levels in the pubertal male assay (Marty et al., 2001a; Stoker and Zorrilla, 2010), and age and weight at VO and T4 and TSH levels in the pubertal female assay (Marty et al., 1999; Stoker and Zorrilla, 2010), are given Rank 2 priority. The limitations of these endpoints relative to Rank 1 endpoints have been discussed above, both generally and for other hypotheses (Marty et al., 2001a, 2001b). Testicular weight, histopathology, and sperm counts (if conducted) in the pubertal male assay may also respond to postnatal thyroid hormone depletion, because altered Sertoli cell proliferation during development can increase testis weight and sperm counts (Cooke et al., 1993, 1996). This effect is more likely to be captured in a Tier 2 reproductive toxicity study, although the animals in the pubertal male assay are just past weaning at study initiation and might still be susceptible.

Rank 3 Endpoints

The Panel assigned growth, age and weight at PPS, and pituitary weight in the pubertal male assay, and estrous cyclicity and ovary histopathology with reduced ovary weight or atrophy in the pubertal female assay, as Rank 3 endpoints. The limitations of these endpoints have been discussed generally above (Marty et al., 2001a, 2001b). In the pubertal female assay, high doses of thyroid antagonists are expected to produce irregular cyclicity (diestrus) and may induce ovarian atrophy (Hagino, 1976; Armada-Dias et al., 2001; York et al., 2001; Hatsuta et al., 2004; Hapon et al., 2010; Li et al., 2011).

Thyroid antagonism can increase tadpole growth because the tadpoles do not progress along the expected timeline for metamorphic reorganization and tissue loss; however, altered growth (whether increased or decreased) is an apical endpoint that can respond to many influences apart from the thyroid pathway (OECD, 2004b, 2007c, 2009c). Thus, growth changes in the tadpole can be used only as Rank 3 endpoints to support more definitive endpoint responses for the thyroid antagonist hypothesis. Likewise, when morphological signs of delayed development (such as reduced hind limb length or delayed developmental stage) suggest that a test chemical may act as an antagonist in the thyroid pathway, this conclusion must be corroborated by more definitive endpoints such as thyroid gland histopathology, because delays in tadpole development can result from delayed growth rather than from specific perturbations of the HPT axis (Wilbur and Collins, 1973; U.S. EPA, 2007b).

HYPOTHESIS: THE CHEMICAL EXHIBITS THE POTENTIAL TO INTERACT AS AN INDUCER OF STEROIDOGENIC ENZYMES

Steroidogenesis is the process whereby cholesterol is converted to biologically active steroid hormones. It can be understood as a single process that is repeated, with cell-specific variations, in each steroidogenic endocrine gland (Miller and Auchus, 2011). The rate-limiting step is the initial conversion of cholesterol to pregnenolone by CYP11A1. Although mediated by a single enzyme, this step is enzymatically complex and subject to multiple regulatory mechanisms that enable precise quantitative control over the rate of conversion (Miller and Auchus, 2011). Qualitative control over the type of steroid produced is governed by the cellular content of the two classes of enzymes involved in steroidogenesis: CYP-dependent mixed-function oxidases and hydroxysteroid dehydrogenases (Miller and Auchus, 2011). Both classes of enzymes are relevant molecular targets for endocrine activity because of their role in maintaining the androgen/estrogen hormonal balance. Interactions between chemicals and critical steroidogenic enzymes may alter this balance and, hence, endocrine function (Quignot et al., 2012). Aromatase (CYP19) is a critical enzyme in the steroidogenic pathway that is principally responsible for converting C19 androgens to C18 estrogens (Marty et al., 2001b; Quignot et al., 2012). Specific inhibitors of this enzyme are well characterized (Hecker et al., 2011).

The steroidogenesis induction hypothesis, as well as the steroidogenesis inhibition hypothesis (Section 8), is formulated differently in the present report than in the Framework publication (Borgert et al., 2011a). In the Framework, interaction with aromatase was considered in hypotheses separate from interactions with other steroidogenic enzymes. With experience gained from presenting this scheme at numerous scientific meetings after the Framework was published, and from conducting the Tier 1 ESB on dozens of chemicals, the Panel determined that it is more relevant to formulate these hypotheses with respect to the potential of a chemical to alter steroidogenesis, with aromatase considered one component of that pathway. This revised formulation is also more consistent with the formulation of the other hypotheses according to agonist and antagonist modalities. According to the EPA's EDSP guidance document (U.S. EPA, 2011a), five Tier 1 assays can detect an alteration in steroidogenesis due to chemical exposure: the steroidogenesis assay, aromatase assay, pubertal male assay, pubertal female assay, and FSTRA. The aromatase and pubertal female assays evaluate estrogenic MoAs, the pubertal male assay evaluates androgenic MoAs, and the steroidogenesis assay and FSTRA evaluate both estrogenic and androgenic MoAs, either of which may be affected by significant alterations in steroidogenesis. Table 7 lists proposed Ranks 1, 2, and 3 endpoints for the steroidogenesis inducer hypothesis.

Table 7. Steroidogenesis Induction Hypothesis
Rank 1 endpoints
  • None identified
Rank 2 endpoints
  • Steroidogenesis assay: estradiol levels; testosterone levels
Rank 3 endpoints
  • Male pubertal assay: testosterone levels

Rank 1 Endpoints

No in vivo endpoints were considered to be sufficiently specific and interpretable to warrant Rank 1 status for the potential induction of steroidogenic enzymes. This limitation arises, in part, because of the lack of clear inducers of steroidogenesis that could be used to verify responses in specific in vivo endpoints.

Rank 2 Endpoints

In the steroidogenesis assay, testosterone is a central product and an intermediate in the production of estrogens. Because this is an in vitro assay, testosterone levels are assigned Rank 2 rather than Rank 1, consistent with the Panel's principle that only endpoints from in vivo assays should be assigned Rank 1. Nonetheless, this assay is the most specific in the Tier 1 battery for interaction with the steroidogenic pathway, despite being subject to the limitations of all in vitro assays, and altered testosterone levels are expected following exposure to compounds that alter the steroidogenic enzyme system. Because estradiol is the ultimate product of the steroidogenic pathway, estradiol levels are also assigned Rank 2. Additional Rank 2 endpoints cannot be identified because chemicals with known steroidogenesis-inducing activity are lacking.

Rank 3 Endpoints

The lack of chemicals with known steroidogenic induction activity limits the endpoints that can be assigned relevance for this hypothesis, but it might be expected that steroid hormone levels would be increased in vivo by strong inducers of steroidogenesis. Therefore, testosterone levels in the male pubertal assay were assigned Rank 3 for induction of steroidogenesis.

HYPOTHESIS: THE CHEMICAL EXHIBITS THE POTENTIAL TO INTERACT AS AN INHIBITOR OF STEROIDOGENIC ENZYMES

The brief description of steroidogenesis provided under the steroidogenesis induction hypothesis (Section 7) applies also to the steroidogenesis inhibitor hypothesis. Table 8 lists proposed Ranks 1, 2, and 3 endpoints for the steroidogenesis inhibitor hypothesis.

Table 8. Steroidogenesis Inhibition Hypothesis
Rank 1 endpoints
  • Pubertal female assay: uterus weight
Rank 2 endpoints
  • Steroidogenesis assay: estradiol levels; testosterone levels
  • Pubertal female assay: ovary weight
  • FSTRA: vitellogenin (reduced in females); gonad histopathology (males)
Rank 3 endpoints
  • Aromatase assay: aromatase inhibition
  • Pubertal female assay: age and weight at vaginal opening (increased); age at first estrus (increased)
  • FSTRA: fecundity; estradiol levels; testosterone levels; gonad-somatic index; behavior; fertilization success

Rank 1 Endpoints

In the pubertal female assay, uterine weights are significantly reduced by potent aromatase inhibitors, although there is some concern about how effectively this endpoint would respond to substances with weak aromatase inhibition capacity (Marty et al., 1999). Because this in vivo endpoint measures an apical response encompassing ADME and allows an estimate of the strength of response, the Panel assigned significant reduction in uterine weight in the pubertal female assay Rank 1 priority for the steroidogenesis inhibition hypothesis.

Rank 2 Endpoints

In the steroidogenesis assay, testosterone is a central product and an intermediate in the production of estrogens. Because this is an in vitro assay, testosterone levels are assigned Rank 2 rather than Rank 1, consistent with the Panel's principle that only endpoints from in vivo assays should be assigned Rank 1. Nonetheless, this assay is the most specific in the Tier 1 battery for interaction with the steroidogenic pathway, despite being subject to the limitations of all in vitro assays, and altered testosterone levels are expected in response to compounds that interfere with the steroidogenic enzyme system. Because estradiol is the ultimate product of the steroidogenic pathway, estradiol levels are also assigned Rank 2.

In the pubertal female assay, ovarian weight is sensitive to perturbation of the steroidogenic enzyme system and is expected to respond to chemicals with such potential (Marty et al., 2001b; Stoker and Zorrilla, 2010). As such, ovarian weight is assigned Rank 2 for the steroidogenesis inhibition hypothesis.

In the FSTRA, decreased VTG in females and gonadal histopathology, particularly in males, are considered Rank 2 endpoints for the steroidogenesis inhibition hypothesis. In several investigations, VTG was decreased in female fathead minnows exposed to inhibitors of steroidogenesis, whereas no change in male VTG levels was detected (U.S. EPA, 2007a). Reductions in female VTG are not considered Rank 1 endpoints because other stressors and other toxicity pathways can also reduce VTG levels (Anderson et al., 1996a, 1996b; Lethimonier et al., 2000; Celander, 2011; Bugel et al., 2013). Nonetheless, reduced VTG in female fish would be expected to be a fairly sensitive response for detecting inhibition of steroidogenesis. Among male fathead minnows exposed to steroidogenesis inhibitors, testicular degeneration, enlarged seminiferous tubule lumens, Leydig cell proliferation, and increased spermatozoa were observed (Ankley et al., 2007; U.S. EPA, 2007a). These responses were observed in several independent laboratories and thus support the use of male gonadal histopathology as a Rank 2 endpoint for the steroidogenesis inhibition hypothesis.

Rank 3 Endpoints

Aromatase inhibition in the aromatase assay is considered a Rank 3 endpoint for the potential to interact with the steroidogenic enzyme system. It is not given a higher status primarily because it addresses only a single component of the system. However, it should be considered a strong corroborative endpoint that can help clarify whether the point of interaction is aromatase (CYP19) for compounds that produce responses in Ranks 1 and 2 endpoints.

In the pubertal female assay, age and weight at VO and age at first estrus are increased by potent aromatase inhibitors, although these responses may be less reliable than uterine weight reductions (Marty et al., 1999); they are therefore assigned Rank 3.

In the FSTRA, fecundity, sex steroid levels, GSI, behavior, and fertilization success were classified as Rank 3 because they are relevant to, but not specific for, inhibitors of steroidogenesis. Reduced fecundity was observed in response to inhibitors of steroidogenesis in the FSTRA (U.S. EPA, 2007a). Male GSI was increased, and female GSI was either unchanged or decreased, in response to steroidogenesis inhibitors in the FSTRA (Ankley et al., 2007; U.S. EPA, 2007a). Therefore, increased male GSI may help support the steroidogenesis inhibition hypothesis in conjunction with more definitive Ranks 1 and 2 endpoints. Likewise, sex steroid concentrations in the FSTRA can be informative as Rank 3 endpoints. With exposure to inhibitors of steroidogenesis, the general profile of sex steroid responses was a decrease in estradiol and an increase in testosterone in females, while males generally exhibited increases in testosterone and 11-ketotestosterone (U.S. EPA, 2007a). These endpoints are considered Rank 3 because of their lower overall sensitivity, relatively greater variability in response, and susceptibility to modes of action other than steroidogenesis inhibition.

DISCUSSION

This work provides justifications for the assignment of relevance weight (WREL) rankings to experimental endpoints from the U.S. EPA's ESB according to their relevance for deciding eight specific hypotheses and an interpretive process by which the rankings are used to reach a WoE conclusion. Because data do not exist for weighting the endpoints based on positive and negative predictive values, the rankings proposed here were based on the collective scientific judgment of an expert panel of scientists from the EPF (the Panel) as informed by their experience with the assays and the relevant scientific literature cited herein. Although these WREL rankings are not quantitatively derived, their a priori explication nonetheless enhances transparency and renders WoE determinations amenable to methodological scrutiny.

There are several advantages to weightings and interpretive criteria that are derived and explained a priori, as offered here, rather than constructed post hoc to explain WoE determinations. A priori rankings and interpretive criteria provide a transparent set of assumptions that can be used to conduct a WoE evaluation for any particular chemical, so different analysts should reach similar WoE conclusions from the same data. This helps to ensure that WoE determinations are clear and consistent from chemical to chemical, which is particularly important for substances subjected to the EDSP ESB because of the high visibility of the program.

Clearly defined weightings and interpretive criteria also allow WoE determinations to be subjected to methodological scrutiny rather than to subjective debates about judgments. If different analysts reach different conclusions, the methodology provides a means of readily identifying the source of the difference, because the underlying assumptions of the method are defined a priori. WoE conclusions based on the Framework, including the WREL rankings and interpretive criteria proposed here, can be examined for sensitivity to the underlying assumptions of the method. For example, the rankings for particular endpoints could be reassigned to determine how the reassignment affects the overall support for any hypothesized endocrine activity. This could be useful for judging the strength of the proposed rankings and for allowing individual analysts to test rankings that appear questionable or incorrect. This is a particularly important strength, since different scientific judgments could be made for many of the proposed rankings. The ability to change the rankings and to add endpoints to specific ranked categories also provides a ready means for the methodology to incorporate new scientific information as it emerges.
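The sensitivity check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Framework's interpretive process: the rank assignments are transcribed from Table 8, but the function names `count_by_rank` and `reassign`, the simple tally of responsive endpoints by rank, and the responsive-endpoint pattern for a chemical are all hypothetical and chosen only to show how reassigning one endpoint shifts the distribution of supporting evidence.

```python
# Hypothetical encoding of the Table 8 rankings (steroidogenesis inhibition hypothesis).
# Keys are (assay, endpoint) pairs; values are relevance weight ranks (1 = most relevant).
STEROIDOGENESIS_INHIBITION = {
    ("Pubertal female assay", "Uterus weight"): 1,
    ("Steroidogenesis assay", "Estradiol levels"): 2,
    ("Steroidogenesis assay", "Testosterone levels"): 2,
    ("Pubertal female assay", "Ovary weight"): 2,
    ("FSTRA", "Vitellogenin: reduced in females"): 2,
    ("FSTRA", "Gonad histopathology: males"): 2,
    ("Aromatase assay", "Aromatase inhibition"): 3,
    ("Pubertal female assay", "Age and weight at vaginal opening: increased"): 3,
    ("Pubertal female assay", "Age at first estrus: increased"): 3,
    ("FSTRA", "Fecundity"): 3,
    ("FSTRA", "Estradiol levels"): 3,
    ("FSTRA", "Testosterone levels"): 3,
    ("FSTRA", "Gonad-somatic index"): 3,
    ("FSTRA", "Behavior"): 3,
    ("FSTRA", "Fertilization success"): 3,
}


def count_by_rank(rankings, responsive):
    """Tally responsive endpoints by rank (a hypothetical summary, not the Framework's rule)."""
    counts = {1: 0, 2: 0, 3: 0}
    for endpoint in responsive:
        counts[rankings[endpoint]] += 1
    return counts


def reassign(rankings, endpoint, new_rank):
    """Return a copy of the rankings with one endpoint moved to a different rank."""
    if endpoint not in rankings:
        raise KeyError(f"unknown endpoint: {endpoint}")
    if new_rank not in (1, 2, 3):
        raise ValueError("rank must be 1, 2, or 3")
    updated = dict(rankings)  # copy so the original rankings stay unchanged
    updated[endpoint] = new_rank
    return updated


# Invented pattern of responsive endpoints for a hypothetical test chemical.
responsive = [
    ("Steroidogenesis assay", "Estradiol levels"),
    ("Aromatase assay", "Aromatase inhibition"),
    ("FSTRA", "Fecundity"),
]

baseline = count_by_rank(STEROIDOGENESIS_INHIBITION, responsive)
alternative = count_by_rank(
    reassign(STEROIDOGENESIS_INHIBITION, ("Aromatase assay", "Aromatase inhibition"), 2),
    responsive,
)
print(baseline)     # {1: 0, 2: 1, 3: 2}
print(alternative)  # {1: 0, 2: 2, 3: 1}
```

Returning a modified copy rather than editing the table in place keeps the proposed rankings intact, so several alternative rankings can be compared side by side against the same data.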

The disadvantage of the relevance weightings and interpretive criteria proposed here is their lack of a quantitative foundation. Although this does not diminish the advantages discussed above, it must be kept in mind that the EDSP was to be reviewed and reevaluated following the completion of an initial screening of 50 to 100 chemicals (U.S. EPA, 1999). A complete review of the data is needed and should include an evaluation of individual assay performance as well as an evaluation of the positive and negative predictive value of the battery as a whole for reproductive and developmental toxicity. Such a review would provide a more quantitative basis for adjusting the methodology proposed here. Although an evaluation of positive and negative predictive values would not satisfy causality requirements for demonstrating that a particular adverse effect is mediated by an endocrine mode of action, such information would help determine the effectiveness and efficiency of the EDSP for protecting public health and the environment.

This article uses the ESB assays to illustrate a process and rationale for transparently ranking endpoints for WoE evaluations. Although the process is broadly applicable, we do not purport to include all studies that might be used to assess potential endocrine activity. While such a treatise would be useful, it is beyond the scope of a single publication. Nonetheless, it is hoped that analysts addressing other endpoints and other hormonal modalities could apply the principles and rationale illustrated here to develop objective and transparent relevance rankings.

Identifying studies that might be used in WoE assessments requires careful consideration of the goals of the particular assessment and how unambiguously each endpoint would inform those goals. Several different WoE evaluations are implied by the U.S. EPA's EDSP (Borgert et al., 2011a), and different types of studies may be required for each. If the goal of the assessment is to identify the potential to act via specific endocrine modes of action, as is the intent of the ESB, then it is important to select endpoints with a high degree of mechanistic specificity. Many long-term studies measure apical endpoints that could be affected by endocrine activity, but which could also be affected by other types of toxicity. Even though such endpoints may be quite relevant for assessing adverse effects, they may have little utility for identifying a mode of action. Conversely, endpoints with sufficient mechanistic specificity to identify potential modes of action may not be definitive for adverse effects and hence, for WoE evaluations intended to inform risk assessments.

CONFLICT OF INTEREST STATEMENT AND FUNDING DISCLOSURE

Some substances under EPA EDSP Test Orders for endocrine screening are produced by the authors’ employers. This article has been reviewed in accordance with the peer- and administrative-review policies of the authors’ organizations. The views expressed here are those of the authors and do not necessarily reflect the opinions and/or policies of their employers. There are no contractual relations or proprietary considerations that restrict dissemination of the research findings of the authors. C. J. Borgert, E. M. Mihaich, and L. D. Stuchal are independent scientists/consultants who received financial support for portions of this project from the Endocrine Policy Forum. Time spent by other co-authors was supported by their respective employers or a personal contribution.
