### 1. Introduction

There are thousands of untested chemicals in common use. Comprehensive toxicity testing of all chemicals is infeasible due to high monetary and temporal costs (Judson et al., 2009). To address this problem, a new paradigm in toxicity testing focuses on screening larger numbers of chemicals on a diverse battery of relatively quick and inexpensive high-throughput screening (HTS) assays that measure a variety of cellular and biochemical responses. Each assay measures a single endpoint, such as transcription of a target gene or binding to a specific receptor protein. The aim of HTS is to predict which chemicals are most likely to perturb normal biological processes that lead to adverse human health and environmental effects, and focus scarce testing resources on those chemicals.

Predicting potential chemical activity from HTS data requires statistical models that estimate the response of each chemical on each assay and that compare and rank chemicals. Three main quantities are used to compare chemicals: (1) the probability that an active response occurred, (2) the potency, or the concentration at which a response occurs, and (3) the efficacy, or the magnitude of the response. These three quantities form the basis of chemical prioritization. With improved estimates of them, predictive models can better identify which chemicals are most likely to have potentially hazardous effects.

Dose–response modeling for HTS data is unique because there are a large number of curves to estimate but the data for each curve are sparse. For example, the ToxCast project at the US EPA (Dix et al., 2007; Kavlock et al., 2012) has screened nearly 2000 chemicals on over 700 HTS assays; however, each chemical-assay combination is tested at 6–10 unique concentrations and, in most cases, in singlicate at each concentration. Analysis is further complicated by assay and chemical effects, such as assays that are more or less sensitive, correlated assays that measure the same or similar cellular response, and chemicals that are highly active or not active on a variety of assays. Hence, HTS requires a dose–response method that is robust to the sparsity of the data for each chemical-assay combination, takes advantage of the larger number of chemicals and assays, and accurately estimates the efficacy, potency, and probability of an active response.

A variety of parametric models are used for estimating monotonic dose–response curves (Ritz, 2010). The most common is the four-parameter log-logistic model (FPLL), which directly parameterizes efficacy with the maximal response (upper asymptote) and potency with the AC50 (the concentration at which the half-maximal response occurs). Figure 1a shows an annotated example FPLL dose–response curve. The current release of EPA ToxCast results fits FPLL to each dose–response curve by least squares (Judson et al., 2010). Fitting a four-parameter model to 6–10 observations by least squares yields poor variance estimates (currently not provided in the ToxCast public release) and no estimate of the probability that an active response occurred.
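To make the two FPLL parameters concrete, here is a minimal Python sketch of one common log-logistic (Hill-type) parameterization. The argument names `baseline`, `emax`, `ac50`, and `hill` are our own illustrative choices, not the notation of ToxCast or of this paper.

```python
def fpll(conc, baseline, emax, ac50, hill):
    """Four-parameter log-logistic (Hill-type) dose-response curve.

    baseline: response at zero concentration (lower asymptote)
    emax:     maximal response (upper asymptote), the efficacy parameter
    ac50:     concentration giving the half-maximal response, the potency parameter
    hill:     slope parameter controlling how steeply the curve rises
    """
    if conc <= 0:
        return baseline
    return baseline + (emax - baseline) / (1.0 + (ac50 / conc) ** hill)


# At conc == ac50 the curve sits exactly halfway between its two asymptotes.
print(fpll(10.0, baseline=1.0, emax=3.0, ac50=10.0, hill=1.5))  # 2.0
```

A least-squares fit chooses these four numbers separately for every chemical-assay curve from only 6–10 points, which is why their uncertainty is hard to pin down.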

There are several available frequentist approaches for monotonic curve estimation (e.g. Friedman and Tibshirani, 1984; Mukerjee, 1988; Mammen, 1991; Hall and Huang, 2001; Mammen et al., 2001; Wang and Li, 2008). Recently, several semiparametric Bayesian methods for monotone regression for a single curve have been proposed. Holmes and Heard (2003) proposed using piecewise constant functions with random knots, Neelon and Dunson (2004) utilized a piecewise linear spline model, and Curtis and Ghosh (2011) developed a Bernstein polynomial model. Also, Shively, Sager, and Walker (2009) modeled the monotonic function as the integral of a positive function. None of these general semiparametric regression methods directly parameterize the efficacy or potency of a dose–response relationship, which are widely used for comparing chemicals and predictive modeling.

Our problem is different from these because we are estimating the response for several chemicals on multiple assays. Bayesian hierarchical models have been used in many fields where the data takes a natural hierarchical structure. Several in vivo or developmental toxicity studies have used Bayesian hierarchical models to improve estimates when measuring the response of multiple correlated endpoints tested with a single chemical (e.g., Faes et al., 2006; Choi et al., 2010) or used a multivariate model that assumes correlation in residuals for multiple health outcomes (Neelon and Dunson, 2004).

To incorporate dependence in the regression function across four HTS assays and eight nanomaterials, Patel et al. (2012) estimate dose–duration–response surfaces using linear B-splines with two internal knots in both the duration and dose directions. The first knot parameterizes potency with the no observable adverse effect level (NOAEL), an alternative measure to the AC50. For each chemical, they model correlation in the knot locations and basis coefficients across the assays, but they do not model correlation across chemicals within an assay. While the direct parameterization of the NOAEL is appealing, this model does not directly parameterize the probability of a response or the efficacy, two important parameters for prioritization. In addition, the model does not include assay effects, which we assume exist in our data and which can improve estimation when there are many chemicals but few observations per chemical-assay combination. Finally, while linear splines with two internal knots keep a generalized additive model to a reasonable size, this basis is not realistic for a one-dimensional model (dose only, rather than dose–duration surfaces).

In this article, we propose a Bayesian hierarchical model for dose–response curves, called the zero-inflated piecewise log-logistic model (ZIPLL), that is specifically tailored to the high-dimensional, sparse data setting of the ToxCast project. ZIPLL is a mixture of a non-active response and an active response; it extends FPLL to a more flexible spline formulation while maintaining direct parameterization of the efficacy and potency of each chemical-assay combination. Our Bayesian approach naturally estimates the three key summary statistics, along with measures of uncertainty for the ranks of efficacy and potency, which should allow decision-makers to use the results appropriately when deciding which chemicals to consider for future, more comprehensive testing. We use a hierarchical framework that borrows strength across chemicals and assays. This adds robustness, incorporates assay and chemical effects, and allows estimation of joint distributions of responses across multiple assays. In addition, prior information and covariates can be included to exploit known relationships between chemicals, between assays, and between chemical-assay combinations.

### 2. The ToxCast Data

The ToxCast project uses a diverse battery of HTS assays and informatic models to rapidly characterize the activity of thousands of chemicals. These chemical activity profiles are used to support decisions regarding prioritization for further testing (Reif et al., 2010), predict in vivo activity (Martin et al., 2011), and inform risk assessments (Judson et al., 2011). In support of these goals, the ToxCast project has tested over 2000 chemicals on over 700 HTS assay endpoints for which analysis is ongoing. The data for the first 309 chemicals tested for Phase I are publicly available (http://www.epa.gov/ncct/toxcast/data.html). Figure 2 illustrates the unique structure of the data.

In this article we use the 309 chemicals in the publicly available data and a subset of 81 assays comprising the multiplexed transcription factor reporter platform (Romanov et al., 2008, www.attagene.com). This platform enables high-content, functional assessment of transcription factor activity, a core component of cellular gene regulatory networks. It measures both *cis*-regulating response element constructs (CIS) and the trans-activating (TRANS) potential of multiple nuclear hormone receptors. These 48 CIS and 25 TRANS assays (plus 8 negative control assays) address relevant cellular processes, including response to xenobiotics, genotoxic stress, hypoxia, oxidative damage, immune modulation, and endocrine disruption. Martin et al. (2010) evaluated these assays’ responses to the 309 Phase I chemicals.

Chemicals were diluted in dimethyl sulfoxide (DMSO) and, in general, tested at six to ten unique concentrations on each HTS assay. The concentrations typically ranged from 0.046 to 100 μM or from 0.091 to 200 μM, with each concentration three times the previous one. In cases of overt cytotoxicity, the concentration ranges were shifted up or down by a multiple of three in an attempt to recover a concentration range with a chance to show specific assay effects (Martin et al., 2010). Of the 309 Phase I chemicals, four were tested in duplicate and one in triplicate; the remaining chemicals were tested once at each concentration. Responses at each concentration are recorded as fold change over the DMSO solution; hence, a response of 1 indicates no response. To reduce the inherent heteroskedasticity of the data, we log-transformed the responses before curve fitting and returned them to the original scale before analyzing and plotting results.
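The threefold dilution scheme and the log transform can be checked with a few lines of Python. This is a minimal sketch: the helper `dilution_series` and the fold-change values are hypothetical, and the natural log is our choice since the text does not state the base.

```python
import math


def dilution_series(top, n, factor=3.0):
    """Return n concentrations, lowest first, each `factor` times the one before."""
    return [top / factor ** k for k in reversed(range(n))]


# Eight threefold steps down from 100 reproduce the lower end of 0.046.
print([round(c, 3) for c in dilution_series(100.0, 8)])
# [0.046, 0.137, 0.412, 1.235, 3.704, 11.111, 33.333, 100.0]

# Hypothetical fold-change responses (1 = no change relative to DMSO).
fold_change = [1.02, 0.97, 1.10, 1.45, 2.30]
log_resp = [math.log(y) for y in fold_change]  # fit curves on this scale; 1 maps to 0
back = [math.exp(z) for z in log_resp]          # report results on the fold-change scale
```

Eight threefold steps from 200 likewise give a lowest concentration of about 0.091, so both stated ranges correspond to eight concentrations. On the log scale, "no response" becomes 0, which is convenient for a flat non-active curve.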

### 3. Model Description

The ZIPLL regression function is a mixture of an active and a non-active response:

- (1)
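The zero-inflation idea behind this mixture can be sketched generically: with some probability a chemical-assay curve is flat at the no-response level, and otherwise it follows an active dose–response curve. The sketch below is only an illustration under our own assumptions. It uses a plain four-parameter log-logistic curve as a stand-in for the active component, whereas ZIPLL's active component is a piecewise spline extension of FPLL that is not reproduced here. All function and argument names are hypothetical.

```python
import random


def fpll(conc, baseline, emax, ac50, hill):
    """Stand-in active curve: four-parameter log-logistic (not ZIPLL's piecewise form)."""
    if conc <= 0:
        return baseline
    return baseline + (emax - baseline) / (1.0 + (ac50 / conc) ** hill)


def mixture_mean(conc, p_active, baseline, emax, ac50, hill):
    """Marginal mean: flat baseline with prob. 1 - p_active, active curve with prob. p_active."""
    return (1.0 - p_active) * baseline + p_active * fpll(conc, baseline, emax, ac50, hill)


def simulate_curve(concs, p_active, baseline, emax, ac50, hill, sigma, rng):
    """Draw the latent active/inactive indicator, then noisy responses given that indicator."""
    active = rng.random() < p_active
    responses = []
    for c in concs:
        mean = fpll(c, baseline, emax, ac50, hill) if active else baseline
        responses.append(rng.gauss(mean, sigma))
    return active, responses


# On the log fold-change scale, baseline 0 means "no response".
print(mixture_mean(10.0, p_active=0.5, baseline=0.0, emax=2.0, ac50=10.0, hill=1.0))  # 0.5
```

The latent indicator in `simulate_curve` is what makes a posterior probability of an active response a natural model output.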

#### 3.1. Hierarchical Structure and Prior Specification

The chemical-assay specific parameters have normal priors

#### 3.2. Assay Effects, Chemical Effects, and Prior Knowledge

#### 3.3. Posterior Computation

The remaining parameters do not have closed-form posterior distributions. To reduce autocorrelation, we use a resolvant transition kernel based on the Metropolis-Hastings kernel (Robert and Casella, 2004). Details on sampling these parameters are given in Web Appendix A. An R package implementing ZIPLL is provided in the supplementary material.
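Robert and Casella (2004) define the resolvant of a transition kernel K as the mixture K_ε = (1 − ε) Σ_{i≥0} ε^i K^i, with 0 < ε < 1. One way to simulate it is to apply K a geometrically distributed number of times per iteration. The following is a minimal sketch assuming a random-walk Metropolis-Hastings kernel on a toy standard-normal target. The value ε = 0.75, the step size, and the target are our choices; the proposals the paper actually uses for its parameters are described in Web Appendix A and are not reproduced here.

```python
import math
import random


def mh_step(x, log_target, step_sd, rng):
    """One random-walk Metropolis-Hastings update of a scalar parameter."""
    proposal = x + rng.gauss(0.0, step_sd)
    log_ratio = log_target(proposal) - log_target(x)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return x


def resolvant_step(x, log_target, step_sd, eps, rng):
    """Resolvant kernel: apply the MH kernel N times, P(N = i) = (1 - eps) * eps**i."""
    n_moves = 0
    while rng.random() < eps:  # geometric number of inner MH moves
        n_moves += 1
    for _ in range(n_moves):
        x = mh_step(x, log_target, step_sd, rng)
    return x


def log_std_normal(x):
    """Toy target: log density of N(0, 1) up to a constant."""
    return -0.5 * x * x


rng = random.Random(2024)
x, draws = 0.0, []
for it in range(20_000):
    x = resolvant_step(x, log_std_normal, step_sd=1.0, eps=0.75, rng=rng)
    if it >= 5_000:  # discard burn-in
        draws.append(x)
print(sum(draws) / len(draws))  # should be close to 0
```

Because each K^i leaves the target invariant, so does their mixture. Larger ε means more inner moves per recorded draw on average (ε / (1 − ε), here 3), which lowers the autocorrelation between recorded draws.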

This algorithm performed well on our simulated and real data, and the resolvant kernel kept autocorrelation reasonably low. For the full set of 309 chemicals and 81 assays, we ran the chain for 50,000 iterations and discarded the first 20,000 as burn-in. The smaller simulation with 100 curves was run for 20,000 iterations, with 5000 discarded as burn-in. We assessed convergence by inspecting trace plots and comparing multiple chains.
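One common way to make the comparison of multiple chains quantitative is the Gelman-Rubin potential scale reduction factor (R-hat). The text does not say which statistic, if any, was computed, so this is an illustrative sketch on simulated chains rather than the authors' procedure.

```python
import random
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for equal-length chains of one scalar parameter."""
    m, n = len(chains), len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    # Between-chain variance B and mean within-chain variance W.
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in chain_means)
    w = statistics.fmean(statistics.variance(c) for c in chains)
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5


rng = random.Random(7)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2_000)] for _ in range(3)]
stuck = [[rng.gauss(mu, 1.0) for _ in range(2_000)] for mu in (0.0, 3.0, 6.0)]
print(gelman_rubin(mixed))  # near 1: chains agree
print(gelman_rubin(stuck))  # well above 1: chains disagree
```

Values close to 1 indicate that chains started from different points have reached the same distribution. In practice the statistic would be computed per parameter after discarding burn-in.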

MCMC sampling is carried out in C, called from R (R Development Core Team, 2011) through the `.C` interface. The runtime for a simulated data set of 100 curves with eight observations each is 42 seconds for 20,000 iterations of ZIPLL. Analysis of all 309 chemicals and 81 assays with 50,000 iterations of ZIPLL, including random assay effects and the probit model for covariates as specified in Section 'ToxCast Data Application', took 19.9 hours. Both computation times are on a Dell dual-processor, six-core Xeon 3.6 GHz machine with 60 GB of RAM.

### 5. ToxCast Data Application

We fit the 309 chemicals and 81 assays with ZIPLL, including assay random effects and the probit model as specified in Section 'Assay Effects, Chemical Effects, and Prior Knowledge'. Figure 3 shows the dose–response estimates for 12 chemicals on the pregnane X receptor response element (PXRE) assay fit with ZIPLL, alongside the reported fits from the ToxCast public use files. The first row shows three chemicals for which the ZIPLL posterior mean is similar to the FPLL fit reported in ToxCast. The second and third rows show chemicals for which ZIPLL fits the data better by adapting to an asymmetric response pattern.

The bottom row of Figure 3 highlights the importance of probabilistic estimation of an active response. The three chemicals shown have similar response patterns; however, under the current ToxCast methodology one is marked active on PXRE (having increased by at least one fold change) while the other two are not. With ZIPLL, the estimated probabilities of response range from 0.26 to 0.87. Web Appendix C Figure 2 compares the ZIPLL probability with the ToxCast indicator for all 309 chemicals on three assays. The majority of chemicals are classified as not active or as active by both methods. However, ZIPLL estimates that several chemicals have posterior probabilities of response between 0.1 and 0.9, suggesting that the evidence is inconclusive as to whether these chemicals responded, yet the ToxCast data force each of them to be classified as either active or not. The chemicals with high posterior probabilities of response under ZIPLL but a ToxCast call of no response include several with evidence of PXR activity from other ToxCast assays (e.g., Flumiclorac-pentyl) and/or from independent structure-activity models (e.g., Butafenacil) (Kortagere et al., 2010).
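A posterior probability of response can be read directly off MCMC output as the share of draws in which a chemical-assay pair is in the active component. The sketch below assumes, hypothetically, that the sampler stores a 0/1 activity indicator per draw. The three-way label with cutoffs 0.1 and 0.9 mirrors the range discussed above and is our illustration, not a ToxCast or ZIPLL output.

```python
def posterior_prob_active(indicator_draws):
    """Posterior P(active): share of MCMC draws in which the latent indicator equals 1."""
    return sum(indicator_draws) / len(indicator_draws)


def call_from_prob(prob, low=0.1, high=0.9):
    """Three-way call instead of a forced binary one."""
    if prob >= high:
        return "active"
    if prob <= low:
        return "inactive"
    return "inconclusive"


# Hypothetical indicator draws for one chemical on one assay.
draws = [1] * 87 + [0] * 13
p = posterior_prob_active(draws)
print(p, call_from_prob(p))  # 0.87 inconclusive
```

Under this labeling, probabilities such as 0.26 and 0.87 both land in the inconclusive band, which a binary call cannot express.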

#### 5.1. Summary of Active Responses and Assay and Chemical Effects

A natural result of the hierarchical analysis is estimation of the joint distribution of responses across assays. Figure 4a shows the number of active assay responses for each chemical. The number of assay responses reported in ToxCast tends to lie near the lower bound of the ZIPLL posterior interval and is similar to the number obtained by calling active any combination with a ZIPLL posterior probability of at least 0.75. Overall, ZIPLL estimates 2667 (2616, 2718) active assay-chemical combinations, compared with 1887 reported in ToxCast. This suggests that the current ToxCast methods may miss some assay responses, potentially hindering prioritization efforts.
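Both summaries in this paragraph can be computed from per-draw activity indicators: counting active pairs within each draw gives a posterior for the total, while thresholding each pair's posterior probability gives a single count. This is a minimal sketch assuming, hypothetically, that draws are stored as lists of 0/1 indicators. The 95% interval level is our assumption, since the text does not state the level.

```python
import statistics


def active_count_summary(draws):
    """draws: one list of 0/1 active indicators (over all chemical-assay pairs) per MCMC iteration.
    Returns the posterior mean number of active pairs and a central 95% interval."""
    counts = [sum(d) for d in draws]
    q = statistics.quantiles(counts, n=40, method="inclusive")  # q[0] = 2.5%, q[-1] = 97.5%
    return statistics.fmean(counts), (q[0], q[-1])


def thresholded_count(draws, threshold=0.75):
    """Number of pairs whose posterior probability of activity is at least `threshold`."""
    n_iter = len(draws)
    probs = [sum(col) / n_iter for col in zip(*draws)]
    return sum(p >= threshold for p in probs)


# Hypothetical: four MCMC iterations over three chemical-assay pairs.
draws = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(active_count_summary(draws)[0], thresholded_count(draws))  # 2.0 2
```

The two quantities answer different questions: the first propagates uncertainty about every pair into the total, while the second discards pairs below the cutoff.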

Figure 4b shows the posterior of the assay random intercept in the probit model for the probability of response. The most and least sensitive assays had statistically significant random effects. At the lower end, the eight assays with the prefix “M_” are negative controls, and all had effects around −10, while more sensitive assays such as PXRE and PPAR had large positive effects. The posterior mean of the coefficient for LogP was −0.0005, and this effect was not significant. This may be due to selection bias: solubility (low LogP) was part of the selection criteria for the first 309 chemicals in order to accommodate solubility in DMSO. Because this restriction was relaxed for chemicals included in forthcoming ToxCast phases, LogP may prove to be an important factor in future samples.
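The magnitudes quoted here are easier to interpret on the probability scale. This sketch assumes a linear predictor made up of a global intercept, an assay random intercept, and a LogP slope, passed through the standard normal CDF. That additive structure is our reading of the description, and the intercept of 0 and the LogP values are hypothetical. The −10 and −0.0005 values come from the text.

```python
import math


def prob_active(intercept, assay_effect, beta_logp, logp):
    """Probit link: P(active) = Phi(intercept + assay random intercept + beta_logp * LogP)."""
    eta = intercept + assay_effect + beta_logp * logp
    return 0.5 * math.erfc(-eta / math.sqrt(2.0))  # standard normal CDF via erfc


# Assay effect near -10 (as for the negative controls): activity essentially impossible.
print(prob_active(0.0, -10.0, -0.0005, 3.0))
# LogP coefficient of -0.0005: moving LogP from 0 to 8 barely changes the probability.
print(prob_active(0.0, 0.0, -0.0005, 0.0), prob_active(0.0, 0.0, -0.0005, 8.0))
```

Writing Φ through `erfc` keeps the far lower tail accurate, where `1 + erf(x)` would round to zero.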

#### 5.2. Comparison with Reference Chemicals

A useful way to summarize the results for each assay is to compare chemicals with reference chemicals known to be active on the assay. For example, PPAR is a commonly used assay with a plausible connection to neoplastic pathology (see Peters et al., 1997, 2007). Figure 5b highlights the response of four reference chemicals for PPAR: perfluorooctane sulfonic acid (PFOS), Diethylhexyl phthalate (DEHP), Phthalic acid, mono-2-ethylhexyl ester (PAMEHP), and perfluorooctanoic acid (PFOA) (Casals-Casas and Desvergne, 2011). Because the reference chemicals have known biological effects, other chemicals with a high probability of being more potent than the reference chemicals on a given assay may have greater potential for similar biological effects, and thus may be higher-priority candidates for additional testing than chemicals that are less potent than the reference chemicals. On this assay, the four reference chemicals have posterior mean potency ranks (with 1 being the most potent) of 42.9 (35.0, 49.0), 105.7 (57.0, 154.0), 114.0 (68.0, 156.0), and 122.6 (72.0, 159.0), respectively, among the 161 chemicals with at least 0.5 probability of activity, indicating that there are many good candidates for further testing.
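Posterior potency ranks like these come from ranking chemicals by AC50 within each MCMC draw and then summarizing each chemical's ranks across draws. This is a minimal sketch assuming, hypothetically, that AC50 draws are stored as one dictionary per iteration. Restricting to chemicals with at least 0.5 probability of activity would be done before calling it, and the 95% interval level is our assumption.

```python
import statistics


def posterior_ranks(ac50_draws):
    """ac50_draws: one dict {chemical: AC50} per MCMC iteration.
    Within each draw, rank by AC50 (smallest AC50 = most potent = rank 1)."""
    ranks = {chem: [] for chem in ac50_draws[0]}
    for draw in ac50_draws:
        for rank, chem in enumerate(sorted(draw, key=draw.get), start=1):
            ranks[chem].append(rank)
    return ranks


def summarize_rank(rank_samples):
    """Posterior mean rank and a central 95% interval."""
    q = statistics.quantiles(rank_samples, n=40, method="inclusive")
    return statistics.fmean(rank_samples), (q[0], q[-1])


# Hypothetical AC50 draws for three chemicals over two iterations.
ac50_draws = [{"A": 1.0, "B": 5.0, "C": 3.0}, {"A": 2.0, "B": 1.0, "C": 4.0}]
ranks = posterior_ranks(ac50_draws)
print(ranks)  # {'A': [1, 2], 'B': [3, 1], 'C': [2, 3]}
print(summarize_rank(ranks["A"])[0])  # 1.5
```

Ranking within each draw, rather than ranking posterior means, is what carries the AC50 uncertainty into the rank intervals.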

Another commonly studied assay is ER. Figure 5c shows results for ER with the reference chemicals Bisphenol A (BPA) and Methoxychlor highlighted. These two reference chemicals have posterior mean ranks of 1 (1.0, 1.0) and 7.4 (4.0, 10.0), respectively, among the 103 chemicals with a posterior probability of an active response of 0.5 or more. This implies that there is at least 0.95 probability that BPA is the most potent chemical among the 309, and that very few ToxCast chemicals are more potent than Methoxychlor.
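Statements such as "BPA is the most potent chemical with probability at least 0.95" or "chemical X is more potent than a reference chemical" are posterior probabilities that can be estimated as shares of MCMC draws. This minimal sketch uses the same hypothetical one-dictionary-per-draw storage of AC50 values; "REF", "X", and "Y" are made-up labels.

```python
def prob_most_potent(ac50_draws, chem):
    """Share of draws in which `chem` has the smallest AC50 of all chemicals in the draw."""
    return sum(min(d, key=d.get) == chem for d in ac50_draws) / len(ac50_draws)


def prob_more_potent(ac50_draws, chem, reference):
    """Share of draws in which `chem` has a smaller AC50 (is more potent) than `reference`."""
    return sum(d[chem] < d[reference] for d in ac50_draws) / len(ac50_draws)


# Hypothetical AC50 draws; "REF" plays the role of a reference chemical.
ac50_draws = [
    {"REF": 0.5, "X": 0.8, "Y": 0.3},
    {"REF": 0.4, "X": 0.2, "Y": 0.6},
    {"REF": 0.3, "X": 0.9, "Y": 0.7},
    {"REF": 0.6, "X": 0.7, "Y": 0.1},
]
print(prob_most_potent(ac50_draws, "REF"))       # 0.25
print(prob_more_potent(ac50_draws, "Y", "REF"))  # 0.5
```

Sorting all chemicals by `prob_more_potent` against a reference gives one possible prioritization list for an assay.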

### 6. Discussion

Our proposed MCMC algorithm takes about 20 hours to fit 309 chemicals on 81 assays, longer than the published ToxCast method. However, for HTS projects like ToxCast, data are analyzed in large batches, so real-time updates are not necessary; as a result, model performance takes priority over computational efficiency. If a small number of chemicals were added, all hyperparameters could be fixed at their values from the full run and the posterior for the new chemicals computed in a few minutes. For larger batches, computation time for ZIPLL scales linearly in both the number of chemicals and the number of assays, making runs on larger experiments feasible.
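The linear-scaling claim supports simple back-of-envelope projections from the reported 19.9-hour run (309 chemicals, 81 assays, 50,000 iterations). This sketch is only arithmetic under stated assumptions: cost proportional to chemicals × assays × iterations, no fixed overhead, and the same hardware. The function name `projected_hours` is hypothetical.

```python
def projected_hours(n_chem, n_assay, n_iter,
                    ref_hours=19.9, ref_chem=309, ref_assay=81, ref_iter=50_000):
    """Linear extrapolation of runtime from the reported full run.

    Assumes cost is proportional to (chemicals x assays x iterations), i.e. linear
    in each factor, and ignores fixed overhead.
    """
    scale = (n_chem / ref_chem) * (n_assay / ref_assay) * (n_iter / ref_iter)
    return ref_hours * scale


print(round(projected_hours(618, 81, 50_000), 1))   # twice the chemicals -> 39.8
print(round(projected_hours(309, 162, 50_000), 1))  # twice the assays -> 39.8
```

Real runs may deviate from this if hyperparameter updates or memory effects grow differently with problem size.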

We applied ZIPLL to the ToxCast data and showed that the probabilities of response were largely consistent with the binary classification in the ToxCast public release data. However, in borderline cases ZIPLL added useful information by quantifying the uncertainty in the presence of a response. We also demonstrated the advantage of estimating the posterior distribution of the AC50. This allowed us to rank chemicals and estimate the posterior probability that a chemical is more potent than reference chemicals, which provides a useful tool for prioritization.

Ultimately, a comprehensive risk assessment must include not only coverage of all relevant exposure and hazard factors, but thorough characterization of individual factors as well. The dose–response model provided by ZIPLL will prove especially useful in such a scenario, where the more informative results characterize HTS hazard in a manner that can be quantitatively combined with other risk factors. With the addition of data from future HTS projects having expanded assay coverage and reference chemical sets, these rankings can be extended to estimate the joint probability that chemicals are more active than reference chemicals on multiple assays, thus providing a physiologically relevant, pathway-based hazard assessment.