The use of benchmark dose uncertainty measurements for robust comparative potency analyses

The Benchmark Dose (BMD) method is the favored approach for quantitative dose–response analysis where uncertainty measurements are delineated between the upper (BMDU) and lower (BMDL) confidence bounds, or confidence intervals (CIs). Little has been published on the accurate interpretation of uncertainty measurements for potency comparative analyses between different test conditions. We highlight this by revisiting a previously published comparative in vitro genotoxicity dataset for human lymphoblastoid TK6 cells that were exposed to each of 10 clastogens in the presence and absence (+/−) of low concentration (0.25%) S9, and scored for p53, γH2AX and Relative Nuclei Count (RNC) responses at two timepoints (Tian et al., 2020). The researchers utilized BMD point estimates in potency comparative analysis between S9 treatment conditions. Here we highlight a shortcoming that the use of BMD point estimates can mischaracterize potency differences between systems. We reanalyzed the dose responses by BMD modeling using PROAST v69.1. We used the resulting BMDL and BMDU metrics to calculate “S9 potency ratio confidence intervals” that compare the relative potency of compounds +/− S9 as more statistically robust metrics for comparative potency measurements compared to BMD point estimate ratios. We performed unsupervised hierarchical clustering that identified four S9‐dependent groupings: high and low‐level potentiation, no effect, and diminution. This work demonstrates the importance of using BMD uncertainty measurements in potency comparative analyses between test conditions. Irrespective of the source of the data, we propose a stepwise approach when performing BMD modeling in comparative potency analyses between test conditions.


| INTRODUCTION
An appreciation of dose-response analysis of genotoxicity data has accompanied a shift from a hazard identification testing approach, toward quantitative assessment of genotoxicity for risk assessment purposes (Macgregor et al., 2015;Dearfield et al., 2017). Under the auspices of the Quantitative Analysis Workgroup (QAW) of the Health and Environmental Sciences Institute Genetic Toxicology Technical Committee (HESI GTTC), different statistical methods for assessing dose-response relationships in genetic toxicology studies have been evaluated (Gollapudi et al., 2013). An overall conclusion was that the benchmark dose (BMD) approach for analyzing dose-response data derived from genotoxicity studies exhibits the most favorable characteristics. The BMD for continuous data is defined as the dose that results in a predetermined change, typically ranging between 1 and 10% in the response rate of an adverse effect relative to existing background incidence (Macgregor et al., 2015).

Unlike other dose-response analysis methods such as the No Observed
Genotoxic Effect Level (NOGEL) approach, the BMD method is advantageous since it is not restricted to the study concentration/dose selection.
The BMD method evaluates the entire range of concentrations/doses within the dataset while providing measures of uncertainty such as confidence limits (Crump, 1984;Slob, 2002). The uncertainty in the estimation of the BMD is defined as the range delineated between the upper (BMDU) and the lower (BMDL) confidence bounds.
BMDs and their associated confidence intervals (CIs) can be interpreted in a manner that conveys the potency of the studied test article.
For example, CIs have been plotted to illustrate compound potency ranking from in vivo carcinogenicity and genotoxicity studies, as well as providing empirical comparisons across endpoints. Additionally, combined analysis can be performed for multiple dose-response datasets for a shared endpoint differentiated by one or more covariates (Slob and Setzer, 2014). This is known as the BMD combined covariate approach. A prominent study by Wills and colleagues  applied the BMD combined covariate approach to in vitro genotoxicity data and demonstrated that the precision of the BMD increases when compound, or another condition that pertains to mode of action (MoA), serves as the covariate. The precision of the BMD is defined by the ratio of the BMDL to the BMDU and has potential implications for regulatory decision making when used to define a Point of Departure (PoD) for extrapolation to a human exposure limit (White et al., 2020). Wills et al. (2016) states that as CIs represent the range in which the true BMD lies, potency differences are only statistically defendable when there is no apparent overlap between intervals. Thus, evaluation of the entire BMD CI (BMDL and BMDU) is imperative to drawing potency conclusions from BMD analysis of dose-response datasets. Researchers have used BMD CIs of chemical classes of interest to plot compound genotoxic and/or carcinogenic potency in rank order (Hernández et al., 2011;Soeteman-Hernández et al., 2015a;Soeteman-Hernández et al., 2015b;Wills et al., 2016Wills et al., , 2017. Some prominent examples where BMD CIs were used in comparative genotoxicity potency analyses include studies by Allemang et al. (2018) and Wheeldon et al. (2020). Allemang et al. (2018) performed BMD analyses to evaluate the relative genotoxic potency of 15 pyrrolizidine alkaloids (PAs) via in vitro micronuclei formation in HepaRG cells. Wheeldon et al. (2020) performed BMD analyses to evaluate the comparative genotoxic potency of 8 Topoisomerase II poisons studied in human lymphoblastoid TK6 cells using the MultiFlow DNA damage response assay. Both publications identified the utility of BMD CIs in compound comparative genotoxicity potency analyses to support read across and MoA determination for a limited number of compounds of interest. In all of the aforementioned studies, compound potency comparisons within and between genotoxicity endpoints were always performed by scrutinizing the shape and steepness of the underlying dose-response curve, and by graphically representing the BMD CIs.
Potency comparisons and measures of correlation were consistently evaluated by CIs spanning orders of magnitude versus deriving numerical values for comparison purposes.
Herein we consider a previously published in vitro genotoxicity dataset where the authors performed BMD analyses and drew conclusions without considering the uncertainty associated with the BMD measurements. Tian et al. (2020) investigated the use of phenobarbital/ β-napthoflavone-induced rat liver S9 at maximal non-cytotoxic concentration (0.25% vol/vol final) in a flow cytometry based multiplexed DNA damage response assay. The laboratory was able to maintain the S9 enzyme/co-factor mix with cells and test compound for the entire exposure period of 24 hr in the TK6-based assay. The investigators focused on 15 chemicals: 8 of which are clastogens that are known to require metabolic activation to maximize formation of DNA-reactive metabolites; 5 are cytotoxicants; and 2 are direct acting clastogens that do not require metabolic activation. In addition to determining compound MoA through the use of biomarker responses, the authors applied the BMD approach with the aim of calculating a numerical "S9 potentiation ratio" value, which served as a comparison metric obtained by division of the BMD value in the absence of S9 by the BMD value in the presence of S9. Upon further consideration of this approach, we believe that there are significant shortcomings in the use of solely a BMD point estimate in their comparative potency analysis. Using the same logic highlighted by Wills et al. (2016); it is irreconcilable to rely upon a BMD point estimate to robustly compare potencies, since the BMD CI represents the range in which the true BMD lies. To our knowledge, there are no published reports that critique the use of a BMD point estimate, and how this differs from use of BMD CIs in comparative analysis between experimental conditions. Herein, the Tian et al. (2020) data is re-analyzed to further inform use of the same in vitro genotoxicity/low concentration S9 dataset. Primarily, we stress that the use of a BMD point estimate value does not provide an accurate representation of the likely potency range of the test compound. Said another way, the BMD point estimates and associated "S9 potentiation ratio" comparison metrics did not convey information about the uncertainty of the measurements that are consistent with the BMD uncertainty measurement approach that is advocated in the scientific literature.
This current report focuses on reanalysis and augmentation of the previously published Tian et al. (2020) dataset. Specifically, we convey the BMD uncertainty measurements that relate to the relative genotoxic potency of the 10 clastogenic compounds. To this end, we calculate "S9 potency ratio CIs" using the BMD uncertainty measurements (BMDL and BMDU) between S9 exposure conditions (the presence and absence of S9) and utilize said ratios to derive robust potency conclusions as a follow up to the Tian et al. (2020) work. Readers are encouraged to refer to the original Tian and colleagues' article for further context (Tian et al., 2020).

| In vitro genotoxicity dataset
The data were derived from a previously published article in which 15 compounds were studied using the in vitro MultiFlow ® DNA Damage Assay in the presence and absence of low dose (0.25% vol/vol) S9 (Tian et al. 2020). The in vitro MultiFlow DNA Damage Assay multiplexes several biomarkers that are responsive to diverse forms of DNA damage into a single flow cytometric analysis. The multiplexed biomarkers include: (a) phosphorylation of H2AX at serine 139 (γH2AX) for the detection of DNA double strand breaks, (b) phosphorylation of histone H3 at serine 10 (p-H3) to identify mitotic cells, (c) nuclear p53 content as an indicator of p53 activation, (d) frequency of 8n+ cells to monitor polyploidization, and (e) relative nuclei counts (RNC) to provide information about treatment related cytotoxicity . A detailed description of the assay falls out of scope of this article; thus, interested readers are encouraged to refer to several publications that describe the MultiFlow assay Bryce et al., 2017;Bryce et al., 2018;Dertinger et al., 2019). As previously mentioned, the Tian et al. (2020) data included 8 direct acting clastogens that require metabolic activation, 5 cytotoxicants and 2 direct acting clastogens that do not require metabolic activation. In this reanalysis, we focused on the 10 clastogens, since this is where Tian et al. (2020) suggested the greatest differences in potency between S9 exist. The raw data were provided by Litron Laboratories and included 4 and 24 hr p53, γH2AX, and 24 hr RNC responses for the 10 clastogens: 2-acetylaminofluorene, 2aminoanthracene, 7,12-dimethylbenzanthracene, benzo[a]pyrene, cyclophosphamide, dibenzo[a,l]pyrene (also known as dibenzo[def,p]chrysene), diethylnitrosamine, mitomycin C, 2-amino-1-methyl-6-phenylimidazo [4,5-b]pyridine (PhIP), and resorcinol. Tian et al. (2020), performed BMD analysis on individual compounds with the S9 exposure (with or without [+/−] 0.25% vol/vol S9) condition serving as the covariate. Hence, individual dose-response curves on a compound basis already existed for this dataset. Here, the individual dose-response curves were scrutinized so that compounds and/or endpoints with little to no evidence of a dose-response could be disqualified from reanalysis. Hence, we excluded 4 hr p53 dose responses for diethylnitrosamine; 24 hr p53 dose-responses for diethylnitrosamine;
PROAST version 69.1 operating in R 4.0.2 was used to analyze the continuous dose-response data (http://www.proast.nl) for the 4 hr and 24 hr γH2AX, 4 hr and 24 hr p53, and 24 hr RNC endpoints.
We applied the exponential model as a sequence of nested models with increasing number of parameters (EFSA 2009;Slob and Setzer 2014;EFSA 2017). In this reanalysis, compound was selected as the covariate-differing from Tian et al. (2020) where S9 condition was the covariate-since this includes more dose-responses in a single analysis to increase the precision of the BMDs. PROAST provided the option to select model 3 or 5 from the family of models. In doing so, the number of parameters were increased to test whether the associated log likelihood is significantly increased, thus informing if the additional parameters were required for describing the doseresponse (Slob, 2002;EFSA, 2009;Slob and Setzer, 2014;EFSA, 2017). Either model 3 or 5 was finally automatically selected as the most appropriate model to describe the shape of the dose-response curve. Readers are advised to refer to Slob and Setzer (2014) for detailed information about the dose response models (including models 3 and 5) and their algorithms.
Consistent with the approach utilized by Tian et al. (2020), an arbitrary critical effect size (CES) of 0.3 was selected for this reanalysis. A CES of 0.3 represents a 30% change in response compared to the concomitant control. CES −0.3 for the RNC endpoint represents a 30% decrease in response for this endpoint analyses.
There is a lack of consensus on the appropriate choice of CES reported in the literature for in vivo endpoints (White et al., 2020), thus a lengthy discussion on CES falls outside the scope of this report.
In any event, we justify the use of CES 0.3 for these in vitro endpoints since the resulting BMDs do not lie in the extremities of the dose response curves where the associated uncertainties could be relatively high.
The value of the BMD is a point estimate with an associated level of precision (the ratio between the BMDL and BMDU), thus 90% BMD CIs were obtained for each compound and endpoint combination in the presence and absence of S9 and were graphically plotted.

| Unsupervised clustering
Lower and upper values of the "S9 potency ratio CIs" (algorithm described in the results and discussion section) were evaluated using JMP software's unsupervised clustering platform (JMP, v12.0.1). Lower and upper "S9 potency ratio CI" values associated with the following 5 biomarkers were used as variables: 4 hr p53, 24 hr p53, 4 hr γH2AX, 24 hr γH2AX, and 24 hr RNC. The analysis options were set as follows: clustering method = hierarchical; method for calculating distances between clusters = "Ward"; data as usual = "Standardize Robustly"; data visualization = "Dendrogram," with "two-way clustering."

| BMD confidence intervals
Values of the BMD, BMDL, and BMDU were estimated and collated for all compounds (+/− S9) and endpoints that were included in the BMD analysis. The BMD CIs were plotted for each compound with +/−S9 CIs side-by-side (Figures 1-10). The BMD point estimates are included in the comparative potency plots to aid in graphical representation of the BMD point estimate relative to the corresponding BMDL and BMDU. This is important since some individuals misinterpret the BMD to be at the midpoint (or geometric mean) between the BMDL and BMDU. This is a misconception as evident by the differing lengths of the CIs either side of each BMD point estimate displayed in For the most part, as one would expect, the tested clastogens that require metabolic activation to exert genotoxic effects show an increase in potency when exposed to the low concentration S9 system. This is evidenced by +S9 CIs that reside in more potent regions of the comparative potency plot by several orders of magnitude compared to −S9 CIs. An outlier is 2-acetylaminofluorene where all F I G U R E 5 BMD confidence intervals for cyclophosphamide. Two-sided confidence intervals were obtained for all endpoint's BMD analyses. BMD point estimates are displayed as data points F I G U R E 6 BMD confidence intervals for dibenzo[a,l]pyrene. 24 hr RNC endpoint without S9 yielded a disproportionately high BMD with infinite BMDU: BMD restricted and only displaying the BMDL. BMD point estimates are displayed as data points endpoints except for 24 hr γH2AX and 24 hr RNC display CIs that reside in the same order of magnitude (Figure 1). This suggests that S9's potentiating effect was restricted to these endpoint/timepoint combinations. The comparative potency plots also show that the direct acting clastogens, mitomycin C and resorcinol, were not potentiated by S9-rather, to a certain extent, the presence of S9 slightly decreased their genotoxic potency. Tian et al. (2020) derived S9 potentiation ratios by comparative analysis of the BMD point estimates across the S9 exposure conditions for each compound and endpoint. There is a concern that the use of BMD point estimates can misrepresent the potency effect of S9 in this in vitro system since the BMD measure of uncertainty is disregarded.

| S9 potency ratio CIs
F I G U R E 7 BMD confidence intervals for diethylnitrosamine. 24 hr γH2AX endpoint without S9 yielded a disproportionately high BMD with infinite BMDU: indicated as a dashed line spanning the width of the plot. BMD point estimates are displayed as data points F I G U R E 8 BMD confidence intervals for mitomycin C. Two-sided confidence intervals were obtained for all endpoint's BMD analyses. BMD point estimates are displayed as data points Tian et al. (2020) correctly pointed out that the BMD estimate can be beyond the top concentration used in the benchtop experiment. In these instances, the authors calculated the "S9 potentiation ratio" by restricting the BMD to the top concentration (denoted with a greater than [>] symbol in the tabulated results section of their article). However, these tight BMD restrictions led to mischaracterization of the true BMD ratio. We contend that one should only limit BMD values for graphical display where the value is disproportionately larger than other endpoints and S9 conditions for the same compound. In the BMD analysis reported in this re-analysis, disproportionately high BMD values also coincided with unbound BMDUs.
The BMD CIs obtained in this re-analysis were used to calculate an "S9 potency ratio CI" that is derived for each compound and endpoint BMD analysis that returned a dose response by comparing the F I G U R E 9 BMD confidence intervals for PhIP. 24 hr p53 endpoint without S9 and 4 hr p53 without S9 infinite BMDUs indicated with dashed positive direction lines. BMD point estimates are displayed as data points F I G U R E 1 0 BMD confidence intervals for resorcinol. Two-sided confidence intervals were obtained for all endpoint's BMD analyses. BMD point estimates are displayed as data points T A B L E 1 Compound/endpoint combination BMD ratios and "S9 potency ratio CIs" (original and log scale) compared with the Tian et al. (2020) "S9 potentiation ratios" conditions (+/− S9) included here, the same algorithm could apply to 2 or more experimental conditions where one potency is compared to another potency of interest. The "S9 potency ratio CIs" were calculated in the original scale so that the values can be compared to the "S9 potentiation ratios" values obtained from BMD-BMD ratios by Tian et al. (2020). Values greater than 1 represent increased compound potency for a particular endpoint after S9 exposure, whilst values less than 1 confer the inverse. 0 values represent CIs that overlap and hence potency differences are statistically indefensible. The BMD-BMD ratios from our analysis are also shown for comparative purposes. We have also displayed the S9 potency ratios in the Log scale for the sole purpose of aiding in demonstration of differences in orders of magnitude. The values are displayed in Table 1 in comparison with the S9 potentiation ratios from Tian et al. (2020).
Comparing our "S9 potency ratio CIs" with the S9 potentiation ratio obtained by Tian et al. (2020) shows that in almost all cases, the "S9 potentiation ratio" that was derived from BMD point estimates is either mischaracterized through tight restrictions on the BMD (where the researchers denoted these ratios with a > sign) or is in the upper range of our "S9 potency ratio CI" range, indicating a tendency for over estimation of the S9 bioactivation effect.
Calculating an "S9 potency ratio CI" for these experiments accurately conveys the uncertainty measurements which should be accounted for when assessing potency comparisons across conditions.
We can summarize the "S9 potency ratio CIs" calculated here by stating that S9 increases the potency of the compounds ranging from

| S9 potency ratio CI range
As previously mentioned, the "S9 potency ratio CI" values span several orders of magnitude. In order to objectively define the range of "S9 potency ratio CIs" obtained, we analyzed the lower and upper values (Log scale) of each "S9 potency ratio CI" via unsupervised hierarchical clustering. The clustering was based on squared Euclidean distance "Ward's method" (Ward and Hook 1963) between points. The resulting groups are presented in Figure 11 in the form of a two- The BMD-BMD ratio is presented to compare with the "S9 potentiation ratio" from Tian et al. (2020). In essence, the Tian et al. (2020) "S9 potentiation ratio" is a BMD-BMD ratio. The majority of BMD-BMD ratios from the re-analysis are similar to the Tian et al. (2020) "S9 potentiation ratios" showing that the BMD analyses do not differ significantly (same order of magnitude). One outlier is the analysis of the benzo[a]pyrene 24 hr γH2AX endpoint where the BMD-BMD ratio is significantly smaller than the "S9 potentiation ratio." This is likely due to increased precision resulting from combined covariate analysis of the clastogen compounds in the reanalysis of this endpoint. The significant difference between "S9 potentiation ratios" and the BMD-BMD ratios is the tight restrictions placed on the BMDs to calculate the "S9 potentiation ratio" denoted by the greater than symbol ( There are 4 distinct clades identified in the dendrogram: (a) Chemicals whose genotoxic potency was dramatically increased in the presence of S9 (dibenzo[a,l]pyrene, cyclophosphamide, 2-amino anthracene, and PhIP); (b) chemicals whose genotoxic potency was increased in the presence of S9 (dimethylbenzanthracene, benzo [a] pyrene, and 2-acetylaminofluorene); (c) chemicals whose genotoxic potency was not increased in the presence of S9 (mitomycin c, resorcinol, and diethylnitrosamine); and (d) a subset of clade 3, clade 4, whose genotoxic potency was reduced in the presence of S9 (mitomycin c, and resorcinol; indicated by a strong dark blue in the heat map).

| CONCLUSIONS
The results of the data analysis presented here illustrate the necessity to utilize the full BMD CIs to draw conclusions from dose-response datasets. We have demonstrated that in comparative potency analysis, use of BMD point estimates can yield mischaracterized or overestimated potency ratios during instances where the width of the CIs varies considerably between conditions.
Interpretation of BMD CIs relies upon visual scrutiny by the evaluating scientist, and we speculate that one could be overwhelmed with the results from hundreds of compounds from in vitro screening experiments. Therefore, the evaluating scientist may wish to use statistical methods such as hierarchical clustering to aid in data interpretation. When objectively considering the range of "S9 potency ratio CIs" presented in our analysis, one is provided with a scale of high to low, zero, and negative S9 potency effects. While we illustrated the potency effect of S9 in an in vitro genotoxicity test system, the same approach can be applied to any comparative genotoxicity potency analysis. We propose that investigators apply the following stepwise approach when performing comparative potency analysis of 2 or more experimental conditions using the BMD methodology: 1. Utilize the results of combined covariate BMD analyses to plot BMD CIs that represent potency across the endpoints evaluated F I G U R E 1 1 Unsupervised hierarchical clustering results are shown as a two-dimensional dendrogram with heatmap for the 10 clastogens "S9 potency ratio CIs." The values of the lower and upper confidence limits of the "S9 potency ratio CIs" were included in the analysis. The lower and upper bound of the "S9 potency ratio S9 CIs" are plotted on the X-axis per endpoint combination. Compounds are plotted on the Y-axis. Increasing intensities of red indicate a strong tendency for S9 exposure to increase the genotoxic potency of a compound for a specific endpoint. An increasing intensity of blue indicates the converse, where the presence of S9 decreases the genotoxic potency of a compound for a specific endpoint. Gray-blue represents compound/endpoint combinations where the dose-response following S9 exposure was not statistically significantly different (overlapping CIs), or where a zero value was included to accommodate the clustering method in instances where no S9 potency ratio CIs were obtained (infinite BMDUs). There are 4 distinct clades that group compounds into (1) high, (2) low, (3) zero, and (4) negative (as a subset of clade 3), effects on potency as a result of S9 exposure. Abbreviation: DMBA = 7,12-dimethylbenzanthracene under different dependent variables. For graphical purposes only, limit disproportionately high BMDs that display infinite BMDU values.
2. Derive the lower and upper values of "potency ratio CIs" between conditions by comparing BMDU condition 1 to BMDL condition 2, and BMDL condition 1 to BMDU condition 2 (ad infinitum), respectively.
3. Objectively identify groups in the data that best describe the magnitude of the difference in potency observed between conditions.