Interlaboratory validation of the ToxTracker assay: An in vitro reporter assay for mechanistic genotoxicity assessment

ToxTracker is a mammalian cell reporter assay that predicts the genotoxic properties of compounds with high accuracy. By evaluating induction of various reporter genes that play a key role in relevant cellular pathways, it provides insight into chemical mode‐of‐action (MoA), thereby supporting discrimination of direct‐acting genotoxicants and cytotoxic chemicals. A comprehensive interlaboratory validation trial was conducted, in which the principles outlined in OECD Guidance Document 34 were followed, with the primary objectives of establishing transferability and reproducibility of the assay and confirming the ability of ToxTracker to correctly classify genotoxic and non‐genotoxic compounds. Reproducibility of the assay to predict genotoxic MoA was confirmed across participating laboratories and data were evaluated in terms of concordance with in vivo genotoxicity outcomes. Seven laboratories tested a total of 64 genotoxic and non‐genotoxic chemicals that together cover a broad chemical space. The within‐laboratory reproducibility (WLR) was up to 98% (73%–98% across participants) and the overall between‐laboratory reproducibility (BLR) was 83%. This trial confirmed the accuracy of ToxTracker to predict in vivo genotoxicants with a sensitivity of 84.4% and a specificity of 91.2%. We concluded that ToxTracker is a robust in vitro assay for the accurate prediction of in vivo genotoxicity. Considering ToxTracker's robust standalone accuracy and that it can provide important information on the MoA of chemicals, it is seen as a valuable addition to the regulatory in vitro genotoxicity battery that may even have the potential to replace certain currently used in vitro battery assays.

Genotoxicity testing is an essential part of chemical safety assessment. Recommendations for in vitro and in vivo testing are provided for various industries by different regulatory authorities, including human pharmaceuticals (ICH, 2011), industrial chemicals (2016), foods and food additives (Hardy et al., 2017), and cosmetic ingredients (SCCS, 2021). With the report from the U.S. National Research Council (NRC) in 2017 on Toxicity Testing in the 21st Century (Thomas, 2018), as well as from various international initiatives to improve the safety testing approaches (e.g., US Tox21, EuToxRisk, and Partnership for the Assessment of Risks from Chemicals (PARC)), comes a strong call for a major paradigm shift in toxicity testing. The use of novel approaches should allow genotoxicity risk assessment to move beyond classical hazard-based endpoints and non-relevant test systems to the integration of mechanistic-based testing strategies that include qualitative and quantitative assessment of the genotoxic mode-of-action (MoA). The ToxTracker assay fits into such an approach as it is highly accurate and predictive for in vivo genotoxicity while providing mechanistic insight into the type of DNA damage and whether DNA damage may be secondary to effects on non-DNA targets and processes. In fact, in various legislations, there is already room for the use of New Approach Methods (NAMs) for such chemical safety assessment. Data from NAMs are used in conjunction with an Adverse Outcome Pathway (AOP) framework to demonstrate key biological events that are of toxicological importance, and together this forms an Integrated Approach to Testing and Assessment (IATA) that allows biological hazard assessment to progress to a better understanding of risk (SCCS, 2021).
ToxTracker is a stem cell-based reporter assay that identifies genotoxic compounds with high accuracy by combining six fluorescent reporter genes that are specifically activated by different cellular signaling responses that are associated with direct genotoxicity as well as cytotoxic effects like oxidative stress or protein damage that can indirectly lead to genotoxicity (Hendriks et al., 2012, 2016). The biomarker genes that are applied in ToxTracker were selected from toxicogenomics studies in which mouse embryonic stem cells (mESC) were exposed to 40 different genotoxic and non-genotoxic carcinogens (Hendriks et al., 2011). At the time of assay development, these primary cells were selected as they are genetically stable, have an infinite life span and a high cell proliferation rate, and are proficient in all major DNA damage signaling and cell cycle regulation pathways (Hendriks et al., 2013). Additionally, mutations in stem cells were shown to play a crucial role in tumorigenesis (Yin et al., 2021), indicating that mESCs are highly relevant for genotoxicity and carcinogenicity hazard assessment. Induction of the fluorescent reporters (green fluorescent protein (GFP) signals) in ToxTracker is measured by flow cytometry in intact individual cells, across both exposed and nonexposed cultures. Simultaneously, cytotoxicity of the tested compounds is determined by relative cell count, that is, cultures exposed to a compound compared to the vehicle control cultures. By detecting the activation of these reporter genes following chemical exposure, ToxTracker can discriminate between induction of DNA damage, oxidative stress, and protein damage, which provides insight into the MoA of genotoxic substances in a single high throughput test (Figure 1) (Hendriks et al., 2016).
Genotoxicity is primarily predicted in ToxTracker by the induction of either of the two independent fluorescent reporters Bscl2-GFP and Rtkn-GFP. The Bscl2 reporter is activated upon the formation of bulky DNA adducts and subsequent inhibition of DNA replication, which is a potent activator of the DNA damage response (Hendriks et al., 2012). These replication-blocking DNA lesions often lead to the formation of gene mutations (Hoeijmakers, 2001). Activation of the Rtkn genotoxicity reporter is associated with induction of DNA double-strand breaks (Liu et al., 2004; McCool & Miyamoto, 2012).
Many mutagenic and clastogenic compounds cause DNA damage by direct interaction with the DNA and typically activate both the Bscl2 and Rtkn ToxTracker reporters. For such DNA reactive substances that may pose a cancer risk at very low doses, conceptually the risk management process involves determining or controlling to an exposure level that has negligible increases in overall human cancer risk (Step, 2023).
In contrast, there are also substances that are genotoxic without directly reacting with DNA, for example by inducing oxidative stress or by interfering with proper progress of cells through mitosis (Kirsch-Volders et al., 2003). The prototypical examples of indirect genotoxins are tubulin poisons, which interfere with chromosome segregation during mitosis. This causes numerical changes in chromosome segregation between daughter cells and subsequent G1 instability in the next cell cycle (Lynch et al., 2019). Hence, induction of the Rtkn-GFP reporter and arrest of cells in mitosis due to spindle assembly checkpoint activation are typically observed for this class of compounds (Brandsma et al., 2020). Also, compounds that do not bind to the DNA but cause high levels of oxidative stress in the cells can lead to genotoxic effects. Insufficient or faulty repair of oxidized nucleic acids (e.g., 8-oxo-G) that are caused secondarily by oxidative stress can lead to mutations or chromosomal aberrations (recently described in n.d.). In ToxTracker, induction of oxidative stress is detected by the Srxn1 and Blvrb reporters that are associated with the two major antioxidant pathways in the cell (Chang et al., 2004; Komuro et al., 1996). Additionally, protein misfolding or damage can lead indirectly to DNA damage. Secondary genotoxicity can be caused by induction of the unfolded protein response (UPR, detected by the Ddit3 reporter), which is a potent trigger of apoptosis, leading to DNA breaks and chromosome fragmentation (Wang & Kaufman, 2014). For these types of non-DNA reactive genotoxicity, various compensatory mechanisms exist (e.g., reduced GSH), which may permit exposure levels that are biologically non-adverse. Hence, a non-linear dose response relationship would be expected for some types of non-DNA reactive genotoxicity and therefore a "threshold" would exist where an exposure level that would be considered safe could be derived for compounds acting via an indirect MoA (Nohmi, 2018). Finally, the Btg2 reporter is a component of the p53-dependent DNA damage response and is involved in regulation of the G1/S cell cycle checkpoint, but it is also induced by various other cellular stressors, including DNA repair, cytostasis, and cytotoxicity (Rouault et al., 1996). Induction of the Btg2 reporter may indicate genotoxic stress, but may also indicate a stalling of the cell cycle to facilitate homeostatic restoration after a period of stress or other processes that may lead to cytotoxicity. For this reason, to accurately predict the genotoxicity of compounds, both direct and indirect genotoxic effects should be considered and integration of all six reporter genes in ToxTracker is required for reliable chemical safety assessment.
Extensive technical validation showed that ToxTracker combines a very high sensitivity (94%) and specificity (95%) for the detection of in vivo genotoxicity with the ability to provide insight into the MoA of genotoxic agents (Brandsma et al., 2020; Hendriks et al., 2012, 2016). In many cases, ToxTracker was able to correctly predict the genotoxic MoA of the tested compounds, including discrimination between direct genotoxicity and secondary genotoxicity related to oxidative stress, and differentiation between a clastogenic and an aneugenic MoA. Induction of the ToxTracker reporter genes shows a strong correlation with the standard in vitro and in vivo genotoxicity assays. In a comparative study with 66 compounds from the ECVAM library of reference compounds (Kirkland et al., 2016), activation of the Bscl2-GFP reporter gene for induction of mutagenic DNA adducts showed a 93% correlation with a positive result in the Ames test and/or mouse lymphoma assay (MLA) for gene mutations. A negative result for the Bscl2-GFP ToxTracker reporter showed a 91% correlation with negative Ames and MLA results. Induction of the Rtkn-GFP reporter gene in ToxTracker indicates the induction of DNA double-strand breaks and shows a very strong correlation of 92% with a positive result in the in vivo micronucleus (MN) and/or chromosomal aberration (CA) assays (Wills et al., 2021). A negative Rtkn-GFP reporter response correlated in 91% of the cases with a negative in vivo MN/CA result. Interestingly, 30%-40% of compounds that were negative in ToxTracker and negative in the in vivo MN assay did induce MN in vitro, underscoring the limited specificity of the in vitro MN assay (Fowler et al., 2012b). In many cases, the discrepancy between the in vitro MN and ToxTracker or in vivo MN assay could be explained by high levels of oxidative stress induction by the compound. Also for non-genotoxic compounds the MoA information can be valuable. Sustained induction of oxidative stress and the unfolded protein response have been shown to play a role in chemical carcinogenesis (Kakehashi et al., 2013; Madden et al., 2019).
To investigate how ToxTracker may complement the standard battery of in vitro genotoxicity assays, a comprehensive interlaboratory validation trial of the ToxTracker assay was organized. The primary goal of this validation ring trial was to determine if ToxTracker is able to correctly identify the genotoxic properties of compounds when rolled out to independent laboratories, according to OECD GD-34 (OECD, 2005). A broad selection of well-established genotoxic and non-genotoxic compounds was tested in ToxTracker to establish the sensitivity and specificity of the assay. In this interlaboratory validation study, the transferability and reproducibility of the ToxTracker assay were also established. The genotoxicity outcomes for the tested compounds were compared from various repeat tests within a laboratory and between laboratories to calculate the within-laboratory and between-laboratory reproducibility (WLR and BLR, respectively). The secondary goal of the validation project was to investigate the MoA information that is provided by ToxTracker and the relevance for the genotoxicity prediction of the tested compounds. Information about the MoA of genotoxic and non-genotoxic compounds was applied to discriminate between direct and indirect genotoxic compounds and to better understand the results from ToxTracker in relation to results from the current standard in vitro and in vivo genotoxicity assays. Here we show the high accuracy of ToxTracker to predict the genotoxic properties of compounds with an excellent WLR and BLR. The MoA information on genotoxicity from ToxTracker was used to explain differences in results from the standard in vitro and in vivo genotoxicity assays.
FIGURE 1 Schematic representation of the genotoxic and non-genotoxic endpoints covered in the ToxTracker assay. The shading of informative boxes on the periphery indicates the color of data presented within figures herein. The lines encasing the circular image represent overlap between the different pathways, where applicable, demonstrating the specificity or promiscuity of each pathway.

| Dose range finding assay
A dose range finding experiment was performed using unmodified mouse embryonic stem (mES) cells (strain B4418) to select suitable concentrations for the definitive ToxTracker assays. mES cells were exposed to 11 concentrations of test compounds at four-fold dilutions in single wells. Cytotoxicity was determined after 24 h by relative cell count in cultures exposed to the compound and their related vehicle control cultures, and a maximum test concentration inducing 50%-75% cytotoxicity was selected. In cases where limited or no cytotoxicity was observed, 1 mg/mL, or the maximal soluble concentration, was selected for the definitive assay. Selection of the highest testing concentration was based on ICH S2 (R1) and recommendations from the IWGT (Galloway et al., 2011; ICH, 2011). In ICH S2 (R1), it is recommended to use 1 mM or 0.5 mg/mL as top concentration for in vitro mammalian cell assays.
Only in case of a very low molecular weight should higher test concentrations be considered. To avoid the possibility that compounds would be tested at too low a concentration, we set the top concentration for all compounds at 1 mg/mL or the maximum soluble concentration.
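The dose range finding logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the laboratories' actual procedure: the function names, the `fallback` parameter, and the behavior when cytotoxicity jumps past the 50%-75% window between two adjacent concentrations (returning `None` so a narrower follow-up can be planned) are assumptions made for the example.

```python
def dilution_series(top, n=11, factor=4.0):
    """Return n concentrations starting at `top`, each `factor`-fold lower than the previous one."""
    return [top / factor**i for i in range(n)]


def select_top_concentration(concs, cytotox_pct, fallback, low=50.0, high=75.0):
    """Pick the top concentration for the definitive assay.

    concs       -- tested concentrations (any order)
    cytotox_pct -- cytotoxicity (%) measured at each concentration
    fallback    -- 1 mg/mL or the maximal soluble concentration
    Returns the highest concentration whose cytotoxicity lies in [low, high].
    If cytotoxicity never reaches `low`, the fallback is returned.
    If no concentration falls inside the window although cytotoxicity
    exceeds `high` somewhere, None is returned (assumption: a narrower
    follow-up range would be needed).
    """
    in_window = [c for c, tox in zip(concs, cytotox_pct) if low <= tox <= high]
    if in_window:
        return max(in_window)
    if max(cytotox_pct) < low:
        return fallback
    return None


# Hypothetical example: 11 four-fold dilutions from 1 mg/mL
series = dilution_series(1.0)
print(series[:3])  # [1.0, 0.25, 0.0625]
```

In this sketch the top concentration is chosen from the tested points only; interpolating between adjacent concentrations would be an alternative design that the text does not describe.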
Briefly, all assays were performed in 96-well plates, cells were seeded 24 h prior to exposure, and typically five concentrations were evaluated per compound with concurrent negative and positive controls.
Induction of GFP reporters and the number of viable cells were determined by flow cytometry 24 h after exposure initiation, and each experiment was conducted independently in triplicate. Relative GFP induction was calculated by taking the mean channel fluorescence intensity (MFI) from all three experiments and dividing by the MFI of corresponding vehicle control treated cultures. Cytotoxicity was determined using relative cell count (RS, cell count data gated on whole single cells) in compound exposed cultures compared to vehicle control cultures.
The six independent mES reporter cell lines were seeded on gelatin-coated 96-well plates. Twenty-four hours after seeding, medium was refreshed, and test material formulations were added to the cells in the presence and absence of aroclor-1254 induced rat S9 liver fraction (0.25% v/v) and cofactors (Regensys™) to induce metabolic activation (Moltox, Boone, NC). The S9 fraction metabolization protocol was previously optimized for ToxTracker to allow measuring induction of all ToxTracker reporter genes following chemical exposure (Czekala et al., 2021). Five concentrations were evaluated in consecutive two-fold dilutions for every substance, based on the results of the dose range finding study, in three independent experiments. Induction of the GFP reporters (MFI) was determined 24 h after exposure initiation using a flow cytometer equipped with 488 nm excitation and an appropriate filter set to collect green emission (525 nm).
Average GFP reporter induction (MFI) across the three experiments was compared with the GFP MFI from corresponding vehicle control treatments to calculate a fold-increase. Simultaneously, cytotoxicity was calculated using relative cell count as indicated for the dose-range finding assay. However, unless otherwise noted, for the definitive assay, cytotoxicity is represented as the mean across all cell lines ± standard deviation (SD), as generally the different cell lines have similar tolerability to chemical exposures (Hendriks et al., 2012). Positive controls were included in all experiments: cisplatin (DNA damage), diethyl maleate (oxidative stress), tunicamycin (UPR), and aflatoxin B1 (metabolic activation of progenotoxin by S9 liver extract) (see Figure S1 for details).
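The fold-induction and cytotoxicity calculations described above can be illustrated as follows. This is a sketch under stated assumptions: the function names and the example values are hypothetical, and cytotoxicity is taken here as 100% minus the relative cell count, which is one straightforward reading of the text.

```python
from statistics import mean


def fold_induction(treated_mfi, vehicle_mfi):
    """Mean GFP MFI of treated cultures divided by mean MFI of the vehicle controls."""
    return mean(treated_mfi) / mean(vehicle_mfi)


def cytotoxicity_pct(treated_counts, vehicle_counts):
    """Cytotoxicity (%) from relative cell count versus vehicle control (assumed: 100 - relative survival)."""
    relative_survival = 100.0 * mean(treated_counts) / mean(vehicle_counts)
    return 100.0 - relative_survival


# Hypothetical values from three independent experiments
print(fold_induction([300, 330, 270], [100, 110, 90]))  # 3.0
print(cytotoxicity_pct([4000, 4200, 3800], [10000, 10500, 9500]))  # 60.0
```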

| S9 metabolization
ToxTracker relies on cofactor supplemented rat liver homogenate (S9 fraction) for metabolization of compounds. Originally, the standard S9 protocol that is also used in the in vitro MN assay was applied to ToxTracker (Hendriks et al., 2012, 2016). In this protocol, cells are exposed to compounds in the presence of 1% or 2% (v/v) S9-mix for 3-4 h, followed by a recovery period (e.g., 17-24 h) in fresh culture media before analysis. Exposure times are limited to 3 h because of the potential toxicity of S9-mix. Although when using this S9 protocol, genotoxic compounds are effectively metabolized and correctly identified as genotoxic by ToxTracker, the recovery time after exposure resulted in a strong reduction in signals for oxidative stress and protein damage induction. To improve the sensitivity for detection of all cellular responses, cells were exposed to 0.25% S9-mix (Moltox, Boone, NC) for 24 h continuously and ToxTracker reporter activation was analyzed immediately after exposure without a recovery period (Figure S4) (Czekala et al., 2021).

| Quality control criteria
To ensure reliable and consistent classifications, various quality control metrics and data acceptance criteria were defined for the ToxTracker assay (see Appendix S1, and Table S1). Proper growth of the GFP reporter cell lines was monitored and criteria for the minimal proliferation rate were defined. In every ToxTracker experiment, positive controls were used to induce DNA damage (cisplatin), verify liver S9 fraction activity (aflatoxin B1), and induce oxidative stress (diethyl maleate) or protein unfolding (tunicamycin), thus confirming the critical biological aspects of the assay during each trial. Minimum induction levels for the different reporters were defined following treatment with set concentrations of positive control compounds. Of note, for some of the reporter genes, the minimum induction level following exposure to the positive control compound is higher than the two-fold induction threshold for a positive ToxTracker classification as described below. In situations where these minimum induction levels by the positive controls were not reached, the experiment was invalidated.

| Interlaboratory trial management
The international ToxTracker validation trial was organized according to OECD guidance document 34 (OECD, 2005) with a validation management team (VMT) consisting of several recognized experts with established expertise in genetic toxicology and experience with conducting an interlaboratory validation study. The VMT was responsible for defining the validation project structure, selecting experienced laboratories from different industries, setting different milestones for the project, and analyzing test results. The ToxTracker interlaboratory validation project was divided into three phases: installation, using eight genotoxic and non-genotoxic compounds (not reported); proficiency demonstration, using six blinded compounds (ampicillin, mannitol, o-anthranilic acid, ethyl methanesulfonate, benzo[a]pyrene, and cisplatin); and the final validation phase, using 64 compounds with 24 or 30 blinded compounds per laboratory, so that each compound was tested at least three times in three independent test facilities. All compounds were purchased, coded, and distributed by the VMT to the participating laboratories as powder. ToxTracker cell lines and required cell culture reagents were provided by Toxys.

| Compound selection criteria
The aim was to have a broad selection of compounds (pharmaceuticals, agrochemicals, food and cosmetics ingredients, and chemicals) to cover as many chemical classes as reasonably possible with sufficient in vitro and in vivo genotoxicity data available to predict the expected outcome in ToxTracker. Compound selection was based on publicly available lists and databases (Kirkland et al., 2011, 2016; Madia et al., 2020). All compounds used in this study were obtained from Sigma Aldrich. Genotoxicity profiles of these compounds were previously established and were based on the weight of evidence (WoE) from various in vitro and in vivo mutation and genotoxicity assays that are publicly available. The procedure and considerations for selection of the compounds were as described for the JaCVAM international validation of the in vivo comet assay (Morita et al., 2015). The selected compounds can be divided into four groups: (I) genotoxic carcinogens, (II) genotoxic non-carcinogens, (III) non-genotoxic carcinogens, and (IV) non-genotoxic non-carcinogens. The primary objective was to determine whether ToxTracker could discriminate between DNA-reactive (genotoxic) substances and non-DNA-reactive (non-genotoxic) substances. The ToxTracker trial was solely focused on genotoxicity prediction; carcinogenicity was not considered as a criterion for compound selection. The compound list contained organic and inorganic, aromatic and aliphatic molecules to cover a broad chemical space. Also, a number of compounds were selected that require metabolic activation in the liver. A full list of the selected compounds and their genotoxicity classification can be found in Table S2.
TABLE 1 ToxTracker assay response criteria for each cell line. Columns: Relative GFP induction; Relative cell survival; Dose-response; Call.
TABLE 2 Matrix for deciding overall ToxTracker predictions from the results of three independent assays. Columns: Calls in three experiments (in any order); Overall call for reporter with − or + S9 condition.
All compounds were coded before distribution to the participating laboratories.

| ToxTracker prediction model
The criteria for a positive or negative result for each GFP reporter were previously established (Boisvert et al., 2023; Hendriks et al., 2016) and are defined in Table 1. GFP reporter inductions at concentrations inducing ≤75% cytotoxicity were acceptable for ToxTracker analysis. A compound was classified as genotoxic if at least a two-fold increase in expression of the Bscl2 and/or Rtkn reporter(s) was induced. Similarly, a two-fold increase in GFP signal was applied as a positive response threshold for the other four cell lines. This two-fold increase as cut-off for a positive response is based on three times the standard deviation (SD) of fluorescence levels in solvent control cultures. The validity of this approach was recently confirmed using a bootstrapping analysis of more than 1000 vehicle control treated cultures (Boisvert et al., 2023). Compounds were classified as non-genotoxic in ToxTracker if the induction levels of the genotoxicity reporters (Bscl2 and Rtkn) were less than 1.5-fold. In the case that fluorescent reporter activation exceeded 1.5-fold but remained below a two-fold increase, a borderline/weak positive score (+) was applied, but only in cases where a clear dose response was also evident.
Each experiment was classified as positive, weak positive, or negative and an overall call for each tested compound was made according to the ToxTracker prediction model (Table 2). In case no clear overall positive or negative classification could be made, compounds were classified as equivocal. If within an experiment, the positive controls met the data acceptance criteria, but results for a specific compound within that experiment did not (e.g., cytotoxicity of all tested concentrations >75%), the compound was classified as inconclusive.
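The per-experiment thresholds described in this section can be expressed as a small decision function. The sketch below is illustrative only: the function names are hypothetical, the dose-response judgment is passed in as a flag rather than computed, and responses between 1.5-fold and two-fold without a clear dose response are treated as negative, which is an assumption because the full criteria of Table 1 are not reproduced here. The Table 2 matrix for combining three experiments is likewise not implemented.

```python
def reporter_call(points, dose_response):
    """Classify one reporter in one experiment.

    points        -- list of (fold_induction, cytotoxicity_pct) per concentration
    dose_response -- True if a clear dose response was judged to be present
    """
    # Only concentrations inducing <=75% cytotoxicity are acceptable for analysis
    usable = [fold for fold, cytotox in points if cytotox <= 75.0]
    if not usable:
        return "inconclusive"
    peak = max(usable)
    if peak >= 2.0:
        return "positive"
    if peak > 1.5 and dose_response:
        return "weak positive"
    # Assumption: 1.5- to 2-fold without a clear dose response counts as negative
    return "negative"


def genotoxicity_call(bscl2_call, rtkn_call):
    """A compound is called genotoxic if Bscl2 and/or Rtkn is positive."""
    return "genotoxic" if "positive" in (bscl2_call, rtkn_call) else "not positive"


# Hypothetical data: the 5.0-fold point is ignored because cytotoxicity is 90%
print(reporter_call([(1.2, 10), (2.4, 40), (5.0, 90)], dose_response=True))  # positive
```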
For every compound, which was tested in each of three independent laboratories, the classifications were compared, and a final classification was made based on a weighted calculation (see Section 2.9).
For assessment of MoA, induction of the other ToxTracker reporters was assessed and compounds were classified accordingly.
The criteria for a positive and negative test result were identical for all cell lines and for an overall conclusion, test results from the three independent repeat experiments were weighted according to the prediction model in Table 2 as described above and below under Section 2.9. Using data for chemicals that passed acceptability criteria and quality control assessments in every individual laboratory, the overall sensitivity and specificity of ToxTracker to predict (in vivo) genotoxicity was determined using a weighted calculation (Tables 4 and 5).

| Within and between laboratory reproducibility
The WLR and BLR of the predictions were assessed using the concordance of the predictions for the genotoxic reporters (Bscl2 and/or Rtkn). For the WLR and BLR calculations, only compounds that passed acceptability criteria and quality control assessments in at least two laboratories (N = 59) were used. Compounds for which acceptable data were only available from one laboratory were excluded from the WLR and BLR calculations.
For the WLR, at least two valid experiments were required, and concordance was concluded when the final prediction based on the genotoxic reporters was the same in all valid experiments. For the assessment of the BLR, an overall final prediction was then made for each laboratory. The predictions were considered concordant among the laboratories when the final prediction was the same. The same approach was followed for the WLR and BLR assessment for oxidative stress, protein damage, and p53-associated cellular stress.
TABLE 3 Overall genotoxicity classifications of chemicals evaluated during phase 2 laboratory proficiency testing.
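The concordance rule behind the WLR and BLR can be sketched as a single helper that is applied either to the valid experiments within one laboratory (WLR) or to the final predictions of the laboratories (BLR). The function name, the data layout, and the labels are hypothetical; how weak positive and positive calls are merged before comparison is not specified here, so the caller is assumed to supply already harmonized labels.

```python
def concordance_pct(calls_by_compound, min_calls=2):
    """Percentage of compounds whose calls all agree.

    calls_by_compound -- dict mapping compound name to a list of calls
                         (experiments within a lab for WLR, labs for BLR)
    min_calls         -- compounds with fewer valid calls are excluded
    """
    eligible = {name: calls for name, calls in calls_by_compound.items()
                if len(calls) >= min_calls}
    if not eligible:
        raise ValueError("no compound has enough valid calls")
    concordant = sum(1 for calls in eligible.values() if len(set(calls)) == 1)
    return 100.0 * concordant / len(eligible)


# Hypothetical final predictions from up to three laboratories
blr_input = {
    "compound A": ["positive", "positive", "positive"],
    "compound B": ["positive", "negative", "positive"],
    "compound C": ["negative", "negative"],
    "compound D": ["positive"],  # excluded: data from one laboratory only
}
print(round(concordance_pct(blr_input), 1))  # 66.7
```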

| Predictive performance
The predictive capacity of the assay was evaluated by comparing the overall final prediction results for genotoxicity, on the basis of the individual laboratory results, with the existing proposed classification based on available historical genotoxicity data. To avoid any bias in data analysis and performance calculations, all evaluations were performed on coded compounds. For this purpose, 2 × 2 contingency tables (genotoxic vs. non-genotoxic) were constructed and sensitivity (probability of predicting positive given the true state is genotoxic), specificity (probability of predicting negative given the true state is non-genotoxic) and accuracy were calculated. A weighted calculation approach was used: for each chemical, the overall final prediction for each laboratory was taken into account and a correction factor was applied so that all chemicals had the same weight. For example, a chemical (e.g., 1,2-dimethylhydrazine, see Table 4) that resulted in a positive prediction in two laboratories and a negative prediction in the third laboratory received a weight of 0.67 (2/3) for genotoxic and 0.33 (1/3) for non-genotoxic. For the sensitivity and specificity calculations, 1,2-dimethylhydrazine was therefore included as 0.67 true positive and 0.33 false negative. All descriptive and statistical analyses were performed in R version 4.1.1 (R development core team, 2021).
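The weighted calculation can be made concrete with a short sketch. The analyses in this study were performed in R; the Python below is only an illustration of the weighting idea, with hypothetical function and variable names. The example input encodes the genotoxic compounds as described in the Results (25 called positive by all laboratories, 3 positive in two of three laboratories, 4 negative in all laboratories), assuming exactly three laboratory calls per compound.

```python
def weighted_performance(records):
    """Weighted sensitivity, specificity and accuracy (in %).

    records -- iterable of (expected_genotoxic, lab_calls), where lab_calls
               is a list of booleans, one per laboratory (True = genotoxic call)
    Each chemical contributes a total weight of 1, split over its lab calls.
    """
    tp = fn = tn = fp = 0.0
    for expected_genotoxic, lab_calls in records:
        if not lab_calls:
            continue  # no valid laboratory result for this chemical
        frac_positive = sum(lab_calls) / len(lab_calls)
        if expected_genotoxic:
            tp += frac_positive
            fn += 1.0 - frac_positive
        else:
            fp += frac_positive
            tn += 1.0 - frac_positive

    def pct(numerator, denominator):
        return 100.0 * numerator / denominator if denominator else None

    return {
        "sensitivity": pct(tp, tp + fn),
        "specificity": pct(tn, tn + fp),
        "accuracy": pct(tp + tn, tp + fn + tn + fp),
    }


# Expected genotoxic compounds only (three laboratory calls each is an assumption)
genotoxic = ([(True, [True, True, True])] * 25
             + [(True, [True, True, False])] * 3
             + [(True, [False, False, False])] * 4)
result = weighted_performance(genotoxic)
print(f"{result['sensitivity']:.1f}")  # 84.4
```

With these inputs the weighted true positives sum to 25 + 3 × 2/3 = 27 out of 32, which matches the reported sensitivity of 84.4%. Specificity is returned as `None` here because no non-genotoxic records were supplied.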

| RESULTS
The international interlaboratory ring trial for the ToxTracker assay was organized to (i) establish the transferability and reproducibility of the assay, (ii) evaluate the accuracy of ToxTracker to predict in vivo genotoxicity, and (iii) validate the application of the mechanistic information that is provided by ToxTracker. In vivo genotoxicity was defined as giving a positive in vivo result in the transgenic rodent (TGR), Pig-a gene mutation, MN, CA, and/or comet assays. While good intra- and inter-laboratory reproducibility is essential for regulatory adoption and subsequent applications of the assay, the accuracy of an
TABLE 5 Summary of pooled ToxTracker data from validation laboratories in the ring trial. Columns: Code; Compound.
standard battery of in vitro (Ames, MN, and CA) and in vivo (TGR, MN, and comet) genetox tests (Fujita et al., 2016; Kirkland et al., 2011, 2016; Madia et al., 2020; Morita et al., 2015). Data from the interlaboratory trial, specifically from the Bscl2 and Rtkn genotoxicity reporters in ToxTracker, were compared to responses from this expert WoE genotoxicity assessment. For 25 of the 32 expected genotoxic compounds, there was full concordance between ToxTracker results across all laboratories (Table 4). For three compounds (1,2-dimethylhydrazine, benzo[a]pyrene, and 2,6-diaminotoluene), two laboratories reported a positive classification, but one laboratory classified the compounds as negative. Four expected genotoxic compounds were classified as non-genotoxic by all laboratories, which will be discussed below. Of the 32 expected non-genotoxic compounds, none were overall classified as genotoxic (Table 4). Some laboratories suffered from a relatively high number of inconclusive results, mostly because the data did not meet the acceptance criteria. In these cases, the main cause was inaccurate cell counting in the flow cytometer, which resulted in unreliable cytotoxicity calculations.
After applying data acceptance criteria, the genotoxicity predictions across all laboratories were determined using a weighted calculation (Table 4). Overall, the ToxTracker assay correctly identified genotoxic compounds with a sensitivity of 84.4% (27 of 32 expected positives) and a specificity of 91.2% (29.17 of 32 expected negatives).
The accuracy of identifying genotoxic compounds in this validation study was in line with previous ToxTracker validation reports that reported a 94% sensitivity and 95% specificity for the genotoxicity prediction (Hendriks et al., 2016).

| Genotoxic mode-of-action assessment in ToxTracker
For some compounds, activation of only the Rtkn, but not the Bscl2, genotoxicity reporter was observed. This is often observed for compounds causing indirect genotoxic effects, including aneugens or oxidative stress-inducing compounds. For example, the major mechanistic pathway for the genotoxicity of 1,2-dibromoethane is through binding to the cellular antioxidant GSH (Cho & Guengerich, 2013). Also, for 8-hydroquinoline, the primary genotoxic MoA was reported to occur through induction of oxidative stress (Barilli et al., 2014). Accordingly, when comparing dose response relationships across ToxTracker reporters, 1,2-dibromoethane and 8-hydroquinoline predominantly activated the Srxn1 oxidative stress reporter, followed by an increase in the Rtkn reporter (Figure 3). This suggests that the clastogenic DNA lesions induced by these compounds were caused by oxidative stress.
Overall, all tested non-genotoxic compounds in the validation trial were correctly predicted as non-genotoxic in ToxTracker. In a weighted approach, none of these compounds induced the Bscl2 and Rtkn genotoxicity reporters (Table 6). Approximately 50% of the tested non-genotoxic compounds induced oxidative stress or protein unfolding, which have been associated with carcinogenicity (Hendriks et al., 2013; Hernández et al., 2009). For example, lead (II) acetate primarily activated the Srxn1 reporter and is known to induce DNA strand breaks in vitro secondarily due to oxidative stress (among other potential mechanisms), which over time can be tumorigenic in experimental models (NTP; Woźniak & Blasiak, 2003). Lead acetate is classified as class 2B probable human carcinogen by IARC (IARC Working Group on the Evaluation of Carcinogenic Risks to Humans, 2006).
Although there are some reports that show a small increase in chro- to be the primary mechanism of toxicity for lead (II) acetate and TBHQ (Figure 4). In the validation trial, when using the weighted approach, lead (II) acetate and TBHQ were classified as non-genotoxic, although one laboratory reported a positive Rtkn response (DNA strand breaks) after lead (II) acetate and TBHQ exposures. Induction of protein damage has also been associated with cytotoxic effects, which can lead to cytogenetic aberrations, primarily in vitro (Fowler et al., 2012a). In this ToxTracker trial, tunicamycin and p-nitrophenol were classified as non-genotoxic, but both compounds activated the Ddit3 reporter for protein unfolding. Tunicamycin and p-nitrophenol are non-genotoxic in vivo, but induce genotoxicity in vitro in the MN and/or CA assays (Bryce et al., 2014). Taken together, these examples show that information regarding biological activity, whether it results in genotoxic or non-genotoxic effects, can be valuable to more accurately predict the in vivo genotoxic potential of compounds. The MoA information can also be used to explain discrepancies between various in vitro and in vivo genotoxicity assays in a WoE approach for genotoxicity hazard assessment.
To further explore the contribution of the different reporter genes to the overall genotoxicity predictions, ToxTracker results were compared to outcomes of the standard in vitro and in vivo genotoxicity assays (Table 6). For this comparison, compounds were selected with MoAs involving either oxidative stress or the unfolded protein response (UPR) for which data are available in the public domain that allow a WoE-based classification. For all 11 compounds that were expected to induce oxidative stress, activation of the oxidative stress reporters (Srxn1/Blvrb) was observed by the laboratories. As expected, the four compounds that have been shown to induce the UPR indeed activated the Ddit3 reporter in ToxTracker. Importantly, many of the compounds that induced oxidative stress or protein unfolding were predicted to be non-genotoxic in ToxTracker and were also negative in the standard in vivo genotoxicity assays. In contrast, many of these compounds were classified as genotoxic in at least one of the standard in vitro genotoxicity assays (Ames, MN, and CA).

| Within-laboratory and between-laboratory reproducibility
One of the primary objectives of the ToxTracker ring trial was to establish the transferability and reproducibility of the assay. All compounds were tested in all cell lines and evaluated for biomarker activation that included induction of DNA damage, oxidative stress, protein damage, and p53-associated cellular stress. For each of these biomarkers, the results from the triplicate independent experiments were analyzed, and positive, weak positive, or negative classifications for each of the reporters in each experiment were compared (Table S5).
The experiments were considered reproducible if a laboratory came to the same conclusion across the different independent experiments.
In cases where induction of a reporter was different between experiments, but combined results (−S9 and +S9 fraction together) resulted in similar calls across experiments, the results were indicated as reproducible.
The reproducibility of each biomarker classification (genotoxicity, oxidative stress, protein damage, and p53 activation) for every compound was determined within each laboratory among their independent experiments (Table 7). Overall, the WLR to predict genotoxicity varied between 73.1% and 96.7%, with an average WLR of 85.5%.
The overall WLR (average across all reporters) ranged from 71.1% for the laboratory with the lowest reproducibility to 97.5% for the laboratory with the best overall performance. Together, these WLR calculations indicate an excellent transferability of the ToxTracker assay.
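The reproducibility rule described above can be illustrated with a minimal Python sketch. The data layout, the function names `combined_call` and `within_lab_reproducibility`, and the example calls are hypothetical and are not taken from the trial database. The sketch assumes that the −S9 and +S9 calls of one experiment are combined into a positive call if either condition is positive, and that a compound counts as reproducible only if the combined calls agree across all independent experiments.

```python
from typing import Dict, List, Tuple

# One experiment yields a call without S9 and a call with S9 (hypothetical layout).
ExperimentCalls = Tuple[str, str]  # (call_minus_s9, call_plus_s9)


def combined_call(calls: ExperimentCalls) -> str:
    """Combine the -S9 and +S9 calls: positive if either condition is positive."""
    return "positive" if "positive" in calls else "negative"


def within_lab_reproducibility(data: Dict[str, List[ExperimentCalls]]) -> float:
    """Percentage of compounds whose combined call agrees across all experiments."""
    reproducible = 0
    for experiments in data.values():
        distinct_calls = {combined_call(e) for e in experiments}
        if len(distinct_calls) == 1:
            reproducible += 1
    return 100.0 * reproducible / len(data)


# Invented calls for three coded compounds, three independent experiments each.
example = {
    # Reporter induction differs per condition, but the combined call is always positive.
    "compound_A": [("positive", "negative"), ("negative", "positive"), ("positive", "positive")],
    "compound_B": [("negative", "negative")] * 3,
    # One positive experiment and two negative ones: not reproducible.
    "compound_C": [("positive", "negative"), ("negative", "negative"), ("negative", "negative")],
}

wlr = within_lab_reproducibility(example)
print(f"WLR = {wlr:.1f}%")
```

In this invented example two of the three compounds give concordant combined calls. How weak positive calls were weighed against positive or negative calls is not specified here, so the sketch leaves them out.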
To determine the BLR, classifications of the tested chemicals for their induction of DNA damage, oxidative stress, protein damage, and p53-associated cellular stress were compared among the participating laboratories. For each laboratory, an overall classification was made for each chemical and for every reporter in ToxTracker from the three replicate experiments. These overall classifications were compared between laboratories to establish the BLR. As an example, the ToxTracker results for etoposide from the three laboratories are summarized in Table S6. For the BLR calculation, the results for the Bscl2 and Rtkn reporters were combined, since a positive call for either of these biomarkers would lead to a positive genotoxicity classification for a compound. The same combinatorial approach was used for the Srxn1 and Blvrb oxidative stress reporters.
The BLR was determined for 59 compounds in the ring trial for which acceptable data from at least two laboratories were available.
The BLR was calculated for the different toxicological endpoints in ToxTracker. The BLR for the seven validation laboratories ranged from 71% for reproducing a primary oxidative stress signal to 83.1% for genotoxicity predictions (Table 8). The overall reproducibility of predicting protein damage and p53-associated cellular stress was 82.5% and 78.3%, respectively.
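A minimal Python sketch of the BLR calculation for the genotoxicity endpoint follows. It combines the Bscl2 and Rtkn calls as described above and only includes compounds with acceptable data from at least two laboratories. The data layout, the function names, and the example calls are hypothetical. The sketch also assumes that a compound counts as concordant only when all laboratories with acceptable data reach the same call, which may differ from the exact concordance formula used in the trial.

```python
from typing import Dict, Optional

# Overall reporter calls per laboratory for one compound; None marks a laboratory
# without acceptable data (hypothetical layout).
LabCalls = Dict[str, Optional[Dict[str, bool]]]


def genotoxicity_call(reporters: Dict[str, bool]) -> bool:
    """A positive call for either Bscl2 or Rtkn gives a positive genotoxicity call."""
    return reporters["Bscl2"] or reporters["Rtkn"]


def between_lab_reproducibility(compounds: Dict[str, LabCalls]) -> float:
    """Percentage of eligible compounds on which all laboratories agree."""
    eligible = 0
    concordant = 0
    for labs in compounds.values():
        calls = [genotoxicity_call(r) for r in labs.values() if r is not None]
        if len(calls) < 2:  # acceptable data from at least two laboratories required
            continue
        eligible += 1
        if len(set(calls)) == 1:
            concordant += 1
    return 100.0 * concordant / eligible


# Invented calls for four coded compounds tested in up to three laboratories.
example = {
    "compound_X": {
        "lab1": {"Bscl2": True, "Rtkn": False},
        "lab2": {"Bscl2": False, "Rtkn": True},
        "lab3": {"Bscl2": True, "Rtkn": True},
    },
    "compound_Y": {
        "lab1": {"Bscl2": False, "Rtkn": False},
        "lab2": {"Bscl2": False, "Rtkn": False},
        "lab3": None,
    },
    "compound_Z": {
        "lab1": {"Bscl2": False, "Rtkn": True},
        "lab2": {"Bscl2": False, "Rtkn": False},
        "lab3": {"Bscl2": False, "Rtkn": False},
    },
    # Only one laboratory with acceptable data: excluded from the calculation.
    "compound_W": {"lab1": {"Bscl2": True, "Rtkn": True}, "lab2": None, "lab3": None},
}

blr = between_lab_reproducibility(example)
print(f"BLR (genotoxicity) = {blr:.1f}%")
```

The same structure would apply to the oxidative stress endpoint by combining the Srxn1 and Blvrb calls instead.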

| DISCUSSION
The interlaboratory validation of the ToxTracker assay was performed according to OECD Guidance Document 34 when reasonably possible (OECD, 2005). The project was coordinated by the experts in the VMT, including selection and coding of the compounds, data acceptance, and data analysis. Acceptable results were analyzed for induction of genotoxicity as well as the ability to provide information about their genotoxic and cytotoxic MoA. All test results were compiled into a large database (Appendix S2), and the VMT first verified whether concurrent positive controls met the data acceptance criteria, namely whether the positive control compounds induced the fluorescent reporters above the minimum threshold for a positive response (as defined by the protocol) within acceptable cytotoxicity limits. In some cases, one of the reporters did not meet all acceptance criteria; for example, cytotoxicity of the compounds was higher than the cut-off, or activation of one of the genotoxicity reporters did not reach the minimal genotoxicity reporter activation after exposure to the positive control compounds cisplatin or aflatoxin B1 in the presence of liver S9 fraction (Figure S3).
For clarification, the minimal reporter induction levels for some of the ToxTracker reporters are higher than the two-fold threshold for a positive ToxTracker result as set in the prediction model (Tables 1 and 2). In those cases, the VMT assessed whether there was sufficient evidence that (1) all cell lines were performing correctly, (2) the positive control compounds were active, and (3) the liver S9 fraction metabolization was sufficient for use, in order to accept or reject the control data. In the data analysis database, these experiments were marked as "acceptable with restrictions". When the positive controls in an experiment did not meet the acceptance criteria, all results from the test compounds in that experiment were considered invalid and were removed from subsequent analysis and interpretation.
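The data acceptance step can be sketched in Python as a filter over experiments. Everything in this sketch is illustrative: the dictionary layout, the function names, and the numeric thresholds passed in the example are placeholders rather than the protocol values. The sketch also does not model the expert review by which the VMT could mark an experiment as "acceptable with restrictions".

```python
from typing import List


def control_passes(fold_induction: float, relative_survival: float,
                   min_induction: float, min_survival: float) -> bool:
    """True if the positive control induced the reporter enough without excess cytotoxicity."""
    return fold_induction >= min_induction and relative_survival >= min_survival


def filter_experiments(experiments: List[dict], min_induction: float,
                       min_survival: float) -> List[dict]:
    """Keep only experiments in which the positive controls pass for every reporter."""
    accepted = []
    for exp in experiments:
        controls_ok = all(
            control_passes(c["fold_induction"], c["relative_survival"],
                           min_induction, min_survival)
            for c in exp["controls"].values()
        )
        if controls_ok:
            # Test-compound results are carried forward only for accepted experiments.
            accepted.append(exp)
    return accepted


# Invented positive-control readouts for two experiments.
experiments = [
    {
        "id": "exp1",
        "controls": {
            "Bscl2": {"fold_induction": 3.1, "relative_survival": 0.60},
            "Rtkn": {"fold_induction": 2.5, "relative_survival": 0.50},
        },
    },
    {
        "id": "exp2",
        "controls": {
            "Bscl2": {"fold_induction": 1.4, "relative_survival": 0.70},
            "Rtkn": {"fold_induction": 2.8, "relative_survival": 0.65},
        },
    },
]

# Placeholder thresholds, chosen only for this example.
accepted = filter_experiments(experiments, min_induction=2.0, min_survival=0.25)
print([exp["id"] for exp in accepted])
```

With these placeholder values the second experiment is dropped because one positive control stayed below the minimum induction, so none of its test-compound results would enter the analysis.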

| Genotoxicity prediction
The balanced accuracy (the mean of sensitivity and specificity) of Tox- (Corvi et al., 2008; Pfuhler et al., 2021). Of the 32 expected genotoxic compounds in the ring trial, four were classified as non-genotoxic across laboratories: acrylonitrile, benzene, cadmium chloride, and dimethylnitrosamine (NDMA). Acrylonitrile was reported positive in the Ames mutation assay and showed mixed results in other in vitro genotoxicity assays (Whysner, Ross, et al., 1998). However, the in vivo MN and CA assays were negative and no DNA adducts were detected following in vivo exposure. Additionally, the carcinogenicity of acrylonitrile was suggested to be related to epigenetic mechanisms (Whysner, Steward, et al., 1998).
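As a worked illustration of the definition above, balanced accuracy is the arithmetic mean of sensitivity and specificity. The short Python sketch below applies it to the sensitivity (84.4%) and specificity (91.2%) quoted in the abstract; the function name is hypothetical, and the result is an arithmetic illustration rather than a value quoted from this section.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity, both given in percent."""
    return (sensitivity + specificity) / 2.0


# Sensitivity and specificity as quoted in the abstract.
ba = balanced_accuracy(84.4, 91.2)
print(f"Balanced accuracy = {ba:.1f}%")
```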
Benzene is a very potent human carcinogen and in vivo mutagen (Eastmond, 2000; Whysner, 2000). However, benzene is generally negative in the standard in vitro genotoxicity assays (Whysner et al., 2004). Some benzene metabolites have reportedly induced cytogenetic abnormalities (IARC, 2018), and have been implicated in oxidative stress (Badham et al., 2010; Rothman et al., 2021). However, in vitro metabolization by exogenous rat S9 liver fraction is designed for hazard assessment, and as such, favors formation of certain phase 1 metabolites over others as not all enzymes are induced to the same degree (Easterbrook et al., 2001). Hence, the lack of adequate metabolite formation is the likely cause for the negative ToxTracker result for benzene. Cadmium chloride induces MN and CA in vitro and in vivo (Fahmy & Aly, 2000) yet has not been reported to be DNA reactive (Stannard et al., 2023). Oxidative stress was induced across all participating laboratories while the genotoxicity reporters remained negative. Taken together, although genotoxic, cadmium chloride does not appear to be directly DNA reactive. Finally, NDMA was classified as non-genotoxic whereas nitrosamine compounds are generally potent mutagens in vivo. NDMA was positive in the Ames mutation assay as well as the in vitro MN at concentrations above 25 mM and requires 30% rat or hamster S9 fraction to be metabolically activated (Westphal et al., 2000). In the ToxTracker validation, the maximum concentration tested by the laboratories was set at 1 mg/mL, thereby limiting exposures to non-cytotoxic concentrations. NDMA has previously been classified as genotoxic in ToxTracker when tested up to 25 mM in the presence of 1% hamster liver S9 fraction (unpublished data).
The expected genotoxic compounds 1,2-dimethylhydrazine, 2,6-diaminotoluene, and benzo[a]pyrene were classified as non-genotoxic by one of the three laboratories. This was predominantly due to the complete independence of assay conduct after providing blinded compounds to participating laboratories. The negative result for 1,2-dimethylhydrazine appears to be related to differences in concentration selection. One laboratory selected an eight-fold lower concentration to test in ToxTracker than the other two laboratories (31.5 μM vs. 250 μM). Similarly, 2,6-diaminotoluene was tested at a 15-fold lower concentration in one of the laboratories, likely resulting in the negative ToxTracker data (Table 4). At first, the negative result for benzo[a]pyrene seemed to be caused by inadequate S9 metabolization, as there was no cytotoxicity in the presence of liver S9 fraction.
However, the positive control aflatoxin B1 did result in the expected ToxTracker reporter activation, suggesting the S9 mix was suitable for use. Hence, the negative result appears to be related to formulation preparation, but the etiology is unknown.
Of the 32 expected non-genotoxic compounds that were included in the validation trial, none was classified overall as genotoxic, although two compounds were classified as equivocal (Table 4).
For a number of compounds, a positive genotoxicity result was reported by one laboratory. In this laboratory, lead (II) acetate, TBHQ, vanillin, erythromycin stearate, and diclofenac activated the Rtkn-GFP reporter, which indicates the formation of DNA strand breaks, while the Bscl2-GFP reporter (indicator of bulky DNA lesions and fork collapse) remained negative. This pattern of reporter activation is typically observed for compounds that are indirect genotoxins, and the observed genotoxicity is often secondary to the induction of oxidative stress. Indeed, for lead (II) acetate and TBHQ, indirect genotoxicity due to oxidative stress has been reported (Liu et al., 2018; NTP). It is therefore interesting and relevant that for all of these Rtkn-GFP reporter positive compounds, the laboratories reported activation of the Srxn1 and Blvrb reporters (Table 5). p-Nitrophenol activated both Bscl2 and Rtkn genotoxicity reporters in one laboratory, indicating direct DNA reactivity, but this result could not be confirmed by the other laboratories. p-Nitrophenol is negative in the standard battery of in vitro genotoxicity assays, but there are various reports of positive CA and MN tests in vivo (Kirkland et al., 2016). Also, melamine, a non-genotoxic compound, was classified as genotoxic by one laboratory but was negative in the other two laboratories.
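The reporter patterns discussed above can be summarized in a small Python sketch that maps a set of activated reporters to a coarse interpretation. This is a deliberate simplification for illustration and is not the ToxTracker prediction model: the function name and the wording of the returned labels are hypothetical, and dose-response ranking of the reporters, weak responses, and cytotoxicity are ignored.

```python
from typing import Set


def interpret_pattern(positive_reporters: Set[str]) -> str:
    """Map a set of activated reporters to a coarse mode-of-action label."""
    bscl2 = "Bscl2" in positive_reporters
    rtkn = "Rtkn" in positive_reporters
    oxidative = bool(positive_reporters & {"Srxn1", "Blvrb"})

    if bscl2 and rtkn:
        # Both genotoxicity reporters: pattern described for direct DNA reactivity.
        return "genotoxic, pattern consistent with direct DNA reactivity"
    if rtkn and oxidative:
        # Rtkn without Bscl2, together with oxidative stress reporters.
        return "genotoxic, possibly secondary to oxidative stress"
    if bscl2 or rtkn:
        return "genotoxic"
    if oxidative:
        return "non-genotoxic, oxidative stress"
    if "Ddit3" in positive_reporters:
        return "non-genotoxic, protein damage"
    if "Btg2" in positive_reporters:
        return "non-genotoxic, p53-associated cellular stress"
    return "non-genotoxic, no reporter activation"


# Invented reporter patterns.
print(interpret_pattern({"Bscl2", "Rtkn", "Btg2"}))
print(interpret_pattern({"Rtkn", "Srxn1", "Blvrb"}))
print(interpret_pattern({"Ddit3"}))
```

A set-based lookup keeps the combination rules explicit (either genotoxicity reporter gives a genotoxic call, either oxidative stress reporter gives an oxidative stress call) while leaving quantitative thresholds to the upstream per-reporter classification.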
All compounds that were classified as non-genotoxic in ToxTracker were negative in the Ames bacterial mutation assay. However, a number of compounds that were predicted non-genotoxic in ToxTracker (no activation of Bscl2/Rtkn reporters) have been reported to induce positive results in the in vitro MN or CA assay (Fowler et al., 2012b; Kirkland et al., 2016). Occasionally, positive results from these in vitro clastogenicity assays do not correctly predict in vivo genotoxicity. Various reasons for this discrepancy have been proposed, including misleading in vitro positive responses caused by high

≥2.0 at 1 or more concentrations ≥0.25 Yes +
≥2.0 at 1 or more concentrations ≥0.25 No +
<1.5 at all concentrations ≥0.25 but approaches 0.25 No −
<1.5 at all concentrations ≥0.25 but limited by precipitation No
<1.5 at all concentrations ≥0.25 but with limited toxicity and not limited by
+: positive overall ToxTracker classification. −: negative overall ToxTracker classification. (+): a weak ToxTracker response. (−): a negative ToxTracker classification at tested concentrations, but the compound could have been tested at higher concentrations.
FIGURE 2 ToxTracker results for etoposide from three independent laboratories, demonstrating consistency in the pattern of reporter responses across independent laboratories despite differences in instrumentation, personnel, and strength of formulations. (a) Activation of the six different ToxTracker reporters following exposure to increasing concentrations of the test compound. The Bscl2 and Rtkn reporters indicate genotoxicity, Srxn1 and Blvrb are induced by oxidative stress, Btg2 is associated with the p53 tumor suppressor response, and Ddit3 is induced by protein misfolding. The dashed red line indicates the two-fold induction level as threshold for a positive ToxTracker classification. (b) Cytotoxicity of the compound is determined by relative cell count.
Differential induction of the six different ToxTracker reporters that respond to the induction of DNA damage, oxidative stress, protein damage, and p53-associated cellular stress can provide insight into the MoA of genotoxic compounds. The secondary objective of this interlaboratory validation was to investigate whether the additional biological information provided by ToxTracker provides insight into the MoA of compounds that have either conflicting in vitro genotoxicity test results or discordant test results when moving from in vitro to in vivo genotoxicity tests. Insight into MoA is critical to not only understand DNA reactivity, but also to explain the differences between test results empirically and improve overall genotoxicity predictions (Allemang et al., 2021; Brandsma et al., 2020). For the genotoxic compounds with an expected direct DNA damaging MoA, for example, etoposide, mitomycin C, cisplatin, and 5-fluorouracil, both Bscl2 and Rtkn genotoxicity reporters were activated (Table 5). Compounds that require metabolic biotransformation activated the genotoxicity reporters only in the presence of liver S9 fraction, for example, cyclophosphamide, benzo[a]pyrene, and 7,12-dimethyl-benzanthracene. For most of the genotoxic compounds, activation of the p53 tumor suppressor-associated Btg2 reporter and occasionally the oxidative stress reporters Srxn1 and Blvrb was also observed. However, when evaluating the dose-response curves for each marker across these genotoxic compounds, induction of the genotoxicity reporters is clearly the primary response (e.g., for etoposide shown in Figure 2). For a number of compounds,
FIGURE 3 ToxTracker results for 1,2-dibromoethane and 8-hydroxyquinoline. (a) Activation of the six different ToxTracker reporters following exposure to increasing concentrations of test compound in absence of S9. The Bscl2 and Rtkn reporters indicate genotoxicity, Srxn1 and Blvrb are induced by oxidative stress, Btg2 is associated with the p53 tumor suppressor response, and Ddit3 is induced by protein misfolding. The dashed red line indicates the two-fold induction level as threshold for a positive ToxTracker classification. (b) Cytotoxicity of the compounds is determined by relative cell count.
ToxTracker trial, tunicamycin and p-nitrophenol were classified as non-genotoxic but both compounds activated the Ddit3 reporter for protein unfolding. Tunicamycin and p-nitrophenol are non-genotoxic in vivo, but induce genotoxicity in vitro in the MN and/or CA assays (Bryce et al., 2014). Taken together, these examples show that
FIGURE 4 ToxTracker results for the non-genotoxic compounds lead acetate, tert-butyl hydroquinone (TBHQ), tunicamycin, and p-nitrophenol. Note, all these compounds induce micronuclei in vitro, but are considered to be misleading or irrelevant positives. (a) Activation of the six different ToxTracker reporters following exposure to increasing concentrations of test compound. The Bscl2 and Rtkn reporters indicate genotoxicity, Srxn1 and Blvrb are induced by oxidative stress, Btg2 is associated with the p53 tumor suppressor response, and Ddit3 is induced by protein misfolding. The dashed red line indicates the two-fold induction level as threshold for a positive ToxTracker classification. (b) Cytotoxicity of the compounds is determined by relative cell count.
TABLE 7 Within-laboratory reproducibility of overall ToxTracker calls across the seven participating laboratories.
Summary of 32 in vivo genotoxic and non-genotoxic compounds assayed in the ToxTracker interlaboratory validation trial.
Note: P: positive, GFP induction for at least one tested concentration was ≥2 for the Bscl2 and/or Rtkn cell lines. N: negative, GFP induction at all tested concentrations was <2 for the Bscl2 and Rtkn reporter cell lines. Abbreviations: AMP, ampicillin; ANT, o-anthranilic acid; BaP, benzo(a)pyrene; BLR, between lab reproducibility; CIS, cisplatin; EMS, ethyl methanesulfonate; MAN, mannitol.
TABLE 4 derived from the repeated experiments for each laboratory taking only conclusive results into account (negative, positive, or equivocal).
Comparison between ToxTracker and standard genotoxicity assays for compounds with an oxidative stress or protein reactive MoA.
Note: The table indicates the concordance between the calls from the three independent experiments for the genotoxicity reporters (Bscl2 and Rtkn), oxidative stress (Srxn1/Blvrb), protein damage (Ddit3), and p53 activation (Btg2).