Understanding the Development, Standardization, and Validation Process of Alternative In Vitro Test Methods for Regulatory Approval from a Researcher Perspective

Due to economic, practical, ethical, and scientific reasons, researchers, among others, are pushing for alternative in vitro test methods to replace or reduce existing animal experiments. For these tests to be used more broadly by the industrial sector and regulatory bodies, orchestrated efforts are required to show the robustness and reliability of in vitro methods, which can accelerate their use for early screening. Another way of increasing the use of alternatives is to coordinate validation studies, that is, multi-laboratory trials, and to gain regulatory approval and instatement as test guidelines or standard methods. However, a lack of awareness of the exact standardization, validation, and approval process has been a major obstacle for many researchers. Herein, the process has been broken down into three main phases: i) test method development; ii) intra- and inter-laboratory validation; and iii) regulatory acceptance. This general


Introduction
There is a need to provide alternative test strategies at all stages of development in the pharmaceutical, cosmetic, and chemical industries i) to pre-screen candidate molecules for information about efficacy and unwanted toxic effects, [1] and ii) to generate data to support risk assessment. [2] Therefore, intensified efforts have been made in recent years toward the systematic development and evaluation of relevant and more robust non-animal models.
The first activities in alternative testing go back to the "Three Rs" (3Rs principle, i.e., replacement, reduction, refinement), which was the basis for the 1958 book The Principles of Humane Experimental Technique by William Russell and Rex Burch. [3] The 3Rs principle "advocates the search for the 1) replacement of animals with non-living models; 2) reduction in the use of animals; and 3) refinement of animal use practices." A concrete example is the replacement of the Draize in vivo rabbit eye and skin irritation assay used to test chemicals or cosmetic products. The Draize test was introduced in 1940 and is a test method accepted by the Organization for Economic Co-operation and Development (OECD). [4,5] Several alternative methods have been developed over the years to replace this test. [6,7] In 2015, OECD 439 was published; it provides a procedure for identifying skin irritants based on reconstructed human epidermis, combined with several validated test methods to determine the skin irritancy of test substances. [8,9] Alternative method development is driven by the desire to reduce high attrition rates during product development and the need to better understand mechanisms of action and toxicity. Increasing ethical concerns about the use of animals for safety testing, which have already partly translated into more stringent regulatory frameworks, are other key drivers that foster the development of alternative methods. [10,11] The variety of invertebrate animal models, [12] in silico, [13] and in vitro human-based methods is huge. The approaches range from physiologically based pharmacokinetic modeling to 2D cell cultures to more complex 3D cell cultures and co-cultures including organ-on-a-chip models. [14][15][16]
The progress in these fields has resulted in improved in vitro-in vivo extrapolation (IVIVE) outcomes contributing toward a significant reduction in the number of laboratory animals in the field of pharmacodynamics and pharmacokinetics assessment [17] as well as in hazard and risk research. [18] In addition, the concept of integrated approaches to testing and assessment (IATA) has been introduced, ranging from more flexible, non-formalized judgment-based approaches (e.g., grouping and read-across) to more structured, prescriptive, rule-based approaches (e.g., integrated testing strategy [ITS]). [19] IATA can include a combination of complementary approaches (e.g., in vitro, ex vivo, in silico) assessing one endpoint for hazard and risk assessment. [20,21] Another aspect important for regulatory acceptance of alternative methods is human relevance. In 2012, the OECD launched a new program on the development of adverse outcome pathways (AOPs). [22] An AOP is an analytical construct describing a sequential chain of causally linked events at different levels of biological organization that lead to an adverse health effect, and it can be applied as a framework to develop IATA. AOPs are the central element of a toxicological knowledge framework being built to support chemical risk assessment based on mechanistic reasoning. This framework takes human epidemiology and human data from in vitro experiments into consideration to describe causally connected key events (KEs) resulting in a specific adverse outcome (AO). [23] For some KEs, in vivo data from animal experiments are also considered, and it is important to note that some AOPs are species-specific. Organizing mechanistic information in this way helps to identify causally connected KEs and to choose the cell systems and assays best suited for investigating the AO. By applying this concept, we recently presented a robust human alveolar tissue model to predict inflammation and fibrosis upon exposure to carbon nanotubes. [24]
All the efforts described have not yet fully eliminated animal testing within research and industry and, in our opinion, much more work will be needed to ban the use of animals; in some areas, this will not even be possible. An important initiative was announced in 2019 by the U.S. Environmental Protection Agency (EPA) in Washington, D.C. to stop conducting or funding studies on mammals by 2035, [25] and with this strong commitment, the way is paved toward non-animal tests in the future. In addition, the European Commission (EC) published a chemicals strategy for sustainability on 14 October 2020 with the aim of accelerating innovation for safe and sustainable chemicals. [26] The 3Rs principles have also been incorporated in legislation, such as Directive 2010/63/EU, articles 4 and 13, which state that an animal test must not be conducted if an alternative method is available, [27] the European Union (EU) Cosmetics Regulation 1223/2009, [28] and the regulation on the registration, evaluation, authorisation, and restriction of chemicals (REACH). [29] Putting the 3Rs principle further into action, a partnership called the European Partnership to Promote Alternative Approaches to Animal Testing (EPAA) was founded in 2005. [30] This is a collaboration between the EC, the European trade association, and companies from different sectors to facilitate the collaboration between partners and to promote results via various activities and events. There are many other initiatives and organizations that work in a similar way, collaborating across multiple sectors to address scientific challenges; their solutions are valuable to academia, industry, non-governmental organizations, the regulatory community, and society in general. These include, for instance, national 3R centers, the European Centre for Ecotoxicology and Toxicology of Chemicals (ECETOC), [31] and the Health and Environmental Sciences Institute (HESI). [32]
As an example, the Agricultural Chemical Safety Assessment (ACSA) Technical Committee of HESI has proposed a tiered testing approach for assessing the safety of crop protection chemicals. [33] Developments such as these will help lead the movement toward greater use of alternatives in the future.
Many research laboratories in science and industry are working to develop and optimize alternative in vitro testing models for hazard assessment of novel chemicals and drugs in line with the 3Rs, so that the toxicity and mechanisms of action of a compound can be identified at an early stage of chemical/drug development. These alternative tests have the potential for high-throughput screening, especially as part of drug discovery, [34] to reduce costs and animal use across the industry. In addition, such tests can be more broadly used in basic research and for education; thus they have the potential to replace many animal experiments in research laboratories without formal validation. On the other hand, many institutions, such as those in industry, will be wary of a new test if it has not first been trialed, analyzed, standardized, validated, and approved by others and finally accepted by the regulator.
Approval processes are described and process description guidelines exist, [35] but we need to understand that validation is complex and time- and resource-consuming, and that it can take up to several years to obtain official acceptance, for example, as an OECD test guideline. Academic researchers are usually not trained in regulatory toxicology. Many stakeholders are involved in the validation process, and researchers usually provide only the initial concept, which is why an understanding of the entire process is important for them.
Through literature review and interviews with researchers and experts at universities in Switzerland, at a biotech company providing alternative in vitro solutions to animal experimentation, at non-governmental organizations (NGOs), and at federal offices in Switzerland (some of the interviewed partners are listed as authors or in the acknowledgment, others preferred to remain anonymous), we gained an understanding of the different perspectives on the challenges faced during this process and compiled the information needed for the approval process. As the focus of our own work is on alternative human cell models, we present the procedure for cell culture-based approaches and summarize the information that is relevant for test development. We are aware that non-testing approaches, such as in silico methods based on quantitative and qualitative structure-activity relationships (QSARs), have also been introduced to model pharmacodynamics, pharmacokinetics, and toxicological hypotheses. [13] Indeed, in silico models such as ToxCast from the U.S. EPA [36] are incorporated into REACH, [37] the chemical legislation of the European Union (EU). The principles for QSAR validation are stated in the OECD guideline "Guidance document on the validation of (quantitative) structure-activity relationships ((Q)SAR) models" [38] and will not be explored further herein.

Regulatory and Validation Bodies
When new discoveries are made and new alternative testing methods are developed, there are different ways in which they can be assessed and introduced into widespread use. The first route is basic peer review, which is the quickest and easiest way to have work reviewed and implemented by other researchers. [39] However, validation of a test and approval by an official government agency or international organization would help encourage the use of alternative methods in preference to animal testing. One example is Directive 2010/63/EU, which has regulated this activity in Europe since 2013. [40] It states that an animal test must not be conducted if an alternative method is available. Validation and approval are often a slower process than basic peer review and can take several years to decades. It has also been shown that implementing the new measures in practice takes time, as a delay in formal transposition was reported in many member states. [41] In addition, we do not know how this directive is enforced in different countries, and enforcement might also depend on geography, sector, and topic. On the other hand, official approval usually leads to more widespread use and can set the standard that all institutions will use, as the validation process supports the scientific validity of an alternative test. [42] When it comes to alternatives to animal testing, peer review, validation, and official approval are all currently in use, depending on the type of test and whether the test is intended to be shared and used as a new standard. This section covers exemplary regulatory and validation bodies, as well as specific concepts around the world relevant to the current regulation of animal testing and alternatives, and how they interact (summarized in Figure 1), which will help to understand the complete process described in Section 3.

International Organizations that Develop and Implement Guidelines
We start with an overview of international organizations and regulatory bodies that are involved in the development of guidelines and/or implementation of approved test methods, as understanding their requirements and decision-making is important for the validation process.

The OECD
The OECD is an international organization with 37 member countries whose goal is to "promote policies that will improve the economic and social well-being of people around the world" and is the main regulatory institution worldwide. [43] Policies issued by the OECD deal with everything from taxation and labor to chemical testing and research. The OECD covers a broad range of regulations, one of which is the approval and standardization of scientific testing methods as OECD test guidelines (TGs). OECD TGs are internationally accepted standard methods for assessing the effects of chemicals on human health and the environment, and they currently form a collection of about 150 internationally agreed testing methods to assess the safety of chemicals. [44] The TGs are used by professionals in industry, academic researchers, and governmental bodies involved in testing substances such as industrial chemicals, pesticides, and cosmetics. OECD TGs can be developed de novo or can be based on existing international and national standards, guidelines, and guidance material (e.g., International Organization for Standardization [ISO], American Society for Testing and Materials [ASTM], EU and/or member countries' documents). Existing TGs are also subjected to re-evaluation to ensure that they match the scientific state of the art. The approval and update of TGs are overseen by the Working Group of National Coordinators (NCs) of the TGs program (WNT). [45] The WNT decides whether a new test meets the existing validation criteria and is consistent with existing methods; the test is then formally validated and published as an OECD TG. Depending on the scientific progress and the regulatory needs of OECD member countries, there is a periodic demand to develop new OECD TGs or revise existing TGs.
This usually reflects scientific progress in the area of substances' hazard identification or new knowledge in alternative method development and is often related to the animal welfare aspects and the cost-effectiveness improvement. Proposals to develop or update TGs can be made by the NCs of the TG Program, Business and Industry Advisory Committee to the OECD, non-governmental organizations, scientific societies, and the Secretariat. Alternatively, a proposal may come from a workshop or an expert meeting.
Most relevant to alternative methods is the OECD's capacity to standardize testing methods as guidelines that its member countries must accept and implement. To support this, the OECD has set a standard for Good Laboratory Practice (GLP) for researchers to follow that allows for the Mutual Acceptance of Data (MAD) between members. [46] Recently, a guidance document on Good In Vitro Method Practices (GIVIMP) for the development and implementation of in vitro methods for regulatory use in human safety assessment was published by the OECD. [47] In addition to the principles of Good Cell Culture Practice (GCCP) that provides a minimal set of requirements for documentation of cell culture practice for in vitro methods, [48] the GIVIMP includes detailed and specific principles of best practice for the handling and management of cell and tissue culture systems. Moreover, guidance for the laboratory environment in which test data are generated and recorded is included. The aim of this comprehensive guidance document is "to reduce the uncertainties in cell and tissue-based in vitro method derived predictions by applying all necessary good scientific, technical, and quality practices from in vitro method development to in vitro method implementation for regulatory use. This guidance document also applies to in vitro methods already accepted by the OECD." Each OECD member country has an NC who represents that country's regulatory body within the OECD. The function of the NC is to review project proposals from his/her country presented for validation and standardization and select which projects to push forward through the approval process outlined by the OECD TGs Program. He or she coordinates all relevant aspects of the process, including having the new method tested by validation bodies (described in Section 2.2) and advocating for the method to become a new guideline. 
For researchers developing a method to become an OECD TG, it is important to get in contact with the NC early on during the development phase. Based on our own experience, the optimal time point to involve NCs or other validation and regulatory bodies in the development is when interlaboratory comparisons, that is, round-robin experiments in different laboratories, are planned. Even if a new method is not yet fully developed, the process can already be launched, and the NC's office can provide important information and recommendations that can substantially facilitate and shorten the procedure. Whether tests are finally accepted is determined by key factors such as relevance, importance, and feasibility. In addition, based on our discussions with industry partners participating in EU projects, preference is given to high-throughput screening approaches, which are fast, simple, and cheap, or to cell models that are easy to reproduce. These aspects are at the discretion of the NC and may depend on the priorities of the member country; the process could therefore be politically and not only scientifically motivated. It is also important to note that the NC does not only work with alternative methods; however, the percentage of accepted tests that involve alternative methods rose from about 13% in 2007 to about 48% in 2014. [49]

Other Organizations, Institutions, and Regulatory Bodies
There are many other possibilities for the implementation of in vitro test methods via various organizations, institutions, and regulatory bodies and we only describe some examples. Briefly, the processes to submit a new guideline are more or less the same and include the submission of a draft guideline to a committee and reiterations in working groups before the final decision for implementation is made.
One example is to develop an ISO standard in the form of, for example, technical reports or guidelines that provide standardized protocols to improve manufacturing and safety in various fields. ISO is an independent, non-governmental organization with members from 165 countries. [50] The standards are developed by the ISO technical committees and subcommittees, and the process is subdivided into six stages: the proposal stage, preparatory stage, committee stage, enquiry stage, approval stage, and publication stage. [51] A proposal, which is a response to a request from industry or other stakeholders such as consumer groups, requires that a 2/3 majority of the national bodies voting approve it and that a minimum of five member bodies who voted in favor of it indicate that they are willing to participate actively in the work. [52] From the first proposal to final publication, the development of a standard usually takes about 3 years.
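The acceptance rule for a new work item proposal stated above can be written down as a small check. The following Python sketch is only an illustration of that rule as described in this text; the `Ballot` record, the function name `proposal_accepted`, and the choice to exclude abstentions from the votes cast are assumptions made for the example and are not taken from the ISO procedures themselves.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Ballot:
    """One national body's response to a proposal (hypothetical record)."""
    body: str                       # illustrative identifier of the national body
    approve: Optional[bool]         # True = approve, False = disapprove, None = abstain
    will_participate: bool = False  # commitment to participate actively in the work


def proposal_accepted(ballots: Iterable[Ballot], min_participants: int = 5) -> bool:
    """Apply the two conditions described in the text to a set of ballots."""
    # Assumption: abstentions do not count as votes cast.
    cast = [b for b in ballots if b.approve is not None]
    if not cast:
        return False
    in_favor = [b for b in cast if b.approve]
    # Condition 1: at least a 2/3 majority of the votes cast (integer arithmetic).
    has_majority = 3 * len(in_favor) >= 2 * len(cast)
    # Condition 2: enough approving bodies are willing to participate actively.
    committed = sum(1 for b in in_favor if b.will_participate)
    return has_majority and committed >= min_participants


if __name__ == "__main__":
    ballots = [Ballot(f"NB{i}", True, will_participate=i < 5) for i in range(8)]
    ballots += [Ballot(f"NB{i}", False) for i in range(8, 12)]
    print(proposal_accepted(ballots))
```

Keeping the two conditions as separate variables makes it easy to see which one a proposal fails, for example a clear majority with too few bodies willing to contribute experts.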
For the quality control of medicinal products to be marketed in Europe, the European Pharmacopoeia (Ph. Eur.) [53] provides a legally recognized list of accepted alternative methods as monographs. According to the EU, under Article 13(1) of Directive 2010/63, [27] Ph. Eur. methods are recognized under EU legislation, and therefore any alternative methods listed in these monographs must be used instead of the animal test they replace. A technical guide is available for authors of monographs on the Ph. Eur. website. [54] New monographs have to be submitted to the national pharmacopoeia authority (NPA), and the final decision on whether the monograph will be added to the Ph. Eur. work program is taken by the Ph. Eur. Commission. The process to develop a new monograph in the pharmacopoeia typically takes 2-3 years. [55] One interesting development is the replacement of animal testing for batch testing of botulinum toxin products, which are used in numerous medical conditions. [56] Several monographs listed in Ph. Eur. require the use of animals to assess efficacy and toxicity; however, some monographs have been updated in recent years to encourage the validation of alternative methods. [57]
The International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) was created in 1990 as a trilateral program between Europe, Japan, and the United States. Its mission lies "in bringing together the regulatory authorities and pharmaceutical industry to discuss scientific and technical aspects of drug registration", [58] which includes the aim to reduce duplication of animal testing and to implement alternative test methods. [59,60] Over the years, additional regulatory authorities and regional harmonization initiatives have joined ICH as observers. The time required for the development and implementation of new harmonized guidelines depends on the guideline and the complexity of the topic and varies from months to years. [61] The International Cooperation on Harmonization of Technical Requirements for Registration of Veterinary Medicinal Products (VICH) is a similar program to harmonize technical requirements for veterinary product registrations, [59] and in 2007 it committed to minimizing the use of test animals and the costs of product development. [62]

Validation Bodies
The basic principle of the validation process is to assess whether the alternative method is fit for its intended use. Validation builds the confidence needed to reach acceptance and, as defined in OECD guidance document 34, is "the process by which the reliability and relevance of a particular approach, method, process or assessment is established for a defined purpose." [63] It is an independent assessment and generally consists of the generation, collection, and evaluation of data to establish scientific evidence that the alternative method is capable of consistently producing data that are reliable (reproducible) and relevant for the intended purpose. [39,64] The role of national and international organizations worldwide, such as OECD related working groups, European Center for the Validation of Alternative Methods (ECVAM), Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM), Japanese Center for the Validation of Alternative Methods (JaCVAM), Health Canada (HC), and Canadian Centre for Alternatives to Animal Methods (CCAAM/CaCVAM), Korean Centre for the Validation of Alternative Methods (KoCVAM), etc., is both to contribute to an effective validation process and to ensure the quality of the validated alternative method. [47] As one example, ECVAM serves as an institution that promotes and validates alternative testing methods. Most notably, it is the main center for external validation of alternative methods used by the OECD. The EU Reference Laboratory for alternatives to animal testing (EURL ECVAM) is an integral part of the Joint Research Centre (JRC), the science and knowledge service of the EC, and is located at the JRC site in Ispra, Italy. [65]
The mandate of EURL ECVAM is specified in the EU legislation on the protection of animals used for scientific purposes and includes a number of duties to advance the 3Rs of animal procedures. EURL ECVAM specializes in the independent evaluation of the relevance and reliability of tests used for assessing medicines, vaccines, medical devices, cosmetics, and household and agricultural products. When validating a new method, it is necessary that multiple laboratories test for transferability and variation. To accomplish this, ECVAM has organized the EU Network of Laboratories for the Validation of Alternative Methods (EU-NETVAL). EU-NETVAL consists of 37 laboratories that have been carefully selected and follow GLP. These laboratories assist ECVAM in the assessment of new alternative testing methods. [66] The test method submitters are involved at all relevant stages and have the possibility to comment on the reviewer feedback prior to publication of the method.
The US has a similar organization, ICCVAM, which has been established as a permanent committee of the National Institute of Environmental Health Sciences (NIEHS) under the National Toxicology Program's Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM). [67] ICCVAM is composed of representatives from regulatory and research agencies and serves as an advisory and peer-review body. [68] Similar to ECVAM, ICCVAM facilitates the development of test methods that replace, reduce, and refine the use of animals in testing and both organizations also coordinate their activities. Collaborations and processes have been put in place to coordinate the international adoption of new test methods recommended by ECVAM and ICCVAM. These collaborations involve the sharing of expertise and data for test-method workshops and independent scientific peer reviews, and the adoption of processes to expedite the consideration of test methods already reviewed by the other organization. [69]

Distributors
Protocols for alternative models are distributed in many ways, for instance via publication of methods in journals or by uploading protocols or Standard Operating Procedures (SOPs) to dedicated websites. For validated and approved test models, guidelines can be made available from, for example, ISO, OECD, or ECVAM. In addition, many in vitro test suppliers provide primary cells or cell lines and test kits (for an example, consult the following website [70]), and improved tissue models, such as reconstructed skin models, are commercially available from a variety of suppliers. [71]

The General Process from Test Method Development to Validation to Regulatory Approval
Validation is the scientific process to determine the relevance and reliability of a method for a specific purpose. [72] While it is important to understand the process behind validation, it is necessary to first understand who is involved in moving the process forward. This section will go in-depth into the individual steps within the regulatory process to make clear its intricacies. When considering the regulatory process for alternatives to animal experimentation, it is helpful to split it up into three main steps as also summarized in

Development of an Alternative Method
Prior to the development of the alternative test, it is important to first consider the relevant needs of researchers or industry developers. A test in high demand has a greater likelihood of gaining eventual OECD approval, especially in a shorter time frame: the requirements are the same, but more resources can be put in. If the test is not relevant, other tests with a more pressing need will be prioritized. It is recommended to discuss the early stage of the method development with national and international experts from the field who have similar research interests. [73] This can be done in the form of a scientific advisory board whose members have to sign a nondisclosure agreement; this will also make it easier to transfer the protocols for inter-lab testing to laboratories with the required infrastructure. In Europe, the EURL ECVAM network of national regulators can be consulted for the preliminary assessment of regulatory relevance. Consultants with expertise in 3Rs validation and independent (bio)statisticians can also help researchers to understand the requirements of the validation process and to plan (pre)validation exercises.
There are multiple design requirements to be considered during the primary development stage of an alternative testing method. A set of criteria to define the readiness of test methods for hazard evaluation can support assay development in a regulatory context, as summarized at a stakeholder workshop with scientists from academia, industry, and regulatory authorities. [2] Another important document to support test developers is the GIVIMP guidance document. [47] There are three broad parameters by which an alternative method should be tested: relevance, accuracy, and reproducibility. According to the OECD glossary from the GIVIMP guidance document, the term "relevance" describes whether a procedure is meaningful and useful for a particular purpose, and "accuracy" refers to the closeness of a measured value to a standard or known value. "Reproducibility" means that the same test yields the same data when conducted in separate experiments within the same as well as in different laboratories. These are key aspects of an alternative test with the goal of validation and approval as an OECD TG. Importantly, if the test cannot be used by other labs to test the same endpoint, the alternative testing method will likely not pass external validation. In addition, intellectual property (IP) protection on certain elements of a test would be a major issue, since all OECD tests should be readily accessible; however, IP-protected methods might still be accepted by regulators. One example is the in vitro cell-based assay from Allergan, Inc., approved by the U.S. Food and Drug Administration (FDA) for use in the stability and potency testing of Botox (onabotulinum-toxinA) and Botox cosmetics. [74] To ensure that a method can be transferred to other laboratories, internal verification (or conducting a prevalidation process of the alternative method) at an early step of the test development is vital.
At this point, the test method is also assessed for "repeatability" to assess the closeness of results between a series of measurements, "selectivity," which is defined as the ability to identify an analyte in the presence of other substances, that is, interference, "sensitivity" to assess the capacity of the method to discriminate small differences, and "stability" to verify the stability of test items under storage and in test conditions. [47] It is imperative that the method under development is robust and yields the same results for specified compounds and endpoints when conducted independently within the same laboratory (within-laboratory, i.e., intralaboratory assessment) and in different laboratories (around three or more independent laboratories, i.e., interlaboratory assessment). It is equally important to compare the alternative methods to endpoints generated by their animal testing counterparts, that is, to test a chemical with known toxicity with the alternative method to verify its predictivity and accuracy. [75] If such an approach is used, it is important to take into account the limitations of the animal test, and the aim should be to improve predictivity and human relevance. Therefore, it is also recommended to use data from human chemical exposure studies to avoid the uncertainties inherent in interspecies extrapolation. [76] However, there is no set process for a researcher to use during this optimization phase, and it is possible to customize this process on an individual basis. The recommended process, referred to as pre-validation, starts after method development and checks the optimized alternative method for accuracy and reliability against the same seven modules that the validation management group uses in its own evaluation (described in Section 3.2). [72]
Finally, the test method developer is responsible for providing a clearly written and well-documented description of the alternative method and related SOP(s), taking into consideration all aspects described in GIVIMP.
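During pre-validation, the comparison with reference classifications and the agreement between laboratories described above can be summarized with a few simple figures. The Python sketch below is a minimal illustration under stated assumptions: each chemical receives a binary call (for example, irritant or non-irritant), the reference classification is taken as given, and the function names and data layout are hypothetical. The names `true_positive_rate` and `true_negative_rate` are used to avoid confusion with the analytical "sensitivity" defined in the text. It does not reproduce the statistical procedures of any validation body.

```python
from typing import Dict, Optional, Sequence


def predictive_performance(
    predicted: Sequence[bool], reference: Sequence[bool]
) -> Dict[str, Optional[float]]:
    """Compare binary calls (True = positive) with reference classifications."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("predicted and reference must be non-empty and of equal length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # true positives
    tn = sum(1 for p, r in pairs if not p and not r)  # true negatives
    fp = sum(1 for p, r in pairs if p and not r)      # false positives
    fn = sum(1 for p, r in pairs if not p and r)      # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Rates are undefined (None) if the reference set lacks that class.
        "true_positive_rate": tp / (tp + fn) if tp + fn else None,
        "true_negative_rate": tn / (tn + fp) if tn + fp else None,
    }


def between_lab_concordance(calls_by_lab: Dict[str, Sequence[bool]]) -> float:
    """Fraction of chemicals for which all laboratories made the same call."""
    labs = list(calls_by_lab.values())
    if not labs or not labs[0]:
        raise ValueError("at least one laboratory with at least one call is required")
    if any(len(calls) != len(labs[0]) for calls in labs):
        raise ValueError("all laboratories must test the same set of chemicals")
    agreeing = sum(1 for calls in zip(*labs) if len(set(calls)) == 1)
    return agreeing / len(labs[0])


if __name__ == "__main__":
    # Hypothetical data: six reference chemicals tested in three laboratories.
    reference = [True, True, True, False, False, False]
    calls = {
        "lab_a": [True, True, False, False, False, True],
        "lab_b": [True, True, False, False, False, False],
        "lab_c": [True, True, True, False, False, True],
    }
    for lab, predicted in calls.items():
        print(lab, predictive_performance(predicted, reference))
    print("concordance:", between_lab_concordance(calls))
```

The same concordance function can be applied to repeated runs within one laboratory, which gives a corresponding within-laboratory figure.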

Validation of the Method
Only once an SOP has been successfully completed, based on the criteria described above, should the alternative test be submitted to organizations such as EURL ECVAM or ICCVAM. Validation within EU-NETVAL follows the procedures and testing practices in accordance with OECD TGs using principles such as GLP, which help to ensure that the procedures are harmonized. This ensures that the data generated in any OECD member country using these principles will be accepted under the MAD "for assessment purposes and other uses relating to the protection of human health and the environment," thereby reducing duplicate testing. [77] Following the criteria of the MAD system is critical for an alternative method to be considered as a validated testing method and eventually as an OECD TG.
A validation study begins with a sponsor, that is, an entity that commissions, supports, and/or submits a non-clinical health or environmental safety study and normally also finances it. [78] In practice, this means that the sponsor drives the study and assigns a validation manager/management team to design and carry out the validation. This management team can then delegate various responsibilities to task groups. The management team also oversees the lead laboratory for the study. The lead laboratory can be a highly qualified test facility, for instance from the EU-NETVAL network, but any adequately equipped laboratory with trained personnel can also be assigned. The lead laboratory is in charge of data collection and of instructing the other participating laboratories on the SOP [63,79] (Figure 3).
There are several potential candidates to be a sponsor of a validation study such as international bodies, government entities, or validation organizations for alternative methods (ECVAM, ICCVAM), national organizations, other independent organizations such as PETA International Science Consortium Ltd. (PISC), or commercial sponsors. If the validation study is organized by the OECD, the sponsor may be an OECD expert group, task force, working group/party whose members are nominated by the governments of the respective countries. [63] These bodies oversee the validation study, which is managed by a validation manager/management team.
In an attempt to make the validation process more efficient, ECVAM has developed a modular approach to alternative test validation. [80] This approach breaks down the main components of validation into seven modules. The modules are not sequential and can be performed independently; however, the test definition is usually the first module, as it impacts all other modules. The purpose of these modules is to define what data are needed for independent validation and peer review (Figure 4).
By having this defined list of requirements, researchers can develop tests while keeping the data needed for validation in mind, thus streamlining the process later on. Once test developers have finished development and have taken into account all of the modules, the independent validation process can begin. Briefly, the seven modules are as follows (for a more detailed description the reader is referred to refs. [63,80]):

1. Test Definition: The definition states the underlying reasons (purpose) for the development of the test and defines any information needed to conduct it properly. The specific scientific purpose of the test must be stated here. Additionally, the test definition must include a clearly defined protocol that complies with GLP and GCCP to promote and improve standardization, quality assurance, and reporting of the work. Such guidance is needed to assure standard practices and conditions for the comparison of data between laboratories and between experiments performed at different times. [81][82][83] The definition should include any necessary SOPs to allow for replication and, finally, must define specific endpoints and measurements; specify how to derive, express, and interpret results; and include a set of adequate controls. For instance, reference chemicals can be used to evaluate the performance of a specific assay. [84] The OECD published guidance on method documentation for the purpose of safety assessment, the Guidance Document 211 (GD211), [85] to guide test developers. As this mainly targets regulators, a detailed toxicity test method template (ToxTemp) was proposed by Krebs and colleagues [86] to guide researchers on the details required and on how individual questions should be answered.
2. Intra-Laboratory Repeatability and Reproducibility: The test must be proven to be repeatable, that is, to give consistent results in repeated measurements under identical conditions, and reproducible within the same laboratory, which is usually assessed with different operators over time within the same laboratory setup at the development laboratory. Any variation in results within the same lab must be addressed.
3. Inter-Laboratory Transferability: The transferability of a test plays a crucial role in its robustness. The goal is to show that the test can successfully be repeated in a laboratory other than the one that developed the test or assisted in its optimization; this step is also described as pre-validation. [42] The data gathered here provide an estimate of the training required for a naive laboratory (a lab without prior experience with the specific type of test) to reproduce the test and identify additional sources of intra- and inter-laboratory variation.
4. Inter-Laboratory Reproducibility: During this process, external laboratories carry out the test method and test it against a large assortment of substances. This is usually organized by ECVAM and involves three or four well-trained laboratories. The assessment of inter-laboratory variation can also be done with a more limited selection of test substances that still covers all toxic effects that the test can demonstrate. This allows the predictive capacity to be tested against a large number of substances in a single lab to save time and money.
5. Predictive Capacity: The predictive capacity of a test is a demonstration of how well the test can predict the reference standard and can generally be referred to as the accuracy of the test. The reference standard being tested against can be defined from an existing in vivo study or a human effect where the reference has induced a specific outcome. The predictive capacity is influenced by the quality of the reference standard and the range and number of substances tested. The current ECVAM process states that predictive capacity must be tested in at least three laboratories; however, if interlaboratory variability has already been determined to be at an acceptable level, testing of predictive capacity may be done at just one laboratory.
6. Applicability Domain: The applicability domain of a test must clearly describe the particular purposes for which the test can be applied. This includes any toxicological endpoints, chemical classes, test materials, and physicochemical properties or products that might be assessed with the test. Any changes to the applicability domain may require additional peer review.
7. Performance Standards: At the end of the validation process, the performance standards of the test are evaluated. These are a compilation of essential test method components, reference substances, and accuracy and reliability values that can be used to demonstrate equivalence in performance between future tests and the current test. This can allow for faster approval of similar tests or changes to the current test in the future.
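The predictive capacity and inter-laboratory reproducibility modules described above can be illustrated with a short calculation. The following Python sketch compares binary in vitro calls with a reference classification and reports sensitivity, specificity, and accuracy, and it also computes the fraction of substances on which all participating laboratories agree. The function names, the data layout, and the substances are hypothetical, and the use of these particular statistics is an assumption made for illustration rather than a requirement stated in the modular approach.

```python
def predictive_capacity(predicted, reference):
    """Compare binary in vitro calls with a reference classification.

    Both arguments map a substance name to True (positive, e.g., irritant)
    or False (negative). Returns sensitivity, specificity, and accuracy;
    a value is None when its denominator is zero.
    """
    tp = fp = tn = fn = 0
    for substance, is_positive in reference.items():
        call = predicted[substance]
        if is_positive and call:
            tp += 1
        elif is_positive and not call:
            fn += 1
        elif not is_positive and call:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "accuracy": (tp + tn) / total if total else None,
    }


def between_lab_concordance(calls_by_lab):
    """Fraction of substances for which all laboratories made the same call."""
    labs = list(calls_by_lab.values())
    substances = list(labs[0].keys())
    agreeing = sum(1 for s in substances if len({lab[s] for lab in labs}) == 1)
    return agreeing / len(substances)


# Hypothetical reference classification and calls from three laboratories.
reference = {"A": True, "B": True, "C": False, "D": False, "E": True}
calls = {
    "lab_1": {"A": True, "B": False, "C": False, "D": True, "E": True},
    "lab_2": {"A": True, "B": True, "C": False, "D": True, "E": True},
    "lab_3": {"A": True, "B": False, "C": False, "D": False, "E": True},
}
stats_lab_1 = predictive_capacity(calls["lab_1"], reference)
concordance = between_lab_concordance(calls)
print(stats_lab_1)
print(f"between-lab concordance = {concordance:.2f}")
```

In a real validation study the reference classifications would come from the agreed reference standard, and the number and range of substances would strongly influence how meaningful these values are.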
Once the validation study has been completed, and the validation manager/management team is satisfied that all necessary data has been collected and evaluated, a test method may move on to the independent peer-review stage. There are two options to conduct this step. First, the validation study sponsor may choose to organize a panel of independent peer reviewers to assess the results of the validation study. This may be the preferred route for TGs that are being updated and hence the method is replacing an existing one. Second, if the test method is being proposed as a new TG, the validation study sponsor can still decide to organize peer review before submitting the test to the OECD, or it can submit the test without peer review and the OECD will organize the review panel. If peer review is completed prior to the test method being submitted for approval as a TG, a complete report of the independent panel must be provided, including detailed reasoning behind any conclusions and recommendations, as well as any comments made on the test method. Otherwise, the OECD will discuss and appoint a responsible party to organize any necessary peer review.
When selecting a panel of experts for peer review, there are certain criteria to follow to ensure accurate and unbiased review. First, the experts must demonstrate expertise in one or more of the scientific fields relevant to the TG. Second, it is important that some of the peer reviewers have experience in the development, conduct, and evaluation of validation studies for toxicology tests. In addition, one or more reviewers of the panel should also have an understanding of animal welfare issues and the 3R principles. Finally, it is crucial that all peer reviewers are independent and not subject to any conflicts of interest. Once put together, it is up to the peer review panel to review all information from the validation process and come to a conclusion as to whether or not the TG has thoroughly satisfied its intended purpose and goals. The review process should be open and transparent and may allow for public comment. At the end of this review process, the panel should provide a report containing an assessment of the usefulness and limitations of the TG.
After peer review has been completed, it is up to the validation study sponsor to determine, based on the panel's review, whether the TG can be accepted as validated for its intended purpose and whether it can be recommended for the next level, that is, regulatory acceptance.

Regulatory Acceptance
Once all seven modules in the validation process have been satisfactorily completed as judged by the validation manager/management team, and the TG has passed peer review, the method can be considered officially validated. The last step in the process is regulatory acceptance. During this phase, the alternative method and the data generated by third-party testing are submitted, for example, to the OECD or any other regulatory body for review, to be potentially instated as an approved test method. This step is dealt with on a case-by-case basis. Thus, the potential need for an alternative test, along with its robustness and transferability, plays a role in the speed of the process. For the OECD TGs, consistent communication with the NC of the TG program is also important in expediting this process, as he or she works directly with the OECD and third-party institutions in determining these factors. It is important to add that the regulatory acceptance procedures can vary among but also within countries, and adherence to the OECD guidance document 34 on the Validation and international acceptance of new or updated test methods for hazard assessment (OECD Series on Testing and Assessment, Number 34, 2005 [87]) is recommended to increase the level of acceptance. After submission to the OECD, all NCs from the member states must vote unanimously. That is to say, if even one country representative does not agree on the acceptance of the alternative method, the document will be sent back to the working group for revision or it will not become an OECD TG. [63] This final process can be lengthy and can take up to, or sometimes more than, a year, since all representatives from the member countries can comment on the TG and also ask for changes. It is also possible that the science for a specific test has moved on during this time and that more adaptations are needed.
Once the decision is made, the new or revised TG will join a growing body of internationally accepted testing methods as a standard for assessing the potential health effects of chemicals and/or drugs.

Paths to the End-User and Implementation of TGs
The end-user is the entity, for example, industry or small and medium-sized enterprises, that ultimately has to use the validated and regulatory accepted method for the approval of a new product. It is also possible that the end-user has been involved in the design of the new or adapted TG. There are two main paths that can be followed when taking an alternative TG from the development phase to the end-user.
The first path, and the one described in the previous sections, is the regulatory approval procedure through organizations such as the OECD. By going through an international organization with many member countries, the potential number of end-users increases significantly. When an OECD TG is internationally accepted as a standard method, it is used by regulatory authorities and testing laboratories for registration, evaluation, and approval of known and newly developed chemicals.
In principle, there is a second potential path to end-users. Test developers do not have to submit their alternative methods to international regulatory agencies and can go straight to distributors after validation. As described earlier, protocols for alternative models can be distributed in many ways, for example, by publishing methods in journals or by uploading protocols or SOPs to dedicated websites. The interest of test developers is mainly driven by the opportunity to create IP and thus generate possible revenues; alternatively, the methods can be shared with others for future research. The drawback of this procedure is a limited number of end-users, as every end-user has to determine whether the test is acceptable and valid for their purposes or not; each country has its own regulatory agencies and practices, and it is not uncommon to see large discrepancies in their scientific regulatory policies. Thus, the end-users must decide if they want to use the test without the backing of an internationally accepted organization like the OECD.

Alternative Methods Accepted for Regulatory Toxicology-Some Examples
The number of pre-validated and validated methods to partially or fully replace animal testing, for example, for genotoxicity or safety testing of cosmetic products, has increased during the last years. An overview of validated and accepted methods is provided by Kandárová and Letašiová [42] or can be accessed from the nonanimal methods for toxicity testing AltTox website. [88] In addition, a collection of about 150 of the relevant internationally agreed testing methods is listed on the OECD website. [89] The EURL ECVAM tracking system for alternative methods toward regulatory acceptance (TSAR) also lists alternative test methods that have been submitted to EURL ECVAM. [90] The aim of this tracking system is to follow, in a transparent manner, proposals of alternative methods from validation through to their final adoption by inclusion into the regulatory framework (e.g., EU legislation, OECD guidelines or guidance documents, European Pharmacopoeia, ICH, or ISO). This platform also helps test method developers to understand why and at which stage methods failed.
Most of the approved documents relevant to alternative methods were developed to support the revised skin and eye irritation TGs. [6] The development of three-dimensional skin models for cytotoxicity testing of chemicals and cosmetics dates back to the 1980s, and the first co-cultures of skin tissue composed of human epidermal keratinocytes and dermal fibroblasts were described in the literature in the 1990s. [16,91] The background and management of the validation study on in vitro tests for acute skin irritation sponsored by ECVAM is described by Spielmann et al. [92] and shows in great detail the structure of such a process and the stakeholders involved. In 2015, OECD guideline 439 was published and has since provided an in vitro procedure allowing the identification of skin irritants. [8,9] This alternative approach is well characterized, validated, and accepted as an effective replacement method for animal experimentation. Within this guideline, the use of a reconstructed human epidermis, which is composed of several epithelial cell layers that closely mimic the histological, morphological, biochemical, and physiological properties of the upper parts of the human skin, is recommended in combination with validated test methods to assess cell viability. In addition to the OECD guideline, current legislative dossiers such as REACH, the Cosmetics Directive, and the classification, labeling, and packaging (CLP) directive continue to stimulate the implementation of skin models in general. Under Regulation 1907/2006 (REACH), only this alternative testing shall be conducted for skin corrosion and irritation as standard information for substances. The commercially available EpiDerm skin model [93] and EPISKIN [94] are two examples of validated 3D models for this guideline, and a more comprehensive overview of the available models is given by De Wever et al. [95]
In addition to these validated and regulatory approved alternative test methods, the European Commission's JRC has established the database on alternative methods (DB-ALM). [96] This is a public, factual database service that provides evaluated information on the development and applications of advanced and alternative methods to animal experimentation in biomedical sciences and toxicology, both in research and for regulatory purposes. The information is presented as evaluated datasheets in the form of summary records and/or detailed information. In December 2014, the OECD adopted guidance document no. 211 for "Describing non-guideline in vitro test methods" to facilitate their consideration in regulatory applications. Therein, the DB-ALM is indicated as one of the publicly available repositories/libraries of methods that may be helpful in the dissemination of comprehensive method descriptions.

Challenges to Develop Alternative In Vitro Test Methods
As already stated, there is an ascertainable trend toward the development of more predictive and reliable alternative testing systems to reduce time and cost investments and to address ethical concerns. From a researcher's perspective, understanding the processes that finally result in an approved and accepted TG is helpful for test development. It is also important to understand that some of the accepted TGs only reflect one specific endpoint, and therefore a battery of different tests might be required for one specific chemical, toxicant, or drug.
In our opinion, one of the biggest challenges is that many researchers do not work with shared, standardized, and detailed protocols; thus, the comparison of data between laboratories and even within one laboratory is challenging. Addressing reproducibility and reliability in cell culture research requires a detailed description of the biological test systems, as highlighted in a recent review. [97] To allow results to be compared among laboratories, SOPs have emerged as sets of specifically written instructions that document a routine approach to follow; they are usually much more detailed than the methods in a published paper [98] and are already used as deliverables in many EU research projects. Researchers have to understand the importance of implementing SOPs for tests in their laboratory, not only to achieve interlaboratory harmonization and pave the way for validation and regulatory acceptance but also to guarantee that data can be compared. Another challenge is that emerging technologies requiring very specific equipment, or complex and difficult-to-reproduce cell models that might not be easily transferable to third laboratories, can make the validation and regulatory process difficult. Also, the concept of IATA, where different alternative methods are combined to predict one endpoint, is currently a challenge for the OECD as this is not governed by a single test guideline. [73] The OECD has published a guidance document including templates for harmonized and structured documentation that can facilitate the evaluation of IATA in regulatory decision-making within OECD member countries. [99] One example that can be used as an inspiration is the endpoint for human skin sensitization, where a mechanistically informed IATA is based on an AOP describing the linkages between the chemical interaction with a biological system at the molecular level and the subsequent biological effects at the subcellular, cellular, tissue, organ, and whole animal or human population levels. [100]
Another challenge is the use of approved TGs for new substance classes such as nanomaterials (i.e., materials with any external dimension or internal/structural dimension in the nanoscale). Nanomaterials fall under the definition of a "substance" in REACH. However, due to the specific properties and behavior of nanomaterials in a biological environment, for example, in in vitro test methods, there is a need to adapt some of the existing OECD TGs. In addition, new TGs for nano-specific properties are currently being developed. [101] In response to this need, the OECD Working Party on Manufactured Nanomaterials (WPMN) was established in 2006 to oversee testing and assessment programs to investigate the applicability of OECD TGs. Other initiatives, such as the EU projects NANoREG [102] and ProSafe, [103] have identified a number of existing TGs that need to be adapted in order to be applicable to nanomaterials. Much more work will be required to identify modifications in existing TGs or replacements to address relevant regulatory endpoints for nanomaterials.
Finally, the costs for the development of test methods and the validation process have to be taken into account. These definitely depend on the complexity of the method and endpoints. Usually, researchers apply for research grants from national or international foundations. A survey of EU member states reported on funding for alternative (3Rs) methods in the range of 18.7 million in 2013. [104] Based on our own experience, it can take several years and from several hundred thousand euros up to € 1-2 million to arrive at a first version of the model that is ready to enter the pre-validation phase. The main costs are then associated with the validation process, which can again take several years and depends on the complexity of the assay and the number of laboratories involved. The validation process for skin irritation assays, as outlined by Spielmann and colleagues, [92] cost roughly € 800 000. [79]

Conclusions
There is a need for hazard tests within the field of regulatory toxicology and biomedical research in the context of the 3Rs concept. Our motivation to work on the development of alternative models is: i) to get a mechanistic understanding of toxicant-cell interactions, which is not possible in vivo; ii) to explore the boundaries of tissue engineering to design human-relevant models; and iii) our responsibility for animals. Based on this, it is our vision to move from test models "which are nice" to test models "that are regulatory relevant," allowing an acceleration of the safe development of chemicals and drugs. Another incentive could be the generation of IP and the development of a commercial product, as shown by companies that sell regulatory approved cell or tissue models. Finally, the regulatory authorities can also increase the acceptance of non-animal alternatives by developing performance-based standards and by promoting trials combining traditional and new methodology data to build up experience in the new methods. [105] Recently, the EU, as well as other national and international committees, has also started to actively promote the replacement, reduction, and refinement of animal testing. Many researchers in academia and in industry work in the field of alternative test method development. However, for these tests to be accepted by a broad community, they must first be standardized, validated, and approved. Our aim was to get an understanding of the (pre-)validation and approval process and to outline the relevant steps required for the development of an SOP and then an approved guideline/test method from a researcher's, that is, a test developer's, perspective, which is also summarized in Figure 5.
The regulatory approval process for a new alternative test method to be instated as, for example, an OECD TG was found to take a significant amount of time, from several years up to a decade. Two factors were identified as contributing to this significant time span. First, the complexity of the alternative field: the scientific knowledge pushing alternatives forward is on the edge of current knowledge and only grows in intricacy with each discovery/invention. Thus, the risk mitigation behind these tests requires close attention to detail and thoroughness in the validation process. Second, the communication between the different stakeholders, for example, industry, researchers, and regulatory bodies, is fundamental and has to be promoted more actively. This includes the development of strategic roadmaps to guide the application of new approaches, as done, for instance, by ICCVAM, which in 2018 published "A strategic roadmap for establishing new approaches to evaluate the safety of chemicals and medical products in the United States." [106] Only by understanding the different processes and the interaction of the various stakeholders in the standardization, validation, and approval procedure for an alternative test method can researchers plan the required steps optimally.