Adding experimental treatment arms to Multi-Arm Multi-Stage platform trials in progress

Multi-Arm Multi-Stage (MAMS) platform trials are an efficient tool for the comparison of several treatments. Suppose we wish to add a treatment to a trial already in progress, to access the benefits of a MAMS design. How should this be done? The MAMS framework requires pre-planned options for how the trial proceeds at each stage in order to control the family-wise error rate. Thus, it is difficult to make both planned and unplanned design modifications. The conditional error approach is a tool that allows unplanned design modifications while maintaining the overall error rate. In this work, we use the conditional error approach to allow adding new arms to a MAMS trial in progress. We demonstrate the principles of incorporating additional hypotheses into the testing structure. Using this framework, we show how to update the testing procedure for a MAMS trial in progress to incorporate additional treatment arms. Simulations illustrate the possible operating characteristics of such procedures using a fixed rule for how and when the design modification is made.


Introduction
During Phase II of the drug development process it is common to have several competing treatments; these may be different doses of the same drug or entirely different treatment regimes. Jaki and Hampson (2016) note that, given the high failure rate and cost of Phase III trials, it is key that careful consideration be given to which treatments should be carried forward for further study. Multi-arm multi-stage (MAMS) trials (Royston and others, 2003; Jaki and Magirr, 2013; Wason and Jaki, 2012) compare several experimental treatments with a common control, allowing for the efficient selection of appropriate treatments (Jaki, 2015).
MAMS trials reduce the expected number of patients by dropping treatments that are demonstrated to be ineffective or lacking promise, or by stopping the trial altogether if efficacy has been demonstrated. Given the multiple hypotheses and highly adaptive nature of the design, MAMS studies require specialist testing methodology in order to control the error rate of the trial (Stallard and Todd, 2003). Magirr and others (2012) introduced the generalised Dunnett family of tests, where group sequential testing boundaries are defined to account for the multiple analyses, while accounting for the correlation introduced by the comparison of several experimental arms to a common control (Dunnett, 1955); Urach and Posch (2016) extend this by directly defining all elements of the testing procedure. Alternatively, fully flexible testing methods have been proposed (for example, Bretz and others, 2006; Schmidli and others, 2006; Posch and others, 2005; Koenig and others, 2008; Bauer and Kieser, 1999), allowing decisions about which arms should remain in the study to function separately from the hypothesis testing. Both methods require the pre-definition of all study hypotheses, so that the overall testing procedure may be constructed to give strong control of the Family-Wise Error Rate (FWER) (Dmitrienko and others, 2009).
It is possible that not all experimental treatments are available at the start of the trial, as seen, for example, in the STAMPEDE trial (Sydes and others, 2009). STAMPEDE started with five comparisons and subsequently added several more to the protocol. Including further experimental treatments in the trial in progress maintains the benefits of a MAMS design: reducing logistical and administrative effort, speeding up the overall development process (Parmar and others, 2008), efficiency in the multiple comparisons, and allowing direct comparisons of the treatments within the same trial.
Treatments may be added to the trial in progress by adjusting the pre-planned testing structure, provided no use has been made of the data observed in the trial (thus requiring that no interim analysis has been conducted). Bennett and Mander (2020) demonstrate how to suitably adjust the sample size for each treatment arm for such additions. However, treatments may become available after some interim analysis; our methods allow modification at any stage of the trial, with the only restriction being that no conclusion of statistical significance has been made.
The conditional error approach (Proschan and Hunsberger, 1995) allows for design modifications during the course of a trial, where these modifications have not been pre-planned. It has been shown that these modifications may be accounted for in the setting of treatment selection (Koenig and others, 2008; Magirr and others, 2014); however, adding hypotheses to a testing framework requires further restrictions on any introduced hypotheses (Hommel, 2001). We propose a general framework using these principles for the inclusion of additional hypotheses in a testing procedure that allows the inclusion of existing trial information. We show how to apply this in the setting of MAMS designs, demonstrating how to construct an appropriate hypothesis testing structure for the updated trial such that the FWER is strongly controlled.

A two arm trial
Suppose we plan a two arm trial with a continuous outcome to compare a new treatment, $T_1$, and a control, $T_0$. Let $\mu_1$ and $\mu_0$ be the expected responses for patients on treatments $T_1$ and $T_0$ respectively, and define the treatment effect as $\theta_1 = \mu_1 - \mu_0$. We investigate the one-sided null hypothesis $H_{01}: \theta_1 \le 0$.
The trial will recruit a total of $n$ patients randomised equally between treatment and control.
Let $X_{i,k} \sim N(\mu_k, \sigma^2)$ for $i = 1, ..., n/2$ and $k = 0, 1$; then $\hat\theta_1$, the difference between the sample means of the two arms, is the estimate of the treatment effect. For $\xi_1 = \theta_1 \sqrt{n}/(2\sigma)$ this has corresponding Z-value
$$Z_1 = \frac{\hat\theta_1 \sqrt{n}}{2\sigma} \sim N(\xi_1, 1).$$
We reject $H_{01}$ at level $\alpha$ when
$$Z_1 > \Phi^{-1}(1 - \alpha),$$
where $\Phi$ is the standard normal cdf.
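The two arm test can be sketched in a few lines. This is an illustrative sketch only, assuming the Z-value takes the form $\hat\theta_1\sqrt{n}/(2\sigma)$ implied by $n/2$ patients per arm with known $\sigma$; the function names are hypothetical and not taken from the authors' software.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides cdf and inv_cdf


def z_value(theta_hat: float, n: int, sigma: float) -> float:
    """Z-value for the estimated treatment effect with n/2 patients per arm."""
    return theta_hat * n ** 0.5 / (2.0 * sigma)


def reject_h01(z1: float, alpha: float = 0.05) -> bool:
    """One-sided test: reject H01 when Z1 exceeds the (1 - alpha) normal quantile."""
    return z1 > STD_NORMAL.inv_cdf(1.0 - alpha)


def power(xi: float, alpha: float = 0.05) -> float:
    """Probability that Z1 ~ N(xi, 1) exceeds the critical value."""
    return 1.0 - STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1.0 - alpha) - xi)


# Non-centrality used later in the simulation study
delta = STD_NORMAL.inv_cdf(0.95) + STD_NORMAL.inv_cdf(0.9)
print(round(power(delta), 3))
```

With $\xi_1 = \delta = \Phi^{-1}(0.95) + \Phi^{-1}(0.9)$ the critical value cancels and the power reduces to $\Phi(\Phi^{-1}(0.9))$, which is the power of 0.9 quoted for the original trial.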

Adding a treatment
Suppose for $\tau \in (0, 1)$, after $\tau n$ observations, a new treatment, $T_2$, becomes available. Let $\mu_2$ be the expected response for patients receiving this new treatment and define the corresponding treatment effect by $\theta_2 = \mu_2 - \mu_0$, with corresponding null hypothesis $H_{02}: \theta_2 \le 0$.
Suppose we maintain the pre-planned elements of the trial concerning treatments $T_1$ and $T_0$, such as the same sample size per treatment. Notationally it is convenient to define stage 1 and stage 2 as consisting of the patients recruited before and after the treatment is added. From the stage 1 data we find
$$Z_1^{(1)} = \frac{\hat\theta_1^{(1)} \sqrt{\tau n}}{2\sigma}$$
and from the stage 2 data we find
$$Z_1^{(2)} = \frac{\hat\theta_1^{(2)} \sqrt{(1-\tau) n}}{2\sigma},$$
where $\hat\theta_1^{(1)}$ and $\hat\theta_1^{(2)}$ are the estimates of $\theta_1$ based on each stage alone. The overall Z-value may be reconstructed from the stagewise Z-values,
$$Z_1 = \sqrt{\tau}\, Z_1^{(1)} + \sqrt{1-\tau}\, Z_1^{(2)}.$$
We recruit a further $(1-\tau)n/2$ patients to $T_2$ in stage 2, maintaining equal randomisation to all treatments. Since $T_2$ is added to the trial for stage 2, $\hat\theta_2$ is based only on the data available from the second stage of the trial, from which, for $\xi_2 = \theta_2\sqrt{(1-\tau)n}/(2\sigma)$, we construct the Z-value
$$Z_2 = \frac{\hat\theta_2 \sqrt{(1-\tau) n}}{2\sigma} \sim N(\xi_2, 1).$$
Due to the common control and equal randomisation, $Z_1^{(2)}$ and $Z_2$ have correlation 1/2.
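The claim that the overall Z-value can be recovered from the stagewise Z-values can be checked numerically. The sketch below is illustrative only: it assumes known $\sigma$, equal allocation within each stage, and the weights $\sqrt{\tau}$ and $\sqrt{1-\tau}$; the simulated effect size of 0.3 and all names are arbitrary choices for the illustration.

```python
import random
from statistics import fmean


def z_from_data(treat, control, sigma):
    """Z-value for a difference in means with equal arm sizes and known sigma."""
    m = len(treat)  # patients per arm, so 2m patients in total
    return (fmean(treat) - fmean(control)) * (2 * m) ** 0.5 / (2.0 * sigma)


rng = random.Random(1)
sigma, n, tau = 1.0, 200, 0.25
m1 = int(tau * n / 2)   # stage 1 patients per arm
m2 = n // 2 - m1        # stage 2 patients per arm
treat = [rng.gauss(0.3, sigma) for _ in range(m1 + m2)]
control = [rng.gauss(0.0, sigma) for _ in range(m1 + m2)]

z_stage1 = z_from_data(treat[:m1], control[:m1], sigma)
z_stage2 = z_from_data(treat[m1:], control[m1:], sigma)
z_pooled = z_from_data(treat, control, sigma)

# Weighted combination of the stagewise Z-values
z_combined = tau ** 0.5 * z_stage1 + (1 - tau) ** 0.5 * z_stage2
print(abs(z_pooled - z_combined) < 1e-9)
```

The identity holds for any data set, not only on average, because the pooled mean difference is the sample-size-weighted average of the stagewise mean differences.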

Hypothesis testing
For the two arm trial we constructed our hypothesis test in order to control the type I error rate at some pre-defined level $\alpha$. A natural extension in the case of multiple hypotheses is the Family-Wise Error Rate (FWER): for the event $R$ that we reject one or more true null hypotheses, the FWER is defined as $P_\theta(R)$.
Suppose that, when adding $T_2$, we test each null hypothesis at a nominal level $\alpha = 0.05$. Figure 1 shows the impact on the FWER as we vary $\tau$. Under the extremes of $\tau = 0$ and $\tau = 1$ the trial is not altered and should be designed accordingly to achieve a FWER of $\alpha$. For all values in between we see that the FWER is inflated when compared to the nominal $\alpha$. As is typical in a confirmatory setting (Dmitrienko and others, 2009) we require strong control of the FWER, that is
$$P_\theta(R) \le \alpha \quad \text{for all } \theta = (\theta_1, \theta_2). \qquad (2.1)$$
Sugitani and others (2018) propose methods that account for the introduction of the additional hypothesis, testing any introduced hypothesis at level $\alpha$ based strictly on the data collected after its introduction (Hommel, 2001). We build on this approach, incorporating existing information where possible.
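The inflation can be illustrated with a small Monte Carlo sketch. It is not the authors' simulation code: it assumes the global null, unit-variance standardised arm means in stage 2 (which gives the correlation of 1/2 through the shared control), and unadjusted level-$\alpha$ tests of both hypotheses; the function name and seed are arbitrary.

```python
import random
from statistics import NormalDist


def naive_fwer(tau, alpha=0.05, n_sims=50_000, seed=2024):
    """Monte Carlo FWER under the global null with both tests at nominal level alpha."""
    crit = NormalDist().inv_cdf(1.0 - alpha)
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_sims):
        z1_stage1 = rng.gauss(0.0, 1.0)
        # Stage 2 standardised arm means; the shared control y0 induces correlation 1/2
        y0, y1, y2 = (rng.gauss(0.0, 1.0) for _ in range(3))
        z1_stage2 = (y1 - y0) / 2 ** 0.5
        z2 = (y2 - y0) / 2 ** 0.5
        z1 = tau ** 0.5 * z1_stage1 + (1 - tau) ** 0.5 * z1_stage2
        if z1 > crit or z2 > crit:
            errors += 1
    return errors / n_sims


for tau in (0.25, 0.5, 0.75):
    print(tau, naive_fwer(tau))
```

Each estimate should sit clearly above 0.05, since two tests are each run at the full nominal level with only partial positive correlation between them.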
We construct an overall closed testing procedure (Marcus and others, 1976) that accounts for the adaptive nature of the trial within each test (Koenig and others, 2008).
No changes have been made to the recruitment or analysis concerning $H_{01}$, so as before we reject $H_{01}$ when $Z_1 > \Phi^{-1}(1-\alpha)$ at the end of the trial. It is useful to discuss constructing this test using the conditional error principle (Proschan and Hunsberger, 1995). Given $z_1^{(1)}$ we define the conditional error rate
$$A(z_1^{(1)}) = P_{\theta_1 = 0}\left(Z_1 > \Phi^{-1}(1-\alpha) \mid Z_1^{(1)} = z_1^{(1)}\right) = 1 - \Phi\left(\frac{\Phi^{-1}(1-\alpha) - \sqrt{\tau}\, z_1^{(1)}}{\sqrt{1-\tau}}\right).$$
The probability of rejecting the null hypothesis for the remainder of the trial must not exceed $A(z_1^{(1)})$. Writing the test in terms of the stage 2 observations while incorporating the stage 1 data, we reject $H_{01}$ when
$$Z_1^{(2)} > \frac{\Phi^{-1}(1-\alpha) - \sqrt{\tau}\, z_1^{(1)}}{\sqrt{1-\tau}}.$$
Let $f(z_1^{(1)})$ be the probability density function of $Z_1^{(1)}$ under $\theta_1 = 0$; then
$$\int A(z_1^{(1)})\, f(z_1^{(1)})\, dz_1^{(1)} = \alpha, \qquad (2.2)$$
which in turn guarantees control of the error rate at the pre-specified level $\alpha$, that is $P_{\theta_1 = 0}(\text{Reject } H_{01}) = \alpha$.
There is no existing test for $H_{02}$ and thus the test must be constructed based only on the stage 2 trial data used to construct $Z_2$. We reject $H_{02}$ for $Z_2 > \Phi^{-1}(1-\alpha)$.
There is no pre-planned test for the intersection hypothesis $H_{0,12} = H_{01} \cap H_{02}$; however, there is pre-existing information for $H_{01}$ in the form of $Z_1^{(1)}$. Hommel (2001) shows how to use such first stage information in the test of an intersection hypothesis when adding some initially excluded hypotheses after an interim analysis, which we apply to the added hypothesis. Clearly if $H_{0,12}$ is true this implies that $H_{01}$ is also true. Since $H_{01}$ is true we compute the conditional error rate $A(z_1^{(1)})$ as described previously; furthermore, under $H_{01}$, $z_1^{(1)}$ is distributed such that Equation 2.2 holds as before. Thus we may construct the test of $H_{0,12}$ at the end of the trial at level $A(z_1^{(1)})$, allowing for the incorporation of the stage one data given by $Z_1^{(1)}$.
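As a rough numerical illustration of the conditional error principle, the sketch below evaluates the conditional error rate and checks that it averages to $\alpha$ over the null distribution of the stage 1 Z-value. It assumes the decomposition $Z_1 = \sqrt{\tau} Z_1^{(1)} + \sqrt{1-\tau} Z_1^{(2)}$ with independent standard normal stagewise Z-values under $\theta_1 = 0$; the function names and the trapezoidal integration grid are choices made for this illustration.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal


def conditional_error(z_stage1, tau, alpha=0.05):
    """P(Z1 > critical value | stage 1 Z-value) when theta_1 = 0."""
    crit = PHI.inv_cdf(1.0 - alpha)
    return 1.0 - PHI.cdf((crit - tau ** 0.5 * z_stage1) / (1.0 - tau) ** 0.5)


def expected_conditional_error(tau, alpha=0.05, lo=-8.0, hi=8.0, steps=4000):
    """Trapezoidal approximation to the integral of A(z) * phi(z) over z."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * conditional_error(z, tau, alpha) * PHI.pdf(z)
    return total * h


print(conditional_error(2.0, tau=0.5))       # promising stage 1 data
print(conditional_error(-1.0, tau=0.5))      # unpromising stage 1 data
print(expected_conditional_error(tau=0.5))   # should be close to alpha
```

The conditional error increases with the stage 1 Z-value, which is what allows promising first stage data to be carried into the tests for the remainder of the trial.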
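For example, consider a Dunnett test (Dunnett, 1955) for $H_{0,12}$. Let $Z_{\max} = \max(Z_1^{(2)}, Z_2)$ and define the distribution of this maximum under $\theta_1 = \theta_2 = 0$,
$$F_D(z) = P_0(Z_{\max} \le z),$$
where $Z_1^{(2)}$ and $Z_2$ are standard normal with correlation 1/2. We construct the Dunnett p-value,
$$P_D = 1 - F_D(Z_{\max}),$$
and may reject $H_{0,12}$ when $P_D < A(Z_1^{(1)})$.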
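One way to compute such a test numerically is sketched below. It is an assumption of this sketch, not a statement of the authors' implementation, that the Dunnett p-value is the upper tail probability of the larger of the two stage 2 Z-values under the global null with correlation 1/2; the representation $Z_k = (Y_k - Y_0)/\sqrt{2}$ with independent standard normal $Y$ values is used to reduce the bivariate probability to a one-dimensional integral.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal


def dunnett_cdf(z, lo=-8.0, hi=8.0, steps=2000):
    """P(max of two standard normals with correlation 1/2 <= z).

    Conditional on the shared control term x, the two comparisons are
    independent, each with probability Phi(sqrt(2) * z + x).
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * PHI.cdf(2 ** 0.5 * z + x) ** 2 * PHI.pdf(x)
    return total * h


def dunnett_p_value(z_stage2_arm1, z_arm2):
    """Upper tail probability of the larger stage 2 Z-value under the global null."""
    return 1.0 - dunnett_cdf(max(z_stage2_arm1, z_arm2))


def reject_intersection(z_stage1, z_stage2_arm1, z_arm2, tau, alpha=0.05):
    """Reject the intersection when the Dunnett p-value is below the conditional error."""
    crit = PHI.inv_cdf(1.0 - alpha)
    cond_err = 1.0 - PHI.cdf((crit - tau ** 0.5 * z_stage1) / (1.0 - tau) ** 0.5)
    return dunnett_p_value(z_stage2_arm1, z_arm2) < cond_err


print(reject_intersection(2.0, 2.0, 1.0, tau=0.5))
print(reject_intersection(-1.0, 0.0, 0.0, tau=0.5))
```

A useful sanity check is that the probability of both Z-values being below zero is 1/3 for correlation 1/2, so the p-value at $Z_{\max} = 0$ is 2/3.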
The choice to recruit a further $(1-\tau)n/2$ patients to each treatment after the interim analysis is not required. The total number of patients recruited in stage 2 is free to vary; however, changes to the ratio of patients on treatment and control require slight further modification (although the ratio remains fixed after the modification is made). If the ratio of patients between the existing treatment and control differs before and after the design modification, it is no longer possible to weight the Z-values in order to recover the pooled test statistic, in which case $Z_1$ would need to be constructed using the weighted inverse normal method (Bauer and Köhne, 1994; Lehmacher and Wassmer, 1999; Hartung, 1999) with weights defined at the time the modification is made.
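For reference, the weighted inverse normal combination of two one-sided stagewise p-values can be sketched as follows. This is the standard form of the combination function with weights satisfying $w_1^2 + w_2^2 = 1$; the function name is hypothetical and the choice of weights in the usage lines is purely illustrative.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard normal


def inverse_normal_combination(p_stage1, p_stage2, w1):
    """Combine one-sided stagewise p-values with pre-fixed weights w1^2 + w2^2 = 1.

    Returns the combined Z-value and its one-sided p-value.
    """
    if not 0.0 < w1 < 1.0:
        raise ValueError("w1 must lie strictly between 0 and 1")
    w2 = (1.0 - w1 ** 2) ** 0.5
    z = w1 * PHI.inv_cdf(1.0 - p_stage1) + w2 * PHI.inv_cdf(1.0 - p_stage2)
    return z, 1.0 - PHI.cdf(z)


# Equal weights, fixed before the stage 2 data are seen
z, p = inverse_normal_combination(0.10, 0.03, w1=0.5 ** 0.5)
print(z, p)
```

Because the weights are fixed at the time of the modification, the combined statistic remains standard normal under the null hypothesis whatever stage 2 sample sizes are then chosen.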

Simulation study
For combinations $(\xi_1, \xi_2)$, with $\sigma/n = 1$, $\delta = \Phi^{-1}(0.95) + \Phi^{-1}(0.9)$ and $\tau = 0.5$, we simulate 1,000,000 realisations of $Z_1$ and $Z_2$ in R (R Core Team, 2019), assuming equal sample size in each treatment at each stage. Table 1 shows estimates of the probabilities of an error for the local hypothesis tests; as required this is $\alpha$ whichever combination of null hypotheses is true.
Table 1. Probabilities of rejecting components of the closed testing procedure under the proposed testing procedure; type I errors highlighted in bold; $\delta = \Phi^{-1}(0.95) + \Phi^{-1}(0.9)$ such that we have power of 0.9 when testing $H_{01}$ in the original trial.
We compare the overall trial performance of the method proposed above with basing the test for the intersection hypothesis only on evidence for $H_{01}$, that is we reject $H_{0,12}$ when $Z_1 > \Phi^{-1}(1-\alpha)$ (treating the first null hypothesis as a gate keeping procedure (Dmitrienko and Tamhane, 2007)). In both procedures $Z_1^{(1)}$ is used in the test of $H_{0,12}$ by the argument that $H_{0,12}$ implies $H_{01}$.
Table 2. Probabilities of global rejection of null hypotheses using the conditional error approach; type I errors highlighted in bold; $\delta = \Phi^{-1}(0.95) + \Phi^{-1}(0.9)$ such that we have power of 0.9 when testing $H_{01}$ in the original trial.
In Figure 2 we examine the probabilities of rejecting the intersection hypothesis $H_{0,12}$ for all combinations of $H_{01}$ and $H_{02}$ true and false. When $H_{01}$ is false the conditional error is likely to be higher than the pre-planned $\alpha$, giving a high chance of rejecting $H_{0,12}$; when $H_{01}$ is false and $H_{02}$ is true there is a small reduction in the probability of rejecting $H_{0,12}$, which explains the deficit of our proposed procedure when $\xi_1 = \delta$ and $\xi_2 = 0$. Conversely, when $H_{01}$ is true the conditional error is likely to be quite low: when both null hypotheses are true this corresponds to a low probability of rejecting $H_{0,12}$; however, when $H_{02}$ is false we recover some possibility of rejecting $H_{0,12}$, allowing us to reject $H_{02}$ globally.
Fig. 2. Conditional error rate, $A(z_1^{(1)})$, against probability of rejecting the intersection hypothesis, $P(\text{Reject } H_{0,12} \mid z_1^{(1)})$, and corresponding density of the conditional error, $f(z_1^{(1)})$; $\delta = \Phi^{-1}(0.95) + \Phi^{-1}(0.9)$ such that we have power of 0.9 when testing $H_{01}$ in the original trial.

General rule for adding hypotheses
Suppose there is a set of existing null hypotheses $E$ with a pre-planned closed testing procedure, and we wish to add a set of new null hypotheses $N$. Let $H_e$ be the intersection of some subset of the existing null hypotheses $e \subseteq E$ and $H_n$ be the intersection of some subset of the new hypotheses $n \subseteq N$. To construct an updated closed testing procedure there are three forms of null hypothesis to consider: $H_e$, $H_n$ and $H_e \cap H_n$. Writing $\alpha_e$ for the conditional error rate of the test of $H_e$ at the time the new hypotheses are added, $H_e$ is tested at level $\alpha_e$, $H_n$ (for which no test exists) at level $\alpha$, and $H_e \cap H_n$ at level $\alpha_e$. While proposing changes to the trial one may or may not add new hypotheses: if $H_n$ is added then the test of $H_e \cap H_n$ may be based on the data relating to both $H_e$ and $H_n$ and is tested at $\alpha_e$; while if $H_n$ is not added the test for $H_e \cap H_n$ is implicitly that of $H_e$, also tested at $\alpha_e$. In either case $H_e \cap H_n$ is tested at $\alpha_e$, ensuring that an equation of the form 2.2 holds whatever decision is made while proposing changes to the trial design. Note that any procedure that gives strong control of the FWER is a closed testing procedure (Burnett and Jennison, 2021), so we may add hypotheses to any procedure that ensures strong control of the FWER while maintaining the statistical integrity of the trial. The penalty for doing so lies in the tests of hypotheses of the form $H_e \cap H_n$.
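The rule can be expressed compactly in code. The sketch below is illustrative only: hypotheses are represented by string labels, the conditional error rates are supplied as a dictionary of assumed values, and both function names are hypothetical.

```python
def local_level(existing_part, new_part, alpha, conditional_errors):
    """Level for the local test of an intersection of existing and new hypotheses.

    existing_part, new_part: sets of hypothesis labels (either may be empty).
    conditional_errors: maps each frozenset of existing labels to its
    conditional error rate at the time of the modification.
    """
    if not existing_part and not new_part:
        raise ValueError("an intersection needs at least one hypothesis")
    if existing_part:
        # Existing hypotheses alone, or mixed with new ones: conditional error
        return conditional_errors[frozenset(existing_part)]
    return alpha  # new hypotheses only: no earlier data, full level alpha


def globally_rejected(hypothesis, local_rejections):
    """Closed testing: reject only if every intersection containing it is rejected."""
    return all(rej for subset, rej in local_rejections.items() if hypothesis in subset)


# Two arm example with an assumed conditional error of 0.02 for H01
cond_err = {frozenset({"H01"}): 0.02}
print(local_level({"H01"}, set(), 0.05, cond_err))    # existing hypothesis
print(local_level(set(), {"H02"}, 0.05, cond_err))    # added hypothesis
print(local_level({"H01"}, {"H02"}, 0.05, cond_err))  # intersection

local = {
    frozenset({"H01"}): False,
    frozenset({"H02"}): True,
    frozenset({"H01", "H02"}): True,
}
print(globally_rejected("H02", local), globally_rejected("H01", local))
```

The last line shows the situation the gate keeping alternative cannot reach: the added hypothesis is rejected globally while the existing one is not.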

Alteration of a Multi-Arm Multi Stage trial in progress
Multi-arm multi-stage trials
With multiple experimental treatments to compare with the control, we should consider a MAMS design (Jaki, 2015; Wason and others, 2016). This allows us to compare the treatments in the same trial, while incorporating pre-planned interim analyses to facilitate early stopping. This ensures that poorly performing treatments may be dropped for futility; alternatively the trial may be stopped early to declare efficacy, reducing the overall development time and number of patients.
This early stopping can be done while formally testing null hypotheses and controlling the FWER (Equation 2.1), through the use of generalised Dunnett testing procedures (Magirr and others, 2012). We use the extension of this proposed by Urach and Posch (2016). This directly defines all elements of the closed test, allowing us to directly apply our rule for adding hypotheses from Section 3. Suppose we have $K$ novel treatments, $T_1, ..., T_K$, to compare against a common control. We define the null hypotheses $H_{0i}: \theta_i \le 0$ and corresponding alternatives $H_{1i}: \theta_i > 0$ for all $i = 1, ..., K$. A MAMS design will simultaneously test these $K$ hypotheses over $J$ analyses.
Let $n$ be the number of patients to be recruited to the control arm in the first stage of the trial. We assume patients are randomised at the desired rate in each stage of the trial. At analysis $j = 1, ..., J$ the trial will have recruited $r_k^{(j)} n$ patients to treatment $k = 0, 1, ..., K$ (so that $r_0^{(1)} = 1$). Treatments may be dropped for futility at each analysis (and removed from any further consideration); suppose treatment $k^*$ is stopped at analysis $j^*$, then we have $r_{k^*}^{(j)} = r_{k^*}^{(j^*)}$ for all $j \ge j^*$. If all $T_1, ..., T_K$ are dropped for futility the trial stops recruiting. Alternatively the trial may stop early if a treatment or treatments have been selected for further study, such as when the trial is stopped due to a treatment-control comparison yielding statistical significance (Urach and Posch, 2016).
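From the observations at each stage $j = 1, ..., J$ and treatment $k = 1, ..., K$ we construct estimates $\hat\theta_k^{(j)}$. Then defining the information
$$\mathcal{I}_k^{(j)} = \left(\frac{\sigma^2}{r_k^{(j)} n} + \frac{\sigma^2}{r_0^{(j)} n}\right)^{-1},$$
we find the corresponding Z-values $Z_k^{(j)} = \hat\theta_k^{(j)} \sqrt{\mathcal{I}_k^{(j)}}$.
In the testing procedures that follow we require that the ratio of patients assigned to each treatment remains consistent through each stage of the trial, that is
$$\frac{r_k^{(j)}}{r_0^{(j)}} = \frac{r_k^{(l)}}{r_0^{(l)}}$$
for all $k = 1, ..., K$ and $j, l = 1, ..., J$ (Koenig and others, 2008).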

The Generalised Dunnett procedure
Recall that $R$ is the event that we reject one or more true null hypotheses; then, extending Equation 2.1 to $K$ null hypotheses, strong control requires that
$$P_\theta(R) \le \alpha \quad \text{for all } \theta = (\theta_1, ..., \theta_K).$$
The generalised Dunnett method (Magirr and others, 2012) simultaneously tests the null hypotheses, defining group sequential testing boundaries that account for the correlation structure of comparing multiple treatments to control to achieve the desired FWER.
We define efficacy boundaries $u = (u_1, ..., u_J)$, where the null hypothesis in treatment group $k = 1, ..., K$, $H_{0k}$, is rejected at analysis $j$ if $Z_k^{(j)} > u_j$ (and the trial is stopped). We define futility stopping boundaries $l = (l_1, ..., l_J)$, where if $Z_k^{(j)} < l_j$ the corresponding treatment is dropped for futility.
To achieve strong control of the FWER it is sufficient to choose $u$ and $l$ such that under the global null, $\theta_1 = ... = \theta_K = 0$, which we denote by $0$, $P_0(R) \le \alpha$ (Magirr and others, 2012). Such testing boundaries may be computed using familiar group sequential testing theory.
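The decision rule at a single analysis can be sketched as below. The boundary values in the usage lines are hypothetical placeholders rather than boundaries computed for any design in this paper, and the function name is an illustrative choice.

```python
def interim_decision(z_values, upper, lower):
    """Classify each active arm at one analysis of a MAMS trial.

    z_values: dict mapping arm label to its Z-value at this analysis.
    upper, lower: efficacy and futility boundaries for this analysis.
    Returns (rejected, dropped, continuing) lists of arm labels.
    """
    rejected = [k for k, z in z_values.items() if z > upper]
    dropped = [k for k, z in z_values.items() if z < lower]
    continuing = [k for k, z in z_values.items() if lower <= z <= upper]
    return rejected, dropped, continuing


# Hypothetical boundaries u_1 = 2.5 and l_1 = 0 at the first analysis
print(interim_decision({1: 2.0, 2: 1.5}, upper=2.5, lower=0.0))
print(interim_decision({1: 3.0, 2: -0.5}, upper=2.5, lower=0.0))
```

In the first call both arms continue; in the second, arm 1 crosses the efficacy boundary (so the trial would stop) and arm 2 would have been dropped for futility.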

Group sequential closed testing
Let $\mathcal{K}$ be the set such that for any $I \subseteq \{1, ..., K\}$ we have that $\cap_{i \in I} H_{0i} \in \mathcal{K}$. We construct tests for each $H_{0m} \in \mathcal{K}$ at level $\alpha$, and reject $H_{0k}$ globally when all tests including $H_{0k}$ are rejected at level $\alpha$, for $k = 1, ..., K$.
The generalised Dunnett procedure defines the test for the intersection of all null hypotheses $H_{01} \cap ... \cap H_{0K}$ and implicitly tests all $H_{0m} \in \mathcal{K}$ using the same $u$ and $l$. Urach and Posch (2016) extend this by directly defining all tests required for the closed testing procedure; for each $H_{0m} \in \mathcal{K}$ efficacy boundaries $u_m = (u_{1,m}, ..., u_{J,m})$ are chosen so that the test has level $\alpha$. The futility boundaries $l = (l_1, ..., l_J)$ must be the same for all hypotheses.

Adding experimental treatment arms
Suppose at the $J'$-th ($J' \in \{1, ..., J\}$) interim analysis of a MAMS trial in progress we wish to add $T \ge 1$ new treatments. We now have up to $K + 1 + T$ treatments in total, including the control (in the case that all $K + 1$ original treatment arms are still in the trial). We have planned recruitment $r_k^{(j)} n$ for treatment $k = 1, ..., K + T$ at stage $j = 1, ..., J$, where $r_k^{(j)} = 0$ for all $k > K$. When modifying the trial we define a modified recruitment plan, recruiting $\tilde r_k^{(j)} n$ patients for each treatment $k = 1, ..., K + T$ at each remaining stage of the trial $j = J', ..., J$ (we could also use this opportunity to modify the number of stages); for $j \le J'$ we know that $\tilde r_k^{(j)} n = r_k^{(j)} n$, while for $j > J'$ we fix the planned recruitment for the remainder of the trial at this point.
As in Section 2.2 we use the independent increments of the Z-values, splitting the trial according to patients recruited before and after the $J'$-th analysis. For $j = J' + 1, ..., J$ and $k = 0, 1, ..., K$ the sample that would have been recruited is given by the originally planned $r_k^{(j)} n$. For each $k = 1, ..., K$ and $j = J' + 1, ..., J$ we define weights from these planned sample sizes, and re-construct the Z-values for the remainder of the trial as weighted combinations of the Z-value at analysis $J'$ and the Z-value based on the data collected after it. Weighting together the Z-values in this way allows us to modify the ratio of patients recruited to each treatment at the time of the design modification. As in Section 4.1 these ratios must remain fixed for all stages of the trial after the modification has been made.

Incorporating additional hypotheses
We now have null hypotheses $H_{0i}: \theta_i \le 0$ and alternatives $H_{1i}: \theta_i > 0$ for all $i = 1, ..., K+T$, and require strong control of the FWER across all $K+T$ tests. We construct a closed testing procedure following the rule introduced in Section 3. We define three sets: the set of existing null hypotheses $H_{01}, ..., H_{0K}$ and all their intersections, $\mathcal{K}$; the set of added null hypotheses $H_{0,K+1}, ..., H_{0,K+T}$ and all their intersections, $\mathcal{T}$; and the set of all intersections between existing and added null hypotheses, $\mathcal{KT}$.
The conditional error rate of each test for $H_{0m} \in \mathcal{K}$ is maximised under the global null (Stallard and others, 2015). Given the existing estimates, $\hat\theta^{(J')} = (\hat\theta_1^{(J')}, ..., \hat\theta_K^{(J')})$, and under the originally planned trial described in Sections 4.1, 4.2 and 4.3, we write the conditional error for each $H_{0m} \in \mathcal{K}$ under the global null as
$$B_m(\hat\theta^{(J')}) = P_0\left(\text{Reject } H_{0m} \mid \hat\theta^{(J')}\right).$$
As in Equation 2.2 we have that under the global null
$$E_0\left[B_m(\hat\theta^{(J')})\right] \le \alpha,$$
as required. It is useful to re-write the testing boundaries for each $H_{0m} \in \mathcal{K}$ in terms of only the data collected after stage $J'$, that is, for $j = J' + 1, ..., J$ and $k = 1, ..., K$, as treatment-specific boundaries $u_{j,k,m}$ and $l_{j,k,m}$, where if $Z_k^{(j)} < l_{j,k,m}$ then $T_k$ is dropped for futility. This allows computation of the conditional error rate based on $Z_k^{(j)}$ for $j = J' + 1, ..., J$ and $k = 0, 1, ..., K$.
For each $H_{0m} \in \mathcal{K}$ the hypothesis test must be constructed at level $B_m(\hat\theta^{(J')})$. For each $H_{0m} \in \mathcal{T}$ the hypothesis test must be constructed at level $\alpha$. For each $H_{0m} \in \mathcal{KT}$ the hypothesis test must be constructed at level $B_m(\hat\theta^{(J')})$. This ensures each test for the trial as a whole is constructed at level $\alpha$ as required, while including any existing trial data and allowing for any changes to recruitment for $T_0, T_1, ..., T_K$, and the FWER is strongly controlled.

T. Burnett and others
For each hypothesis $H_{0m} \in \mathcal{K} \cup \mathcal{T} \cup \mathcal{KT}$ we define the testing boundaries for the modified trial at the required error rate, $u_m = (u_{J'+1,m}, ..., u_{J,m})$ and $l_m = (l_{J'+1,m}, ..., l_{J,m})$. At stage $j = J' + 1, ..., J$ for treatment $k = 0, 1, ..., K$ the recruitment is governed by the modified recruitment plan. For each experimental treatment from the first stage of the trial, $k = 1, ..., K$, we define weights for the data before and after stage $J'$, for $j = J' + 1, ..., J$, and construct the Z-values for the hypothesis tests as the correspondingly weighted combinations. This allows us to write the testing boundaries for each $H_{0m} \in \mathcal{K}$ in terms of only the data collected after stage $J'$: for $j = J' + 1, ..., J$ and $k = 1, ..., K$ the efficacy boundary $u_{j,k,m}$ is obtained from $u_{j,m}$ by subtracting the weighted Z-value observed at stage $J'$, and if $Z_k^{(j)} < l_{j,k}$ then $T_k$ is dropped for futility (note that for $k > K$ there are no earlier data, so the boundaries are simply $u_{j,m}$ and $l_j$). With this in place $u_m$ and $l_m$ may be computed as per the generalised Dunnett test.

An illustrative example
For the initial design consider a three stage trial to compare two treatments with a control, recruiting $n = 10$ patients to each treatment at each stage of the trial; that is $J = 3$, $K = 2$ and $r_k = (1, 2, 3)$ for $k = 0, 1, 2$. Under this design we test the null hypotheses $H_{01}: \theta_1 \le 0$ and $H_{02}: \theta_2 \le 0$. The testing boundaries are constructed for a FWER of $\alpha = 0.05$. Let $\delta = \Phi^{-1}(0.75)\sqrt{2}$ and $\sigma = 1$; at a configuration of $\theta = (\delta, 0)$ we have a target power of $1 - \beta = 0.9$.
Defining the triangular testing boundaries (Whitehead, 1997) we first compute the testing boundary for H 01 ∩ H 02 using the mams() function of the MAMS package in R (Jaki and others, 2019).
This sets the futility boundary for all tests, with the upper boundaries then computed for testing $H_{01}$ and $H_{02}$ separately.
Suppose after the first analysis, $J' = 1$, we add two further treatments, $T = 2$, adding the null hypotheses $H_{03}: \theta_3 \le 0$ and $H_{04}: \theta_4 \le 0$. Given $Z^{(1)} = (2, 1.5)$, the trial would continue in all arms at the interim analysis. Computing the conditional error rate for each existing test, we construct all required tests as described in Section 4.5, using triangular testing boundaries and ensuring all lower boundaries correspond to those of $H_{01} \cap H_{02} \cap H_{03} \cap H_{04}$. We continue recruiting 10 patients per treatment per stage, allowing for a maximum total sample size of 130 patients.
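The maximum sample size can be checked directly from the design: three arms (the control and two treatments) recruit 10 patients each in stage 1, and after the modification five arms recruit 10 patients each in stages 2 and 3.

```latex
N_{\max} = \underbrace{3 \times 10}_{\text{stage 1}}
         + \underbrace{5 \times 10 \times 2}_{\text{stages 2 and 3}}
         = 30 + 100 = 130
```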
Table 3 shows the operating characteristics of the updated trial based on 1,000,000 simulations of the remainder of the trial. Because the tests are conditional on the first stage observations, the probabilities of rejecting the null hypotheses under the global null are not 0.05. Since $Z_1^{(1)} > Z_2^{(1)}$ we observe higher probabilities of rejecting $H_{01}$ than $H_{02}$ for equivalent values of $\theta_1$ and $\theta_2$; for example, when $\theta_1 = \theta_2 = \delta$ the probability of rejecting $H_{01}$ is 0.13 higher. Similarly, since $Z_1^{(1)} > 0$ and $Z_2^{(1)} > 0$ the probability of rejecting $H_{01}$ or $H_{02}$ is higher than the probability of rejecting $H_{03}$ or $H_{04}$; for example, when $\theta_1 = \theta_2 = \theta_3 = \theta_4 = \delta$ we have probabilities of 0.94, 0.81, 0.59 and 0.59 of rejecting $H_{01}$, $H_{02}$, $H_{03}$ and $H_{04}$ respectively. We also see the benefit of incorporating all treatments in the same trial, with a reduction in the expected sample size and a chance to reject multiple null hypotheses when there are more beneficial treatments overall.
Table 3. Operating characteristics for the remainder of the trial given $Z^{(1)} = (2, 1.5)$ under the corresponding configuration $\theta$, where $R_i$ is the event that $H_{0i}$ is rejected and $N$ is the total sample size (note 30 participants already recruited).

Comparison of performance
We compare our proposed method with two options that maintain the integrity of the results given that observations are already available from the trial: option one is to conduct a separate MAMS trial comparing the new treatments with the control in addition to the trial already in progress; option two is to conclude the current trial and start a new trial incorporating all four experimental treatments. In examining these unmodified designs we make no use of the previous trial data, meaning these trials do not benefit from the patients already recruited.
As before we add two treatments, $T = 2$, at the first analysis, $J' = 1$. We keep all other parameters as before, keeping them consistent for each design. We estimate the operating characteristics of our proposed method based on 10,000 simulations. This is a lower number of simulations than would be ideal, due to the computationally intensive nature of the simulation. In practice we do not expect this to be used as a pre-planned scheme and hence only one set of updated testing boundaries needs computing, making longer simulations more viable, as seen in Section 5. We break the operating characteristics of our proposed method down, with Table 4 showing the behaviour of the first stage of the trial and Table 5 showing the behaviour of trials that continue beyond the first interim analysis. We see a relatively high probability of the trial concluding at the first analysis, before the treatments are added; this shows the first stage data should not be disregarded.
If the trial continues beyond the first stage we observe a similar pattern to that shown in Table 3, with lower probabilities of rejecting $H_{03}$ or $H_{04}$ than of rejecting $H_{01}$ or $H_{02}$. The probabilities of rejection are lower than in Table 3 since $l^{(1)} = 0$ allows less promising stage-1 Z-values to progress the trial beyond the first analysis.
Comparing our proposed procedure with option one, shown in Table 6, conducting two separate trials produces similar probabilities for rejecting $H_{01}$ or $H_{02}$. Our method is sensitive to $\theta_3$ and $\theta_4$ due to their ability to also conclude the trial early. Two separate trials increase the probabilities of rejecting $H_{03}$ or $H_{04}$; this is partially due to the disconnect between trials: if one concludes early the other may continue and reject a null hypothesis. Given this, and that patients are recruited to the control in both trials, we see that our proposed method significantly reduces the expected sample size, with 70-80 patients (including the first stage of the trial) for trials that continue beyond the first stage (for the trial as a whole this expected sample size drops to 50-60 over the scenarios we have examined), whereas option one requires 90-95 patients. There are two key flaws in option one: while this method incorporates all existing data for $H_{01}$ and $H_{02}$, there is no multiplicity adjustment between the existing and added hypotheses, as we have two separate trials of two null hypotheses, each with a FWER of $\alpha$; and if we wish to select some subset of treatments for further study, there is no guarantee of direct comparability between the trials.
Comparing the operating characteristics of option two in Table 7 with our method in Table 5, we see that the probabilities of rejecting $H_{01}$ or $H_{02}$ are lower while the probabilities of rejecting $H_{03}$ or $H_{04}$ are similar; this leads to a reduction in the probabilities of rejecting multiple hypotheses. For example, when $\theta_1 = \theta_2 = \theta_3 = \theta_4 = \delta$ the probabilities of rejecting $H_{01}$, $H_{02}$, $H_{03}$ and $H_{04}$ are 0.77, 0.77, 0.61 and 0.61 respectively under our proposed method, while they are all 0.64 under option two, and the probability of rejecting two or more hypotheses falls by 0.12 compared to our proposed method.
The expected sample size of the trial conducted under option two is reduced by 8-15 patients; however, this does not account for the fact that 30 patients have been recruited who do not contribute to the result.

Discussion
The motivation for adding a treatment to a trial in progress is clear. Should a new treatment become available it is desirable to incorporate it, allowing direct comparisons while preserving integrity and avoiding delays to the overall development process. Our proposed general framework for adding experimental treatments to a trial in progress builds upon the work of Hommel (2001), allowing any trial with strong control of the FWER to add new hypotheses. This also allows other alterations to the design of the trial while ensuring that all information already collected is utilised in inference and decision making.
This framework can be applied in our motivational setting of MAMS platform trials (Meyer and others, 2020, 2021). The examples in Section 5 demonstrate that this does indeed strongly control the FWER as expected.
Our examples in Sections 2.4 and 5 show that the penalty for adding treatments, in terms of the probability of rejecting the null hypotheses, is marginal and only has a notable impact on the introduced arms; optimising the recruitment proportions across configurations of the true treatment effects may reduce this impact further. In addition, the combination of utilising the existing data and the efficient use of control patients across the trial yields a reduction in the expected sample size when compared to alternatives that do not make such use of the existing data. The operating characteristics are not, however, the primary motivation for adding treatments to a trial in progress.
As for MAMS designs in general, this allows for a reduction in logistical and administrative effort and a speeding up of the overall development process, as well as allowing direct comparisons of the treatments within the same trial.
The general framework for adding hypotheses to a trial in progress has broader application than MAMS trials, being applicable to any testing procedure that gives strong control of the FWER. The addition of hypotheses in this way allows for the incorporation of existing trial data into decisions about how to plan the remainder of the trial.
Table 7. Probabilities of rejecting null hypotheses and expected sample size under the corresponding configuration of $\theta$ for option two (starting a new trial incorporating all treatments), assuming the trial continues beyond the interim analysis, where $R_i$ is the event that we reject $H_{0i}$ and $N$ is the total sample size (note 30 additional patients are recruited but not used in the analysis).

Software
Software relating to the examples in this paper is available at https://github.com/Thomas-Burnett/Adding-treatments-to-clinical-trials-in-progress.git.
Fig. 1. Inflation in the FWER when an additional hypothesis is added to an ongoing two arm trial. The red line is the nominal FWER $\alpha = 0.05$ as per the design and the black line is the actual FWER for the given $\tau$.
The closed testing procedure requires tests of $H_{01}$, $H_{02}$ and $H_{0,12} = H_{01} \cap H_{02}: \theta_1 \le 0$ and $\theta_2 \le 0$. We reject $H_{01}$ globally when the local level $\alpha$ tests of $H_{01}$ and $H_{0,12}$ are both rejected, and $H_{02}$ globally when the local level $\alpha$ tests of $H_{02}$ and $H_{0,12}$ are both rejected.
Table 2 shows the global rejection probabilities for the null hypotheses for each testing method. Both testing procedures give strong control of the FWER. The probabilities of rejecting a false $H_{01}$ do not differ greatly, with an increase of at most 0.04 for the gate keeping procedure. When both null hypotheses are false, there is a small decrease of 0.03 in the probability of rejecting $H_{02}$ for the gate keeping procedure. When $H_{01}$ is true and $H_{02}$ is false the gate keeping procedure cannot reject $H_{02}$ without making an error in rejecting $H_{01}$, and thus our proposed procedure increases the probability of rejecting $H_{02}$ by 0.29. The small advantage of the gate keeping procedure for testing $H_{01}$ is outweighed by the ability, when taking an integrated approach to the test of the intersection hypothesis, to reject $H_{02}$ when there is a low probability of rejecting $H_{01}$.
[Table 2 panels: Dunnett procedure for testing the intersection hypothesis; Gate keeping procedure for testing the intersection hypothesis. Column headers in each panel: $\xi_1$, $\xi_2$, P(Globally reject $H_{01}$ only), P(Globally reject $H_{02}$ only), P(Globally reject both), P(Globally reject ...).]
$H_e$: Let $\alpha_e$ be the conditional error rate for the test of $H_e$ at the time the $N$ hypotheses are added; the test of $H_e$ requires that the probability of falsely rejecting $H_e$ does not exceed $\alpha_e$.
$H_n$: With no existing test for $H_n$ it must be tested at level $\alpha$.
$H_e \cap H_n$: $H_e \cap H_n \implies H_e$ and hence the data already available for $H_e$ are distributed such that computing the corresponding conditional error $\alpha_e$ will ensure that an equation of the form 2.2 holds. Thus we incorporate the existing information into the test of $H_e \cap H_n$ by constructing it such that the probability of falsely rejecting $H_e \cap H_n$ does not exceed $\alpha_e$. Any intersection of the form $H_e \cap H_n$ must be constructed in this way. While proposing changes to the trial one may or may not add new hypotheses $H_n$.
For the unmodified option one, we use the original trial for $H_{01}$ and $H_{02}$ and compute boundaries for a two stage trial for $H_{03}$ and $H_{04}$. For the unmodified option two, we compute boundaries for a two stage trial for $H_{01}$, $H_{02}$, $H_{03}$ and $H_{04}$. We use 1,000,000 simulations to estimate the operating characteristics for each unmodified design.

Table 3 .
With lower probabilities of rejecting H 03 or H 04 than rejecting H 01 or H 02 .The probabilities of rejection are lower than Table3since l (1) = 0 allows for less promising stage-1 Z-values to progress the trial beyond the first analysis.

Table 5. Probabilities of rejecting null hypotheses and expected sample size under the corresponding configuration of $\theta$ for our proposed update procedure when the trial continues beyond the first stage, where $R_i$ is the event that we reject $H_{0i}$ and $N$ is the total sample size (including the 30 patients included in stage one).

Table 6. Probabilities of rejecting null hypotheses and expected sample size under the corresponding configuration of $\theta$ for option one (two separate trials), assuming the trial continues beyond the interim analysis, where $R_i$ is the event that we reject $H_{0i}$, $N_1$ is the total sample size in the original trial and $N_2$ is the total sample size in the additional trial.
Should a new treatment become available it is desirable incorporate it allowing direct comparisons while preserving in-θ P θ (R 1 ) P θ (R 2 ) P θ (R 3 ) P θ (R 4 )