A Bayesian multi‐arm multi‐stage clinical trial design incorporating information about treatment ordering

Multi‐Arm Multi‐Stage (MAMS) designs can notably improve efficiency in later stages of drug development, but they can be suboptimal when an order in the effects of the arms can be assumed. In this work, we propose a Bayesian multi‐arm multi‐stage trial design that selects all promising treatments with high probability and can efficiently incorporate information about the order in the treatment effects as well as incorporate prior knowledge on the treatments. A distinguishing feature of the proposed design is that it allows taking into account the uncertainty of the treatment effect order assumption and does not assume any parametric arm‐response model. The design can provide control of the family‐wise error rate under specific values of the control mean and we illustrate its operating characteristics in a study of symptomatic asthma. Via simulations, we compare the novel Bayesian design with frequentist multi‐arm multi‐stage designs and a frequentist order restricted design that does not account for the order uncertainty and demonstrate the gains in the sample sizes the proposed design can provide. We also find that the proposed design is robust to violations of the assumptions on the order.


NUMERICAL RESULTS
The numerical results that compare the proposed Bayesian design with the frequentist designs with Pocock critical bounds are provided in Figures 1 and 2. The results under the global null hypothesis are provided in Table 1.

FURTHER EXPLORATIONS OF THE OPERATING CHARACTERISTICS FOR A 4-ARM 2-STAGE B-MAMS DESIGN
In this section, further explorations of the probabilities to reject at least one null hypothesis for a 4-arm 2-stage design are provided. For the Bayesian model we set:  = 340^{-2},  01 = 10^{-6},  (0) 0 = 489,  (1) 0 = 602,  (1) 0 =  (2) 0 = 0. The precision values for the treatment mean differences and the control mean are given in Table 2. The table also provides the sample sizes and the critical bounds that are found to control the FWER under the global and partial nulls at level α = 0.025 when  (0) = 489 and the design is powered at 80% to reject all hypotheses under  = (120, 120, 120).

It can be observed in Figures 3 and 4 that the B-MAMS(U) design is robust in controlling the FWER around the global and partial null hypotheses for all the considered values of the control mean. Under the global null hypothesis the FWER is equal to 0.025 when the control mean is  (0) = 489 or  (0) = 1489.
Regarding the B-MAMS(D) design, the results in Figures 5 and 6 show that the FWER is controlled under the global and partial null hypotheses when  (0) = 489 and is slightly inflated (by around 0.1%) under the global and partial nulls when  (0) = 1489. However, this design also shows some minor inflations (up to around 0.7%) when the treatment effects are close to 200.
The B-MAMS(C) design is robust in controlling the FWER when  (0) = 489 (see Figure 7), but it shows major inflations around the global and partial nulls when the control mean is off by a factor of 3 (see Figure 8). For example, for this chosen design, the probability to reject at least one hypothesis is almost 1 when  (1) or  (2) is equal to zero and  (3) = −200.
Overall, as concluded for the 3-arm 2-stage design, these exploratory analyses show only minor inflations of the FWER when the true control mean differs from the prior assumption and the precision on the control mean is small. Thus, with a small precision on the control mean, the proposed design can provide robust results when the assumption on the value of the control mean is violated. However, substantial inflations of the FWER arise when an informative prior distribution is used for the control mean.
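The contrast between a small and a large precision on the control mean can be illustrated with a standard conjugate normal update. The following is a minimal sketch, not the model used by the proposed design: it assumes a normal prior on the control mean with known sampling precision 340^{-2}, ignores sampling noise by setting the observed control mean equal to the true value 1489, and uses a hypothetical control sample size of 50. The function name and the choice of the informative precision are illustrative assumptions.

```python
def posterior_mean(prior_mean, prior_precision, sample_mean, n, obs_precision):
    """Posterior mean of a normal mean under a conjugate normal prior
    with known sampling precision (standard normal-normal update)."""
    data_precision = n * obs_precision
    return (prior_precision * prior_mean + data_precision * sample_mean) / (
        prior_precision + data_precision
    )


obs_precision = 340.0 ** -2  # assumption: precision of a single observation
n_control = 50               # hypothetical number of control patients
prior_mean = 489.0           # prior guess for the control mean
observed_mean = 1489.0       # observed control mean, taken equal to the true value

# Weakly informative prior: precision 1e-6, so the data dominate.
weak = posterior_mean(prior_mean, 1e-6, observed_mean, n_control, obs_precision)

# Informative prior (hypothetical): as much weight as the 50 control patients,
# so the posterior mean sits halfway between 489 and 1489.
strong = posterior_mean(prior_mean, n_control * obs_precision, observed_mean,
                        n_control, obs_precision)

print(f"weak prior:   posterior control mean = {weak:.1f}")
print(f"strong prior: posterior control mean = {strong:.1f}")
```

Under these assumptions, the informative prior pulls the posterior control mean far below the true value, so every treatment arm appears better than control even when there is no effect, which is consistent with the direction of the inflation reported for the B-MAMS(C) design.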
In addition, for the 4-arm 2-stage design, we compare how the chosen Bayesian designs perform in terms of rejecting all hypotheses and rejecting the third hypothesis (see Figures 9 and 10, respectively) against the frequentist ordered and multi-arm multi-stage designs under some alternative scenarios. Figure 11 shows the expected sample sizes (ESS) for the considered scenarios. Overall, the results suggest that the proposed Bayesian design provides benefits over the considered frequentist approaches in terms of the probability to reject the third hypothesis: an increase of up to around 90% for the B-MAMS(D), B-MAMS(C) and B-MAMS(U) compared to the ORD when  (3) = 120 (see Figure 10). Nevertheless, the proposed design shows a reduction in power to reject all hypotheses: the B-MAMS(D) shows a difference of up to around 5% compared to the ORD, and the B-MAMS(C) and B-MAMS(U) each show a difference of around 5%. In addition, when historical information is disregarded, the proposed design can match the operating characteristics of the frequentist MAMS(m) design. In terms of ESS, the B-MAMS(D) and B-MAMS(C) designs provide a reduction of up to 10% compared to the frequentist MAMS(m) design.
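For readers who wish to see how quantities such as the probability to reject at least one hypothesis and the ESS can be estimated by simulation, the sketch below uses a deliberately simplified frequentist two-stage multi-arm design. It is not the B-MAMS, ORD or MAMS(m) design: the stopping rule, the critical bounds (2.4, 2.4), the futility bound 0 and the stage size of 60 patients per arm are hypothetical choices made only for illustration, and the standard deviation is assumed known and equal to 340.

```python
import math
import random


def simulate_two_stage(effects, control_mean=489.0, sigma=340.0, n=60,
                       upper=(2.4, 2.4), lower=0.0, reps=5000, seed=1):
    """Monte Carlo estimate of P(reject at least one H0) and expected sample size
    for a simplified two-stage multi-arm design (hypothetical bounds and rule)."""
    rng = random.Random(seed)
    sd_mean = sigma / math.sqrt(n)      # sd of one stage-wise arm mean
    sd_diff_1 = sd_mean * math.sqrt(2)  # sd of a stage-1 treatment-control difference
    sd_diff_2 = sigma / math.sqrt(n)    # sd of the difference of means pooled over 2 stages
    any_rejection = 0
    total_patients = 0
    for _ in range(reps):
        # Stage 1: control plus all treatment arms, n patients each.
        control_1 = rng.gauss(control_mean, sd_mean)
        arms_1 = [rng.gauss(control_mean + e, sd_mean) for e in effects]
        z1 = [(a - control_1) / sd_diff_1 for a in arms_1]
        total_patients += (len(effects) + 1) * n
        if max(z1) > upper[0]:
            any_rejection += 1          # early stop for efficacy
            continue
        active = [i for i, z in enumerate(z1) if z > lower]
        if not active:
            continue                    # all arms dropped for futility
        # Stage 2: control plus the arms that were not dropped.
        control_2 = rng.gauss(control_mean, sd_mean)
        total_patients += (len(active) + 1) * n
        control_pooled = (control_1 + control_2) / 2
        rejected = False
        for i in active:
            arm_2 = rng.gauss(control_mean + effects[i], sd_mean)
            z2 = ((arms_1[i] + arm_2) / 2 - control_pooled) / sd_diff_2
            if z2 > upper[1]:
                rejected = True
        any_rejection += int(rejected)
    return any_rejection / reps, total_patients / reps


# Global null (no treatment effect) and an alternative with effects (120, 120).
fwer, ess_null = simulate_two_stage(effects=(0.0, 0.0))
power, ess_alt = simulate_two_stage(effects=(120.0, 120.0))
print(f"null:        P(reject any) = {fwer:.3f}, ESS = {ess_null:.1f}")
print(f"alternative: P(reject any) = {power:.3f}, ESS = {ess_alt:.1f}")
```

With 5000 replications, as used for the reported results, the Monte Carlo standard error of a rejection probability near 0.025 is roughly 0.002, which is worth keeping in mind when reading small differences in the FWER.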

FURTHER EXPLORATIONS OF THE OPERATING CHARACTERISTICS FOR THE 3-ARM 3-STAGE B-MAMS DESIGN
Figure 12 provides further explorations about the probability to reject at least one null hypothesis when  (0) = 0, 0.5, 1 for a 3-arm 3-stage B-MAMS design. For the Bayesian model we set:  = 1,  01 = 10^{-6},  (0)  0 = 0,  (1) 0 = 0,  (1) 0 = 0. The precision values for the treatment mean differences and the control mean are given in Table 3. The table also provides the sample sizes and the critical bounds that are found to control the FWER under the global and partial nulls at level α = 0.025 when  (0) = 0 and the designs are powered at 80% to reject all hypotheses under  = (0.5, 0.5).
It can be observed that the design is robust in controlling the FWER around the global and partial null hypotheses for the considered values of the control mean for the B-MAMS(D) and B-MAMS(U) designs. The FWER is inflated for the B-MAMS(C) design when the control mean is different from  (0) = 0.

[Figure caption, start truncated] … 20, 40, 60, 80, 120} (left) and under  = ( (1) , 120) and  (1) ∈ {0, 20, 40, 60, 80, 120} (right) for the 3-arm 2-stage MAMS(m), ORD, Urach&Posch and Bayesian designs when all designs are powered at 80% to reject both hypotheses under  = (120, 120). All designs use Pocock bounds. Results are provided using 5000 replications.

TABLE 1
Probability to reject at least one hypothesis (FWER), maximum sample size (Max SS) and expected sample size (ESS) for each design under the global null hypothesis H_0 with  (1) =  (2) =  (0) = 489 when all designs are powered at 80% to reject all hypotheses under  = (120, 120). Results are provided using 5000 simulations. Upper (  ,  ∈ {1, 2}) and lower ( 1) Pocock critical bounds are provided for the ORD and MAMS(m) frequentist designs. For the Urach and Posch design,   ,  ∈ {1, 2} and  1 are the upper and lower global boundaries, while   ,  ∈ {1, 2} are the elementary boundaries.

FIGURE 3
Probability to reject at least one null hypothesis when  (0) = 489 for different values of the treatment effects for the 4-arm 2-stage B-MAMS(U) design when it is powered at 80% to reject all hypotheses under  = (120, 120, 120).

TABLE 2
Maximum sample size (Max SS), critical boundaries, prior information on the mean treatment differences and on the control mean for each design under the global null hypothesis H_0 with  (1) =  (2) =  (3) =  (0) = 489 when all designs are powered at 80% to reject all hypotheses under  = (120, 120, 120).

TABLE 3
Maximum sample size (Max SS), critical boundaries, prior information on the mean treatment differences and on the control mean for each design under the global null hypothesis H_0 with  (1) =  (2) =  (0) = 0 when all designs are powered at 80% to reject all hypotheses under  = (0.5, 0.5).