Response‐adaptive randomization for multiarm clinical trials using context‐dependent information measures

The information-theoretic approach to clinical trial design has been shown to offer several advantages in balancing statistical power against the expected number of successes (ENS). In particular, it was shown that the built-in parameter of the weight function allows finding the desired trade-off between the statistical power and the number of treated patients in the context of small-population Phase II clinical trials. However, in real clinical trials, randomized designs are preferred. The goal of this research is to introduce randomization to a deterministic entropy-based sequential trial procedure generalized to the multiarm setting. Several methods of randomization applied to an entropy-based design are investigated in terms of statistical power and ENS. Namely, four design types are considered: (a) deterministic procedures, (b) naive randomization using the inverse of entropy criteria as weights, (c) block randomization, and (d) randomized penalty parameter. The randomized entropy-based designs are compared to the randomized Gittins index (GI) and fixed randomization (FR). Based on a comprehensive simulation study, the following conclusion on block randomization is made: for both entropy-based and GI-based block randomization designs, the degree of randomization induced by forward-looking procedures is insufficient to achieve adequate statistical power. Therefore, we propose an adjustment to the forward-looking procedure that improves power at almost no cost in terms of ENS. In addition, the properties of randomization procedures based on a randomly drawn penalty parameter are thoroughly investigated.

Each step is a separate clinical study, as each step may aim to achieve different goals. For example, a Phase II clinical trial is aimed at assessing efficacy and side effects. For this phase, the number of patients varies from 100 to 300.
A randomized clinical trial (RCT) is a clinical trial that embeds randomness in patient allocation, that is, the probability of assigning a patient to any treatment arm should not be equal to one at any point of the experiment. In particular, response-adaptive randomization (RAR) schemes allow future allocation probabilities to be adjusted based on the observed history of patients' responses to treatment. In contrast, in a nonrandomized (i.e., deterministic) trial, the treatment arm to which a patient is assigned is determined by the design characteristics and can be established at each time point.
Randomization is a key concept in medical data analysis. It ensures that researchers' preferences do not affect the allocation of the patients and do not cause differences in treatment for the patient groups. Hence, introducing randomization in a deterministic setting reduces investigator bias, since it prevents the investigator from making decisions on the entry of participants to the trial, so that they cannot intentionally or unintentionally impose their judgment about the best treatment and alter the design.
Recent papers on novel applications of RCT and response-adaptive RCT (RARCT) consider covariate-adjusted response-adaptive designs (Zhu & Zhu, 2023) and methodological considerations regarding adaptive stopping, arm dropping, and randomization in clinical trials (Granholm et al., 2022).
Regarding the application of these designs in real clinical trial settings, RAR approaches have already been used in practice in such well-known clinical trials as I-SPY 2, considered in Barker et al. (2009), and BATTLE, considered in Kim et al. (2021). The use of RAR designs is considered in more detail in Berry et al. (2016), Villar et al. (2021), and Grieve (2017). However, the goal of this paper is to investigate the properties of new entropy-based RAR design procedures.
The derivation and comparison of the information-theoretical designs based on the Fisher and Shannon entropy measures were conducted in Kasianova et al. (2021). The problem with this previously suggested procedure is the same as with the well-known Gittins index (GI) approach, suggested in Gittins and Jones (1979) and considered to be an asymptotically optimal strategy that maximizes the expected number of successes (ENS): both approaches are deterministic. Deterministic strategies are rarely used in real-world clinical trials, since a clinician who knows the allocation algorithm could withhold or accelerate the screening of patients when several of them are waiting for enrollment into the study. In addition, a deterministic allocation scheme can get stuck in a local optimum, whereas randomization forces the algorithm to search for the global extremum.
It was already shown that the designs based on asymptotic criteria, namely, asymptotic Shannon (AS) and asymptotic Fisher (AF), are not only easily interpreted but can also provide an additional increase in power compared to their exact analogs. The introduction of randomization can further improve the power-ENS balance of the named designs. Hence, the goal of this paper is to introduce randomization for the designs based on asymptotic information criteria, so that at each point in time there is uncertainty about which arm a new patient will be assigned to.
The experimental design and calibration procedure for the experiments described in Kasianova et al. (2021) are extended to a multiarm experiment. The performance of several methods of randomization applied to the entropy-based designs is investigated in terms of statistical power and ENS and compared to the randomized design based on GI and to the fixed randomization (FR) procedure. In particular, four types of procedure are considered: (a) deterministic procedures, (b) naive randomization using the inverse of entropy criteria as weights, (c) block (or batch) randomization as proposed in Villar et al. (2015) for a forward-looking Gittins index (FLGI), and (d) randomized penalty parameter $\kappa$.
This paper makes two main contributions to the field. First, a method based on the randomization of the penalty parameter $\kappa$ has not yet been considered in the literature, because of the novelty of the clinical trial design based on weighted information measures. However, this approach follows directly from the results described in Kasianova et al. (2021), where it was shown that design characteristics can be adjusted explicitly via the penalty parameter. Second, we suggest an adjustment to the forward-looking block randomization procedure: when calculating allocation probabilities for the patients in a block of fixed size, one can assume that the block size is larger than it actually is, which should lead to a higher variance of the allocation distribution and hence might increase power.
The goal of this paper is to investigate the behavior of different types of randomization procedures in terms of power and ENS. To make a comparison with the randomized version of the GI-based design proposed by Villar et al. (2015), the same setup is employed, that is, one "control" treatment and three alternative treatments with a binary outcome and no delay in response observations. In this paper, the control arm is treated the same as the other arms. Hence, one of the characteristics of interest is the marginal power of choosing the superior arm in comparison to control; however, the number of patients on the control arm is not lower-bounded.
The rest of the work proceeds as follows. The deterministic entropy-based procedure generalized for the multiarm case and the randomization approaches that will be applied to it are described in Section 2. In Section 3, the experiment setting and the calibration procedure are described. A simulation study of different types of randomized entropy-based designs and their comparison to FR and to randomized designs based on GI is given in Section 4. The work concludes with the discussion. For the convenience of the readers, some background material related to the information-theoretic approach to clinical trials is presented in the Appendix.

METHODS
In this section, the concept of the design based on context-dependent measures is generalized for the multiarm case. Then, three randomization approaches that will be applied to an entropy-based design are described, namely, the naive approach, forward-looking or batch randomization, and the randomized penalty parameter approach.

Deterministic designs based on different context-dependent measures
In this section, the deterministic approach based on different context-dependent measures is described. The procedure is Bayesian. At the beginning of the experiment, the prior knowledge about each treatment arm is given in the form of a Beta prior, $\mathrm{Beta}(\nu + 1, \beta - \nu + 1)$ (1), with parameters $\nu > -1$ and $\beta - \nu > -1$.
Given that $n$ patients were assigned to an arm and $x$ responses were observed, the conjugate posterior distribution of the response probability is again a Beta distribution, $\mathrm{Beta}(x + \nu + 1, n - x + \beta - \nu + 1)$ (2), where $\nu$ and $\beta$ are the prior parameters. Its mean is $\hat{p}_n = (x + \nu + 1)/(n + \beta + 2)$, and the posterior concentrates in a neighborhood of the true response probability as the sample size $n$ grows. In contrast to standard entropy measures, in the case of clinical trial design it was suggested to use weighted entropy (WE) criteria, since they explicitly express the interest in the treatment arms with the highest response probability. The information gain is defined as the difference between the weighted and unweighted entropy measures. By the principle of maximum entropy in the sequential trial, the next patient will be assigned to the arm one knows least about. Therefore, Mozgunov and Jaki (2019) proposed to use the minimization of the leading term of the asymptotic expansion of the information gain as the criterion for the allocation of patients, resulting in the asymptotic criteria for arm selection (3) and (4) for the Shannon and the Fisher information, which will be referred to as AS and AF, respectively. Notice that the normalized squared difference between the target value $\gamma$ and the estimated probability of response $\hat{p}_n$, that is, the term in $(\hat{p}_n - \gamma)^2$, explicitly represents the interest in "exploitation." The closer the estimated response probability is to the target one, the smaller the criterion will be, and the design will tend to choose the corresponding arm.
The second term, that is, the one in $(n + \beta + 2)$, depends positively on the number of observations $n$, and the built-in penalty parameter $\kappa$ represents the interest in "exploration." Hence, the higher the parameter $\kappa$, the greater the investigator's interest in exploration, as the design will then favor the arms with fewer observations.
In the simulation study we use a target value of $\gamma = 0.999$, that is, close to 1. In clinical trials, one can specify, for example, $\gamma = 0.9$ if the researcher is equally likely to choose an arm with response probability 0.8 and an arm with response probability 1. The choice of $\gamma$ depends on the event of interest to the researcher. It can be useful, for example, in a design with compound outcomes, such as responses without toxicity.
This approach is based on the principle of maximum entropy first suggested by Jaynes (1957), which states that the probability distribution which represents the current knowledge about the system is the one with the largest entropy. Hence, by the principle of maximum entropy in the sequential trial, the next patient should be assigned to the arm one knows least about. For more details regarding the idea and the derivation of the asymptotic weighted information-based criteria, see the Appendix.
In Kasianova et al. (2021), it was shown that the built-in parameter $\kappa$ of the weight function allows finding the desired trade-off between the statistical power and the number of treated patients in the context of small-population clinical trials with two treatment arms.
We use the deterministic response-adaptive sequential designs based on the asymptotic criteria suggested in Mozgunov and Jaki (2020). The procedure repeats until the total number of observations is attained and, at the end of the experiment, the target arm is selected as the one that minimizes the leading term of the asymptotic expansion, evaluated with $\kappa = 0.5$ for AS and $\kappa \approx 0$ for AF.
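In order to describe the prior distribution, two quantities are introduced: the strength of prior, which is positive, and the prior probability, which lies in $(0, 1)$. A prior with a given strength and prior probability is a Beta distribution whose first parameter is (strength of prior) $\times$ (prior probability) and whose second parameter is (strength of prior) $-$ (strength of prior) $\times$ (prior probability). In terms of the prior parameters $\nu$ and $\beta$ in (1), this corresponds to $\nu$ = (strength of prior) $\times$ (prior probability) $- 1$ and $\beta$ = (strength of prior) $- 2$.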
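The parameterization by strength of prior and prior probability can be illustrated with a minimal sketch of the Beta-Bernoulli update for a single arm. The function names are hypothetical and chosen for illustration only; the example values (strength 2, prior probability 0.99) are the ones used later in the simulation study, while the observed counts are made up.

```python
def prior_params(strength, prior_prob):
    """Beta parameters (a, b) implied by a strength of prior and a prior probability."""
    if strength <= 0 or not 0 < prior_prob < 1:
        raise ValueError("need strength > 0 and 0 < prior_prob < 1")
    a = strength * prior_prob
    b = strength - strength * prior_prob
    return a, b


def posterior_params(strength, prior_prob, n, x):
    """Beta posterior parameters after x responses among n patients on one arm."""
    if not 0 <= x <= n:
        raise ValueError("need 0 <= x <= n")
    a, b = prior_params(strength, prior_prob)
    # Conjugate update: responses add to a, non-responses add to b.
    return a + x, b + (n - x)


def posterior_mean(strength, prior_prob, n, x):
    """Posterior mean of the response probability."""
    a, b = posterior_params(strength, prior_prob, n, x)
    return a / (a + b)


# Illustrative (made-up) data: 3 responses among 10 patients.
print(prior_params(2, 0.99))
print(posterior_mean(2, 0.99, n=10, x=3))
```

With a strength of prior of 2, the prior contributes the weight of two pseudo-observations, so its influence on the posterior mean fades quickly as patients accrue.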
The role of the Beta prior corresponds to the choice of a starting point. Indeed, asymptotically, prior information is irrelevant. However, in small-sample trials the Beta prior can play a major role in the results of the procedure, since, when data are lacking, researchers can use their knowledge about the suggested drug by specifying the prior parameters.
In addition, one can consider the use of a "run-in" period, as this method is useful for RCT design: it allows collecting information about the treatment arms and adjusting the design accordingly (cf. Pablos-Méndez et al., 1998). In this paper, we do not use a "run-in" period, as the main goal of the study is to compare the design performance under different randomization schemes.
If we increase the strength of the prior, the design forces more equal allocation, at least in the beginning of the trial, which generally leads to an increase in power and a decrease in ENS. The same result can be obtained by increasing the number of observations in the run-in period (given that the total number of patients is fixed).

Randomization approaches
The use of randomization procedures provides more evidence-based and informative statistics for assessing the differences between new treatments and the control treatment. There are a number of different types of randomization depending on the goal a researcher is pursuing. For example, RAR methods are used in response-adaptive clinical trial designs. These trials are carried out in such a way that by the end of the study the largest number of patients is assigned to the most efficacious treatment arm. In RAR procedures, the probability of a patient being assigned to a particular arm changes dynamically based on the data obtained at each step of the sequential analysis.
There are many methods of RAR, for instance, randomized play-the-winner described in Rosenberger (1999), the maximum utility model considered by Graf et al. (2015), and so forth.
In this paper, the deterministic approach (i) is compared to its randomized analogs (ii-iv), listed below:
(i) Deterministic procedure based on weighted entropy (WE) criteria.
(ii) Naive approach with allocation probability proportional to the inverse of weighted entropy (IWE) criteria. For this design, the probability of assigning a new patient to a treatment arm at each step is proportional to the inverse of the information criteria (3, 4).
(iii) Forward-looking weighted entropy (FLWE): a batch-procedure approach proposed in Villar et al. (2015) that was originally applied to the designs based on GI, which is known for being asymptotically optimal in terms of maximizing ENS.
In more detail, consider a trial in which all patients, instead of being assigned to treatment arms one by one, are enrolled in batches of a fixed size, so that the number of stages equals the total sample size divided by the batch size. It is assumed that all the outcomes in a batch are observed immediately.
Exact allocation probabilities for each of the batches are calculated using formula (3) from Villar et al. (2015), where the conditional probabilities for a certain decision made under the GI rule are replaced by the conditional probabilities under the information criteria rule. Following the original paper, in order to reduce computational costs, a Monte Carlo algorithm with 100 simulations is used for estimating these probabilities.
To investigate the properties of the entropy-based designs, the batch size is set to 9, the same as in Villar et al. (2015). Note that one of the properties of this design is that, when the batch size equals 1, the procedure is equivalent to a deterministic one (the FLWE procedure is equivalent to WE, and FLGI to GI).
(iv) Procedure based on randomization of $\kappa$, or randomized weighted entropy (RWE) procedure: for this design, instead of fixing the value of the penalty parameter, $\kappa$ is drawn for each new patient either from a uniform or from a Beta distribution.
In more detail, $\kappa$ can be drawn from a uniform distribution, namely, $\kappa \sim U[0.5, 0.99]$ for AS or $\kappa \sim U[0.01, 0.99]$ for AF. Alternatively, in view of the results obtained in Kasianova et al. (2021) on the effect of $\kappa$ on design performance in terms of power and ENS, the parameters of the distribution can be chosen so that it is biased toward values that, for instance, shift the balance toward power (more details can be found in Section 4).
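The randomized penalty parameter step in (iv) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and `placeholder_criterion` is a made-up stand-in for the AS and AF criteria (3, 4), used only so that the example runs. Only the uniform ranges come from the text.

```python
import random

# Uniform ranges for the penalty parameter, as stated in the text.
UNIFORM_RANGES = {"AS": (0.5, 0.99), "AF": (0.01, 0.99)}


def draw_penalty(criterion, rng):
    """Draw a penalty parameter for one new patient."""
    low, high = UNIFORM_RANGES[criterion]
    return rng.uniform(low, high)


def placeholder_criterion(state, penalty):
    """Hypothetical stand-in, NOT criteria (3) or (4): smaller is better.

    state is (n, x): patients and responses observed so far on the arm.
    """
    n, x = state
    estimate = (x + 1) / (n + 2)
    return (1.0 - estimate) ** 2 * (n + 2) ** penalty


def allocate_rwe(arm_states, criterion_value, criterion, rng):
    """Draw a fresh penalty, then pick the arm that minimizes the criterion."""
    penalty = draw_penalty(criterion, rng)
    values = [criterion_value(state, penalty) for state in arm_states]
    best = min(range(len(values)), key=values.__getitem__)
    return best, penalty


rng = random.Random(1)
states = [(10, 3), (10, 6), (2, 1), (5, 1)]  # made-up (n, x) per arm
print(allocate_rwe(states, placeholder_criterion, "AF", rng))
```

Because the penalty is redrawn for every patient, the arm chosen by an otherwise deterministic rule varies between patients facing the same data, which is the source of randomness in this design.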
In addition, when comparing the performance of the designs based on different asymptotic criteria and the effect of the penalty parameter $\kappa$, the following notation will be used below: the last two letters in WE, IWE, FLWE, RWE are replaced by AS or AF in accordance with the asymptotic criterion used, followed by the value of the penalty parameter $\kappa$. Hence, FLAS 0.5 refers to the forward-looking procedure based on the AS criterion with $\kappa = 0.5$.

COMPARISON OF DESIGNS BASED ON DIFFERENT CONTEXT-DEPENDENT MEASURES
Below, the setting of the proposed simulation study is defined. Furthermore, the calibration of design parameters is described.

Setting
First of all, to conduct the experiment and to compare the performance of our novel method to the performance of the FLGI, the entropy-based design is extended to the case with several treatment arms, and the experiment is set up exactly as in Villar et al. (2015). Consider a trial with a binary outcome and no delay in response observation. There are four arms in the trial: a control treatment and three alternative treatments, with 0 being the index of the control arm and 1, 2, 3 the indexes of the alternative treatment arms. $\Theta_0$ and $\Theta_1$ are vectors of true response probabilities under the null and under the alternative, respectively.
The first setup, with the null hypothesis of equal response probabilities for treatment arms 0, ..., 3, that is, $\Theta_0 = (0.29, 0.29, 0.29, 0.29)$, is used for calibration of the design. The second setup, with true response probabilities for treatment arms 0, ..., 3 given by $\Theta_1 = (0.29, 0.458, 0.168, 0.24)$, is used to assess the performance of this design in comparison to alternative methods. The probabilities $\Theta_1$ are set equal to those estimated in the NeoSphere trial, described in Gianni et al. (2012), which was conducted to evaluate the efficacy of a combination of drugs for treating women with breast cancer. During that well-known trial, which was also considered in Villar et al. (2015), out of 417 eligible patients, 107 patients each were randomly assigned to the control group, group 1, and group 2, while the remaining 96 were assigned to group 3.
During the NeoSphere trial, the patients were randomly allocated between the arms with a fixed and equal randomization probability. The trial started with three treatment arms, and the fourth arm was added after 29 patients had been recruited to the study; after that, the patients were assigned with (1:1:1:1) randomization. This explains why 107 patients each were assigned to the control group, group 1, and group 2, and 96 to group 3. For our simulation studies, the sample size is fixed at 417, and the allocation is calculated for the control arm and all three alternative arms starting from the first patient.
The two main objectives of the experiment for this simulation study are to maximize the number of responses in the experimental sample and to make a statistically significant conclusion regarding the relative efficacy of the treatment arms.
There are several ways to test the null hypothesis with multiple arms. To compare our method to FLGI in the same manner as the original paper by Villar et al. (2015), the following procedure is used. To determine the best treatment arm, a pairwise Fisher test is conducted for treatment arms 1, 2, and 3 in relation to the control arm 0. To ensure that the familywise error rate is controlled at 5%, the Bonferroni correction is applied. Hence, for each individual Fisher test, the $p$-values are compared to $0.05/3$.
However, the Bonferroni-adjusted Fisher's test is conservative, since the same control arm is used for all of the comparisons. In addition, Fisher's exact test assumes that any allocation of the successes onto the two treatment arms is possible. However, this assumption can be violated in a response-adaptive design scheme, where an association between the number of successes and the sample size of an arm is induced by design (and in fact this is observed in Section 4.5 for the considered designs).
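The pairwise testing step can be sketched with the standard library only. This is a minimal illustration under assumptions: the two-sided form of Fisher's exact test is assumed here (the text does not state the sidedness), the function names are hypothetical, and the counts in the example are made up.

```python
from math import comb


def fisher_exact_two_sided(s1, n1, s2, n2):
    """Two-sided Fisher exact p-value for s1/n1 versus s2/n2 successes."""
    total_s = s1 + s2
    denom = comb(n1 + n2, total_s)

    def prob(k):
        # Hypergeometric probability of k successes on the first arm.
        return comb(n1, k) * comb(n2, total_s - k) / denom

    lo = max(0, total_s - n2)
    hi = min(n1, total_s)
    p_obs = prob(s1)
    # Sum the probabilities of all tables at most as likely as the observed one.
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def bonferroni_rejections(control, arms, alpha=0.05):
    """Compare each arm to control at level alpha / (number of comparisons)."""
    s0, n0 = control
    threshold = alpha / len(arms)
    return [fisher_exact_two_sided(s, n, s0, n0) < threshold for s, n in arms]


# Made-up counts (successes, patients): control first, then three arms.
print(bonferroni_rejections((10, 40), [(60, 150), (5, 30), (8, 35)]))
```

With three comparisons the per-test threshold is $0.05/3$, matching the rule stated above.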
That is why, in order to make a fair comparison, the type I error rate is controlled at exactly 5%, that is, the procedure is adjusted manually via a cutoff parameter as in Villar et al. (2015). Hence, while this issue is not tackled in the construction of the test, it is taken into account at the calibration step, when the value of the cutoff parameter is chosen. The cutoff parameter is the confidence level in a Fisher exact test conducted at the end of the experiment. Its value is chosen via a binary search algorithm such that the type I error rate is slightly less than 0.05. For entropy-based designs with different values of the penalty parameter $\kappa$, the cutoff parameter is calibrated individually. Importantly, the same procedure is used for all competing designs, which allows a fair comparison in terms of power.
Here and below, treatment arm 0 is referred to as the control arm. However, for now, in terms of the design specifics, the arms are considered equally important during the experiment, in the sense that one does not actually ensure that at least a fixed fraction of patients is assigned to the control arm. In further research, the design can be extended to the case of a controlled experiment.
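The calibration step can be sketched as a bisection over the per-test significance threshold. This is a sketch under assumptions: it parameterizes the cutoff as a significance threshold rather than a confidence level, it assumes the estimated type I error is nondecreasing in that threshold, and `type_one_error` is a hypothetical callable that, in practice, would run the null-scenario simulations for a given threshold.

```python
def calibrate_cutoff(type_one_error, target=0.05, lo=0.0, hi=1.0, tol=1e-4):
    """Largest threshold (up to tol) whose type I error does not exceed target.

    Assumes type_one_error is nondecreasing and type_one_error(lo) <= target.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if type_one_error(mid) <= target:
            lo = mid  # still at or below the target: move the threshold up
        else:
            hi = mid  # too many false rejections: move the threshold down
    return lo


# Toy monotone error curve, for illustration only (not simulation output).
def toy_error(threshold):
    return min(1.0, 2.5 * threshold)


print(calibrate_cutoff(toy_error))
```

Returning the lower end of the final bracket keeps the calibrated error rate at or slightly below the target, in line with the requirement that the type I error rate be slightly less than 0.05.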
Hence, the operating characteristics of interest are defined as follows:
1. Type I error rate. The proportion of times $H_0$ is incorrectly rejected under the null hypothesis scenario $\Theta_0 = (0.29, 0.29, 0.29, 0.29)$. The null hypothesis for a pairwise Fisher test will be referred to as $H_{0k}: p_0 = p_k$, where $k$ is the index of an alternative treatment arm. Since a pairwise test is used, the probability of rejecting any of $H_{01}$, $H_{02}$, or $H_{03}$ is calculated. The type I error rate should be controlled at the 5% level as in Villar et al. (2015).
2. Power. The proportion of times $H_0$ is correctly rejected for the arm with the highest success probability. Under the scenario $\Theta_1 = (0.29, 0.458, 0.168, 0.24)$, it is the proportion of times $H_{01}$ is rejected.
3. ENS, estimated as the observed number of successes, with standard error (s.e.).
4. Probability of a correct allocation (PCA) to the superior arm within a trial, that is, the proportion of patients on the superior treatment.
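The characteristics listed above can be computed from simulated trials along the following lines. The record layout (`patients`, `successes`, `rejected_best`) and the function name are hypothetical, the standard error is assumed to be the standard error of the mean over replications, and the two records in the example are made up.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(trials, best_arm):
    """Estimate ENS (with s.e.), PCA, and power from simulated trial records.

    Each record is a dict with per-arm 'patients' and 'successes' lists and a
    boolean 'rejected_best' (was the null for the best arm rejected?).
    """
    totals = [sum(t["successes"]) for t in trials]
    ens = mean(totals)
    ens_se = stdev(totals) / sqrt(len(trials))
    pca = mean(t["patients"][best_arm] / sum(t["patients"]) for t in trials)
    power = mean(1.0 if t["rejected_best"] else 0.0 for t in trials)
    return {"ENS": ens, "ENS_se": ens_se, "PCA": pca, "power": power}


# Two made-up replications with four arms; arm 1 is the superior arm.
trials = [
    {"patients": [10, 30, 5, 5], "successes": [3, 14, 1, 1], "rejected_best": True},
    {"patients": [20, 20, 5, 5], "successes": [6, 9, 1, 1], "rejected_best": False},
]
print(summarize(trials, best_arm=1))
```

The type I error rate is obtained in the same way as power, but from replications simulated under the null scenario and counting a rejection of any of the pairwise hypotheses.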
PCA and power cannot be maximized simultaneously: if PCA grows, the power decreases, and vice versa. The same holds for power and ENS: in a deterministic procedure, the lower the ENS, the higher the power. Note that ENS and PCA are similar characteristics, as they describe how often the best of the four arms is given to a patient within the experimental sample.
In line with the original paper by Villar et al. (2015), ENS will also be used as the main performance evaluation criterion. However, in a study with different sample sizes or several alternative response probabilities for treatment arms, PCA is the preferable characteristic, since, unlike ENS, PCA does not depend on the probability of success and the sample size.
The goal of using different randomization strategies is to decrease possible biases and to find a way to improve the power-ENS or power-PCA balance. First, the AS 0.5 and AF 0.001 designs will be considered in order to observe the behavior of the procedure without the effect of $\kappa$. Then, to investigate the effect of randomized $\kappa$, other values of $\kappa$ will also be considered for all design types.
Below, the details on the calibration of the designs in an extensive simulation study are given. Computer simulations for each scenario involved 10,000 trial replications. The target response probability $\gamma = 0.999$ is set as in Kasianova et al. (2021). To satisfy the principle of clinical equipoise described in Djulbegovic (2009), the same prior probability, 0.99, and the same strength of prior, 2, are set for each arm. In this paper, the strength of prior is fixed, as its influence on design characteristics was already investigated in Kasianova et al. (2021).

PERFORMANCE FOR DIFFERENT RANDOMIZATION PROCEDURES
Below, the operating characteristics of the designs with different penalty parameter values and randomization methods are investigated. The information criteria-based designs are compared to GI, FLGI, and FR designs based on a simulation study with 10,000 replications. First, the properties of deterministic entropy-based designs are considered in order to identify the values of the penalty parameter $\kappa$ that may be of interest to the researcher. Second, within each group of designs defined by the type of randomization procedure, namely, (a) naive approach, (b) forward-looking procedures, and (c) randomized penalty parameter $\kappa$, several designs that outperform their comparators in terms of ENS or power are chosen for a more rigorous investigation in terms of the number of patients, estimated efficacy, and allocation probability dynamics during the trial.

Deterministic procedure
Given that $\Theta_1 = (0.29, 0.458, 0.168, 0.24)$ and all of the procedures are calibrated such that the type I error rate is controlled at exactly the 5% level, the deterministic designs without the effect of the penalty parameter $\kappa$, namely, AS 0.5 and AF 0.01, perform slightly better in terms of ENS than the GI procedure. As seen from Figure 1, for these designs ENS is equal to 181.2 and 180.9, respectively, whereas for GI this value is around 180.2. However, in terms of power, GI outperforms these designs by 0.023 and 0.01, respectively. For all of the named designs, such low power indicates that the design quickly determines which treatment arm is superior and assigns most of the patients to it.
One of the designs that will be considered further is the randomized $\kappa$ design, where $\kappa$ is uniformly distributed. Two settings for the parameters of the uniform distribution will be considered: all possible values of $\kappa$, and boundaries for $\kappa$ selected based on their relative improvements in terms of the characteristics of interest. Overall, considering the effect of the penalty parameter on the design performance, as expected, the closer $\kappa$ is to 0.5 for the AS, the higher the ENS and the lower the power. For the AF, with the increase of the penalty parameter $\kappa$ the power also grows slightly. One of the reasons for this behavior may be that only marginal power is reported. However, for the AS, the designs with $\kappa > 0.7$ perform noticeably worse in terms of ENS than AS 0.7, without any significant improvement in terms of power. The same can be stated for the AF-based designs with $\kappa > 0.3$ in comparison to AF 0.3. Hence, instead of giving equal weights to the designs with different values of $\kappa$, we propose to consider only $\kappa \in [0.5, 0.7]$ for the AS and $\kappa \in [0.01, 0.3]$ for the AF, since these values of $\kappa$ can provide improved balances in terms of the characteristics of interest.

IWE (naive approach)
The operating characteristics for the IWE approach, where the probability of assigning a new patient to a treatment arm at each step is proportional to the inverse of the information criterion, are given in Figure 2.
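The naive allocation rule can be written in a few lines. This is a minimal sketch: the function names are hypothetical, and the criterion values in the example are made up rather than computed from criteria (3, 4).

```python
import random


def iwe_probabilities(criteria):
    """Allocation probabilities proportional to the inverse of each arm's criterion."""
    if any(c <= 0 for c in criteria):
        raise ValueError("criterion values must be positive")
    inverse = [1.0 / c for c in criteria]
    total = sum(inverse)
    return [w / total for w in inverse]


def iwe_allocate(criteria, rng):
    """Randomly assign the next patient using the inverse-criterion weights."""
    probs = iwe_probabilities(criteria)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Made-up criterion values for four arms; a smaller value means a more attractive arm.
criteria = [1.0, 2.0, 4.0, 4.0]
print(iwe_probabilities(criteria))  # [0.5, 0.25, 0.125, 0.125]
print(iwe_allocate(criteria, random.Random(0)))
```

Every arm keeps a strictly positive allocation probability as long as its criterion is finite, which is why this scheme randomizes far more than the deterministic rule.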
In terms of ENS, the effect of $\kappa$ is as anticipated: the lower $\kappa$ is, the higher the ENS. In particular, the difference in terms of ENS between IAS 0.5 and IAS 0.9 is 6.3, and between IAF 0.01 and IAF 0.9 it is 12.2. Again, the behavior observed for the deterministic AF-based designs is present: with the growth of $\kappa$, power also slightly decreases (with power ranging from 0.62 to 0.653 for IAS and from 0.623 to 0.673 for IAF). Hence, for this particular setting $\Theta_1$, it may happen that the designs with a balance shifted toward ENS increase the number of observations on the most efficacious arm without a significant decrease in observations for the other arms, resulting in the growth of marginal power.
The design chosen for a closer investigation, based on this analysis, is IAF 0.01, with ENS equal to 142.9 and power equal to 0.673, which outperforms the FR design in terms of both characteristics.

Forward-looking procedure
Below, in Figure 3, the forward-looking designs with different parameters are considered. The results of applying the forward-looking scheme to both entropy-based and GI-based designs imply that block randomization does not randomize enough. Hence, we propose adjustments to the forward-looking procedure that can improve its performance in terms of power. Also, the effect of the additional penalty parameter on the forward-looking designs is investigated.
Given the setting for the forward-looking procedure described in Villar et al. (2015), the parameter that can be altered for both entropy-based and GI-based designs is the number of patients in a block. With other variables being fixed, the power grows with the block size. However, to obtain a substantial increase in power, the block size should be at least doubled, which may be impossible in a real clinical trial. Hence, in further analysis, alternative ways to increase power are proposed, and the actual block size for all of the forward-looking procedures is fixed at 9 as in Villar et al. (2015).
Since the forward-looking procedure is applied to the entropy-based design, an additional parameter to alter the design characteristics is available. From Figure 3, it can be observed that for FLAF 0.15 and FLAF 0.3, in comparison to FLAF 0.01, the power increases from 0.305 to 0.447 and 0.658, respectively, whereas ENS decreases from 179.8 to 173.5 and 160.8, respectively. The same effect can be seen when comparing FLAS 0.65 and FLAS 0.7 to FLAS 0.5.
The proposed adjustment to the forward-looking procedure is as follows. Consider the procedure with the block size fixed at 9 and no effect of the penalty parameter, that is, $\kappa = 0.5$ for AS and $\kappa = 0.01$ for AF. When calculating allocation probabilities for each block, one can assume that the block size is larger than it actually is, which leads to a higher variance of the allocation distribution and hence might increase power. This parameter will be referred to as the hypothetical block size. Alternatively, for the calculation of allocation probabilities in the FLWE procedures, the value of the penalty parameter $\kappa$ can also differ from the one fixed throughout the experiment itself. This parameter will be referred to as the hypothetical penalty parameter $\kappa$.
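The hypothetical block size adjustment can be sketched as a Monte Carlo estimate of block allocation probabilities. This is an illustration under assumptions rather than formula (3) of Villar et al. (2015) verbatim: future outcomes are simulated from the posterior mean of each arm, arm states are Beta parameter pairs, all function names are hypothetical, and `greedy_rule` is a placeholder for the deterministic rule (it is neither the GI rule nor criteria (3, 4)).

```python
import random


def greedy_rule(states):
    """Placeholder deterministic rule: the arm with the largest posterior mean."""
    means = [a / (a + b) for a, b in states]
    return max(range(len(states)), key=means.__getitem__)


def block_allocation_probabilities(states, rule, hypothetical_block_size, n_sims, rng):
    """Share of simulated allocations per arm over a look-ahead block."""
    counts = [0] * len(states)
    for _ in range(n_sims):
        sim = [list(s) for s in states]  # fresh copy of the Beta parameters
        for _ in range(hypothetical_block_size):
            arm = rule(sim)
            counts[arm] += 1
            a, b = sim[arm]
            # Simulate the outcome from the current posterior mean and update.
            if rng.random() < a / (a + b):
                sim[arm][0] += 1
            else:
                sim[arm][1] += 1
    total = n_sims * hypothetical_block_size
    return [c / total for c in counts]


def allocate_block(states, rule, block_size, hypothetical_block_size, n_sims, rng):
    """Randomize the actual block using probabilities from the longer look-ahead."""
    probs = block_allocation_probabilities(
        states, rule, hypothetical_block_size, n_sims, rng
    )
    return rng.choices(range(len(states)), weights=probs, k=block_size)


rng = random.Random(0)
states = [(1.0, 1.0)] * 4  # made-up Beta parameters for four arms
print(allocate_block(states, greedy_rule, 9, 72, 100, rng))
```

Setting the hypothetical block size to 1 makes the estimated probabilities degenerate, which reproduces the deterministic rule; a longer look-ahead lets the simulated paths diverge and spreads the probability mass over more arms.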
From Figure 3, the effect of increasing the hypothetical block size is visible: increasing the hypothetical block size from 9 to 72 increases the power of FLAS 0.5 by 0.042, of FLAF 0.01 by 0.06, and of FLGI by 0.046. Doubling the hypothetical block size again does not bring a notable improvement in terms of power, though it doubles the computational costs, which is one of the problems of that design. Note that the effect on ENS is different for the named designs: the drop in ENS is 0.07 for FLAS 0.5, 0.49 for FLAF 0.01, and 0.78 for FLGI. There is almost no difference in terms of ENS for FLAS 0.5, meaning that the improvement in power from this ad hoc adjustment can be achieved at almost zero cost in terms of ENS. However, for FLGI the adjustment decreases ENS by almost one patient.
When calculating allocation probabilities, to increase the variance for allocation distribution the hypothetical  can also be set higher than the value of  used in the experiment.From Figure 3, it can be seen that choosing relatively big values for hypothetical , that is, 0.5 and higher for the 0.01or 0.75 and higher for the 0.5,leads to a sharp increase in terms of power and decrease in terms of ENS leading to the performance comparable with IWE desings (see Figure 2), for example, for 0.01with hypothetical  = 0.5 power increases to 0.688 and ENS decreases to 148.5.
A small increase in the hypothetical penalty parameter, that is, to 0.1 for the AF-based design or to 0.55 for the AS-based design, leads to a bigger decrease in ENS with no additional gain in power in comparison to the FLWE with increased hypothetical block size. This result shows that FLWE designs with increased hypothetical block size perform better if one is interested in designs with a balance shifted toward ENS. Therefore, the designs with an increased hypothetical penalty parameter will not be considered further.
Finally, the entropy-based design chosen for further investigation is the AF-based forward-looking design with penalty parameter 0.01 and hypothetical block size 72, as it provides the largest increase in power with a moderate decrease in ENS. It will be compared to the GI-based design with the same hypothetical block size. In addition, the forward-looking design with penalty parameter 0.3 will be considered for closer investigation as the design with the highest power.

Randomization of the penalty parameter
First, for the randomized penalty parameter approach (RWE), the penalty parameter is drawn from a uniform distribution: on [0.5, 0.99] for the randomized AS (RAS) and on [0.01, 0.99] for the randomized AF (RAF). For both the RAS and RAF, the power is higher than that of the FR design, at 0.659 and 0.673, respectively. In terms of ENS, both RAS and RAF perform notably better than the FR, with 152.4 and 144.9, respectively (see Figure 4). When the parameters of the uniform distribution are calibrated in accordance with the results obtained for the deterministic procedures, the penalty parameter is drawn uniformly from [0.5, 0.7] for the AS and from [0.01, 0.3] for the AF. For the RAF, the power is then comparable to that of the FR design, at 0.628, while ENS improves further to 166.2. For the RAS, the power is 0.535 with ENS equal to 169.1. In terms of ENS, all designs with a uniformly randomized penalty parameter perform better than any of the designs from the naive randomization method (IWE).
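The uniform draw of the penalty parameter can be illustrated with a minimal sketch. It assumes that a fresh value is drawn before each allocation decision, which the text does not state explicitly; the dictionary `CALIBRATED_RANGES` and the function `draw_penalty` are hypothetical names introduced only for illustration, and the allocation criterion itself is not reproduced here.

```python
import random

# Calibrated ranges quoted in the text; the dictionary and its keys are
# illustrative names, not the authors' code.
CALIBRATED_RANGES = {"RAS": (0.5, 0.7), "RAF": (0.01, 0.3)}


def draw_penalty(design, rng):
    """Draw one penalty parameter value uniformly from the design's range."""
    low, high = CALIBRATED_RANGES[design]
    return rng.uniform(low, high)


rng = random.Random(2023)  # fixed seed so the sketch is reproducible
raf_draws = [draw_penalty("RAF", rng) for _ in range(10_000)]
raf_mean = sum(raf_draws) / len(raf_draws)  # close to (0.01 + 0.3) / 2
```

Each drawn value would then be plugged into the entropy criterion in place of a fixed penalty parameter, so the only extra cost over the deterministic procedure is one random number per draw.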
Furthermore, consider the randomized penalty parameter procedure with a rescaled Beta distribution. For both RAS and RAF, altering the mean of the rescaled distribution yields a simple and computationally cheap randomized procedure with different balances between power and ENS. The effect of the mean on both power and ENS is predictable. For example, for the RAF, increasing the mean from 0.1 to 0.9 increases power from 0.481 to 0.666, while ENS decreases from 168.4 to 135.9. The same dynamics are observed for the RAS.
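One way to obtain a rescaled Beta draw with a prescribed mean is sketched below. The support [0.01, 0.99] is borrowed from the uniform RAF case and is an assumption here, as is the `concentration` argument (the sum of the two Beta shape parameters); the text specifies only the mean, so this is one possible parametrization rather than the one used in the simulations.

```python
import random


def rescaled_beta_draw(mean, low, high, concentration, rng):
    """Draw from a Beta distribution rescaled to [low, high] with the given mean.

    `concentration` (the sum of the two Beta shape parameters) is an assumed
    tuning knob; the source does not state how the spread is chosen.
    """
    frac = (mean - low) / (high - low)  # target mean on the unit interval
    if not 0.0 < frac < 1.0:
        raise ValueError("mean must lie strictly inside (low, high)")
    a = frac * concentration
    b = (1.0 - frac) * concentration
    return low + (high - low) * rng.betavariate(a, b)


rng = random.Random(7)  # fixed seed so the sketch is reproducible
draws = [rescaled_beta_draw(0.2, 0.01, 0.99, 5.0, rng) for _ in range(20_000)]
sample_mean = sum(draws) / len(draws)  # should be close to the target mean 0.2
```

Shifting `mean` toward the upper end of the support corresponds to the move toward higher power and lower ENS reported above.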
The performance of simple randomization of the penalty parameter is encouraging and shows the potential of investigating more advanced procedures based on a randomized penalty parameter, such as a dynamically varying one. The RAF with the penalty parameter drawn uniformly from [0.01, 0.3] is chosen for further investigation.

Design performance dynamics during the trial
Above, the following randomized entropy-based designs were chosen for closer investigation: the inverse entropy criteria design with penalty parameter 0.01; the forward-looking procedures with penalty parameter 0.01 and hypothetical block size 72, and with penalty parameter 0.3; and the randomized penalty parameter design RAF with the penalty parameter drawn uniformly from [0.01, 0.3] (blue line). For the sake of comparison, the deterministic procedure with penalty parameter 0.01 and the alternative GI-based approach with hypothetical block size 72 are also considered. These designs are investigated in terms of the number of patients, estimated efficacy, and allocation probability dynamics during the trial.
First, to investigate the behavior of the randomized response-adaptive entropy-based designs, the average number of patients assigned to each arm is calculated. From Figure 5, it can be observed that the number of observations on the superior treatment arm (arm 1) increases almost linearly for each design type. For any patient index, the number of patients assigned to the superior treatment arm under the forward-looking designs (entropy-based with penalty parameter 0.01 and GI-based) is almost as high as under the deterministic design.
Second, consider how the estimated response probability of each treatment arm changes as the experiment continues. In Figure 6, it can be seen that the estimate of the response probability for the superior treatment arm approaches the true value of 0.458, at a slightly different speed depending on the randomization type. The difference between the true and estimated efficacy values on the inferior arms also depends on the randomization type, with the largest difference for the deterministic procedure (penalty parameter 0.01) and the smallest for the naive randomization approach (penalty parameter 0.01).
Note that the growth rate of the number of patients on the inferior arms quickly decays for the deterministic and forward-looking procedures (GI-based and entropy-based with penalty parameter 0.01; see Figure 7). As a result, the estimated response probabilities do not converge to the true values: the inferior arms, that is, arms 0, 2, and 3, are considered so inferior that their allocation probability quickly approaches zero.
Third, an important characteristic of a randomized procedure is the probability distribution of patients' allocation. A randomized procedure should ensure that the allocation probability for a superior arm does not converge to 1 and, vice versa, that the allocation probabilities for inferior arms do not converge to zero. As can be seen in Figure 7, the forward-looking randomization procedures do not randomize the algorithm enough, since the allocation probabilities for the superior and inferior arms converge to 1 and 0, respectively, at almost the same speed as for the deterministic design.
On the other hand, for the naive randomization approach, the allocation probability of the superior arm does not exceed 0.51 for any new patient. For the randomized penalty parameter approach and the forward-looking design with penalty parameter 0.3, the resulting allocation probabilities lie between these two extremes.
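The naive randomization idea, in which the inverse of the entropy criterion serves as an allocation weight, can be sketched as follows. The criterion values are made-up numbers, and the sketch assumes they are strictly positive and that smaller values are preferred (as in the deterministic rule, which picks the minimizing arm); `inverse_criterion_probabilities` is a hypothetical helper, not the authors' implementation.

```python
import random


def inverse_criterion_probabilities(criteria):
    """Turn per-arm criterion values into allocation probabilities.

    Weights are 1 / criterion, so the arm with the smallest criterion gets
    the largest probability. Assumes all criterion values are positive.
    """
    if any(c <= 0 for c in criteria):
        raise ValueError("criterion values must be strictly positive")
    weights = [1.0 / c for c in criteria]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical criterion values for arms 0..3 (illustration only).
probs = inverse_criterion_probabilities([2.0, 1.0, 4.0, 4.0])

# Randomly allocate the next patient according to these probabilities.
rng = random.Random(0)
next_arm = rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Because every arm keeps a nonzero weight, no allocation probability can reach 0 or 1, which is the property contrasted above with the forward-looking procedures.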
Another important feature of a randomized procedure is the variability of the allocation probability throughout the experiment. For the deterministic procedure, allocation probabilities do not vary as much as for the other approaches, especially for the superior treatment arm, causing an undesirable situation in which each new patient is assigned to the superior treatment with a higher probability than the previous patient. The procedure is thus unfair, by design, to the earlier patients. One of the reasons for introducing randomization into the deterministic procedures is to ensure that this does not happen.
From Figure 7, it is clear that the naive randomization approach (penalty parameter 0.01), the forward-looking procedure with increased penalty parameter 0.3, and the randomized penalty parameter approach RAF provide the means to introduce variability in patients' allocation. However, the forward-looking procedures (GI-based and entropy-based with penalty parameter 0.01) suffer from the fact that, for the superior arm, a patient in each new batch receives the better treatment with higher probability (starting from the third or fourth batch).
FIGURE 6 Estimated response probability p for each treatment arm in {0, 1, 2, 3} for the 0.01 (black line), 0.01 (red line), 0.01 and GI-based designs with hypothetical block size 72 (green and yellow line, respectively), 0.3 (violet line), and RAF with the penalty parameter drawn uniformly from [0.01, 0.3] (blue line). The lines' starting points correspond to the first observation number such that, in all 10,000 iterations, at least one patient has received the treatment. Note that the first arm (arm 1) is superior, with the true response probabilities for each arm being Θ 1 = (0.29, 0.458, 0.168, 0.24), which are denoted by horizontal dashed lines.

Design performance under other scenarios
To determine the effect of different design parameters, we have chosen several designs that led to a good balance in the operating characteristics, namely: the 0.01 design, the 0.01 and GI-based designs with hypothetical block sizes 9 and 72, the 0.2 design with a Beta-distributed penalty parameter with mean 0.2, and RAF with the penalty parameter drawn uniformly from [0.01, 0.3]. We also considered three deterministic designs, including the 0.01 design, for comparison. Each of the designs is calibrated so that the type I error is controlled exactly at 5%. To determine the effect of sample size, we have additionally considered sample sizes 100, 200, 300, and 417 with one control and three experimental arms with Θ 1 = (0.29, 0.458, 0.168, 0.24) (see Figure 8).
The effect of sample size on PCA can be observed in the right subplot of Figure 8. As the sample size increases, the PCA gradually increases for all but one of the designs. Note that the difference in PCA between the designs with a balance shifted toward PCA (or ENS) slightly decreases: as the sample size grows from 100 to 417, the difference in PCA between the 0.01 design and the GI-based design with hypothetical block size 72 decreases from 0.132 to 0.035. By contrast, the difference in PCA between the designs with the balance shifted toward power, that is, 0.01 and 0.2, is more stable, with an average of 0.178. Finally, the difference in PCA between designs with the balance shifted toward PCA and those with the balance shifted toward power grows with the sample size: as the sample size grows from 100 to 417, the absolute difference in PCA between RAF with a uniformly distributed penalty parameter and the GI-based design with hypothetical block size 72 increases from 0.004 to 0.15.
FIGURE 7 Average allocation probabilities ā for each treatment arm in {0, 1, 2, 3} for the 0.01 (black line), 0.01 (red line), 0.01 and GI-based designs with hypothetical block size 72 (green and yellow line, respectively), 0.3 (violet line), and RAF with the penalty parameter drawn uniformly from [0.01, 0.3] (blue line). Note that the first arm (arm 1) is superior, with the true response probabilities for each arm being Θ 1 = (0.29, 0.458, 0.168, 0.24).
The effect of sample size on power can be observed in the left subplot of Figure 8. For all of the designs except the deterministic ones, marginal power increases with the sample size. In particular, for small samples the marginal power of the deterministic designs is greater than that of the designs shifted toward power. This can be explained by the fact that our analysis considers only the marginal power for testing the hypothesis of equal response probabilities on the control arm (arm 0) and arm 1, which means that for small sample sizes the designs shifted toward power force a more equal allocation between arms, leading to fewer observations on the control and winning treatment arms in comparison to designs with the balance shifted toward PCA. This effect on deterministic designs is investigated in more detail in the Supplementary Materials. In addition, the same observation can be made when considering the effect of the strength of the prior and the number of treatment arms on the design characteristics.
To investigate the effect of other success probabilities, the following setting was chosen: one control and two experimental arms with Θ = (0.3, p1, p2), where p1 and p2 each take values in {0.1, 0.3, 0.5, 0.7, 0.9}. The sample size was fixed at 200. The resulting averaged operating characteristics can be seen in Figure 9.
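The scenario grid described above can be enumerated in a few lines. The sketch assumes that every combination of the two experimental-arm probabilities is used, giving 25 scenarios; the text does not state this explicitly, and the names `CONTROL`, `GRID`, and `scenarios` are illustrative.

```python
from itertools import product

CONTROL = 0.3                    # response probability on the control arm
GRID = [0.1, 0.3, 0.5, 0.7, 0.9]  # candidate values for each experimental arm

# Assumption: the full Cartesian product of the grid is simulated.
scenarios = [(CONTROL, p1, p2) for p1, p2 in product(GRID, repeat=2)]
```

Averaging power and PCA over such a list is what produces one point per design in a summary plot of the kind referred to above.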
From Figure 9, we can distinguish two designs that have the balance shifted toward power from all of the other designs, which have the balance shifted toward PCA. For each pair of designs, with one exception involving the 0.01 design, an increase in power comes with a drop in PCA, and vice versa. In particular, for that pair, the power of the 0.01 design is greater by 0.04, while the difference in PCA is 0.23. The effect of different success probabilities on the operating characteristics for each specific scenario can be found in the Supplementary Materials.

DISCUSSION
This paper extends the idea of using response-adaptive designs for clinical trials based on the context-dependent information measure introduced in Mozgunov and Jaki (2020). Following the results achieved in Kasianova et al. (2021), the design is generalized to a multiarm setting and randomization is introduced into a deterministic procedure. The analysis of the operating characteristics of the randomized response-adaptive procedures based on information criteria allows us to explore further the problem of balancing two competing goals, namely, power and ENS.
In this paper, three randomization approaches were considered: (a) naive randomization using the inverse of entropy criteria, (b) forward-looking procedures, and (c) the randomized penalty parameter approach. The randomized entropy-based designs are compared to the randomized GI proposed by Villar et al. (2015) and to FR.
Based on the analysis of the dynamics of the allocation probabilities during the trial, it was found that the forward-looking procedure introduced in Villar et al. (2015) does not randomize the algorithm enough, even when the proposed adjustment to increase the power of the experiment is applied. However, the proposed improvement, namely, increasing the hypothetical block size used for calculating allocation probabilities for a batch in a forward-looking procedure, allows for an increase in power at almost no cost in terms of ENS for the entropy-based designs.
The proposed randomization approach specific to the entropy-based design, namely, the randomized penalty parameter method, provides a simple and computationally cheap randomized procedure that allows achieving different balances between power and ENS. In particular, when the penalty parameter follows a uniform distribution with parameters calibrated so that designs with higher ENS are preferred, the design allows for an increase of around 38.8% in ENS in comparison to the FR, with almost no loss in power.
The randomized penalty parameter approach with a rescaled Beta distribution is an intuitive way of tweaking the design parameters; that is, it is possible to achieve different power-ENS balances in a manner similar to the deterministic design described in Kasianova et al. (2021). The performance of simple randomization of the penalty parameter shows the potential for investigating more advanced procedures based on a randomized penalty parameter; for example, other distributions or a dynamically varying penalty parameter can be investigated.
In this paper, the control arm was treated in the same way as the other treatment arms. In the next stages of this work, controlled procedures will be considered, that is, the design will be constructed so that at least a fixed fraction, between 0 and 1, of patients is always assigned to the control arm.
Indeed, many RAR designs do not have a fixed sample size. The entropy-based designs that we have suggested allow the penalty parameter to be adjusted dynamically throughout the experiment so that the balance is shifted toward power (if the design sticks to a roughly uniform distribution among the arms) or toward ENS (if the design sticks to some specific arm). For example, when enough information has been collected on each of the treatment arms, the penalty parameter can be decreased so that the probability of assigning a patient to a superior arm increases. This is currently a work in progress.
Furthermore, a subject of future research is the problem of delayed response. One of the assumptions of the proposed designs is that the patients' responses are observed immediately. However, there are many settings in which this assumption does not hold and there is a delay in evaluating the response. If the decision is made before the outcome for the latest patient is observed, then the strategy may be suboptimal. However, the idea of assigning a new patient to a different treatment arm (chosen at random, or the second-best given the observed information) can be considered unethical. Also, the focus on binary responses is an oversimplification and does not account for the need to use nonbinary endpoints and continuous responses.

CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available in the supplementary material of this article.

OPEN RESEARCH BADGES
This article has earned an Open Data badge for making publicly available the digitally-shareable data necessary to reproduce the reported results. The data is available in the Supporting Information section. This article has earned an open data badge "Reproducible Research" for making publicly available the code necessary to reproduce the reported results. The results reported in this article could fully be reproduced.

REFERENCES
Barker, A. D., Sigman, C. C., Kelloff, G. J., Hylton, N. M., Berry, D. A., & Esserman, L. J. (2009). I-SPY 2: An adaptive breast cancer trial design in the setting of neoadjuvant chemotherapy. American Society for Clinical Pharmacology and Therapeutics, 86, 97-100.

Weighted differential entropy (WDE) was proposed in Mozgunov and Jaki (2019, 2020). Formally, WDE differs from standard Shannon differential entropy by a positive weight function mapping ℝ to ℝ+. The weight function allows the algorithm to take into account the fact that some outcomes are more desirable than others and ensures that during sequential trials the design tends to explore more efficacious treatment arms.
In the case of a binary trial, the weight function takes a Beta form, see (A1), in which Λ(⋅) is a constant satisfying the normalization condition and the remaining quantities are the unknown true response probability, the target response probability, the number of successes, and the penalty parameter. The weight function emphasizes treatment arms with an estimated response probability close to the target response probability, here 0.999.
In Kasianova et al. (2021), it was shown that the built-in penalty parameter of the weight function allows finding the desired trade-off between statistical power and the number of treated patients in the context of small-population clinical trials with two treatment arms.
Initially, in Kasianova et al. (2021) the exact criteria were considered. The weight function in formula (5) makes it possible to tackle the trade-off between power and balance explicitly through the penalty parameter, as can be seen in the asymptotic criteria. The asymptotic criteria were then suggested, since the difference in design characteristics is minor and the formulas for the asymptotic criteria have an intuitive explanation. For both exact and asymptotic criteria, the effect of an increase in the penalty parameter is generally the same, that is, it results in an increase in power and a decrease in the probability of a correct allocation (PCA).
The properties of response-adaptive procedures based on other information measures, namely, the Renyi, Tsallis, and Fisher information criteria, were investigated in a simulation study in Kasianova et al. (2021). It was shown that designs based on the Shannon and Fisher information criteria provide a better power-PCA balance in comparison to the Tsallis and Renyi criteria. Hence, in this paper, only designs based on Fisher and Shannon information criteria are thoroughly discussed.
Given that the prior knowledge about treatment arms is given in the form of a Beta prior (1), the posterior probability distribution function of the response probability takes the form of a Beta-Bernoulli distribution (2). Note that, given the weight function (A1), the estimated response probability satisfies
To select an arm to which a new patient is assigned, one needs to specify a selection criterion. Following the maximum entropy principle, see MacKay (2003), the "exact" entropy criteria for the allocation of patients are defined as
where the quantities involved are the Fisher information, the Shannon information ℎ, and their weighted analogs.
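As a reminder of the conjugate update behind the Beta prior and Bernoulli responses mentioned above, the following minimal sketch computes the standard posterior parameters and posterior mean. It deliberately leaves out the weight function, so the mean below is the ordinary Bayesian estimate rather than the weighted estimate used by the designs; the function names and the Beta(1, 1) prior in the example are hypothetical.

```python
def beta_posterior(alpha, beta, successes, n):
    """Standard conjugate update: Beta(alpha, beta) prior, n Bernoulli trials."""
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    return alpha + successes, beta + (n - successes)


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical arm: uniform Beta(1, 1) prior, 3 responders among 10 patients.
post = beta_posterior(1.0, 1.0, 3, 10)
estimate = posterior_mean(*post)
```

In a sequential trial this update would be applied arm by arm after each observed response, before the allocation criterion is recomputed.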
In Mozgunov and Jaki (2019), the minimization of the leading term of the asymptotic expansion of the information gain was proposed as the criterion for the allocation of patients, resulting in the asymptotic criteria for arm selection, see Equations (3) and (4) for the Shannon and the Fisher information criteria, respectively. In Kasianova et al. (2021), it was shown that using the exact criteria brings no substantial gain in terms of power or PCA. Since the resulting asymptotic criteria are easier to interpret than the exact formulas, this paper focuses only on designs based on the asymptotic criteria for the Shannon and Fisher information measures.
Marginal power versus expected number of successes (ENS) for the deterministic designs, namely, the asymptotic Shannon (AS, green dots), asymptotic Fisher (AF, red dots), and Gittins index (GI, violet dot), in comparison to the fixed randomization (FR, blue dot). The size of the dots corresponds to the value of the penalty parameter: smallest dot, 0.01; largest dot, 0.9. Note that the dots representing 0.01 and 0.5 in the bottom right corner are almost identical.
FIGURE 4 Marginal power versus expected number of successes (ENS) for the randomized penalty parameter procedures, namely, the randomized asymptotic Shannon (RAS, green symbols) and randomized asymptotic Fisher (RAF, red symbols), in comparison to the fixed randomization (FR, blue dot) and Gittins index (GI, violet dot). The size of the dots corresponds to the mean of the Beta distribution for the penalty parameter: smallest dot, 0.1; largest dot, 0.9. The shape of the symbols corresponds to the distribution of the penalty parameter: Beta distribution (open circle); uniform distribution over all values, that is, on [0.5, 0.99] for the RAS and on [0.01, 0.99] for the RAF (filled triangle); uniform distribution over selected values, that is, on [0.5, 0.7] for the RAS and on [0.01, 0.3] for the RAF (filled diamond).
This report is independent research supported by the National Institute for Health Research (NIHR Advanced Fellowship, Dr. Pavel Mozgunov, NIHR300576).The views expressed in this publication are those of the authors and not necessarily those of the NHS, the National Institute for Health Research, or the Department of Health and Social Care.P. Mozgunov received funding from Medical Research Council (MC_UU_00002/14).K. Kasianova and M. Kelbert are supported by the RSF Grant N 23-21-00052.The article was prepared within the framework of the HSE University Basic Research Program.
$\phi_n(p) = \Lambda(\cdot)\, p^{\gamma n^{\kappa}} (1 - p)^{(1 - \gamma) n^{\kappa}}$, (A1)
which is easily generalized for the multiarm setting. Consider several alternative treatment arms, each with its own set of parameters. The experiment starts with the arm that minimizes one of the criteria.
FIGURE 8 Effect of sample size on marginal power (left subplot) and probability of a correct allocation (PCA, right subplot) for the designs 0.01 (blue), 0.01 with hypothetical block size 9 (olive) and 72 (orange, smaller dots), the GI-based design with hypothetical block size 9 (viridian) and 72 (green, smaller dots), 0.2 with a Beta-distributed penalty parameter with mean 0.2 (pink), and RAF with the penalty parameter drawn uniformly from [0.01, 0.3] (violet). Entropy-based designs are denoted by circles connected with straight lines, designs based on the Gittins index by squares and dashed lines, and fixed randomization by triangles and a dotted line. If applicable, the size of the dots corresponds to the hypothetical block size: larger dots, 9; smaller dots, 72.
The properties of entropy-based designs are investigated in a comprehensive simulation study.
FIGURE 9 Average marginal power versus probability of a correct allocation (PCA) across the considered scenarios for the designs 0.01 (blue), 0.01 with hypothetical block size 9 (olive) and 72 (orange, smaller dots), the GI-based design with hypothetical block size 9 (viridian) and 72 (green, smaller dots), 0.2 with a Beta-distributed penalty parameter with mean 0.2 (pink), and RAF with the penalty parameter drawn uniformly from [0.01, 0.3] (violet). Entropy-based designs are denoted by circles, designs based on the Gittins index by squares, and fixed randomization by a triangle. If applicable, the size of the dots corresponds to the hypothetical block size: larger dots, 9; smaller dots, 72.