Covariate-Adjusted Response-Adaptive Randomization for Multi-Arm Clinical Trials Using a Modified Forward Looking Gittins Index Rule

Summary We introduce a non-myopic, covariate-adjusted response adaptive (CARA) allocation design for multi-armed clinical trials. The allocation scheme is a computationally tractable procedure based on the Gittins index solution to the classic multi-armed bandit problem and extends the procedure recently proposed in Villar et al. (2015). Our proposed CARA randomization procedure is defined by reformulating the bandit problem with covariates into a classic bandit problem in which there are multiple combination arms, considering every arm per each covariate category as a distinct treatment arm. We then apply a heuristically modified Gittins index rule to solve the problem and define allocation probabilities from the resulting solution. We report the efficiency, balance, and ethical performance of our approach compared to existing CARA methods using a recently published clinical trial as motivation. The net savings in terms of expected number of treatment failures is considerably larger and probably enough to make this design attractive for certain studies where known covariates are expected to be important, stratification is not desired, treatment failures have a high ethical cost, and the disease under study is rare. In a two-armed context, this patient benefit advantage comes at the expense of increased variability in the allocation proportions and a reduction in statistical power. However, in a multi-armed context, simple modifications of the proposed CARA rule can be incorporated so that an ethical advantage can be offered without sacrificing power in comparison with balanced designs.

of several treatments with an infinite number of patients, it also provides a deterministic patient allocation rule that aims to optimize patient benefit on average. In order to do so, the rule must dynamically address the ethical conflict between learning (efficiency/power) and earning (patient benefit/ethics) after every patient is treated, its outcome observed and considering the potential outcomes of the future patients, given the observed history.
The multi-armed bandit problem and the Gittins index are based on a set of assumptions which may be restrictive when considered from a practical point of view . Particularly important assumptions include the infinite size of the trial, the observability of each patient's outcome before treating the next patient, and the lack of randomization of the resulting patient allocation rule. Any extensions of the original model that result from relaxing some (or all) of these assumptions would, in general, require either finding an appropriate extension of the Gittins index rule for the relaxed model (e.g., an index for the finite horizon problem investigated by ), or otherwise relying on a computational solution using dynamic programming (e.g., as in Cheng and Berry (2007) or Williamson et al. (2017)). The latter approach requires the problem to be of a tractable size. An alternative approach was proposed in  where the Gittins index rule was used to define a non-myopic response-adaptive randomized procedure for the design of finite-sized trials-namely, the block randomization procedure referred to as the forward looking Gittins index (FLGI).
Incorporating covariates into the multi-armed bandit model is one such extension. There are at least two reasons why this would be relevant. First, including covariate information into the model would imply relaxing the assumption that observations of a given treatment are exchangeable (i.e., that subjects receiving the same treatment arm have the same probability of success). This would, in turn, allow for the inclusion of treatment-covariate interactions and the modified bandit model with covariates would maximize patient benefit by assigning more patients to a superior treatment, given their covariate profile. Second, methods that promote balance on important known covariates have become a general standard among practicing clinical trialists. However, there are many relevant instances in which balance does not lead to efficiency or ethically attractive designs, as shown in Rosenberger and Sverdlov (2008). A bandit model with covariates would illustrate this conflict, as balance on covariates would never be achieved by its optimal solution rule if treatments are perceived differently among covariate groups.
In this article, we address the issue of introducing covariates into the classic multi-armed bandit model. We first present a deterministic implementation of the Gittins index that makes use of covariate information, and then use it to define a covariate-adjusted responseadaptive (CARA) randomization procedure that is non-myopic and applied to blocks of patients rather than individuals, as suggested by Rosenberger and Lachin (1993). The resulting CARA procedure sacrifices a small amount of expected patient benefit in order to introduce randomization. Our procedure differs from existing CARA procedures by optimizing patient benefit in an unconstrained fashion, rather than with a constraint to preserve power (e.g., Rosenberger, Vidyashankar, and Agarwal, 2001).
The bandit literature has very few relevant papers following the lines of Gittins' work. Woodroofe (1979) studied the optimal policy structure of a simplified special case of a onearmed bandit problem with a covariate. Clayton (1989) concludes that the existence of the index function depends on the functional form of the assumed relation between outcome and covariate (or link function), among other assumptions. Yet, for the usual logit function, the existence of the index is only conjectured under certain constrains on the parameter space. More recently, there has been work on randomized bandits with covariates by Yang and Zhu (2002). The authors introduce a myopic randomized solution that is asymptotically consistent. We take a different approach, concentrating on the introduction of covariates and randomization to the Gittins index, by taking into account future sequences of allocations and covariate values under the Gittins rule. Our simple, heuristic approach thereby aims to achieve a near-optimal mean total rewards criterion in a computationally feasible way.
In Section 2, we introduce the modified Gittins' rule and demonstrate its application in a clinical trial setting with a binary outcome and a binary covariate. We introduce our probabilistic implementation of the modified Gittins index, which we call the covariateadjusted response-adaptive forward looking Gittins index (CARA FLGI), in Section 3. In Section 4, we compare our approach to alternative procedures by performing simulations, including scenarios in the context of a recently published trial (Schortgen et al., 2012). We briefly discuss extensions to multiple polychotomous covariates in Section 5. We draw conclusions in Section 6.

The Gittins Rule for a Model with Covariate Information
Consider a clinical trial to test the effectiveness of T experimental treatments against a control treatment on a fixed sample of N patients. Assume T and N are fixed and known. Before a patient n (n = 1, … , N) is allocated to treatment t (t = 0, … , T), where t = 0 denotes the control, a binary characteristic or covariate Z n is observed for patient n. Immediately after making a treatment decision, that is, patient n receives treatment t, a binary outcome variable (sucess/failure) Y t,n is observed with Pr(Y t,n = 1) = p t , the true unknown response probability for treatment t. Assume that Y t,n = 1 if the treatment is successful and 0 otherwise. Patients enter the trial one-by-one and the outcome for patient n is observed before patient n + 1 appears.
Let Z n ~ Bernoulli(q) for all n = 1, … , N, where Z n = 0, 1, respectively, indicates patient n is covariate negative or covariate positive. Let Y t,n ~ Bernoulli (p t (z n )) for all n = 1, … , N. For example, we can assume that for all n: p t (z n ) = Expit (α t + β t z n ) for t = 0, … , T with α t and β t being unknown parameters and where we have defined We will further assume that the probability of the covariate taking value 1, that is, q, is known. The goal is to find a patient allocation procedure that, taking into account the covariate and outcome observations available at each time, maximizes the expected total number of successes after the N patients have been treated. The resulting optimization Villar and Rosenberger Page 3 Biometrics. Author manuscript; available in PMC 2018 July 23. problem will no longer admit a Gittins index solution as each arm's state variable includes the covariate data, which continues to evolve regardless of whether a treatment is allocated or not, making the bandit formulation restless.
In order to define a Gittins index heuristic solution for this problem with covariates, we will reformulate it as follows. We will consider that for every treatment-covariate combination there exists a independent arm, which will be indexed by its covariate value and treatment arm label, that is, zt. For example, the arm 00 is the arm corresponding to covariate value 0 and the control treatment. Therefore, there will be 2 (T + 1) arms in this reformulated version of the classic multi-armed bandit problem, each of which has a success rate p zt .
Let each parameter p zt be assigned a Beta(s zt,0 , f zt,0 ) prior density at the start of the trial, where s zt,0 and f zt,0 denote prior beliefs about the relative chances of success and failure of arm zt, respectively. Given the conjugacy of the prior and Bernoulli distributed outcome, these priors are converted into beta posteriors for p zt via Bayes theorem as patients enter the trial, are assigned to a treatment arm given their observed covariate z n and subsequently experience a success or a failure. Let X zt,n = (s zt,0 + S zt,n , f zt,0 + F zt,n ) be the 2 state-vector of available information on arm zt before treating patient n, where the random vector (S zt,n , F zt,n ) represents the number of successful and unsuccessful outcomes for arm zt up to patient n. The posterior for p zt after having treated patient n and observing s zt,n successes and f zt,n failures, is: f (p zt |x zt,n ) ~ Beta(s zt,0 + s zt,n , f zt,0 + f zt,n ) with its posterior mean being . Finally, let a zt,n be the binary indicator variable denoting whether patient n + 1 is assigned to arm zt or not. The multi-armed bandit optimization problem is to find an allocation rule π such that: where t′ = {1, 2, … , T, T + 1, T + 2, … 2(T + 1)} stand for 00, 01, … , 0T, 10, 11, … , 1T, respectively, x 0 = (x t′, 0 ) t′ = 1 2(T + 1) is the initial joint state with all the prior parameters, Ε π [·] denotes expectation under allocation rule π, and d is a discount factor (i.e., 0 ≤ d < 1) introduced for reasons of tractability, so that a trial of infinite size (N = ∞) can be assumed. Thus, V D * (x 0 ) is the optimal expected total discounted value function conditional x̃0 over П, the family of admissible allocation rules, which for this particular reformulated model are those for which for all z n = 0 it holds that ∑ t′ = 1 T + 1 a t′, n = 1 and ∑ t′ = T + 2 2(T + 1) a t′, n = 0 while for all z n = 1 it holds that ∑ t′ = 1 T + 1 a t′, n = 0 and ∑ t′ = T + 2 2(T + 1) a t′, n = 1 (in words, those rules that allocate an arm zt that is available for covariate value z n ). Put simply, (1) is the maximum average (discounted) number of patients responses attainable given the initial information on the available treatments and covariates before the start of the trial.
The solution to (1) could in principle be found via dynamic programming, using a backward induction algorithm. However, this becomes computationally infeasible for relatively small Villar and Rosenberger Page 4 Biometrics. Author manuscript; available in PMC 2018 July 23.
values of N and 2(T + 1) and is further extremely difficult to implement in practice. A computationally tractable solution to (1) when considered for N = ∞ based on the indices proposed by Gittins and Jones (1979) would be to allocate patient n with covariate z n to the arm that is available for covariate z n with the highest Gittins index at time n -1. For arm zt at time n, and stopping time τ, this is denoted x zt , n where: Here (2) represents the ratio between the total expected discounted number of successes observed after allocating arm zt from patient 1 up to τ given the initial information on arm zt and the total expected discounted number of patients treated with arm zt from patient 1 to τ given the initial information on arm zt. We will refer to this solution as the CARA Gittins index rule (CARA GI). Notice that the index defined by (2) depends only on the current information state of arm zt. These Gittins indices can be calculated by solving the problem of allocating patients optimally between treatment zt and a known treatment which yields a constant reward. For a detailed explanation of how the Gittins index rule is deployed and a table with values see Tables 4 and 5 of Web Appendix A in .

Index Rule
Following the approach introduced in Villar et al. (2015), we will assume that instead of enrolling the N patients one-by-one, patients are enrolled in groups of size b over J stages, so that J × b = N. We wish to specify a CARA rule based on the CARA Gittins index rule that sequentially randomizes the next b patients among the T + 1 treatments at stage j (j=1,… , J) given the data up to block j -1 and each of the patients observed covariate value z n . This translates to determining π zt,j , where: π zt,j = the probability of allocation for patients to treatment zt at stage j (j = 1,… , J), which is common to all patients with covariate value z in block j, when using the CARA Gittins index rule as defined in Section 2 and given data observed up to the stage j -1 (and thus patient (j -1) × b), denoted by x̃( j-1)b . Note that x̃( j-1)b can be written as a 2(T + 1) × 2 matrix in which row t′ represents the parameters of arm t′'s current posterior distribution up to patient (j -1)b. This marginal probability is obtained via the procedure next explained. Let bZ =1 represent the expected number of patients with covariate positive value in a block of size b, that is, bZ =1 = b × q and let a zt, n GI be the binary variable representing if arm zt is allocated to patient n under the CARA Gittins index rule (a zt, n GI = 1) or not (a zt, n GI = 0) . The formula below represents the probability of allocating each combination arm zt in a block of size bZ =z given that the first patient in the block has an observed covariate value of Z n = z and when using the Gittins index rule: Villar and Rosenberger Page 5 Biometrics. Author manuscript; available in PMC 2018 July 23.
Here Ω n-1 represents the set of all possible values for X̃n -1 given initial data x̃( j-1)b for every future patient n in (j -1)b + 1, … , jb under the CARA Gittins index rule described in Section 2 (summarized by a zt, n GI ). Each term of the summation within the square brackets of (3) represents the joint probability of allocating a future patient n with covariate z in block j to treatment t and the current information state at patient n -1, given the data at the beginning of block j -1 and the probability of a patient being covariate positive q.
In order to compute the CARA FLGI allocation probabilities for block j and arm zt (which do not depend on the covariate value of the first patient in the block), we average the probabilities calculated in (3) as follows:

Worked Example
To illustrate how these probabilities are computed, implemented and their computational cost, we present the simplest case of a trial testing a control treatment (t=0) against an experimental treatment (t=1) with a block of size 2 (i.e., b = 2 and 2(T + 1) = 4) and a binary covariate Z = {0, 1}. Suppose further that in the first block of the trial, having started with Beta(1,1) priors for all the t′ = 4 arms, the resulting allocation was: one covariate negative patient to control (i.e., arm t 00 )-one success-and a covariate positive patient to experimental t 11 -another success-and no patients allocated to the remaining arms (i.e., t 01 , t 10 ). Hence, for the second block the priors for each combination arm are x̃2 = [(2, 1); (1, 1); (1, 1); (2; 1)]. Figure 1 shows, via a probability tree, how the CARA FLGI probabilities for block 2 given the data in block 1 are computed for the first patient in the block being covariate negative value. Given that the control treatment has the maximum Gittins index, the allocation for the first patient of the second block who has a covariate negative value (i.e., patient 3 with z 3 = 0) is deterministic. It follows that r(a 00, 3 GI = 1 x 2 ) = 1 and r(a 01, 3 GI = 1 x 2 ) = 0.
From this example, it is clear that the computational cost of computing the π zt,j 's, which depend on the joint state for the 2(T + 1)-arms, that is, x̃n (instead of the 1 arm state x n ), will grow exponentially as b and T increase. Hence, we use a Monte-Carlo algorithm for this purpose. It follows that for Z n = z if we plug (3) into (4) the π zt,j 's add up to 1.

Simulation Study
In the following, we assume that responses for treatment t satisfy the following logistic model: Logit (p t (z n )) = α t + β t z n , where α t is treatment's t effect, and β t is the effect due to the binary covariate Z in treatment group t. The parameter of interest is the covariateadjusted treatment difference, defined as α t -α 0 for t = 1, … , T. The covariate Z is assumed to be independently distributed as a Bernoulli(p z ), where p z is known before the start of the trial. Notice that the model allows for treatment-covariate interaction, since the covariates effects coefficients β t 's are allowed to vary across treatments.
We now evaluate the properties of the proposed CARA procedures by simulation, focusing on operating characteristics that include measures of validity, efficiency, balance and ethics. The validity of the procedures is measured by the average significance level or type I error rate α of the test under the null hypothesis. To assess the chance of a type I error being made under the global null in a multi-arm setting, we report the family-wise error rate ᾱ. This is the probability of rejecting at least one true null hypothesis. The Bonferroni method is used to account for multiple testing and ensure that ᾱ ≤ α, that is, all hypothesis whose p-values are less than α K are rejected (with α = 0.05). The efficiency of procedures is measured by the average statistical power (1 -β) of the test used. To assess power in a multi-arm setting, we calculate the probability of rejecting the null for the truly best treatment under each assumed scenario. The balance measures of procedures considered are the average allocation proportion per treatment N t (N)/N for t = 0, … , T. For a measure of (im)balance and ethics, we consider the average allocation proportion of patients within Z category assigned to the best treatment for that category N (zt)* (N)/N. The ethical performance of the procedures is assessed by the expected total number of treatment failures (ENF) and the average proportion of patients assigned to the best treatment p*, defined as pz × N (1t)* (N)/N + (1pz) × N (0t)* (N)/N. Finally, for a combined measure (CM) of the power-ethics trade-off, we report the percentage of trial realizations in which: (1) 85% or more patients with biomarker Villar and Rosenberger Page 7 Biometrics. Author manuscript; available in PMC 2018 July 23. 0 received the best arm under H 1 and (2) resulted in a statistically significant treatment difference.
We compare both the deterministic and randomized GI-based CARA procedures to the following established CARA allocation procedures for the logistic regression model: (1) Equal randomization (ER): patients within a covariate group are allocated between the T experimental arms and the control arm with a fixed and equal probability of 1 T + 1 . (2) CARA 1: Rosenberger, Vidyashankar, and Agarwal (2001) propose the allocation target: where p t (z) = Expit(α t + β t z) and q t (z) = 1 -p t (z).

(6)
Thompson CARA Sampling (TS) (Thompson, 1933): patients within a covariate group are allocated between the experimental arms and the control arm with a probability proportional to the posterior probability that p zt is the largest response rate given the observed data. Specifically, we defined allocations are made using a permuted block of size m (m is reported for every simulation).
Note that in choosing which rules to compare our procedure against, we wanted to include all related methods and those that are used in practice. However, it is important to highlight that all of these methods are essentially different in that they are designed to achieve different goals (power, balance, or ethics). In all simulations, we use a uniform Beta(1,1) prior for each arm and we compute the allocation probabilities for CARA FLGI using a Monte-Carlo approximation based on 10 2 replications. For the CARA procedures in (2)-(5), the first 50 patients are randomized following an ER procedure. After those 50 allocations, the maximum likelihood estimators of α t and β t are computed, and the associated estimate of the treatments success rates p t (z) is sequentially used for computing the allocation probabilities.

The Sepsicool Trial
The Sepsicool trial, as reported by Schortgen et al. (2012), evaluated the effect of fever control by external cooling on vasopressor requirements in septic shock. Septic shock, defined as sepsis with cardiovascular failure requiring vasopressor infusion, has a high mortality rate (40-60%). Current recommendations focus on the first few hours of sepsis management but the criteria for vasopressor selection remains debated. The original trial allocated 200 patients between the two arms with a fixed and equal randomization probability of 1/2. The primary endpoint (number of patients with 50% decrease in the baseline vasopressor dose) was available 48 hours after randomization. The secondary endpoints included all-cause mortality on day 14. At the end of the trial, the difference in the primary endpoint between treatments was not found to be statistically significant. However, day 14 mortality rate was significantly lower in the cooling group (19% vs. 34%). Furthermore, post hoc analyses adjusting on the baseline vasopressor dose differed significantly between treatment groups, indicating that the significant effect of cooling was more pronounced in patients having the lowest baseline vasopressor doses (i.e., those that had a lower illness severity). The reported odds-ratio (OR) before and after adjustment for covariates with baseline imbalances (a combination of dose and illness severity), respectively, were 0.44 and 0.36.
We show the impact of redesigning the Sepsicool study to account for such covariate data using the CARA GI and CARA FLGI and the alternative CARA patient allocation rules. We dichotomize patients by dose-severity into two groups: high or moderate versus low illness severity-dose combination. We let covariate z take the value 0 when a patient is from the high or moderate severity group. We report the results for two scenarios of parameter values: α t = 0.6482, β t = 0 for t = 0, 1 (Scenario 1) and α 0 = 0.6482, β 0 = 0, α 1 = 1.6702, β 1 = −0.3793 (Scenario 2). Scenario (1) represents the null hypothesis where there is no treatment effect nor treatment-covariate interaction while Scenario 2 represents a possible parameter realization compatible with the values reported after the Scepsicool trial. The values for Scenario 2 are arbitrary but chosen so as to be consistent with the overall and adjusted ORs reported by the trial in Schortgen et al. (2012), as response data per covariate group is not publicly available, and neither is the information on the covariate distribution. Therefore we assume that patients are equally distributed between these two groups (i.e., p z = 0.5). Villar and Rosenberger Page 9 Biometrics. Author manuscript; available in PMC 2018 July 23.
The sample size was chosen to be of N = 450 so that ER achieves at least 90% in Scenario 1. For the CARA FLGI, we fix the block size to b = 10, 30, 45, 50 patients and the number of interim analysis to J = 45, 15, 10, 9. For SPBD the block size was set to m = 10.
Hypothesis testing was performed using a normal cut-off value (when appropriate) and using an adjusted version of Fisher's exact test for comparing two binomial distributions. Fisher's test has an actual rejection rate far below the nominal significance level. To make the designs comparable and suitable for response-adaptive bandit rules we chose its cutoff value from simulations so as to achieve the nominal type I error rate . Under the null, p* is defined as the mean proportion of patients assigned to the control group, whereas under the alternative, p* is computed as the mean proportion assigned to arm 1. The efficiency of procedures under Scenario 2 was measured by the average power of the test to reject the null hypothesis of no covariate-adjusted treatment difference (i.e., α 1 -α 0 = 0). Of particular interest is the potential conflicts among balance, efficiency, and ethics goals that result from the different methods. All CARA designs are more variable than the rules that achieve balance. Some of them achieve slightly higher levels of power and similar values of ENF than ER (CARA 3) while others offer a slightly reduced power accompanied by a reduction in their ENF (CARA 1) when compared to ER. As expected, the designs that have the smallest ENF (and therefore the highest performance in terms of the ethical goal) and p* are the Gittins index-based ones. However, these rules also have the largest variability of allocation probabilities and the lowest power levels. Naturally, these designs also have a much larger variability and lower power than CARA procedures designed to optimize patient benefit while simultaneously achieving a power constraint (as CARA 4 does). However, this larger variability is associated with highly skewed patient allocations only under H 1 .

Results
These results suggest that the conflict between power and balance can be less acute than that between balance/power and ethics, but just as in the two-armed case with no covariates, a conflict between both of these objectives and the ethical one will always be present. In other words, in a two-armed context any modification of the CARA GI rules aimed at increasing its statistical power result in a worsening of its performance in terms of patient benefit. This power-ethics trade-off is particularly acute when the disease is fatal and rare. The CM measure suggests that only GI-based designs and TS have some trial realizations that meet this dual power-ethics criterion. Next, we explore the multi-armed case, the main motivation for this article.

Multi-Armed Trials: The Controlled CARA FLGI
We next create a simulation scenario to evaluate the effect of increasing the number of treatments on the ethics-power relation of the proposed CARA GI allocation methods. We consider a trial of size N = 300 with three arms (i.e., K = 2). The assumed parameters per arm are: (α t , β t ) = {(-0.052, -0.473); (-1.252, 1.152); (-0.652, 1.552)} which corresponds to the following success rates vectors (0.2224; 0.3425; 0.4870) (for covariate negative patients) and (0.4750; 0.7109; 0.3717) (for covariate positive patients). Therefore, in this scenario there exists a treatment-covariate interaction: the best arm for covariate negative patients is experimental arm 2 while for covariate positive patients is experimental arm 1. As in the previous section, we assume that patients are equally distributed between these two groups (i.e., q = 0.5).
We compare the CARA FLGI procedures to ER, SPBD, and TS. Additionally, we heuristically extend CARA 1 and CARA 2 to, respectively, use the following allocation targets: Notice that these heuristics are no longer expected to optimal, as shown in Tymofyeyev et al. (2007) for multi-arm case with no covariate variables the extension is far from trivial. Procedures CARA 3 and CARA 4 are not simply extended into a heuristics rule for a multiarmed context and therefore not included in this section. In addition to the CARA FLGI rule defined by (4), we shall consider a controlled group allocation rule (CARA CFLGI) which, similarly to the CFLGI introduced in , protects the allocation to the control treatment so that it remains fixed at 1 K + 1 per covariate group during the trial. Table 2 reports the results of 5000 replications of this trial under the scenario above described and for each of the CARA designs considered. The operating characteristics reported are the same as before with the exception of the efficiency and ethics measures. The efficiency is measured in this case by the average marginal statistical power of the test used to detect that arm 2 is best for covariate negative patients (1 -β 0 ) or to detect that arm 1 is best for covariate positive patients (1 -β 1 ). The ethical performance is measured by the expected number of successes (ENS) rather than ENF. CM counts realizations that: (1) assigned 75% or more covariate 0 patients to their best arm and (2) had a statistically significant result.
As expected, the CARA GI-based rules perform extremely well in terms of patient benefit (with CARA GI achieving on average 38 more successes than a ER design) and extremely poorly in terms of marginal power and the variability of the resulting allocations. On the other hand, CARA 1 and CARA 2 have lower and similar levels of variability yet they result Villar and Rosenberger Page 11 Biometrics. Author manuscript; available in PMC 2018 July 23.
in different performances in terms of patient benefit and marginal power. The controlled class of CARA FLGI allocation rules results in higher values of marginal power than ER for both patient subgroups while offering and advantage in terms of patient benefit of 13-16 more successes on average depending of the block size selected. Notice that these controlled procedures achieve variability levels similar to CARA 1 and CARA 2.

Extensions: Polychotomous Covariates and Multiple Covariates
Let Z n ~ Multinomial(q 0 , q 1 , … , q M ) for Z n ∈ {0, 1, … , M} where M is a finite number, that is, P(Z n = i) = q i for i = 0, 1, … , M. A CARA GI heuristic solution as in Section 2 for this case can be defined by considering that for every treatment-covariate combination there exists a independent arm, indexed by its covariate value and treatment arm label, that is, zt.
Therefore, there will be (M + 1)(T + 1) arms in this reformulated the classic multi-armed bandit problem, each of which has a success rate p zt .
If there are C possible covariate variables and Z n is a C × 1 vector, where C is a finite number. Suppose each individual covariate c can take M c different values where M c is finite and specific for each covariate. A CARA GI heuristic solution as in Section 2 for this case can be defined by redefining all the covariates into a single covariate having S = M 1 × M 2 × … × M C stratification levels. Again, we consider that for every treatment-covariate level there exists a independent arm, indexed by its covariate value and treatment arm label. Therefore, there are S(T + 1) arms in this reformulated multi-armed bandit problem.
In both cases, the GI CARA procedure assigns patient n with covariate z n to the arm available given its covariate profile with the highest Gittins index after observing patient's n − 1 outcome. The computationally feasibility of the Gittins index rule ensures the tractability of the CARA GI procedure regardless of the size of M or M c .

Discussion
Over the past few years, "precision medicine" (the tailoring of medical treatment to the individual characteristics of each patient) has shown promise through the approval of new cancer therapies for patients with specific genetic mutations. The challenge to precision medicine is that many promising new treatments have relatively few patients to test them, and even fewer patients when a treatment works only within a biomarker subgroup. Novel methodology for trial design that is able to identify superior treatments more quickly, mainly treatments that work better within subgroups, is an essential requirement to make precision medicine possible.
Non-myopic block adaptive randomization procedures based on the Gittins index have recently been proposed to offer patient benefit within a trial by more quickly identifying a superior arm if it exists. In this article, we introduce a bandit-based CARA design which can incorporate biomaker information that is potentially predictive of patient outcome. Through simulations, we illustrate how the proposed CARA FLGI procedure enables adaptively block-randomized clinical trials that are statistically conservative when no superior treatment exists and highly ethical in terms of patient benefit. Such a procedure satisfies the dual goals of differential treatment responses while satisfying an ethical imperative.
A limitation of the resulting CARA design is its high variability leading to a corresponding power loss. Although in the two-armed context this reduced power is unavoidable without a corresponding sacrifice in patient benefit, in a multi-armed context, these CARA designs offer important simultaneous patient benefit and power gains over a traditional design. A caveat to these results is that they require a temporal homogeneity of the blocks of patients recruited, as a time trend could cause an inflation of the type I error.
The derivation of the allocation probabilities with the proposed CARA design is considerably more complex than that of the existing alternative procedures. However, its practical implementation is as feasible as that of other methods, given its low computational cost. Moreover, the patient benefit advantages of the proposed design grow with the number of arms under study.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.    Results from 5000 replications for different CARA procedures, N = 300: arm 1 is best for covariate positive patients and arm 2 is best for covariate negative patients