Escalation strategies for combination therapy Phase I trials

Phase I clinical trials aim to identify a maximum tolerated dose (MTD), the highest possible dose that does not cause an unacceptable amount of toxicity in the patients. In trials of combination therapies, however, many different dose combinations may have a similar probability of causing a dose-limiting toxicity, and hence, a number of MTDs may exist. Furthermore, escalation strategies in combination trials are more complex, with possible escalation/de-escalation of either or both drugs. This paper investigates the properties of two existing proposed Bayesian adaptive models for combination therapy dose-escalation when a number of different escalation strategies are applied. We assess operating characteristics through a series of simulation studies and show that strategies that only allow ‘non-diagonal’ moves in the escalation process (that is, both drugs cannot increase simultaneously) are inefficient and identify fewer MTDs for Phase II comparisons. Such strategies tend to escalate a single agent first while keeping the other agent fixed, which can be a severe restriction when exploring dose surfaces using a limited sample size. Meanwhile, escalation designs based on Bayesian D-optimality allow more varied experimentation around the dose space and, consequently, are better at identifying more MTDs. We argue that for Phase I combination trials it is sensible to take forward a number of identified MTDs for Phase II experimentation so that their efficacy can be directly compared. Researchers, therefore, need to carefully consider the escalation strategy and model that best allows the identification of these MTDs. Copyright © 2012 John Wiley & Sons, Ltd.


INTRODUCTION
In recent years, combination therapy treatments have become more widespread, giving rise to a number of proposed models for estimating the maximum tolerated dose (MTD) in Phase I combination trials [1][2][3][4][5]. The aims and designs of two-agent Phase I trials are more complex than those of single-agent trials, and a multitude of MTDs with a similar toxicity profile can be identified. Indeed, a number of drug combinations could be recommended for Phase II (RPII) experimentation so as to directly compare their efficacy. In theory, in a two-dimensional setting, an infinite number of possible dose combinations will achieve the same target toxicity level (TTL), assuming a continuous dose-toxicity surface. In practice, however, such choices are often restricted by predefining a set of dose combinations used for experimentation.
Combination drug Phase I trials often have rich prior information available on the dose-toxicity response from single-agent studies, which should be utilised to make the trial more efficient. For this reason, a number of authors have suggested using Bayesian parametric models to describe the dose-toxicity surface, where model parameters can be separated into those that relate to the marginal dose-toxicity response and those that relate to the 'interaction' between the two agents [2][3][4]. In particular, the three-parameter copula-type regression model proposed by Yin and Yuan [2] is an extension of the popular continuous reassessment method (CRM) [6] used in single-agent dose-finding trials, whereas the six-parameter model, proposed by Thall et al. [3], is simplified into a logistic model, marginally, over each single agent.
Less attention, however, has been given to the performance of such models under different escalation strategies, when only a discrete set of dose combinations is available for experimentation. Different escalation strategies may result in different MTDs being identified and recommended for Phase II evaluation [7]. In single-agent trials, the risk of overdosing can be avoided by dose-escalation strategies that do not skip any predefined dose levels. However, in combination therapy trials, the set of admissible doses, with which the next cohort can be treated, is more complex. A number of authors have suggested escalation to neighbouring dose combinations within the two-dimensional space, either with both agents increased concurrently [3] or with only one agent increased at a time [2,4].
In this paper, we consider different strategies for escalation and for the search of the dose-toxicity space, and show how these strategies affect the number and suitability of the RPII doses. We compare diagonal escalation, where both drugs are increased simultaneously, with escalation where only one drug is increased at a time between patient cohorts. We show how efficiency, gained through allowing 'positive' diagonal escalation, must be traded off against an increased risk of overdosing. In combination therapies, however, the severity of overdosing may be less pronounced because dose ranges are likely to be more targeted, owing to previous Phase I experimentation of single agents. In addition, we propose strategies for moving between dose combinations based on either selecting the next dose whose estimated probability of dose-limiting toxicity (DLT) is closest to the TTL or using a Bayesian D-optimality criterion. The latter allows the design to search the dose-toxicity space more fully and to identify a larger number of MTDs. We propose that RPII doses be chosen on the basis of the estimated probability of DLT being within a tolerance interval of the TTL to allow more than one combination to be recommended for future evaluation, and we restrict our choice of RPII doses to only combinations that have been experimented on within the course of the trial. From this definition, we contrast two proposed models in their ability to identify the RPII dose combinations by using a set of simulation studies.

PARAMETRIC MODELS FOR COMBINATION THERAPY PHASE I TRIALS
We shall start by reviewing two proposed parametric models used in combination therapy Phase I trials: a six-parameter model [3] and a three-parameter copula-type model [2].
The model proposed by Thall et al. [3] uses four parameters to describe the probability of toxicity at the margins (i.e. when each drug is used in isolation) and two parameters to describe the magnitude and shape of the dose-toxicity curve when the drugs are used in combination. Specifically, let $x = (x_{Ai}, x_{Bj})$ denote the dose combination when drug A is used at level $i$ $(i = 1, \ldots, I)$ and drug B is used at level $j$ $(j = 1, \ldots, J)$. The probability of a DLT is then given by

$$\pi(x; \theta_1) = \frac{\alpha_1 x_{Ai}^{\beta_1} + \alpha_2 x_{Bj}^{\beta_2} + \alpha_3 \left(x_{Ai}^{\beta_1} x_{Bj}^{\beta_2}\right)^{\beta_3}}{1 + \alpha_1 x_{Ai}^{\beta_1} + \alpha_2 x_{Bj}^{\beta_2} + \alpha_3 \left(x_{Ai}^{\beta_1} x_{Bj}^{\beta_2}\right)^{\beta_3}}, \tag{1}$$

where the six parameters $\theta_1 = (\alpha_1, \beta_1, \alpha_2, \beta_2, \alpha_3, \beta_3)$ are all positive. Suitable priors can be obtained for the parameters $\alpha_1$, $\alpha_2$, $\beta_1$ and $\beta_2$ by using data from single-agent trials or eliciting opinions from physicians. For the interaction parameters, $\alpha_3$ and $\beta_3$, Thall et al. recommend using reasonably vague Gamma priors.

Yin and Yuan [2] propose a copula-type model with three parameters, $\theta_2 = (\delta, \lambda, \gamma)$. The model is of the following form:

$$\pi(x; \theta_2) = 1 - \left\{ \left(1 - p_i^{\delta}\right)^{-\gamma} + \left(1 - q_j^{\lambda}\right)^{-\gamma} - 1 \right\}^{-1/\gamma}, \tag{2}$$

where $p_i$ is a prespecified 'best guess' probability of DLT when drug A is used in isolation at level $i$, and $q_j$ a 'best guess' probability of DLT when drug B is used in isolation at level $j$. These quantities are fixed in advance using prior knowledge (and are sometimes referred to as the 'skeleton' in a CRM). A monotonically increasing sequence is specified for both the $p$s and the $q$s. Nevertheless, in order to reflect the fact that the marginal probabilities are uncertain, the true probabilities are taken as $p_i^{\delta}$ and $q_j^{\lambda}$, where $\delta$ and $\lambda$ are unknown parameters with a prior mean equal to one. The final model parameter is $\gamma > 0$, which characterises any interaction between the drugs. This copula model has been regarded as a multivariate extension of the CRM power model because if drug A is used in isolation, then $q = 0$, and hence, $\pi((x_{Ai}, 0); \theta_2) = p_i^{\delta}$. Similarly, when drug B is used in isolation, $\pi((0, x_{Bj}); \theta_2) = q_j^{\lambda}$.
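As an illustration, the two dose-toxicity surfaces can be sketched in a few lines of Python. This is a minimal sketch that assumes the functional forms usually given for the six-parameter model [3] and the copula-type model [2]; the function names, the name `lam` for the second power parameter and all numerical values are hypothetical and are not taken from the trial.

```python
import numpy as np

def six_param_prob(x_a, x_b, theta):
    """P(DLT) under the assumed six-parameter form; all six parameters are positive."""
    a1, b1, a2, b2, a3, b3 = theta
    num = a1 * x_a**b1 + a2 * x_b**b2 + a3 * (x_a**b1 * x_b**b2) ** b3
    return num / (1.0 + num)

def copula_prob(p, q, theta):
    """P(DLT) under the assumed copula-type form; p and q are skeleton values in [0, 1)."""
    delta, lam, gamma = theta
    inner = (1.0 - p**delta) ** (-gamma) + (1.0 - q**lam) ** (-gamma) - 1.0
    return 1.0 - inner ** (-1.0 / gamma)

# Hypothetical parameter values, chosen only to exercise the functions.
theta_six = (0.4, 2.0, 0.5, 1.5, 0.3, 1.0)
theta_cop = (1.5, 1.0, 2.0)

# With drug B absent, the six-parameter model reduces to its drug A margin.
margin_a = six_param_prob(0.5, 0.0, theta_six)

# With q = 0, the copula model reduces to the CRM power model p ** delta.
power_model = copula_prob(0.3, 0.0, theta_cop)
```

Setting one dose (or skeleton value) to zero is a convenient check that each surface collapses to its single-agent margin.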

DECISION RULES FOR ESCALATION
Suppose that $n$ patients have currently been treated in the trial. The data can be summarised by the doses each patient received and the toxicity outcome indicators ($Y = 1$ for a DLT, and $Y = 0$ otherwise); hence, $Z_n = \{(x_k, Y_k), k = 1, \ldots, n\}$. Let $f(\theta)$ denote the prior distribution for the parameter vector $\theta$, where $\theta = \theta_1$ for the six-parameter model and $\theta = \theta_2$ for the three-parameter model. By Bayes' theorem, the posterior distribution after $n$ patients is

$$f(\theta \mid Z_n) = \frac{f(Z_n; \theta)\, f(\theta)}{\int f(Z_n; \theta)\, f(\theta)\, d\theta},$$

where the likelihood is binomial and given by

$$f(Z_n; \theta) = \prod_{k=1}^{n} \pi(x_k; \theta)^{Y_k} \left\{1 - \pi(x_k; \theta)\right\}^{1 - Y_k}.$$

The choice of dose combination for patient $(n + 1)$ is based upon the posterior distribution. For safety purposes and for the prevention of 'dose skipping', the set of doses for patient $(n + 1)$ may be restricted to dose combinations close to the current combination, $x_n$. These sets of doses will be labelled the admissible doses for patient $(n + 1)$. A number of possible approaches in choosing the admissible dose set are described in Section 4. For now, suppose that $\Omega$ represents our chosen set of admissible doses. One approach in choosing the next dose combination from this admissible set is to find the dose combination whose posterior mean toxicity is closest to the TTL, that is

$$x_{n+1} = \arg\min_{\xi \in \Omega} \left| E\left[\pi(\xi; \theta) \mid Z_n\right] - \mathrm{TTL} \right|, \tag{3}$$

where $E[\pi(\xi; \theta) \mid Z_n] = \int \pi(\xi; \theta) f(\theta \mid Z_n)\, d\theta$. In a decision theoretic framework, this is equivalent to minimising the posterior expected loss with respect to the loss function $L(\xi; \theta) = (\pi(\xi; \theta) - \mathrm{TTL})^2$, for $\xi \in \Omega$ [8]. This approach is commonly used in many single-agent dose-finding designs [4,6,9] and can be readily adapted to the combination therapy setting. We shall subsequently refer to an escalation strategy based on Equation (3) as decision rule D1.
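Decision rule D1 can be sketched as follows, with the posterior mean approximated by an average over posterior draws. This is a minimal sketch under stated assumptions: `toy_prob` is a hypothetical two-parameter stand-in for either dose-toxicity model, and the two hard-coded draws stand in for MCMC output; neither comes from the trial.

```python
import numpy as np

def toy_prob(dose, theta):
    """Hypothetical stand-in for pi(dose; theta); not one of the models in this paper."""
    x_a, x_b = dose
    a1, a2 = theta
    num = a1 * x_a + a2 * x_b
    return num / (1.0 + num)

def choose_dose_d1(admissible, prob_fn, posterior_draws, ttl):
    """Return the admissible dose whose posterior mean P(DLT) is closest to the TTL."""
    best, best_gap = None, np.inf
    for dose in admissible:
        # Monte Carlo estimate of the posterior mean probability of DLT at this dose.
        post_mean = np.mean([prob_fn(dose, th) for th in posterior_draws])
        gap = abs(post_mean - ttl)
        if gap < best_gap:
            best, best_gap = dose, gap
    return best

# Two hypothetical posterior draws and three hypothetical admissible doses.
draws = [(0.4, 0.4), (0.6, 0.6)]
admissible = [(0.2, 0.2), (0.5, 0.2), (0.5, 0.5)]
next_dose = choose_dose_d1(admissible, toy_prob, draws, ttl=0.30)
```

In practice the draws would come from the MCMC sampler used to fit the model, and `prob_fn` would be one of the two surfaces of Section 2.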
An alternative approach is to select the next dose on the basis of a constrained Bayesian D-optimality design [3,10]. For example, within a two-dimensional dose-finding trial, it may be the case that among the admissible dose sets for patient $(n + 1)$, a number of dose combinations have an estimated probability of toxicity within a small tolerance of the TTL. The dose whose posterior mean probability of DLT is closest to the TTL could be chosen, as described previously, or the investigator may wish to select the next dose on the basis of maximising the information on the model parameters while still assigning a dose that is close to the TTL. Let

$$I(\xi; \theta) = E\left[ \frac{\partial \ell((\xi, Y); \theta)}{\partial \theta} \left( \frac{\partial \ell((\xi, Y); \theta)}{\partial \theta} \right)^{T} \right]$$

denote the Fisher information matrix associated with treating a patient at dose combination $\xi$, given the parameters $\theta$, where $\ell((\xi, Y); \theta) = \log f((\xi, Y); \theta)$ is the log likelihood function. Bayesian D-optimality assigns patient $n + 1$ to dose $x_{n+1}$ on the basis of maximising the posterior expectation of the log determinant of the information matrix given the current data, that is,

$$x_{n+1} = \arg\max_{\xi \in \Omega} E\left[ \log \det \left\{ \sum_{k=1}^{n} I(x_k; \theta) + I(\xi; \theta) \right\} \,\Big|\, Z_n \right]. \tag{4}$$

In order to ensure that patients are not assigned to highly toxic doses with this design, the admissible dose set should be further restricted to doses that have toxicity within a certain tolerance of the TTL. We achieve this by restricting the admissible set, after $n$ patients have been recruited, to those doses whose posterior mean probability of DLT is within a tolerance $\varepsilon$ of the TTL, that is

$$\left\{ \xi \in \Omega : \left| E\left[\pi(\xi; \theta) \mid Z_n\right] - \mathrm{TTL} \right| < \varepsilon \right\}. \tag{5}$$

An alternative restricted dose set includes dose combinations that, with some large degree of probability, have a probability of DLT less than a maximal acceptable toxicity [10]. If the restricted dose set is empty, that is, there are no admissible doses whose posterior mean probability of DLT is within $\varepsilon$ of the TTL, then the next dose combination is chosen as the one whose posterior mean probability of DLT is closest to the TTL, as in Equation (3), subject to the original admissibility constraints $\Omega$.
An escalation strategy based on the D-optimality criterion (4) and restriction (5) will be labelled strategy D2.

Figure 1. (a) Non-diagonal escalation strategy; (b) diagonal escalation strategy.
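A constrained D-optimal choice of this kind can be sketched as follows. The sketch rests on several assumptions that are ours rather than part of the published designs: the copula-type form is used for the probability of DLT, the gradient is taken by central finite differences, and the information from previously treated patients is added to that of the candidate dose, because the information matrix of a single Bernoulli outcome has rank one and therefore a zero determinant. All names and numerical values are hypothetical.

```python
import numpy as np

def copula_prob(p, q, theta):
    """Assumed copula-type P(DLT); p and q are skeleton values in (0, 1)."""
    delta, lam, gamma = theta
    inner = (1 - p**delta) ** (-gamma) + (1 - q**lam) ** (-gamma) - 1
    return 1 - inner ** (-1 / gamma)

def fisher_info(dose, theta, h=1e-5):
    """Single-patient information for a Bernoulli outcome: g g^T / (pi (1 - pi))."""
    theta = np.asarray(theta, dtype=float)
    pi = copula_prob(dose[0], dose[1], theta)
    grad = np.empty(theta.size)
    for k in range(theta.size):
        step = np.zeros(theta.size)
        step[k] = h
        up = copula_prob(dose[0], dose[1], theta + step)
        down = copula_prob(dose[0], dose[1], theta - step)
        grad[k] = (up - down) / (2 * h)  # central finite difference
    return np.outer(grad, grad) / (pi * (1 - pi))

def choose_dose_d2(admissible, treated, draws, ttl, eps):
    """Sketch of rule D2: D-optimal choice among doses within eps of the TTL."""
    post_mean = {d: np.mean([copula_prob(d[0], d[1], th) for th in draws])
                 for d in admissible}
    restricted = [d for d in admissible if abs(post_mean[d] - ttl) < eps]
    if not restricted:
        # Fall back to the dose closest to the TTL, as in rule D1.
        return min(admissible, key=lambda d: abs(post_mean[d] - ttl))

    def expected_logdet(d):
        vals = []
        for th in draws:
            total = sum((fisher_info(x, th) for x in treated), fisher_info(d, th))
            sign, logdet = np.linalg.slogdet(total)
            # A singular or numerically indefinite matrix carries no D-optimality value.
            vals.append(logdet if sign > 0 else -np.inf)
        return np.mean(vals)

    return max(restricted, key=expected_logdet)

# Hypothetical skeleton pairs (p_i, q_j) standing in for dose combinations.
admissible = [(0.1, 0.1), (0.2, 0.1), (0.2, 0.2)]
treated = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2)]
fallback = choose_dose_d2(admissible, treated, [(1.0, 1.0, 1.0)], ttl=0.30, eps=1e-6)
picked = choose_dose_d2(admissible, treated,
                        [(1.0, 1.0, 1.0), (1.2, 0.9, 0.8)], ttl=0.30, eps=0.05)
```

Early in a trial, when fewer distinct doses have been tried than there are parameters, the accumulated matrix remains singular for every candidate and the criterion cannot discriminate between them; an implementation would need a rule for that case, which this sketch does not attempt.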

ADMISSIBLE DOSES
Overdosing is a major concern in Phase I trials, and in order to address this problem, designs often incorporate additional constraints to prevent skipping of predefined dose levels. In a combination therapy trial, this is equivalent to only allowing the next cohort to be treated at one dose level above the current prescribed dose for drug A, whereas drug B remains fixed, or one dose level above the current level for drug B, whereas drug A is kept fixed. The set of admissible dose combinations for the next cohort is therefore restricted to the neighbouring dose combinations in the two-dimensional space, without allowing diagonal moves (Figure 1(a)), as recommended by Yin and Yuan [2]. Such an admissible dose set will be labelled $\Omega_1$.
A slightly less conservative approach to dose finding is to additionally allow diagonal escalation, whereby both drugs are increased simultaneously (Figure 1(b)). This is equivalent to skipping one dose combination in the non-diagonal design because it would take two steps to reach this dose combination using such a design. However, in the context of combination therapy, such escalations may be tolerable (at least for the clinical trialist) because both drugs are likely to already have a known toxicity profile from earlier Phase I experimentation. Hence, the dose ranges used in the combination trial are likely to be more targeted. An admissible dose set based on diagonal escalation is labelled $\Omega_2$.
A third approach is to allow diagonal escalation to neighbouring combinations while also allowing the administration of any previously experimented dose combination. This has the advantage of allowing the design to explore more efficiently around the dose space. For example, suppose that $n$ cohorts have been treated thus far in the trial and escalation has proceeded as depicted in Figure 2, using any neighbouring dose combinations as admissible doses. The current cohort is treated at combination $(x_{A6}, x_{B1})$. The majority of cohorts have been treated at high levels of drug A and low levels of drug B. However, this current imbalance in the allocation of doses means that little has been learned about the toxicity when low doses of drug A and high doses of drug B are given. Designs based on D-optimality are likely to propose escalation to such dose combinations, but a jump from the current dose combination to, say, dose combination $(x_{A1}, x_{B6})$ would involve recruiting five more cohorts if only neighbouring dose combinations are admissible. Allowing jumps to previously experimented dose combinations would, in this example, allow a jump directly to combination $(x_{A3}, x_{B4})$, with the hope of making the overall design more efficient. An admissible dose set based on neighbouring and previously experimented doses is labelled $\Omega_3$.
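The three admissible sets can be sketched as a single function on the grid of dose-level indices. This is a minimal sketch under assumptions of our own: the current combination is always admissible, de-escalation is permitted along the same directions as escalation, and the diagonal set contains all eight neighbours. The function name and its arguments are hypothetical.

```python
def admissible_doses(current, n_a, n_b, diagonal=False, visited=()):
    """Admissible (i, j) dose-level pairs for the next cohort.

    diagonal=False gives the non-diagonal set, diagonal=True adds the diagonal
    neighbours, and a non-empty `visited` adds previously experimented combinations.
    """
    i, j = current
    moves = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    if diagonal:
        moves += [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    # Keep only moves that stay on the predefined n_a x n_b grid (levels start at 1).
    doses = {(i + di, j + dj) for di, dj in moves
             if 1 <= i + di <= n_a and 1 <= j + dj <= n_b}
    return doses | set(visited)

set_1 = admissible_doses((1, 1), 6, 6)
set_2 = admissible_doses((1, 1), 6, 6, diagonal=True)
set_3 = admissible_doses((6, 1), 6, 6, diagonal=True, visited={(3, 4)})
```

In the last call the previously visited combination (3, 4) becomes available directly from (6, 1), which mirrors the jump described in the example above.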

SIMULATIONS
We investigate the operating characteristics of both decision rules D1 and D2, described in Section 3, by using the admissible dose sets $\Omega_1$, $\Omega_2$ and $\Omega_3$, described in Section 4. Note that, for the D-optimal design (D2), each admissible dose set is further restricted by Equation (5).
Simulations are conducted for both the six-parameter and three-parameter models, described in Section 2. The operating characteristics of the approaches are illustrated using a combination therapy cancer trial of Gemcitabine and Cyclophosphamide, which is described fully in Thall et al. [3]. Priors for the six-parameter model, shown in Table I, were elicited from expert physicians, and these give rise to a toxicity surface, as shown in Figure 3, where the prior mean probability of DLT contours are plotted. We consider six dose levels which can be used for each drug. These are standardised to their known MTD doses when each drug is used in isolation. Hence, dose combination $(1, 0)$ refers to the single-agent MTD for drug A, and $(0, 1)$ refers to the single-agent MTD for drug B. The six dose levels are as follows: $(0.2, 0.5, 0.7, 0.8, 0.9, 0.95)$. These were chosen for the experimentation to be focused around the presumed MTD contour (corresponding to a TTL of 0.3), as shown in Figure 3. The prior mean probability of DLT, when each drug is used in isolation, is calculated from the model and used as the skeleton in the three-parameter model (the fixed $p$s and $q$s in Equation (2)). All parameters in the three-parameter model are given independent gamma(2, 2) priors (mean 1 and variance 0.5). Note that Yin and Yuan [2] also used these priors for the power parameters $\delta$ and $\lambda$, whereas they used a more vague gamma(0.1, 0.1) prior for the interaction parameter $\gamma$. However, we found such a prior to be numerically unstable, particularly when calculating the information matrix, because sampled values of $\gamma$ were very close to zero.
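The stated prior moments can be checked directly. A gamma distribution with shape $a$ and rate $b$ has mean $a/b$ and variance $a/b^2$, so a gamma(2, 2) prior with mean 1 and variance 0.5 corresponds to the shape-rate parameterisation. The short sketch below, with hypothetical variable names, also draws from the prior using NumPy, whose gamma sampler takes a scale (the reciprocal of the rate) rather than a rate.

```python
import numpy as np

shape, rate = 2.0, 2.0

# Analytic moments under the shape-rate parameterisation.
prior_mean = shape / rate        # 2 / 2 = 1
prior_var = shape / rate**2      # 2 / 4 = 0.5

# NumPy parameterises the gamma by shape and scale, where scale = 1 / rate.
rng = np.random.default_rng(1)
prior_draws = rng.gamma(shape, 1.0 / rate, size=(5, 3))  # 5 draws of 3 parameters
```

Passing the rate where a scale is expected would give a prior mean of 4 rather than 1, so the parameterisation is worth confirming in whichever software is used.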
The first cohort is treated at the lowest dose combination (i.e. $(0.2, 0.2)$), and two individuals are treated in each cohort. Escalation then proceeds using either decision rule D1 or D2, under a certain admissible design set ($\Omega_1$, $\Omega_2$ or $\Omega_3$). The TTL is set to 30% and the tolerance to $\varepsilon = 0.025$; that is, for design D2, we further restrict our admissible set to dose combinations whose posterior mean probability is within the range $(0.275, 0.325)$. A total of 20 cohorts (40 patients) are treated in the trial. The final RPII dose combinations are selected as those that have been experimented on during the course of the Phase I trial and whose posterior mean probability of DLT is within $\varepsilon$ of the TTL, that is the set

$$\left\{ x \in \mathcal{E} : \left| E\left[\pi(x; \theta) \mid Z_{40}\right] - \mathrm{TTL} \right| < \varepsilon \right\},$$

where $\mathcal{E}$ is the set of dose combinations experimented on during the trial. One thousand simulations were performed, in which the toxicity outcome for each patient is drawn from a Bernoulli distribution with the true probability of toxicity depending on the current dose combination. All analyses were carried out in R linked to the MCMC package JAGS [11], and the code is available from the authors upon request.
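Two building blocks of such a simulation, the Bernoulli draw of cohort outcomes and the final RPII selection, can be sketched as follows. This is a minimal Python sketch and not the R/JAGS code used for the analyses; the function names and the posterior means in the example are hypothetical.

```python
import numpy as np

def simulate_cohort(true_prob, cohort_size, rng):
    """Draw DLT indicators (1 = DLT) for one cohort at the current dose combination."""
    return rng.binomial(1, true_prob, size=cohort_size)

def rpii_doses(post_mean, experimented, ttl=0.30, eps=0.025):
    """Experimented doses whose posterior mean P(DLT) lies within eps of the TTL."""
    return sorted(d for d in experimented if abs(post_mean[d] - ttl) < eps)

rng = np.random.default_rng(2012)
outcomes = simulate_cohort(true_prob=0.30, cohort_size=2, rng=rng)

# Hypothetical posterior means at the end of a trial, keyed by dose-level indices.
post_mean = {(6, 3): 0.31, (5, 4): 0.29, (6, 4): 0.40, (4, 5): 0.30}
experimented = {(6, 3), (5, 4), (6, 4)}
recommended = rpii_doses(post_mean, experimented)
```

Combination (4, 5) has a posterior mean at the TTL but is not recommended because it was never experimented on, which reflects the restriction to the experimented set.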

Scenarios
Four scenarios are investigated. The first takes the true probability of toxicity at each dose combination from the prior mean, as specified from the six-parameter model (Table II(a)). Both models are therefore expected to perform well under this scenario. Four of the prespecified dose combinations are MTDs, and it is desirable for the designs to recommend as many of these doses for Phase II experimentation as possible. In the second scenario, the probability of a DLT is higher than the specified prior mean probabilities for high-dose combinations (Table II(b)). In this scenario, there are five possible MTDs, each with a probability of DLT equal to 30%. However, a one-level increase above any MTD for either drug results in a large jump in the probability of a DLT to 45%. Hence, in this scenario, the risk of overdosing is high. The scenario was not derived from a specific choice of model parameters using either the three-parameter or six-parameter model; rather, the aim is to assess the robustness of the escalation procedure to model and prior misspecification. The third scenario assumes an asymmetric dose-toxicity surface (Table II(c)), where drug A is more toxic than B over the dose range. This could arise if the prior is misspecified for the single-agent MTD dose for drug A. There are four possible MTDs in this scenario. The fourth scenario investigated assumes the dose-toxicity surface is relatively flat, with the probability of DLT ranging from 0.19 to 0.415 over the dose combinations. Here, there are nine possible MTDs (doses whose toxicity is within the tolerance of the TTL).

Escalation: six-parameter model
The mean probability of DLT for each patient recruited in the trial is shown in Figure 4 for each scenario under the various admissible sets, using the six-parameter model and decision rule D1. A very similar escalation pattern is seen under the decision rule D2, and hence, these results are not plotted. In every scenario, designs using either the admissible set $\Omega_2$ or $\Omega_3$ escalate faster compared to those using $\Omega_1$, with fewer patients receiving suboptimal doses. Scenarios 2 and 3 result in some overdosing, due to the unexpected toxic nature of the drug combination, before the designs start to converge down to the TTL. Designs using $\Omega_2$ and $\Omega_3$ peak faster and at a higher toxicity level than the more conservative $\Omega_1$ design. Under the flat dose-toxicity surface of Scenario 4, all designs escalate at roughly the same rate and converge on the TTL from below. Figure 5 illustrates the percentage of times each dose combination is experimented on under Scenario 1 for the competing designs. Using the admissible set $\Omega_1$, escalation generally proceeds along the margin of the two-dimensional space (i.e. drug B remains at its first level) before escalation to the TTL. This occurs because the prior mean for $\beta_2$ is higher than that for $\beta_1$. Hence, although no DLTs have been observed, at each decision point, an increase in drug A is estimated to get closer to the TTL than an increase in drug B. If the prior mean for $\beta_2$ was less than that of $\beta_1$ then, on average, drug B dose levels will be escalated first. In fact, only if the presumed dose-toxicity curve is convex (this happens if $\beta_1 < 1$ and $\beta_2 < 1$) does the design behave in a step-like fashion by escalating drug A and B in turns. Under the non-diagonal design (D1 with $\Omega_1$), a large proportion of patients are treated at just one of the four MTDs, $(x_{A6}, x_{B3})$, whereas other MTDs are very rarely experimented on. Using the admissible set $\Omega_2$, escalation proceeds up the diagonal of the two-dimensional space, and experimentation among the four MTDs is more equally spread.
The D-optimal designs all produce slightly more varied experimentation compared with designs D1, and this is especially true when the admissible dose set includes previously experimented doses ($\Omega_3$). Table III shows how each design performs in terms of underdosing and overdosing patients recruited to the trial, under the varying scenarios. It is clear that operating characteristics are strongly dependent on the scenario investigated. Under Scenario 1, approximately 40% of all trial participants are treated at a target level (25%-34% toxicity). Designs that use a diagonal escalation strategy ($\Omega_2$ and $\Omega_3$) tend to dose more patients at the target level under Scenarios 1-3, whereas very similar dosing profiles are seen in Scenario 4. However, these designs also tend to overdose more often compared with the more conservative $\Omega_1$ escalation. The D-optimal design (D2) slightly improves operating characteristics under the $\Omega_1$ strategy but does not consistently improve experimentation percentages under designs $\Omega_2$ and $\Omega_3$. Table IV shows the toxicity of the RPII doses at the end of the trial. As is to be expected, at the target level, the recommendation percentages are higher than the experimentation percentages shown in Table III. It is interesting to note, however, that these percentages are still relatively low, suggesting that a much larger trial is required to have adequate power to recommend Phase II doses close to the target level. The advantage of the D-optimal designs (D2) is more pronounced when studying the recommendation percentages. RPII doses are generally less likely to be above the TTL, whereas higher percentages of the true MTDs are finally selected for Phase II compared with the D1 designs.

Recommendation: three-parameter model
It should be noted that a formal comparison between the operating characteristics of the three-parameter and six-parameter models is difficult because of the choice of prior distributions. The priors chosen for the two models in this analysis were not intended to give matching dose-toxicity surfaces, in terms of either prior means or variances. Therefore, in this paper, we have adopted a pragmatic approach, using priors that have been previously recommended. Table V shows the toxicity of the RPII doses using the three-parameter model. A similar pattern is observed to that seen with the six-parameter model, with reasonable operating characteristics under Scenarios 1 and 4, whereas a fair proportion of RPII doses under Scenarios 2 and 3 are above the true TTL. In contrast to the six-parameter model, the D2 designs appear to reduce the proportion of RPII doses that have toxicity at the TTL. However, the percentage of true MTDs recommended under these designs does increase. This apparent contradiction arises because the D-optimal designs are recommending more Phase II doses, both correctly at the TTL and incorrectly outside of the TTL. The diagonal designs ($\Omega_2$ and $\Omega_3$) generally recommend a higher proportion of RPII doses at the TTL, but at the expense of increased overdosing of patients during the trial (experimentation results for the three-parameter model not shown). Under these prior specifications, the six-parameter model tends to outperform the three-parameter model in Scenarios 1, 2 and 4 in terms of RPII doses at the TTL and the overall percentage of the true MTDs selected (Tables IV and V). Interestingly, however, the three-parameter model appears to perform much better under Scenario 3 when the dose-toxicity curve is asymmetric.

DISCUSSION
In this paper, we have shown a diagonal escalation strategy to be more efficient in that it reaches the TTL more quickly, with approximately 20%-50% fewer patients dosed at suboptimal levels, and correctly recommends more MTDs at the end of the trial. We found that the non-diagonal strategies start by escalating a single agent while keeping the other agent fixed, and we did not see the step-like escalation that was anticipated. This behaviour may be undesirable because the aim may be to show safety when both drugs are being used in reasonable quantity. For example, in our simulation studies, drug A was first escalated near to its MTD, whereas drug B remained fixed at the lowest level. This results in some MTDs at the end of the trial being rarely recommended. The planned sample size of the trial must, therefore, take into account the proposed escalation strategy.
The trade-off for using a diagonal escalation strategy is an increased risk of overdosing some trial patients, the extent of which depends on the underlying dose-toxicity surface and the proposed increments in dose for each drug. The severity of this consequence will depend very much on the disease and the drugs under consideration. However, if escalation to potentially overly toxic doses is of major concern, then the dose range should be subdivided into a finer grid, if possible. In other words, the prespecification of the doses should take into account whether a diagonal or non-diagonal strategy is to be used in the escalation procedure. The flexibility of model-based adaptive designs presents an advantage here because dose levels may also be refined during the course of the trial. The models would treat the added dose combinations as additional design points for consideration when deciding the next cohort's dose, with the admissible dose set also being updated appropriately. Researchers must, therefore, consider these design issues carefully in order to ensure that overdosing is not too severe and is kept to a minimum.
Designs based on the Bayesian D-optimality criterion are shown to allow better traversing of the dose space. This, in turn, slightly improves the percentage of correctly recommended MTDs. When combined with a strategy that allows diagonal escalation and experimentation at previously administered doses, the performance of the D-optimal design is further enhanced. Such escalation strategies are therefore important, especially for Phase I trials with a large number of prespecified dose combinations.
In three out of the four scenarios investigated, the six-parameter model has been found to outperform the three-parameter model when the objective is to identify more than one MTD. However, care must be taken when comparing between models because of the different prior specifications. One advantage of the CRM approach in single-agent trials has been that underparameterised models can be used to provide good local fit at the TTL. However, in combination therapy trials, models need to be more flexible, and hence complex, in order to provide a good approximation of the whole dose-toxicity contour at the TTL. In our experience, we have found that the six-parameter model allows a more explicit specification of prior information from single-agent data, with four of the six parameters specifically related to this. Prior information can be elicited from experts, as was the case in the Gemcitabine and Cyclophosphamide trial [3], or priors can be obtained directly from previous Phase I trial data. For the latter, some discounting (i.e. increasing the prior variance) may be necessary if previous trials were conducted in different populations and with different protocols.