A Bayesian adaptive design for dual‐agent phase I–II oncology trials integrating efficacy data across stages

Abstract
Combinations of several anticancer treatments have typically been presumed to have enhanced drug activity. Motivated by a real clinical trial, this paper considers phase I–II dose finding designs for dual‐agent combinations, where one main objective is to characterize both the toxicity and efficacy profiles. We propose a two‐stage Bayesian adaptive design that accommodates a change of patient population in between. In stage I, we estimate a maximum tolerated dose combination using the escalation with overdose control (EWOC) principle. This is followed by stage II, conducted in a new yet relevant patient population, to find the most efficacious dose combination. We implement a robust Bayesian hierarchical random‐effects model to allow sharing of information on efficacy across stages, assuming that the related parameters are either exchangeable or nonexchangeable. Under the assumption of exchangeability, a random‐effects distribution is specified for the main effects parameters to capture uncertainty about the between‐stage differences. The nonexchangeability component further allows the stage‐specific efficacy parameters to have their own priors. The proposed methodology is assessed in an extensive simulation study. Our results suggest a general improvement in the operating characteristics for the efficacy assessment under a conservative prior assumption about the exchangeability of the parameters.


1 Introduction
The primary objective of early phase clinical trials is to identify a dose that is safe and efficacious. Seamless phase I-II clinical trial designs are efficient approaches to study these two aspects in a single protocol. In the literature, we find two types of seamless phase I-II designs: one-stage and two-stage. The former usually estimates the joint probability of toxicity and efficacy using the accumulating data to recommend a best-suited dose to patients in the next cohort Ivanova [2003], Thall and Cook [2004], Yuan and Yin [2009], Liu et al. [2018]. This setting is favoured when the efficacy outcome can be observed relatively soon after administration of the dose, e.g., after one or two cycles of therapy. By contrast, two-stage phase I-II designs come into play when efficacy cannot be ascertained in a short period of time. Specifically, stage I commonly focuses on toxicity alone for dose (de-)escalation, even though efficacy data are also collected. This is followed by stage II, where the evaluation of efficacy is the priority. A considerable body of statistical literature has been devoted to the two-stage type Rogatko et al. [2008], Le Tourneau et al. [2009], Tighiouart [2019], Jiménez et al. [2020], Jiménez and Tighiouart [2022].
In the phase I-II design of the cisplatin-cabazitaxel trial Tighiouart [2019], stage I was inspired by a completed phase I trial Lockhart et al. [2014] in patients with advanced solid tumors, where a single maximum tolerated dose (MTD) of cisplatin/cabazitaxel 15/75 mg/m² was recommended. Based on the available results and other preliminary efficacy data, the clinical team hypothesized that there could be a series of tolerable and efficacious dose combinations for prostate cancer. In recent years, clinical trials with drug combinations have received a fair amount of attention. This interest is motivated by the fact that drug combinations can induce synergistic treatment effects by simultaneously inhibiting resistance mechanisms and targeting multiple pathways. In stage II of the cisplatin-cabazitaxel trial Tighiouart [2019], 30 additional patients were enrolled to identify the dose combinations with high probability of efficacy along the MTD curve estimated from stage I data. Another characteristic of this trial was that each stage enrolled a different patient population, with the stage II population being the one of interest for the clinical team. Since the patient populations in stages I and II were not exactly the same, it was hypothesized that the dose-efficacy profiles could differ. Consequently, the dose-efficacy relationship was estimated using stage II data alone. The design of the cisplatin-cabazitaxel trial Tighiouart [2019] can be improved in two ways: i) the uncertainty about the estimated MTD curve should be better accounted for in stage II (i.e., the MTD curve may be further updated during stage II), and ii) efficacy data from stage I could be used in the final analysis at the end of stage II. The first limitation was recently addressed by Jiménez and Tighiouart [2022], who allowed for a continuous update of the MTD curve throughout the entire phase I-II design. The novelty of the present article lies in addressing the second aspect; that is, we aim to integrate the efficacy information from both stages without neglecting the potential heterogeneity caused by the change of patient population across stages.
A robust Bayesian hierarchical model is fitted to allow combining the efficacy data across stages. The associated parameters are assumed to be either exchangeable or non-exchangeable. In our specific application, when the efficacy profiles are consistent across stages, borrowing is expected to improve the precision of the model parameter estimates and to reduce the number of patients treated at sub-therapeutic dose combinations. In cases of data inconsistency, the stage I efficacy data need to be discounted effectively.
The manuscript is organized as follows. In Section 2, we review the cisplatin-cabazitaxel trial, which serves as the motivating example for the present work, as well as the proposed marginal dose-toxicity and dose-efficacy models used to fit the trial data. In Section 3, we introduce the proposed dose-finding algorithm for stages I and II, and in Section 4 we present a simulation study to evaluate the operating characteristics of the design, with focus on stage II. We provide concluding remarks in Section 5.
2 Motivating example and statistical models

The cisplatin-cabazitaxel trial and data collection
The original cisplatin-cabazitaxel trial enrolled patients with metastatic, castration-resistant prostate cancer. Combinations of continuous doses ranging from 10 to 25 mg/m² for cisplatin and from 50 to 100 mg/m² for cabazitaxel were administered intravenously every three weeks. As informed by a previous study Lockhart et al. [2014], three specific cisplatin/cabazitaxel combinations, 15/75, 20/75 and 25/75 mg/m², were evaluated. In stage I, the study enrolls 30 patients using the conditional escalation with overdose control (EWOC) algorithm Tighiouart et al. [2017] to estimate the MTD curve. In stage II, the study enrolls another 30 patients from the same population of patients but with visceral metastasis, to identify dose combinations with high probability of efficacy along the MTD curve estimated at the end of stage I. These patients are allocated to dose combinations along the MTD curve using a Bayesian adaptive design after modeling the dose-efficacy curve with cubic splines.
The recommended MTD was 15/75 mg/m² on the basis of data from 24 patients (i.e., 9 evaluable patients in phase I and 15 patients in the expansion cohort), where only 2 out of 18 patients treated at the recommended MTD had a dose limiting toxicity (DLT). Considering the low toxicity rate at the MTD reported by Lockhart et al. [2014], as well as other (unpublished) preliminary efficacy data, the clinicians who contributed to the design of the cisplatin-cabazitaxel trial hypothesized that there could exist a series of tolerable dose combinations that are efficacious in prostate cancer.
In this article, we view the potential differences between stage I and stage II efficacy profiles from a different perspective. More specifically, we aim to establish a robust model that formally accounts for such uncertainty, so that we can enhance the conduct and analysis of stage II when the efficacy profiles are similar across the patient populations (the stages), while discounting the stage I efficacy data in case of dissimilarity.

Problem formulation
Let x and y be the respective dose levels, on their original continuous scales, of two compounds (labelled X and Y) of interest, and let $X_{\min}, X_{\max}$ and $Y_{\min}, Y_{\max}$ be the corresponding lower and upper bounds. The measurement scales of x and y might differ substantially. To avoid one variable being overly influential in the risk of toxicity, we standardise the doses using the transformations $h_1(x) = (x - X_{\min})/(X_{\max} - X_{\min})$ and $h_2(y) = (y - Y_{\min})/(Y_{\max} - Y_{\min})$, so that the standardised doses fall within the interval [0, 1]. Thus, the dose combination (0, 0) corresponds to the lowest dose combination available in the trial and not to the absence of treatment. For ease of notation, we retain x and y to denote the standardised dose levels.
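As a small worked illustration of $h_1$ and $h_2$, the sketch below standardises doses using the cisplatin (10 to 25 mg/m²) and cabazitaxel (50 to 100 mg/m²) ranges of the motivating trial. The function and constant names are hypothetical and chosen only for this example.

```python
def standardise(dose, lower, upper):
    """Map a dose on its original scale to [0, 1]: (dose - lower) / (upper - lower)."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    if not lower <= dose <= upper:
        raise ValueError("dose lies outside the trial's dose range")
    return (dose - lower) / (upper - lower)


# Dose ranges of the motivating trial, in mg/m^2
CISPLATIN_RANGE = (10.0, 25.0)     # compound X
CABAZITAXEL_RANGE = (50.0, 100.0)  # compound Y

# The combination 15/75 mg/m^2 on the standardised scale
x = standardise(15.0, *CISPLATIN_RANGE)    # (15 - 10) / 15 = 1/3
y = standardise(75.0, *CABAZITAXEL_RANGE)  # (75 - 50) / 50 = 1/2
print(round(x, 4), round(y, 4))  # 0.3333 0.5
```

The lowest available combination, 10/50 mg/m², maps to (0, 0), which is why (0, 0) denotes the lowest dose combination rather than no treatment.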
Let Z ∈ {0, 1} be the binary indicator of DLT, where Z = 1 represents the presence of a DLT and Z = 0 otherwise. Likewise, let E ∈ {0, 1} be the binary indicator of treatment response, where E = 1 represents a positive response and E = 0 otherwise. In this article, following the motivating trial, we assume that only the DLT can be observed rapidly after drug administration (e.g., after one cycle of therapy), whereas it takes three or more cycles for the efficacy outcome to be observable. Following Tighiouart [2019], let $\theta_T = 0.33$ be the target probability of DLT and $p_0 = 0.15$ be the probability of efficacy of the standard of care treatment. When employing synergistic cytotoxic agents, it is common to assume that both the dose-toxicity and dose-efficacy relationships are monotonically increasing functions. This implies that the optimal dose combination (i.e., the dose combination with the most desirable benefit-risk trade-off) lies in the MTD set, defined as $\mathcal{M} = \{(x, y) : P(Z = 1 \mid x, y) = \theta_T\}$, i.e., the set of dose combinations (x, y) with probability of DLT equal to $\theta_T$. A formal definition of the optimal dose combination is given in Section 3. Given the two-stage formulation of this design, let S ∈ {1, 2} be the stage enrollment indicator for stage I and stage II, respectively, and let $D_{S,i} = \{(Z_i, E_i, x_i, y_i)\}$ be the data collected in stage S for the i-th patient.

A marginal dose-toxicity model
We assume that the binary outcomes of toxicity and efficacy are independent Ivanova et al. [2009], Cai et al. [2014], Lyu et al. [2019]. Alternatively, one could account for the relationship between toxicity and efficacy either with a copula Thall and Cook [2004] or with a latent variable approach Liu et al. [2018], Lin et al. [2020]. However, this would add a layer of complexity to the design that is beyond the scope of this article. Let the model for the marginal probability of DLT be

$$P(Z = 1 \mid x, y) = F(\alpha_0 + \alpha_1 x + \alpha_2 y + \alpha_3 xy), \tag{1}$$

where $F(\cdot)$ is the cumulative distribution function of the logistic distribution, i.e., $F(u) = 1/(1 + e^{-u})$. The parameters in this model can be interpreted as follows: i) $\alpha_0$ determines the probability of DLT at the lowest dose combination available in the trial, i.e., (x = 0, y = 0), ii) $\alpha_1$ and $\alpha_2$ determine the contributions of compounds X and Y to the overall probability of DLT, and iii) $\alpha_3$ captures the potential increase in the probability of DLT due to drug-drug interaction. Under model (1), the MTD set is the curve

$$\mathcal{M} = \{(x, y) : \alpha_0 + \alpha_1 x + \alpha_2 y + \alpha_3 xy = F^{-1}(\theta_T)\}, \tag{2}$$

where $F^{-1}(p) = \log\{p/(1 - p)\}$.
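To make the toxicity model concrete, the sketch below assumes that model (1) is the logistic model with linear predictor $\alpha_0 + \alpha_1 x + \alpha_2 y + \alpha_3 xy$ on the standardised doses. It evaluates the probability of DLT and solves $P(Z = 1 \mid x, y) = \theta_T$ for x at a fixed level y, which gives one point of the MTD set $\mathcal{M}$. The parameter values and function names are hypothetical.

```python
import numpy as np


def logistic_cdf(u):
    """F(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + np.exp(-u))


def logit(p):
    """F^{-1}(p) = log(p / (1 - p))."""
    return np.log(p / (1.0 - p))


def prob_dlt(x, y, a0, a1, a2, a3):
    """Model (1) on standardised doses: F(a0 + a1*x + a2*y + a3*x*y)."""
    return logistic_cdf(a0 + a1 * x + a2 * y + a3 * x * y)


def mtd_x_given_y(y, a0, a1, a2, a3, theta_t=0.33):
    """Point of the MTD curve with Y fixed at y: solves prob_dlt(x, y) = theta_t.
    Requires a1 + a3*y > 0."""
    return (logit(theta_t) - a0 - a2 * y) / (a1 + a3 * y)


# Hypothetical parameter values, for illustration only
params = {"a0": logit(0.05), "a1": 2.0, "a2": 2.0, "a3": 1.0}
y = 0.5
x = mtd_x_given_y(y, **params)
# The second value is theta_t = 0.33 up to floating-point error
print(float(x), float(prob_dlt(x, y, **params)))
```

With these values, $\alpha_0 = F^{-1}(0.05)$, so the lowest dose combination has a 5% chance of DLT, matching the interpretation of $\alpha_0$ given above.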
Note that in Model (1), because the number of attributable DLTs is expected to be very low given the cytotoxic nature of cisplatin and cabazitaxel, we do not take into account toxicity attributions Jimenez et al. [2019].

A marginal dose-efficacy model
We now shift our focus to estimating the dose-efficacy relationship in the two-dimensional dose space. For stage S ∈ {1, 2}, we stipulate the stage-wise dose-efficacy data model

$$\pi_E^{S}(x, y) = P(E = 1 \mid x, y, S) = F\left(\beta_{0S} + e^{\beta_{1S}} x + e^{\beta_{2S}} y + \beta_{3S}\, xy\right), \tag{3}$$

where $F(\cdot)$ is again the cumulative distribution function of the logistic distribution. Because the motivating trial employs cytotoxic agents, we assume that the probability of efficacy does not decrease with the dose of either agent when the other agent is held constant. To ensure this property, we apply the exponential function to $\beta_{1S}$ and $\beta_{2S}$, since exp(u) > 0. We also assume that $\beta_{3S} > 0$, which means that there is a synergistic effect due to the interaction of the two compounds Gasparini [2013]. The parameters in this model can be interpreted as follows: i) $\beta_{0S}$ determines the probability of efficacy at the lowest dose combination available in the trial, i.e., (x = 0, y = 0), ii) $\beta_{1S}$ and $\beta_{2S}$ determine the contributions of compounds X and Y to the overall probability of efficacy, and iii) $\beta_{3S}$ captures the potential increase in the probability of efficacy due to drug-drug interaction.
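The monotonicity argument above can be checked numerically. The sketch below assumes the stage-specific efficacy model $F(\beta_{0S} + e^{\beta_{1S}} x + e^{\beta_{2S}} y + \beta_{3S} xy)$ with $\beta_{3S} > 0$, and uses hypothetical parameter values. Because $\partial\eta/\partial x = e^{\beta_{1S}} + \beta_{3S} y > 0$ for y ≥ 0, efficacy increases in each dose even when $\beta_{1S}$ or $\beta_{2S}$ is negative.

```python
import numpy as np


def prob_efficacy(x, y, b0, b1, b2, b3):
    """Stage-specific efficacy model: F(b0 + exp(b1)*x + exp(b2)*y + b3*x*y), b3 > 0."""
    if b3 <= 0:
        raise ValueError("b3 is assumed positive (synergistic interaction)")
    eta = b0 + np.exp(b1) * x + np.exp(b2) * y + b3 * x * y
    return 1.0 / (1.0 + np.exp(-eta))


# Hypothetical parameters for one stage, for illustration only
beta = {"b0": -2.0, "b1": 0.0, "b2": -0.5, "b3": 0.5}
grid = np.linspace(0.0, 1.0, 11)

# Vary one dose while holding the other fixed at 0.3
p_x = prob_efficacy(grid, 0.3, **beta)
p_y = prob_efficacy(0.3, grid, **beta)
print(bool(np.all(np.diff(p_x) > 0)), bool(np.all(np.diff(p_y) > 0)))  # True True
```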
Let $\Psi_S = (\beta_{1S}, \beta_{2S})$ denote the stage-specific main effects of the treatment. We consider a meta-analytic-combined (MAC) approach Neuenschwander et al. [2016] to establish a Bayesian predictive distribution for $\Psi_2 \mid D_1, D_2$. This allows the investigator to estimate the main effects of drugs X and Y using the efficacy data from both stages.
We assume a normal-normal hierarchical model to relate the stage-wise main effects of efficacy of the two agents. Specifically, at stage S = 1,

$$\Psi_1 \sim \mathrm{N}_2(\mu, \Phi).$$

Continuing the phase I-II trial to stage S = 2 in a new population, we introduce a non-exchangeability distribution and stipulate that

$$\Psi_2 \sim \omega\, \mathrm{N}_2(\mu, \Phi) + (1 - \omega)\, \mathrm{N}_2(m_0, R_0),$$

where ω is the prior probability of exchangeability, $\mu = (\mu_1, \mu_2)^\top$, and $\Phi$ is a $2 \times 2$ covariance matrix with diagonal elements $\tau_1^2$ and $\tau_2^2$. The variance terms in Φ represent between-stage heterogeneity. Note that each stage of the cisplatin-cabazitaxel trial enrolls a single, distinct and well-defined population. Consequently, between-stage heterogeneity and between-population heterogeneity cannot be disentangled. If our trial involved multiple populations in each stage, we would need additional random-effects distributions to account for the between-population differences Zheng et al. [2020, 2021].
The values of $m_0$ and $R_0$ are selected so that they induce weakly informative prior distributions over the parameters in $\Psi_2$. This Bayesian hierarchical random-effects model is completed by the following hyperpriors: …, $\tau_1 \sim \mathrm{HN}(z_1)$, $\tau_2 \sim \mathrm{HN}(z_2)$, $\xi \sim \mathrm{U}(0, 0.5)$, $\zeta \sim \mathrm{U}(0, 0.5)$, where HN(z) denotes a half-normal distribution formed by truncating a $\mathrm{N}(0, z^2)$ to the interval $(0, \infty)$. We select HN(0.5), anticipating substantial between-stage heterogeneity in the main effects model parameters. Other viable choices of HN(z) and their interpretation have been discussed by Zheng et al. [2020] and Roychoudhury and Neuenschwander [2020]. The choice z = 0.5 yields a weakly informative prior, although the values of $z_1$ and $z_2$ can be tailored to the user's own case, based on evidence suggesting the similarity or dissimilarity of efficacy in the two patient populations.
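To see how ω controls borrowing, the sketch below draws from the prior of $\Psi_2$ assuming the exchangeability/non-exchangeability mixture form $\Psi_2 \sim \omega\, \mathrm{N}_2(\mu, \Phi) + (1 - \omega)\, \mathrm{N}_2(m_0, R_0)$. It simplifies the hierarchy: μ is held fixed rather than drawn from its hyperprior, Φ is taken as diagonal with $\tau_k \sim \mathrm{HN}(z_k)$, and no correlation term is drawn. The function name and all numeric inputs are hypothetical.

```python
import numpy as np


def draw_psi2_prior(n, omega, mu, m0, R0, z=(0.5, 0.5), seed=0):
    """Prior draws of Psi_2 = (beta_12, beta_22) under an EX/NEX mixture.

    With probability omega: Psi_2 ~ N2(mu, Phi)  (exchangeable with stage I).
    Otherwise:              Psi_2 ~ N2(m0, R0)   (stage-specific, non-exchangeable).
    Simplifications (assumptions): mu is fixed, Phi = diag(tau1^2, tau2^2) with
    tau_k ~ HN(z_k); no correlation term is drawn.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    tau = np.abs(rng.normal(0.0, z, size=(n, 2)))   # half-normal SDs, one pair per draw
    ex = mu + tau * rng.standard_normal((n, 2))     # N2(mu, diag(tau^2)) given tau
    nex = rng.multivariate_normal(np.asarray(m0, float), np.asarray(R0, float), size=n)
    is_ex = rng.random(n) < omega                   # latent exchangeability indicator
    return np.where(is_ex[:, None], ex, nex), is_ex


# Hypothetical inputs: a vague NEX component and omega = 0.25
draws, is_ex = draw_psi2_prior(20_000, omega=0.25, mu=[0.0, 0.0],
                               m0=[0.0, 0.0], R0=np.diag([4.0, 4.0]))
print(draws.shape, round(float(is_ex.mean()), 2))
```

Setting omega = 0 makes every draw come from the stage-specific component, which corresponds to discarding stage I efficacy information; omega = 1 ties $\Psi_2$ entirely to the exchangeable component.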
The specification of ω requires, in practice, the input of subject-matter experts, and ω needs to be fixed a priori. We place weakly informative prior distributions over the dose-efficacy model parameters and, for illustration purposes, fix the corresponding hyperparameter values to implement the model. Overall, the weakly informative prior distributions we select in this article translate into the prior median probabilities of efficacy and 95% credible intervals displayed in Table S1 of the supplementary material.

3 An integrated phase I-II design for dose finding
Stage I will enroll a total of $N_1 = C_1 \times m_1$ patients, where $C_1$ denotes the total number of cohorts in stage I, each of size $m_1$. Stage II will enroll a total of $N_2 = n_2 + C_2 \times m_2$ patients, where $n_2$ is the number of patients in the first cohort of stage II, $C_2$ the number of additional cohorts, and $m_2$ their size.
In the original cisplatin-cabazitaxel trial, stage I efficacy data were entirely discarded and, therefore, at the beginning of stage II an initial cohort of $n_2$ patients was used to collect efficacy data homogeneously across the entire MTD curve. In this article, we keep this initial cohort as a short run-in period that can inform about data (in)consistency across stages and thus determine the degree of information sharing.
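Let $N = N_1 + N_2$ be the total number of patients that the entire study will enroll, and let $\mathcal{M}_{D_1}$ be the estimated MTD set based on stage I data. We select $m_1 = 2$, $m_2 = 5$ and $n_2 = 10$. At the end of stage II, we test the null and alternative hypotheses

$$H_0: \pi_E^{S=2}(x, y) \le p_0 \ \text{for all } (x, y) \in \mathcal{M} \quad \text{versus} \quad H_1: \pi_E^{S=2}(x, y) > p_0 \ \text{for some } (x, y) \in \mathcal{M}, \tag{7}$$

and we reject the null hypothesis if

$$P\left(\pi_E^{S=2}(x^*, y^*) > p_0 \mid D_1, D_2\right) > \delta_u, \qquad (x^*, y^*) = \arg\max_{(x, y) \in \mathcal{M}_{D_1}} \hat\pi_E^{S=2}(x, y),$$

where $\delta_u = 0.4$ is a pre-specified design parameter. Moreover, the dose combination $(x^*, y^*)$ is recommended as the optimal dose combination and is selected for further phase IIb or III studies.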
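The end-of-trial decision can be sketched from posterior draws of $(\beta_{02}, \beta_{12}, \beta_{22}, \beta_{32})$, for instance from an MCMC fit of the MAC model. This sketch assumes the rejection rule $P(\pi_E^{S=2}(x^*, y^*) > p_0 \mid D_1, D_2) > \delta_u$, with $(x^*, y^*)$ maximising the plug-in efficacy curve (using posterior medians) over a grid on the estimated MTD curve. It also assumes that the futility check uses the same posterior probability with threshold $\delta_0 = 0.1$. With $N_1 = N_2 = 30$, $m_1 = 2$, $m_2 = 5$ and $n_2 = 10$, the design has $C_1 = 15$ stage I cohorts and $C_2 = 4$ further stage II cohorts. The grid, the curve y = 1 - x, and the draws are hypothetical.

```python
import numpy as np

# Design constants from the text: N1 = N2 = 30, m1 = 2, m2 = 5, n2 = 10
N1, N2, M1, M2, N2_RUNIN = 30, 30, 2, 5, 10
C1 = N1 // M1                 # 15 stage I cohorts of two patients
C2 = (N2 - N2_RUNIN) // M2    # 4 stage II cohorts of five patients after the run-in


def prob_efficacy(x, y, b):
    """Efficacy model F(b0 + exp(b1)*x + exp(b2)*y + b3*x*y) for each row of b."""
    b = np.atleast_2d(b)
    eta = b[:, [0]] + np.exp(b[:, [1]]) * x + np.exp(b[:, [2]]) * y + b[:, [3]] * x * y
    return 1.0 / (1.0 + np.exp(-eta))                    # shape (n_rows, n_points)


def end_of_trial_decision(draws, xs, ys, p0=0.15, delta_u=0.4, delta_0=0.1):
    """draws: (n_draws, 4) posterior samples of (b02, b12, b22, b32);
    (xs, ys): grid of points on the estimated MTD curve."""
    b_hat = np.median(draws, axis=0)                     # Bayes estimate (posterior median)
    k = int(np.argmax(prob_efficacy(xs, ys, b_hat)[0]))  # index of (x*, y*)
    post = float(np.mean(prob_efficacy(xs[k], ys[k], draws)[:, 0] > p0))
    return {"x_star": float(xs[k]), "y_star": float(ys[k]), "post_prob": post,
            "reject_H0": post > delta_u, "stop_futility": post < delta_0}


# Hypothetical example: an MTD curve y = 1 - x and tightly concentrated draws
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 101)
ys = 1.0 - xs
draws = np.array([-1.0, 0.0, -1.0, 0.5]) + 0.01 * rng.standard_normal((2000, 4))
print(C1, C2, end_of_trial_decision(draws, xs, ys))
```

Because F is monotone, the arg max depends only on the linear predictor, so shifting $\beta_{02}$ changes the posterior probability but not the location of $(x^*, y^*)$.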
As previously mentioned, stage I is based on the escalation with overdose control (EWOC) principle Babb et al. [1998], Tighiouart et al. [2005, 2010, 2017], Tighiouart and Rogatko [2012], Shi and Yin [2013], where the posterior probability of overdosing the next cohort of patients is bounded by a feasibility bound α. To define the algorithm, let $\lambda(\Gamma_{X|Y=y} \mid D_1)$ represent the posterior distribution of the MTD of drug X given that the level of drug Y is equal to y (i.e., given that Y is fixed), based on stage I data $D_1$ (see equation (2) for the definition of the MTD). Also, let $\Lambda^{-1}_{\Gamma_{X|Y=y}}(\alpha \mid D_1)$ denote the α-quantile of $\lambda(\Gamma_{X|Y=y} \mid D_1)$. In a cohort of two patients, the first patient receives a new dose of compound X given the dose y of compound Y previously assigned, and the other patient receives a new dose of compound Y given the dose x of compound X previously assigned. These steps are described in Stage I of Algorithm 1. Using EWOC, these new doses are at the α-quantile of the conditional posterior distribution of the maximum tolerated dose combinations. The feasibility bound α increases from 0.25 up to 0.5 in increments of 0.05 (see Wheeler et al. [2017]). Accrual continues until the maximum sample size in stage I is reached or the trial is stopped early for safety.
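A simplified sketch of one EWOC step for compound X, assuming posterior draws of $(\alpha_0, \ldots, \alpha_3)$ from the logistic toxicity model are already available (e.g., from MCMC). For each draw, the conditional MTD $\Gamma_{X|Y=y}$ solves $\alpha_0 + \alpha_1 x + \alpha_2 y + \alpha_3 xy = F^{-1}(\theta_T)$ for x. The next dose is the α-quantile of these values, clipped to [0, 1]. The per-cohort schedule for the feasibility bound and all names and draws are assumptions for illustration; this is not the authors' implementation, which reparameterises the model.

```python
import numpy as np


def logit(p):
    return np.log(p / (1.0 - p))


def feasibility_bound(cohort, start=0.25, step=0.05, cap=0.5):
    """Feasibility bound for cohort 0, 1, 2, ...: start + step * cohort, capped at `cap`.
    (Assumed schedule: one increment per cohort.)"""
    return min(round(start + step * cohort, 10), cap)


def ewoc_next_x(alpha_draws, y, alpha_fb, theta_t=0.33):
    """EWOC dose of X given Y = y: the alpha_fb-quantile of the posterior of the
    conditional MTD Gamma_{X|Y=y}, computed draw by draw, clipped to [0, 1]."""
    a0, a1, a2, a3 = np.asarray(alpha_draws, dtype=float).T
    gamma = (logit(theta_t) - a0 - a2 * y) / (a1 + a3 * y)  # conditional MTD per draw
    return float(np.clip(np.quantile(gamma, alpha_fb), 0.0, 1.0))


# Hypothetical posterior draws of (alpha_0, ..., alpha_3)
rng = np.random.default_rng(42)
draws = np.column_stack([
    rng.normal(logit(0.05), 0.3, 4000),      # alpha_0
    rng.lognormal(np.log(2.0), 0.2, 4000),   # alpha_1 > 0
    rng.lognormal(np.log(2.0), 0.2, 4000),   # alpha_2 > 0
    rng.exponential(1.0, 4000),              # alpha_3 >= 0
])
for c in range(3):
    fb = feasibility_bound(c)
    print(c, fb, round(ewoc_next_x(draws, y=0.5, alpha_fb=fb), 3))
```

A larger feasibility bound moves the recommended dose further up the posterior of the MTD, so increasing α over cohorts makes escalation gradually less conservative.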
Stage II follows the response-adaptive randomization principle. This Monte Carlo algorithm uses the current parameter estimates to sample a cohort of $m_2$ dose combinations along the estimated MTD curve from the standardized density of the estimated dose-efficacy curve $\hat\pi_E^{S=2}(x, y)$. Note that $\hat\pi_E^{S=2}(x, y)$ uses the Bayes estimates of the dose-efficacy model parameters. As explained in Jiménez and Tighiouart [2022], because stage II selects doses on the estimated MTD curve $\mathcal{M}_{D_1}$, and each point $(x, y) \in \mathcal{M}_{D_1}$ is uniquely determined by its x-coordinate, we may write $\hat\pi_E^{S=2}(x, y) = \hat\pi_E^{S=2}(x)$ for $(x, y) \in \mathcal{M}_{D_1}$. In other words, given the value of x, the definition of the MTD in equation (2) readily yields the corresponding value of y. Thus, to simplify the definition of the standardized density function, we write $\hat\pi_E^{S=2}(x)$ instead of $\hat\pi_E^{S=2}(x, y)$. The standardized density of the estimated efficacy curve is

$$\tilde\pi_E^{S=2}(x) = \frac{\hat\pi_E^{S=2}(x)}{\int \hat\pi_E^{S=2}(u)\, du},$$

where the integral is taken over the x-coordinates u of the points in $\mathcal{M}_{D_1}$. A rejection sampling algorithm is then used to sample $m_2$ dose combinations from this density. These steps are described in Stage II of Algorithm 1.
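The allocation step can be sketched as rejection sampling with a uniform proposal on the x-range of the estimated MTD curve. Since acceptance uses only the ratio $\hat\pi_E^{S=2}(x)/\text{bound}$, the normalising integral of the standardized density is never needed. Assumptions: the envelope is the maximum over a fine grid inflated by 5% (adequate for smooth curves such as these), the MTD curve is written as y(x) from the logistic toxicity model, the curve is restricted to the standardised square $[0, 1]^2$, and all parameter values and function names are hypothetical posterior medians chosen for illustration.

```python
import numpy as np


def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))


def sample_on_mtd_curve(m2, eff_on_curve, x_lo, x_hi, rng, n_grid=1001):
    """Draw m2 values of x from the density proportional to eff_on_curve(x) on
    [x_lo, x_hi] by rejection sampling with a uniform proposal. The normalising
    integral is never needed; the envelope is the grid maximum inflated by 5%."""
    bound = 1.05 * np.max(eff_on_curve(np.linspace(x_lo, x_hi, n_grid)))
    out = []
    while len(out) < m2:
        x = rng.uniform(x_lo, x_hi)                     # proposal
        if rng.uniform(0.0, bound) <= eff_on_curve(x):  # accept w.p. eff(x) / bound
            out.append(x)
    return np.array(out)


# Hypothetical posterior medians for the toxicity and efficacy models
THETA_T = 0.33
A0, A1, A2, A3 = np.log(0.05 / 0.95), 2.0, 2.0, 1.0
B0, B1, B2, B3 = -2.0, 0.5, 0.0, 0.5


def y_on_curve(x):
    """Estimated MTD curve as y(x): solves A0 + A1*x + A2*y + A3*x*y = logit(THETA_T)."""
    return (np.log(THETA_T / (1 - THETA_T)) - A0 - A1 * x) / (A2 + A3 * x)


def eff_on_curve(x):
    """Estimated efficacy along the MTD curve, pi_E(x) = pi_E(x, y(x))."""
    y = y_on_curve(x)
    return logistic(B0 + np.exp(B1) * x + np.exp(B2) * y + B3 * x * y)


# Portion of the curve inside the standardised dose square [0, 1]^2
grid = np.linspace(0.0, 1.0, 2001)
inside = grid[(y_on_curve(grid) >= 0.0) & (y_on_curve(grid) <= 1.0)]
x_new = sample_on_mtd_curve(5, eff_on_curve, inside.min(), inside.max(),
                            np.random.default_rng(1))
print(np.column_stack([x_new, y_on_curve(x_new)]).round(3))
```

Regions of the curve with higher estimated efficacy are proportionally more likely to be sampled, yet every point with positive efficacy keeps some chance of allocation, which preserves exploration along the curve.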
Algorithm 1, Stage II:
- Allocate $n_2$ patients to dose combinations equally spaced along the estimated MTD curve $\mathcal{M}_{D_1}$.
- Calculate the posterior median $(\hat\beta_{02}, \hat\beta_{12}, \hat\beta_{22}, \hat\beta_{32})$ of the parameters using the MAC approach given data $D_1, D_2$.
- for $c_2 = 1 : C_2$ do
  - Generate a sample of $m_2$ dose combinations that belong to $\mathcal{M}_{D_1}$ from the (estimated) standardized density $\tilde\pi_E^{S=2}(x)$, and assign them to the next cohort of $m_2$ patients.
  - Calculate the posterior median $(\hat\beta_{02}, \hat\beta_{12}, \hat\beta_{22}, \hat\beta_{32})$ of the parameters using the MAC approach given data $D_1, D_2$.
- end for

The dose-finding algorithm contains the following stopping rules for safety and futility:

• Futility stopping rule: For ethical considerations, and to avoid exposing patients to sub-therapeutic dose combinations, we would stop the trial for futility if

$$P\left(\pi_E^{S=2}(x^*, y^*) > p_0 \mid D_1, D_2\right) < \delta_0, \qquad (x^*, y^*) = \arg\max_{(x, y) \in \mathcal{M}_{D_1}} \hat\pi_E^{S=2}(x, y),$$

where $\delta_0$ is a pre-specified threshold. For the purposes of illustration in this article, we choose $\delta_0 = 0.1$. Note that this stopping rule applies only after the run-in cohort of $n_2$ patients in stage II.

• Safety stopping rule: The design contains two stopping rules for safety, one for stage I and a less stringent one for stage II. During stage I, we would stop the trial if the posterior probability of excessive toxicity exceeded $\delta_{\theta_1} = 0.5$. In contrast, during stage II we would stop the trial if the posterior probability that Θ is excessive exceeded $\delta_{\theta_2} = 0.9$, where Θ represents the rate of DLTs for both stages of the design regardless of dose, and $\delta_{\theta_2} = 0.9$ represents the confidence level (i.e., 90%) that a prospective trial results in an excessive DLT rate. A non-informative Jeffreys prior Beta(0.5, 0.5) is placed on the parameter Θ.
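The stage II safety rule relies on the conjugate Beta-binomial update: with a Beta(0.5, 0.5) Jeffreys prior and d DLTs among n patients, the posterior of Θ is Beta(0.5 + d, 0.5 + n − d). The sketch below estimates the tail probability by Monte Carlo. What counts as "excessive" is an assumption here: it is passed as a hypothetical `threshold` argument, and the example uses $\theta_T = 0.33$ for illustration only. The counts are also hypothetical.

```python
import numpy as np


def prob_rate_exceeds(n_dlt, n_treated, threshold, a=0.5, b=0.5, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(Theta > threshold | data) for a DLT rate Theta with a
    Beta(a, b) prior (Jeffreys: a = b = 0.5) and binomial data, using the conjugate
    Beta(a + n_dlt, b + n_treated - n_dlt) posterior."""
    rng = np.random.default_rng(seed)
    theta = rng.beta(a + n_dlt, b + n_treated - n_dlt, size=n_draws)
    return float(np.mean(theta > threshold))


def stop_for_safety(n_dlt, n_treated, threshold, delta=0.9):
    """Stop if the posterior probability that Theta exceeds `threshold` is above delta."""
    return prob_rate_exceeds(n_dlt, n_treated, threshold) > delta


# Hypothetical pooled counts over both stages; threshold = theta_T is an assumption
for dlts in (10, 20):
    print(dlts, stop_for_safety(dlts, 40, threshold=0.33))
```

An exact tail probability could replace the Monte Carlo estimate (for example, via a Beta survival function from a statistics library); simulation keeps the sketch dependency-light.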
4 Simulation study

Operating characteristics
In this section, we present a simulation study to assess the operating characteristics of our design. Since we apply an already established dose-escalation procedure in stage I, we concentrate on evaluating the operating characteristics of stage II, which leverages efficacy data from stage I. We report the simulation results according to the following metrics:
• Distribution of the recommended optimal dose combinations.
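• Proportion of recommended optimal dose combinations with true probability of efficacy above $p_0$. For simplicity, this metric is referred to as the percentage of correct recommendation.
• (Approximated) Bayesian power (or type-I error probability under $H_0$):

$$\frac{1}{J} \sum_{j=1}^{J} \mathbb{1}\left\{ P\left(\pi_E^{S=2}(x_j^*, y_j^*) > p_0 \mid D_{1,j}, D_{2,j}\right) > \delta_u \right\},$$

where $\mathbb{1}(\cdot)$ represents an indicator function, J represents the total number of simulated trials, $D_{1,j}, D_{2,j}$ are the data of the j-th simulated trial, and $(x_j^*, y_j^*)$ maximises $\hat\pi_E^{S=2}(x, y) = F(\hat\beta_{02} + e^{\hat\beta_{12}} x + e^{\hat\beta_{22}} y + \hat\beta_{32} xy)$ over the estimated MTD curve of that trial. Under null scenarios as defined in (7), the above formula represents the (approximated) Bayesian type-I error probability.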
• Average posterior probability of early stopping for futility and safety.
• Proportion of patients in stage II allocated to dose combinations with true probability of efficacy above p 0 .
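The (approximated) Bayesian power listed above is a simple proportion over the J simulated trials, so it carries binomial Monte Carlo error $\sqrt{\hat p(1 - \hat p)/J}$. The sketch below computes both, assuming each simulated trial is summarised by its end-of-trial posterior probability and that trials stopped early are recorded as non-rejections (e.g., with a posterior probability of 0). The function name and inputs are hypothetical.

```python
import numpy as np


def rejection_rate(post_probs, delta_u=0.4):
    """Fraction of J simulated trials with end-of-trial posterior probability above
    delta_u (approximate Bayesian power under H1, type-I error under H0), plus its
    binomial Monte Carlo standard error sqrt(rate * (1 - rate) / J)."""
    post_probs = np.asarray(post_probs, dtype=float)
    rate = float(np.mean(post_probs > delta_u))
    mcse = float(np.sqrt(rate * (1.0 - rate) / post_probs.size))
    return rate, mcse


# Hypothetical output of J = 1000 trials: 800 with a large posterior probability
rate, mcse = rejection_rate([0.9] * 800 + [0.05] * 200)
print(rate, round(mcse, 4))  # 0.8 0.0126
```

With J = 1000 and a rejection rate near 0.8, the Monte Carlo standard error is roughly 0.013. This gives a rough yardstick when comparing power or type-I error across values of ω.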

Scenarios
In stage I, we construct two dose-toxicity scenarios considered highly plausible by the principal investigator of the motivating trial Tighiouart [2019], Jiménez and Tighiouart [2022]. The true dose-toxicity model parameters are presented in Table S2 in the supplementary material and displayed in Figure 1. Furthermore, we assume that stage II has the same dose-toxicity profile as stage I (i.e., the dose-toxicity profiles do not vary across patient populations) Tighiouart [2019], Jiménez et al. [2020], Jiang et al. [2021]. The target probability of DLT is $\theta_T = 0.33$. For each of the dose-toxicity scenarios, we consider two different stage II dose-efficacy profiles that place the dose combination with highest efficacy in opposite locations. In terms of the stage I dose-efficacy profiles, we consider the following three hypothetical situations:
1. The stage I and stage II dose-efficacy profiles are perfectly consistent. We refer to this profile as "complete agreement between stage I and stage II dose-efficacy profiles", or simply "CA" (Complete Agreement).
2. The stage I and stage II dose-efficacy profiles point to the same dose combination with highest efficacy, but the probabilities of efficacy differ across stages. We refer to this profile as "partial agreement between stage I and stage II dose-efficacy profiles", or simply "PA" (Partial Agreement).
3. The stage I and stage II dose-efficacy profiles are completely different and place the dose combination with highest efficacy in different locations. We refer to this profile as "complete disagreement between stage I and stage II dose-efficacy profiles", or simply "CD" (Complete Disagreement).
To reflect low to high levels of prior confidence in the consistency of efficacy data across stages, we run the simulations for each scenario with prior probability of exchangeability ω = 0, 0.25, 0.5, 0.75, 1. For scenarios under the alternative hypothesis $H_1$, we assume an effect size of 0.25 (i.e., in all stage II dose-efficacy profiles, the highest probability of efficacy is equal to $p_0 + 0.25 = 0.4$, with $p_0 = 0.15$). For scenarios under $H_0$, the highest probability of efficacy in stage II is equal to $p_0$.
Overall, we have a large number of comparisons: for each dose-toxicity profile there are two different stage II dose-efficacy profiles, each coupled with three different stage I dose-efficacy profiles (i.e., CA, PA and CD), under both $H_1$ and $H_0$, and for five different values of ω. To facilitate the communication of the simulation results over this large number of scenarios, we label the configurations of dose-toxicity and dose-efficacy profiles as Scenarios A to H. Scenarios A-D are under $H_1$ and scenarios E-H are under $H_0$; scenarios A, B, E and F share dose-toxicity scenario 1, and scenarios C, D, G and H share dose-toxicity scenario 2. Within each hypothesis, the two scenarios sharing a dose-toxicity scenario differ in their stage II dose-efficacy profile. The true dose-efficacy profile per scenario, with specification of model parameters, is given in Table S3 in the supplementary material and displayed graphically in Figure 2.
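The size of the simulation grid follows from the labelling above. The sketch below enumerates the configurations using the mapping stated in the text (A-D under $H_1$ and E-H under $H_0$; A, B, E, F with dose-toxicity scenario 1; C, D, G, H with dose-toxicity scenario 2), crossed with the three agreement levels and the five values of ω. The dictionary layout and names are hypothetical.

```python
from itertools import product

# label -> (dose-toxicity scenario, hypothesis), as described in the text
SCENARIOS = {
    "A": (1, "H1"), "B": (1, "H1"), "C": (2, "H1"), "D": (2, "H1"),
    "E": (1, "H0"), "F": (1, "H0"), "G": (2, "H0"), "H": (2, "H0"),
}
AGREEMENT = ("CA", "PA", "CD")
OMEGA = (0.0, 0.25, 0.5, 0.75, 1.0)

configs = [
    {"scenario": s, "toxicity": SCENARIOS[s][0], "hypothesis": SCENARIOS[s][1],
     "agreement": a, "omega": w}
    for s, a, w in product(SCENARIOS, AGREEMENT, OMEGA)
]
print(len(configs))  # 120
```

If each configuration is run with J = 1000 trials, the study involves 120,000 simulated trials in total.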
Figure 2: True dose-efficacy profiles favoring the alternative hypothesis $H_1$ for each dose-toxicity scenario, as a function of the dose of cisplatin. In each efficacy scenario, we show the true stage II efficacy profile (red) and three stage I efficacy profiles: i) one identical to the stage II dose-efficacy profile (red), ii) one with the same optimal dose combination but a slightly different efficacy profile (green), and iii) one completely different from the stage II dose-efficacy profile (blue). The gray line represents the threshold $p_0 = 0.15$.

The sample sizes for stages I and II are $N_1 = N_2 = 30$, and we simulated J = 1000 trials using Algorithm 1. The DLT and efficacy responses were generated from models (1) and (3), respectively.

Results
As discussed in Section 2.4, the value ω = 0 implements no borrowing of information. In other words, the treatment efficacy in equation (4) would be estimated using stage II data alone, completely discarding the stage I efficacy data. It is of interest to quantify the improvement achieved by allowing the combination of efficacy data based on the assumption of full exchangeability (ω = 1) or partial exchangeability (0 < ω < 1).
In Figure 3, we display the power and type-I error obtained at different values of ω > 0 relative to ω = 0. Under $H_0$ (i.e., scenarios E-H), with low values of ω (i.e., 0 < ω ≤ 0.25), the type-I error increases by 0 to 0.049 relative to ω = 0, depending on the scenario and the level of agreement; that is, the type-I error remains very close to its reference value (i.e., with ω = 0). With larger values of ω (i.e., ω > 0.25), the type-I error increases by 0.012 to 0.106 relative to ω = 0. Under $H_1$ (i.e., scenarios A-D), the power increases with ω, with gains relative to ω = 0 of up to 0.121. For values of ω ≤ 0.25, the power gain is already notable. The numerical results of power and type-I error for all values of ω are presented in the supplementary material (Figure S1). In scenarios under $H_1$ with ω = 0, the power ranges between 0.66 and 0.93, whereas in scenarios under $H_0$ with ω = 0, the type-I error ranges between 0.11 and 0.21. These power and type-I error results are consistent with those reported in previous publications Tighiouart [2019], Jiménez et al. [2020], Jiménez and Tighiouart [2022].
Figure 3: Differences in the probability of rejecting $H_0$ relative to ω = 0 in scenarios under $H_1$ (i.e., power) and under $H_0$ (i.e., type-I error). Scenarios A-D and E-H correspond to settings under $H_1$ and $H_0$, respectively.

In Figure 4, we present the distribution of the recommended optimal dose combinations across all scenarios, agreement levels and values of ω. Overall, the design correctly identifies the most efficacious region of the MTD curve, allocating the majority of patients there. At ω = 0, the dispersion of the distribution is generally higher than with ω > 0; as ω increases, the distribution concentrates around its mode. This behaviour is observed under the CA and PA agreement levels. These results are as expected: our model effectively discounts efficacy data from stage I if they are not consistent with the efficacy data observed in stage II. In Figure 5, we present the difference between models with ω > 0 and ω = 0 in the proportion of recommended optimal dose combinations with true probability of efficacy above $p_0$ (i.e., the proportion of correct recommendation). With ω = 0, the proportion of correct recommendation ranges between 81% and 100%, which is consistent with the values reported in previous publications Tighiouart [2019], Jiménez et al. [2020], Jiménez and Tighiouart [2022]. For values of ω > 0, the difference relative to ω = 0 ranges between -0.49 and 7.95 percentage points. In scenario D, the proportion of correct recommendation with ω = 0 is already practically 100% and remains so for ω > 0, which explains why the difference between ω = 0 and ω > 0 is approximately 0.
In Figure 6, we show the differences in the probability of early stopping for futility under both $H_1$ and $H_0$. At ω = 0, the probability of early stopping for futility ranges between 0.012 and 0.126 under $H_1$, and between 0.478 and 0.612 under $H_0$, depending on the scenario. Under $H_1$, increasing ω changes the probability of early stopping for futility by between -0.058 and 0.008 relative to ω = 0. Under $H_0$, increasing ω changes it by between -0.167 and -0.060 relative to ω = 0. In scenarios under $H_1$, the probabilities of early stopping for futility with ω = 0 are already low, so it is reasonable that allowing robust sharing of efficacy data across stages does not have a major impact on them. In scenarios under $H_0$, on the other hand, the probability of early stopping for futility with ω = 0 is, as expected, higher, and a large decrease would be problematic. However, by selecting a conservative value of ω, such as ω = 0.25, the decrease in the probability of early stopping is usually smaller than 0.1.
In Table S5 of the supplementary material, we show the average sample sizes obtained when applying this early stopping rule. Under $H_1$, the observed decrease in the probability of early stopping for futility caused by increasing ω translates into an average sample size increase of 0-1 patients relative to ω = 0. Under $H_0$, increasing ω translates into an average sample size increase of 0-2 patients relative to ω = 0.
In Figure S2 of the supplementary material, we present the difference, between models with ω > 0 and ω = 0, in the proportion of stage II patients allocated to dose combinations with true probability of efficacy above $p_0$. With ω = 0, the proportion of stage II patients allocated to dose combinations with true probability of efficacy above $p_0$ ranges between 40% and 95%, which is consistent with the values reported by Tighiouart [2019], Jiménez et al.
In terms of safety, we observe that scenarios A, B, E and F (i.e., dose-toxicity scenario 1 in Figure 1) have an overall (i.e., stage I + stage II) average DLT rate between 27% and 35%, depending on the scenario, with an average proportion of trials with DLT rate above $\theta_T + 0.1$ of 0%. Stage II alone in these scenarios has an average DLT rate between 28% and 43%, with an average proportion of trials with DLT rate above $\theta_T + 0.1$ between 7% and 42%. Scenarios C, D, G and H (i.e., dose-toxicity scenario 2 in Figure 1) have an overall (i.e., stage I + stage II) average DLT rate between 26% and 35%, depending on the scenario, with an average proportion of trials with DLT rate above $\theta_T + 0.1$ of 0%. Stage II alone in these scenarios has an average DLT rate between 28% and 43%, with an average proportion of trials with DLT rate above $\theta_T + 0.1$ between 7% and 43%. Because toxicity data are not shared across stages, these values are constant across all values of ω. In Figure S3 of the supplementary material, we display the probability of early stopping for safety. In scenarios A, B, E and F, this probability is close to 0.25, whereas in scenarios C, D, G and H it is close to 0.05. These values are consistent with the informative prior distributions for the dose-toxicity models and with the distance between the true MTD curves and the dose combination 15/75 mg/m², which has a prior probability of DLT of approximately 0.33. Also, because toxicity data are not shared across stages, we do not present the average sample sizes, as they are in line with those reported by Jiménez et al. [2020] in a similar setting.

Discussions
Motivated by a real phase I-II trial that combines continuous dose levels of cisplatin and cabazitaxel in two different populations of patients with advanced prostate cancer, in this paper we present a two-stage phase I-II design that allows robust integration of efficacy data across relevant patient populations. The main contribution of this article lies in the formal treatment of the uncertainty about potentially different dose-efficacy profiles across stages. We propose a robust Bayesian hierarchical random-effects model that allows information on efficacy to be shared across stages, assuming that the related parameters are either exchangeable or nonexchangeable. In other words, the key idea is to exploit any similarities between the dose-efficacy profiles in order to borrow information, while avoiding overly optimistic borrowing when the data are inconsistent across stages. This proposal requires specifying the prior probability that the main effects set of parameters is exchangeable across stages. We denote this prior probability by ω, which in practice is selected by subject-matter experts. When ω = 0, the design estimates the stage II dose-efficacy profile independently of stage I, as in Tighiouart [2019] and Jiménez et al. [2020]. In this article, we focus on the operating characteristics of stage II, and we study how they change, relative to ω = 0, as the prior probability of exchangeability ω increases under different stage I dose-efficacy profiles.
The selection of the dose-efficacy model is closely related to the type of compound investigated in a phase I-II clinical trial. With cytotoxic agents, the monotonicity assumption is expected to hold from an efficacy perspective as well (i.e., a compound has greater activity as the dose increases). Thus a linear model such as the one defined in (3) should be sufficient to capture the dose-efficacy relationship. However, with other types of compounds, such as molecularly targeted therapies, more flexible modelling approaches may be needed to capture dose-efficacy relationships in which the probability of efficacy does not necessarily increase with the dose.
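The role of ω can be illustrated with a deliberately simplified, one-parameter analogue of the exchangeable/nonexchangeable mixture. This is a sketch under strong assumptions, not the authors' JAGS model: the stage I information on a main-effect parameter is summarized by a normal distribution N(m1, s1²), the between-stage variability by a hypothetical standard deviation tau, the nonexchangeable component by a stage-specific normal prior N(m_nex, s_nex²), and the stage II data by a normal estimate y with standard error se. All names and numbers are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def exnex_posterior(y, se, m1, s1, tau, m_nex, s_nex, omega):
    """Posterior for a stage II main-effect parameter under a two-component
    mixture prior (normal approximation, single parameter).

    EX  component (prob. omega)    : N(m1, s1^2 + tau^2), centred on the stage I
                                     estimate, inflated by between-stage variance
    NEX component (prob. 1 - omega): N(m_nex, s_nex^2), stage-specific prior
    Returns (posterior weight of the EX component, posterior mean).
    """
    comps = [(omega, m1, s1**2 + tau**2), (1.0 - omega, m_nex, s_nex**2)]
    weights, means = [], []
    for w, m, v in comps:
        # prior weight times marginal likelihood of y under this component
        weights.append(w * normal_pdf(y, m, v + se**2))
        # conjugate normal update of this component
        post_var = 1.0 / (1.0 / v + 1.0 / se**2)
        means.append(post_var * (m / v + y / se**2))
    w_ex = weights[0] / sum(weights)
    return w_ex, w_ex * means[0] + (1.0 - w_ex) * means[1]

# Hypothetical inputs: stage I estimate 1.0 (sd 0.3), vague NEX prior N(0, 2^2)
for y in (1.1, -1.0):  # stage II estimate agreeing / conflicting with stage I
    w_ex, mean = exnex_posterior(y, se=0.5, m1=1.0, s1=0.3, tau=0.25,
                                 m_nex=0.0, s_nex=2.0, omega=0.25)
    print(f"y={y:+.1f}: posterior EX weight={w_ex:.2f}, posterior mean={mean:.2f}")
```

With these hypothetical inputs, the posterior weight on the exchangeable component rises above its prior value ω = 0.25 when the stage II estimate agrees with stage I and falls close to zero when it conflicts, so borrowing is curtailed automatically; setting ω = 0 recovers independent estimation of the stage II parameter.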
We have limited the simulations to the two main dose-toxicity scenarios considered by the principal investigator of the motivating example. For each of these dose-toxicity profiles, we have studied two different stage II dose-efficacy profiles, each accompanied by three stage I dose-efficacy profiles with different levels of similarity to the stage II dose-efficacy profile. Moreover, because only the main effects set of parameters is allowed to be exchangeable across stages, similarity or agreement across stages is based only on these two parameters. However, depending on the application and the definition of the dose-efficacy profile, this work could be extended by adapting the JAGS code, which we have made publicly available, so that other parameters are included in the set of parameters that may be exchangeable across stages.
The evaluation presented in this article aims to assess whether the overall operating characteristics of the design improve when efficacy data are robustly integrated across stages, in scenarios with complete agreement, partial agreement and complete disagreement between the stage I and stage II dose-efficacy profiles. In other words, we aim to evaluate how much we can gain from sharing efficacy data across stages when the efficacy data are completely or partially consistent across stages in terms of the main effects set of parameters, and also to what extent the design's operating characteristics are penalized when the efficacy data are inconsistent across stages. The assessment is done in the original setting with continuous dose combination levels under H_0 and H_1, following the case study described in Section 2.
In the scenarios under the alternative hypothesis, visualized in Figure 2, we observe a general improvement in the operating characteristics when sharing of efficacy data across stages is permitted (i.e., ω > 0). The degree of improvement, however, depends on the actual extent of consistency between the stage-wise efficacy profiles and, of course, on the selected value of ω. We note that with ω = 0.25 there is already a considerable improvement in the design's operating characteristics in comparison with higher values of ω.
In scenarios under the null hypothesis, we observe a small inflation of the type-I error and a slight decrease in the probability of early stopping for futility. Under this hypothesis, a high value of ω generally yields the worst performance when there is complete disagreement between the stage I and stage II efficacy profiles. However, with ω = 0.25, the differences with respect to settings with complete or partial agreement between the stage I and stage II efficacy profiles are much smaller.
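One way to judge whether a small type-I error inflation of this kind exceeds simulation noise is to attach Monte Carlo standard errors to the rejection rates. The sketch below is a generic illustration under stated assumptions: the counts, the number of simulated trials (2000) and the independence between the two sets of simulated trials are hypothetical and are not taken from the paper.

```python
import math

def rate_and_se(n_events, n_trials):
    """Monte Carlo estimate of a trial-level rate (e.g. rejection of H0)
    and its binomial standard error."""
    p = n_events / n_trials
    return p, math.sqrt(p * (1.0 - p) / n_trials)

def difference_z(n_a, n_b, n_trials):
    """Approximate z-statistic for the difference between two designs'
    rates, treating the two sets of simulated trials as independent."""
    p_a, se_a = rate_and_se(n_a, n_trials)
    p_b, se_b = rate_and_se(n_b, n_trials)
    return (p_a - p_b) / math.sqrt(se_a**2 + se_b**2)

# Hypothetical counts of trials rejecting H0 out of 2000 per design
n_trials = 2000
z = difference_z(n_a=230, n_b=200, n_trials=n_trials)  # omega > 0 vs omega = 0
print(f"type-I error: {230 / n_trials:.3f} vs {200 / n_trials:.3f}, z = {z:.2f}")
```

If both designs are run on the same simulated data sets, the independence assumption overstates the standard error of the difference, and a paired comparison of the trial-level outcomes would be sharper.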
Overall, we believe that allowing efficacy data to be shared across stages increases the probability of finding an appropriate dose combination for further phase III studies. However, this approach requires preliminary knowledge about the drug combination. We regard this as acceptable, since no single design configuration fits all applications. For example, in our proposal we allow the main effects set of parameters to be exchangeable or nonexchangeable across stages, and thus our definition of similarity or agreement across stages is based solely on the main effects parameters. Moreover, we have seen that two dose-efficacy profiles that are similar in terms of the two main effects parameters can have completely different intercepts, which can induce either a loss of power or an inflation of the type-I error. Therefore, a clear understanding of what is considered "similar" is key to deciding how to synthesise the efficacy data across stages. A potential solution to this problem could be to include the intercept in the set of parameters subject to the exchangeability or nonexchangeability assumption.
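To see why agreement in the main effects alone does not guarantee similar dose-efficacy profiles, consider a hypothetical main-effects logistic model, logit p = b0 + b1·x + b2·y, with both doses scaled to [0, 1]. The functional form, the coefficient values, the dose scaling and p_0 = 0.30 are assumptions for illustration and need not match model (3).

```python
import math

def prob_efficacy(x, y, b0, b1, b2):
    """Probability of efficacy under a hypothetical main-effects logistic
    model, logit p = b0 + b1*x + b2*y, with doses x, y scaled to [0, 1]."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x + b2 * y)))

def share_above(p0, b0, b1, b2, n=11):
    """Fraction of an n-by-n grid of dose combinations with p > p0."""
    grid = [i / (n - 1) for i in range(n)]
    hits = sum(prob_efficacy(x, y, b0, b1, b2) > p0 for x in grid for y in grid)
    return hits / n**2

p0 = 0.30
b1, b2 = 1.5, 1.0          # identical main effects in both stages
for b0 in (-1.0, -3.0):    # only the intercept differs
    print(f"b0={b0}: p(1,1)={prob_efficacy(1, 1, b0, b1, b2):.2f}, "
          f"share above p0={share_above(p0, b0, b1, b2):.2f}")
```

Both profiles have the same main effects, yet with the higher intercept almost the whole grid lies above p_0, while with the lower intercept only a few combinations near the highest doses do; this is the kind of discrepancy that including the intercept in the exchangeable set could help to capture.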
One potential extension of the methodology presented in this manuscript, which we plan to explore in the future, is to robustly combine toxicity data across stages in this setting. By doing so, we would remove the assumption that the dose-toxicity profiles are equivalent across different patient populations and would account for population-specific characteristics with respect to the MTD.

Figure 1: MTD curves obtained with the dose-toxicity model parameter values presented in Table S2 in the supplementary material. The point at the cisplatin/cabazitaxel 15/75 mg/m² combination represents the MTD found by Lockhart et al. [2014].
[Figure panels: Stage II & Stage I under CA | Stage I under PA | Stage I under CD]

Figure 4: Distribution of the recommended optimal dose combinations in scenarios under the alternative hypothesis (A-D) and levels of agreement CA, PA and CD. The black curve represents the MTD curve.

Figure 5: Difference in the proportion of correct dose combination recommendations between models with ω > 0 and ω = 0 in settings under H_1.

Figure 6: Difference in the proportion of trials with early stopping for futility between models with ω > 0 and ω = 0 under H_1 (A-D) and H_0 (E-H).

Figure S2: Difference in the proportion of patients allocated in stage II to dose combinations with a true probability of efficacy above p_0 (i.e., patients correctly allocated) with ω > 0 with respect to ω = 0.

Figure S3: Probability of early stopping for safety.
Mourad Tighiouart, Quanlin Li, and André Rogatko. A Bayesian adaptive design for estimating the maximum tolerated dose curve using drug combinations in cancer phase I clinical trials. Statistics in Medicine, 36(2):280-290, 2017.
Graham M Wheeler, Michael J Sweeting, and Adrian P Mander. Toxicity-dependent feasibility bounds for the escalation with overdose control approach in phase I cancer trials. Statistics in Medicine, 36(16):2499-2513, 2017.
Ying Yuan and Guosheng Yin. Bayesian dose finding by jointly modelling toxicity and efficacy as time-to-event outcomes. Journal of the Royal Statistical Society: Series C (Applied Statistics), 58(5):719-736, 2009.
Haiyan Zheng, Lisa V Hampson, and Simon Wandel. A robust Bayesian meta-analytic approach to incorporate animal data into phase I oncology trials. Statistical Methods in Medical Research, 29(1):94-110, 2020.
Haiyan Zheng, Lisa V Hampson, and Thomas Jaki. Bridging across patient subgroups in phase I oncology trials that incorporate animal data. Statistical Methods in Medical Research, 30(4):1057-1071, 2021.

Table S3: True dose-efficacy model parameter values for stage I and stage II scenarios under H_1.

Table S4: True dose-efficacy model parameter values for stage I and stage II scenarios under H_0.

Table S5: Average sample size under the early stopping for futility rule under H_1 (Scenarios A-D) and H_0 (Scenarios E-H).

Figure S1: Probability of rejecting H_0 in scenarios under H_1 (i.e., power) and H_0 (i.e., type-I error). Scenarios A-D and E-H are scenarios under H_1 and H_0, respectively.