Dynamic enrichment of Bayesian small‐sample, sequential, multiple assignment randomized trial design using natural history data: a case study from Duchenne muscular dystrophy

In Duchenne muscular dystrophy (DMD) and other rare diseases, recruiting patients into clinical trials is challenging. Additionally, assigning patients to long‐term, multi‐year placebo arms raises ethical and trial retention concerns. This poses a significant challenge to the traditional sequential drug development paradigm. In this paper, we propose a small‐sample, sequential, multiple assignment, randomized trial (snSMART) design that combines dose selection and confirmatory assessment into a single trial. This multi‐stage design evaluates the effects of multiple doses of a promising drug and re‐randomizes patients to appropriate dose levels based on their Stage 1 dose and response. Our proposed approach increases the efficiency of treatment effect estimates by (i) enriching the placebo arm with external control data, and (ii) using data from all stages. Data from external control and different stages are combined using a robust meta‐analytic combined (MAC) approach to consider the various sources of heterogeneity and potential selection bias. We reanalyze data from a DMD trial using the proposed method and external control data from the Duchenne Natural History Study (DNHS). Our method's estimators show improved efficiency compared to the original trial. Also, the robust MAC‐snSMART method most often provides more accurate estimators than the traditional analytic method. Overall, the proposed methodology provides a promising candidate for efficient drug development in DMD and other rare diseases.


INTRODUCTION
Duchenne muscular dystrophy (DMD) is a potentially deadly inherited genetic disease with a birth prevalence of 19.8 per 100,000 live male births (Crisafulli et al., 2020). Patients often progressively lose the ability to walk or function independently and die at a young age from lung or heart problems. There is an unmet need for effective treatments in this patient population. Given the limited number of patients affected by DMD, conducting separate dose-finding and randomized confirmatory trials is nearly impossible. Moreover, variability in disease progression, challenges in maintaining the blinding of patients to treatment, and ethical issues with long-term follow-up for placebo controls complicate trial design in DMD and other rare diseases (Muntoni et al., 2022). This paper aims to address some of the challenges in DMD and rare disease trial research and is motivated by the SPITFIRE trial (NCT03039686), a randomized, two-phase, double-blind, placebo-controlled study (see Figure 1a) assessing the efficacy, safety, and tolerability of a new therapy (RO7239361) in ambulatory boys with DMD aged 6-11 years. We develop statistical methodology that improves the design and analysis of DMD trials using all relevant information in the trial and external control data (i.e., relevant patient information gathered from sources outside of the prospective study).
In the SPITFIRE trial, patients were equally randomized to one of two doses (low and high) of treatment or a placebo arm in Stage 1. After completion of Stage 1 (48 weeks after study entry), patients in the placebo arm were randomized to either high or low dose in Stage 2. The trial's primary analysis planned to use only Stage 1 data, which ignores critical information about the drug generated in Stage 2. Moreover, the SPITFIRE design did not allow patients who performed poorly on low dose in Stage 1 to receive high dose in Stage 2. Finally, the SPITFIRE trial did not formally incorporate external control data. Further innovations are necessary to consider a trial design where patients receive multiple stages of treatment and analytic methods where treatment effect estimates consider data from all stages and from external controls.
This paper proposes a new small-sample, sequential, multiple assignment, randomized trial (snSMART) design with an efficient Bayesian analytic approach that synthesizes all relevant information and provides precise treatment effects for better decision-making about drugs in rare, life-threatening diseases like DMD. However, combining data from different sources poses unique challenges, including the choice of relevant external information, potential conflict between external and concurrent control, and "drift" in clinical outcomes between the two stages due to differences in the population. Lack of consideration of these factors may result in biased treatment effect estimates and increase false positive results. We modify existing snSMART designs and develop a model-based approach that enables the use of external control data and combines data across trial stages, resulting in more precise estimates of the treatment effects. Given the small sample size, the proposed model must be parsimonious and robust when there is conflict between different data sources.

SPITFIRE TRIAL: A MOTIVATING EXAMPLE OF PHASE IIB/III TRIAL IN DMD
We use the SPITFIRE trial setting and published results to demonstrate the utility of the proposed design and analytic approaches. Though SPITFIRE was stopped early for futility, it reflects the development paradigm for a new drug in many rare diseases. For example, a similar design was used for the ATMOSPHERE (NCT04704921) study. Note that we have no intention to comment on or judge the clinical activity of the treatments involved or any decisions by sponsors or regulators associated with the trial or compound.
In the SPITFIRE trial, a total of 166 participants were randomized to receive one of the two doses of RO7239361 or placebo (1:1:1) for 48 weeks in Stage 1. After completion of Stage 1, subjects entered an open-label stage where participants who originally received placebo were randomized into low- or high-dose groups, and all other participants stayed on their original treatment for up to 192 weeks. The primary outcome and one of the secondary outcomes were the changes from baseline to week 48 (Stage 1) in the North Star Ambulatory Assessment (NSAA) total score and the 6-min walk distance (6MWD).
The original SPITFIRE design is similar to an snSMART (Tamura et al., 2016) but lacks many fundamental features. SPITFIRE (a) re-randomized only the participants in the control group in Stage 2 and (b) used only Stage 1 data in the primary analysis. This differs from the snSMART design used in the ARAMIS study (NCT02939573; as shown in Web Figure 5; Tamura et al., 2016), where data from both stages were used to estimate Stage 1 treatment effects. There has been notable progress in snSMART methods (Hartman et al., 2021; Wei et al., 2018, 2020) that have been coded into an R package (Wang & Kidwell, 2022). Other snSMART extensions include a group-sequential design that allows early stopping of an arm (Chao et al., 2020) and inclusion of multiple dose levels (Fang et al., 2021, 2022).
In recent years, there has been increased interest in using external control data to expedite the development of new drugs in rare diseases with an unmet medical need. Including external control data allows more patients to receive the investigational drug, which aids recruitment and retention, permits a smaller sample size, and therefore speeds development. However, heterogeneity between external control and concurrent trial data is often a limiting factor in real-life applications. Many approaches have been proposed to leverage external data in clinical trials while addressing heterogeneity between data sources, different types of available data (i.e., individual patient data or aggregated data), and outcomes of interest. Bayesian approaches for borrowing external control information differ in terms of assumptions regarding the relevance and exchangeability of external data with current trial data (Wadsworth et al., 2018). The power prior approach directly down-weights the external control using fixed weights (Ibrahim & Chen, 2000). Dynamic methods adjust the informativeness of the prior based on measures of conflict between external control and the new trial data. Notable dynamic methods include the normalized power prior (Duan, 2005; Neuenschwander et al., 2009), commensurate priors (Hobbs et al., 2011), and robust meta-analytic-predictive priors (Neuenschwander et al., 2010, 2016; Spiegelhalter et al., 2004; Schmidli et al., 2014). In the context of basket trials, Ouma et al. (2022) explored Bayesian treatment effect borrowing and treatment response borrowing models that can be expanded to enable external control data borrowing. In addition to Bayesian methods, frequentist methods such as propensity score based matching (Rosenbaum & Rubin, 1983), stratification, and inverse probability weighting (Lin et al., 2018) are widely used when aggregate level information and baseline covariates are available.
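For reference, the fixed-weight power prior mentioned above is usually written in the following form. The notation is the conventional one from the power prior literature rather than this paper's: $D_0$ is the external (historical) data, $L(\theta \mid D_0)$ its likelihood, $\pi_0(\theta)$ an initial prior, and $a_0$ the fixed weight.

```latex
\pi(\theta \mid D_0, a_0) \;\propto\; L(\theta \mid D_0)^{a_0}\, \pi_0(\theta), \qquad 0 \le a_0 \le 1 .
```

With $a_0 = 0$ the external data are discarded, and with $a_0 = 1$ they are pooled at full weight. Dynamic methods differ in that the degree of down-weighting is informed by the observed agreement between external and current data rather than fixed in advance.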
Existing snSMART designs and methods do not incorporate external control data. In DMD, the Duchenne Natural History Study (DNHS) conducted by the Cooperative International Neuromuscular Research Group (CINRG), along with the PRO-DMD-01 Prospective Natural History Study (NCT01753804) and the University College London Natural History Study (NCT02780492), provides rich external control information that can be combined with concurrent trial placebo data. Motivated by the DMD setting, we aim to fill this gap with an innovative snSMART design and Bayesian model as an alternative to the current practice of DMD and rare disease drug development. The new approach proposes three key improvements to current snSMART and rare disease designs and methods: (a) use of external control data to reduce the sample size of the placebo arm, (b) allowing Stage 1 low-dose nonresponders to receive a higher dose in Stage 2, similar to the design proposed by Fang et al. (2021, 2022), and (c) use of a Bayesian hierarchical model that dynamically incorporates external control data and borrows information across both trial stages. These features are extremely attractive in the rare disease setting where sample sizes and opportunities to perform clinical trials are limited. Our proposed design and methods assume (a) the study condition remains relatively stable throughout the trial period (i.e., when there is no treatment effect, patients' primary outcomes do not fluctuate dramatically), (b) an adequate washout period between the two trial stages, (c) exchangeable treatment effects between stages, and (d) similar external control and placebo effects. An snSMART design is not appropriate for conditions that are not stable during the trial period, and we offer modifications to the model and sensitivity analyses to address other violations of these model assumptions.

Proposed modification for SPITFIRE trial design
Our proposed snSMART design is shown in Figure 1b, where eligible patients are randomized unequally (e.g., 1:2:2 or 1:3:3) between placebo, low dose, and high dose (e.g., of RO7239361) in Stage 1. After 48 weeks, participants are re-assigned or re-randomized to either the same or a different dose of treatment depending on their initial treatment and post-baseline NSAA total score. Here, we define a participant as a treatment responder at week 48 if their post-baseline NSAA total score increases, stays the same, or does not decrease by more than 3.1 points (Muntoni et al., 2018). Participants who received placebo in Stage 1 are re-randomized with equal probability to either the low-dose or high-dose treatment arm in Stage 2, regardless of their Stage 1 response. This is beneficial to participants in the trial because everyone receives a dose of the treatment. This design differs slightly from the ones proposed by Fang et al. (2021, 2022) in Stage 2 for those who received low or high dose in Stage 1. These slight modifications may make the design more patient-centered. Participants who received low dose in Stage 1 are assigned to stay on low dose if they responded in Stage 1 or switch to high dose if they did not respond. Participants who first received high dose and responded are re-randomized equally between low and high doses. However, the nonresponders to high dose are discontinued in Stage 2. In most settings, the re-randomization of high-dose responders is a viable design option because low dose may continue to be effective and possibly more tolerable. On the other hand, when the high dose proves ineffective for some participants, it is unlikely that a lower dose will yield better results for them. Additionally, administering an ineffective high dose that potentially poses higher toxicity is not ethical. Thus, we chose to exclude Stage 1 high-dose nonresponders in Stage 2.

METHODS
The following notation is used in this section. $Y_{jk}$ and $\theta_{jk}$ denote the observed and underlying true mean change in outcome (e.g., NSAA score) from the baseline values for stage $j = 1, 2$ and treatment $k = p$ (placebo), $l$ (low dose), $h$ (high dose), respectively. $Y'_i$ and $\theta'_i$ denote the observed and true placebo mean change in outcome from baseline for the $i$th external control dataset. For example, in the SPITFIRE trial, $\theta_{1p}$, $\theta_{1l}$, and $\theta_{1h}$ represent the true mean change from baseline in the NSAA total score or 6MWD for placebo, low, and high-dose groups, respectively. The key estimands of interest are the Stage 1 differences in effects between each dose and placebo ($\theta_{1l} - \theta_{1p}$, $\theta_{1h} - \theta_{1p}$). Traditional analyses for $\theta_{1k}$ (change from baseline in NSAA or 6MWD) in SPITFIRE or similar studies include analysis of variance (ANOVA), analysis of covariance (ANCOVA), or a conjugate Bayesian model (normal data and normal prior). Note that a traditional analytic method excludes external control data and uses only Stage 1 outcomes.
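The Stage 2 assignment rule of the proposed design (Figure 1b) can be sketched in a few lines. This is a minimal illustration, not trial software: the function and arm names are hypothetical, and the responder cutoff is assumed to be inclusive (a change from baseline of at least −3.1 NSAA points counts as a response).

```python
import random

# Assumed inclusive cutoff: responder if NSAA total score does not
# decrease by more than 3.1 points from baseline.
RESPONSE_THRESHOLD = -3.1


def is_responder(nsaa_change):
    """Return True if the week-48 NSAA change from baseline meets the cutoff."""
    return nsaa_change >= RESPONSE_THRESHOLD


def stage2_assignment(stage1_arm, nsaa_change, rng=random):
    """Return the Stage 2 arm ('low' or 'high'), or None if discontinued."""
    if stage1_arm == "placebo":
        # Placebo participants are re-randomized 1:1 regardless of response.
        return rng.choice(["low", "high"])
    responder = is_responder(nsaa_change)
    if stage1_arm == "low":
        # Responders stay on low dose; nonresponders switch to high dose.
        return "low" if responder else "high"
    if stage1_arm == "high":
        # Responders are re-randomized 1:1; nonresponders are discontinued.
        return rng.choice(["low", "high"]) if responder else None
    raise ValueError(f"unknown Stage 1 arm: {stage1_arm!r}")
```

Only placebo participants and high-dose responders are randomized in Stage 2; the remaining assignments are deterministic given Stage 1 dose and response.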

Bayesian joint stage model
A modified version of the existing Bayesian joint stage model (BJSM) by Fang et al. (2022) may be used to analyze our snSMART design. Though it uses data from both stages, the existing BJSM has not previously been presented to formally incorporate external data. One indirect or crude way to incorporate external control data is to use informative normal distribution priors for the model parameter associated with the placebo effect. For the BJSM, normal distribution prior parameters may be derived using the method of moments approximation using external data, where the variance is further adjusted to ensure the desired prior effective sample size (ESS, see Web Appendix B for an example).
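One way such a prior could be derived is sketched below. This is an illustration under stated assumptions, not the procedure of Web Appendix B: the function name is hypothetical, the external data are assumed to be summarized as one mean per external dataset, and the ESS of a normal prior on a mean is taken to be the outcome variance divided by the prior variance, a common convention for the normal-normal model.

```python
def mom_normal_prior(external_means, outcome_sd, target_ess):
    """Method-of-moments normal prior for the placebo mean (illustrative).

    Returns (prior_mean, mom_var, adjusted_var), where mom_var is the
    moment-matched variance of the external means and adjusted_var is the
    variance rescaled so that the prior is worth `target_ess` observations.
    """
    k = len(external_means)
    if k < 2:
        raise ValueError("need at least two external means for a variance")
    prior_mean = sum(external_means) / k
    # Moment-matched (sample) variance of the external placebo means.
    mom_var = sum((m - prior_mean) ** 2 for m in external_means) / (k - 1)
    # Assumed ESS convention: ESS = outcome_sd^2 / prior_var.
    adjusted_var = outcome_sd ** 2 / target_ess
    return prior_mean, mom_var, adjusted_var
```

For example, `mom_normal_prior([-1.0, -2.0, -3.0], outcome_sd=4.0, target_ess=8)` centers the prior at the average external mean and sets its variance to 16/8, regardless of how tightly the external means cluster.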

Meta-analytic combined approaches
The meta-analytic combined, or MAC-snSMART, approach is a novel and unified analytic framework that incorporates "all" relevant information for efficient estimation of Stage 1 treatment effects, $\theta_{1p}$, $\theta_{1l}$, and $\theta_{1h}$.
We use a Bayesian hierarchical model framework to dynamically borrow information from different sources (e.g., external control, Stage 2). The framework consists of a series of models that link different sets of parameters (as shown in Web Figure 6). However, implementation of such a framework in the snSMART setting requires innovation to (a) handle heterogeneity between different sources of placebo data, (b) account for the potential conflict between internal and external placebo data despite careful selection of external data, and (c) account for selection bias due to non-randomized treatment assignments in Stage 2 for participants who received low dose or did not respond to high dose in Stage 1.

Use of external information for placebo
Controlling for potential bias is a primary concern when considering external control data in a clinical trial. To handle bias due to potential unmeasured confounders in the design phase and increase the validity of results, statistical methods should be prespecified. If patient-level information about important predictors is available, the model can be extended via meta-regression. In contrast to the classical exchangeability assumption, which assumes the same variance for $\theta'_i$ and $\theta_{1p}$, this structure accounts for external control data outliers by allowing for different between-trial standard deviations. However, the exchangeability assumption results in overly optimistic borrowing and causes biased estimates when information from different sources conflicts.
The amount of information borrowed from external control data while estimating the treatment effects for control, low-, and high-dose groups can be quantified. This amount of borrowed information can be expressed as the ESS. Here, we use the expected local-information-ratio, which fulfills a basic predictive consistency requirement, as introduced by Neuenschwander et al. (2020). We, like Neuenschwander et al. (2020), recommend that the ESS of each external control dataset should not exceed the number of participants on the placebo arm in the trial.

Robustification
To avoid overly optimistic borrowing, we include a mixture model adopted from Neuenschwander et al. (2016) for $\theta'_i$ and $\theta_{1p}$. The proposed model allows for nonexchangeability of any of the parameters associated with the placebo effect. The mixture weights are fixed a priori to reflect confidence about the similarity between external and concurrent control data. The values of the non-exchangeability parameters are chosen based on expert knowledge; weakly informative priors, such as priors worth approximately one observation (Kass & Wasserman, 1995), are used for them. The derivation of the posterior distribution can be found in Web Appendix A. We refer to the method as "robust MAC-snSMART" if the robustification component is included in addition to the structure outlined in Section 3.2.1. If this component is not included (i.e., all mixture weights are fixed at 1), the method is simply referred to as "MAC-snSMART".
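A generic two-component mixture of this kind, in the style of Neuenschwander et al. (2016), can be written as follows. The symbols are illustrative assumptions and not necessarily those of the paper: $\theta$ stands for any placebo-effect parameter ($\theta'_i$ or $\theta_{1p}$), $w$ is its fixed mixture weight, $N(\mu, \tau^2)$ is the exchangeable component shared across data sources, and $N(m, v)$ is a source-specific non-exchangeable component with weakly informative $m$ and $v$.

```latex
\theta \;\sim\; w \, N(\mu, \tau^2) \;+\; (1 - w) \, N(m, v), \qquad 0 \le w \le 1 .
```

Setting $w = 1$ for every parameter removes the second component and recovers full exchangeability, which corresponds to the non-robust MAC-snSMART. Smaller values of $w$ express less prior confidence that the source is similar to the rest and so limit how much is borrowed from it when the data conflict.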
Under the MAC-snSMART approach, the conditional distribution of $\theta_{jk}$ follows $\theta_{1k}, \theta_{2k} \sim N(\mu_k, \tau_k^2)$, where $k = l, h$, given that we assume Stages 1 and 2 expected outcomes for the same treatment follow the same normal distribution. That is, the treatment effects of the same dose level are fully exchangeable across trial stages. Since the snSMART design is most appropriate for relatively stable conditions and requires a washout period between stages, we believe that this assumption of exchangeable treatment effects is valid in rare disease settings where an snSMART would be applied. In cases where stagewise non-exchangeability is likely to occur, the robustification component described in Section 3.2.2 can be easily incorporated into $\theta_{2k'}$ in Formula 1 to account for non-exchangeable treatment effects across trial stages.

3.2.4 Prior specification
We suggest generalizable, weakly informative normal distributions as priors for the mean parameters of the hierarchical model, that is, priors that are worth approximately one observation (Kass & Wasserman, 1995). To ensure the identifiability of the three bias parameters, we use weakly informative uniform distributions or normal distributions that cover all possible bias values as their priors (Verde, 2021). We suggest using non-informative Unif(−1, 1) prior distributions for $\rho_{kk'}$ since the correlation between Stage 1 treatment $k$ and Stage 2 treatment $k'$ outcomes ranges from −1 to 1 and is uncertain.
Half-normal priors with standard deviations roughly equal to $\sigma'_i/2$, $\sigma_{1p}/2$ and $\sigma_k/2$ for $\tau'_i$, $\tau_{1p}$ and $\tau_k$, respectively, are used to cover very small to large between-trial heterogeneity (Spiegelhalter et al., 2004). According to the bias-corrected meta-analysis model proposed by Verde (2021), roughly four participants' worth of information is provided by the priors of the bias parameter. Hence, we recommend half-normal priors with standard deviations roughly equal to $\sigma_{1l}/4$, $\sigma_{1l}/4$, and $\sigma_{1h}/4$ for the three corresponding bias-related parameters, respectively. Then, the treatment effects of low dose in Stages 1 and 2 follow the same normal distribution and are therefore exchangeable. Similarly, the high-dose treatment effects in Stages 1 and 2 are exchangeable.

SIMULATION SETTINGS
We assess the sensitivity of our proposed MAC-snSMART methods under various data-generating settings, treatment effects, sample sizes, and lack of exchangeability between external control and current snSMART data. We assess two different data-generation processes: the first matches the proposed model, and the second allows violation of exchangeability between Stages 1 and 2. The first data-generation process follows the assumption of the MAC-snSMART so that $\theta_{1k}$ is set to a predetermined treatment effect and $\theta_{2k}$ is randomly generated from $N(\theta_{1k}, 0.25^2)$. Thus, the summary-level Stage 1 treatment outcomes are generated based on $N(\theta_{jk}, 0.5^2)$, the Stage 2 outcomes are randomly generated according to Formula 1, and $\rho_{kk'}$ are randomly chosen within a certain range, that is, $\rho_{pl} \in (-0.20, 0.20)$, $\rho_{ph} \in (-0.15, 0.15)$, $\rho_{ll} \in (0.70, 1)$, $\rho_{lh} \in (-0.50, 0)$, $\rho_{hl} \in (-0.30, 0.30)$, and $\rho_{hh} \in (0.70, 1)$. In the second data-generation process, $\theta_{1k}$ and $\theta_{2k}$ are not exchangeable. We set $\theta_{1k}$ to a predetermined treatment effect, and let $\theta_{2k} = \theta_{1k} + 1$. Formula 1 is used to randomly generate Stages 1 and 2 treatment outcomes. This type of data may result from an snSMART where the washout period between Stages 1 and 2 is inadequate.
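The two processes for the true stage-wise effects can be sketched as follows. This is a simplified illustration with hypothetical function names: it covers only the generation of the Stage 2 true effects and of a summary-level Stage 1 outcome per arm (assumed here to be centered at the Stage 1 true effect), and it omits the correlated Stage 2 outcomes, which depend on Formula 1.

```python
import random


def generate_stage2_effects(theta1, process, rng):
    """Return true Stage 2 effects for the dose arms ('l', 'h').

    theta1: dict mapping arm ('p', 'l', 'h') to the true Stage 1 effect.
    process 1: exchangeable stages, theta_2k ~ N(theta_1k, 0.25^2).
    process 2: non-exchangeable stages, theta_2k = theta_1k + 1.
    """
    doses = ("l", "h")  # no placebo arm in Stage 2
    if process == 1:
        return {k: rng.gauss(theta1[k], 0.25) for k in doses}
    if process == 2:
        return {k: theta1[k] + 1.0 for k in doses}
    raise ValueError("process must be 1 or 2")


def generate_stage1_summary(theta1, rng, sd=0.5):
    """Draw one summary-level Stage 1 outcome per arm (assumed N(theta_1k, sd^2))."""
    return {k: rng.gauss(v, sd) for k, v in theta1.items()}


rng = random.Random(2023)               # seed chosen arbitrarily
scenario3 = {"p": 0.0, "l": 2.0, "h": 6.0}  # illustrative effect sizes
theta2_shifted = generate_stage2_effects(scenario3, process=2, rng=rng)
stage1_summary = generate_stage1_summary(scenario3, rng)
```

Under process 2 the Stage 2 effects are a deterministic shift of the Stage 1 effects, which is what makes borrowing across stages biased in that setting.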
In addition to testing different data-generation processes, we investigate the performance of our proposed models considering four treatment effect scenarios. Here, considering the DMD context, we use an NSAA score of four as the threshold for categorizing responders and nonresponders at the end of Stage 1. In scenario 1, we assume that the new drug is ineffective ($\theta_{1p} = \theta_{1l} = \theta_{1h} = 0$). In scenario 2, we assume that only the high dose is effective ($\theta_{1p} = \theta_{1l} = 0 < \theta_{1h} = 6$). For scenario 3, the low dose has a small, but not clinically meaningful, treatment effect and the high dose has a clinically meaningful treatment effect ($\theta_{1p} < \theta_{1l} = 2 < \theta_{1h} = 6$). In contrast, both low and high doses are assumed to have a clinically meaningful effect for scenario 4 ($\theta_{1p} = 0 < \theta_{1l} = 4 < \theta_{1h} = 8$). The last two scenarios assess the sensitivity of our model to the alignment of external control and current data (scenario 5: $\theta'_i \neq \theta_{1p}$ for some $i$; scenario 6: $\theta'_i \neq \theta_{1p}$ for all $i$). A summary of all simulation scenarios can be found in Web Table 3.
Under each data-generating process, 10,000 realizations per scenario were simulated. Each realization includes five sets of summary-level (mean and standard deviation) external control data and one concurrent snSMART dataset. Estimates from the MAC-snSMART, robust MAC-snSMART, traditional analysis, and BJSM are compared. We calculate the coverage rate, root-mean-square error (rMSE), average bias, and average width of the 95% credible interval (CI) of each estimator. Note that the Monte Carlo error for all average biases and CI widths is less than 0.005. Finally, the type I error (under scenario 1) and power (under scenarios 2-6) of the different methods are also calculated using the probability that the 95% CIs of $\hat{\theta}_{1l} - \hat{\theta}_{1p}$ and $\hat{\theta}_{1h} - \hat{\theta}_{1p}$ do not include 0. Results are provided for sample sizes of 25 and 50 randomized with a 1:2:2 ratio to the placebo, low-dose, and high-dose arms, respectively. All computations are done via the R function jags in R package rjags (Plummer, 2022).
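The operating characteristics listed above can be computed from per-realization point estimates and interval bounds along the following lines. The function names are hypothetical, and the rejection rate is computed here separately for a single dose-versus-placebo contrast, which is one possible reading of the definition.

```python
import math


def summarize(estimates, lowers, uppers, truth):
    """Average bias, rMSE, coverage rate, and average CI width for one estimand."""
    n = len(estimates)
    bias = sum(e - truth for e in estimates) / n
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    coverage = sum(lo <= truth <= hi for lo, hi in zip(lowers, uppers)) / n
    width = sum(hi - lo for lo, hi in zip(lowers, uppers)) / n
    return {"bias": bias, "rmse": rmse, "coverage": coverage, "width": width}


def rejection_rate(lowers, uppers):
    """Proportion of intervals for a dose-minus-placebo difference excluding 0.

    This is the type I error when the true difference is 0 and the power
    otherwise.
    """
    n = len(lowers)
    return sum(not (lo <= 0.0 <= hi) for lo, hi in zip(lowers, uppers)) / n
```

In a full simulation, each list would hold one entry per realization (10,000 here), with the bounds taken from the 2.5% and 97.5% posterior quantiles.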

Results
For the data-generation process where assumptions of exchangeability are upheld (data-generating process 1), the rMSE, average bias, coverage rate, and average CI width for estimators of the expected treatment effects are shown in the left columns of Figure 2 and Web Figures 1-3, respectively. The estimators from the robust MAC-snSMART method have smaller rMSEs than the traditional method and BJSM. The robust MAC-snSMART method provides similar coverage compared with the traditional method while having a smaller 95% CI width. Even though the robust MAC-snSMART method has slightly higher average biases, the biases are negligible.
A comparison of all metrics among the traditional method, BJSM, and the MAC-snSMART methods clearly shows that estimation of the placebo effect is improved by including external control data, even when external control data are not entirely aligned with the placebo data in the current snSMART ($\theta'_i \neq \theta_{1p}$ for some $i$). However, when external control data are not completely aligned with the current trial data (simulation scenarios 5 and 6), the placebo effect estimate is less biased under the robust MAC-snSMART method. Scenario 6 verifies that the robustification component introduced in Section 3.2.2 is effective in avoiding overly optimistic external control borrowing, even when the external controls are completely unaligned with the current trial data. The robust MAC-snSMART and the MAC-snSMART methods estimate low- and high-dose effects adequately under all scenarios.

FIGURE 2 Simulated root-mean-square error (rMSE) for the estimators of $\theta_{1k}$. $\theta_{jk}$ is the stage $j$ treatment effect of treatment $k$, where $j = 1, 2$, $k = p, l, h$, p = placebo, l = low dose, and h = high dose. Two hierarchical models, the MAC-snSMART (MS) and robust MAC-snSMART (RMS) methods, are compared against the traditional method and the Bayesian Joint Stage Model (BJSM). The results for total sample size 50 are shown as the colored bars, while the results for total sample size 25 are shown as the overlaying gray bars. The simulation settings are described at the top of each graph, where $\theta_{jk}$ denotes the true value of the expected treatment effect of treatment $k$ in stage $j$, $j = 1, 2$, $k = p, l, h$, and some/cmplt unaligned means some/all of the placebo treatment effects in external control data are inconsistent with the placebo treatment effect in the current trial. This figure appears in color in the electronic version of this paper, and color refers to that version.

TABLE 1 Note: The type I error of all presented methods is defined as the probability that the credible intervals of $\hat{\theta}_{1l} - \hat{\theta}_{1p}$ and $\hat{\theta}_{1h} - \hat{\theta}_{1p}$ do not include 0 when there are no treatment effect differences between low dose and placebo and high dose and placebo. Data-generating process 1 generates datasets under the assumption that the expected treatment effect of the same dose level follows the same normal distribution across stages, and data-generating process 2 generates datasets without a hierarchical structure or exchangeability between stages. $N$ denotes the total number of participants in the trial. Two hierarchical models, MAC-snSMART (MS) and robust MAC-snSMART (RMS), are compared against the traditional method and the Bayesian Joint Stage Model (BJSM).
The right columns of Figure 2 and Web Figures 1-3 present the rMSE, average bias, coverage rate, and average CI width for estimators of the treatment effects when data are generated violating the exchangeability assumption between Stages 1 and 2 (data-generating process 2). Due to information borrowing across both stages, the MAC-snSMART leads to larger positive average biases and subsequently lower coverage rates and higher rMSE values compared to the traditional method. The decrease in coverage rate can be mitigated by setting a larger standard deviation for the half-normal prior of $\tau_k$ or incorporating the robustification component (Section 3.2.2) into the distribution of $\theta_{2k'}$ in Formula 1. The BJSM is less susceptible to this issue since a "shift" parameter is incorporated in the model (Fang et al., 2022). Under this data-generation setting, the traditional method provides the best placebo treatment effect estimators among the presented methods, and the BJSM provides the best low- and high-dose treatment effect estimators. Conclusions were consistent for sample size 25 under both data-generation processes and all scenarios.
Table 1 and Web Figure 4 present the type I error and power of each model. The MAC-snSMART methods have smaller type I errors than the traditional method when their assumption of full exchangeability between stages is upheld, and the type I error is still reasonable (below 0.1) when this assumption is violated. The BJSM has the smallest type I error when the exchangeability assumption between the two stages is violated. When the treatment effect is clinically meaningful (i.e., ⩾ 4), all methods have power close to 1 under all scenarios and sample sizes. The MAC-snSMART methods have a significant increase in power compared to the traditional method and the BJSM when the treatment effect is equal to 2. The power gain is even greater for the smaller sample size ($N = 25$). Overall, the robust MAC-snSMART is the best-performing model when exchangeability of stage-wise treatment effects is upheld.

RE-ANALYSIS OF SPITFIRE TRIAL
To illustrate the practical utility of the proposed snSMART design and MAC-snSMART methods, we conducted a reanalysis of the SPITFIRE study. Given that the trial stopped for futility after Stage 1, we only have summary-level NSAA total scores and 6MWD at baseline and week 48 (details in Web Table 2). To create a dataset that matches our proposed trial design, we simulated Stage 1 patient-level data by randomly drawing from normal distributions based on the SPITFIRE data. NSAA outcome data for placebo ($n = 30$), low dose ($n = 29$) and high dose ($n = 33$) were generated using $N(-2.99, 0.65^2 \times \sqrt{30})$, $N(-3.44, 0.67^2 \times \sqrt{29})$, and $N(-2.41, 0.64^2 \times \sqrt{33})$, respectively. As per study protocol, we set a change of ⩾ −3.1 points from baseline as the threshold for a clinically meaningful treatment effect (Muntoni et al., 2018) and to categorize Stage 1 responders and nonresponders. Stage 2 patient-level treatment outcomes were again randomly generated using Formula 1 with $\rho_{kk'}$ randomly chosen between −1 and 1, and low- and high-dose outcomes drawn from $N(-3.44, 0.67^2 \sqrt{n_{2l}})$ and $N(-2.41, 0.64^2 \sqrt{n_{2h}})$, respectively. Our proposed snSMART design and randomly generated Stage 1 outcomes dictated $n_{2l}$ and $n_{2h}$. We followed the same procedure to simulate Stage 1 and Stage 2 6MWD data.
We used CINRG DNHS data as the source of external control in this re-analysis. Data from DNHS participants who met the SPITFIRE trial eligibility criteria and had NSAA total score or 6MWD records were used for the external control group. Our model assumption of exchangeability between external and current trial controls seems valid due to careful selection of control data and similarity of demographics and disease severity in patients. Since the participant visit schedule of DNHS was different from that of the SPITFIRE trial, for each participant, we picked the test record with "days from baseline" closest to 336 (48 × 7) as their "Week 48" record. In the end, for NSAA total score, data from 25 participants were used as the external control data, with a mean NSAA total score change from baseline of −1.04 and a standard deviation of 0.77. The same data were used for 6MWD, with a mean 6MWD change from baseline of −22.36 and a standard deviation of 27.98.

TABLE 2 Note: $\hat{\theta}_{1k}$ is the estimated stage 1 treatment effect or change from baseline to 48 weeks in the NSAA or 6MWD for treatment $k$, where $k = p, l, h$, $p$ = placebo, $l$ = low dose, and $h$ = high dose. Four analytic methods, the original SPITFIRE trial results, the traditional analytic method, robust MAC-snSMART (RMS), and Bayesian joint stage modeling (BJSM), are compared. "NSAA" stands for "North Star Ambulatory Assessment total score", and "6MWD" stands for "6-minute walk distance". The 95% confidence or credible intervals of the estimates are shown in the parentheses.
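The "Week 48" record selection described for the DNHS data can be sketched as below. The record layout (a list of `(days_from_baseline, value)` pairs per participant) and the function name are assumptions for illustration; ties are resolved here in favor of the earlier-listed record, a choice the text does not specify.

```python
TARGET_DAY = 48 * 7  # 336 days from baseline


def pick_week48_record(records):
    """Return the (days_from_baseline, value) pair closest to day 336.

    records: non-empty list of (days_from_baseline, value) tuples for one
    participant. On ties, the first such record in the list is returned.
    """
    return min(records, key=lambda rec: abs(rec[0] - TARGET_DAY))


# Hypothetical visit history for one participant (values are made up).
visits = [(0, 20), (180, 19), (350, 17), (700, 12)]
week48 = pick_week48_record(visits)
```

In practice one might also cap the allowed distance from day 336 so that a visit far outside the 48-week window is not treated as a "Week 48" record; the text does not state whether such a window was applied.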
Considering the wide variation in outcomes, 30,000 realizations were simulated for both 6MWD and NSAA total score to assess model performance. When implementing the robust MAC-snSMART method, we carefully followed the prior specification rules outlined in Section 3.2.4. For example, based on the SPITFIRE Stage 1 NSAA total scores, most of the observed values of the three bias parameters should range between 0 and 7. Therefore, we used a conservative uniform distribution, Unif(0, 15), as the prior for each of the three bias parameters to cover this range. The details of all other prior specifications can be found in the R code provided.
We fitted the traditional analytic method, BJSM, and robust MAC-snSMART method, with results shown in Table 2. Note that the estimators obtained from the robust MAC-snSMART method and BJSM are consistent with each other and have significantly smaller CI widths than the traditional method because of the efficient use of data across both stages. Thus, even though the BJSM and the robust MAC-snSMART reached the same conclusion as the SPITFIRE trial, that is, failing to reject the null hypothesis, these two methods provided more precise treatment effect estimates.

DISCUSSION
In this paper, motivated by the current drug development paradigm for DMD and similar rare diseases, we present an alternative design and Bayesian methods that make efficient use of all available evidence, including data from both stages and external controls. We have proposed a new snSMART design and a robust MAC-snSMART method to estimate the treatment effects of placebo, low dose, and high dose for a continuous outcome of interest. The robust MAC-snSMART method provides accurate and robust estimators when the expected treatment effects of the same dose level are similar across stages. Our proposed snSMART design and robust MAC-snSMART method are aligned with the mission of FDA's Complex Innovative Design program (2020). We have provided guidelines for prior distributions and alternative models for sensitivity analyses in practical implementation. At the planning stage, it is critical to consider a wide range of scenarios (like those presented in Section 4) to understand the impact of model assumptions, prior choice, and sample size on the proposed design and analytic method. This exercise is crucial for sponsors and regulators to understand the practical efficiency and robustness of the model.

The proposed robust MAC-snSMART method assumes that treatment effects from Stages 1 and 2 are exchangeable, which in turn relies on stable disease (and an adequate washout period) so that treatment effects are stable across stages. Slowly progressing diseases, such as DMD (the motivation here), corticobasal degeneration (CBD), and familial Mediterranean fever (FMF), are good candidates for the proposed snSMART design and analytic methods. If first- and second-stage treatment effects are not similar or there is not an adequate washout period, a multi-stage design and robust exchangeable model may not be appropriate and may lead to biased estimation of treatment effects. At the end of the trial, sensitivity analyses that compare results from a traditional analytic method and the robust MAC-snSMART method can be conducted to assess these assumptions.
The proposed model was tested in extensive simulation studies across various scenarios, including scenarios in which the incorporated external control data are consistent or inconsistent with the current trial placebo arm. Note that this borrowing is not equivalent to a simple pooled analysis of external and concurrent control data, as the MAC-snSMART model takes between-trial heterogeneity into account. For example, in the SPITFIRE trial re-analysis, the ESS for the robust MAC-snSMART analysis is 12, which is less than the sample size of the external placebo data ($n = 25$). Thus, adding more external placebo data will not always reduce the number of patients needed on the placebo arm; the gain depends on the degree of heterogeneity between the data sources.
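To see why heterogeneity discounts external data, consider a deliberately simplified sketch. It assumes a normal-normal hierarchical model with one external source, a known common outcome SD, a known between-source SD, and a flat prior on the shared mean; under those assumptions the predictive variance of the current control mean is $\sigma^2/n_{ext} + 2\tau^2$. The function name and inputs are hypothetical, and this variance-ratio calculation is not the ESS computation used in the re-analysis, so it is not expected to reproduce the value of 12.

```python
def approx_borrowed_ess(sigma, n_ext, tau):
    """Variance-ratio effective sample size contributed by one external
    control source under a simplified normal-normal hierarchical model.

    sigma: within-source outcome SD (assumed known and common)
    n_ext: number of external control participants
    tau:   between-source SD of the control means (heterogeneity)
    """
    if sigma <= 0 or n_ext <= 0 or tau < 0:
        raise ValueError("sigma and n_ext must be positive; tau must be non-negative")
    # Predictive variance of the current-trial control mean given the
    # external sample mean, with a flat prior on the shared mean.
    predictive_var = sigma ** 2 / n_ext + 2 * tau ** 2
    return sigma ** 2 / predictive_var


# Illustrative inputs only (not estimates from the trial or DNHS).
ess_homogeneous = approx_borrowed_ess(sigma=1.0, n_ext=25, tau=0.0)
ess_heterogeneous = approx_borrowed_ess(sigma=1.0, n_ext=25, tau=0.2)
```

With no heterogeneity the external participants count in full, while any positive between-source SD yields an effective sample size below the external sample size.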
While this paper concentrates on enriching the control arm, the MAC framework also permits enrichment of treatment arms. However, we believe relevant treatment data are less likely to be available for use in dose-finding studies. An exception may be the use of adult data in pediatric drug development, but there is debate and uncertainty surrounding the validity and reliability of extrapolating safety and efficacy data from adult populations to pediatric populations.
It is possible that even a high dose of the investigational treatment cannot provide a clinically meaningful treatment effect by the end of Stage 1. Under this scenario, it is not ethical to continue the trial for those who received the high dose in Stage 1. Hence, in practice, we recommend a stopping rule such that if fewer than 30% of patients respond to the high dose in Stage 1, Stage 2 will not be conducted, and all treatment effects will be estimated from Stage 1 data only. There may be additional trade-offs between efficacy and toxicity, which are beyond the scope of this paper and are the subject of future work.
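The recommended stopping rule can be expressed as a simple check. The sketch below is illustrative only: the function name and the boolean encoding of response are assumptions, and how a "response" is defined is taken as given rather than specified here.

```python
def should_run_stage2(high_dose_responses, min_response_rate=0.30):
    """Return True if Stage 2 should be conducted.

    high_dose_responses: one boolean per Stage 1 high-dose participant,
    True if that participant responded. Stage 2 is skipped when fewer
    than min_response_rate of them responded.
    """
    if not high_dose_responses:
        raise ValueError("no Stage 1 high-dose participants supplied")
    rate = sum(high_dose_responses) / len(high_dose_responses)
    return rate >= min_response_rate


# Hypothetical Stage 1 results for ten high-dose participants.
three_of_ten = [True] * 3 + [False] * 7
two_of_ten = [True] * 2 + [False] * 8
proceed_a = should_run_stage2(three_of_ten)
proceed_b = should_run_stage2(two_of_ten)
```

A response rate of exactly 30% proceeds to Stage 2, since the rule stops the trial only when the rate is strictly below the threshold.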
The SPITFIRE trial and many other DMD trials incorporate participant demographic and baseline characteristic covariates into their analysis. In the future, we hope to extend the robust MAC-snSMART method to include patient-level covariates. The use of patient-level covariates is discussed in Kotalik et al. (2021) as a way to assess covariate-adjusted exchangeability. That study uses a linear model and the existing multi-source exchangeability models framework to enable borrowing even when marginal treatment effects differ, as long as covariate-adjusted exchangeability is maintained. Integrating this feature into the snSMART approach would broaden borrowing opportunities. In addition, data in DMD trials are usually collected longitudinally, with three or more visits. In our study, as in SPITFIRE, we employed the commonly used "change from baseline" as the primary endpoint, which is in line with regulatory standards. The placebo group in our study allows us to assess the possible "regression to the mean" effect. Although commonly used, "change from baseline" may not always be the most appropriate measure. Our approach remains valid if the absolute outcome at the end of each stage is used instead, and the performance of our method is not affected. If absolute outcomes are used, we can examine the CIs for the difference of the differences among the high-dose, low-dose, and placebo groups. An important direction for future work is to investigate ways to incorporate longitudinal data into our analytic methods.

ACKNOWLEDGMENTS
Kelley Kidwell and Satrajit Roychoudhury contributed equally to this work. Part of this work is supported by FDA's Advancing Regulatory Science Broad Agency Announcement Contract #75F40120C00195. The authors thank the CINRG DNHS investigators for providing access to their natural history study data (see Web Table 1 for details).

DATA AVAILABILITY STATEMENT
The Duchenne Natural History Study (DNHS) data that support the findings in this paper are available on request. Please email info@trinds.com for more details. The data are not publicly available due to privacy or ethical restrictions.

REFERENCES
FIGURE 1
(a) Study design of the SPITFIRE trial (NCT03039686). (R) denotes randomization. (b) Study design of the proposed snSMART design that formally incorporates external control data. Participants are randomized (R) with 1:2:2 or 1:3:3 chances of receiving placebo, low dose, or high dose, respectively, in Stage 1. At the end of Stage 1, participants are assigned or re-randomized to their Stage 2 treatment based on their Stage 1 treatment and response status. Outcomes are collected at the end of Stages 1 and 2.
where $k = p, l, h$; $k' = l, h$.   can be estimated based on observed data in the snSMART. $\rho_{kk'}$ denotes the correlation between Stage 1 treatment $k$ outcomes and Stage 2 treatment $k'$ outcomes. Note $\Delta$ is a selection bias correction term. This selection bias is due to design since those who receive low dose in Stage 1 are not re-randomized and those who do not respond to high dose in Stage 1 are excluded in Stage 2. Given that we know which treatment sequences the participants follow, to account for the difference from the Stage 1 mean, the bias correction term is defined as: $\Delta = I(k = l, k' = l)\,\delta_{ll} - I(k = l, k' = h)\,\delta_{lh} - I(k = h)\,\delta_{h}$. Explicitly, $\delta_{ll}$ denotes the expected difference between the Stage 1 mean treatment effect of group $(1l, 2l)$ and the overall Stage 1 low-dose mean treatment effect, $\delta_{lh}$ denotes the expected difference between the Stage 1 mean treatment effect of group $(1l, 2h)$ and the overall Stage 1 low-dose mean treatment effect, and $\delta_{h}$ denotes the expected difference between the Stage 1 mean treatment effect of high-dose responders and the overall Stage 1 high-dose mean treatment effect.
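As a rough illustration, the indicator-based bias correction term can be evaluated per treatment sequence as below. The function and argument names are hypothetical, the labels `'p'`, `'l'`, `'h'` for placebo, low dose, and high dose are an assumed encoding, and the signs follow the definition as it reads in this passage; the numeric inputs are placeholders, not estimates.

```python
def selection_bias_term(k, k_prime, delta_ll, delta_lh, delta_h):
    """Evaluate the bias correction term for one treatment sequence.

    k:        Stage 1 treatment, one of 'p', 'l', 'h' (assumed labels)
    k_prime:  Stage 2 treatment, one of 'l', 'h'
    delta_*:  the three bias magnitudes named in the text
    """
    if k not in ("p", "l", "h") or k_prime not in ("l", "h"):
        raise ValueError("unexpected treatment label")
    term = 0.0
    if k == "l" and k_prime == "l":
        term += delta_ll  # low dose in both stages
    if k == "l" and k_prime == "h":
        term -= delta_lh  # low dose, then high dose
    if k == "h":
        term -= delta_h   # high-dose responders continuing to Stage 2
    return term


# Placeholder magnitudes chosen only to show which term is active.
bias_placebo = selection_bias_term("p", "l", 2.0, 3.0, 4.0)
bias_low_low = selection_bias_term("l", "l", 2.0, 3.0, 4.0)
bias_low_high = selection_bias_term("l", "h", 2.0, 3.0, 4.0)
bias_high = selection_bias_term("h", "l", 2.0, 3.0, 4.0)
```

For participants who start on placebo no indicator is active, so the term is zero.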

TABLE 1
Simulated type I error.

TABLE 2
Example data analysis result comparison.