Modern analysis of incomplete longitudinal outcomes involves formulating assumptions about the missingness mechanism and then using a statistical method that produces valid inferences under those assumptions. In this manuscript, we define missingness strategies for analyzing randomized clinical trials (RCTs) based on plausible clinical scenarios. Penalties for dropout are also introduced in an attempt to balance benefits against risks. Some missingness mechanisms are assumed to be non-future dependent, a subclass of missing not at random. Non-future dependence stipulates that missingness depends on past and present information but not on the future. Missingness strategies are implemented in the pattern-mixture modeling framework using multiple imputation (MI), and it is shown how to estimate the marginal treatment effect. Next, we outline how MI can be used to investigate the impact of dropout strategies in subgroups of interest. Finally, we provide the reader with some points to consider when implementing pattern-mixture modeling-MI analyses in confirmatory RCTs. The data set that motivated our investigation comes from a placebo-controlled RCT designed to assess the effect of a new compound on pain. Copyright © 2016 John Wiley & Sons, Ltd.

Biostatisticians recognize the importance of precise definitions of technical terms in randomized controlled clinical trial (RCCT) protocols, statistical analysis plans, and so on, in part because definitions are a foundation for subsequent actions. Imprecise definitions can be a source of controversies about appropriate statistical methods, interpretation of results, and extrapolations to larger populations. This paper presents precise definitions of some familiar terms and of some new terms, some perhaps controversial. The glossary contains definitions that can be copied into a protocol, statistical analysis plan, or similar document and customized. The definitions were motivated and illustrated in the context of a longitudinal RCCT in which some randomized enrollees are non-adherent, receive a corrupted treatment, or withdraw prematurely. The definitions can be adapted for use in a much wider set of RCCTs. New terms can be used in place of controversial terms, for example, *subject*. We define terms that specify a person's progress through an RCCT's phases and that precisely define the RCCT's phases and milestones. We define terms that distinguish between subsets of an RCCT's enrollees and a much larger patient population. ‘The intention-to-treat (ITT) principle’ has multiple interpretations that can be distilled into the definition of the ‘ITT analysis set of randomized enrollees’. Most differences among interpretations of ‘the’ ITT principle stem from an RCCT's primary objective (mainly efficacy versus effectiveness). Four different ‘authoritative’ definitions of the ITT analysis set of randomized enrollees illustrate the variety of interpretations. We propose a separate specification of the analysis set of data that will be used in a specific analysis. Copyright © 2016 John Wiley & Sons, Ltd.

Many approximations have been developed for sample sizing of a logistic regression model with a single normally distributed stimulus. Despite this, it has been recognised that there is no consensus as to the best method. In pharmaceutical drug development, simulation provides a powerful tool to characterise the operating characteristics of complex adaptive designs and is an ideal method for determining the sample size for such a problem. In this paper, we address some issues associated with applying simulation to determine the sample size for a given power in the context of logistic regression. These include efficient methods for evaluating the convolution of a logistic function and a normal density, and an efficient heuristic approach to searching for the appropriate sample size. We illustrate our approach with three case studies. Copyright © 2016 John Wiley & Sons, Ltd.
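A rough sketch of the simulation idea the abstract describes (not the authors' implementation; function names and defaults are illustrative): Monte Carlo the power of the Wald test for the slope in a logistic model with a single standard-normal covariate, then search over the sample size.

```python
import numpy as np

def fit_logistic(X, y, iters=15):
    """Newton-Raphson fit of a logistic regression; returns the
    coefficient estimates and their asymptotic covariance matrix."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        H = X.T @ (W[:, None] * X)              # Fisher information
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta, np.linalg.inv(H)

def simulated_power(n, beta0, beta1, nsim=200, z_crit=1.96, rng=None):
    """Monte Carlo power of the two-sided Wald test of beta1 = 0 for a
    logistic model with a single standard-normal covariate."""
    rng = np.random.default_rng(rng)
    hits = 0
    for _ in range(nsim):
        x = rng.standard_normal(n)
        X = np.column_stack([np.ones(n), x])
        p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
        y = rng.binomial(1, p)
        b, cov = fit_logistic(X, y)
        hits += abs(b[1]) / np.sqrt(cov[1, 1]) > z_crit
    return hits / nsim
```

Wrapping `simulated_power` in a bisection or stepwise search over `n` gives the kind of heuristic sample-size search the abstract alludes to.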

In longitudinal studies of biomarkers, an outcome of interest is the time at which a biomarker reaches a particular threshold. The CD4 count is a widely used marker of human immunodeficiency virus progression. Because of the inherent variability of this marker, a single CD4 count below a relevant threshold should be interpreted with caution. Several studies have applied persistence criteria, designating the outcome as the time to the occurrence of two consecutive measurements less than the threshold. In this paper, we propose a method to estimate the time to attainment of two consecutive CD4 counts less than a meaningful threshold, which takes into account the patient-specific trajectory and measurement error. An expression for the expected time to threshold is presented, which is a function of the fixed effects, random effects and residual variance. We present an application to human immunodeficiency virus-positive individuals from a seroprevalent cohort in Durban, South Africa. Two thresholds are examined, and 95% bootstrap confidence intervals are presented for the estimated time to threshold. Sensitivity analysis revealed that results are robust to truncation of the series and variation in the number of visits considered for most patients. Caution should be exercised when interpreting the estimated times for patients who exhibit very slow rates of decline and patients who have less than three measurements. We also discuss the relevance of the methodology to the study of other diseases and present such applications. We demonstrate that the method proposed is computationally efficient and offers more flexibility than existing frameworks. Copyright © 2016 John Wiley & Sons, Ltd.
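A minimal sketch of the underlying idea, assuming a subject-specific linear decline with illustrative parameter values (this is not the authors' exact estimator, which also accounts for measurement error and the two-consecutive-measurements rule):

```python
import numpy as np

def crossing_time(c, b0, b1, u0=0.0, u1=0.0):
    """First time the subject-specific mean trajectory
    (b0 + u0) + (b1 + u1) * t falls to the threshold c.
    Returns inf if the subject's slope is non-negative."""
    slope = b1 + u1
    if slope >= 0:
        return float("inf")           # trajectory never declines to c
    return (c - (b0 + u0)) / slope

def bootstrap_ci(times, nboot=1000, level=0.95, rng=None):
    """Percentile bootstrap confidence interval for the median
    crossing time across subjects."""
    rng = np.random.default_rng(rng)
    meds = [np.median(rng.choice(times, size=len(times)))
            for _ in range(nboot)]
    tail = 100.0 * (1.0 - level) / 2.0
    lo, hi = np.percentile(meds, [tail, 100.0 - tail])
    return lo, hi
```

For example, with a population intercept of 600 cells/µL, a slope of -50 per year, and a threshold of 350, `crossing_time(350, 600, -50)` gives 5 years; subject-level random effects `u0`, `u1` shift that estimate per patient.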

The effect of correlation among covariates on covariate selection was examined with linear and nonlinear mixed effects models. Demographic covariates were extracted from the National Health and Nutrition Examination Survey III database. Concentration-time profiles were Monte Carlo simulated where only one covariate affected apparent oral clearance (CL/F). A series of univariate covariate population pharmacokinetic models was fit to the data and compared with the reduced model without the covariate. The “best” covariate was identified using either the likelihood ratio test (LRT) statistic or AIC. Weight and body surface area (calculated using the Gehan and George equation, 1970) were highly correlated (*r* = 0.98). Body surface area was often selected as a better covariate than weight, sometimes as often as 1 in 5 times, when weight was the covariate used in the data-generating mechanism. In a second simulation, parent drug concentration and three metabolites were simulated from a thorough QT study and used as covariates in a series of univariate linear mixed effects models of ddQTc interval prolongation. The covariate with the largest significant LRT statistic was deemed the “best” predictor. When the metabolite was formation-rate limited and only parent concentrations affected ddQTc intervals, the metabolite was chosen as the better predictor as often as 1 in 5 times, depending on the slope of the relationship between parent concentrations and ddQTc intervals. A correlated covariate can be chosen as a better predictor than another covariate in a linear or nonlinear population analysis by sheer correlation. These results explain why, for the same drug, different covariates may be identified in different analyses. Copyright © 2016 John Wiley & Sons, Ltd.
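The phenomenon is easy to reproduce in a simplified linear stand-in for the abstract's mixed-model simulations (all names and defaults here are illustrative, not the authors' code). In a univariate linear fit, the model with the larger absolute sample correlation with the response has the smaller residual sum of squares and hence the larger likelihood, so the wrong covariate can win by sampling noise:

```python
import numpy as np

def wrong_pick_rate(n=50, r=0.98, beta=0.5, nsim=500, seed=0):
    """Fraction of simulated data sets in which a correlated but
    causally inert covariate x2 attains a larger |sample correlation|
    with y (equivalently, a larger univariate-model likelihood) than
    the true covariate x1."""
    rng = np.random.default_rng(seed)
    wrong = 0
    for _ in range(nsim):
        x1 = rng.standard_normal(n)
        # x2 is correlated with x1 at level r but has no causal effect
        x2 = r * x1 + np.sqrt(1.0 - r * r) * rng.standard_normal(n)
        y = beta * x1 + rng.standard_normal(n)   # only x1 is causal
        c1 = abs(np.corrcoef(x1, y)[0, 1])
        c2 = abs(np.corrcoef(x2, y)[0, 1])
        wrong += c2 > c1
    return wrong / nsim
```

With `r = 0.98` the inert covariate is selected in a substantial fraction of replicates, consistent with the "1 in 5 times" order of magnitude reported above; the rate shrinks as `r` decreases.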

We study the properties of the treatment effect estimate, expressed as an odds ratio at the study end point, from a logistic regression model adjusting for the baseline value when the underlying continuous repeated measurements follow a multivariate normal distribution. Compared with the analysis that does not adjust for the baseline value, the adjusted analysis produces a larger treatment effect as well as a larger standard error. However, the increase in standard error is more than offset by the increase in treatment effect, so that the adjusted analysis is more powerful than the unadjusted analysis for detecting the treatment effect. On the other hand, the true adjusted odds ratio implied by the normal distribution of the underlying continuous variable is a function of the baseline value and hence is unlikely to be adequately represented by a single value of the adjusted odds ratio from the logistic regression model. In contrast, the risk difference function derived from the logistic regression model provides a reasonable approximation to the true risk difference function implied by the normal distribution of the underlying continuous variable over the range of the baseline distribution. We show that different metrics of treatment effect have similar statistical power when evaluated at the baseline mean. Copyright © 2016 John Wiley & Sons, Ltd.

Model-based dose-finding methods for a combination therapy involving two agents in phase I oncology trials typically include four design aspects, namely: the size of the patient cohort, the three-parameter dose-toxicity model, the choice of start-up rule, and whether or not to include a restriction on dose-level skipping. The effect of each design aspect on the operating characteristics of the dose-finding method has not been adequately studied. However, some studies have compared the performance of rival dose-finding methods using the design aspects outlined by the original studies. In this study, we featured these four well-known design aspects and evaluated the independent effect of each on the operating characteristics of dose-finding methods that include them. We performed simulation studies to examine the effect of these design aspects on the determination of the true maximum tolerated dose combinations (MTDCs) as well as exposure to unacceptable toxic dose combinations (UTDCs). The results demonstrated that the selection rates of MTDCs and UTDCs vary depending on the patient cohort size and restrictions on dose-level skipping. However, the three-parameter dose-toxicity models and start-up rules did not affect these rates. Copyright © 2016 John Wiley & Sons, Ltd.

Dropouts from randomized trials, often for lack of efficacy or toxicity, have usually been handled as 'missing data'. We suggest that they are instead complete observations, just not numeric ones. We propose an exact test of the hypothesis of no drug effect, taking all randomized patients into account, based on a readily interpretable statistic. The method also copes with a drug that is toxic in some patients but beneficial to others, a difficult problem for standard methods. A robust conclusion of efficacy can be drawn with no assumptions other than randomization. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.

For two-arm randomized phase II clinical trials, previous literature proposed an optimal design that minimizes the total sample sizes subject to multiple constraints on the standard errors of the estimated event rates and their difference. The original design is limited to trials with dichotomous endpoints. This paper extends the original approach to be applicable to phase II clinical trials with endpoints from the exponential dispersion family distributions. The proposed optimal design minimizes the total sample sizes needed to provide estimates of population means of both arms and their difference with pre-specified precision. Its applications on data from specific distribution families are discussed under multiple design considerations. Copyright © 2016 John Wiley & Sons, Ltd.

This article describes how a frequentist model averaging approach can be used for concentration–QT analyses in the context of thorough QTc studies. Based on simulations, we have concluded that, starting from three candidate model families (linear, exponential, and Emax), the model averaging approach leads to treatment effect estimates that are quite robust with respect to the control of the type I error in nearly all simulated scenarios; in particular, with the model averaging approach, the type I error appears less sensitive to model misspecification than the widely used linear model. We also noticed few differences in terms of performance between the model averaging approach and the more classical model selection approach, but we believe that, although both can be recommended in practice, the model averaging approach can be more appealing because of some deficiencies of the model selection approach pointed out in the literature. We think that a model averaging or model selection approach should be systematically considered for conducting concentration–QT analyses. Copyright © 2016 John Wiley & Sons, Ltd.

Recent research has fostered new guidance on preventing and treating missing data. Consensus exists that clear objectives should be defined along with the causal estimands; trial design and conduct should maximize adherence to the protocol specified interventions; and a sensible primary analysis should be used along with plausible sensitivity analyses. Two general categories of estimands are effects of the drug as actually taken (*de facto*, effectiveness) and effects of the drug if taken as directed (*de jure*, efficacy). Motivated by examples, we argue that no single estimand is likely to meet the needs of all stakeholders and that each estimand has strengths and limitations. Therefore, stakeholder input should be part of an iterative study development process that includes choosing estimands that are consistent with trial objectives. To this end, an example is used to illustrate the benefit from assessing multiple estimands in the same study. A second example illustrates that maximizing adherence reduces sensitivity to missing data assumptions for *de jure* estimands but may reduce generalizability of results for *de facto* estimands if efforts to maximize adherence in the trial are not feasible in clinical practice. A third example illustrates that whether or not data after initiation of rescue medication should be included in the primary analysis depends on the estimand to be tested and the clinical setting. We further discuss the sample size and total exposure to placebo implications of including post-rescue data in the primary analysis. Copyright © 2016 John Wiley & Sons, Ltd.

ICH E9 Statistical Principles for Clinical Trials was issued in 1998. In October 2014, an addendum to ICH E9 was proposed relating to estimands and sensitivity analyses. In preparation for the release of the addendum, Statisticians in the Pharmaceutical Industry held a 1-day expert group meeting in February 2015. Topics debated included definition, development, implementation, education and communication challenges associated with estimands and sensitivity analyses. The topic of estimands is an important and relatively new one in clinical development. A clear message from the meeting was that estimands bridge the gap between study objectives and statistical methods. When defining estimands, an iterative process linking trial objectives, estimands, trial design, statistical and sensitivity analysis needs to be established. Each objective should have at least one distinct estimand, supported by sensitivity analyses. Because clinical trials are multi-faceted and expensive, it is unrealistic to restrict a study to a single objective and associated estimand. The actual set of estimands and sensitivity analyses for a study will depend on the study objectives, the disease setting and the needs of the various stakeholders. Copyright © 2016 John Wiley & Sons, Ltd.

No abstract is available for this article.

Different arguments have been put forward as to why drug developers should commit themselves early to what they plan to do for children. By EU regulation, paediatric investigation plans should be agreed on in early phases of drug development in adults. Here, extrapolation from adults to children is widely applied to reduce the burden and avoid unnecessary clinical trials in children, but early regulatory decisions on how far extrapolation can be used may be highly uncertain. Under special circumstances, the regulatory process should allow for *adaptive paediatric investigation plans* explicitly foreseeing a re-evaluation of the early decision based on the information accumulated later from adults or elsewhere. A small step towards adaptivity and learning from experience may improve the quality of regulatory decisions, in particular with regard to how much information can be borrowed from adults. © 2016 The Authors. Pharmaceutical Statistics Published by John Wiley & Sons Ltd.

Decision theory is applied to the general problem of comparing two treatments in an experiment with subjects assigned to the treatments at random. The inferential agenda covers collection of evidence about superiority, non-inferiority and average bioequivalence of the treatments. The proposed approach requires defining the terms ‘small’ and ‘large’ to qualify the magnitude of the treatment effect and specifying the losses (or loss functions) that quantify the consequences of the incorrect conclusions. We argue that any analysis that ignores these two inputs is deficient, and so is any *ad hoc* way of taking them into account. Sample size calculation for studies intended to be analysed by this approach is also discussed. Copyright © 2016 John Wiley & Sons, Ltd.

The goals of phase II clinical trials are to gain important information about the performance of novel treatments and decide whether to conduct a larger phase III trial. This can be complicated in cases when the phase II trial objective is to identify a novel treatment having *several* factors. Such multifactor treatment scenarios can be explored using fixed sample size trials. However, the alternative design could be response adaptive randomization with interim analyses and additionally, longitudinal modeling whereby more data could be used in the estimation process. This combined approach allows a quicker and more responsive adaptation to early estimates of later endpoints. Such alternative clinical trial designs are potentially more powerful, faster, and smaller than fixed randomized designs. Such designs are particularly challenging, however, because phase II trials tend to be smaller than subsequent confirmatory phase III trials. The phase II trial may need to explore a large number of treatment variations to ensure that the efficacy of optimal clinical conditions is not overlooked. Adaptive trial designs need to be carefully evaluated to understand how they will perform and to take full advantage of their potential benefits. This manuscript discusses a Bayesian response adaptive randomization design with a longitudinal model that uses a multifactor approach for predicting phase III study success via the phase II data. The approach is based on an actual clinical trial design for the hyperbaric oxygen brain injury treatment trial. Specific details of the thought process and the models informing the trial design are provided. Copyright © 2016 John Wiley & Sons, Ltd.

In this paper, we propose a multistage group sequential procedure to design survival trials using historical controls. The formula for the number of events required for historical control trial designs is derived. Furthermore, a transformed information time is proposed for trial monitoring. An example is given to illustrate the application of the proposed methods to survival trial designs using historical controls. Copyright © 2016 John Wiley & Sons, Ltd.

In recent years, immunological science has evolved, and cancer vaccines are now approved and available for treating existing cancers. Because cancer vaccines require time to elicit an immune response, a delayed treatment effect is expected and is actually observed in drug approval studies. Accordingly, we propose the evaluation of survival endpoints by weighted log-rank tests with the Fleming–Harrington class of weights. We consider group sequential monitoring, which allows early efficacy stopping, and determine a semiparametric information fraction for the Fleming–Harrington family of weights, which is necessary for the error spending function. Moreover, we give a flexible survival model in cancer vaccine studies that considers not only the delayed treatment effect but also the long-term survivors. In a Monte Carlo simulation study, we illustrate that when the primary analysis is a weighted log-rank test emphasizing the late differences, the proposed information fraction can be a useful alternative to the surrogate information fraction, which is proportional to the number of events. Copyright © 2016 John Wiley & Sons, Ltd.
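The Fleming–Harrington weighted log-rank statistic referred to above can be sketched as follows (a plain, non-sequential version with no tied-group bookkeeping beyond the standard hypergeometric variance; variable names are illustrative). The weights w(t) = S(t-)^rho * (1 - S(t-))^gamma use the pooled left-continuous Kaplan–Meier estimate, so gamma > 0 emphasizes late differences, which is the delayed-effect setting of the abstract:

```python
import numpy as np

def fh_logrank(time, event, group, rho=0.0, gamma=0.0):
    """Fleming-Harrington weighted log-rank Z statistic for two groups
    (group coded 0/1, event = 1 for observed events, 0 for censored)."""
    time = np.asarray(time, float)
    event = np.asarray(event, bool)
    group = np.asarray(group, int)
    S = 1.0                       # pooled KM estimate, left-continuous
    num = den = 0.0
    for t in np.unique(time[event]):          # distinct event times
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d = (event & (time == t)).sum()
        d1 = (event & (time == t) & (group == 1)).sum()
        w = S ** rho * (1.0 - S) ** gamma     # weight uses S(t-)
        num += w * (d1 - d * n1 / n)          # observed minus expected
        if n > 1:                             # hypergeometric variance
            den += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        S *= 1.0 - d / n                      # update KM after weighting
    return num / np.sqrt(den)
```

Setting `rho = gamma = 0` recovers the ordinary log-rank test; `rho = 0, gamma = 1` is the common choice for a delayed treatment effect.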

With the development of molecular targeted drugs, predictive biomarkers have played an increasingly important role in identifying patients who are likely to receive clinically meaningful benefits from experimental drugs (i.e., sensitive subpopulation) even in early clinical trials. For continuous biomarkers, such as mRNA levels, it is challenging to determine a cutoff value for the sensitive subpopulation, and widely accepted study designs and statistical approaches are not currently available. In this paper, we propose the Bayesian adaptive patient enrollment restriction (BAPER) approach to identify the sensitive subpopulation while restricting enrollment of patients from the insensitive subpopulation based on the results of interim analyses, in a randomized phase 2 trial with time-to-endpoint outcome and a single biomarker. Applying a four-parameter change-point model to the relationship between the biomarker and hazard ratio, we calculate the posterior distribution of the cutoff value that exhibits the target hazard ratio and use it for the restriction of the enrollment and the identification of the sensitive subpopulation. We also consider interim monitoring rules for termination because of futility or efficacy. Extensive simulations demonstrated that our proposed approach reduced the number of enrolled patients from the insensitive subpopulation, relative to an approach with no enrollment restriction, without reducing the likelihood of a correct decision for the next trial (no-go, go with entire population, or go with sensitive subpopulation) or correct identification of the sensitive subpopulation. Additionally, the four-parameter change-point model had a better performance over a wide range of simulation scenarios than a commonly used dichotomization approach. Copyright © 2016 John Wiley & Sons, Ltd.

A composite endpoint consists of multiple endpoints combined into one outcome. It is frequently used as the primary endpoint in randomized clinical trials. There are two main disadvantages associated with the use of composite endpoints: a) in conventional analyses, all components are treated as equally important; and b) in time-to-event analyses, the first event considered may not be the most important component. Recently, Pocock et al. (2012) introduced the win ratio method to address these disadvantages. This method has two alternative approaches: the matched pair approach and the unmatched pair approach. In the unmatched pair approach, the confidence interval is constructed based on bootstrap resampling, and the hypothesis testing is based on the non-parametric method of Finkelstein and Schoenfeld (1999). Luo et al. (2015) developed a closed-form variance estimator of the win ratio for the unmatched pair approach, based on a composite endpoint with two components and a specific algorithm determining winners, losers and ties. We extend the unmatched pair approach to provide a generalized analytical solution to both hypothesis testing and confidence interval construction for the win ratio, based on its logarithmic asymptotic distribution. This asymptotic distribution is derived via U-statistics, following Wei and Johnson (1985). We perform simulations assessing the confidence intervals constructed with our approach against those from bootstrap resampling and from Luo et al. We have also applied our approach to a liver transplant Phase III study. This application and the simulation studies show that the win ratio can be a better statistical measure than the odds ratio when the importance order among components matters, and that the methods of our approach and of Luo et al., although derived from large-sample theory, are not limited to large samples but also perform well for relatively small sample sizes. Different from Pocock et al. and Luo et al., our approach is a generalized analytical method, which is valid for any algorithm determining winners, losers and ties. Copyright © 2016 John Wiley & Sons, Ltd.
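A minimal sketch of the unmatched-pairs win ratio point estimate described above (the comparator shown is a hypothetical two-component hierarchy; the variance estimator and confidence interval construction of the paper are not reproduced here). Note the algorithm determining winners, losers and ties is supplied as a function, mirroring the paper's point that the method is valid for any such algorithm:

```python
from itertools import product

def win_ratio(treat, control, compare):
    """Unmatched-pairs win ratio: every treatment patient is compared
    with every control patient; compare(a, b) returns +1 if a wins,
    -1 if b wins, and 0 for a tie under the clinical priority order."""
    wins = losses = 0
    for a, b in product(treat, control):
        c = compare(a, b)
        wins += c > 0
        losses += c < 0
    return wins / losses if losses else float("inf")

def hierarchical(a, b):
    """Hypothetical two-component hierarchy: longer survival wins;
    if survival is tied, fewer hospitalizations wins."""
    if a["surv"] != b["surv"]:
        return 1 if a["surv"] > b["surv"] else -1
    if a["hosp"] != b["hosp"]:
        return 1 if a["hosp"] < b["hosp"] else -1
    return 0
```

Swapping `hierarchical` for a different comparator (e.g., one with censoring rules) changes the algorithm without touching the estimator, which is the generality the abstract claims.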

Bayesian predictive power, the expectation of the power function with respect to a prior distribution for the true underlying effect size, is routinely used in drug development to quantify the probability of success of a clinical trial. Choosing the prior is crucial for the properties and interpretability of Bayesian predictive power. We review recommendations on the choice of prior for Bayesian predictive power and explore its features as a function of the prior. The density of power values induced by a given prior is derived analytically and its shape characterized. We find that for a typical clinical trial scenario, this density has a *u*-shape very similar, but not equal, to a *β*-distribution. Alternative priors are discussed, and practical recommendations to assess the sensitivity of Bayesian predictive power to its input parameters are provided. Copyright © 2016 John Wiley & Sons, Ltd.
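Bayesian predictive power as defined above, the expectation of the power function over a prior on the true effect, can be sketched by Monte Carlo for a one-sided z-test with known variance (names and defaults are illustrative; for a normal prior a closed form also exists, so the simulation is purely for exposition):

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_fn(delta, se, z_crit=1.96):
    """Power of a one-sided z-test (known variance) when the true
    effect is delta and the estimate has standard error se."""
    return normal_cdf(delta / se - z_crit)

def bayesian_predictive_power(prior_mean, prior_sd, se,
                              ndraw=50_000, seed=0):
    """Monte Carlo expectation of the power function over a normal
    prior on the true effect size."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(ndraw):
        total += power_fn(rng.gauss(prior_mean, prior_sd), se)
    return total / ndraw
```

Re-running `bayesian_predictive_power` over a grid of `prior_mean` and `prior_sd` values is a direct way to carry out the prior-sensitivity assessment the abstract recommends; collecting the individual `power_fn` draws instead of their mean yields the *u*-shaped density of power values the paper characterizes.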