There are strong arguments, ethical, logistical and financial, for supplementing the evidence from a new clinical trial using data from previous trials with similar control treatments. There is a consensus that historical information should be down-weighted or discounted relative to information from the new trial, but the determination of the appropriate degree of discounting is a major difficulty. The degree of discounting can be represented by a bias parameter with specified variance, but a comparison between the historical and new data gives only a poor estimate of this variance. Hence, if no strong assumption is made concerning its value (i.e. if ‘dynamic borrowing’ is practiced), there may be little or no gain from using the historical data, in either frequentist terms (type I error rate and power) or Bayesian terms (posterior distribution of the treatment effect). It is therefore best to compare the consequences of a range of assumptions. This paper presents a clear, simple graphical tool for doing so on the basis of the mean square error, and illustrates its use with historical data from clinical trials in amyotrophic lateral sclerosis. This approach makes it clear that different assumptions can lead to very different conclusions. External information can sometimes provide strong additional guidance, but different stakeholders may still make very different judgements concerning the appropriate degree of discounting. Copyright © 2016 John Wiley & Sons, Ltd.

Breast cancers are clinically heterogeneous based on tumor markers. The National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program provides baseline data on these tumor markers for reporting cancer burden and trends over time in the US general population. These tumor markers, however, are often prone to missing observations. In particular, estrogen receptor (ER) status, a key biomarker in the study of breast cancer, has been collected since 1992 but historically was not well-reported, with missingness rates as high as 25% in early years. Previous methods used to correct estimates of breast cancer incidence or ER-related odds or prevalence ratios for unknown ER status have relied on a missing-at-random (MAR) assumption. In this paper, we explore the sensitivity of these key estimates to departures from MAR. We develop a predictive mean matching procedure that can be used to multiply impute missing ER status under either an MAR or a missing not at random assumption and apply it to the SEER breast cancer data (1992–2012). The imputation procedure uses the predictive power of the rich set of covariates available in the SEER registry while also allowing us to investigate the impact of departures from MAR. We find some differences in inference under the two assumptions, although the magnitude of differences tends to be small. For the types of analyses typically of primary interest, we recommend imputing SEER breast cancer biomarkers under an MAR assumption, given the small apparent differences under MAR and missing not at random assumptions. Copyright © 2016 John Wiley & Sons, Ltd.

Longitudinal binomial data are frequently generated from multiple questionnaires and assessments in various scientific settings, and such data are often overdispersed. The standard generalized linear mixed effects model may severely underestimate the standard errors of estimated regression parameters in such cases and hence potentially bias the statistical inference. In this paper, we propose a longitudinal beta-binomial model for overdispersed binomial data and estimate the regression parameters under a probit model using the generalized estimating equation method. A hybrid algorithm combining Fisher scoring and the method of moments is implemented for model fitting. Extensive simulation studies are conducted to assess the validity of the proposed method. Finally, the proposed method is applied to analyze functional impairment in subjects who are at risk of Huntington disease, using data from a multisite observational study of prodromal Huntington disease. Copyright © 2016 John Wiley & Sons, Ltd.
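The variance inflation that motivates the beta-binomial model can be illustrated with a small simulation (a hypothetical sketch, not the paper's model, data or estimating equations): beta-binomial counts with intra-cluster correlation ρ have variance np(1 − p)[1 + (n − 1)ρ], so an analysis that assumes plain binomial variation understates the true variability.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_subjects = 20, 50_000
p, rho = 0.3, 0.2  # illustrative mean success probability and over-dispersion

# Beta-binomial: draw a subject-specific probability from Beta(a, b),
# then a binomial count; rho = 1 / (a + b + 1) is the intra-cluster correlation.
a = p * (1 - rho) / rho
b = (1 - p) * (1 - rho) / rho
p_i = rng.beta(a, b, size=n_subjects)
y_bb = rng.binomial(n_trials, p_i)
y_bin = rng.binomial(n_trials, p, size=n_subjects)

var_bin_theory = n_trials * p * (1 - p)                      # 4.2
var_bb_theory = var_bin_theory * (1 + (n_trials - 1) * rho)  # 20.16 (4.8-fold inflation)
print(y_bin.var(), y_bb.var())
```

Naively treating such counts as binomial would therefore understate standard errors by a factor of about sqrt(4.8) ≈ 2.2 in this configuration.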

There is an increasing demand for personalization of disease screening based on assessment of patient risk and other characteristics. For example, in breast cancer screening, advanced imaging technologies have made it possible to move away from ‘one-size-fits-all’ screening guidelines to targeted risk-based screening for those who are in need. Because the diagnostic performance of various imaging modalities may vary across subjects, applying the most accurate modality to the patients who would benefit the most requires a personalized strategy. To address these needs, we propose novel machine learning methods to estimate personalized diagnostic rules for medical screening or diagnosis by maximizing a weighted combination of sensitivity and specificity across subgroups of subjects. We first develop methods that can be applied when competing modalities or screening strategies are observed on the same subject (paired design). Next, we present methods for studies where not all subjects receive both modalities (unpaired design). We study theoretical properties, including consistency and a risk bound of the personalized diagnostic rules, and conduct simulation studies to examine the performance of the proposed methods. Lastly, we analyze data collected from a brain imaging study of Parkinson's disease using positron emission tomography and diffusion tensor imaging with paired and unpaired designs. Our results show that in some cases, a personalized modality assignment is estimated to improve the empirical area under the receiver operating characteristic curve compared with a ‘one-size-fits-all’ assignment strategy. Copyright © 2016 John Wiley & Sons, Ltd.

Network meta-analysis enables comprehensive synthesis of evidence concerning multiple treatments and their simultaneous comparisons based on both direct and indirect evidence. A fundamental prerequisite of network meta-analysis is the consistency of evidence obtained from different sources, particularly whether direct and indirect evidence accord with each other and how they may influence the overall estimates. We have developed an efficient method to quantify indirect evidence, as well as a testing procedure to evaluate inconsistency, using Lindsay's composite likelihood method. We also show that this estimator retains complete information for the indirect evidence. Using this method, we can assess the degree of consistency between direct and indirect evidence and their contribution rates to the overall estimate. Sensitivity analyses can also be conducted with this method to assess the influence of potentially inconsistent treatment contrasts on the overall results. These methods can provide useful information when overall comparative results might be biased by specific inconsistent treatment contrasts. We also provide some fundamental requirements for valid inference with these methods concerning consistency restrictions on multi-arm trials. In addition, the efficiency of the developed method is demonstrated in simulation studies. Applications to a network meta-analysis of 12 new-generation antidepressants are presented. Copyright © 2016 John Wiley & Sons, Ltd.

]]>Stratified medicine utilizes individual-level covariates that are associated with a differential treatment effect, also known as treatment-covariate interactions. When multiple trials are available, meta-analysis is used to help detect true treatment-covariate interactions by combining their data. Meta-regression of trial-level information is prone to low power and ecological bias, and therefore, individual participant data (IPD) meta-analyses are preferable to examine interactions utilizing individual-level information. However, one-stage IPD models are often wrongly specified, such that interactions are based on amalgamating within- and across-trial information. We compare, through simulations and an applied example, fixed-effect and random-effects models for a one-stage IPD meta-analysis of time-to-event data where the goal is to estimate a treatment-covariate interaction. We show that it is crucial to centre patient-level covariates by their mean value in each trial, in order to separate out within-trial and across-trial information. Otherwise, bias and coverage of interaction estimates may be adversely affected, leading to potentially erroneous conclusions driven by ecological bias. We revisit an IPD meta-analysis of five epilepsy trials and examine age as a treatment effect modifier. The interaction is −0.011 (95% CI: −0.019 to −0.003; *p* = 0.004), and thus highly significant, when amalgamating within-trial and across-trial information. However, when separating within-trial from across-trial information, the interaction is −0.007 (95% CI: −0.019 to 0.005; *p* = 0.22), and thus its magnitude and statistical significance are greatly reduced. We recommend that meta-analysts should only use within-trial information to examine individual predictors of treatment effect and that one-stage IPD models should separate within-trial from across-trial information to avoid ecological bias. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
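The centring recommendation can be demonstrated with a toy simulation (hypothetical numbers, and ordinary least squares rather than the time-to-event models considered in the paper): when the within-trial slope differs from the across-trial slope, a single uncentred covariate amalgamates the two, whereas regressing on the trial-mean-centred covariate plus the trial means recovers each separately.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_per = 10, 200
trial = np.repeat(np.arange(n_trials), n_per)
trial_mean = np.linspace(40, 70, n_trials)          # e.g. mean age differs across trials
x = trial_mean[trial] + rng.normal(0, 5, trial.size)
xbar = np.array([x[trial == t].mean() for t in range(n_trials)])[trial]
x_c = x - xbar                                      # within-trial information only

beta_within, beta_across = 0.5, 2.0                 # deliberately different slopes
y = beta_within * x_c + beta_across * xbar + rng.normal(0, 1, x.size)

# Amalgamated model: a single slope mixes within- and across-trial information.
b_amalg = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)[0][1]
# Separated model: centred covariate plus trial means.
_, b_w, b_a = np.linalg.lstsq(np.column_stack([np.ones_like(x), x_c, xbar]), y,
                              rcond=None)[0]
print(round(b_amalg, 2), round(b_w, 2), round(b_a, 2))
```

The separated fit returns the within-trial slope (0.5) and across-trial slope (2.0), while the amalgamated slope lands between them, pulled toward the across-trial association; this is the ecological-bias mechanism the abstract describes.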

In meta-analyses where a continuous outcome is measured with different scales or standards, the summary statistic is the mean difference standardised to a common metric with a common variance. Where the trial treatment is delivered by a person, nesting of patients within care providers leads to clustering that may interact with, or be limited to, one or more of the arms. Assuming a common standardising variance is then less tenable, and options for scaling the mean difference become numerous. Metrics suggested for cluster-randomised trials are the within-, between- and total variances and, for unequal variances, the control-arm or pooled variances. We consider summary-measures and individual-patient-data methods for meta-analysing standardised mean differences from trials with two-level nested clustering, relaxing the independence and common-variance assumptions and allowing sample sizes to differ across arms. A general metric is proposed with comparable interpretation across designs. The relationship between the method of standardisation and the choice of model is explored, allowing for bias in the estimator and imprecision in the standardising metric. A meta-analysis of trials of counselling in primary care motivated this work. Assuming equal clustering effects across trials, the proposed random-effects meta-analysis model gave a pooled standardised mean difference of −0.27 (95% CI −0.45 to −0.08) using summary measures and −0.26 (95% CI −0.45 to −0.09) using the individual patient data. While treatment-related clustering has rarely been taken into account in trials, it is now recommended that it be considered in trials and meta-analyses. This paper contributes to the uptake of this guidance. Copyright © 2016 John Wiley & Sons, Ltd.

In follow-up studies on chronic disease cohorts, individuals are often observed at irregular visit times that may be related to their previous disease history and other factors. This can produce bias in standard methods of estimation. Working in the context of multistate models, we consider a method of nonparametric estimation for state occupancy probabilities that adjusts for dependent follow-up through the use of inverse-intensity-of-visit weighted estimating functions and smoothing. The methodology is applied to the estimation of viral rebound probabilities in the Canadian Observational Cohort on HIV-positive persons. Copyright © 2016 John Wiley & Sons, Ltd.

Prediction models fitted with logistic regression often show poor performance when applied in populations other than the development population. Model updating may improve predictions. Previously suggested methods vary in the extensiveness with which they update the model. We aim to define a strategy for selecting an appropriate update method that balances the amount of evidence for updating in the new patient sample against the danger of overfitting. We consider recalibration in the large (re-estimation of the model intercept), recalibration (re-estimation of intercept and slope) and model revision (re-estimation of all coefficients) as update methods. We propose a closed testing procedure that allows the extensiveness of the updating to increase progressively from a minimum (the original model) to a maximum (a completely revised model). The procedure involves multiple testing while approximately maintaining the chosen type I error rate. We illustrate this approach with three clinical examples: patients with prostate cancer, traumatic brain injury and children presenting with fever. The need for updating the prostate cancer model was completely driven by a different model intercept in the update sample (adjustment: 2.58). Separate testing of model revision against the original model showed statistically significant results but led to overfitting (calibration slope at internal validation = 0.86). The closed testing procedure selected recalibration in the large as the update method, without overfitting. The advantage of the closed testing procedure was confirmed by the other two examples. We conclude that the proposed closed testing procedure may be useful in selecting appropriate update methods for previously developed prediction models. Copyright © 2016 John Wiley & Sons, Ltd.

Clinical trials target patients who are expected to benefit from a new treatment under investigation. However, the magnitude of the treatment benefit, if it exists, often depends on the patient baseline characteristics. It is therefore important to investigate the consistency of the treatment effect across subgroups to ensure a proper interpretation of positive study findings in the overall population. Such assessments can provide guidance on how the treatment should be used. However, great care has to be taken when interpreting consistency results. An observed heterogeneity in treatment effect across subgroups can arise because of chance alone, whereas true heterogeneity may be difficult to detect by standard statistical tests because of their low power. This tutorial considers issues related to subgroup analyses and their impact on the interpretation of findings of completed trials that met their main objectives. In addition, we provide guidance on the design and analysis of clinical trials that account for the expected heterogeneity of treatment effects across subgroups by establishing treatment benefit in a pre-defined targeted subgroup and/or the overall population. Copyright © 2016 John Wiley & Sons, Ltd.

Prevalent sampling is frequently a convenient and economical sampling technique for the collection of time-to-event data and thus is commonly used in studies of the natural history of a disease. However, it is biased by design because it tends to recruit individuals with longer survival times. This paper considers estimation of time-dependent receiver operating characteristic curves when data are collected under prevalent sampling. To correct the sampling bias, we develop both nonparametric and semiparametric estimators using extended risk sets and the inverse probability weighting techniques. The proposed estimators are consistent and converge to Gaussian processes, while substantial bias may arise if standard estimators for right-censored data are used. To illustrate our method, we analyze data from an ovarian cancer study and estimate receiver operating characteristic curves that assess the accuracy of the composite markers in distinguishing subjects who died within 3–5 years from subjects who remained alive. Copyright © 2016 John Wiley & Sons, Ltd.

The design of phase I studies is often challenging, because of limited evidence to inform study protocols. Adaptive designs are now well established in cancer but much less so in other clinical areas. A phase I study to assess the safety, pharmacokinetic profile and antiretroviral efficacy of C34-PEG_{4}-Chol, a novel peptide fusion inhibitor for the treatment of HIV infection, has been set up with Medical Research Council funding. During the study workup, Bayesian adaptive designs based on the continual reassessment method were compared with a more standard rule-based design, with the aim of choosing a design that would maximise the scientific information gained from the study. The process of specifying and evaluating the design options was time consuming and required the active involvement of all members of the trial's protocol development team. However, the effort was worthwhile as the originally proposed rule-based design has been replaced by a more efficient Bayesian adaptive design. While the outcome to be modelled, design details and evaluation criteria are trial specific, the principles behind their selection are general. This case study illustrates the steps required to establish a design in a novel context. Copyright © 2016 John Wiley & Sons, Ltd.

Multilevel data occur frequently in many research areas, such as health services research and epidemiology. A suitable way to analyze such data is through the use of multilevel regression models (MLRM). MLRM incorporate cluster-specific random effects that allow one to partition the total individual variance into between-cluster variation and between-individual variation. Statistically, MLRM account for the dependency of the data within clusters and provide correct estimates of uncertainty around regression coefficients. Substantively, the magnitude of the effect of clustering provides a measure of the General Contextual Effect (GCE). When outcomes are binary, the GCE can also be quantified by measures of heterogeneity like the Median Odds Ratio (MOR) calculated from a multilevel logistic regression model. Time-to-event outcomes within a multilevel structure occur commonly in epidemiological and medical research. However, the Median Hazard Ratio (MHR), which corresponds to the MOR in multilevel (i.e., ‘frailty’) Cox proportional hazards regression, is rarely used. Analogously to the MOR, the MHR is the median relative change in the hazard of the occurrence of the outcome when comparing identical subjects from two randomly selected different clusters that are ordered by risk. We illustrate the application and interpretation of the MHR in a case study analyzing the hazard of mortality in patients hospitalized for acute myocardial infarction at hospitals in Ontario, Canada. We provide R code for computing the MHR. The MHR is a useful and intuitive measure for expressing cluster heterogeneity in the outcome and, thereby, estimating general contextual effects in multilevel survival analysis. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
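The paper provides R code for the MHR; as a cross-language sketch, the formula analogous to the median odds ratio, MHR = exp(sqrt(2σ²) Φ⁻¹(0.75)) with σ² the variance of normal cluster random effects on the log-hazard scale, can be computed as follows (this is the standard MOR-style expression, not code taken from the paper):

```python
from math import exp, sqrt
from statistics import NormalDist

def median_hazard_ratio(var_re: float) -> float:
    """MHR from the variance of normal cluster random effects on the log-hazard scale."""
    return exp(sqrt(2.0 * var_re) * NormalDist().inv_cdf(0.75))

print(median_hazard_ratio(0.25))  # ~1.61: median 61% higher hazard in the riskier cluster
print(median_hazard_ratio(0.0))   # 1.0: no cluster heterogeneity
```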

]]>Characterizing the technical precision of measurements is a necessary stage in the planning of experiments and in the formal sample size calculation for optimal design. Instruments that measure multiple analytes simultaneously, such as in high-throughput assays arising in biomedical research, pose particular challenges from a statistical perspective. The current most popular method for assessing precision of high-throughput assays is by scatterplotting data from technical replicates. Here, we question the statistical rationale of this approach from both an empirical and theoretical perspective, illustrating our discussion using four example data sets from different genomic platforms. We demonstrate that such scatterplots convey little statistical information of relevance and are potentially highly misleading. We present an alternative framework for assessing the precision of high-throughput assays and planning biomedical experiments. Our methods are based on *repeatability*—a long-established statistical quantity also known as the intraclass correlation coefficient. We provide guidance and software for estimation and visualization of repeatability of high-throughput assays, and for its incorporation into study design. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

Popular approaches to spatial cluster detection, such as the spatial scan statistic, are defined in terms of the responses. Here, we consider varying-coefficient regression and spatial clusters in the regression coefficients. In varying-coefficient regression, such as geographically weighted regression, different regression coefficients are obtained for different spatial units. It is often of interest to practitioners to identify clusters of spatial units with distinct patterns in a regression coefficient, but there is no formal statistical methodology for that. Rather, cluster identification is often ad hoc, such as by eyeballing the map of fitted regression coefficients and discerning patterns. In this paper, we develop new methodology for spatial cluster detection in the regression setting based on hypothesis testing. We evaluate our methods in terms of power and coverage of true clusters via simulation studies. For illustration, our methodology is applied to a cancer mortality dataset. Copyright © 2016 John Wiley & Sons, Ltd.

In current practice, the most frequently applied approach to the handling of ties in the Mann–Whitney–Wilcoxon (MWW) test is based on the conditional distribution of the sum of mid-ranks, given the observed pattern of ties. Starting from this conditional version of the testing procedure, a sample size formula was derived and investigated by Zhao et al. (Stat Med 2008). In contrast, the approach we pursue here is a nonconditional one exploiting explicit representations for the variances of, and the covariance between, the two *U*-statistics estimators involved in the Mann–Whitney form of the test statistic. The accuracy of both ways of approximating the sample sizes required for attaining a prespecified level of power in the MWW test for superiority with arbitrarily tied data is comparatively evaluated by means of simulation. The key qualitative conclusions to be drawn from these numerical comparisons are as follows:

- With the sample sizes calculated by means of the respective formula, both versions of the test maintain the level and the prespecified power with about the same degree of accuracy.
- Despite the equivalence in terms of accuracy, the sample size estimates obtained by means of the new formula are in many cases markedly lower than those calculated for the conditional test.

Perhaps a still more important advantage of the nonconditional approach based on *U*-statistics is that it can also be adopted for noninferiority trials. Copyright © 2016 John Wiley & Sons, Ltd.

Even though consistency is an important issue in multi-regional clinical trials and inconsistency is often anticipated, solutions for handling inconsistency are rare. If a region's treatment effects are inconsistent with those of the other regions, pooling all the regions to estimate the overall treatment effect may not be reasonable. Unlike multicenter clinical trials conducted in the USA and Europe, multi-regional clinical trials involve different regional regulatory agencies that may have their own ways to interpret data and approve new drugs. It is therefore practical to consider the case in which the data from the region with the minimal observed treatment effect are excluded from the analysis in order to attain regulatory approval of the study drug. In such cases, what is the appropriate statistical approach for the remaining regions? We provide a solution first formulated within the fixed effects framework and then extend it to discrete random effects models. Copyright © 2016 John Wiley & Sons, Ltd.

This study proposes a time-varying effect model for examining group differences in trajectories of zero-inflated count outcomes. The motivating example demonstrates that this zero-inflated Poisson model allows investigators to study group differences in different aspects of substance use (e.g., the probability of abstinence and the quantity of alcohol use) simultaneously. The simulation study shows that the accuracy of estimation of the trajectory functions improves as the sample size increases; the accuracy under equal group sizes is higher only when the sample size is small (100). In terms of hypothesis-testing performance, the type I error rates are close to their corresponding significance levels under all settings. Furthermore, the power increases as the alternative hypothesis deviates further from the null hypothesis, and this increase is faster when the sample size is larger. Moreover, the hypothesis test for the group difference in the zero component tends to be less powerful than the test for the group difference in the Poisson component. Copyright © 2016 John Wiley & Sons, Ltd.

The proportional hazards model is one of the most important statistical models used in medical research involving time-to-event data. Simulation studies are routinely used to evaluate the performance and properties of this model and of alternative statistical models for time-to-event outcomes under a variety of situations. Complex simulations that examine multiple situations with different censoring rates demand approaches that can accommodate this variety. In this paper, we propose a general framework for simulating right-censored survival data for proportional hazards models by simultaneously incorporating a baseline hazard function from a known survival distribution, a known censoring time distribution, and a set of baseline covariates. Specifically, we present scenarios in which time to event is generated from an exponential or Weibull distribution and censoring time has a uniform or Weibull distribution. The proposed framework accommodates any combination of covariate distributions. We describe the steps involved in nested numerical integration and in using a root-finding algorithm to choose the censoring parameter that achieves predefined censoring rates in the simulated survival data. We conducted simulation studies to assess the performance of the proposed framework. We then demonstrated its application in a comprehensively designed simulation study investigating the effect of the censoring rate on potential bias in estimating the conditional treatment effect using the proportional hazards model in the presence of unmeasured confounding variables. Copyright © 2016 John Wiley & Sons, Ltd.
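The root-finding step can be sketched in the simplest setting of exponential event times and uniform censoring with no covariates (a minimal illustration; the paper's framework additionally handles Weibull distributions, covariates and nested numerical integration). For T ~ Exp(λ) and C ~ Uniform(0, θ), the censoring probability is P(C < T) = (1 − e^(−λθ))/(λθ), and θ is chosen so that this equals the target rate:

```python
import numpy as np
from scipy.optimize import brentq

lam = 0.5       # exponential event-time rate (illustrative)
target = 0.30   # desired censoring proportion

def censor_prob(theta: float) -> float:
    """P(C < T) with T ~ Exp(lam) and C ~ Uniform(0, theta)."""
    return (1.0 - np.exp(-lam * theta)) / (lam * theta)

# censor_prob decreases from 1 to 0 in theta, so this bracket contains the root.
theta = brentq(lambda t: censor_prob(t) - target, 1e-6, 1e6)

rng = np.random.default_rng(2)
t = rng.exponential(1.0 / lam, 100_000)
c = rng.uniform(0.0, theta, 100_000)
obs_time = np.minimum(t, c)   # observed right-censored times
event = t <= c                # event indicator
print(theta, 1 - event.mean())  # empirical censoring rate close to 0.30
```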

Meta-analyses of clinical trials often treat the number of patients experiencing a medical event as binomially distributed when individual patient data for fitting standard time-to-event models are unavailable. Assuming identical drop-out time distributions across arms, random censorship, and low proportions of patients with an event, a binomial approach results in a valid test of the null hypothesis of no treatment effect with minimal loss in efficiency compared with time-to-event methods. To deal with differences in follow-up—at the cost of assuming specific distributions for event and drop-out times—we propose a hierarchical multivariate meta-analysis model using the aggregate data likelihood based on the number of cases, fatal cases, and discontinuations in each group, as well as the planned trial duration and group sizes. Such a model also enables exchangeability assumptions about the parameters of the survival distributions, for which such assumptions are more appropriate than for the expected proportion of patients with an event across trials of substantially different length. Borrowing information from other trials within a meta-analysis or from historical data is particularly useful for rare-events data. Prior information or exchangeability assumptions also avoid the parameter identifiability problems that arise when using more flexible event and drop-out time distributions than the exponential one. We discuss the derivation of robust historical priors and illustrate the discussed methods with an example. We also compare the proposed approach against other aggregate data meta-analysis methods in a simulation study. Copyright © 2016 John Wiley & Sons, Ltd.
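The point about trial length can be made concrete under the simplest parametric assumptions (independent exponential event and drop-out times; the rates below are illustrative, not values from the paper): the probability of observing an event before drop-out and before the end of follow-up T is (λ/(λ + μ))(1 − e^(−(λ+μ)T)), so the same event rate implies very different event proportions in trials of different length.

```python
from math import exp

def p_event(lam: float, mu: float, T: float) -> float:
    """P(event observed before drop-out and before end of follow-up T),
    assuming independent exponential event (rate lam) and drop-out (rate mu) times."""
    tot = lam + mu
    return lam / tot * (1.0 - exp(-tot * T))

# Same rates, different planned durations: the expected event proportion
# is over four times larger in the 5-year trial than in the 1-year trial.
print(p_event(0.01, 0.05, 1.0), p_event(0.01, 0.05, 5.0))
```

This is why exchangeability across trials is more plausible for the rate parameters than for the event proportions themselves.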

End-stage renal disease (ESRD) is one of the most serious diabetes complications. Numerous studies have been devoted to revealing the risk factors of the onset time of ESRD. In this article, we propose a proportional mean residual life (MRL) model with latent variables to assess the effects of observed and latent risk factors on the MRL function of ESRD in a cohort of Chinese type 2 diabetic patients. The proposed model generalizes the conventional proportional MRL model to accommodate the latent risk factor that cannot be measured by a single observed variable. We employ a factor analysis model to characterize the latent risk factors via multiple observed variables. We develop a borrow-strength estimation procedure, which incorporates the expectation–maximization algorithm and an extended estimating equation approach. The asymptotic properties of the proposed estimators are established. Simulation shows that the performance of the proposed methodology is satisfactory. The application to the study of type 2 diabetes reveals insights into the prevention of ESRD. Copyright © 2016 John Wiley & Sons, Ltd.

Introduced by Hansen in 2008, the prognostic score (PGS) has been presented as ‘the prognostic analogue of the propensity score’ (PPS). PPS-based methods are intended to estimate marginal effects. Most previous studies evaluated the performance of existing PGS-based methods (adjustment, stratification and matching on the PGS) in situations in which the theoretical conditional and marginal effects are equal (i.e., collapsible situations). To support the use of the PGS framework as an alternative to the PPS framework, applied researchers must have reliable information about the type of treatment effect estimated by each method. We propose four new PGS-based methods, each developed to estimate a specific type of treatment effect. We evaluated the ability of existing and new PGS-based methods to estimate the conditional treatment effect (CTE), the (marginal) average treatment effect on the whole population (ATE), and the (marginal) average treatment effect on the treated population (ATT), when the odds ratio (a non-collapsible effect measure) is the measure of interest. The performance of the PGS-based methods was assessed by Monte Carlo simulations and compared with PPS-based methods and multivariate regression analysis. Existing PGS-based methods did not allow estimation of the ATE and showed unacceptable performance when the proportion of exposed subjects was large. When estimating marginal effects, PPS-based methods were too conservative, whereas the new PGS-based methods performed better with low prevalence of exposure and had coverage closer to the nominal value. When estimating the CTE, the new PGS-based methods performed as well as traditional multivariate regression. Copyright © 2016 John Wiley & Sons, Ltd.

The ‘gold standard’ design for three-arm trials refers to trials with an active control and a placebo control in addition to the experimental treatment group. This design is recommended when it is ethically justifiable, as it allows the simultaneous comparison of the experimental treatment, active control, and placebo. Parametric testing methods have been studied extensively over the past years. However, these methods often tend to be liberal or conservative when distributional assumptions are not met, particularly with small sample sizes. In this article, we introduce a studentized permutation test for testing non-inferiority and superiority of the experimental treatment compared with the active control in three-arm trials in the ‘gold standard’ design. The performance of the studentized permutation test for finite sample sizes is assessed in a Monte Carlo simulation study under various parameter constellations. Emphasis is put on whether the studentized permutation test meets the target significance level. For comparison purposes, commonly used Wald-type tests, which do not make any distributional assumptions, are included in the simulation study. The simulation study shows that, for count data, the presented studentized permutation test for assessing non-inferiority in three-arm trials in the ‘gold standard’ design outperforms its competitors, for instance the test based on a quasi-Poisson model. The methods discussed in this paper are implemented in the R package ThreeArmedTrials, which is available on the Comprehensive R Archive Network (CRAN). Copyright © 2016 John Wiley & Sons, Ltd.

Multiple imputation (MI) is becoming increasingly popular for handling missing data. Standard approaches for MI assume normality for continuous variables (conditionally on the other variables in the imputation model). However, it is unclear how to impute non-normally distributed continuous variables. Using simulation and a case study, we compared various transformations applied prior to imputation, including a novel non-parametric transformation, to imputation on the raw scale and using predictive mean matching (PMM) when imputing non-normal data. We generated data from a range of non-normal distributions, and set 50% to missing completely at random or missing at random. We then imputed missing values on the raw scale, following a zero-skewness log, Box–Cox or non-parametric transformation and using PMM with both type 1 and 2 matching. We compared inferences regarding the marginal mean of the incomplete variable and the association with a fully observed outcome. We also compared results from these approaches in the analysis of depression and anxiety symptoms in parents of very preterm compared with term-born infants. The results provide novel empirical evidence that the decision regarding how to impute a non-normal variable should be based on the nature of the relationship between the variables of interest. If the relationship is linear in the untransformed scale, transformation can introduce bias irrespective of the transformation used. However, if the relationship is non-linear, it may be important to transform the variable to accurately capture this relationship. A useful alternative is to impute the variable using PMM with type 1 matching. Copyright © 2016 John Wiley & Sons, Ltd.
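To make the idea of type-1 matching concrete, here is a minimal single-imputation sketch of predictive mean matching in Python. It is an assumption-laden simplification: a full MI procedure would also draw the regression parameters from their posterior and repeat the imputation M times, and the function name and interface are invented for illustration.

```python
import numpy as np

def pmm_impute(y, X, k=5, seed=0):
    """Single predictive-mean-matching imputation (type-1 matching sketch).

    y: 1-d float array with np.nan marking missing values; X: covariate matrix.
    Type-1 matching compares predictions for missing cases against
    predictions for observed cases, then donates an observed y value
    drawn from the k nearest neighbours.
    """
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    Xd = np.column_stack([np.ones(len(y)), X])          # add intercept
    beta, *_ = np.linalg.lstsq(Xd[obs], y[obs], rcond=None)
    pred = Xd @ beta                                    # predictions for everyone
    y_imp = y.copy()
    for i in np.flatnonzero(~obs):
        dist = np.abs(pred[obs] - pred[i])
        donors = np.argsort(dist)[:k]                   # k closest observed cases
        y_imp[i] = y[obs][rng.choice(donors)]           # donate an observed value
    return y_imp
```

Because every imputed value is an actually observed value, PMM preserves the shape of the observed distribution, which is why it is attractive for non-normal variables without any transformation.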

We investigate the estimation of intervention effect and sample size determination for experiments where subjects are supposed to contribute paired binary outcomes with some incomplete observations. We propose a hybrid estimator to appropriately account for the mixed nature of observed data: paired outcomes from those who contribute complete pairs of observations and unpaired outcomes from those who contribute either pre-intervention or post-intervention outcomes. We theoretically prove that if incomplete data are evenly distributed between the pre-intervention and post-intervention periods, the proposed estimator will always be more efficient than the traditional estimator. A numerical study shows that when the distribution of incomplete data is unbalanced, the proposed estimator will be superior when there is moderate-to-strong positive within-subject correlation. We further derive a closed-form sample size formula to help researchers determine how many subjects need to be enrolled in such studies. Simulation results suggest that the calculated sample size maintains the empirical power and type I error under various design configurations. We demonstrate the proposed method using a real application example. Copyright © 2016 John Wiley & Sons, Ltd.

Extrapolating from information available on one patient group to support conclusions about another is common in clinical research. For example, the findings of clinical trials, often conducted in highly selective patient cohorts, are routinely extrapolated to wider populations by policy makers. Meanwhile, the results of adult trials may be used to support conclusions about the effects of a medicine in children. For example, if the effective concentration of a drug can be assumed to be similar in adults and children, an appropriate paediatric dosing rule may be found by ‘bridging’, that is, by matching the adult effective concentration. However, this strategy may result in children receiving an ineffective or hazardous dose if, in fact, effective concentrations differ between adults and children. When there is uncertainty about the equality of effective concentrations, some pharmacokinetic–pharmacodynamic data may be needed in children to verify that differences are small. In this paper, we derive optimal group sequential tests that can be used to verify this assumption efficiently. Asymmetric inner wedge tests are constructed that permit early stopping to accept or reject an assumption of similar effective drug concentrations in adults and children. Asymmetry arises because the consequences of under- and over-dosing may differ. We show how confidence intervals can be obtained on termination of these tests and illustrate the small sample operating characteristics of designs using simulation. Copyright © 2016 John Wiley & Sons, Ltd.

Cancer studies frequently yield multiple event times that correspond to landmarks in disease progression, including non-terminal events (i.e., cancer recurrence) and an informative terminal event (i.e., cancer-related death). Hence, we often observe semi-competing risks data. Work on such data has focused on scenarios in which the cause of the terminal event is known. However, in some circumstances, the information on cause for patients who experience the terminal event is missing; consequently, we are not able to differentiate an informative terminal event from a non-informative terminal event. In this article, we propose a method to handle missing data regarding the cause of an informative terminal event when analyzing the semi-competing risks data. We first consider the nonparametric estimation of the survival function for the terminal event time given missing cause-of-failure data via the expectation–maximization algorithm. We then develop an estimation method for semi-competing risks data with missing cause of the terminal event, under a pre-specified semiparametric copula model. We conduct simulation studies to investigate the performance of the proposed method. We illustrate our methodology using data from a study of early-stage breast cancer. Copyright © 2016 John Wiley & Sons, Ltd.

Arming the immune system against cancer has emerged as a powerful tool in oncology during recent years. Instead of poisoning a tumor or destroying it with radiation, a therapeutic cancer vaccine, a type of cancer immunotherapy, unleashes the immune system to combat cancer. This indirect mechanism of action poses the possibility of a delayed onset of clinical effect, which results in a delayed separation of survival curves between the experimental and control groups in therapeutic cancer vaccine trials with time-to-event endpoints. This violates the proportional hazards assumption. As a result, the conventional study design based on the regular log-rank test, ignoring the delayed effect, would lead to a loss of power. In this paper, we propose two innovative approaches for sample size and power calculation using the piecewise weighted log-rank test to properly and efficiently incorporate the delayed effect into the study design. Both theoretical derivations and empirical studies demonstrate that the proposed methods, accounting for the delayed effect, can reduce sample size dramatically while achieving the target power relative to a standard practice. Copyright © 2016 John Wiley & Sons, Ltd.
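A weighted log-rank statistic of the general form used here is simple to sketch. The following Python function is an illustrative implementation (not the paper's sample-size method): passing a constant weight gives the ordinary log-rank z-statistic, while a piecewise weight that is 0 before an assumed delay and 1 after targets a delayed effect. The delay time and simulation settings in the usage below are invented for illustration.

```python
import numpy as np

def weighted_logrank_z(time, event, group, weight_fn):
    """Weighted log-rank z-statistic (sketch, assumes untied event times).

    time, event, group: 1-d arrays (event=True if observed, group in {0,1});
    weight_fn(t) returns the weight applied at event time t.
    """
    order = np.argsort(time)
    time, event, group = time[order], event[order], group[order]
    num = var = 0.0
    for i in range(len(time)):
        if not event[i]:
            continue
        at_risk = time >= time[i]                 # risk set at this event time
        n_tot = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()       # treated subjects at risk
        w = weight_fn(time[i])
        num += w * ((group[i] == 1) - n1 / n_tot)  # observed minus expected
        var += w * w * (n1 / n_tot) * (1 - n1 / n_tot)
    return num / np.sqrt(var)
```

Under a delayed effect, early event times carry no treatment signal, so the ordinary log-rank test dilutes the statistic with pure noise; zeroing the early weights removes that noise, which is the intuition behind the power gain the paper formalizes.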

From the statistical learning perspective, this paper shows a new direction for the use of growth mixture modeling (GMM), a method of identifying latent subpopulations that manifest heterogeneous outcome trajectories. In the proposed approach, we utilize the benefits of the conventional use of GMM for the purpose of generating potential candidate models based on empirical model fitting, which can be viewed as unsupervised learning. We then evaluate candidate GMM models on the basis of a direct measure of success: how well the trajectory types are predicted by clinically and demographically relevant baseline features, which can be viewed as supervised learning. We examine the proposed approach focusing on a particular utility of latent trajectory classes, as outcomes that can be used as valid prediction targets in clinical prognostic models. Our approach is illustrated using data from the Longitudinal Assessment of Manic Symptoms study. Copyright © 2016 John Wiley & Sons, Ltd.

Although single-index models have been extensively studied, the monotonicity of the link function *f* in the single-index model is rarely studied. In many situations, it is desirable that *f* is monotonic, which results in a monotonic single-index model that can be very useful in economics and biometrics. In this article, we propose a monotonic single-index model in which the link function is constructed using penalized I-splines along with constraints on coefficients to achieve monotonicity of the link function *f*. An algorithm to estimate the single-index parameters and the link function is developed, and the sandwich estimate of the variance of the index parameters is provided. We propose to apply this monotonic single-index model to estimate the dose–response surface and assess drug interactions while considering the variability of the observed data. An extensive simulation study was carried out to evaluate the performance of the proposed monotonic single-index model. A case study is provided to illustrate the application of the proposed model to estimate the dose–response surface and assess drug interactions. Both the simulation and case study show that the proposed monotonic single-index model works very well. Copyright © 2016 John Wiley & Sons, Ltd.

When several treatments are available for evaluation in a clinical trial, different design options are available. We compare multi-arm multi-stage with factorial designs, and in particular, we will consider a 2 × 2 factorial design, where groups of patients will either take treatments A, B, both or neither. We investigate the performance and characteristics of both types of designs under different scenarios and compare them using both theory and simulations. For the factorial designs, we construct appropriate test statistics to test the hypothesis of no treatment effect against the control group with overall control of the type I error. We study the effect of the choice of the allocation ratios on the critical value and sample size requirements for a target power. We also study how the possibility of an interaction between the two treatments A and B affects type I and type II errors when testing for significance of each of the treatment effects. We present both simulation results and a case study on an osteoarthritis clinical trial. We discover that in an optimal factorial design in terms of minimising the associated critical value, the corresponding allocation ratios differ substantially to those of a balanced design. We also find evidence of potentially big losses in power in factorial designs for moderate deviations from the study design assumptions and little gain compared with multi-arm multi-stage designs when the assumptions hold. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

Composite endpoints are widely used as primary endpoints of randomized controlled trials across clinical disciplines. A common critique of the conventional analysis of composite endpoints is that all disease events are weighted equally, whereas their clinical relevance may differ substantially. We address this by introducing a framework for the weighted analysis of composite endpoints and interpretable test statistics, which are applicable to both binary and time-to-event data. To cope with the difficulty of selecting an exact set of weights, we propose a method for constructing simultaneous confidence intervals and tests that asymptotically preserve the family-wise type I error in the strong sense across families of weights satisfying flexible inequality or order constraints based on the theory of -distributions. We show that the method achieves the nominal simultaneous coverage rate with substantial efficiency gains over Scheffé's procedure in a simulation study and apply it to trials in cardiovascular disease and enteric fever. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

Many functional neuroimaging-based studies involve repetitions of a task that may require several phases, or states, of mental activity. An appealing idea is to use relevant brain regions to identify the states. We developed a novel change-point methodology that adapts to the repeated trial structure of such experiments by assuming the number of states stays fixed across similar trials while allowing the timing of change-points to change across trials. Model fitting is based on reversible-jump MCMC. Simulation studies verified its ability to identify change-points successfully. We applied this technique to data collected via functional magnetic resonance imaging (fMRI) while each of 20 subjects solved unfamiliar arithmetic problems. Our methodology supplies both a summary of state dimensionality and uncertainty assessments about number of states and the timing of state transitions. Copyright © 2016 John Wiley & Sons, Ltd.

Analysing the determinants and consequences of hospital-acquired infections involves the evaluation of large cohorts. Infected patients in the cohort are often rare for specific pathogens, because most of the patients admitted to the hospital are discharged or die without such an infection. Death and discharge are competing events to acquiring an infection, because these individuals are no longer at risk of getting a hospital-acquired infection. Therefore, the data are best analysed with an extended survival model – the extended illness-death model. A common problem in cohort studies is the costly collection of covariate values. In order to provide efficient use of data from infected as well as uninfected patients, we propose a tailored case-cohort approach for the extended illness-death model. The basic idea of the case-cohort design is to only use a random sample of the full cohort, referred to as subcohort, and all cases, namely the infected patients. Thus, covariate values are only obtained for a small part of the full cohort. The method is based on existing and established methods and is used to perform regression analysis in adapted Cox proportional hazards models. We propose estimation of all cause-specific cumulative hazards and transition probabilities in an extended illness-death model based on case-cohort sampling. As an example, we apply the methodology to infection with a specific pathogen using a large cohort from Spanish hospital data. The obtained results of the case-cohort design are compared with the results in the full cohort to investigate the performance of the proposed method. Copyright © 2016 John Wiley & Sons, Ltd.

We consider recurrent events of the same type that occur during alternating restraint and non-restraint time periods. This research is motivated by a study on juvenile recidivism, where the probationers were followed for re-offenses during alternating placement periods and free-time periods. During the placement periods, the probationers were under a restricted environment with direct supervision of the probation officers. During the free-time periods, the probationers were released to home and not under direct supervision. Although re-offenses can occur during both types of time periods, the intensities of the re-offenses are very different. Thus, these two types of time periods should be modeled differently. The same data structure also arises in many biomedical settings, as exemplified by tumor metastases during chemotherapy and chemo-free periods. In this paper, we propose a joint modeling framework that explicitly accounts for the different types of time periods, as well as the within-subject dependence during the same type and between different types of time periods. The estimation procedure is implemented in SAS and is easily accessible to practical investigators. We evaluate the proposed method through simulation studies under several realistic scenarios and demonstrate the feasibility of the proposed method by applying it to the juvenile recidivism dataset. Copyright © 2016 John Wiley & Sons, Ltd.

Multistate processes provide a convenient framework when interest lies in characterising the transition intensities between a set of defined states. If, however, there is an unobserved event of interest (not known if and when the event occurs), which when it occurs stops future transitions in the multistate process from occurring, then drawing inference from the joint multistate and event process can be problematic. In health studies, a particular example of this could be resolution, where a resolved patient can no longer experience any further symptoms, and this is explored here for illustration. A multistate model that includes the state space of the original multistate process but partitions the state representing absent symptoms into a latent absorbing resolved state and a temporary transient state of absent symptoms is proposed. The expanded state space explicitly distinguishes between resolved and temporary spells of absent symptoms through disjoint states and allows the uncertainty of not knowing if resolution has occurred to be easily captured when constructing the likelihood; observations of absent symptoms can be considered to be temporary or having resulted from resolution. The proposed methodology is illustrated on a psoriatic arthritis data set where the outcome of interest is a set of intermittently observed disability scores. Estimated probabilities of resolving are also obtained from the model. © 2016 The Authors. *Statistics in Medicine* Published by John Wiley & Sons Ltd.

Meta-analysis using individual participant data (IPD) obtains and synthesises the raw, participant-level data from a set of relevant studies. The IPD approach is becoming an increasingly popular tool as an alternative to traditional aggregate data meta-analysis, especially as it avoids reliance on published results and provides an opportunity to investigate individual-level interactions, such as treatment-effect modifiers. There are two statistical approaches for conducting an IPD meta-analysis: one-stage and two-stage. The one-stage approach analyses the IPD from all studies simultaneously, for example, in a hierarchical regression model with random effects. The two-stage approach derives aggregate data (such as effect estimates) in each study separately and then combines these in a traditional meta-analysis model. There have been numerous comparisons of the one-stage and two-stage approaches via theoretical consideration, simulation and empirical examples, yet there remains confusion regarding when each approach should be adopted, and indeed why they may differ.

In this tutorial paper, we outline the key statistical methods for one-stage and two-stage IPD meta-analyses, and provide 10 key reasons why they may produce different summary results. We explain that most differences arise because of different modelling assumptions, rather than the choice of one-stage or two-stage itself. We illustrate the concepts with recently published IPD meta-analyses, summarise key statistical software and provide recommendations for future IPD meta-analyses. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

Using both simulated and real datasets, we compared two approaches for estimating absolute risk from nested case-control (NCC) data and demonstrated the feasibility of using the NCC design for estimating absolute risk. In contrast to previously published results, we successfully demonstrated not only that data from a matched NCC study can be used to unbiasedly estimate absolute risk but also that matched studies give better statistical efficiency and classify subjects into more appropriate risk categories. Our result has implications for studies that aim to develop or validate risk prediction models. In addition to the traditional full cohort study and case-cohort study, researchers designing these studies now have the option of performing a NCC study with huge potential savings in cost and resources. Detailed explanations on how to obtain the absolute risk estimates under the proposed approach are given. Copyright © 2016 John Wiley & Sons, Ltd.

Many prediction models have been developed for the risk assessment and the prevention of cardiovascular disease in primary care. Recent efforts have focused on improving the accuracy of these prediction models by adding novel biomarkers to a common set of baseline risk predictors. Few have considered incorporating repeated measures of the common risk predictors. Through application to the Atherosclerosis Risk in Communities study and simulations, we compare models that use simple summary measures of the repeat information on systolic blood pressure, such as (i) baseline only; (ii) last observation carried forward; and (iii) cumulative mean, against more complex methods that model the repeat information using (iv) ordinary regression calibration; (v) risk-set regression calibration; and (vi) joint longitudinal and survival models. In comparison with the baseline-only model, we observed modest improvements in discrimination and calibration using the cumulative mean of systolic blood pressure, but little further improvement from any of the complex methods. © 2016 The Authors. *Statistics in Medicine* Published by John Wiley & Sons Ltd.

When two imperfect diagnostic tests are carried out on the same subject, their results may be correlated even after conditioning on the true disease status. While past work has focused on the consequences of ignoring conditional dependence, the degree to which conditional dependence can be induced has not been systematically studied. We examine this issue in detail by introducing a hypothetical missing covariate that affects the sensitivities of two imperfect dichotomous tests. We consider four forms for this covariate, normal, uniform, dichotomous and trichotomous. In the case of a dichotomous covariate, we derive an expression showing that the conditional covariance is a function of the product of the changes in test sensitivities (or specificities) between the subgroups defined by the covariate. The maximum possible covariance is induced by a dichotomous covariate with a very strong effect on both tests. Through simulations, we evaluate the extent to which fitting a latent class model ignoring each type of covariate but including a general covariance term can adjust for the correlation induced by the covariate. We compare the results to when the conditional dependence is ignored. We find that the bias because of ignoring conditional dependence is generally small even for moderate covariate effects, and when bias is present, a model including a covariance term works well. We illustrate our methods by analyzing data from a childhood tuberculosis study. Copyright © 2016 John Wiley & Sons, Ltd.
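The dichotomous-covariate result described above can be checked numerically: mixing two subgroups in which the tests are conditionally independent induces a covariance equal to π(1 − π) times the product of the sensitivity changes across subgroups. The following Python sketch verifies that identity by direct enumeration; the specific sensitivities and prevalence are illustrative numbers, not values from the paper.

```python
def induced_covariance(pi, se1, se2):
    """Covariance between two test results among the diseased, induced by
    a dichotomous covariate X with P(X=1 | diseased) = pi.

    se1 = (sensitivity of test 1 when X=0, when X=1); likewise se2.
    Conditional on X the tests are assumed independent, so any covariance
    comes purely from mixing over X.
    """
    p_joint = (1 - pi) * se1[0] * se2[0] + pi * se1[1] * se2[1]
    m1 = (1 - pi) * se1[0] + pi * se1[1]   # marginal sensitivity, test 1
    m2 = (1 - pi) * se2[0] + pi * se2[1]   # marginal sensitivity, test 2
    return p_joint - m1 * m2

# The mixture covariance equals pi*(1-pi) times the product of the
# sensitivity changes across the covariate subgroups (illustrative values):
pi, se1, se2 = 0.3, (0.70, 0.95), (0.60, 0.90)
closed_form = pi * (1 - pi) * (se1[1] - se1[0]) * (se2[1] - se2[0])
assert abs(induced_covariance(pi, se1, se2) - closed_form) < 1e-12
```

The product form makes the abstract's point transparent: if either test's sensitivity is unaffected by the covariate, the induced conditional covariance vanishes, so only a covariate with a strong effect on both tests can induce substantial conditional dependence.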

Discrimination slope, defined as the slope of a linear regression of predicted probabilities of event derived from a prognostic model on the binary event status, has recently gained popularity as a measure of model performance. It serves as a building block for the integrated discrimination improvement, which equals the difference in discrimination slopes between the two models being compared. Several authors have pointed out that it does not make sense to apply the integrated discrimination improvement and discrimination slope when working with mis-calibrated models, whereas others have raised concerns about the ability of improving discrimination slope without adding new information. In this paper, we show that under certain assumptions the discrimination slope is asymptotically related to two other R-squared measures, one of which is a rescaled version of the Brier score, known to be proper. Furthermore, we illustrate how a simple recalibration makes the slope equal to the rescaled Brier R-squared metric. We also show that the discrimination slope can be interpreted as a measure of reduction in expected regret for the Gini-Brier regret function. Using theoretical and practical examples, we illustrate how all of these metrics are affected by different levels of model mis-calibration. In particular, we demonstrate that simple recalibration ascertaining calibration in-the-large and calibration slope equal to 1 are not sufficient to correct for some forms of mis-calibration. We conclude that R-squared metrics, including the discrimination slope, offer an attractive choice for quantifying model performance as long as one accounts for their sensitivity to model calibration. Copyright © 2016 John Wiley & Sons, Ltd.
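The two quantities at the heart of this abstract are easy to state in code. This Python sketch computes the discrimination slope (mean predicted risk among events minus mean among non-events) and the rescaled Brier score; the asymptotic relationship between them holds only under the paper's assumptions, so the functions below simply define the metrics, nothing more.

```python
import numpy as np

def discrimination_slope(p, y):
    """Mean predicted risk among events minus mean among non-events."""
    p, y = np.asarray(p, float), np.asarray(y)
    return p[y == 1].mean() - p[y == 0].mean()

def brier_r2(p, y):
    """Rescaled Brier score: 1 - Brier(model) / Brier(null model),
    where the null model predicts the overall event rate for everyone."""
    p, y = np.asarray(p, float), np.asarray(y)
    brier = np.mean((p - y) ** 2)
    brier_null = np.mean((y.mean() - y) ** 2)
    return 1 - brier / brier_null
```

Both metrics equal 1 for a perfect predictor and 0 for the constant event-rate prediction, which is the sense in which they are comparable R-squared-type measures.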

In profiling studies, the analysis of a single dataset often leads to unsatisfactory results because of the small sample size. Multi-dataset analysis utilizes information of multiple independent datasets and outperforms single-dataset analysis. Among the available multi-dataset analysis methods, integrative analysis methods aggregate and analyze raw data and outperform meta-analysis methods, which analyze multiple datasets separately and then pool summary statistics. In this study, we conduct integrative analysis and marker selection under the heterogeneity structure, which allows different datasets to have overlapping but not necessarily identical sets of markers. Under certain scenarios, it is reasonable to expect some similarity of identified marker sets – or equivalently, similarity of model sparsity structures – across multiple datasets. However, the existing methods do not have a mechanism to explicitly promote such similarity. To tackle this problem, we develop a sparse boosting method. This method uses a BIC/HDBIC criterion to select weak learners in boosting and encourages sparsity. A new penalty is introduced to promote the similarity of model sparsity structures across datasets. The proposed method has an intuitive formulation and is broadly applicable and computationally affordable. In numerical studies, we analyze right censored survival data under the accelerated failure time model. Simulation shows that the proposed method outperforms alternative boosting and penalization methods with more accurate marker identification. The analysis of three breast cancer prognosis datasets shows that the proposed method can identify marker sets with increased similarity across datasets and improved prediction performance. Copyright © 2016 John Wiley & Sons, Ltd.

In sequential multiple assignment randomized trials, longitudinal outcomes may be the most important outcomes of interest because trials of this type are usually conducted in areas of chronic diseases or conditions. We propose to use a weighted generalized estimating equation (GEE) approach to analyzing data from such trials for comparing two adaptive treatment strategies based on generalized linear models. Although the randomization probabilities are known, we consider estimated weights in which the randomization probabilities are replaced by their empirical estimates and prove that the resulting weighted GEE estimator is more efficient than the estimators with true weights. The variance of the weighted GEE estimator is estimated by an empirical sandwich estimator. The time variable in the model can be linear, piecewise linear, or take more complicated forms. This provides more flexibility that is important because, in the adaptive treatment setting, the treatment changes over time and, hence, a single linear trend over the whole period of study may not be practical. Simulation results show that the weighted GEE estimators of regression coefficients are consistent regardless of the specification of the correlation structure of the longitudinal outcomes. The weighted GEE method is then applied in analyzing data from the Clinical Antipsychotic Trials of Intervention Effectiveness. Copyright © 2016 John Wiley & Sons, Ltd.

The net reclassification improvement (NRI) is an attractively simple summary measure quantifying improvement in performance because of addition of new risk marker(s) to a prediction model. Originally proposed for settings with well-established classification thresholds, it quickly extended into applications with no thresholds in common use. Here we aim to explore properties of the NRI at event rate. We express this NRI as a difference in performance measures for the new versus old model and show that the quantity underlying this difference is related to several global as well as decision analytic measures of model performance. It maximizes the relative utility (standardized net benefit) across all classification thresholds and can be viewed as the Kolmogorov–Smirnov distance between the distributions of risk among events and non-events. It can be expressed as a special case of the continuous NRI, measuring reclassification from the ‘null’ model with no predictors. It is also a criterion based on the value of information and quantifies the reduction in expected regret for a given regret function, casting the NRI at event rate as a measure of incremental reduction in expected regret. More generally, we find it informative to present plots of standardized net benefit/relative utility for the new versus old model across the domain of classification thresholds. Then, these plots can be summarized with their maximum values, and the increment in model performance can be described by the NRI at event rate. We provide theoretical examples and a clinical application on the evaluation of prognostic biomarkers for atrial fibrillation. Copyright © 2016 John Wiley & Sons, Ltd.
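The single-model quantity underlying the NRI at event rate is sensitivity minus one-minus-specificity at the threshold equal to the event rate, and its maximum over all thresholds is the Kolmogorov–Smirnov distance between the risk distributions of events and non-events. A minimal Python sketch of both quantities (illustrative function names; the model comparison in the paper would take the difference of these quantities between the new and old model):

```python
import numpy as np

def tpr_minus_fpr(p, y, threshold):
    """Sensitivity minus (1 - specificity) at a classification threshold."""
    p, y = np.asarray(p, float), np.asarray(y)
    tpr = (p[y == 1] > threshold).mean()
    fpr = (p[y == 0] > threshold).mean()
    return tpr - fpr

def ks_distance(p, y):
    """Kolmogorov-Smirnov distance between risk distributions of events
    and non-events: the maximum of TPR - FPR over all thresholds."""
    p = np.asarray(p, float)
    return max(tpr_minus_fpr(p, y, t) for t in np.unique(p))
```

Evaluating `tpr_minus_fpr` at `threshold = y.mean()` gives the event-rate version discussed in the abstract, for a single model; scanning thresholds produces the standardized net benefit/relative utility curve whose maximum the abstract recommends plotting.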

The integrated discrimination improvement (IDI) is commonly used to compare two risk prediction models; it summarizes the extent a new model increases risk in events and decreases risk in non-events. The IDI averages risks across events and non-events and is therefore susceptible to Simpson's paradox. In some settings, adding a predictive covariate to a well calibrated model results in an overall negative (positive) IDI. However, if stratified by that same covariate, the strata-specific IDIs are positive (negative). Meanwhile, the calibration (observed to expected ratio and Hosmer–Lemeshow Goodness of Fit Test), area under the receiver operating characteristic curve, and Brier score improve overall and by stratum. We ran extensive simulations to investigate the impact of an imbalanced covariate upon metrics (IDI, area under the receiver operating characteristic curve, Brier score, and *R*^{2}), provide an analytic explanation for the paradox in the IDI, and use an investigative metric, a Weighted IDI, to better understand the paradox. In simulations, all instances of the paradox occurred under stratum-specific mis-calibration, yet there were mis-calibrated settings in which the paradox did not occur. The paradox is illustrated on Cancer Genomics Network data by calculating predictions based on two versions of BRCAPRO, a Mendelian risk prediction model for breast and ovarian cancer. In both simulations and the Cancer Genomics Network data, overall model calibration did not guarantee stratum-level calibration. We conclude that the IDI should only assess model performance among a clinically relevant subset when stratum-level calibration is strictly met and recommend calculating additional metrics to confirm the direction and conclusions of the IDI. Copyright © 2016 John Wiley & Sons, Ltd.
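The overall-versus-stratified comparison at the heart of the paradox can be set up with a few lines of code. This Python sketch defines the IDI and its stratum-specific counterpart; the function names are illustrative, and detecting an actual Simpson's-paradox reversal would require mis-calibrated inputs of the kind the paper constructs.

```python
import numpy as np

def idi(p_new, p_old, y):
    """Integrated discrimination improvement: the increase in mean
    predicted risk among events plus the decrease among non-events
    when moving from the old model to the new model."""
    p_new, p_old, y = (np.asarray(a, float) for a in (p_new, p_old, y))
    events = y == 1
    gain_events = p_new[events].mean() - p_old[events].mean()
    gain_nonevents = p_old[~events].mean() - p_new[~events].mean()
    return gain_events + gain_nonevents

def stratified_idi(p_new, p_old, y, stratum):
    """IDI within each level of a stratifying covariate; comparing these
    with the overall IDI is how a reversal in sign would be detected."""
    p_new, p_old, y, stratum = (np.asarray(a) for a in (p_new, p_old, y, stratum))
    return {s: idi(p_new[stratum == s], p_old[stratum == s], y[stratum == s])
            for s in np.unique(stratum)}
```

Because the IDI averages risk differences separately over events and non-events, the overall value weights strata by their event and non-event counts, which is exactly the aggregation step that allows stratum-level and overall conclusions to diverge.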

No abstract is available for this article.

A relatively recent development in the design of Phase I dose-finding studies is the inclusion of expansion cohort(s), that is, the inclusion of several more patients at a level considered to be the maximum tolerated dose established at the conclusion of the ‘pure’ Phase I part. Little attention has been given to the additional statistical analysis, including design considerations, that we might wish to consider for this more involved design. For instance, how can we best make use of new information that may confirm or may tend to contradict the estimate of the maximum tolerated dose based on the dose escalation phase. Those patients included during the dose expansion phase may possess different eligibility criteria. During the expansion phase, we will also wish to have an eye on any evidence of efficacy, an aspect that clearly distinguishes such studies from the classical Phase I study. Here, we present a methodology that enables us to continue the monitoring of safety in the dose expansion cohort while simultaneously trying to assess efficacy and, in particular, which disease types may be the most promising to take forward for further study. The most elementary problem is where we only wish to take account of further toxicity information obtained during the dose expansion cohort, and where the initial design was model based or the standard 3+3. More complex set-ups also involve efficacy and the presence of subgroups. Copyright © 2016 John Wiley & Sons, Ltd.

There has been constant development of novel statistical methods for the design of early-phase clinical trials since the introduction of model-based designs, yet the traditional or modified 3+3 algorithmic design remains the most widely used approach in dose-finding studies. Research has shown the limitations of this traditional design compared with more innovative approaches, yet the use of model-based designs remains infrequent. This can be attributed to several causes, including a poor understanding among clinicians and reviewers of how the designs work and of how best to evaluate the appropriateness of a proposed design. These barriers are likely to be heightened in the coming years as the recent paradigm of drug development shifts towards more complex dose-finding problems. This article reviews relevant information that should be included in clinical trial protocols to aid in the acceptance and approval of novel methods. We provide practical guidance for implementing these efficient designs with the aim of supporting a broader transition from algorithmic to adaptive model-guided designs. In addition, we highlight issues to consider in the actual implementation of a trial once approval is obtained. Copyright © 2016 John Wiley & Sons, Ltd.

In oncology, combinations of drugs are often used to improve treatment efficacy and/or reduce harmful side effects. Dual-agent phase I clinical trials assess drug safety and aim to discover a maximum tolerated dose combination via dose-escalation; cohorts of patients are given set doses of both drugs and monitored to see if toxic reactions occur. Dose-escalation decisions for subsequent cohorts are based on the number and severity of observed toxic reactions, and an escalation rule. In a combination trial, drugs may be administered concurrently or non-concurrently over a treatment cycle. For two drugs given non-concurrently with overlapping toxicities, toxicities occurring after administration of the first drug yet before administration of the second may be attributed directly to the first drug, whereas toxicities occurring after both drugs have been given present some ambiguity: they may be attributable to the first drug only, the second drug only or the synergistic combination of both. We call this mixture of attributable and non-attributable toxicity semi-attributable toxicity. Most published methods assume drugs are given concurrently, which may not be reflective of trials with non-concurrent drug administration. We incorporate semi-attributable toxicity into Bayesian modelling for dual-agent phase I trials with non-concurrent drug administration and compare the operating characteristics to an approach where this detail is not considered. Simulations based on a trial for non-concurrent administration of intravesical Cabazitaxel and Cisplatin in early-stage bladder cancer patients are presented for several scenarios and show that including semi-attributable toxicity data reduces the number of patients given overly toxic combinations. © 2016 The Authors. *Statistics in Medicine* Published by John Wiley & Sons Ltd.

The majority of phase I methods for multi-agent trials have focused on identifying a single maximum tolerated dose combination (MTDC) among those being investigated. Some published methods in the area have been based on the notion that there is no unique MTDC and that the set of dose combinations with acceptable toxicity forms an equivalence contour in two dimensions. Therefore, it may be of interest to find multiple MTDCs for further testing for efficacy in a phase II setting. In this paper, we present a new dose-finding method that extends the continual reassessment method to account for the location of multiple MTDCs. Operating characteristics are demonstrated through simulation studies and are compared with existing methodology. Some brief discussion of implementation and available software is also provided. Copyright © 2016 John Wiley & Sons, Ltd.

We propose a new design for dose finding for cytotoxic agents in two ordered groups of patients. By ordered groups, we mean that prior to the study there is clinical information indicating that, at a given dose, patients in one group would be more susceptible to toxicities than patients in the other. The designs are evaluated relative to two previously proposed designs for ordered groups over a range of scenarios generated randomly from a family of dose–toxicity curves. Copyright © 2016 John Wiley & Sons, Ltd.

The Bayesian model averaging continual reassessment method (CRM) is a Bayesian dose-finding design. It improves the robustness and overall performance of the CRM by specifying multiple skeletons (or models) and then using Bayesian model averaging to automatically favor the best-fitting model for better decision making. Specifying multiple skeletons, however, can be challenging for practitioners. In this paper, we propose a default way to specify skeletons for the Bayesian model averaging CRM. We show that skeletons that appear rather different may actually lead to equivalent models. Motivated by this, we define a nonequivalence measure to index the difference among skeletons. Using this measure, we extend the model calibration method of Lee and Cheung (2009) to choose optimal skeletons that maximize the average percentage of correct selection of the maximum tolerated dose while ensuring sufficient nonequivalence among the skeletons. Our simulation study shows that the proposed method has desirable operating characteristics. We provide software to implement the proposed method. Copyright © 2016 John Wiley & Sons, Ltd.

We present a cancer phase I clinical trial design for a combination of two drugs with the goal of estimating the maximum tolerated dose (MTD) curve in the two-dimensional Cartesian plane. A parametric model is used to describe the relationship between the doses of the two agents and the probability of dose limiting toxicity. The model is re-parameterized in terms of the probabilities of toxicity at the dose combinations corresponding to the minimum and maximum doses available in the trial, together with an interaction parameter. The trial proceeds using cohorts of two patients receiving doses according to univariate escalation with overdose control (EWOC), where at each stage of the trial we seek a dose of one agent using the current posterior distribution of the MTD of that agent given the current dose of the other agent. The MTD curve is estimated as a function of Bayes estimates of the model parameters. Performance is studied by evaluating the design's operating characteristics in terms of trial safety and the percentage of dose recommendations in neighborhoods around the true MTD curve, including under misspecification of the true dose–toxicity relationship. The method is further extended to accommodate discrete dose combinations and is compared with previous approaches under several scenarios. Copyright © 2016 John Wiley & Sons, Ltd.

Toxicity probability interval designs have received increasing attention as dose-finding methods in recent years. In this study, we compared the two-stage, likelihood-based continual reassessment method (CRM), the modified toxicity probability interval (mTPI) design, and the Bayesian optimal interval design (BOIN) in order to evaluate each method's performance in dose selection for phase I trials. We use several summary measures to compare the performance of these methods, including the percentage of correct selection (PCS) of the true maximum tolerated dose (MTD), the allocation of patients to doses at and around the true MTD, and an accuracy index. This index is an efficiency measure that describes the entire distribution of MTD selection and patient allocation by taking into account the distance between the true probability of toxicity at each dose level and the target toxicity rate. The simulation study considered a broad range of toxicity curves and various sample sizes. When considering PCS, we found that CRM outperformed the two competing methods in most scenarios, followed by BOIN and then mTPI. We observed a similar trend for the accuracy index for dose allocation, where CRM most often outperformed both mTPI and BOIN. These trends became more pronounced as the number of dose levels increased. Copyright © 2016 John Wiley & Sons, Ltd.
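To illustrate what a model-based design such as the CRM computes at each decision point, here is a minimal one-parameter 'empiric' CRM update by grid integration. The skeleton, target rate, and prior standard deviation below are illustrative choices, not the settings used in the paper's simulations:

```python
import numpy as np

def crm_select(skeleton, dose_idx, tox, target=0.25, prior_sd=1.34):
    """One-parameter 'empiric' CRM: p_i(beta) = skeleton_i ** exp(beta),
    with beta ~ N(0, prior_sd^2).  The posterior is handled on a grid,
    and the dose whose posterior mean toxicity probability is closest
    to the target is selected for the next cohort."""
    beta = np.linspace(-4.0, 4.0, 2001)
    logpost = -0.5 * (beta / prior_sd) ** 2          # log prior, up to a constant
    for d, y in zip(dose_idx, tox):                  # binomial log-likelihood
        p = skeleton[d] ** np.exp(beta)
        logpost += np.log(p) if y else np.log1p(-p)
    w = np.exp(logpost - logpost.max())
    w /= w.sum()                                     # normalized posterior weights
    post_p = np.array([(w * s ** np.exp(beta)).sum() for s in skeleton])
    return int(np.argmin(np.abs(post_p - target))), post_p
```

For example, after three toxicities in a row at the top dose, the recommended dose drops well below it, which is the kind of adaptive behavior the interval designs approximate with simpler up/down rules.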

A random effects meta-analysis combines the results of several independent studies to summarise the evidence about a particular measure of interest, such as a treatment effect. The approach allows for unexplained between-study heterogeneity in the true treatment effect by incorporating random study effects about the overall mean. The variance of the mean effect estimate is conventionally calculated by assuming that the between-study variance is known; however, it has been demonstrated that this approach may be inappropriate, especially when there are few studies. Alternative methods that aim to account for this uncertainty, such as Hartung–Knapp, Sidik–Jonkman and Kenward–Roger, have been proposed and shown to improve upon the conventional approach in some situations. In this paper, we use a simulation study to examine the performance of several of these methods in terms of the coverage of the 95% confidence and prediction intervals derived from a random effects meta-analysis estimated using restricted maximum likelihood. We show that, in terms of the confidence intervals, the Hartung–Knapp correction performs well across a wide range of scenarios and outperforms other methods when heterogeneity is large and/or study sizes are similar. However, the coverage of the Hartung–Knapp method is slightly too low when heterogeneity is low (*I*^{2} < 30%) and the study sizes are quite varied. In terms of prediction intervals, the conventional approach is only valid when heterogeneity is large (*I*^{2} > 30%) and study sizes are similar. In other situations, especially when heterogeneity is small and the study sizes are quite varied, the coverage is far too low and could not be consistently improved by either increasing the number of studies, altering the degrees of freedom or using variance inflation methods. Researchers should therefore be cautious in deriving 95% prediction intervals following a frequentist random-effects meta-analysis until a more reliable solution is identified. © 2016 The Authors. *Statistics in Medicine* Published by John Wiley & Sons Ltd.
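For readers wanting the mechanics, the conventional and Hartung–Knapp variance estimates for the pooled effect can be sketched as below. The sketch builds on a DerSimonian–Laird between-study variance rather than the REML estimate used in the paper, since a full REML fit does not reduce to a few lines:

```python
import numpy as np

def random_effects_hk(y, v):
    """DerSimonian-Laird tau^2, then conventional and Hartung-Knapp
    variance estimates for the pooled effect of studies with estimates
    y and within-study variances v."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    k, w = len(y), 1.0 / v
    mu_fe = (w * y).sum() / w.sum()                      # fixed-effect mean
    Q = (w * (y - mu_fe) ** 2).sum()                     # Cochran's Q
    tau2 = max(0.0, (Q - (k - 1)) / (w.sum() - (w ** 2).sum() / w.sum()))
    ws = 1.0 / (v + tau2)                                # random-effects weights
    mu = (ws * y).sum() / ws.sum()
    var_conv = 1.0 / ws.sum()                            # treats tau^2 as known
    var_hk = (ws * (y - mu) ** 2).sum() / ((k - 1) * ws.sum())
    return mu, tau2, var_conv, var_hk
```

The Hartung–Knapp interval is then `mu ± t_{k-1, 0.975} * sqrt(var_hk)`, using a t quantile on k − 1 degrees of freedom in place of the normal quantile paired with `var_conv`.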

In cluster randomised cross-over (CRXO) trials, clusters receive multiple treatments in a randomised sequence over time. In such trials, there is typically correlation between patients in the same cluster. In addition, within a cluster, patients in the same period may be more similar to each other than to patients in other periods. We demonstrate that it is necessary to account for both of these correlations in the analysis to obtain correct Type I error rates. We then use simulation to compare different methods of analysing a binary outcome from a two-period CRXO design. Our simulations demonstrated that hierarchical models without random effects for period-within-cluster, which do not account for any extra within-period correlation, performed poorly, with greatly inflated Type I error rates in many scenarios. In scenarios where extra within-period correlation was present, a hierarchical model with random effects for cluster and period-within-cluster had correct Type I error rates only when there were large numbers of clusters; with small numbers of clusters, the error rate was inflated. We also found that generalised estimating equations did not give correct error rates in any of the scenarios considered. An unweighted cluster-level summary regression performed best overall, maintaining an error rate close to 5% in all scenarios, although it lost power when extra within-period correlation was present, especially with small numbers of clusters. Our results show that it is important to model both levels of clustering in CRXO trials and that any extra within-period correlation should be accounted for. Copyright © 2016 John Wiley & Sons, Ltd.
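The unweighted cluster-level summary analysis that performed best has a simple form, sketched here under the assumption that cluster-period event proportions have already been computed; in a two-period, two-treatment CRXO design it reduces to a one-sample t-test on within-cluster differences (the proportions below are made up for illustration):

```python
import numpy as np

def cluster_summary_t(p_trt, p_ctl):
    """Unweighted cluster-level summary analysis for a two-period CRXO
    trial: a one-sample t statistic on within-cluster differences in
    event proportions (treatment period minus control period)."""
    d = np.asarray(p_trt, float) - np.asarray(p_ctl, float)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(d.size))
    return t, d.size - 1   # refer t to a t distribution on (clusters - 1) df

# Illustrative cluster-period proportions for four clusters
t_stat, df = cluster_summary_t([0.30, 0.45, 0.50, 0.62],
                               [0.20, 0.35, 0.42, 0.50])
```

Because each cluster contributes a single summary value, both levels of clustering are absorbed into the cluster-level differences, which is why this analysis keeps its nominal error rate even with few clusters.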

This study considers the problem of estimating the incidence of a non-remissible infection (or disease) with possibly differential mortality, using data from one or more cross-sectional prevalence surveys. We propose fitting segmented polynomial models to estimate incidence as a function of age, using the maximum likelihood method. The approach allows an automatic search for the optimal position of the knots, and model selection is performed using the Akaike information criterion. The method is applied to simulated data and to the estimation of HIV incidence among men in Zimbabwe using data from both the NIMH Project Accept (HPTN 043) and the Zimbabwe Demographic Health Surveys (2005–2006). Copyright © 2016 John Wiley & Sons, Ltd.

Statistical analysis of count data typically starts with a Poisson regression. In many real-life applications, however, the variation in the counts is larger than the mean, and one needs to deal with overdispersion. Several factors may contribute to overdispersion: (1) unobserved heterogeneity due to missing covariates, (2) correlation between observations (such as in longitudinal studies), and (3) the occurrence of many zeros (more than expected under the Poisson distribution). In this paper, we discuss a model that allows each of these factors to be taken into consideration explicitly. The aim of the paper is twofold: (1) to investigate whether the cause of overdispersion can be identified via model selection, and (2) to investigate the impact of model misspecification on the power to detect a covariate effect. The work is motivated by a study of the occurrence of drug-induced arrhythmia in beagle dogs based on electrocardiogram recordings, with the objective of evaluating the effect of potential drugs on heartbeat irregularities. Copyright © 2016 John Wiley & Sons, Ltd.
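A quick first screen for the overdispersion described above, before fitting any of the richer models, is the Pearson dispersion statistic; here sketched for an intercept-only Poisson fit with made-up counts:

```python
import numpy as np

def pearson_dispersion(y):
    """Pearson dispersion statistic for an intercept-only Poisson fit:
    sum of squared Pearson residuals over residual degrees of freedom.
    Values well above 1 signal overdispersion (e.g. excess zeros or
    unobserved heterogeneity)."""
    y = np.asarray(y, float)
    mu = y.mean()                                   # fitted Poisson mean
    return ((y - mu) ** 2 / mu).sum() / (len(y) - 1)
```

With covariates, the same statistic is computed from the fitted means of the full regression; a value near 1 is consistent with the Poisson mean-variance relationship, while a large value motivates the model-selection exercise the paper describes.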

Goodness-of-fit tests have recently been proposed for checking the proportional subdistribution hazards assumption in the Fine and Gray regression model. Zhou, Fine, and Laird proposed weighted Schoenfeld-type residual tests derived under an assumed model with a specific form of time-varying regression coefficients. Li, Scheike, and Zhang proposed an omnibus test based on cumulative sums of Schoenfeld-type residuals. In this article, we extend the class of weighted residual tests by allowing random weights for the Schoenfeld-type residuals at ordered event times. In particular, we demonstrate that weighted residual tests using monotone weight functions of time are consistent against monotone departures from the proportional subdistribution hazards assumption. Extensive Monte Carlo studies were conducted to evaluate the finite-sample performance of recent goodness-of-fit tests. Results from the simulation studies show that weighted residual tests using monotone random weight functions commonly used in non-proportional hazards regression settings tend to be more powerful for detecting monotone departures than goodness-of-fit tests assuming no specific time-varying effect or misspecified time-varying effects. Two examples using real data are provided as illustration. Copyright © 2016 John Wiley & Sons, Ltd.

Pooling information from multiple, independent studies (meta-analysis) adds great value to medical research. Random effects models are widely used for this purpose. However, there are many different ways of estimating model parameters, and the choice of estimation procedure may be influential upon the conclusions of the meta-analysis. In this paper, we describe a recently proposed Bayesian estimation procedure and compare it with a profile likelihood method and with the DerSimonian–Laird and Mandel–Paule estimators including the Knapp–Hartung correction. The Bayesian procedure uses a non-informative prior for the overall mean and the between-study standard deviation that is determined by the Berger and Bernardo reference prior principle. The comparison of these procedures focuses on the frequentist properties of interval estimates for the overall mean. The results of our simulation study reveal that the Bayesian approach is a promising alternative producing more accurate interval estimates than those three conventional procedures for meta-analysis. The Bayesian procedure is also illustrated using three examples of meta-analysis involving real data. Copyright © 2016 John Wiley & Sons, Ltd.
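Of the moment estimators compared above, the Mandel–Paule estimator has a particularly transparent computational form: choose the between-study variance so that the generalized Q statistic equals its expectation, k − 1. A minimal bisection sketch (illustrative, not the authors' code):

```python
import numpy as np

def mandel_paule_tau2(y, v, tol=1e-10):
    """Mandel-Paule moment estimator of the between-study variance:
    find tau^2 >= 0 such that the generalized Q statistic equals k - 1,
    exploiting the fact that Q is decreasing in tau^2."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    k = len(y)

    def Q(t):
        w = 1.0 / (v + t)
        mu = (w * y).sum() / w.sum()
        return (w * (y - mu) ** 2).sum()

    if Q(0.0) <= k - 1:            # no heterogeneity beyond sampling error
        return 0.0
    lo, hi = 0.0, 1.0
    while Q(hi) > k - 1:           # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:           # bisection on the monotone function Q
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if Q(mid) > k - 1 else (lo, mid)
    return 0.5 * (lo + hi)
```

The resulting tau² is plugged into the usual inverse-variance weights, with or without the Knapp–Hartung correction applied to the variance of the pooled mean.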
