The Health and Retirement Study was designed to evaluate changes in health and labor force participation during and after the transition from working to retirement. Every 2 years, participants provided information about their self-rated health (SRH), body mass index (BMI), smoking status, and other characteristics. Our goal was to assess the effects of smoking and gender on trajectories of change in BMI and SRH over time. Joint longitudinal analysis of outcome measures is preferable to separate analyses because it allows one to account for the correlation between the measures, to test the effects of predictors while controlling the type I error, and potentially to improve efficiency. However, because SRH is an ordinal measure while BMI is continuous, formulating a joint model and estimating its parameters are challenging. A joint correlated probit model allowed us to seamlessly account for the correlations between the measures over time. Established estimation procedures for such models are based on quasi-likelihood or numerical approximations that may be biased or fail to converge. Therefore, we proposed a novel expectation–maximization algorithm for parameter estimation and a Monte Carlo bootstrap approach for approximating standard errors. Expectation–maximization algorithms have been previously considered for combinations of binary and/or continuous repeated measures; however, modifications were needed to handle combinations of ordinal and continuous responses. A simulation study demonstrated that the algorithm converged and provided approximately unbiased estimates with sufficiently large sample sizes. In the Health and Retirement Study, male gender and smoking were independently associated with steeper deterioration in self-rated health and with lower average BMI. Copyright © 2016 John Wiley & Sons, Ltd.

In this paper, an approach to estimating the cumulative mean function for a history process with time-dependent covariates and a right-censored time-to-event variable is developed using a combination of joint modeling and inverse probability weighting. The consistency of the proposed estimator is derived. Theoretical analysis and simulation studies indicate that the estimator given in this paper is well suited to practical applications because of its simplicity and accuracy. A real data set from a multicenter automatic defibrillator implantation trial is used to illustrate the proposed methodology. Copyright © 2016 John Wiley & Sons, Ltd.

Recent advances in human neuroimaging have shown that it is possible to accurately decode how the brain perceives information based only on non-invasive functional magnetic resonance imaging measurements of brain activity. Two commonly used statistical approaches, namely, univariate analysis and multivariate pattern analysis often lead to distinct patterns of selected voxels. One current debate in brain decoding concerns whether the brain's representation of sound categories is localized or distributed. We hypothesize that the distributed pattern of voxels selected by most multivariate pattern analysis models can be an artifact due to the spatial correlation among voxels. Here, we propose a Bayesian spatially varying coefficient model, where the spatial correlation is modeled through the variance-covariance matrix of the model coefficients. Combined with a proposed region selection strategy, we demonstrate that our approach is effective in identifying truly localized patterns of voxels while maintaining robustness to discover truly distributed patterns. In addition, we show that localized or clustered patterns can be artificially identified as distributed when the spatial correlation information in fMRI data is not properly used. Copyright © 2016 John Wiley & Sons, Ltd.

Predicting the occurrence of an adverse event over time is an important issue in clinical medicine. Clinical prediction models and associated points-based risk-scoring systems are popular statistical methods for summarizing the relationship between a multivariable set of patient risk factors and the risk of the occurrence of an adverse event. Points-based risk-scoring systems are popular amongst physicians as they permit a rapid assessment of patient risk without the use of computers or other electronic devices. The use of such points-based risk-scoring systems facilitates evidence-based clinical decision making. There is a growing interest in cause-specific mortality and in non-fatal outcomes. However, when considering these types of outcomes, one must account for competing risks whose occurrence precludes the occurrence of the event of interest. We describe how points-based risk-scoring systems can be developed in the presence of competing events. We illustrate the application of these methods by developing risk-scoring systems for predicting cardiovascular mortality in patients hospitalized with acute myocardial infarction. Code in the R statistical programming language is provided for the implementation of the described methods. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
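The general logic of a points-based risk-scoring system — converting regression coefficients into small integer points a physician can add up — can be sketched as follows. This is an illustrative sketch only: the coefficients, reference values, and the points "unit" below are hypothetical, and the paper's competing-risks adjustments are not reproduced here.

```python
def points_per_predictor(beta, value, ref_value, beta_per_point):
    """Convert a regression coefficient and a patient's predictor value
    into integer points: the distance from the reference value, on the
    log-hazard scale, divided by the log-hazard chosen to equal 1 point."""
    return round(beta * (value - ref_value) / beta_per_point)

# Hypothetical cause-specific log-hazard coefficients and reference values
betas = {"age": 0.05, "sbp": 0.01}   # per year of age, per mmHg
refs = {"age": 50, "sbp": 120}
B = 0.05 * 5                         # 1 point = log-hazard of 5 years of age

patient = {"age": 70, "sbp": 160}
score = sum(points_per_predictor(betas[k], patient[k], refs[k], B) for k in betas)
print(score)  # → 6 (4 points for age, 2 for blood pressure)
```

The final step of a real system maps each total score to an absolute risk; with competing events, that mapping would use a cumulative incidence function rather than 1 minus a Kaplan–Meier survival estimate.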

Continuous predictors are routinely encountered when developing a prognostic model. Investigators, who are often non-statisticians, must decide how to handle continuous predictors in their models. Categorising continuous measurements into two or more categories has been widely discredited, yet is still frequently done because of its simplicity, investigator ignorance of the potential impact and of suitable alternatives, or to facilitate model uptake. We examine the effect of three broad approaches for handling continuous predictors on the performance of a prognostic model: various methods of categorising predictors, modelling a linear relationship between the predictor and outcome, and modelling a nonlinear relationship using fractional polynomials or restricted cubic splines. We compare the performance (measured by the *c*-index, calibration and net benefit) of prognostic models built using each approach, evaluating them using separate data from that used to build them. We show that categorising continuous predictors produces models with poor predictive performance and poor clinical usefulness. Categorising continuous predictors is unnecessary, biologically implausible and inefficient and should not be used in prognostic model development. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

Several probability-based measures are introduced in order to assess the cost-effectiveness of a treatment. The basic measure consists of the probability that one treatment is less costly and more effective compared with another. Several variants of this measure are suggested as flexible options for cost-effectiveness analysis. The proposed measures are invariant under monotone transformations of the cost and effectiveness measures. Interval estimation of the proposed measures is investigated under a parametric model, assuming bivariate normality, and also non-parametrically. The delta method and a generalized pivotal quantity approach are both investigated under the bivariate normal model. A non-parametric U-statistics-based approach is also investigated for computing confidence intervals. Numerical results show that under bivariate normality, the solution based on generalized pivotal quantities exhibits accurate performance in terms of maintaining the coverage probability of the confidence interval. The non-parametric U-statistics-based solution is accurate for sample sizes that are at least moderately large. The results are illustrated using data from a clinical trial for prostate cancer therapy. Copyright © 2016 John Wiley & Sons, Ltd.
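The basic measure — the probability that one treatment is both less costly and more effective — has a natural U-statistic-type point estimate: the proportion of all cross-arm patient pairs in which the first arm wins on both dimensions. A minimal sketch (toy data; the paper's interval estimators are not reproduced):

```python
def p_cost_effective(cost1, eff1, cost2, eff2):
    """Estimate P(treatment 1 is less costly AND more effective than
    treatment 2) by the proportion of all cross-arm pairs satisfying
    cost1 < cost2 and eff1 > eff2 (a U-statistic-type estimator)."""
    pairs = [(c1 < c2) and (e1 > e2)
             for c1, e1 in zip(cost1, eff1)
             for c2, e2 in zip(cost2, eff2)]
    return sum(pairs) / len(pairs)

# toy per-patient (cost, effectiveness) data for two arms
c1, e1 = [10, 12, 11], [5, 7, 6]
c2, e2 = [13, 14, 9], [4, 6, 8]
print(p_cost_effective(c1, e1, c2, e2))  # → 4/9 ≈ 0.444
```

Because only pairwise orderings enter the estimate, it is invariant under monotone transformations of costs and effects, matching the invariance property stated in the abstract.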

Sequentially administered, laboratory-based diagnostic tests or self-reported questionnaires are often used to determine the occurrence of a silent event. In this paper, we consider issues relevant in design of studies aimed at estimating the association of one or more covariates with a non-recurring, time-to-event outcome that is observed using a repeatedly administered, error-prone diagnostic procedure. The problem is motivated by the Women's Health Initiative, in which diabetes incidence among the approximately 160,000 women is obtained from annually collected self-reported data. For settings of imperfect diagnostic tests or self-reports with known sensitivity and specificity, we evaluate the effects of various factors on resulting power and sample size calculations and compare the relative efficiency of different study designs. The methods illustrated in this paper are readily implemented using our freely available R software package *icensmis*, which is available at the Comprehensive R Archive Network website. An important special case is that when diagnostic procedures are perfect, they result in interval-censored, time-to-event outcomes. The proposed methods are applicable for the design of studies in which a time-to-event outcome is interval censored. Copyright © 2016 John Wiley & Sons, Ltd.

A general utility-based testing methodology for design and conduct of randomized comparative clinical trials with categorical outcomes is presented. Numerical utilities of all elementary events are elicited to quantify their desirabilities. These numerical values are used to map the categorical outcome probability vector of each treatment to a mean utility, which is used as a one-dimensional criterion for constructing comparative tests. Bayesian tests are presented, including fixed sample and group sequential procedures, assuming Dirichlet-multinomial models for the priors and likelihoods. Guidelines are provided for establishing priors, eliciting utilities, and specifying hypotheses. Efficient posterior computation is discussed, and algorithms are provided for jointly calibrating test cutoffs and sample size to control overall type I error and achieve specified power. Asymptotic approximations for the power curve are used to initialize the algorithms. The methodology is applied to re-design a completed trial that compared two chemotherapy regimens for chronic lymphocytic leukemia, in which an ordinal efficacy outcome was dichotomized, and toxicity was ignored to construct the trial's design. The Bayesian tests also are illustrated by several types of categorical outcomes arising in common clinical settings. Freely available computer software for implementation is provided. Copyright © 2016 John Wiley & Sons, Ltd.

The Expected Value of Perfect Partial Information (EVPPI) is a decision-theoretic measure of the ‘cost’ of parametric uncertainty in decision making used principally in health economic decision making. Despite this decision-theoretic grounding, the uptake of EVPPI calculations in practice has been slow. This is in part due to the prohibitive computational time required to estimate the EVPPI via Monte Carlo simulations. However, recent developments have demonstrated that the EVPPI can be estimated by non-parametric regression methods, which have significantly decreased the computation time required to approximate the EVPPI. Under certain circumstances, high-dimensional Gaussian Process (GP) regression is suggested, but this can still be prohibitively expensive. Applying fast computation methods developed in spatial statistics using Integrated Nested Laplace Approximations (INLA) and projecting from a high-dimensional into a low-dimensional input space allows us to decrease the computation time for fitting these high-dimensional GPs, often substantially. We demonstrate that the EVPPI calculated using our method for GP regression is in line with the standard GP regression method and that despite the apparent methodological complexity of this new method, R functions are available in the package BCEA to implement it simply and efficiently. © 2016 The Authors. *Statistics in Medicine* Published by John Wiley & Sons Ltd.
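The regression idea behind fast EVPPI estimation can be illustrated in one dimension: regress the simulated net benefit of each decision on the parameter of interest, then compare the average of the per-parameter-value best decision with the best decision on average. The sketch below uses a toy two-decision model and a crude binned-mean regression in place of the GP/INLA machinery the paper describes; everything in it is an illustrative assumption.

```python
import random
import statistics

random.seed(1)

# Probabilistic sensitivity analysis draws: decision 0 has net benefit 0;
# decision 1's net benefit depends on an uncertain parameter theta plus noise.
n = 20000
theta = [random.gauss(0, 1) for _ in range(n)]
nb1 = [t + random.gauss(0, 1) for t in theta]

# Nonparametric-regression EVPPI: estimate E[NB1 | theta] by averaging nb1
# within bins of sorted theta, then average the per-theta best decision.
order = sorted(range(n), key=lambda i: theta[i])
bin_size = 200
fitted = [0.0] * n
for start in range(0, n, bin_size):
    idx = order[start:start + bin_size]
    m = statistics.fmean(nb1[i] for i in idx)
    for i in idx:
        fitted[i] = m

evppi = statistics.fmean(max(0.0, f) for f in fitted) - max(0.0, statistics.fmean(nb1))
print(round(evppi, 3))  # close to E[max(0, theta)] = 1/sqrt(2*pi) ≈ 0.399
```

Replacing the raw Monte Carlo maximum with the regression-fitted conditional mean is what removes the need for an expensive nested simulation loop; the paper's contribution is making the regression step itself fast in high dimensions.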

The paradigm of oncology drug development is expanding from developing cytotoxic agents to developing biological or molecularly targeted agents (MTAs). Although it is common for the efficacy and toxicity of cytotoxic agents to increase monotonically with dose escalation, the efficacy of some MTAs may exhibit non-monotonic patterns in their dose–efficacy relationships. Many adaptive dose-finding approaches in the available literature account for the non-monotonic dose–efficacy behavior by including additional model parameters. In this study, we propose a novel adaptive dose-finding approach based on binary efficacy and toxicity outcomes in phase I trials for monotherapy using an MTA. We develop a dose–efficacy model, the parameters of which are allowed to change in the vicinity of the change point of the dose level, in order to consider the non-monotonic pattern of the dose–efficacy relationship. The change point is obtained as the dose that maximizes the log-likelihood of the assumed dose–efficacy and dose–toxicity models. The dose-finding algorithm is based on the weighted Mahalanobis distance, calculated using the posterior probabilities of efficacy and toxicity outcomes. We compare the operating characteristics between the proposed and existing methods and examine the sensitivity of the proposed method by simulation studies under various scenarios. Copyright © 2016 John Wiley & Sons, Ltd.

Assessing the magnitude of heterogeneity in a meta-analysis is important for determining the appropriateness of combining results. The most popular measure of heterogeneity, *I*^{2}, was derived under an assumption of homogeneity of the within-study variances, which is almost never true, and the alternative estimator, *R*_{I}, uses the harmonic mean to estimate the average of the within-study variances, which may also lead to bias. This paper thus presents a new measure, *R*_{b}, for quantifying the extent to which the variance of the pooled random-effects estimator is due to between-studies variation, which overcomes the limitations of the previous approach. We show that this measure estimates the expected value of the proportion of total variance due to between-studies variation and we present its point and interval estimators. The performance of all three heterogeneity measures is evaluated in an extensive simulation study. A negative bias for *R*_{b} was observed when the number of studies was very small and became negligible as the number of studies increased, while *R*_{I} and *I*^{2} showed a tendency to overestimate the impact of heterogeneity. The coverage of confidence intervals based upon *R*_{b} was good across different simulation scenarios but was substantially lower for *R*_{I} and *I*^{2}, especially for high values of heterogeneity and when a large number of studies were included in the meta-analysis. The proposed measure is implemented in a user-friendly function available for routine use in R and SAS. *R*_{b} will be useful in quantifying the magnitude of heterogeneity in meta-analysis and should supplement the *p*-value for the test of heterogeneity obtained from the *Q* test. Copyright © 2016 John Wiley & Sons, Ltd.
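For reference, the classical quantities the abstract compares against — Cochran's *Q* and *I*^{2} — follow directly from the study effect estimates and within-study variances. The sketch below uses toy data and the standard formulas only; the new measure's definition is given in the paper and is not reproduced here.

```python
def i_squared(effects, variances):
    """Classical I^2 from study effects and within-study variances:
    I^2 = max(0, (Q - df) / Q), where Q is Cochran's heterogeneity
    statistic under inverse-variance weighting."""
    w = [1.0 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q)

# toy meta-analysis: 5 studies (log odds ratios and their variances)
effects = [0.10, 0.30, 0.35, 0.60, 0.45]
variances = [0.01, 0.02, 0.015, 0.01, 0.025]
print(round(i_squared(effects, variances), 3))  # → 0.693
```

The within-study variances here are deliberately unequal — exactly the situation in which, per the abstract, *I*^{2}'s homogeneity assumption fails and the new measure is intended to help.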

When estimating causal effects, unmeasured confounding and model misspecification are both potential sources of bias. We propose a method to simultaneously address both issues in the form of a semi-parametric sensitivity analysis. In particular, our approach incorporates Bayesian Additive Regression Trees into a two-parameter sensitivity analysis strategy that assesses sensitivity of posterior distributions of treatment effects to choices of sensitivity parameters. This results in an easily interpretable framework for testing for the impact of an unmeasured confounder that also limits the number of modeling assumptions. We evaluate our approach in a large-scale simulation setting and with high blood pressure data taken from the Third National Health and Nutrition Examination Survey. The model is implemented as open-source software, integrated into the treatSens package for the R statistical programming language. © 2016 The Authors. *Statistics in Medicine* Published by John Wiley & Sons Ltd.

Marginal structural Cox models are used for quantifying marginal treatment effects on the outcome event hazard function. Such models are estimated using inverse probability of treatment and censoring (IPTC) weighting, which properly accounts for the impact of time-dependent confounders, avoiding conditioning on factors on the causal pathway. To estimate the IPTC weights, the treatment assignment mechanism is conventionally modeled in discrete time. While this is natural in situations where treatment information is recorded at scheduled follow-up visits, in other contexts, the events specifying the treatment history can be modeled in continuous time using the tools of event history analysis. This is particularly the case for treatment procedures, such as surgeries. In this paper, we propose a novel approach for flexible parametric estimation of continuous-time IPTC weights and illustrate it in assessing the relationship between metastasectomy and mortality in metastatic renal cell carcinoma patients. Copyright © 2016 John Wiley & Sons, Ltd.

Parametric mixed-effects models are useful in longitudinal data analysis when the sampling frequencies of a response variable and the associated covariates are the same. We propose a three-step estimation procedure using local polynomial smoothing and demonstrate with data where the variables to be assessed are repeatedly sampled with different frequencies within the same time frame. We first insert pseudo data for the less frequently sampled variable based on the observed measurements to create a new dataset. Then standard simple linear regressions are fitted at each time point to obtain raw estimates of the association between dependent and independent variables. Last, local polynomial smoothing is applied to smooth the raw estimates. Rather than use a kernel function to assign weights, only analytical weights that reflect the importance of each raw estimate are used. The standard errors of the raw estimates and the distance between the pseudo data and the observed data are considered as the measure of the importance of the raw estimates. We applied the proposed method to a weight loss clinical trial, and it efficiently estimated the correlation between the inconsistently sampled longitudinal data. Our approach was also evaluated via simulations. The results showed that the proposed method works better when the residual variances of the standard linear regressions are small and the within-subjects correlations are high. Also, using analytic weights instead of kernel function during local polynomial smoothing is important when raw estimates have extreme values, or the association between the dependent and independent variable is nonlinear. Copyright © 2016 John Wiley & Sons, Ltd.

Recent success of immunotherapy and other targeted therapies in cancer treatment has led to an unprecedented surge in the number of novel therapeutic agents that need to be evaluated in clinical trials. Traditional phase II clinical trial designs were developed for evaluating one candidate treatment at a time and thus not efficient for this task. We propose a Bayesian phase II platform design, the multi-candidate iterative design with adaptive selection (MIDAS), which allows investigators to continuously screen a large number of candidate agents in an efficient and seamless fashion. MIDAS consists of one control arm, which contains a standard therapy as the control, and several experimental arms, which contain the experimental agents. Patients are adaptively randomized to the control and experimental agents based on their estimated efficacy. During the trial, we adaptively drop inefficacious or overly toxic agents and ‘graduate’ the promising agents from the trial to the next stage of development. Whenever an experimental agent graduates or is dropped, the corresponding arm opens immediately for testing the next available new agent. Simulation studies show that MIDAS substantially outperforms the conventional approach. The proposed design yields a significantly higher probability for identifying the promising agents and dropping the futile agents. In addition, MIDAS requires only one master protocol, which streamlines trial conduct and substantially decreases the overhead burden. Copyright © 2016 John Wiley & Sons, Ltd.

When four or more treatments are under comparison, a crossover design with a complete set of treatment-receipt sequences is of limited practical use for binary data because the number of treatment-receipt sequences becomes too large. Thus, we may consider use of a 4 × 4 Latin square to reduce the number of treatment-receipt sequences when comparing three experimental treatments with a control treatment. Under a distribution-free random effects logistic regression model, we develop simple procedures for testing non-equality between any of the three experimental treatments and the control treatment in a crossover trial with dichotomous responses. We further derive interval estimators in closed forms for the relative effect between treatments. To evaluate the performance of these test procedures and interval estimators, we employ Monte Carlo simulation. We use data taken from a crossover trial using a 4 × 4 Latin-square design for studying four treatments to illustrate the use of the test procedures and interval estimators developed here. Copyright © 2016 John Wiley & Sons, Ltd.
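The design device the abstract relies on — a 4 × 4 Latin square in which each treatment appears exactly once per sequence and once per period — can be constructed cyclically. A minimal illustration (labels A for control and B/C/D for the experimental treatments are an assumption, not the trial's actual coding):

```python
# A cyclic 4x4 Latin square: row = treatment-receipt sequence,
# column = period; each treatment appears once in every row and column.
treatments = ["A", "B", "C", "D"]
square = [[treatments[(row + col) % 4] for col in range(4)] for row in range(4)]
for seq in square:
    print(" ".join(seq))
# → A B C D
#   B C D A
#   C D A B
#   D A B C

# verify the Latin-square property
assert all(sorted(row) == treatments for row in square)
assert all(sorted(r[c] for r in square) == treatments for c in range(4))
```

With only 4 sequences instead of the 4! = 24 of a complete design, each sequence can be assigned to a reasonable number of subjects, which is the practical point the abstract makes.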

Seamless phase II/III clinical trials offer an efficient way to select an experimental treatment and perform confirmatory analysis within a single trial. However, combining the data from both stages in the final analysis can induce bias into the estimates of treatment effects. Methods for bias adjustment developed thus far have made restrictive assumptions about the design and selection rules followed. In order to address these shortcomings, we apply recent methodological advances to derive the uniformly minimum variance conditionally unbiased estimator for two-stage seamless phase II/III trials. Our framework allows for the precision of the treatment arm estimates to take arbitrary values, can be utilised for all treatments that are taken forward to phase III and is applicable when the decision to select or drop treatment arms is driven by a multiplicity-adjusted hypothesis testing procedure. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

We consider the non-inferiority (or equivalence) test of the odds ratio (OR) in a crossover study with binary outcomes to evaluate the treatment effects of two drugs. To solve this problem, Lui and Chang (2011) proposed both an asymptotic method and a conditional method based on a random effects logit model. Kenward and Jones (1987) proposed a likelihood ratio test (*LRT*_{M}) based on a log linear model. These existing methods are all subject to model misspecification. In this paper, we propose a likelihood ratio test (*LRT*) and a score test that are independent of model specification. Monte Carlo simulation studies show that, in scenarios considered in this paper, both the *LRT* and the score test have higher power than the asymptotic and conditional methods for the non-inferiority test; the *LRT*, score, and asymptotic methods have similar power, and they all have higher power than the conditional method for the equivalence test. When data can be well described by a log linear model, the *LRT*_{M} has the highest power among all five methods (*LRT*_{M}, *LRT*, score, asymptotic, and conditional) for both non-inferiority and equivalence tests. However, in scenarios for which a log linear model does not describe the data well, the *LRT*_{M} has the lowest power for the non-inferiority test and has inflated type I error rates for the equivalence test. We provide an example from a clinical trial that illustrates our methods. Copyright © 2016 John Wiley & Sons, Ltd.

Joint modelling of longitudinal and survival data is increasingly used in clinical trials on cancer. In prostate cancer for example, these models make it possible to account for the link between longitudinal measures of prostate-specific antigen (PSA) and time of clinical recurrence when studying the risk of relapse. In practice, multiple types of relapse may occur successively. Distinguishing these transitions between health states would make it possible to evaluate, for example, how PSA trajectory and classical covariates impact the risk of dying after a distant recurrence post-radiotherapy, or to predict the risk of one specific type of clinical recurrence post-radiotherapy, from the PSA history. In this context, we present a joint model for a longitudinal process and a multi-state process, which is divided into two sub-models: a linear mixed sub-model for longitudinal data and a multi-state sub-model with proportional hazards for transition times, both linked by a function of shared random effects. Parameters of this joint multi-state model are estimated within the maximum likelihood framework using an EM algorithm coupled with a quasi-Newton algorithm in case of slow convergence. It is implemented in R, by combining and extending the mstate and JM packages. The estimation program is validated by simulations and applied to pooled data from two cohorts of men with localized prostate cancer. Thanks to the classical covariates available at baseline and the repeated PSA measurements, we are able to assess the biomarker's trajectory, define the risks of transitions between health states and quantify the impact of the PSA dynamics on each transition intensity. Copyright © 2016 John Wiley & Sons, Ltd.

Meta-analysis of individual participant data (IPD) is increasingly utilised to improve the estimation of treatment effects, particularly among different participant subgroups. An important concern in IPD meta-analysis relates to partially or completely missing outcomes for some studies, a problem exacerbated when interest is on multiple discrete and continuous outcomes. When leveraging information from incomplete correlated outcomes across studies, the fully observed outcomes may provide important information about the incompleteness of the other outcomes. In this paper, we compare two models for handling incomplete continuous and binary outcomes in IPD meta-analysis: a joint hierarchical model and a sequence of full conditional mixed models. We illustrate how these approaches incorporate the correlation across the multiple outcomes and the between-study heterogeneity when addressing the missing data. Simulations characterise the performance of the methods across a range of scenarios which differ according to the proportion and type of missingness, strength of correlation between outcomes and the number of studies. The joint model provided confidence interval coverage consistently closer to nominal levels and lower mean squared error compared with the fully conditional approach across the scenarios considered. Methods are illustrated in a meta-analysis of randomised controlled trials comparing the effectiveness of implantable cardioverter-defibrillator devices alone to implantable cardioverter-defibrillator combined with cardiac resynchronisation therapy for treating patients with chronic heart failure. © 2016 The Authors. *Statistics in Medicine* Published by John Wiley & Sons Ltd.

Generalized estimating equations (GEE) are often used for the marginal analysis of longitudinal data. Although much work has been performed to improve the validity of GEE for the analysis of data arising from small-sample studies, little attention has been given to power in such settings. Therefore, we propose a valid GEE approach to improve power in small-sample longitudinal study settings in which the temporal spacing of outcomes is the same for each subject. Specifically, we use a modified empirical sandwich covariance matrix estimator within correlation structure selection criteria and test statistics. Use of this estimator can improve the accuracy of selection criteria and increase the degrees of freedom to be used for inference. The resulting impacts on power are demonstrated via a simulation study and application example. Copyright © 2016 John Wiley & Sons, Ltd.

Adaptive, model-based, dose-finding methods, such as the continual reassessment method, have been shown to have good operating characteristics. One school of thought argues in favor of the use of parsimonious models, not modeling all aspects of the problem, and using a strict minimum number of parameters. In particular, for the standard situation of a single homogeneous group, it is common to appeal to a one-parameter model. Other authors argue for a more classical approach that models all aspects of the problem. Here, we show that increasing the dimension of the parameter space, in the context of adaptive dose-finding studies, is usually counterproductive and, rather than leading to improvements in operating characteristics, the added dimensionality is likely to result in difficulties. Among these are inconsistency of parameter estimates, lack of coherence in escalation or de-escalation, erratic behavior, getting stuck at the wrong level, and, in almost all cases, poorer performance in terms of correct identification of the targeted dose. Our conclusions are based on both theoretical results and simulations. Copyright © 2016 John Wiley & Sons, Ltd.

Testing protocols in large-scale sexually transmitted disease screening applications often involve pooling biospecimens (e.g., blood, urine, and swabs) to lower costs and to increase the number of individuals who can be tested. With the recent development of assays that detect multiple diseases, it is now common to test biospecimen pools for multiple infections simultaneously. Recent work has developed an expectation–maximization algorithm to estimate the prevalence of two infections using a two-stage, Dorfman-type testing algorithm motivated by current screening practices for chlamydia and gonorrhea in the USA. In this article, we have the same goal but instead take a more flexible Bayesian approach. Doing so allows us to incorporate information about assay uncertainty during the testing process, which involves testing both pools and individuals, and also to update information as individuals are tested. Overall, our approach provides reliable inference for disease probabilities and accurately estimates assay sensitivity and specificity even when little or no information is provided in the prior distributions. We illustrate the performance of our estimation methods using simulation and by applying them to chlamydia and gonorrhea data collected in Nebraska. Copyright © 2016 John Wiley & Sons, Ltd.
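The core idea of prevalence estimation from pooled specimens can be seen in a stripped-down, single-disease version: under a perfect assay, a pool of size *s* tests positive with probability 1 − (1 − *p*)^{s}, so the observed pool-positivity rate can be inverted for the maximum likelihood estimate of *p*. This sketch deliberately omits the paper's Bayesian treatment of imperfect sensitivity/specificity and the second-stage individual retests.

```python
import random

random.seed(42)

# Simulate Dorfman-style first-stage pool tests, assuming a perfect assay.
prevalence, pool_size, n_pools = 0.05, 5, 4000
positives = sum(
    any(random.random() < prevalence for _ in range(pool_size))
    for _ in range(n_pools)
)

# P(pool positive) = 1 - (1 - p)^s, so the MLE inverts this relation:
p_hat = 1 - (1 - positives / n_pools) ** (1 / pool_size)
print(round(p_hat, 3))  # should be near the true prevalence of 0.05
```

Note that only 4000 assays were run to estimate prevalence among 20,000 individuals — the cost saving that motivates pooling. The Bayesian machinery in the paper is what remains reliable once the assay error rates themselves must be estimated.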

Progression-free survival is an increasingly popular end point in oncology clinical trials. A complete blinded independent central review (BICR) is often required by regulators in an attempt to reduce the bias in progression-free survival (PFS) assessment. In this paper, we propose a new methodology that uses a sample-based BICR as an audit tool to decide whether a complete BICR is needed. More specifically, we propose a new index, the differential risk, to measure the reading discordance pattern, and develop a corresponding hypothesis testing procedure to decide whether the bias in local evaluation is acceptable. Simulation results demonstrate that our new index is sensitive to the change of discordance pattern; type I error is well controlled in the hypothesis testing procedure, and the calculated sample size provides the desired power. Copyright © 2016 John Wiley & Sons, Ltd.

We present a general coregionalization framework for developing coregionalized multivariate Gaussian conditional autoregressive (cMCAR) models for Bayesian analysis of multivariate lattice data in general and multivariate disease mapping data in particular. This framework is inclusive of cMCARs that facilitate flexible modelling of spatially structured symmetric or asymmetric cross-variable local interactions, allowing a wide range of separable or non-separable covariance structures, and symmetric or asymmetric cross-covariances, to be modelled. We present a brief overview of established univariate Gaussian conditional autoregressive (CAR) models for univariate lattice data and develop coregionalized multivariate extensions. Classes of cMCARs are presented by formulating precision structures. The resulting conditional properties of the multivariate spatial models are established, which cast new light on cMCARs with richly structured covariances and cross-covariances of different spatial ranges. The related methods are illustrated via an in-depth Bayesian analysis of a Minnesota county-level cancer data set. We also bring a new dimension to the traditional enterprise of Bayesian disease mapping: estimating and mapping covariances and cross-covariances of the underlying disease risks. Maps of covariances and cross-covariances bring to light spatial characterizations of the cMCARs and inform on spatial risk associations between areas and diseases. Copyright © 2016 John Wiley & Sons, Ltd.

In retrospective studies involving recurrent events, it is common to select individuals based on their event history up to the time of selection. In this case, the ascertained subjects might not be representative of the target population, and the analysis should take the selection mechanism into account. The purpose of this paper is two-fold. First, to study what happens when the data analysis is not adjusted for the selection and second, to propose a corrected analysis. Under the Andersen–Gill and shared frailty regression models, we show that the estimators of covariate effects, incidence, and frailty variance can be biased if the ascertainment is ignored, and we show that with a simple adjustment of the likelihood, unbiased and consistent estimators are obtained. The proposed method is assessed by a simulation study and is illustrated on a data set comprising recurrent pneumothoraces. Copyright © 2016 John Wiley & Sons, Ltd.

In cluster randomized trials, the study units usually are not a simple random sample from some clearly defined target population. Instead, the target population tends to be hypothetical or ill-defined, and the selection of study units tends to be systematic, driven by logistical and practical considerations. As a result, the population average treatment effect (PATE) may be neither well defined nor easily interpretable. In contrast, the sample average treatment effect (SATE) is the mean difference in the counterfactual outcomes for the study units. The sample parameter is easily interpretable and arguably the most relevant when the study units are not sampled from some specific super-population of interest. Furthermore, in most settings, the sample parameter will be estimated more efficiently than the population parameter. To the best of our knowledge, this is the first paper to propose using targeted maximum likelihood estimation (TMLE) for estimation and inference of the sample effect in trials with and without pair-matching. We study the asymptotic and finite sample properties of the TMLE for the sample effect and provide a conservative variance estimator. Finite sample simulations illustrate the potential gains in precision and power from selecting the sample effect as the target of inference. This work is motivated by the Sustainable East Africa Research in Community Health (SEARCH) study, a pair-matched, community randomized trial to estimate the effect of population-based HIV testing and streamlined ART on the 5-year cumulative HIV incidence (NCT01864603). The proposed methodology will be used in the primary analysis for the SEARCH trial. Copyright © 2016 John Wiley & Sons, Ltd.

Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method was proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
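As a minimal sketch of the first updating strategy (not the paper's implementation), Elkan-style prior-shift correction rescales the posterior odds implied by a calibrated probability by the ratio of the new to the old prior odds. The function name and the toy prevalences below are our own illustration:

```python
def elkan_update(p, b_train, b_target):
    """Adjust a predicted probability p, calibrated under outcome
    prevalence b_train, to a population with prevalence b_target
    (Elkan-style prior-shift correction via Bayes' rule on the odds)."""
    # likelihood ratio implied by the calibrated probability
    lr = (p / (1 - p)) * ((1 - b_train) / b_train)
    # new posterior odds = new prior odds * likelihood ratio
    odds = (b_target / (1 - b_target)) * lr
    return odds / (1 + odds)

# with unchanged prevalence the probability is unchanged
assert abs(elkan_update(0.3, 0.2, 0.2) - 0.3) < 1e-12
# a higher target prevalence shifts the probability upwards
print(round(elkan_update(0.3, 0.2, 0.4), 3))  # → 0.533
```

Note the correction leaves the likelihood ratio untouched, which is exactly the "strict assumption" that can fail when more than the outcome prevalence differs between centers.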

We propose a class of randomized trial designs aimed at gaining the advantages of wider generalizability and faster recruitment while mitigating the risks of including a population for which there is greater a priori uncertainty. We focus on testing null hypotheses for the overall population and a predefined subpopulation. Our designs have preplanned rules for modifying enrollment criteria based on data accrued at interim analyses. For example, enrollment can be restricted if the participants from a predefined subpopulation are not benefiting from the new treatment. Our designs have the following features: the multiple testing procedure fully leverages the correlation among statistics for different populations; the asymptotic familywise Type I error rate is strongly controlled; for outcomes that are binary or normally distributed, the decision rule and multiple testing procedure are functions of the data only through minimal sufficient statistics. Our designs incorporate standard group sequential boundaries for each population of interest; this may be helpful in communicating the designs, because many clinical investigators are familiar with such boundaries, which can be summarized succinctly in a single table or graph. We demonstrate these designs through simulations of a Phase III trial of a new treatment for stroke. User-friendly, free software implementing these designs is described. Copyright © 2016 John Wiley & Sons, Ltd.

We have developed a method, called Meta-STEPP (subpopulation treatment effect pattern plot for meta-analysis), to explore treatment effect heterogeneity across covariate values in the meta-analysis setting for time-to-event data when the covariate of interest is continuous. Meta-STEPP forms overlapping subpopulations from individual patient data containing similar numbers of events with increasing covariate values, estimates subpopulation treatment effects using standard fixed-effects meta-analysis methodology, displays the estimated subpopulation treatment effect as a function of the covariate values, and provides a statistical test to detect possibly complex treatment-covariate interactions. Simulation studies show that this test has adequate type-I error control as well as power when reasonable window sizes are chosen. When applied to eight breast cancer trials, Meta-STEPP suggests that chemotherapy is less effective for tumors with high estrogen receptor expression compared with those with low expression. Copyright © 2016 John Wiley & Sons, Ltd.
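The overlapping-subpopulation construction can be sketched in a few lines. This toy slides windows by subject counts rather than event counts (Meta-STEPP uses events), and the function name and window parameters `n1`, `n2` follow common STEPP notation but are our own illustration:

```python
def stepp_windows(values, n1, n2):
    """STEPP-style overlapping subpopulations: sort subjects by a
    continuous covariate, take the n2 subjects with the smallest values
    as window 1, then slide by dropping the n1 smallest and adding the
    next n1 largest, so consecutive windows share n2 - n1 subjects."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    return [order[s:s + n2] for s in range(0, len(order) - n2 + 1, n1)]

vals = [0.3, 1.2, 0.1, 2.5, 0.8, 1.9, 0.5, 2.1, 1.4, 0.9]
windows = stepp_windows(vals, n1=2, n2=4)
# window 1 holds the 4 smallest covariate values; each later window
# shares 2 subjects with its predecessor
```

Subpopulation treatment effects would then be estimated within each window and plotted against, say, the median covariate value per window.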

Rate differences are an important effect measure in biostatistics and provide an alternative perspective to rate ratios. When the data are event counts observed during an exposure period, adjusted rate differences may be estimated using an identity-link Poisson generalised linear model, also known as additive Poisson regression. A problem with this approach is that the assumption of equality of mean and variance rarely holds in real data, which often show overdispersion. An additive negative binomial model is the natural alternative to account for this; however, standard model-fitting methods are often unable to cope with the constrained parameter space arising from the non-negativity restrictions of the additive model. In this paper, we propose a novel solution to this problem using a variant of the expectation–conditional maximisation–either algorithm. Our method provides a reliable way to fit an additive negative binomial regression model and also permits flexible generalisations using semi-parametric regression functions. We illustrate the method using a placebo-controlled clinical trial of fenofibrate treatment in patients with type II diabetes, where the outcome is the number of laser therapy courses administered to treat diabetic retinopathy. An R package is available that implements the proposed method. Copyright © 2016 John Wiley & Sons, Ltd.
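For orientation, the unadjusted version of the estimand is simple to compute: an incidence-rate difference from two Poisson counts with a Wald interval. This sketch (our own toy, not the paper's ECME fit) shows the quantity the additive models estimate while adjusting for covariates and overdispersion:

```python
import math

def rate_difference(y1, t1, y0, t0, z=1.96):
    """Unadjusted incidence-rate difference with a Wald confidence
    interval, assuming Poisson counts y over person-time t.
    var(y/t) = y / t^2 under the Poisson assumption."""
    rd = y1 / t1 - y0 / t0
    se = math.sqrt(y1 / t1**2 + y0 / t0**2)
    return rd, (rd - z * se, rd + z * se)

# toy data: 30 vs 15 events over 1000 person-years each
rd, ci = rate_difference(y1=30, t1=1000.0, y0=15, t0=1000.0)
# rd = 0.015 extra events per person-year
```

With overdispersion the Poisson variance understates the truth, which is exactly why the paper moves to an additive negative binomial model.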

We present a cancer phase I clinical trial design of a combination of two drugs with the goal of estimating the maximum tolerated dose (MTD) curve in the two-dimensional Cartesian plane. A parametric model is used to describe the relationship between the doses of the two agents and the probability of dose limiting toxicity. The model is re-parameterized in terms of the probabilities of toxicities at dose combinations corresponding to the minimum and maximum doses available in the trial and the interaction parameter. Trial design proceeds using cohorts of two patients receiving doses according to univariate escalation with overdose control (EWOC), where at each stage of the trial, we seek a dose of one agent using the current posterior distribution of the MTD of this agent given the current dose of the other agent. The MTD curve is estimated as a function of Bayes estimates of the model parameters. Performance of the trial is studied by evaluating its design operating characteristics in terms of safety of the trial and percent of dose recommendation at dose combination neighborhoods around the true MTD curve and under model misspecifications for the true dose–toxicity relationship. The method is further extended to accommodate discrete dose combinations and compared with previous approaches under several scenarios. Copyright © 2016 John Wiley & Sons, Ltd.

In biomedical studies, it is often of interest to classify/predict a subject's disease status based on a variety of biomarker measurements. A commonly used classification criterion is based on area under the receiver operating characteristic curve (AUC). Many methods have been proposed to optimize approximated empirical AUC criteria, but there are two limitations to the existing methods. First, most methods are only designed to find the best linear combination of biomarkers, which may not perform well when there is strong nonlinearity in the data. Second, many existing linear combination methods use gradient-based algorithms to find the best marker combination, which often converge to suboptimal local solutions. In this paper, we address these two problems by proposing a new kernel-based AUC optimization method called ramp AUC (RAUC). This method approximates the empirical AUC loss function with a ramp function and finds the best combination by a difference of convex functions algorithm. We show that as a linear combination method, RAUC leads to a consistent and asymptotically normal estimator of the linear marker combination when the data are generated from a semiparametric generalized linear model, just as the smoothed AUC method. Through simulation studies and real data examples, we demonstrate that RAUC outperforms the smoothed AUC method in finding the best linear marker combinations, and can successfully capture nonlinear patterns in the data to achieve better classification performance. We illustrate our method with a dataset from a recent HIV vaccine trial. Copyright © 2016 John Wiley & Sons, Ltd.
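The two loss functions at the heart of this approach are easy to state. Below is our own sketch (scale parameter `s` and toy scores are illustrative assumptions, not the paper's): the empirical AUC counts correctly ordered case-control pairs, and the ramp loss replaces the non-smooth pairwise indicator with a bounded piecewise-linear surrogate, which the paper then minimizes over marker combinations with a difference-of-convex algorithm:

```python
def empirical_auc(scores_pos, scores_neg):
    """Empirical AUC: fraction of (case, control) score pairs ranked
    correctly, counting ties as one half."""
    n = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            n += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return n / (len(scores_pos) * len(scores_neg))

def ramp_auc_loss(scores_pos, scores_neg, s=1.0):
    """Ramp surrogate for the pairwise 0-1 loss: the hinge 1 - u/s,
    truncated to lie in [0, 1], so large margins cost 0 and badly
    misordered pairs cost at most 1 (unlike the unbounded hinge)."""
    ramp = lambda u: min(1.0, max(0.0, 1.0 - u / s))
    total = sum(ramp(sp - sn) for sp in scores_pos for sn in scores_neg)
    return total / (len(scores_pos) * len(scores_neg))

pos, neg = [2.0, 1.5, 0.2], [0.0, -0.5]
auc = empirical_auc(pos, neg)   # perfectly separated toy data → 1.0
loss = ramp_auc_loss(pos, neg)  # small but nonzero: margins near 0 still cost
```

Boundedness is what makes the ramp robust to outlying pairs relative to a plain hinge loss.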

The number of elderly patients requiring hospitalisation in Europe is rising. With a greater proportion of elderly people in the population comes a greater demand for health services and, in particular, hospital care. Thus, a growing number of elderly patients requiring hospitalisation compete with non-elderly patients for a fixed (and in some cases, decreasing) number of hospital beds, resulting in much longer waiting times for patients, often with a less satisfactory hospital experience. However, if a better understanding of the recurring nature of elderly patient movements between the community and hospital can be developed, then it may be possible for alternative provisions of care in the community to be put in place and thus prevent readmission to hospital. The research in this paper aims to model the multiple patient transitions between hospital and community by utilising a mixture of conditional Coxian phase-type distributions that incorporates Bayes' theorem. For the purpose of demonstration, the results of a simulation study are presented and the model is applied to hospital readmission data from the Lombardy region of Italy. Copyright © 2016 John Wiley & Sons, Ltd.

In many clinical settings, improving patient survival is of interest but a practical surrogate, such as time to disease progression, is instead used as a clinical trial's primary endpoint. A time-to-first endpoint (*e.g.,* death or disease progression) is commonly analyzed but may not be adequate to summarize patient outcomes if a subsequent event contains important additional information. We consider a surrogate outcome very generally as one correlated with the true endpoint of interest. Settings of interest include those where the surrogate indicates a beneficial outcome so that the usual time-to-first endpoint of death or surrogate event is nonsensical. We present a new two-sample test for bivariate, interval-censored time-to-event data, where one endpoint is a surrogate for the second, less frequently observed endpoint of true interest. This test examines whether patient groups have equal clinical severity. If the true endpoint rarely occurs, the proposed test acts like a weighted logrank test on the surrogate; if it occurs for most individuals, then our test acts like a weighted logrank test on the true endpoint. If the surrogate is a useful statistical surrogate, our test can have better power than tests based on the surrogate that naively handle the true endpoint. In settings where the surrogate is not valid (treatment affects the surrogate but not the true endpoint), our test incorporates the information regarding the lack of treatment effect from the observed true endpoints and hence is expected to have a dampened treatment effect compared with tests based on the surrogate alone. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.

Missing data are a common problem in clinical and epidemiological research, especially in longitudinal studies. Despite many methodological advances in recent decades, many papers on clinical trials and epidemiological studies do not report using principled statistical methods to accommodate missing data or use ineffective or inappropriate techniques. Two refined techniques are presented here: generalized estimating equations (GEEs) and weighted generalized estimating equations (WGEEs). These techniques are an extension of generalized linear models to longitudinal or clustered data, where observations are no longer independent. They can appropriately handle missing data when the missingness is completely at random (GEE and WGEE) or at random (WGEE) and do not require the outcome to be normally distributed. Our aim is to describe these techniques for handling missing data in the context of longitudinal studies subject to dropout, to illustrate them with a real example in a way that is simple and accessible to researchers, and to show how to implement them in R. We apply them to assess the evolution of health-related quality of life in coronary patients in a data set subject to dropout. Copyright © 2016 John Wiley & Sons, Ltd.

Incomplete data are generally a challenge to the analysis of most large studies. The current gold standard to account for missing data is multiple imputation, and more specifically multiple imputation with chained equations (MICE). Numerous studies have been conducted to illustrate the performance of MICE for missing covariate data. The results show that the method works well in various situations. However, less is known about its performance in more complex models, specifically when the outcome is multivariate as in longitudinal studies. In current practice, the multivariate nature of the longitudinal outcome is often neglected in the imputation procedure, or only the baseline outcome is used to impute missing covariates. In this work, we evaluate the performance of MICE using different strategies to include a longitudinal outcome into the imputation models and compare it with a fully Bayesian approach that jointly imputes missing values and estimates the parameters of the longitudinal model. Results from simulation and a real data example show that MICE requires the analyst to correctly specify which components of the longitudinal process need to be included in the imputation models in order to obtain unbiased results. The full Bayesian approach, on the other hand, does not require the analyst to explicitly specify how the longitudinal outcome enters the imputation models. It performed well under different scenarios. Copyright © 2016 John Wiley & Sons, Ltd.

In a randomized controlled clinical trial that assesses treatment efficacy, a common objective is to assess the association of a measured biomarker response endpoint with the primary study endpoint in the active treatment group, using a case-cohort, case-control, or two-phase sampling design. Methods for power and sample size calculations for such biomarker association analyses typically do not account for the level of treatment efficacy, precluding interpretation of the biomarker association results in terms of biomarker effect modification of treatment efficacy, with the detriment that the power calculations may tacitly and inadvertently assume that the treatment harms some study participants. We develop power and sample size methods accounting for this issue, and the methods also account for inter-individual variability of the biomarker that is not biologically relevant (e.g., due to technical measurement error). We focus on a binary study endpoint and on a biomarker subject to measurement error that is normally distributed or categorical with two or three levels. We illustrate the methods with preventive HIV vaccine efficacy trials and include an R package implementing the methods. Copyright © 2016 John Wiley & Sons, Ltd.

The incremental life expectancy, defined as the difference in mean survival times between two treatment groups, is a crucial quantity of interest in cost-effectiveness analyses. Usually, this quantity is very difficult to estimate from censored survival data with a limited follow-up period. The paper develops estimation procedures for a time-shift survival model that, provided model assumptions are met, gives a reliable estimate of incremental life expectancy without extrapolation beyond the study period. Methods for inference are developed both for individual patient data and when only published Kaplan–Meier curves are available. Through simulation, the estimators are shown to be close to unbiased and constructed confidence intervals are shown to have close to nominal coverage for small to moderate sample sizes. Copyright © 2016 John Wiley & Sons, Ltd.
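The key idea of a time-shift model can be illustrated with an uncensored toy simulation (our own sketch; the paper's estimators handle right censoring). If treated survival satisfies S1(t) = S0(t − τ), every quantile differs between arms by exactly τ, and τ equals the incremental life expectancy, so it can be estimated from quantities observable within the study period, with no tail extrapolation:

```python
import random, statistics

# toy data under a time-shift model: treated survival times are
# distributed as control times plus a constant shift tau
rng = random.Random(1)
tau = 0.5
control = [rng.expovariate(1.0) for _ in range(20000)]
treated = [rng.expovariate(1.0) + tau for _ in range(20000)]

# since S1(t) = S0(t - tau), medians (observable mid-study) differ by
# tau, which is also the difference in MEANS, i.e. the incremental
# life expectancy
tau_hat = statistics.median(treated) - statistics.median(control)
```

The attraction is that the median is typically reached well inside the follow-up window even when the mean is not.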

Estimating a patient's mortality risk is important in making treatment decisions. Survival trees are a useful tool and employ recursive partitioning to separate patients into different risk groups. Existing ‘loss based’ recursive partitioning procedures that would be used in the absence of censoring have previously been extended to the setting of right censored outcomes using inverse probability censoring weighted estimators of loss functions. In this paper, we propose new ‘doubly robust’ extensions of these loss estimators motivated by semiparametric efficiency theory for missing data that better utilize available data. Simulations and a data analysis demonstrate strong performance of the doubly robust survival trees compared with previously used methods. Copyright © 2016 John Wiley & Sons, Ltd.
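The inverse-probability-of-censoring-weighting (IPCW) idea that the doubly robust losses build on can be shown in a toy simulation. Here the censoring distribution is known (an assumption of this sketch, not of the paper; in practice it is estimated, e.g., by Kaplan–Meier), and weighting each uncensored time by 1/P(C > t) recovers the uncensored mean:

```python
import random, math

# IPCW toy: exponential event times (rate lam_t) and exponential
# censoring (rate lam_c, treated as known); the weighted average of
# uncensored times is unbiased for E[T] = 1 / lam_t
rng = random.Random(2)
lam_t, lam_c, n = 1.0, 0.5, 50000

total = 0.0
for _ in range(n):
    t = rng.expovariate(lam_t)               # latent event time
    c = rng.expovariate(lam_c)               # censoring time
    if t <= c:                               # event observed
        total += t / math.exp(-lam_c * t)    # weight = 1 / S_C(t)
ipcw_mean = total / n                        # estimates E[T] = 1.0
```

The doubly robust extension augments such weighted losses with an outcome-model term, so the estimate stays consistent if either the censoring model or the outcome model is correct.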

Network meta-analysis (NMA), also known as multiple treatment comparisons, is commonly used to incorporate direct and indirect evidence comparing treatments. With recent advances in methods and software, Bayesian approaches to NMA have become quite popular and allow models of previously unanticipated complexity. However, when direct and indirect evidence differ in an NMA, the model is said to suffer from *inconsistency*. Current inconsistency detection in NMA is usually based on contrast-based (CB) models; however, this approach has certain limitations. In this work, we propose an arm-based random effects model, where we detect discrepancy of direct and indirect evidence for comparing two treatments using the fixed effects in the model while flagging extreme trials using the random effects. We define discrepancy factors to characterize evidence of inconsistency for particular treatment comparisons, which is novel in NMA research. Our approaches permit users to address issues previously tackled via CB models. We compare sources of inconsistency identified by our approach and existing loop-based CB methods using real and simulated datasets and demonstrate that our methods can offer powerful inconsistency detection. Copyright © 2016 John Wiley & Sons, Ltd.

Large sample sizes are required in randomized clinical trials designed to meet a typical one-sided 2.5% *α*-level and 80% power. This may not be achievable when the disease is rare. We simulated a series of two-arm superiority trials over a 15-year period. The design parameters examined were the *α*-level and the number of trials conducted over the 15-year period (thus, trial sample size). Different disease severities and accrual rates were considered. The future treatment effect was characterized by its associated hazard rate; different hypotheses of how treatments improve over time were considered. We defined the total survival benefit as the relative difference of the hazard rates at year 15 versus year 0. The optimal design was defined by maximizing the expected total survival benefit, provided that the risk of selecting at year 15 a treatment inferior to the initial control treatment remains below 1%. Compared with two larger trials with a typical one-sided 2.5% *α*-level, performing a series of small trials with relaxed *α*-levels leads on average to larger survival benefits over a 15-year research horizon, but also to higher risk of selecting a worse treatment at the end of the research period. Under reasonably optimistic assumptions regarding the future treatment effects, optimal designs outperform traditional ones when the disease is severe (baseline median survival ≤ 1 year) and the accrual is ≥100 patients per year, whereas no major improvement is observed in diseases with better prognosis. Trial designs aiming to maximize survival gain over a long research horizon across a series of trials are worth discussing in the context of rare diseases. Copyright © 2016 John Wiley & Sons, Ltd.

Shared parameter joint models provide a framework under which a longitudinal response and a time to event can be modelled simultaneously. A common assumption in shared parameter joint models has been to assume that the longitudinal response is normally distributed. In this paper, we instead propose a joint model that incorporates a two-part ‘hurdle’ model for the longitudinal response, motivated in part by longitudinal response data that is subject to a detection limit. The first part of the hurdle model estimates the probability that the longitudinal response is observed above the detection limit, whilst the second part of the hurdle model estimates the mean of the response conditional on having exceeded the detection limit. The time-to-event outcome is modelled using a parametric proportional hazards model, assuming a Weibull baseline hazard. We propose a novel association structure whereby the current hazard of the event is assumed to be associated with the current combined (expected) outcome from the two parts of the hurdle model. We estimate our joint model under a Bayesian framework and provide code for fitting the model using the Bayesian software Stan. We use our model to estimate the association between HIV RNA viral load, which is subject to a lower detection limit, and the hazard of stopping or modifying treatment in patients with HIV initiating antiretroviral therapy. Copyright © 2016 John Wiley & Sons, Ltd.

Most phase I dose-finding methods in oncology aim to find the maximum-tolerated dose from a set of prespecified doses. However, in practice, because of a lack of understanding of the true dose–toxicity relationship, it is likely that none of these prespecified doses are equal or reasonably close to the true maximum-tolerated dose. To handle this issue, we propose an adaptive dose modification (ADM) method that can be coupled with any existing dose-finding method to adaptively modify the dose, when it is needed, during the course of dose finding. To reflect clinical practice, we divide the toxicity probability into three regions: underdosing, acceptable, and overdosing regions. We adaptively add a new dose whenever the observed data suggest that none of the investigational doses are likely to be located in the acceptable region. The new dose is estimated via a nonparametric dose–toxicity model based on local polynomial regression. The simulation study shows that ADM substantially outperforms a similar existing method. We applied ADM to a phase I cancer trial. Copyright © 2016 John Wiley & Sons, Ltd.

Testing whether the mean vector of a multivariate set of biomarkers differs between several populations is an increasingly common problem in medical research. Biomarker data is often left censored because some measurements fall below the laboratory's detection limit. We investigate how such censoring affects multivariate two-sample and one-way multivariate analysis of variance tests. Type I error rates, power and robustness to increasing censoring are studied, under both normality and non-normality. Parametric tests are found to perform better than non-parametric alternatives, indicating that the current recommendations for analysis of censored multivariate data may have to be revised. Copyright © 2016 John Wiley & Sons, Ltd.
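A univariate sketch makes the detection-limit problem concrete (our own toy; the paper studies multivariate tests). A common, if crude, pre-processing step substitutes LOD/2 for non-detects, after which a standard parametric test such as Welch's t can be applied:

```python
import math, statistics

def substitute_lod(xs, lod):
    """Replace values below the detection limit by lod/2 -- a common,
    simple way to handle left-censored biomarker measurements."""
    return [x if x >= lod else lod / 2.0 for x in xs]

def welch_t(x, y):
    """Welch two-sample t statistic (unequal variances allowed)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))

# toy biomarker samples with a detection limit of 0.25
a = substitute_lod([0.1, 0.3, 1.2, 2.0, 0.05], lod=0.25)
b = substitute_lod([1.0, 1.5, 2.5, 3.0, 0.2], lod=0.25)
t_stat = welch_t(a, b)  # negative: group b has the higher mean
```

Substitution distorts both the mean and the variance as censoring grows, which is precisely the degradation in size and power the paper quantifies.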

Propensity score (PS) methods have been used extensively to adjust for confounding factors in the statistical analysis of observational data in comparative effectiveness research. There are four major PS-based adjustment approaches: PS matching, PS stratification, covariate adjustment by PS, and PS-based inverse probability weighting. Though covariate adjustment by PS is one of the most frequently used PS-based methods in clinical research, the conventional variance estimation of the treatment effects estimate under covariate adjustment by PS is biased. As Stampf *et al*. have shown, this bias in variance estimation is likely to lead to invalid statistical inference and could result in erroneous public health conclusions (e.g., food and drug safety and adverse events surveillance). To address this issue, we propose a two-stage analytic procedure to develop a valid variance estimator for the covariate adjustment by PS analysis strategy. We also develop a simple empirical bootstrap resampling scheme. Both proposed procedures are implemented in an R function for public use. Extensive simulation results demonstrate the bias in the conventional variance estimator and show that both proposed variance estimators offer valid estimates for the true variance, and they are robust to complex confounding structures. The proposed methods are illustrated for a post-surgery pain study. Copyright © 2016 John Wiley & Sons, Ltd.
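The empirical bootstrap component is generic and easy to sketch. This is our own minimal version with a plain mean as the estimator; in the paper's setting the `estimator` callback would refit the whole two-stage pipeline (PS model, then outcome model) on each resample so that PS-estimation uncertainty propagates into the variance:

```python
import random, statistics

def bootstrap_se(data, estimator, n_boot=2000, seed=0):
    """Empirical bootstrap standard error of an arbitrary estimator:
    resample subjects with replacement, re-apply the estimator, and
    take the standard deviation of the replicated estimates."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]
        reps.append(estimator(resample))
    return statistics.stdev(reps)

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
se_hat = bootstrap_se(data, statistics.fmean)
# for a sample mean the bootstrap SE should be close to s / sqrt(n)
```

The point of resampling whole subjects is that any estimated nuisance quantity, here the PS, is re-estimated each time, which is what the naive model-based variance ignores.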

It is well recognized that sample size determination is challenging because of the uncertainty about the treatment effect size. Several remedies are available in the literature. Group sequential designs start with a sample size based on a conservative (smaller) effect size and allow early stop at interim looks. Sample size re-estimation designs start with a sample size based on an optimistic (larger) effect size and allow sample size increase if the observed effect size is smaller than planned. Different opinions favoring one type over the other exist. We propose an optimal approach using an appropriate optimality criterion to select the best design among all the candidate designs. Our results show that (1) for the same type of designs, for example, group sequential designs, there is room for significant improvement through our optimization approach; (2) optimal promising zone designs appear to have no advantages over optimal group sequential designs; and (3) optimal designs with sample size re-estimation deliver the best adaptive performance. We conclude that to deal with the challenge of sample size determination due to effect size uncertainty, an optimal approach can help to select the best design that provides most robust power across the effect size range of interest. Copyright © 2016 John Wiley & Sons, Ltd.

We develop a multivariate cure survival model to estimate lifetime patterns of colorectal cancer screening. Screening data cover long periods of time, with sparse observations for each person. Some events may occur before the study begins or after the study ends, so the data are both left-censored and right-censored, and some individuals are never screened (the ‘cured’ population). We propose a multivariate parametric cure model that can be used with left-censored and right-censored data. Our model allows for the estimation of the time to screening as well as the average number of times individuals will be screened. We calculate likelihood functions based on the observations for each subject using a distribution that accounts for within-subject correlation and estimate parameters using Markov chain Monte Carlo methods. We apply our methods to the estimation of lifetime colorectal cancer screening behavior in the SEER-Medicare data set. Copyright © 2016 John Wiley & Sons, Ltd.

Markov three-state progressive and illness–death models are often used in biomedicine for describing survival data when an intermediate event of interest may be observed during the follow-up. However, the usual estimators for Markov models (e.g., Aalen–Johansen transition probabilities) may be systematically biased in non-Markovian situations. On the other hand, although non-Markovian estimators for transition probabilities and related curves are available, including the Markov information in the construction of the estimators allows for variance reduction. Therefore, testing for the Markov condition is a relevant issue in practice. In this paper, we discuss several characterizations of the Markov condition, with special focus on its equivalence with the quasi-independence between left truncation and survival times in standard survival analysis. New methods for testing the Markovianity of an illness–death model are proposed and compared with existing ones by means of an intensive simulation study. We illustrate our findings through the analysis of a data set from stem cell transplant in leukemia. Copyright © 2016 John Wiley & Sons, Ltd.

The Bayesian model averaging continual reassessment method (CRM) is a Bayesian dose-finding design. It improves the robustness and overall performance of the CRM by specifying multiple skeletons (or models) and then using Bayesian model averaging to automatically favor the best-fitting model for better decision making. Specifying multiple skeletons, however, can be challenging for practitioners. In this paper, we propose a default way to specify skeletons for the Bayesian model averaging CRM. We show that skeletons that appear rather different may actually lead to equivalent models. Motivated by this, we define a nonequivalence measure to index the difference among skeletons. Using this measure, we extend the model calibration method of Lee and Cheung (2009) to choose the optimal skeletons that maximize the average percentage of correct selection of the maximum tolerated dose and ensure sufficient nonequivalence among the skeletons. Our simulation study shows that the proposed method has desirable operating characteristics. We provide software to implement the proposed method. Copyright © 2016 John Wiley & Sons, Ltd.

Resecting bone tumors requires good cutting accuracy to reduce the occurrence of local recurrence, an issue that navigated technology considerably mitigates. Estimating the resulting extreme proportions is challenging, especially with small or moderate sample sizes. When no success is observed, the commonly used binomial proportion confidence interval is not suitable, whereas the rule of three provides a simple solution. Unfortunately, these approaches are unable to differentiate between different unobserved events. Different delta methods and bootstrap procedures are compared in univariate and linear mixed models, using simulations and real data and assuming normality. The delta method on the z-score and the parametric bootstrap provide similar results, but the delta method requires estimating the covariance matrix of the estimates. In mixed models, the observed Fisher information matrix with unbounded variance components should be preferred. The parametric bootstrap, which is easier to apply, outperforms the delta method for larger sample sizes but may be computationally costly. Copyright © 2016 John Wiley & Sons, Ltd.
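
As context for the rule of three mentioned in this abstract: with zero events observed in n trials, an approximate one-sided 95% upper confidence bound for the event probability is 3/n, obtained by solving (1 − p)^n = 0.05. A minimal sketch (illustrative only; the function names are ours, and this is not the paper's full delta-method/bootstrap comparison):

```python
def rule_of_three_upper(n):
    """Approximate one-sided 95% upper bound on an event probability
    when 0 events are observed in n trials."""
    return 3.0 / n

def exact_upper(n, alpha=0.05):
    """Exact bound from solving (1 - p)**n = alpha for p."""
    return 1.0 - alpha ** (1.0 / n)

# With 0 failures in 100 cuts, the two bounds are close:
print(rule_of_three_upper(100))  # 0.03
print(exact_upper(100))          # ~0.0295
```

The rule-of-three bound is slightly conservative (larger) than the exact bound, which is why it works as a quick back-of-the-envelope check.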

Joint analysis of longitudinal and survival data has received increasing attention in recent years, especially for analyzing cancer and AIDS data. As both repeated measurements (longitudinal) and time-to-event (survival) outcomes are observed in an individual, joint modeling is more appropriate because it takes into account the dependence between the two types of responses, which are often analyzed separately. We propose a Bayesian hierarchical model for jointly modeling longitudinal and survival data considering functional time and spatial frailty effects, respectively. That is, the proposed model deals with non-linear longitudinal effects and spatial survival effects accounting for the unobserved heterogeneity among individuals living in the same region. This joint approach is applied to a cohort study of patients with HIV/AIDS in Brazil during the years 2002–2006. Our Bayesian joint model presents considerable improvements in the estimation of survival times of Brazilian HIV/AIDS patients when compared with those obtained through a separate survival model and shows that the spatial risk of death is the same across the different Brazilian states. Copyright © 2016 John Wiley & Sons, Ltd.

Missing observations are common in cluster randomised trials. The problem is exacerbated when bivariate outcomes are modelled jointly, as the proportion of complete cases is often considerably smaller than the proportion with either outcome fully observed. Approaches to handling such missing data include complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster, and multilevel multiple imputation.

We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents.

We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design.

Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. Copyright © 2016 John Wiley & Sons, Ltd.

The increase in the incidence of obesity and chronic diseases and in their health care costs has raised the importance of diet quality on health policy agendas. The healthy eating index is an important measure of diet quality; it consists of 12 components derived from ratios of dependent variables with distributions that are hard to specify, measurement errors, and excessive zero observations that are difficult to model parametrically. Hypothesis testing with data of this nature poses challenges because widely used multiple comparison procedures such as Hotelling's *T*^{2} test and the Bonferroni correction may suffer from substantial loss of efficiency. We propose a marginal rank-based inverse normal transformation approach that normalizes the marginal distributions of the data before a multivariate test procedure is applied. Extensive simulation demonstrates that the proposed approach adequately controls the type I error rate and increases the power of the test, particularly for data from non-symmetric or heavy-tailed distributions. The methods are exemplified with data from a dietary intervention study for type 1 diabetic children. Published 2016. This article is a U.S. Government work and is in the public domain in the USA
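
A marginal rank-based inverse normal transformation maps each value to a normal quantile of its rank. The sketch below uses the offset c = 0.5, i.e. Φ⁻¹((rank − 0.5)/n), with average ranks for ties; this is one common variant (c = 3/8 gives Blom scores), and the exact form used in the paper may differ:

```python
from statistics import NormalDist

def inverse_normal_transform(x, c=0.5):
    """Map each value to Phi^{-1}((rank - c) / (n - 2c + 1)),
    assigning average ranks to ties."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]

# Skewed input becomes symmetric normal scores:
print(inverse_normal_transform([3.1, 0.0, 0.7]))  # roughly [0.967, -0.967, 0.0]
```

After transforming each margin this way, a standard multivariate test can be applied to the scores.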

Correct selection of prognostic biomarkers among multiple candidates is becoming increasingly challenging as the dimensionality of biological data grows. Minimizing the false discovery rate (FDR) is therefore of primary importance, while a low false negative rate (FNR) is a complementary measure. The lasso is a popular selection method in Cox regression, but its results depend heavily on the penalty parameter λ. Usually, λ is chosen by maximum cross-validated log-likelihood (max-*cvl*). However, this method often has a very high FDR. We review methods for a more conservative choice of λ. We propose an empirical extension of the *cvl* that adds a penalization term trading off goodness-of-fit against parsimony, leading to the selection of fewer biomarkers and, as we show, to a reduction of the FDR without a large increase in the FNR. We conducted a simulation study considering null and moderately sparse alternative scenarios and compared our approach with the standard lasso and 10 other competitors: Akaike information criterion (AIC), corrected AIC, Bayesian information criterion (BIC), extended BIC, Hannan and Quinn information criterion (HQIC), risk information criterion (RIC), one-standard-error rule, adaptive lasso, stability selection, and percentile lasso. Our extension achieved the best compromise across all scenarios between reducing the FDR and limiting the rise in the FNR, followed by the AIC, the RIC, and the adaptive lasso, which performed well in some settings. We illustrate the methods using gene expression data from 523 breast cancer patients. In conclusion, we propose applying our extension to the lasso whenever a stringent FDR with a limited FNR is targeted. Copyright © 2016 John Wiley & Sons, Ltd.
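
The abstract does not give the exact penalization term, but the general idea of penalizing the cross-validated log-likelihood by model size can be sketched generically. Here `candidates` holds, for each λ on a grid, its cross-validated log-likelihood and the number of selected biomarkers (all values below are made up for illustration; `penalty` is a hypothetical tuning constant, not the paper's):

```python
def select_lambda(candidates, penalty=1.0):
    """candidates: list of (lam, cvl, n_selected) tuples.
    Standard max-cvl picks the lam with the highest cvl; a penalized
    variant trades goodness-of-fit against model size, favouring
    sparser models and hence a lower FDR."""
    best_plain = max(candidates, key=lambda t: t[1])
    best_pen = max(candidates, key=lambda t: t[1] - penalty * t[2])
    return best_plain[0], best_pen[0]

grid = [(0.05, -100.0, 20), (0.10, -100.5, 8), (0.20, -103.0, 3)]
print(select_lambda(grid))  # (0.05, 0.2): the penalized rule picks fewer markers
```

The penalized choice selects a larger λ (a sparser model) whenever the gain in fit from extra biomarkers is small relative to the penalty.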

Controversy over non-reproducible published research reporting statistically significant results has produced substantial discussion in the literature. *p*-value calibration is a recently proposed procedure, addressing one aspect of this problem, that adjusts *p*-values to account for both random and systematic error. The method's validity rests on the key assumption that bias in an effect estimate is drawn from a normal distribution whose mean and variance can be correctly estimated. We investigated the method's control of type I and type II error rates using simulated and real-world data. Under mild violations of the underlying assumptions, control of the type I error rate can be conservative, while under more extreme departures it can be anti-conservative. The extent to which the assumption is violated in real-world data analyses is unknown, and barriers to testing its plausibility using historical data are discussed. Our studies of the type II error rate using simulated and real-world electronic health care data demonstrated that calibrating *p*-values can substantially increase the type II error rate. The use of calibrated *p*-values may reduce the number of false-positive results, but there will be a commensurate drop in the ability to detect a true safety or efficacy signal. While *p*-value calibration can sometimes offer advantages in controlling the type I error rate, its adoption for routine use in studies of real-world health care datasets is premature. Separate characterizations of random and systematic errors provide a richer context for evaluating uncertainty surrounding effect estimates. Copyright © 2016 John Wiley & Sons, Ltd.

Multiple imputation has become a popular approach for analyzing incomplete data. Many software packages are available to multiply impute the missing values and to analyze the resulting completed data sets. However, diagnostic tools to check the validity of the imputations are limited, and the majority of the currently available methods need considerable knowledge of the imputation model. In many practical settings, however, the imputer and the analyst may be different individuals or from different organizations, and the analyst's model may or may not be congenial to the model used by the imputer. This article develops and evaluates a set of graphical and numerical diagnostic tools for two practical purposes: (i) for an analyst to determine whether the imputations are reasonable under his/her model assumptions without actually knowing the imputation model assumptions; and (ii) for an imputer to fine-tune the imputation model by checking the key characteristics of the observed and imputed values. The tools are based on numerical and graphical comparisons of the distributions of the observed and imputed values conditional on the propensity of response. The methodology is illustrated using simulated data sets created under a variety of scenarios. The examples focus on continuous and binary variables, but the principles can be used to extend the methods to other types of variables. Copyright © 2016 John Wiley & Sons, Ltd.

A sequential design is proposed to test whether the accuracy of a binary diagnostic biomarker meets the minimal level of acceptance. The accuracy of a binary diagnostic biomarker is a linear combination of the marker's sensitivity and specificity. The objective of the sequential method is to minimize the maximum expected sample size under the null hypothesis that the marker's accuracy is below the minimal level of acceptance. The exact results of two-stage designs based on Youden's index and efficiency indicate that the maximum expected sample sizes are smaller than the sample sizes of the fixed designs. Exact methods are also developed for estimation, confidence intervals, and p-values concerning the proposed accuracy index upon termination of sequential testing. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.

Peers are often able to provide important additional information to supplement self-reported behavioral measures. The study motivating this work collected data on alcohol use in a social network formed by college students living in a freshman dormitory. By using two imperfect sources of information (self-reported and peer-reported alcohol consumption), rather than solely self-reports or peer-reports, we are able to gain insight into alcohol consumption on both the population and the individual level, as well as information on the discrepancy of individual peer-reports. We develop a novel Bayesian comparative calibration model for continuous, count, and binary outcomes that uses covariate information to characterize the joint distribution of both self- and peer-reports on the network for estimating peer-reporting discrepancies in network surveys, and apply this to the data for fully Bayesian inference. We use this model to understand the effects of covariates on both drinking behavior and peer-reporting discrepancies. Copyright © 2016 John Wiley & Sons, Ltd.

We present here an extension of the classic bivariate random effects meta-analysis for the log-transformed sensitivity and specificity that can be applied to two or more diagnostic tests. The advantage of this method is that a closed-form expression is derived for the calculation of the within-study covariances. The method allows the direct calculation of sensitivity and specificity, as well as the diagnostic odds ratio, the area under the curve, and the parameters of the summary receiver operating characteristic curve, along with the means for a formal comparison of these quantities for different tests. There is no need for individual patient data or for the simultaneous evaluation of both diagnostic tests in all studies. The method is simple and fast; it can be extended to several diagnostic tests and can be fitted in nearly all statistical packages. It was evaluated in simulations and applied in a meta-analysis comparing anti-cyclic citrullinated peptide antibody and rheumatoid factor for discriminating patients with rheumatoid arthritis, with encouraging results. Simulations suggest that the method is robust and more powerful than the standard bivariate approach that ignores the correlation between tests. Copyright © 2016 John Wiley & Sons, Ltd.
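
For the log-transformed sensitivity and specificity used here, the standard delta-method within-study variance of a log proportion is var(log p̂) ≈ (1 − p̂)/(n·p̂). The sketch below computes these univariate pieces from a single study's 2×2 counts; it does not reproduce the paper's closed-form within-study covariances between tests, which require the joint cross-tabulation:

```python
import math

def log_sens_spec(tp, fn, tn, fp):
    """Per-study log-transformed sensitivity and specificity with
    delta-method within-study variances, var(log p) ~ (1 - p)/(n * p)."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return (math.log(se), (1 - se) / ((tp + fn) * se),
            math.log(sp), (1 - sp) / ((tn + fp) * sp))

# A study with 90/100 diseased and 80/100 healthy classified correctly:
print(log_sens_spec(90, 10, 80, 20))
```

These study-level estimates and variances are the inputs that a bivariate (or multivariate) random-effects model would then pool across studies.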

No abstract is available for this article.

No abstract is available for this article.

No abstract is available for this article.

A full independent drug development programme to demonstrate efficacy may not be ethical and/or feasible in small populations such as paediatric populations or orphan indications. Different levels of extrapolation from a larger population to smaller target populations are widely used for supporting decisions in this situation. Guidance documents in drug regulation mention that weakening the statistical rigour for trials in the target population is an option for dealing with this problem. To this end, we propose clinical trial designs that make use of prior knowledge on efficacy for inference. We formulate a framework based on prior beliefs in order to investigate when the significance level for the test of the primary endpoint in confirmatory trials can be relaxed (and thus the sample size reduced) in the target population while controlling a certain posterior belief in effectiveness after rejection of the null hypothesis in the corresponding confirmatory statistical test. We show that point-priors may be used in the argumentation because, under certain constraints, they have favourable limiting properties among other types of priors. The crucial quantity to be elicited is the prior belief in the possibility of extrapolation from a larger population to the target population. We illustrate an existing decision tree for extrapolation to paediatric populations within our framework. © 2016 The Authors. *Statistics in Medicine* Published by John Wiley & Sons Ltd.

Vitamin D measurements are influenced by seasonal variation and specific assay used. Motivated by multicenter studies of associations of vitamin D with cancer, we formulated an analytic framework for matched case–control data that accounts for seasonal variation and calibrates to a reference assay. Calibration data were obtained from controls sampled within decile strata of the uncalibrated vitamin D values. Seasonal sine–cosine series were fit to control data. Practical findings included the following: (1) failure to adjust for season and calibrate increased variance, bias, and mean square error and (2) analysis of continuous vitamin D requires a variance adjustment for variation in the calibration estimate. An advantage of the continuous linear risk model is that results are independent of the reference date for seasonal adjustment. (3) For categorical risk models, procedures based on categorizing the seasonally adjusted and calibrated vitamin D have near nominal operating characteristics; estimates of log odds ratios are not robust to choice of seasonal reference date, however. Thus, public health recommendations based on categories of vitamin D should also define the time of year to which they refer. This work supports the use of simple methods for calibration and seasonal adjustment and is informing analytic approaches for the multicenter Vitamin D Pooling Project for Breast and Colorectal Cancer. Published 2016. This article has been contributed to by US Government employees and their work is in the public domain in the USA.

In stepped cluster designs, the intervention is introduced into some (or all) clusters at different times and persists until the end of the study. Instances include traditional parallel cluster designs and the more recent stepped-wedge designs. We consider the precision offered by such designs under mixed-effects models with fixed time and random subject and cluster effects (including interactions with time), and explore the optimal choice of uptake times. The results apply both to cross-sectional studies, where new subjects are observed at each time-point, and to longitudinal studies with repeat observations on the same subjects.

The efficiency of the design is expressed in terms of a ‘cluster-mean correlation’ which carries information about the dependency-structure of the data, and two design coefficients which reflect the pattern of uptake-times. In cross-sectional studies the cluster-mean correlation combines information about the cluster-size and the intra-cluster correlation coefficient. A formula is given for the ‘design effect’ in both cross-sectional and longitudinal studies.

An algorithm for optimising the choice of uptake times is described and specific results obtained for the best balanced stepped designs. In large studies we show that the best design is a hybrid mixture of parallel and stepped-wedge components, with the proportion of stepped wedge clusters equal to the cluster-mean correlation. The impact of prior uncertainty in the cluster-mean correlation is considered by simulation. Some specific hybrid designs are proposed for consideration when the cluster-mean correlation cannot be reliably estimated, using a minimax principle to ensure acceptable performance across the whole range of unknown values. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
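
The two key quantities above can be sketched concretely. In a cross-sectional study, the cluster-mean correlation combines the period cluster size m and the intra-cluster correlation ρ as r = mρ/(1 + (m − 1)ρ) (the standard reliability of a cluster-period mean under a random-cluster-effect model), and the large-study result states that the stepped-wedge fraction of a hybrid design should equal r. The rounding rule here is our own illustrative choice:

```python
def cluster_mean_correlation(m, icc):
    """Correlation between two period means of the same cluster in a
    cross-sectional study: r = m*icc / (1 + (m - 1)*icc)."""
    return m * icc / (1 + (m - 1) * icc)

def hybrid_split(n_clusters, m, icc):
    """Large-study rule: allocate a fraction equal to the cluster-mean
    correlation to the stepped-wedge component, the rest parallel."""
    r = cluster_mean_correlation(m, icc)
    sw = round(n_clusters * r)
    return sw, n_clusters - sw

print(cluster_mean_correlation(20, 0.05))  # ~0.513
print(hybrid_split(30, 20, 0.05))          # (15, 15)
```

With negligible clustering (r near 0) the rule reduces to a parallel design; with strong clustering (r near 1) it approaches a pure stepped wedge.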

Dynamic prediction uses longitudinal biomarkers for real-time prediction of an individual patient's prognosis. This is critical for patients with an incurable disease such as cancer. Biomarker trajectories are usually neither linear nor monotone, and they vary greatly across individuals. Therefore, it is difficult to fit them with parametric models. With this consideration, we propose an approach to dynamic prediction that does not need to model the biomarker trajectories. Instead, as a trade-off, we assume that the biomarker effects on the risk of disease recurrence are smooth functions over time. This approach turns out to be computationally easier. Simulation studies show that the proposed approach achieves stable estimation of biomarker effects over time, has good predictive performance, and is robust against model misspecification. It is a good compromise between the two major approaches, namely, (i) joint modeling of longitudinal and survival data and (ii) landmark analysis. The proposed method is applied to patients with chronic myeloid leukemia. At any time following their treatment with tyrosine kinase inhibitors, longitudinally measured *BCR-ABL* gene expression levels are used to predict the risk of disease progression. Copyright © 2016 John Wiley & Sons, Ltd.

We propose a prediction model for the cumulative incidence functions of competing risks, based on a logit link. Because of a concern about censoring potentially depending on time-varying covariates in our motivating human immunodeficiency virus (HIV) application, we describe an approach for estimating the parameters in the prediction models using inverse probability of censoring weighting under a missingness at random assumption. We then illustrate the application of this methodology to identify predictors of the competing outcomes of virologic failure (an efficacy outcome) and treatment-limiting adverse event (a safety outcome) among HIV-infected patients first starting antiretroviral treatment. Copyright © 2016 John Wiley & Sons, Ltd.

This paper considers the analysis of a repeat event outcome in clinical trials of chronic diseases in the context of dependent censoring (e.g. mortality). It has particular application in the context of recurrent heart failure hospitalisations in trials of heart failure. Semi-parametric joint frailty models (JFMs) simultaneously analyse recurrent heart failure hospitalisations and time to cardiovascular death, estimating distinct hazard ratios whilst individual-specific latent variables induce associations between the two processes. A simulation study was carried out to assess the suitability of the JFM versus marginal analyses of recurrent events and cardiovascular death using standard methods. Hazard ratios were consistently overestimated when marginal models were used, whilst the JFM produced good, well-estimated results. An application to the Candesartan in Heart failure: Assessment of Reduction in Mortality and morbidity programme was considered. The JFM gave unbiased estimates of treatment effects in the presence of dependent censoring. We advocate the use of the JFM for future trials that consider recurrent events as the primary outcome. © 2016 The Authors. *Statistics in Medicine* Published by John Wiley & Sons Ltd.

This paper introduces a method of surveillance using deviations from probabilistic forecasts. Realised observations are compared with probabilistic forecasts, and the “deviation” metric is based on low probability events. If an alert is declared, the algorithm continues to monitor until an all-clear is announced. Specifically, this article addresses the problem of syndromic surveillance for influenza (flu) with the intention of detecting outbreaks, due to new strains of viruses, over and above the normal seasonal pattern. The syndrome is hospital admissions for flu-like illness, and hence, the data are low counts. In accordance with the count properties of the observations, an integer-valued autoregressive process is used to model flu occurrences. Monte Carlo evidence suggests the method works well in stylised but somewhat realistic situations. An application to real flu data indicates that the ideas may have promise. The model estimated on a short run of training data did not declare false alarms when used with new observations deemed in control, ex post. The model easily detected the 2009 *H*1*N*1 outbreak. Copyright © 2016 John Wiley & Sons, Ltd.
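
The surveillance scheme described here — model low counts with an integer-valued autoregressive (INAR) process and flag realised observations that were low-probability events under the one-step forecast — can be sketched for the Poisson INAR(1) model X_t = α∘X_{t−1} + ε_t, where ∘ is binomial thinning and ε_t is Poisson(λ). This is an illustrative sketch under those assumptions, not the paper's estimated model or alarm rule:

```python
import math, random

def poisson(lam, rng):
    # Knuth's inversion method; fine for small lam
    l, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= l:
            return k
        k += 1

def simulate_inar1(n, alpha, lam, rng):
    """X_t = alpha o X_{t-1} + Poisson(lam), 'o' = binomial thinning."""
    x = [poisson(lam, rng)]
    for _ in range(n - 1):
        survivors = sum(rng.random() < alpha for _ in range(x[-1]))
        x.append(survivors + poisson(lam, rng))
    return x

def tail_prob(x_obs, x_prev, alpha, lam):
    """One-step forecast tail P(X_t >= x_obs | X_{t-1} = x_prev):
    convolve Binomial(x_prev, alpha) survivors with Poisson(lam)."""
    total = 0.0
    for s in range(x_prev + 1):
        pb = math.comb(x_prev, s) * alpha**s * (1 - alpha)**(x_prev - s)
        need = max(0, x_obs - s)
        pp = 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k)
                       for k in range(need))
        total += pb * pp
    return total

# Declare an alert when the realised count was a low-probability event:
print(tail_prob(12, 3, 0.4, 2.0) < 0.01)  # True: 12 admissions is very unlikely
```

In a running scheme, the alert would persist and monitoring continue until the forecast tail probabilities return to unremarkable levels.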

Q-learning is a regression-based approach that uses longitudinal data to construct dynamic treatment regimes: sequences of decision rules that use patients' baseline and time-varying characteristics to individualize future treatment decisions and optimize the final outcome. Constructing optimal dynamic regimes using Q-learning depends heavily on the assumption that the regression models at each decision point are correctly specified; yet model checking in the context of Q-learning has been largely overlooked in the current literature. In this article, we show that residual plots obtained from standard Q-learning models may fail to adequately check the quality of the model fit. We present a modified Q-learning procedure that accommodates residual analyses using standard tools, along with simulation studies showing the advantage of the proposed modification over standard Q-learning. We illustrate this new Q-learning approach using data collected from a sequential multiple assignment randomized trial of patients with schizophrenia. Copyright © 2016 John Wiley & Sons, Ltd.
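
The backward-induction logic of two-stage Q-learning can be illustrated with a deliberately simple tabular (saturated-regression) version: fit the stage-2 Q-function as conditional means of the outcome, form the pseudo-outcome by maximizing over stage-2 actions, then fit the stage-1 Q-function to the pseudo-outcome. Real applications (and the paper) use parametric regression models at each stage; everything below, including the toy data, is illustrative:

```python
from collections import defaultdict
from statistics import mean

def q_learning_two_stage(data):
    """data: list of (s1, a1, s2, a2, y) with discrete states and actions.
    Returns (stage-1 rule, stage-2 rule) mapping state -> best action."""
    q2 = defaultdict(list)
    for s1, a1, s2, a2, y in data:
        q2[(s2, a2)].append(y)
    q2 = {k: mean(v) for k, v in q2.items()}        # stage-2 Q: E[Y | s2, a2]

    def v2(state):                                   # max over stage-2 actions
        return max(q for (s, a), q in q2.items() if s == state)

    q1 = defaultdict(list)
    for s1, a1, s2, a2, y in data:
        q1[(s1, a1)].append(v2(s2))                  # pseudo-outcome
    q1 = {k: mean(v) for k, v in q1.items()}         # stage-1 Q

    rule1, rule2 = {}, {}
    for (s, a), q in q1.items():
        if s not in rule1 or q > q1[(s, rule1[s])]:
            rule1[s] = a
    for (s, a), q in q2.items():
        if s not in rule2 or q > q2[(s, rule2[s])]:
            rule2[s] = a
    return rule1, rule2

data = [('L', 0, 'L', 0, 1), ('L', 0, 'L', 1, 3),
        ('L', 1, 'H', 0, 2), ('L', 1, 'H', 1, 0),
        ('H', 0, 'L', 0, 1), ('H', 0, 'L', 1, 3),
        ('H', 1, 'H', 0, 5), ('H', 1, 'H', 1, 0)]
print(q_learning_two_stage(data))
```

The modification proposed in the paper concerns how residuals from such stage-wise fits are defined so that standard diagnostic tools remain valid; that part is not reproduced here.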

Integration of data of disparate types has become increasingly important to enhancing the power for new discoveries by combining complementary strengths of multiple types of data. One application is to uncover tumor subtypes in human cancer research in which multiple types of genomic data are integrated, including gene expression, DNA copy number, and DNA methylation data. In spite of their successes, existing approaches based on joint latent variable models require stringent distributional assumptions and may suffer from unbalanced scales (or units) of different types of data and non-scalability of the corresponding algorithms. In this paper, we propose an alternative based on integrative and regularized principal component analysis, which is distribution-free, computationally efficient, and robust against unbalanced scales. The new method performs dimension reduction simultaneously on multiple types of data, seeking data-adaptive sparsity and scaling. As a result, in addition to feature selection for each type of data, integrative clustering is achieved. Numerically, the proposed method compares favorably against its competitors in terms of accuracy (in identifying hidden clusters), computational efficiency, and robustness against unbalanced scales. In particular, compared with a popular method, the new method was competitive in identifying tumor subtypes associated with distinct patient survival patterns when applied to a combined analysis of DNA copy number, mRNA expression, and DNA methylation data in a glioblastoma multiforme study. Copyright © 2016 John Wiley & Sons, Ltd.

The receiver operating characteristic (ROC) curve is a popular technique with applications such as assessing the accuracy of a biomarker in discriminating between disease and non-disease groups. A common measure of the accuracy of a given diagnostic marker is the area under the ROC curve (AUC).

In contrast with the AUC, the partial area under the ROC curve (pAUC) considers only the area over a range of specificities (i.e., true negative rates), and it can often be clinically more relevant than examining the entire ROC curve. The pAUC is commonly estimated via a U-statistic with a plug-in sample quantile, making the estimator a non-traditional U-statistic. In this article, we propose an accurate and easy method to obtain the variance of the nonparametric pAUC estimator. The proposed method is easy to implement both for a single biomarker test and for the comparison of two correlated biomarkers, because it simply adapts the existing variance estimator of U-statistics. We show the accuracy and other advantages of the proposed variance estimation method by broadly comparing it with previously existing methods. Further, we develop an empirical likelihood inference method based on the proposed variance estimator through a simple implementation. In an application, we demonstrate that, depending on whether inference is based on the AUC or the pAUC, we can reach different conclusions about the prognostic ability of the same set of biomarkers. Copyright © 2016 John Wiley & Sons, Ltd.
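
The point estimator being studied — a U-statistic with a plug-in control quantile — can be sketched directly. For specificities at or above `spec_min`, only control values strictly above the `spec_min` sample quantile of the controls enter the Mann–Whitney kernel. This is an illustrative sketch of the estimator only; the paper's contribution is its variance and empirical likelihood inference, which are not reproduced here:

```python
def pauc(cases, controls, spec_min=0.8):
    """Nonparametric pAUC over specificities >= spec_min (0 < spec_min < 1),
    a U-statistic with the plug-in sample quantile of the controls; its
    maximum attainable value is 1 - spec_min."""
    m, n = len(cases), len(controls)
    thresh = sorted(controls)[int(spec_min * n) - 1]  # plug-in quantile
    total = sum((x > y) + 0.5 * (x == y)
                for x in cases for y in controls if y > thresh)
    return total / (m * n)

# Perfect separation above the cut-off attains the maximum 1 - spec_min:
print(pauc([6.0, 7.0, 8.0], [1.0, 2.0, 3.0, 4.0, 5.0]))  # 0.2
```

Because only a fraction of control comparisons contribute, the effective sample size is smaller than for the full AUC, which is why careful variance estimation matters here.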

Identification of the latency period and age-related susceptibility, if any, is an important aspect of assessing risks of environmental, nutritional, and occupational exposures. We consider estimation and inference for latency and age-related susceptibility in relative risk and excess risk models. We focus on likelihood-based methods for point and interval estimation of the latency period and age-related windows of susceptibility coupled with several commonly considered exposure metrics. The method is illustrated in a study of the timing of the effects of constituents of air pollution on mortality in the Nurses' Health Study. Copyright © 2016 John Wiley & Sons, Ltd.
