A new testing approach is described for improving statistical tests of independence in sets of tables of categorical (nominal or ordinal) variables stratified on one or more relevant factors. Common tests of independence that exploit the ordinality of one of the variables use a restricted-alternative approach. A different, relaxed-null method is presented. Specifically, the M-moment score tests and the correlation tests are introduced. Using multinomial-Poisson homogeneous modeling theory, it is shown that these tests are computationally and conceptually simple, and simulation results suggest that they can perform better than other common tests of conditional independence. To illustrate, the proposed tests are used to better understand type-specific human papillomavirus infection in relation to the intention to vaccinate. Copyright © 2016 John Wiley & Sons, Ltd.

We describe and evaluate a regression tree algorithm for finding subgroups with differential treatment effects in randomized trials with multivariate outcomes. The data may contain missing values in the outcomes and covariates, and the treatment variable is not limited to two levels. Simulation results show that the regression tree models have unbiased variable selection and that the estimates of subgroup treatment effects are approximately unbiased. A bootstrap calibration technique is proposed for constructing confidence intervals for the treatment effects. The method is illustrated with data from a longitudinal study comparing two diabetes drugs and a mammography screening trial comparing two treatments and a control. Copyright © 2016 John Wiley & Sons, Ltd.

Multi-state models generalize survival or duration time analysis to the estimation of transition-specific hazard rate functions for multiple transitions. When each of the transition-specific risk functions is parametrized with several distinct covariate effect coefficients, this leads to a model of potentially high dimension. To reduce the dimensionality of the parameter space and to obtain a clear picture of the underlying multi-state model structure, one can aim either at setting some coefficients to zero or at making coefficients for the same covariate but two different transitions equal. The first goal can be approached by penalizing the absolute values of the covariate coefficients, as in lasso regularization. If, instead, absolute differences between coefficients of the same covariate on different transitions are penalized, this leads to sparse competing risk relations within a multi-state model, that is, equality of covariate effect coefficients. In this paper, a new estimation approach providing sparse multi-state modelling by the aforementioned principles is established, based on the estimation of multi-state models with simultaneous penalization of the L_{1}-norm of the covariate coefficients and of their differences in a structured way. The new multi-state modelling approach is illustrated on peritoneal dialysis study data and implemented in the R package penMSM. Copyright © 2016 John Wiley & Sons, Ltd.
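The structured penalty described in this abstract combines an ordinary lasso term on the coefficients with fused differences between coefficients of the same covariate on different transitions. A minimal sketch of that objective term (illustrative names only, not the penMSM API):

```python
import numpy as np

def structured_l1_penalty(beta, pairs, lam1, lam2):
    """Sketch of the abstract's structured penalty:
    lam1 * sum_j |beta_j|  +  lam2 * sum_(j,k) |beta_j - beta_k|,
    where `pairs` lists index pairs (j, k) of coefficients for the same
    covariate on two different transitions."""
    beta = np.asarray(beta, dtype=float)
    lasso = lam1 * np.sum(np.abs(beta))                      # sparsity term
    fused = lam2 * sum(abs(beta[j] - beta[k]) for j, k in pairs)  # equality term
    return lasso + fused

# Two covariates on two transitions, coefficients stacked as
# [x1_trans1, x2_trans1, x1_trans2, x2_trans2]
beta = [0.5, 0.0, 0.5, -0.3]
pairs = [(0, 2), (1, 3)]  # same covariate across the two transitions
print(structured_l1_penalty(beta, pairs, lam1=1.0, lam2=1.0))  # lasso 1.3 + fused 0.3
```

Driving the fused part of the penalty to zero is what produces equal coefficients across transitions, i.e., the sparse competing risk relations described above.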

The receiver operating characteristic (ROC) curve is the most popular statistical tool for evaluating the discriminatory capability of a given continuous biomarker. The need to compare two correlated ROC curves arises when individuals are measured with two biomarkers, which induces paired and thus correlated measurements. Many researchers have focused on comparing two correlated ROC curves in terms of the area under the curve (AUC), which summarizes the overall performance of the marker. However, particular values of specificity may be of interest. We focus on comparing two correlated ROC curves at a given specificity level. We propose parametric approaches, transformations to normality, and nonparametric kernel-based approaches. Our methods can be straightforwardly extended for inference in terms of *ROC*^{−1}(*t*). This is of particular interest for comparing the accuracy of two correlated biomarkers at a given sensitivity level. Extensions also involve inference for the AUC and accommodating covariates. We evaluate the robustness of our techniques through simulations, compare them with other known approaches, and present a real-data application involving prostate cancer screening. Copyright © 2016 John Wiley & Sons, Ltd.
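In its simplest empirical form, the quantity being compared here is the sensitivity obtained when the decision threshold is set at a quantile of the control scores. A sketch under that assumption (hypothetical function name, not the authors' estimators, which are parametric or kernel-smoothed):

```python
import numpy as np

def sens_at_spec(cases, controls, spec):
    """Empirical ROC curve evaluated at a fixed specificity (sketch):
    the threshold is the `spec`-quantile of the control scores, and
    sensitivity is the fraction of case scores above it."""
    thr = np.quantile(np.asarray(controls, dtype=float), spec)
    return float(np.mean(np.asarray(cases, dtype=float) > thr))

# Paired data: the same subjects scored by two biomarkers, A and B
controls_a, cases_a = [1.0, 2.0, 3.0, 4.0, 5.0], [3.5, 4.5, 5.5, 6.5]
controls_b, cases_b = [1.0, 1.5, 2.0, 2.5, 3.0], [2.1, 2.2, 2.3, 6.0]

print(sens_at_spec(cases_a, controls_a, 0.8))  # 0.75
print(sens_at_spec(cases_b, controls_b, 0.8))  # 0.25
```

Because the same subjects supply both score columns, the two sensitivity estimates are correlated, which is precisely why the paired comparison needs the specialized inference the abstract proposes.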

We propose statistical definitions of the individual benefit of a medical or behavioral treatment and of the severity of a chronic illness. These definitions are used to develop a graphical method that statisticians and clinicians can use in the data analysis of clinical trials from the perspective of personalized medicine. The method focuses on assessing and comparing individual effects of treatments rather than average effects, and can be used with continuous and discrete responses, including dichotomous and count responses. The method is based on new developments in generalized linear mixed-effects models, which are introduced in this article. To illustrate, we present analyses of data from the Sequenced Treatment Alternatives to Relieve Depression clinical trial of sequences of treatments for depression and from a clinical trial of respiratory treatments. The estimation of individual benefits is also explained. Copyright © 2016 John Wiley & Sons, Ltd.

The focus of this paper is dietary intervention trials. We explore the statistical issues involved when the response variable, intake of a food or nutrient, is based on self-report data that are subject to inherent measurement error. There has been little work on handling error in this context. A particular feature of self-reported dietary intake data is that the error may be differential by intervention group. Measurement error methods require information on the nature of the errors in the self-report data. We assume that there is a calibration sub-study in which unbiased biomarker data are available. We outline methods for handling measurement error in this setting and use theory and simulations to investigate how self-report and biomarker data may be combined to estimate the intervention effect. Methods are illustrated using data from the Trial of Nonpharmacologic Intervention in the Elderly, in which the intervention was a sodium-lowering diet and the response was sodium intake. Simulations are used to investigate the methods under differential error, differing reliability of self-reports relative to biomarkers, and different proportions of individuals in the calibration sub-study. When the reliability of the self-report measurements is comparable with that of the biomarker, it is advantageous to use the self-report data in addition to the biomarker to estimate the intervention effect. If, however, the reliability of the self-report data is low compared with that of the biomarker, then there is little to be gained by using the self-report data. Our findings have important implications for the design of dietary intervention trials. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

Understanding the impact of concurrency, defined as overlapping sexual partnerships, on the spread of HIV within various communities has been complicated by difficulties in measuring concurrency. Retrospective sexual history data consisting of first and last dates of sexual intercourse for each previous and ongoing partnership are often obtained through cross-sectional surveys. Previous attempts to empirically estimate the magnitude and extent of concurrency among these surveyed populations have inadequately accounted for the dependence between partnerships and used only a snapshot of the available data. We introduce a joint multistate and point process model in which states are defined as the number of ongoing partnerships an individual is engaged in at a given time. Sexual partnerships starting and ending on the same date are referred to as one-offs and modeled as discrete events. The proposed method treats each individual's continuation in and transition through various numbers of ongoing partnerships as a separate stochastic process and allows the occurrence of one-offs to impact subsequent rates of partnership formation and dissolution. Estimators for the concurrent partnership distribution and mean sojourn times during which a person has *k* ongoing partnerships are presented. We demonstrate this modeling approach using epidemiological data collected from a sample of men having sex with men and seeking HIV testing at a Los Angeles clinic. Among this sample, the estimated point prevalence of concurrency was higher among men later diagnosed HIV positive. One-offs were associated with increased rates of subsequent partnership dissolution. Copyright © 2016 John Wiley & Sons, Ltd.

Epidermal nerve fibre (ENF) density and morphology are used to study small fibre involvement in diabetic, HIV, chemotherapy-induced and other neuropathies. ENF density and the summed length of ENFs per epidermal surface area are reduced, and ENFs may appear more clustered within the epidermis, in subjects with small fibre neuropathy compared with healthy subjects. Therefore, it is important to understand the spatial structure of ENFs. In this paper, we compare the ENF patterns of healthy subjects and subjects suffering from mild diabetic neuropathy. The study is based on suction skin blister specimens from the right foot of 32 healthy subjects and eight subjects with mild diabetic neuropathy. We regard the ENF entry point (the location where the trunk of a nerve enters the epidermis) and ENF end point (the termination of the nerve fibre) patterns as realizations of spatial point processes, and develop tools that can be used in the analysis and modelling of ENF patterns. We use spatial summary statistics and shift plots, and define a new tool, the reactive territory, to study the spatial patterns and to compare the patterns of the two groups. We also introduce a simple model for these data in order to understand the growth process of the nerve fibres. Copyright © 2016 John Wiley & Sons, Ltd.

The stepped wedge design is a unique clinical trial design that allows for the sequential introduction of an intervention. However, the statistical analysis is unclear when this design is applied to survival data. The time-dependent introduction of the intervention, in combination with terminal endpoints and interval censoring, makes the analysis more complicated. In this paper, a time-on-study scale discrete survival model was constructed. Simulations were conducted primarily to study the performance of our model for different settings of the stepped wedge design. Secondarily, we compared our approach to the continuous Cox proportional hazards model. The results show that the discrete survival model estimates the intervention effects unbiasedly. If the length of the censoring interval is increased, the precision of the estimates is decreased. Without left truncation and late entry, increasing the number of steps improves the precision of the estimates, whereas in combination with left truncation and late entry, increasing the number of steps decreases the precision. Given the same number of participants and clusters, a parallel group design has higher precision than a stepped wedge design. Copyright © 2016 John Wiley & Sons, Ltd.

When the studies in a meta-analysis include different sets of confounders, simple analyses can cause bias (omitting confounders that are missing in certain studies) or precision loss (omitting studies with incomplete confounders, i.e., a complete-case meta-analysis). To overcome these issues, a previous study proposed modelling the high correlation between partially and fully adjusted regression coefficient estimates in a bivariate meta-analysis. When multiple differently adjusted regression coefficient estimates are available, we propose exploiting such correlations in a graphical model. Compared with the previously suggested bivariate meta-analysis method, such a graphical model approach is likely to reduce the number of parameters in complex missing data settings by omitting the direct relationships between some of the estimates. We propose a structure-learning rule whose justification relies on the missingness pattern being monotone. This rule was tested using epidemiological data from a multi-centre survey. In the analysis of risk factors for early retirement, the method showed a smaller difference from a complete data odds ratio and greater precision than a commonly used complete-case meta-analysis. Three real-world applications with monotone missing patterns are provided, namely, the association between (1) the fibrinogen level and coronary heart disease, (2) the intima media thickness and vascular risk and (3) allergic asthma and depressive episodes. The proposed method allows for the inclusion of published summary data, which makes it particularly suitable for applications involving both microdata and summary data. Copyright © 2016 John Wiley & Sons, Ltd.

We propose a flexible cure rate model that accommodates different censoring distributions for the cured and uncured groups and also allows for some individuals to be observed as cured when their survival time exceeds a known threshold. We model the survival times for the uncured group using an accelerated failure time model with errors distributed according to the seminonparametric distribution, potentially truncated at a known threshold. We suggest a straightforward extension of the usual expectation–maximization algorithm approach for obtaining estimates in cure rate models to accommodate the cure threshold and dependent censoring. We additionally suggest a likelihood ratio test for testing for the presence of dependent censoring in the proposed cure rate model. We show through numerical studies that our model has desirable properties and leads to approximately unbiased parameter estimates in a variety of scenarios. To demonstrate how our method performs in practice, we analyze data from a bone marrow transplantation study and a liver transplant study. Copyright © 2016 John Wiley & Sons, Ltd.

We propose a two-step procedure to personalize drug dosage over time under the framework of a log-linear mixed-effect model. We model patients' heterogeneity using subject-specific random effects, which are treated as realizations of an unspecified stochastic process. In the first step, we extend the conditional quadratic inference function to estimate both fixed-effect coefficients and individual random effects on a longitudinal training data sample; in the second step, we propose an adaptive procedure to estimate new patients' random effects and provide dosage recommendations for new patients. An advantage of our approach is that we do not impose any distributional assumption when estimating random effects. Moreover, the new approach can accommodate more general time-varying covariates corresponding to random effects. We show in theory and numerical studies that the proposed method is more efficient than existing approaches, especially when covariates are time varying. In addition, a real data example from a clozapine study confirms that our two-step procedure leads to more accurate drug dosage recommendations. Copyright © 2016 John Wiley & Sons, Ltd.

This paper presents a new goodness-of-fit test for the ordered stereotype model used for an ordinal response variable. The proposed test is based on the well-known Hosmer–Lemeshow test and its version for the proportional odds regression model. The latter test statistic is calculated from a grouping scheme that assumes the levels of the ordinal response are equally spaced, which might not be true. One of the main advantages of the ordered stereotype model is that it allows us to determine a new, uneven spacing of the ordinal response categories, dictated by the data. The proposed test makes use of this new adjusted spacing to partition the data. A simulation study shows good performance of the proposed test under a variety of scenarios. Finally, results from applications to two examples are presented. Copyright © 2016 John Wiley & Sons, Ltd.

Unmeasured confounding is the fundamental obstacle to drawing causal conclusions about the impact of an intervention from observational data. Typically, covariates are measured to eliminate or ameliorate confounding, but they may be insufficient or unavailable. In the special setting where a transient intervention or exposure varies over time within each individual and confounding is time constant, a different tack is possible. The key idea is to condition on either the overall outcome or the proportion of time in the intervention. These measures can eliminate the unmeasured confounding either by conditioning or by use of a proxy covariate. We evaluate existing methods and develop new models from which causal conclusions can be drawn from such observational data even if no baseline covariates are measured. Our motivation for this work was to determine the causal effect of *Streptococcus* bacteria in the throat on pharyngitis (sore throat) in Indian schoolchildren. Using our models, we show that existing methods can be badly biased and that sick children who are rarely colonized have a high probability that the *Streptococcus* bacteria are causing their disease. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.

Unmeasured confounding remains an important problem in observational studies, including pharmacoepidemiological studies of large administrative databases. Several recently developed methods utilize smaller validation samples, with information on additional confounders, to control for confounders unmeasured in the main, larger database. However, up-to-date applications of these methods to survival analyses seem to be limited to propensity score calibration, which relies on a strong surrogacy assumption. We propose a new method, specifically designed for time-to-event analyses, which uses martingale residuals, in addition to measured covariates, to enhance imputation of the unmeasured confounders in the main database. The method is applicable for analyses with both time-invariant data and time-varying exposure/confounders. In simulations, our method consistently eliminated bias due to unmeasured confounding, regardless of surrogacy violation and other relevant design parameters, and almost always yielded lower mean squared errors than other methods applicable for survival analyses, outperforming propensity score calibration in several scenarios. We apply the method to a real-life pharmacoepidemiological database study of the association between glucocorticoid therapy and the risk of type II diabetes mellitus in patients with rheumatoid arthritis, with additional potential confounders available in an external validation sample. Compared with conventional analyses, which adjust only for confounders measured in the main database, our estimates suggest a considerably weaker association. Copyright © 2016 John Wiley & Sons, Ltd.

Typically, clusters and individuals in cluster randomized trials are allocated across treatment conditions in a balanced fashion. This is optimal under homogeneous costs and outcome variances. However, both the costs and the variances may be heterogeneous. An unbalanced allocation is then more efficient but impractical, as the outcome variance is unknown at the design stage of a study. A practical alternative to the balanced design could be a design that is optimal for known, and possibly heterogeneous, costs and homogeneous variances. However, when costs and variances are both heterogeneous, both designs suffer a loss of efficiency compared with the optimal design. Focusing on cluster randomized trials with a 2 × 2 design, we evaluate the relative efficiency of the balanced design and of the design optimal for heterogeneous costs and homogeneous variances, relative to the optimal design. We consider two heterogeneous scenarios (two treatment arms with small and two with large costs or variances; or one arm with small, two with intermediate, and one with large costs or variances) at each design level (cluster, individual, and both). Within these scenarios, we compute the relative efficiency of the two designs as a function of the extent of heterogeneity of the costs and variances, and of the congruence (the cheapest treatment has the smallest variance) or incongruence (the cheapest treatment has the largest variance) between costs and variances. We find that the design optimal for heterogeneous costs and homogeneous variances is generally more efficient than the balanced design, and we illustrate this theory on a trial that examines methods to reduce radiological referrals from general practices. Copyright © 2016 John Wiley & Sons, Ltd.
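The trade-off between costs and variances follows the classical cost-optimal allocation rule: the arm size scales with sd/sqrt(cost) when minimizing the variance of a mean contrast under a fixed budget. A minimal individual-level sketch of that rule (a simplification of the multilevel 2 × 2 setting studied in the paper):

```python
import math

def optimal_allocation(sds, costs, budget):
    """Classical cost-optimal allocation for comparing arm means (sketch):
    n_i proportional to sd_i / sqrt(cost_i), scaled to exhaust the budget.
    With equal sds and costs this reduces to the balanced design."""
    w = [s / math.sqrt(c) for s, c in zip(sds, costs)]
    # Total cost is sum(n_i * cost_i); solve the scale factor from the budget.
    scale = budget / sum(wi * ci for wi, ci in zip(w, costs))
    return [scale * wi for wi in w]

# Incongruent case: the cheap arm has the large variance and gets most subjects
print(optimal_allocation(sds=[2.0, 1.0], costs=[1.0, 4.0], budget=100.0))  # [50.0, 12.5]
```

The incongruent example illustrates why the balanced design loses efficiency: it splits the budget evenly, whereas the optimal rule concentrates observations where they are cheap and the outcome is noisy.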

Motivated by a genetic application, this paper addresses the problem of fitting regression models when the predictor is a proportion measured with error. While the problem of dealing with additive measurement error in fitting regression models has been extensively studied, the problem where the additive error is of a binomial nature has not been addressed. The measurement errors here are heteroscedastic for two reasons: dependence on the underlying true value, and changing sampling effort over observations. While some of the previously developed methods for treating additive measurement error with heteroscedasticity can be used in this setting, other methods need modification. A new version of simulation extrapolation is developed, and we also explore a variation on the standard regression calibration method that uses a beta-binomial model, based on the fact that the true value is a proportion. Although most of the methods introduced here can be used for fitting non-linear models, this paper focuses primarily on their use in fitting a linear model. While previous work has focused mainly on estimation of the coefficients, we will, with motivation from our example, also examine estimation of the variance around the regression line. In addressing these problems, we also discuss the appropriate manner in which to bootstrap, for both inference and bias assessment. The various methods are compared via simulation, and the results are illustrated using our motivating data, for which the goal is to relate the methylation rate of a blood sample to the age of the individual providing the sample. Copyright © 2016 John Wiley & Sons, Ltd.
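Simulation extrapolation (SIMEX) can be sketched in its standard additive-error form for intuition (the paper develops a new version for binomial error; the version below is the textbook additive variant): refit the naive estimator with progressively more noise added, then extrapolate the trend back to a hypothetical error-free level.

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_slope(x, y):
    x = x - x.mean()
    return float(np.dot(x, y) / np.dot(x, x))

def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), B=100):
    """Standard additive SIMEX (sketch): add extra noise at levels lambda,
    record the naive slope each time, then extrapolate a quadratic fit
    back to lambda = -1 (the error-free level)."""
    lams, slopes = [0.0], [fit_slope(w, y)]
    for lam in lambdas:
        s = np.mean([fit_slope(w + rng.normal(0.0, np.sqrt(lam) * sigma_u, w.size), y)
                     for _ in range(B)])
        lams.append(lam)
        slopes.append(s)
    coef = np.polyfit(lams, slopes, 2)    # quadratic extrapolant
    return float(np.polyval(coef, -1.0))  # evaluate at lambda = -1

n, sigma_u = 2000, 1.0
x = rng.normal(size=n)               # true covariate
w = x + rng.normal(0.0, sigma_u, n)  # error-prone observation
y = 2.0 * x                          # outcome; true slope is 2

naive = fit_slope(w, y)              # attenuated toward 0
corrected = simex_slope(w, y, sigma_u)
print(naive, corrected)              # the corrected slope moves back toward 2
```

The quadratic extrapolant only partially removes attenuation, which is one reason problem-specific variants, such as the binomial-error version developed in this paper, are needed.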

Optimal timing of initiating antiretroviral therapy has been a controversial topic in HIV research. Two highly publicized studies applied different analytical approaches, a dynamic marginal structural model and a multiple imputation method, to different observational databases and came up with different conclusions. Discrepancies between the two studies' results could be due to differences between patient populations, fundamental differences between statistical methods, or differences between implementation details. For example, the two studies adjusted for different covariates, compared different thresholds, and had different criteria for qualifying measurements. If both analytical approaches were applied to the same cohort holding technical details constant, would their results be similar? In this study, we applied both statistical approaches using observational data from 12,708 HIV-infected persons throughout the USA. We held technical details constant between the two methods and then repeated analyses varying technical details to understand what impact they had on findings. We also present results applying both approaches to simulated data. Results were similar, although not identical, when technical details were held constant between the two statistical methods. Confidence intervals for the dynamic marginal structural model tended to be wider than those from the imputation approach, although this may have been due in part to additional external data used in the imputation analysis. We also consider differences in the estimands, required data, and assumptions of the two statistical methods. Our study provides insights into assessing optimal dynamic treatment regimes in the context of starting antiretroviral therapy and in more general settings. Copyright © 2016 John Wiley & Sons, Ltd.

Concordance measures are frequently used for assessing the discriminative ability of risk prediction models. The interpretation of estimated concordance at external validation is difficult if the case-mix differs from the model development setting. We aimed to develop a concordance measure that provides insight into the influence of case-mix heterogeneity and is robust to censoring of time-to-event data.

We first derived a model-based concordance (*mbc*) measure that allows for quantification of the influence of case-mix heterogeneity on discriminative ability of proportional hazards and logistic regression models. This *mbc* can also be calculated including a regression slope that calibrates the predictions at external validation (*c-mbc*), hence assessing the influence of overall regression coefficient validity on discriminative ability. We derived variance formulas for both *mbc* and *c-mbc*. We compared the *mbc* and the *c-mbc* with commonly used concordance measures in a simulation study and in two external validation settings.

The *mbc* was asymptotically equivalent to a previously proposed resampling-based case-mix corrected c-index. The *c-mbc* remained stable at the true value with increasing proportions of censoring, while Harrell's c-index and to a lesser extent Uno's concordance measure increased unfavorably. Variance estimates of *mbc* and *c-mbc* were well in agreement with the simulated empirical variances.

We conclude that the *mbc* is an attractive closed-form measure that allows for a straightforward quantification of the expected change in a model's discriminative ability due to case-mix heterogeneity. The *c-mbc* also reflects regression coefficient validity and is a censoring-robust alternative for the c-index when the proportional hazards assumption holds. Copyright © 2016 John Wiley & Sons, Ltd.
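For reference, the c-index that *c-mbc* is positioned against reduces, for uncensored data, to the fraction of concordant subject pairs. A minimal sketch (censoring handling omitted; it is precisely under heavy censoring that the empirical c-index becomes unstable, as the simulations above show):

```python
from itertools import combinations

def c_index(time, score):
    """Concordance for uncensored survival data (sketch): the fraction of
    usable pairs in which the subject with the higher risk score fails
    first. Tied scores count as half-concordant."""
    conc = pairs = 0.0
    for (t1, s1), (t2, s2) in combinations(zip(time, score), 2):
        if t1 == t2:
            continue  # tied failure times are not usable pairs
        pairs += 1
        shorter, longer = (s1, s2) if t1 < t2 else (s2, s1)
        if shorter > longer:
            conc += 1.0
        elif shorter == longer:
            conc += 0.5
    return conc / pairs

# Risk score perfectly ordered against survival time gives c = 1
print(c_index([1, 2, 3, 4], [4.0, 3.0, 2.0, 1.0]))  # 1.0
```

A value of 0.5 corresponds to no discrimination; the model-based *mbc* replaces this empirical pair count with a closed-form expectation under the fitted model.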

The case-control study is a common design for assessing the association between genetic exposures and a disease phenotype. Though association with a given (case-control) phenotype is always of primary interest, there is often considerable interest in assessing relationships between genetic exposures and other (secondary) phenotypes. However, the case-control sample represents a biased sample from the general population. As a result, if this sampling framework is not correctly taken into account, analyses estimating the effect of exposures on secondary phenotypes can be biased, leading to incorrect inference. In this paper, we address this problem and propose a general approach for estimating and testing the population effect of a genetic variant on a secondary phenotype. Our approach is based on inverse probability weighted estimating equations, where the weights depend on genotype and the secondary phenotype. We show that, though slightly less efficient than a full likelihood-based analysis when the likelihood is correctly specified, it is substantially more robust to model misspecification and can outperform likelihood-based analysis, in terms of both validity and power, when the model is misspecified. We illustrate our approach with an application to a case-control study extracted from the Framingham Heart Study. Copyright © 2016 John Wiley & Sons, Ltd.
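The core of inverse probability weighting can be illustrated with the simplest estimator of this kind, a Horvitz-Thompson style weighted mean (a toy analogue of the paper's estimating equations, whose weights additionally depend on genotype and the secondary phenotype):

```python
import numpy as np

def ipw_mean(y, sampled, p_sample):
    """Inverse probability weighted mean (sketch): weight each sampled
    subject by 1 / P(selected) so that a biased sample, e.g. a
    case-control sample over-representing cases, recovers the population
    mean of a secondary phenotype y."""
    y, s, p = (np.asarray(a, dtype=float) for a in (y, sampled, p_sample))
    w = s / p  # zero weight for unsampled subjects
    return float(np.sum(w * y) / np.sum(w))

# Population of 4; subject 2 was not selected but had selection probability
# 0.5, so selected subject 1 (also p = 0.5) stands in for both of them.
print(ipw_mean([1.0, 1.0, 2.0, 2.0], [1, 0, 1, 1], [0.5, 0.5, 1.0, 1.0]))  # 1.5
```

The weighted sample mean here equals the true population mean of 1.5, whereas the unweighted mean of the three selected subjects would be biased toward the over-represented stratum.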

Measures of explained variation are useful in scientific research, as they quantify the amount of variation in an outcome variable of interest that is explained by one or more other variables. We develop such measures for correlated survival data, under the proportional hazards mixed-effects model. Because different approaches have been studied in the literature outside the classical linear regression model, we investigate three measures *R*^{2}, , and *ρ*^{2} that quantify three different population coefficients. We show that although the three population measures are not the same, they reflect similar amounts of variation explained by the predictors. Among the three measures, we show that *R*^{2}, which is the simplest to compute, is also consistent for the first population measure under the usual asymptotic scenario in which the number of clusters tends to infinity. The other two measures, on the other hand, additionally require that the cluster sizes be large. We study the properties of the measures both analytically and through simulation studies. We illustrate their use on a multi-center clinical trial and a recurrent events data set. Copyright © 2016 John Wiley & Sons, Ltd.

Recurrent event data are quite common in biomedical and epidemiological studies. A significant portion of these data also contain additional longitudinal information on surrogate markers. Previous studies have shown that popular methods using a Cox model with longitudinal outcomes as time-dependent covariates may lead to biased results, especially when longitudinal outcomes are measured with error. Hence, it is important to incorporate longitudinal information into the analysis properly. To achieve this, we model the correlation between longitudinal and recurrent event processes using latent random effect terms. We then propose a two-stage conditional estimating equation approach to model the rate function of recurrent event process conditioned on the observed longitudinal information. The performance of our proposed approach is evaluated through simulation. We also apply the approach to analyze cocaine addiction data collected by the University of Connecticut Health Center. The data include recurrent event information on cocaine relapse and longitudinal cocaine craving scores. Copyright © 2016 John Wiley & Sons, Ltd.

In this paper, we propose a Bayesian method to address misclassification errors in both independent and dependent variables. Our work is motivated by a study of women who have experienced new breast cancers on two separate occasions. We call both cancers *primary*, because the second is usually not considered as the result of a metastasis spreading from the first. Hormone receptors (HRs) are important in breast cancer biology, and it is well recognized that the measurement of HR status is subject to errors. This discordance in HR status for two primary breast cancers is of concern and might be an important reason for treatment failure. To sort out the information on *true* concordance rate from the observed concordance rate, we consider a logistic regression model for the association between the HR status of the two cancers and introduce the misclassification parameters (i.e., sensitivity and specificity) accounting for the misclassification in HR status. The prior distribution for sensitivity and specificity is based on how HR status is actually assessed in laboratory procedures. To account for the nonlinear effect of one error-free covariate, we introduce the *B*-spline terms in the logistic regression model. Our findings indicate that the true concordance rate of HR status between two primary cancers is greater than the observed value. Copyright © 2016 John Wiley & Sons, Ltd.

We consider a class of semiparametric marginal rate models for analyzing recurrent event data. In these models, both time-varying and time-free effects are present, and the estimation of time-varying effects may result in non-smooth regression functions. A typical approach for avoiding this problem and producing smooth functions is based on kernel methods. The traditional kernel-based approach, however, assumes a common degree of smoothness for all time-varying regression functions, which may result in suboptimal estimators if the functions have different levels of smoothness. In this paper, we extend the traditional approach by introducing different bandwidths for different regression functions. First, we establish the asymptotic properties of the suggested estimators. Next, we demonstrate the superiority of our proposed method using two finite-sample simulation studies. Finally, we illustrate our methodology by analyzing a real-world heart disease dataset. Copyright © 2016 John Wiley & Sons, Ltd.

Natural direct and indirect effects decompose the effect of a treatment into the part that is mediated by a covariate (the mediator) and the part that is not. Their definitions rely on the concept of outcomes under treatment with the mediator ‘set’ to its value without treatment. Typically, the mechanism through which the mediator is set to this value is left unspecified, and in many applications, it may be challenging to fix the mediator to particular values for each unit or patient. Moreover, how one sets the mediator may affect the distribution of the outcome. This article introduces ‘organic’ direct and indirect effects, which can be defined and estimated without relying on setting the mediator to specific values. Organic direct and indirect effects can be applied, for example, to estimate how much of the effect of some treatments for HIV/AIDS on mother-to-child transmission of HIV infection is mediated by the effect of the treatment on the HIV viral load in the blood of the mother. Copyright © 2016 John Wiley & Sons, Ltd.

A key objective of Phase II dose-finding studies in clinical drug development is to adequately characterize the dose–response relationship of a new drug. An important decision is then the choice of a suitable dose–response function to support dose selection for the subsequent Phase III studies. In this paper, we compare different approaches for model selection and model averaging using mathematical properties as well as simulations. We review and illustrate asymptotic properties of model selection criteria and investigate their behavior when changing the sample size but keeping the effect size constant. In a simulation study, we investigate how the various approaches perform in realistically chosen settings. Finally, the different methods are illustrated with a recently conducted Phase II dose-finding study in patients with chronic obstructive pulmonary disease. Copyright © 2016 John Wiley & Sons, Ltd.

Graphical approaches to multiple testing procedures are very flexible and easy to communicate to non-statisticians. The availability of the R package gMCP further propelled the application of graphical approaches in randomized clinical trials. Bretz *et al.* (*Biometrical Journal* 2011; 53:894–913) introduced a class of nonparametric testing procedures based on a Bonferroni mixture of weighted Simes tests for intersection hypotheses. Such approaches are extremely useful when the conditions for the Simes test are known to hold for hypotheses within certain subsets but may not hold for hypotheses across subsets. We describe the calculation of adjusted *p*-values for such approaches, which is currently not available in the gMCP package. We also optimize the generation of the weights for each intersection hypothesis in the closure of a graph-based multiple testing procedure, which can dramatically reduce the computing time for simulation-based power calculations. We show the validity of the Simes test for comparing several treatments with a control, performing noninferiority and superiority tests, or testing the treatment effect in an overall population and a subpopulation, for normal, binary, count, and time-to-event data. The proposed method is illustrated using an example for designing a confirmatory clinical trial. Copyright © 2016 John Wiley & Sons, Ltd.
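The weighted Simes test for a single intersection hypothesis, the building block of the mixture procedure above, can be sketched in a few lines. This is a generic Python illustration (the actual implementation discussed in the abstract is in R/gMCP); the function name and interface are hypothetical:

```python
def weighted_simes_reject(pvalues, weights, alpha=0.05):
    """Weighted Simes test for an intersection hypothesis.

    Sorting the p-values in ascending order, the intersection is
    rejected at level alpha if p_(j) <= alpha * (cumulative weight of
    the first j hypotheses) for some j.  Weights should sum to <= 1.
    """
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    cum_w = 0.0
    for j in order:
        cum_w += weights[j]
        if pvalues[j] <= alpha * cum_w:
            return True
    return False
```

For example, with equal weights (0.5, 0.5) the smallest of two p-values is compared against 0.025 and the larger against 0.05, reducing to the unweighted Simes test.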

The Health and Retirement Study was designed to evaluate changes in health and labor force participation during and after the transition from working to retirement. Every 2 years, participants provided information about their self-rated health (SRH), body mass index (BMI), smoking status, and other characteristics. Our goal was to assess the effects of smoking and gender on trajectories of change in BMI and SRH over time. Joint longitudinal analysis of outcome measures is preferable to separate analyses because it allows us to account for the correlation between the measures, to test the effects of predictors while controlling the type I error, and potentially to improve efficiency. However, because SRH is an ordinal measure while BMI is continuous, formulating a joint model and estimating its parameters are challenging. A joint correlated probit model allowed us to seamlessly account for the correlations between the measures over time. Established estimating procedures for such models are based on quasi-likelihood or numerical approximations that may be biased or fail to converge. Therefore, we propose a novel expectation–maximization algorithm for parameter estimation and a Monte Carlo bootstrap approach for standard error approximation. Expectation–maximization algorithms have been previously considered for combinations of binary and/or continuous repeated measures; however, modifications were needed to handle combinations of ordinal and continuous responses. A simulation study demonstrated that the algorithm converged and provided approximately unbiased estimates with sufficiently large sample sizes. In the Health and Retirement Study, male gender and smoking were independently associated with steeper deterioration in self-rated health and with lower average BMI. Copyright © 2016 John Wiley & Sons, Ltd.

In this paper, an approach to estimating the cumulative mean function of a history process with time-dependent covariates and a right-censored time-to-event variable is developed, combining joint modeling with an inverse probability weighting method. The consistency of the proposed estimator is derived. Theoretical analysis and simulation studies indicate that the estimator is well suited to practical applications because of its simplicity and accuracy. A real data set from a multicenter automatic defibrillator implantation trial is used to illustrate the proposed methodology. Copyright © 2016 John Wiley & Sons, Ltd.

Recent advances in human neuroimaging have shown that it is possible to accurately decode how the brain perceives information based only on non-invasive functional magnetic resonance imaging (fMRI) measurements of brain activity. Two commonly used statistical approaches, univariate analysis and multivariate pattern analysis, often lead to distinct patterns of selected voxels. One current debate in brain decoding concerns whether the brain's representation of sound categories is localized or distributed. We hypothesize that the distributed pattern of voxels selected by most multivariate pattern analysis models can be an artifact of the spatial correlation among voxels. Here, we propose a Bayesian spatially varying coefficient model, in which the spatial correlation is modeled through the variance-covariance matrix of the model coefficients. Combined with a proposed region selection strategy, we demonstrate that our approach is effective in identifying truly localized patterns of voxels while remaining robust enough to discover truly distributed patterns. In addition, we show that localized or clustered patterns can be artificially identified as distributed if the spatial correlation information in the fMRI data is not used properly. Copyright © 2016 John Wiley & Sons, Ltd.

Predicting the occurrence of an adverse event over time is an important issue in clinical medicine. Clinical prediction models and associated points-based risk-scoring systems are popular statistical methods for summarizing the relationship between a multivariable set of patient risk factors and the risk of the occurrence of an adverse event. Points-based risk-scoring systems are popular amongst physicians as they permit a rapid assessment of patient risk without the use of computers or other electronic devices. The use of such points-based risk-scoring systems facilitates evidence-based clinical decision making. There is a growing interest in cause-specific mortality and in non-fatal outcomes. However, when considering these types of outcomes, one must account for competing risks whose occurrence precludes the occurrence of the event of interest. We describe how points-based risk-scoring systems can be developed in the presence of competing events. We illustrate the application of these methods by developing risk-scoring systems for predicting cardiovascular mortality in patients hospitalized with acute myocardial infarction. Code in the R statistical programming language is provided for the implementation of the described methods. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
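The step of converting fitted regression coefficients into integer points can be sketched with a generic Sullivan-style scheme. This Python sketch is hypothetical (the abstract's methods are provided in R, and the competing-risks regression that supplies the coefficients is not reproduced here):

```python
def assign_points(betas, values, refs, points_per_unit):
    """Convert regression coefficients into an integer points system.

    `betas` are coefficients from a fitted risk model (e.g. a
    competing-risks regression), `values` are a patient's risk-factor
    values, `refs` are reference levels, and `points_per_unit` is the
    log-hazard increment worth one point (often the coefficient of a
    chosen base risk factor).  Each factor contributes
    round(beta * (value - ref) / points_per_unit) points.
    """
    return [round(b * (v - r) / points_per_unit)
            for b, v, r in zip(betas, values, refs)]
```

Summing the returned points gives the patient's total score, which is then mapped to an estimated event risk via a lookup table built from the fitted model.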

Continuous predictors are routinely encountered when developing a prognostic model. Investigators, who are often non-statisticians, must decide how to handle continuous predictors in their models. Categorising continuous measurements into two or more categories has been widely discredited, yet it is still frequently done because of its simplicity, because investigators are unaware of its potential impact or of suitable alternatives, or to facilitate model uptake. We examine the effect of three broad approaches for handling continuous predictors on the performance of a prognostic model: various methods of categorising predictors, modelling a linear relationship between the predictor and outcome, and modelling a nonlinear relationship using fractional polynomials or restricted cubic splines. We compare the performance (measured by the *c*-index, calibration and net benefit) of prognostic models built using each approach, evaluating them on data separate from that used to build them. We show that categorising continuous predictors produces models with poor predictive performance and poor clinical usefulness. Categorising continuous predictors is unnecessary, biologically implausible and inefficient and should not be used in prognostic model development. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

Several probability-based measures are introduced in order to assess the cost-effectiveness of a treatment. The basic measure consists of the probability that one treatment is less costly and more effective compared with another. Several variants of this measure are suggested as flexible options for cost-effectiveness analysis. The proposed measures are invariant under monotone transformations of the cost and effectiveness measures. Interval estimation of the proposed measures is investigated under a parametric model, assuming bivariate normality, and also non-parametrically. The delta method and a generalized pivotal quantity approach are both investigated under the bivariate normal model. A non-parametric U-statistics-based approach is also investigated for computing confidence intervals. Numerical results show that under bivariate normality, the solution based on generalized pivotal quantities exhibits accurate performance in terms of maintaining the coverage probability of the confidence interval. The non-parametric U-statistics-based solution is accurate for sample sizes that are at least moderately large. The results are illustrated using data from a clinical trial for prostate cancer therapy. Copyright © 2016 John Wiley & Sons, Ltd.
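The non-parametric point estimate of the basic measure — the fraction of between-arm pairs in which one treatment is both less costly and more effective — is straightforward to compute. A hypothetical Python sketch (the interval estimation requires the U-statistic asymptotics or pivotal quantities discussed in the abstract):

```python
def prob_dominates(cost_a, eff_a, cost_b, eff_b):
    """U-statistic estimate of P(A is less costly AND more effective
    than B): the proportion of all (patient from arm A, patient from
    arm B) pairs in which A's cost is lower and A's effect is higher.
    Invariant under monotone transformations of cost and effect."""
    wins = sum(ca < cb and ea > eb
               for ca, ea in zip(cost_a, eff_a)
               for cb, eb in zip(cost_b, eff_b))
    return wins / (len(cost_a) * len(cost_b))
```

A value near 0.5 indicates no cost-effectiveness advantage of either treatment, while values near 1 indicate that A dominates B in most pairwise comparisons.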

Sequentially administered, laboratory-based diagnostic tests or self-reported questionnaires are often used to determine the occurrence of a silent event. In this paper, we consider issues relevant to the design of studies aimed at estimating the association of one or more covariates with a non-recurring, time-to-event outcome that is observed using a repeatedly administered, error-prone diagnostic procedure. The problem is motivated by the Women's Health Initiative, in which diabetes incidence among the approximately 160,000 women is obtained from annually collected self-reported data. For settings of imperfect diagnostic tests or self-reports with known sensitivity and specificity, we evaluate the effects of various factors on the resulting power and sample size calculations and compare the relative efficiency of different study designs. The methods illustrated in this paper are readily implemented using our freely available R software package *icensmis*, which is available at the Comprehensive R Archive Network website. In the important special case of perfect diagnostic procedures, the resulting outcomes are interval-censored time-to-event data, so the proposed methods are also applicable to the design of studies in which a time-to-event outcome is interval censored. Copyright © 2016 John Wiley & Sons, Ltd.

A general utility-based testing methodology for design and conduct of randomized comparative clinical trials with categorical outcomes is presented. Numerical utilities of all elementary events are elicited to quantify their desirabilities. These numerical values are used to map the categorical outcome probability vector of each treatment to a mean utility, which is used as a one-dimensional criterion for constructing comparative tests. Bayesian tests are presented, including fixed sample and group sequential procedures, assuming Dirichlet-multinomial models for the priors and likelihoods. Guidelines are provided for establishing priors, eliciting utilities, and specifying hypotheses. Efficient posterior computation is discussed, and algorithms are provided for jointly calibrating test cutoffs and sample size to control overall type I error and achieve specified power. Asymptotic approximations for the power curve are used to initialize the algorithms. The methodology is applied to re-design a completed trial that compared two chemotherapy regimens for chronic lymphocytic leukemia, in which an ordinal efficacy outcome was dichotomized, and toxicity was ignored to construct the trial's design. The Bayesian tests also are illustrated by several types of categorical outcomes arising in common clinical settings. Freely available computer software for implementation is provided. Copyright © 2016 John Wiley & Sons, Ltd.

]]>The Expected Value of Perfect Partial Information (EVPPI) is a decision-theoretic measure of the ‘cost’ of parametric uncertainty in decision making used principally in health economic decision making. Despite this decision-theoretic grounding, the uptake of EVPPI calculations in practice has been slow. This is in part due to the prohibitive computational time required to estimate the EVPPI via Monte Carlo simulations. However, recent developments have demonstrated that the EVPPI can be estimated by non-parametric regression methods, which have significantly decreased the computation time required to approximate the EVPPI. Under certain circumstances, high-dimensional Gaussian Process (GP) regression is suggested, but this can still be prohibitively expensive. Applying fast computation methods developed in spatial statistics using Integrated Nested Laplace Approximations (INLA) and projecting from a high-dimensional into a low-dimensional input space allows us to decrease the computation time for fitting these high-dimensional GP, often substantially. We demonstrate that the EVPPI calculated using our method for GP regression is in line with the standard GP regression method and that despite the apparent methodological complexity of this new method, R functions are available in the package BCEA to implement it simply and efficiently. © 2016 The Authors. *Statistics in Medicine* Published by John Wiley & Sons Ltd.

The paradigm of oncology drug development is expanding from developing cytotoxic agents to developing biological or molecularly targeted agents (MTAs). Although it is common for the efficacy and toxicity of cytotoxic agents to increase monotonically with dose escalation, the efficacy of some MTAs may exhibit non-monotonic patterns in their dose–efficacy relationships. Many adaptive dose-finding approaches in the available literature account for the non-monotonic dose–efficacy behavior by including additional model parameters. In this study, we propose a novel adaptive dose-finding approach based on binary efficacy and toxicity outcomes in phase I trials for monotherapy using an MTA. We develop a dose–efficacy model, the parameters of which are allowed to change in the vicinity of the change point of the dose level, in order to consider the non-monotonic pattern of the dose–efficacy relationship. The change point is obtained as the dose that maximizes the log-likelihood of the assumed dose–efficacy and dose–toxicity models. The dose-finding algorithm is based on the weighted Mahalanobis distance, calculated using the posterior probabilities of efficacy and toxicity outcomes. We compare the operating characteristics between the proposed and existing methods and examine the sensitivity of the proposed method by simulation studies under various scenarios. Copyright © 2016 John Wiley & Sons, Ltd.

Assessing the magnitude of heterogeneity in a meta-analysis is important for determining the appropriateness of combining results. The most popular measure of heterogeneity, *I*^{2}, was derived under an assumption of homogeneity of the within-study variances, which is almost never true, and the alternative estimator, *R*_{I}, uses the harmonic mean to estimate the average of the within-study variances, which may also lead to bias. This paper thus presents a new measure for quantifying the extent to which the variance of the pooled random-effects estimator is due to between-studies variation, *R*_{b}, that overcomes the limitations of the previous approach. We show that this measure estimates the expected value of the proportion of total variance due to between-studies variation, and we present its point and interval estimators. The performance of all three heterogeneity measures is evaluated in an extensive simulation study. A negative bias for *R*_{b} was observed when the number of studies was very small and became negligible as the number of studies increased, while *R*_{I} and *I*^{2} showed a tendency to overestimate the impact of heterogeneity. The coverage of confidence intervals based upon *R*_{b} was good across different simulation scenarios but was substantially lower for *R*_{I} and *I*^{2}, especially for high values of heterogeneity and when a large number of studies were included in the meta-analysis. The proposed measure is implemented in a user-friendly function available for routine use in R and SAS. *R*_{b} will be useful in quantifying the magnitude of heterogeneity in meta-analysis and should supplement the *p*-value for the test of heterogeneity obtained from the *Q* test. Copyright © 2016 John Wiley & Sons, Ltd.
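For orientation, the most familiar of the heterogeneity measures discussed above, *I*^{2}, can be computed directly from the *Q* statistic. This is a generic Python sketch of the standard Higgins–Thompson calculation, not the authors' R/SAS function, and the new measure's own estimator is defined in the paper:

```python
def i_squared(effects, variances):
    """Higgins-Thompson I^2 from study effect estimates and their
    within-study variances.

    Q is the inverse-variance-weighted sum of squared deviations from
    the fixed-effect pooled estimate; I^2 is the proportion of Q in
    excess of its degrees of freedom (k - 1), truncated at zero.
    """
    w = [1.0 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    k = len(effects)
    return max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
```

For perfectly homogeneous effects Q = 0 and *I*^{2} = 0, while widely dispersed effects with small within-study variances drive *I*^{2} towards 1.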

When estimating causal effects, unmeasured confounding and model misspecification are both potential sources of bias. We propose a method to simultaneously address both issues in the form of a semi-parametric sensitivity analysis. In particular, our approach incorporates Bayesian Additive Regression Trees into a two-parameter sensitivity analysis strategy that assesses sensitivity of posterior distributions of treatment effects to choices of sensitivity parameters. This results in an easily interpretable framework for testing for the impact of an unmeasured confounder that also limits the number of modeling assumptions. We evaluate our approach in a large-scale simulation setting and with high blood pressure data taken from the Third National Health and Nutrition Examination Survey. The model is implemented as open-source software, integrated into the treatSens package for the R statistical programming language. © 2016 The Authors. *Statistics in Medicine* Published by John Wiley & Sons Ltd.

Marginal structural Cox models are used for quantifying marginal treatment effects on the outcome event hazard function. Such models are estimated using inverse probability of treatment and censoring (IPTC) weighting, which properly accounts for the impact of time-dependent confounders, avoiding conditioning on factors on the causal pathway. To estimate the IPTC weights, the treatment assignment mechanism is conventionally modeled in discrete time. While this is natural in situations where treatment information is recorded at scheduled follow-up visits, in other contexts, the events specifying the treatment history can be modeled in continuous time using the tools of event history analysis. This is particularly the case for treatment procedures, such as surgeries. In this paper, we propose a novel approach for flexible parametric estimation of continuous-time IPTC weights and illustrate it in assessing the relationship between metastasectomy and mortality in metastatic renal cell carcinoma patients. Copyright © 2016 John Wiley & Sons, Ltd.

Parametric mixed-effects models are useful in longitudinal data analysis when the sampling frequencies of a response variable and the associated covariates are the same. We propose a three-step estimation procedure using local polynomial smoothing and demonstrate it with data in which the variables to be assessed are repeatedly sampled at different frequencies within the same time frame. We first insert pseudo data for the less frequently sampled variable, based on the observed measurements, to create a new dataset. Then standard simple linear regressions are fitted at each time point to obtain raw estimates of the association between the dependent and independent variables. Last, local polynomial smoothing is applied to smooth the raw estimates. Rather than using a kernel function to assign weights, we use only analytical weights that reflect the importance of each raw estimate. The standard errors of the raw estimates and the distance between the pseudo data and the observed data serve as measures of the importance of the raw estimates. We applied the proposed method to a weight loss clinical trial, where it efficiently estimated the correlation between the inconsistently sampled longitudinal data. Our approach was also evaluated via simulations. The results showed that the proposed method works better when the residual variances of the standard linear regressions are small and the within-subject correlations are high. Using analytic weights instead of a kernel function during local polynomial smoothing is also important when the raw estimates have extreme values or the association between the dependent and independent variables is nonlinear. Copyright © 2016 John Wiley & Sons, Ltd.
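The analytic-weighting idea — down-weighting imprecise raw estimates by their inverse variances rather than by a kernel — can be illustrated with a simplified weighted fit. This Python sketch is hypothetical: it uses a single global linear fit rather than the paper's local polynomial, and it omits the pseudo-data distance component of the weights:

```python
def smooth_raw_estimates(times, raw, se):
    """Weighted linear fit of pointwise raw estimates.

    Each raw estimate is weighted by 1/se^2 (an analytic weight), so
    raw estimates with large standard errors contribute little to the
    smoothed curve.  Returns the fitted values at the input times.
    """
    w = [1.0 / s ** 2 for s in se]
    sw = sum(w)
    mt = sum(wi * t for wi, t in zip(w, times)) / sw
    mr = sum(wi * r for wi, r in zip(w, raw)) / sw
    slope = (sum(wi * (t - mt) * (r - mr)
                 for wi, t, r in zip(w, times, raw))
             / sum(wi * (t - mt) ** 2 for wi, t in zip(w, times)))
    return [mr + slope * (t - mt) for t in times]
```

In the local polynomial version, the same inverse-variance weights would multiply the usual least-squares criterion within each smoothing window.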

Recent success of immunotherapy and other targeted therapies in cancer treatment has led to an unprecedented surge in the number of novel therapeutic agents that need to be evaluated in clinical trials. Traditional phase II clinical trial designs were developed for evaluating one candidate treatment at a time and are thus not efficient for this task. We propose a Bayesian phase II platform design, the multi-candidate iterative design with adaptive selection (MIDAS), which allows investigators to continuously screen a large number of candidate agents in an efficient and seamless fashion. MIDAS consists of one control arm, which contains a standard therapy as the control, and several experimental arms, which contain the experimental agents. Patients are adaptively randomized to the control and experimental agents based on their estimated efficacy. During the trial, we adaptively drop inefficacious or overly toxic agents and ‘graduate’ the promising agents from the trial to the next stage of development. Whenever an experimental agent graduates or is dropped, the corresponding arm opens immediately for testing the next available new agent. Simulation studies show that MIDAS substantially outperforms the conventional approach. The proposed design yields a significantly higher probability of identifying the promising agents and dropping the futile agents. In addition, MIDAS requires only one master protocol, which streamlines trial conduct and substantially decreases the overhead burden. Copyright © 2016 John Wiley & Sons, Ltd.

When there are four or more treatments under comparison, a crossover design with a complete set of treatment-receipt sequences is of limited use for binary data because the number of sequences becomes too large. We may instead use a 4 × 4 Latin square to reduce the number of treatment-receipt sequences when comparing three experimental treatments with a control treatment. Under a distribution-free random effects logistic regression model, we develop simple procedures for testing non-equality between any of the three experimental treatments and the control treatment in a crossover trial with dichotomous responses. We further derive interval estimators in closed form for the relative effect between treatments. To evaluate the performance of these test procedures and interval estimators, we employ Monte Carlo simulation. We use data taken from a crossover trial using a 4 × 4 Latin-square design comparing four treatments to illustrate the use of the test procedures and interval estimators developed here. Copyright © 2016 John Wiley & Sons, Ltd.
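A 4 × 4 Latin square of treatment-receipt sequences, in which each treatment appears exactly once in every period and every sequence, is easy to generate. A generic Python sketch (the cyclic construction shown here is one common choice; carryover-balanced Williams squares, which the abstract does not discuss, are a further refinement):

```python
def latin_square(treatments):
    """Cyclic Latin square: row i is the treatment list rotated left
    by i positions, so each treatment occurs once per row (sequence)
    and once per column (period)."""
    k = len(treatments)
    return [[treatments[(i + j) % k] for j in range(k)]
            for i in range(k)]
```

With four treatments this yields four sequences instead of the 4! = 24 required for a complete set, which is the reduction the design above exploits.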

Seamless phase II/III clinical trials offer an efficient way to select an experimental treatment and perform confirmatory analysis within a single trial. However, combining the data from both stages in the final analysis can induce bias into the estimates of treatment effects. Methods for bias adjustment developed thus far have made restrictive assumptions about the design and selection rules followed. In order to address these shortcomings, we apply recent methodological advances to derive the uniformly minimum variance conditionally unbiased estimator for two-stage seamless phase II/III trials. Our framework allows for the precision of the treatment arm estimates to take arbitrary values, can be utilised for all treatments that are taken forward to phase III and is applicable when the decision to select or drop treatment arms is driven by a multiplicity-adjusted hypothesis testing procedure. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

We consider the non-inferiority (or equivalence) test of the odds ratio (OR) in a crossover study with binary outcomes to evaluate the treatment effects of two drugs. To solve this problem, Lui and Chang (2011) proposed both an asymptotic method and a conditional method based on a random effects logit model. Kenward and Jones (1987) proposed a likelihood ratio test (*LRT*_{M}) based on a log linear model. These existing methods are all subject to model misspecification. In this paper, we propose a likelihood ratio test (*LRT*) and a score test that are independent of model specification. Monte Carlo simulation studies show that, in scenarios considered in this paper, both the *LRT* and the score test have higher power than the asymptotic and conditional methods for the non-inferiority test; the *LRT*, score, and asymptotic methods have similar power, and they all have higher power than the conditional method for the equivalence test. When data can be well described by a log linear model, the *LRT*_{M} has the highest power among all five methods (*LRT*_{M}, *LRT*, score, asymptotic, and conditional) for both non-inferiority and equivalence tests. However, in scenarios for which a log linear model does not describe the data well, the *LRT*_{M} has the lowest power for the non-inferiority test and has inflated type I error rates for the equivalence test. We provide an example from a clinical trial that illustrates our methods. Copyright © 2016 John Wiley & Sons, Ltd.

Joint modelling of longitudinal and survival data is increasingly used in clinical trials on cancer. In prostate cancer, for example, these models make it possible to account for the link between longitudinal measures of prostate-specific antigen (PSA) and the time of clinical recurrence when studying the risk of relapse. In practice, multiple types of relapse may occur successively. Distinguishing these transitions between health states would allow one to evaluate, for example, how the PSA trajectory and classical covariates impact the risk of dying after a distant recurrence post-radiotherapy, or to predict the risk of one specific type of clinical recurrence post-radiotherapy from the PSA history. In this context, we present a joint model for a longitudinal process and a multi-state process, which is divided into two sub-models: a linear mixed sub-model for the longitudinal data and a multi-state sub-model with proportional hazards for the transition times, linked by a function of shared random effects. Parameters of this joint multi-state model are estimated within the maximum likelihood framework using an EM algorithm coupled with a quasi-Newton algorithm in case of slow convergence. It is implemented in R by combining and extending the mstate and JM packages. The estimation program is validated by simulations and applied to pooled data from two cohorts of men with localized prostate cancer. Thanks to the classical covariates available at baseline and the repeated PSA measurements, we are able to assess the biomarker's trajectory, define the risks of transitions between health states and quantify the impact of the PSA dynamics on each transition intensity. Copyright © 2016 John Wiley & Sons, Ltd.

Meta-analysis of individual participant data (IPD) is increasingly utilised to improve the estimation of treatment effects, particularly among different participant subgroups. An important concern in IPD meta-analysis relates to partially or completely missing outcomes for some studies, a problem exacerbated when interest is on multiple discrete and continuous outcomes. When leveraging information from incomplete correlated outcomes across studies, the fully observed outcomes may provide important information about the incompleteness of the other outcomes. In this paper, we compare two models for handling incomplete continuous and binary outcomes in IPD meta-analysis: a joint hierarchical model and a sequence of full conditional mixed models. We illustrate how these approaches incorporate the correlation across the multiple outcomes and the between-study heterogeneity when addressing the missing data. Simulations characterise the performance of the methods across a range of scenarios which differ according to the proportion and type of missingness, strength of correlation between outcomes and the number of studies. The joint model provided confidence interval coverage consistently closer to nominal levels and lower mean squared error compared with the fully conditional approach across the scenarios considered. Methods are illustrated in a meta-analysis of randomised controlled trials comparing the effectiveness of implantable cardioverter-defibrillator devices alone to implantable cardioverter-defibrillator combined with cardiac resynchronisation therapy for treating patients with chronic heart failure. © 2016 The Authors. *Statistics in Medicine* Published by John Wiley & Sons Ltd.

Generalized estimating equations (GEE) are often used for the marginal analysis of longitudinal data. Although much work has been performed to improve the validity of GEE for the analysis of data arising from small-sample studies, little attention has been given to power in such settings. Therefore, we propose a valid GEE approach to improve power in small-sample longitudinal study settings in which the temporal spacing of outcomes is the same for each subject. Specifically, we use a modified empirical sandwich covariance matrix estimator within correlation structure selection criteria and test statistics. Use of this estimator can improve the accuracy of selection criteria and increase the degrees of freedom to be used for inference. The resulting impacts on power are demonstrated via a simulation study and application example. Copyright © 2016 John Wiley & Sons, Ltd.

Adaptive, model-based, dose-finding methods, such as the continual reassessment method, have been shown to have good operating characteristics. One school of thought argues in favor of the use of parsimonious models, not modeling all aspects of the problem, and using a strict minimum number of parameters. In particular, for the standard situation of a single homogeneous group, it is common to appeal to a one-parameter model. Other authors argue for a more classical approach that models all aspects of the problem. Here, we show that increasing the dimension of the parameter space, in the context of adaptive dose-finding studies, is usually counterproductive and, rather than leading to improvements in operating characteristics, the added dimensionality is likely to result in difficulties. Among these are inconsistency of parameter estimates, lack of coherence in escalation or de-escalation, erratic behavior, getting stuck at the wrong level, and, in almost all cases, poorer performance in terms of correct identification of the targeted dose. Our conclusions are based on both theoretical results and simulations. Copyright © 2016 John Wiley & Sons, Ltd.

]]>Testing protocols in large-scale sexually transmitted disease screening applications often involve pooling biospecimens (e.g., blood, urine, and swabs) to lower costs and to increase the number of individuals who can be tested. With the recent development of assays that detect multiple diseases, it is now common to test biospecimen pools for multiple infections simultaneously. Recent work has developed an expectation–maximization algorithm to estimate the prevalence of two infections using a two-stage, Dorfman-type testing algorithm motivated by current screening practices for chlamydia and gonorrhea in the USA. In this article, we have the same goal but instead take a more flexible Bayesian approach. Doing so allows us to incorporate information about assay uncertainty during the testing process, which involves testing both pools and individuals, and also to update information as individuals are tested. Overall, our approach provides reliable inference for disease probabilities and accurately estimates assay sensitivity and specificity even when little or no information is provided in the prior distributions. We illustrate the performance of our estimation methods using simulation and by applying them to chlamydia and gonorrhea data collected in Nebraska. Copyright © 2016 John Wiley & Sons, Ltd.
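The pooling idea behind such screening protocols can be illustrated with a minimal sketch that is far simpler than the paper's Bayesian method: under a Dorfman design with a perfect assay and a single infection, the expected fraction of positive pools of size k identifies the individual prevalence p through q = 1 − (1 − p)^k, which can be inverted directly. The function below is ours, purely for illustration.

```python
# Minimal sketch: prevalence estimation from pooled tests under a Dorfman
# design, assuming a perfect assay and one infection (an illustrative
# simplification, not the Bayesian multi-infection method of the paper).

def pooled_prevalence(positive_pool_fraction, pool_size):
    """Invert q = 1 - (1 - p)**pool_size to recover individual prevalence p."""
    return 1.0 - (1.0 - positive_pool_fraction) ** (1.0 / pool_size)

# With true prevalence p = 0.1 and pools of size 3, the expected fraction
# of positive pools is q = 1 - 0.9**3 = 0.271, so the inversion recovers 0.1.
estimate = pooled_prevalence(0.271, 3)
```

Imperfect assay sensitivity and specificity, and the second testing stage, are exactly what make the full problem hard and motivate the Bayesian treatment.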

]]>Progression-free survival (PFS) is an increasingly popular end point in oncology clinical trials. A complete blinded independent central review (BICR) is often required by regulators in an attempt to reduce the bias in PFS assessment. In this paper, we propose a new methodology that uses a sample-based BICR as an audit tool to decide whether a complete BICR is needed. More specifically, we propose a new index, the differential risk, to measure the reading discordance pattern, and develop a corresponding hypothesis testing procedure to decide whether the bias in local evaluation is acceptable. Simulation results demonstrate that our new index is sensitive to changes in the discordance pattern, that type I error is well controlled in the hypothesis testing procedure, and that the calculated sample size provides the desired power. Copyright © 2016 John Wiley & Sons, Ltd.

]]>We present a general coregionalization framework for developing coregionalized multivariate Gaussian conditional autoregressive (cMCAR) models for Bayesian analysis of multivariate lattice data in general and multivariate disease mapping data in particular. This framework is inclusive of cMCARs that facilitate flexible modelling of spatially structured symmetric or asymmetric cross-variable local interactions, allowing a wide range of separable or non-separable covariance structures, and symmetric or asymmetric cross-covariances, to be modelled. We present a brief overview of established univariate Gaussian conditional autoregressive (CAR) models for univariate lattice data and develop coregionalized multivariate extensions. Classes of cMCARs are presented by formulating precision structures. The resulting conditional properties of the multivariate spatial models are established, which cast new light on cMCARs with richly structured covariances and cross-covariances of different spatial ranges. The related methods are illustrated via an in-depth Bayesian analysis of a Minnesota county-level cancer data set. We also bring a new dimension to the traditional enterprise of Bayesian disease mapping: estimating and mapping covariances and cross-covariances of the underlying disease risks. Maps of covariances and cross-covariances bring to light spatial characterizations of the cMCARs and inform on spatial risk associations between areas and diseases. Copyright © 2016 John Wiley & Sons, Ltd.

]]>In retrospective studies involving recurrent events, it is common to select individuals based on their event history up to the time of selection. In this case, the ascertained subjects might not be representative of the target population, and the analysis should take the selection mechanism into account. The purpose of this paper is twofold: first, to study what happens when the data analysis is not adjusted for the selection, and second, to propose a corrected analysis. Under the Andersen–Gill and shared frailty regression models, we show that the estimators of covariate effects, incidence, and frailty variance can be biased if the ascertainment is ignored, and that with a simple adjustment of the likelihood, unbiased and consistent estimators are obtained. The proposed method is assessed by a simulation study and is illustrated on a data set comprising recurrent pneumothoraces. Copyright © 2016 John Wiley & Sons, Ltd.

]]>In cluster randomized trials, the study units usually are not a simple random sample from some clearly defined target population. Instead, the target population tends to be hypothetical or ill-defined, and the selection of study units tends to be systematic, driven by logistical and practical considerations. As a result, the population average treatment effect (PATE) may be neither well defined nor easily interpretable. In contrast, the sample average treatment effect (SATE) is the mean difference in the counterfactual outcomes for the study units. The sample parameter is easily interpretable and arguably the most relevant when the study units are not sampled from some specific super-population of interest. Furthermore, in most settings, the sample parameter will be estimated more efficiently than the population parameter. To the best of our knowledge, this is the first paper to propose using targeted maximum likelihood estimation (TMLE) for estimation and inference of the sample effect in trials with and without pair-matching. We study the asymptotic and finite sample properties of the TMLE for the sample effect and provide a conservative variance estimator. Finite sample simulations illustrate the potential gains in precision and power from selecting the sample effect as the target of inference. This work is motivated by the Sustainable East Africa Research in Community Health (SEARCH) study, a pair-matched, community randomized trial to estimate the effect of population-based HIV testing and streamlined ART on the 5-year cumulative HIV incidence (NCT01864603). The proposed methodology will be used in the primary analysis for the SEARCH trial. Copyright © 2016 John Wiley & Sons, Ltd.

]]>Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first was proposed by Elkan and may be used for updating any machine learning approach that yields consistent probabilities, so-called probability machines. The second approach is a new strategy developed specifically for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
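In its simplest form, Elkan-style updating reduces to a prior correction of predicted probabilities when only the outcome prevalence differs between the training center and the target center. A minimal sketch of that correction follows; the function and variable names are ours, and this is only the prevalence-shift special case, not the full re-calibration machinery discussed in the paper.

```python
# Minimal sketch of a prior-correction update for a probability machine:
# re-weight a predicted probability p, calibrated at prevalence train_prev,
# to a target population with prevalence target_prev (illustrative only).

def prior_corrected(p, train_prev, target_prev):
    """Bayes-rule adjustment of a calibrated probability to a new prevalence."""
    num = p * target_prev / train_prev
    den = num + (1.0 - p) * (1.0 - target_prev) / (1.0 - train_prev)
    return num / den

# A prediction equal to the training prevalence maps to the target prevalence,
# and equal prevalences leave the prediction unchanged.
p_new = prior_corrected(0.3, train_prev=0.3, target_prev=0.1)
```

The logistic regression-based alternative described in the abstract instead refits intercepts (and possibly slopes) on the log-odds derived from the terminal nodes, which is what relaxes the stricter assumptions.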

]]>We propose a class of randomized trial designs aimed at gaining the advantages of wider generalizability and faster recruitment while mitigating the risks of including a population for which there is greater a priori uncertainty. We focus on testing null hypotheses for the overall population and a predefined subpopulation. Our designs have preplanned rules for modifying enrollment criteria based on data accrued at interim analyses. For example, enrollment can be restricted if the participants from a predefined subpopulation are not benefiting from the new treatment. Our designs have the following features: the multiple testing procedure fully leverages the correlation among statistics for different populations; the asymptotic familywise Type I error rate is strongly controlled; for outcomes that are binary or normally distributed, the decision rule and multiple testing procedure are functions of the data only through minimal sufficient statistics. Our designs incorporate standard group sequential boundaries for each population of interest; this may be helpful in communicating the designs, because many clinical investigators are familiar with such boundaries, which can be summarized succinctly in a single table or graph. We demonstrate these designs through simulations of a Phase III trial of a new treatment for stroke. User-friendly, free software implementing these designs is described. Copyright © 2016 John Wiley & Sons, Ltd.

]]>We have developed a method, called Meta-STEPP (subpopulation treatment effect pattern plot for meta-analysis), to explore treatment effect heterogeneity across covariate values in the meta-analysis setting for time-to-event data when the covariate of interest is continuous. Meta-STEPP forms overlapping subpopulations from individual patient data containing similar numbers of events with increasing covariate values, estimates subpopulation treatment effects using standard fixed-effects meta-analysis methodology, displays the estimated subpopulation treatment effect as a function of the covariate values, and provides a statistical test to detect possibly complex treatment-covariate interactions. Simulation studies show that this test has an adequate type-I error rate and adequate power when reasonable window sizes are chosen. When applied to eight breast cancer trials, Meta-STEPP suggests that chemotherapy is less effective for tumors with high estrogen receptor expression compared with those with low expression. Copyright © 2016 John Wiley & Sons, Ltd.

]]>Rate differences are an important effect measure in biostatistics and provide an alternative perspective to rate ratios. When the data are event counts observed during an exposure period, adjusted rate differences may be estimated using an identity-link Poisson generalised linear model, also known as additive Poisson regression. A problem with this approach is that the assumption of equality of mean and variance rarely holds in real data, which often show overdispersion. An additive negative binomial model is the natural alternative to account for this; however, standard model-fitting methods are often unable to cope with the constrained parameter space arising from the non-negativity restrictions of the additive model. In this paper, we propose a novel solution to this problem using a variant of the expectation–conditional maximisation–either algorithm. Our method provides a reliable way to fit an additive negative binomial regression model and also permits flexible generalisations using semi-parametric regression functions. We illustrate the method using a placebo-controlled clinical trial of fenofibrate treatment in patients with type II diabetes, where the outcome is the number of laser therapy courses administered to treat diabetic retinopathy. An R package is available that implements the proposed method. Copyright © 2016 John Wiley & Sons, Ltd.
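Before any model fitting, the unadjusted quantity that such additive models generalize is the simple two-group rate difference, which has a closed form together with a large-sample Wald interval. The sketch below is illustrative only: it treats counts as plain Poisson and includes neither covariate adjustment nor the overdispersion handling that motivates the paper's negative binomial approach.

```python
import math

# Minimal sketch: unadjusted rate difference between two groups with event
# counts observed over person-time, with a Wald interval assuming Poisson
# counts (no overdispersion adjustment; illustrative only).

def rate_difference(events1, time1, events0, time0, z=1.96):
    """Return (rate difference, lower, upper) for group 1 minus group 0."""
    rd = events1 / time1 - events0 / time0
    # Var(y/T) = y/T**2 under the Poisson assumption Var(y) = E(y).
    se = math.sqrt(events1 / time1**2 + events0 / time0**2)
    return rd, rd - z * se, rd + z * se

# 30 events over 1000 person-years versus 15 events over 1000 person-years.
rd, lower, upper = rate_difference(30, 1000.0, 15, 1000.0)
```

When the data are overdispersed, this Poisson-based interval is too narrow, which is precisely the problem the additive negative binomial model addresses.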

]]>We present a cancer phase I clinical trial design of a combination of two drugs with the goal of estimating the maximum tolerated dose curve in the two-dimensional Cartesian plane. A parametric model is used to describe the relationship between the doses of the two agents and the probability of dose limiting toxicity. The model is re-parameterized in terms of the probabilities of toxicities at dose combinations corresponding to the minimum and maximum doses available in the trial and the interaction parameter. Trial design proceeds using cohorts of two patients receiving doses according to univariate escalation with overdose control (EWOC), where at each stage of the trial, we seek a dose of one agent using the current posterior distribution of the MTD of this agent given the current dose of the other agent. The maximum tolerated dose curve is estimated as a function of Bayes estimates of the model parameters. Performance of the trial is studied by evaluating its design operating characteristics in terms of safety of the trial and percent of dose recommendation at dose combination neighborhoods around the true MTD curve and under model misspecifications for the true dose–toxicity relationship. The method is further extended to accommodate discrete dose combinations and compared with previous approaches under several scenarios. Copyright © 2016 John Wiley & Sons, Ltd.

In biomedical studies, it is often of interest to classify or predict a subject's disease status based on a variety of biomarker measurements. A commonly used classification criterion is based on the area under the receiver operating characteristic curve (AUC). Many methods have been proposed to optimize approximated empirical AUC criteria, but existing methods have two limitations. First, most are designed only to find the best linear combination of biomarkers, which may not perform well when there is strong nonlinearity in the data. Second, many existing linear combination methods use gradient-based algorithms to find the best marker combination, which often result in suboptimal local solutions. In this paper, we address these two problems by proposing a new kernel-based AUC optimization method called ramp AUC (RAUC). This method approximates the empirical AUC loss function with a ramp function and finds the best combination by a difference-of-convex-functions algorithm. We show that as a linear combination method, RAUC leads to a consistent and asymptotically normal estimator of the linear marker combination when the data are generated from a semiparametric generalized linear model, just as the smoothed AUC method does. Through simulation studies and real data examples, we demonstrate that RAUC outperforms the smoothed AUC method in finding the best linear marker combinations and can successfully capture nonlinear patterns in the data to achieve better classification performance. We illustrate our method with a dataset from a recent HIV vaccine trial. Copyright © 2016 John Wiley & Sons, Ltd.
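A ramp function is a hinge loss truncated at 1, so each misranked pair contributes a bounded amount to the surrogate loss. The sketch below shows one common parameterization and the resulting pairwise surrogate for 1 − empirical AUC; the exact form used by RAUC may differ, and the function names are ours.

```python
# Minimal sketch of a ramp loss and the pairwise AUC surrogate it induces
# (one common parameterization; not necessarily the exact RAUC objective).

def ramp(u):
    """Hinge loss max(0, 1 - u), truncated at 1: each pairwise comparison
    contributes at most 1, which bounds the influence of any single pair."""
    return min(1.0, max(0.0, 1.0 - u))

def ramp_auc_loss(pos_scores, neg_scores):
    """Average ramp loss over all (positive, negative) score pairs, a
    bounded surrogate for 1 - empirical AUC."""
    pairs = [(p, n) for p in pos_scores for n in neg_scores]
    return sum(ramp(p - n) for p, n in pairs) / len(pairs)

# Perfectly separated scores with margin >= 1 incur zero surrogate loss.
loss = ramp_auc_loss([2.0, 3.0], [0.0, 0.5])
```

The ramp is a difference of two convex hinge functions, which is what makes a difference-of-convex-functions algorithm applicable.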

]]>No abstract is available for this article.

]]>When efficacy of a treatment is measured by co-primary endpoints, efficacy is claimed only if for each endpoint an individual statistical test is significant at level *α*. While such a strategy controls the family-wise type I error rate (FWER), it is often strictly conservative and allows for no inference if not all null hypotheses can be rejected. In this paper, we investigate fallback tests, which are defined as uniform improvements of the classical test for co-primary endpoints. They reject whenever the classical test rejects but allow for inference also in settings where only a subset of endpoints show a significant effect. Similarly to the fallback tests for hierarchical testing procedures, these fallback tests for co-primary endpoints allow one to continue testing even if the primary objective of the trial was not met. We propose examples of fallback tests for two and three co-primary endpoints that control the FWER in the strong sense under the assumption of multivariate normal test statistics with arbitrary correlation matrix and investigate their power in a simulation study. The fallback procedures for co-primary endpoints are illustrated with a clinical trial in a rare disease and a diagnostic trial. © 2016 The Authors. *Statistics in Medicine* published by John Wiley & Sons Ltd.

Multiple endpoints are increasingly used in clinical trials. The significance of some of these trials is established if at least *r* null hypotheses are rejected among the *m* that are simultaneously tested. The usual approach in multiple hypothesis testing is to control the family-wise error rate, which is defined as the probability of making at least one type-I error. More recently, the *q*-generalized family-wise error rate has been introduced to control the probability of making at least *q* false rejections. For procedures controlling this global type-I error rate, we define a type-II *r*-generalized family-wise error rate, which is directly related to the *r*-power, defined as the probability of rejecting at least *r* false null hypotheses. We obtain very general power formulas that can be used to compute the sample size for single-step and step-wise procedures. These are implemented in our R package rPowerSampleSize, available on CRAN, making them directly accessible to end users. The computational complexities of the formulas are presented to give insight into computation time, and a comparison with a Monte Carlo strategy is also provided. We compute sample sizes for two clinical trials involving multiple endpoints: one designed to investigate the effectiveness of a drug against acute heart failure and the other the immunogenicity of a vaccine strategy against pneumococcus. Copyright © 2016 John Wiley & Sons, Ltd.

The investigation of treatment-covariate interactions is of considerable interest in the design and analysis of clinical trials. With potentially censored data observed, non-parametric and semi-parametric estimates and associated confidence intervals are proposed in this paper to quantify the interactions between the treatment and a binary covariate. In addition, the comparison of interactions between the treatment and two covariates is also considered. The proposed approaches are evaluated and compared by Monte Carlo simulations and applied to a real data set from a cancer clinical trial. Copyright © 2016 John Wiley & Sons, Ltd.

]]>We review and develop pointwise confidence intervals for a survival distribution with right-censored data from small samples, assuming only independence of censoring and survival. When there is no censoring, at each fixed time point the problem reduces to making inferences about a binomial parameter. In this case, the recently developed beta product confidence procedure (BPCP) gives the standard exact central binomial confidence intervals of Clopper and Pearson. Additionally, the BPCP has been shown to be exact (giving guaranteed coverage at the nominal level) for progressive type II censoring and has been shown by simulation to be exact for general independent right censoring. In this paper, we modify the BPCP to create a ‘mid-p’ version, which reduces to the mid-p confidence interval for a binomial parameter when there is no censoring. We perform extensive simulations on both the standard and mid-p BPCP using a method of moments implementation that enforces monotonicity over time. All simulated scenarios suggest that the standard BPCP is exact. The mid-p BPCP, like other mid-p confidence intervals, has simulated coverage closer to the nominal level but may not be exact for all survival times, especially in very low censoring scenarios. In contrast, two standard asymptotic approximations have lower than nominal coverage in many scenarios. This poor coverage is due to extreme inflation of the lower error rates, although the upper limits are very conservative. Both the standard and mid-p BPCP methods are available in our bpcp R package. Published 2016. This article is US Government work and is in the public domain in the USA.
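At a fixed time point with no censoring, the interval the BPCP reduces to is the exact central Clopper–Pearson binomial interval, computable from binomial tail probabilities alone. A minimal sketch via bisection follows (the BPCP itself, which handles censoring, lives in the authors' bpcp R package; the helper names here are ours).

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def _bisect(f, target, increasing, tol=1e-10):
    """Solve f(p) = target on [0, 1] for a monotone f by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact central (Clopper-Pearson) interval for x successes in n trials.
    Lower bound solves P(X >= x | p) = alpha/2; upper solves P(X <= x | p) = alpha/2."""
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper
```

A mid-p variant replaces the tail probabilities with tails that count the observed outcome at half weight, which is what narrows the interval toward nominal coverage.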

]]>Bayesian additive regression trees (BART) provide a framework for flexible nonparametric modeling of the relationships of covariates to outcomes. Recently, BART models have been shown to provide excellent predictive performance for both continuous and binary outcomes, often exceeding that of competing methods. Software is also readily available for such outcomes. In this article, we introduce modeling that extends the usefulness of BART in medical applications by addressing needs arising in survival analysis. Simulation studies of one-sample and two-sample scenarios, in comparison with long-standing traditional methods, establish the face validity of the new approach. We then demonstrate the model's ability to accommodate data from complex regression models with a simulation study of a nonproportional hazards scenario with crossing survival functions and survival function estimation in a scenario where hazards are multiplicatively modified by a highly nonlinear function of the covariates. Using data from a recently published study of patients undergoing hematopoietic stem cell transplantation, we illustrate the use and some advantages of the proposed method in medical investigations. Copyright © 2016 John Wiley & Sons, Ltd.

]]>We propose a Cochran–Armitage-type and a score-free global test that can be used to assess the presence of an association between a set of ordinally scaled covariates and an outcome variable within the range of generalized linear models. Both tests are developed within the framework of the well-established ‘global test’ methodology and as such are feasible in high-dimensional data situations under any correlation and enable adjustment for covariates. The Cochran–Armitage-type test, for which an intimate connection with the traditional score-based Cochran–Armitage test is shown, rests upon explicit assumptions on the distances between the covariates' ordered categories. The score-free test, in contrast, parametrizes these distances and thus keeps them flexible, rendering it ideally suited for covariates measured on an ordinal scale. As confirmed by means of simulations, the Cochran–Armitage-type test focuses its power on set-outcome relationships where the distances between the covariates' categories are equal or close to those assumed, whereas the score-free test spreads its power over a wide range of possible set-outcome relationships, putting more emphasis on monotonic than on non-monotonic ones. Based on the tests' power properties, it is discussed when to favour one or the other, and the practical merits of both of them are illustrated by an application in the field of rehabilitation medicine. Our proposed tests are implemented in the R package globaltest. Copyright © 2016 John Wiley & Sons, Ltd.
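The traditional score-based Cochran–Armitage test that these global tests generalize can be written in a few lines for the simplest setting: a binary outcome tabulated over ordered categories with user-supplied scores. The sketch below uses one common textbook form of the variance (without a finite-population correction); conventions vary across texts.

```python
import math

# Minimal sketch of the classical score-based Cochran-Armitage trend test
# for a binary outcome over ordered categories (one common textbook form).

def cochran_armitage_z(cases, controls, scores):
    """Z statistic for a linear trend in proportions across ordered categories."""
    n_i = [a + b for a, b in zip(cases, controls)]
    N = sum(n_i)
    pbar = sum(cases) / N
    # Score-weighted excess of observed cases over expectation per category.
    t = sum(s * (r - n * pbar) for s, r, n in zip(scores, cases, n_i))
    s_bar = sum(s * n for s, n in zip(scores, n_i)) / N
    var = pbar * (1 - pbar) * sum(
        n * (s - s_bar) ** 2 for s, n in zip(scores, n_i))
    return t / math.sqrt(var)

# Equal case proportions in every category give Z = 0; a monotone increase
# in the case proportion with the scores gives Z > 0.
z_flat = cochran_armitage_z([10, 20, 30], [10, 20, 30], [0, 1, 2])
z_trend = cochran_armitage_z([5, 10, 20], [20, 10, 5], [0, 1, 2])
```

The dependence of `t` on the chosen `scores` is exactly the sensitivity to assumed category distances that the score-free test in the abstract avoids by parametrizing them.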

]]>Zero-inflated count outcomes arise quite often in research and practice. Parametric models such as the zero-inflated Poisson and zero-inflated negative binomial are widely used to model such responses. Like most parametric models, they are quite sensitive to departures from assumed distributions. Recently, new approaches have been proposed to provide distribution-free, or semi-parametric, alternatives. These methods extend the generalized estimating equations to provide robust inference for population mixtures defined by zero-inflated count outcomes. In this paper, we propose methods to extend smoothly clipped absolute deviation (SCAD)-based variable selection methods to these new models. Variable selection has been gaining popularity in modern clinical research studies, as determining differential treatment effects of interventions for different subgroups has become the norm, rather than the exception, in the era of patient-centered outcome research. Such moderation analysis in general creates many explanatory variables in regression analysis, and the advantages of SCAD-based methods over their traditional counterparts make them a great choice for addressing this important and timely issue in clinical research. We illustrate the proposed approach with both simulated and real study data. Copyright © 2016 John Wiley & Sons, Ltd.
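The parametric baseline being relaxed here, the zero-inflated Poisson, has a simple two-part mass function: a structural zero with probability π, otherwise a Poisson draw. A minimal sketch (function names ours):

```python
import math

# Minimal sketch of the zero-inflated Poisson mass function:
# P(Y = 0) = pi + (1 - pi) * exp(-lam), and for k >= 1,
# P(Y = k) = (1 - pi) * Poisson(lam) mass at k.

def zip_pmf(k, pi, lam):
    """Probability of observing count k under a zero-inflated Poisson."""
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    return pi + (1 - pi) * poisson if k == 0 else (1 - pi) * poisson

# The masses sum to one, and zeros are inflated relative to plain Poisson.
total_mass = sum(zip_pmf(k, 0.2, 1.5) for k in range(50))
```

The sensitivity mentioned in the abstract comes from this rigid split: if the non-zero counts are overdispersed relative to Poisson, both parts of the fit are distorted, which is what the distribution-free alternatives avoid.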

]]>Epidemiologic studies suggest that maternal ambient air pollution exposure during critical periods of pregnancy is associated with adverse effects on fetal development. In this work, we introduce new methodology for identifying critical periods of development during post-conception gestational weeks 2–8 where elevated exposure to particulate matter less than 2.5 µm (PM_{2.5}) adversely impacts development of the heart. Past studies have focused on highly aggregated temporal levels of exposure during the pregnancy and have failed to account for anatomical similarities between the considered congenital heart defects. We introduce a multinomial probit model in the Bayesian setting that allows for joint identification of susceptible daily periods during pregnancy for 12 types of congenital heart defects with respect to maternal PM_{2.5} exposure. We apply the model to a dataset of mothers from the National Birth Defect Prevention Study where daily PM_{2.5} exposures from post-conception gestational weeks 2–8 are assigned using predictions from the downscaler pollution model. This approach is compared with two aggregated exposure models that define exposure as the average value over post-conception gestational weeks 2–8 and the average over individual weeks, respectively. Results suggest an association between increased PM_{2.5} exposure on post-conception gestational day 53 with the development of pulmonary valve stenosis and exposures during days 50 and 51 with tetralogy of Fallot. Significant associations are masked when using the aggregated exposure models. Simulation study results suggest that the findings are robust to multiple sources of error. The general form of the model allows for different exposures and health outcomes to be considered in future applications. Copyright © 2016 John Wiley & Sons, Ltd.

Converging evidence suggests that common complex diseases with the same or similar clinical manifestations could have different underlying genetic etiologies. While current research interests have shifted toward uncovering rare variants and structural variations predisposing to human diseases, the impact of heterogeneity in genetic studies of complex diseases has been largely overlooked. Most existing statistical methods assume that the disease under investigation has a homogeneous genetic effect and could, therefore, have low power if the disease undergoes heterogeneous pathophysiological and etiological processes. In this paper, we propose a heterogeneity-weighted U (HWU) method for association analyses that accounts for genetic heterogeneity. HWU can be applied to various types of phenotypes (e.g., binary and continuous) and is computationally efficient for high-dimensional genetic data. Through simulations, we showed the advantage of HWU when the underlying genetic etiology of a disease was heterogeneous, as well as the robustness of HWU against different model assumptions (e.g., phenotype distributions). Using HWU, we conducted a genome-wide analysis of nicotine dependence from the Study of Addiction: Genetics and Environments dataset. The genome-wide analysis of nearly one million genetic markers took 7 hours, identifying heterogeneous effects of two new genes (i.e., *CYP3A5* and *IKBKB*) on nicotine dependence. Copyright © 2016 John Wiley & Sons, Ltd.

Survival bias is difficult to detect and adjust for in case–control genetic association studies but can invalidate findings when only surviving cases are studied and survival is associated with the genetic variants under study. Here, we propose a design where one genotypes genetically informative family members (such as offspring, parents, and spouses) of deceased cases and incorporates that surrogate genetic information into a retrospective maximum likelihood analysis. We show that inclusion of genotype data from first-degree relatives permits unbiased estimation of genotype association parameters. We derive closed-form maximum likelihood estimates for association parameters under the widely used log-additive and dominant association models. Our proposed design not only permits a valid analysis but also enhances statistical power by augmenting the sample with indirectly studied individuals. Gene variants associated with poor prognosis can also be identified under this design. We provide simulation results to assess performance of the methods. Copyright © 2016 John Wiley & Sons, Ltd.

]]>The wide availability of multi-dimensional genomic data has spurred increasing interest in integrating multi-platform genomic data. Integrative analysis of the cancer genome landscape can potentially lead to a deeper understanding of the biological processes of cancer. We integrate epigenetic (DNA methylation and microRNA expression) and gene expression data from the tumor genome to delineate the association between different aspects of these biological processes and brain tumor survival. To model the association, we employ a flexible semiparametric linear transformation model that incorporates both the main effects of these genomic measures and their possible interactions. We develop variance component tests to examine different coordinated effects by testing various subsets of model coefficients for the genomic markers. A Monte Carlo perturbation procedure is constructed to approximate the null distribution of the proposed test statistics. We further propose omnibus testing procedures that synthesize information from fitting various parsimonious sub-models to improve power. Simulation results suggest that our proposed testing procedures maintain proper size under the null and outperform standard score tests. We further illustrate the utility of our procedure in two genomic analyses of survival of glioblastoma multiforme patients. Copyright © 2016 John Wiley & Sons, Ltd.

]]>