Outliers, measurement error, and missing data are common in longitudinal studies because of the way such data are collected. However, no existing method addresses all three of these issues simultaneously. This paper focuses on the robust estimation of partially linear models for longitudinal data with dropouts and measurement error. A new robust estimating equation, simultaneously tackling outliers, measurement error, and missingness, is proposed. The asymptotic properties of the proposed estimator are established under some regularity conditions. The proposed method is easy to implement in practice by utilizing existing standard generalized estimating equations algorithms. Comprehensive simulation studies show the strength of the proposed method in dealing with longitudinal data exhibiting all three features. Finally, the proposed method is applied to data from the Lifestyle Education for Activity and Nutrition study and confirms the effectiveness of the intervention in producing weight loss at month 9. Copyright © 2016 John Wiley & Sons, Ltd.

Odds ratio, risk ratio, and prevalence ratio are among the measures of association often reported in research studies quantifying the relationship between an independent variable and the outcome of interest. There has been much debate over which measure is appropriate to report depending on the study design. However, the effect on statistical significance of selecting a particular category of the outcome to be modeled and/or changing the reference group for categorical independent variables, although known, is rarely discussed or illustrated with examples in the literature. In this article, we provide an example of a cross-sectional study in which the prevalence ratio was chosen over the (prevalence) odds ratio and demonstrate the analytic implications of the choice of category to be modeled and the choice of reference level for independent variables. Copyright © 2016 John Wiley & Sons, Ltd.

Glaucoma is the second leading cause of blindness in the USA. A visual field test (perimetry) is used to sample and quantitate visual field function in preselected regions of the eye. These regions can be considered a spatial field with replications across independently measured individuals. At return visits, a new set of visual field measurements is obtained, producing a subject-specific spatio-temporal dataset. We develop a Bayesian hierarchical modeling framework to analyze these spatio-temporal datasets, both for individual-level spread and for aggregate population-level trends. Our model extends previous research, utilizing a dimension-reduction matrix and individual-specific latent variables. Human characteristics are incorporated into the model to help explain glaucoma progression. One beneficial product of our model is smoothed estimates for individuals. We also specify how progression rates are computed for monitoring purposes so that clinicians can track changes and predict forward in time. Copyright © 2016 John Wiley & Sons, Ltd.

We describe a mathematical decision model for identifying dynamic health policies for controlling epidemics. These dynamic policies aim to select the best current intervention based on accumulating epidemic data and the availability of resources at each decision point. We propose an algorithm to approximate dynamic policies that optimize the population's net health benefit, a performance measure which accounts for both health and monetary outcomes. We further illustrate how dynamic policies can be defined and optimized for the control of a novel viral pathogen, where a policy maker must decide (i) when to employ or lift a transmission-reducing intervention (e.g. school closure) and (ii) how to prioritize population members for vaccination when a limited quantity of vaccines first becomes available. Within the context of this application, we demonstrate that dynamic policies can produce higher net health benefit than more commonly described static policies that specify a pre-determined sequence of interventions to employ throughout epidemics. Copyright © 2016 John Wiley & Sons, Ltd.

The random effect Tobit model is a regression model that accommodates left- and/or right-censoring and within-cluster dependence of the outcome variable. Regression coefficients of random effect Tobit models have conditional interpretations on a constructed latent dependent variable and do not provide inference on overall exposure effects on the original outcome scale. The marginalized random effects model (MREM) permits likelihood-based estimation of marginal mean parameters for clustered data. For random effect Tobit models, we extend the MREM to marginalize over both the random effects and the normal-space and boundary components of the censored response to estimate overall exposure effects at the population level. We also extend the ‘*Average Predicted Value*’ method to estimate the model-predicted marginal means for each person under different exposure statuses in a designated reference group by integrating over the random effects, and then use the calculated difference to assess the overall exposure effect. Maximum likelihood estimation is carried out using a quasi-Newton optimization algorithm with Gauss–Hermite quadrature to approximate the integral over the random effects. We apply these methods to two real datasets. Copyright © 2016 John Wiley & Sons, Ltd.
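The Gauss–Hermite step described above can be illustrated with a generic sketch. This is not the authors' implementation; the function name and the scalar-random-effect setting are assumptions for illustration. The idea is that an expectation against a N(0, σ²) density is rewritten to match the e^(−x²) weight function of the Hermite rule.

```python
import numpy as np

def gh_expectation(g, sigma, n_nodes=20):
    """Approximate E[g(b)] for b ~ N(0, sigma^2) by Gauss-Hermite quadrature.

    The substitution b = sqrt(2) * sigma * x turns the normal density into
    the exp(-x^2) weight function of the Hermite rule, leaving a weighted
    sum over a small number of nodes in place of the integral.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    return np.sum(weights * g(np.sqrt(2.0) * sigma * nodes)) / np.sqrt(np.pi)

# Sanity check against a closed form: E[exp(b)] = exp(sigma^2 / 2).
approx = gh_expectation(np.exp, sigma=1.0)
```

In the marginalized Tobit setting, the integrand would be the conditional likelihood contribution rather than the exponential used here; the quadrature mechanics are identical.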

This paper conducts a Monte Carlo simulation study to evaluate the performance of multivariate matching methods that select a subset of treatment and control observations. The matching methods studied are the widely used nearest neighbor matching with propensity score calipers and two more recently proposed methods: optimal matching of an optimally chosen subset and optimal cardinality matching. The main findings are: (i) covariate balance, as measured by differences in means, variance ratios, Kolmogorov–Smirnov distances, and cross-match test statistics, is better with cardinality matching because by construction it satisfies balance requirements; (ii) for given levels of covariate balance, the matched samples are larger with cardinality matching than with the other methods; (iii) in terms of covariate distances, optimal subset matching performs best; (iv) treatment effect estimates from cardinality matching have lower root-mean-square errors, provided strong balance requirements are imposed, specifically fine balance or strength-*k* balance plus close mean balance. In standard practice, a matched sample is considered balanced if the absolute differences in means of the covariates across treatment groups are smaller than 0.1 standard deviations. However, the simulation results suggest that stronger forms of balance should be pursued in order to remove systematic biases due to observed covariates when a difference-in-means treatment effect estimator is used. In particular, if the true outcome model is additive, then marginal distributions should be balanced, and if the true outcome model is additive with interactions, then low-dimensional joint distributions should be balanced. Copyright © 2016 John Wiley & Sons, Ltd.
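The 0.1-standard-deviation convention mentioned above is easy to state precisely. A minimal sketch follows; the function name and the equal-weight pooled-SD definition are illustrative assumptions, as software packages differ in the exact pooling used.

```python
import numpy as np

def standardized_mean_difference(x_treat, x_control):
    """Absolute difference in covariate means, in pooled-SD units."""
    pooled_sd = np.sqrt((np.var(x_treat, ddof=1) + np.var(x_control, ddof=1)) / 2.0)
    return abs(np.mean(x_treat) - np.mean(x_control)) / pooled_sd

# A covariate shifted by 0.1 units against an SD of about 1.58 passes the
# conventional 0.1 cut-off.
x_t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x_c = x_t + 0.1
balanced = standardized_mean_difference(x_t, x_c) < 0.1
```

The paper's point is that passing this marginal check alone may leave systematic bias: joint distributions and higher-order balance can still differ between groups.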

Diagnostic evaluation of suspected breast cancer due to abnormal screening mammography results is common, creates anxiety for women, and is costly for the healthcare system. Timely evaluation with minimal use of additional diagnostic testing is key to minimizing anxiety and cost. In this paper, we propose a Bayesian semi-Markov model that allows for flexible, semi-parametric specification of the sojourn time distributions and apply our model to an investigation of the process of diagnostic evaluation with mammography, ultrasound, and biopsy following an abnormal screening mammogram. We also investigate risk factors associated with the sojourn time between diagnostic tests. By utilizing semi-Markov processes, we expand on prior work that described the timing of the first test received, providing additional information such as the mean time to resolution and the proportion of women with unresolved mammograms after 90 days among women requiring different sequences of tests in order to reach a definitive diagnosis. Overall, we found that older women were more likely to have unresolved positive mammograms after 90 days. Differences in the timing of imaging evaluation and biopsy were generally on the order of days and thus did not represent clinically important differences in diagnostic delay. Copyright © 2016 John Wiley & Sons, Ltd.

We explore several approaches for imputing partially observed covariates when the outcome of interest is a censored event time and when there is an underlying subset of the population that will never experience the event of interest. We call these subjects ‘cured’, and we consider the case where the data are modeled using a Cox proportional hazards (CPH) mixture cure model. We study covariate imputation approaches using fully conditional specification. We derive the exact conditional distribution and suggest a sampling scheme for imputing partially observed covariates in the CPH cure model setting. We also propose several approximations to the exact distribution that are simpler and more convenient to use for imputation. A simulation study demonstrates that the proposed imputation approaches outperform existing imputation approaches for survival data without a cure fraction in terms of bias in estimating CPH cure model parameters. We apply our multiple imputation techniques to a study of patients with head and neck cancer. Copyright © 2016 John Wiley & Sons, Ltd.

A new nonparametric approach is developed to estimate time-dependent accuracy measure curves, defined on cumulative cases and dynamic controls, for censored survival data. Based on an estimable survival process, the main aim of this study is to reduce the finite-sample biases of nearest neighbor estimators. The asymptotic variances of some retrospective accuracy measure estimators are further reduced by applying a smoothing technique to the underlying process of a marker. Practically feasible and theoretically valid procedures are proposed for bandwidth selection in the presented estimators. In addition, the proposed methodology can be reasonably extended to accommodate stratified survival data and survival data with multiple markers. As shown in the simulations, our new estimators outperform the nearest neighbor and inverse censoring weighted estimators. Data from the AIDS Clinical Trials Group study 175 and an angiographic coronary artery disease study are also used to illustrate the proposed methodology. Copyright © 2016 John Wiley & Sons, Ltd.

Defining the scientific questions of interest in a clinical trial is crucial to align its planning, design, conduct, analysis, and interpretation. However, practical experience shows that specific choices in the statistical analysis often blur the scientific question, either in part or even completely, resulting in misalignment among trial objectives, conduct, and analysis, and in confusion in interpretation. The need for more clarity was highlighted by the Steering Committee of the International Council for Harmonisation (ICH) in 2014, which endorsed a Concept Paper with the goal of developing new regulatory guidance, suggested to be an addendum to ICH guideline E9. Triggered by these developments, we elaborate in this paper on what the relevant questions in drug development are and how they fit with the current practice of intention-to-treat analyses. To this end, we consider the perspectives of patients, physicians, regulators, and payers. We argue that despite the different backgrounds and motivations of the various stakeholders, they all have similar interests in what the clinical trial estimands should be. Broadly, these can be classified into estimands addressing (a) lack of adherence to treatment due to different reasons and (b) efficacy and safety profiles when patients are, in fact, able to adhere to the treatment for its intended duration. We conclude that disentangling adherence to treatment from the efficacy and safety of treatment in patients who adhere leads to a transparent and clinically meaningful assessment of treatment risks and benefits. We touch upon statistical considerations and offer a discussion of additional implications. Copyright © 2016 John Wiley & Sons, Ltd.

Toxicity probability interval designs have received increasing attention as dose-finding methods in recent years. In this study, we compared the two-stage, likelihood-based continual reassessment method (CRM), the modified toxicity probability interval (mTPI) design, and the Bayesian optimal interval design (BOIN) in order to evaluate each method's performance in dose selection for phase I trials. We use several summary measures to compare the performance of these methods, including the percentage of correct selection (PCS) of the true maximum tolerated dose (MTD), the allocation of patients to doses at and around the true MTD, and an accuracy index. This index is an efficiency measure that describes the entire distribution of MTD selection and patient allocation by taking into account the distance between the true probability of toxicity at each dose level and the target toxicity rate. The simulation study considered a broad range of toxicity curves and various sample sizes. When considering PCS, we found that CRM outperformed the two competing methods in most scenarios, followed by BOIN and then mTPI. We observed a similar trend when considering the accuracy index for dose allocation, where CRM most often outperformed both mTPI and BOIN. These trends were more pronounced as the number of dose levels increased. Copyright © 2016 John Wiley & Sons, Ltd.

In randomized trials, adjustment for measured covariates during the analysis can reduce variance and increase power. To avoid misleading inference, the analysis plan must be pre-specified. However, it is often unclear *a priori* which baseline covariates (if any) should be adjusted for in the analysis. Consider, for example, the Sustainable East Africa Research in Community Health (SEARCH) trial for HIV prevention and treatment. There are 16 matched pairs of communities and many potential adjustment variables, including region, HIV prevalence, male circumcision coverage, and measures of community-level viral load. In this paper, we propose a rigorous procedure to data-adaptively select the adjustment set, which maximizes the efficiency of the analysis. Specifically, we use cross-validation to select from a pre-specified library the candidate targeted maximum likelihood estimator (TMLE) that minimizes the estimated variance. For further gains in precision, we also propose a collaborative procedure for estimating the known exposure mechanism. Our small sample simulations demonstrate the promise of the methodology to maximize study power, while maintaining nominal confidence interval coverage. We show how our procedure can be tailored to the scientific question (intervention effect for the study sample vs. for the target population) and study design (pair-matched or not). Copyright © 2016 John Wiley & Sons, Ltd.

The net reclassification improvement (NRI) is an attractively simple summary measure quantifying the improvement in performance from adding new risk marker(s) to a prediction model. Originally proposed for settings with well-established classification thresholds, it quickly extended into applications with no thresholds in common use. Here we explore the properties of the NRI at event rate. We express this NRI as a difference in performance measures for the new versus the old model and show that the quantity underlying this difference is related to several global as well as decision-analytic measures of model performance. It maximizes the relative utility (standardized net benefit) across all classification thresholds and can be viewed as the Kolmogorov–Smirnov distance between the distributions of risk among events and non-events. It can be expressed as a special case of the continuous NRI, measuring reclassification from the ‘null’ model with no predictors. It is also a criterion based on the value of information and quantifies the reduction in expected regret for a given regret function, casting the NRI at event rate as a measure of incremental reduction in expected regret. More generally, we find it informative to present plots of standardized net benefit/relative utility for the new versus the old model across the domain of classification thresholds. These plots can then be summarized by their maximum values, and the increment in model performance can be described by the NRI at event rate. We provide theoretical examples and a clinical application on the evaluation of prognostic biomarkers for atrial fibrillation. Copyright © 2016 John Wiley & Sons, Ltd.
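The Kolmogorov–Smirnov interpretation admits a direct empirical computation. The sketch below is illustrative only and is not the paper's estimator of the NRI at event rate, which involves further ingredients such as the event-rate threshold itself; it simply shows the underlying distance between the risk distributions of events and non-events.

```python
import numpy as np

def ks_distance(risk_events, risk_nonevents):
    """Max over thresholds t of |TPR(t) - FPR(t)|, i.e. the Kolmogorov-Smirnov
    distance between predicted-risk distributions in events and non-events."""
    thresholds = np.unique(np.concatenate([risk_events, risk_nonevents]))
    tpr = np.array([np.mean(risk_events >= t) for t in thresholds])
    fpr = np.array([np.mean(risk_nonevents >= t) for t in thresholds])
    return float(np.max(np.abs(tpr - fpr)))

# Perfectly separated risks give the maximal distance of 1.
d = ks_distance(np.array([0.8, 0.9]), np.array([0.1, 0.2]))
```

Because TPR minus FPR at a threshold is (up to scaling) the standardized net benefit at that threshold, maximizing it over thresholds links this distance to the relative-utility view described above.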

We examine two approaches based on the posterior predictive distribution for assessing model fit with incomplete longitudinal data. The first approach assesses fit based on replicated complete data, as advocated by Gelman *et al*. (2005). The second approach assesses fit based on replicated observed data. Differences between the two approaches are discussed, and an analytic example is presented for illustration and understanding. Both checks are applied to data from a longitudinal clinical trial. The proposed checks can easily be implemented in standard software such as (Win)BUGS/JAGS/Stan. Copyright © 2016 John Wiley & Sons, Ltd.

Propensity score methods, such as subclassification, are a common approach to control for confounding when estimating causal effects in non-randomized studies. Propensity score subclassification groups individuals into subclasses based on their propensity score values. Effect estimates are obtained within each subclass and then combined by weighting by the proportion of observations in each subclass. Combining subclass-specific estimates by weighting by the inverse variance is a promising alternative approach; a similar strategy is used in meta-analysis for its efficiency. We use simulation to compare performance of each of the two methods while varying (i) the number of subclasses, (ii) extent of propensity score overlap between the treatment and control groups (i.e., positivity), (iii) incorporation of survey weighting, and (iv) presence of heterogeneous treatment effects across subclasses. Both methods perform well in the absence of positivity violations and with a constant treatment effect with weighting by the inverse variance performing slightly better. Weighting by the proportion in subclass performs better in the presence of heterogeneous treatment effects across subclasses. We apply these methods to an illustrative example estimating the effect of living in a disadvantaged neighborhood on risk of past-year anxiety and depressive disorders among U.S. urban adolescents. This example entails practical positivity violations but no evidence of treatment effect heterogeneity. In this case, weighting by the inverse variance when combining across propensity score subclasses results in more efficient estimates that ultimately change inference. Copyright © 2016 John Wiley & Sons, Ltd.
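The two ways of combining subclass-specific estimates compared above can be written down in a few lines. The function names are hypothetical; this is a sketch of the two weighting schemes, not the authors' simulation code.

```python
import numpy as np

def combine_by_proportion(estimates, n_per_subclass):
    """Weight subclass estimates by the share of observations in each subclass."""
    w = np.asarray(n_per_subclass, dtype=float)
    return float(np.sum((w / w.sum()) * np.asarray(estimates)))

def combine_by_inverse_variance(estimates, variances):
    """Weight subclass estimates by precision, as in fixed-effect meta-analysis."""
    w = 1.0 / np.asarray(variances, dtype=float)
    return float(np.sum((w / w.sum()) * np.asarray(estimates)))

# With heterogeneous subclass effects and unequal variances, the two
# combinations disagree: precision weighting pulls toward the better-estimated
# subclass.
by_prop = combine_by_proportion([1.0, 3.0], [50, 50])        # 2.0
by_iv = combine_by_inverse_variance([1.0, 3.0], [1.0, 3.0])  # 1.5
```

This disagreement is exactly the trade-off the simulation study examines: inverse-variance weighting is more efficient under a constant effect, while proportion weighting better targets the population average when effects vary across subclasses.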

Longitudinal data allow direct comparison of the change in patient outcomes associated with treatment or exposure. Frequently, several longitudinal measures are collected that either reflect a common underlying health status or characterize processes that are influenced in a similar way by covariates such as exposure or demographic characteristics. Statistical methods that can combine multivariate response variables into common measures of covariate effects have been proposed in the literature. Current methods for characterizing the relationship between covariates and the rate of change in multivariate outcomes are limited to select models. For example, ‘accelerated time’ methods have been developed that assume covariates rescale time in longitudinal models for disease progression. In this manuscript, we detail an alternative multivariate model formulation that directly structures longitudinal rates of change and permits a common covariate effect across multiple outcomes. We detail maximum likelihood estimation for a multivariate longitudinal mixed model. We show via asymptotic calculations the potential gain in power that may be achieved by a common analysis of multiple outcomes. We apply the proposed methods to the analysis of a trivariate outcome for infant growth and compare rates of change for HIV-infected and uninfected infants. Copyright © 2016 John Wiley & Sons, Ltd.

Potential predictive biomarkers are often measured on a continuous scale, but in practice, a threshold value dividing the patient population into biomarker ‘positive’ and ‘negative’ groups is desirable. Early phase clinical trials are increasingly using biomarkers for patient selection, but at this stage, it is likely that little will be known about the relationship between the biomarker and the treatment outcome. We describe a single-arm trial design with adaptive enrichment, which can increase the power to demonstrate efficacy within a patient subpopulation, the parameters of which are also estimated. Our design enables us to learn about the biomarker and optimally adjust the threshold during the study, using a combination of generalised linear modelling and Bayesian prediction. At the final analysis, a binomial exact test is carried out, allowing the hypothesis that ‘no population subset exists in which the novel treatment has a desirable response rate’ to be tested. Through extensive simulations, we show increased power over fixed-threshold methods in many situations without inflating the type-I error rate. We also show that estimates of the threshold, which defines the population subset, are unbiased and often more precise than those from fixed-threshold studies. We provide an example of the method applied (retrospectively) to publicly available data from a study of the use of tamoxifen after mastectomy by the German Breast Study Group, where progesterone receptor is the biomarker of interest. © 2016 The Authors. *Statistics in Medicine* published by John Wiley & Sons Ltd.

Missing responses are a common problem in medical, social, and economic studies. When responses are missing at random, a complete-case analysis may result in bias. A popular bias-correction method is the inverse probability weighting approach proposed by Horvitz and Thompson. To improve efficiency, Robins *et al.* proposed an augmented inverse probability weighting (AIPW) method. The AIPW estimator has a double-robustness property and achieves the semiparametric efficiency lower bound when the regression model and the propensity score model are both correctly specified. In this paper, we introduce an empirical likelihood-based estimator as an alternative to that of Qin and Zhang (2007). Our proposed estimator is also doubly robust and locally efficient. Simulation results show that the proposed estimator has better performance when the propensity score is correctly modeled. Moreover, the proposed method can be applied to the estimation of the average treatment effect in observational causal inference. Finally, we apply our method to an observational study of smoking, using data from the Cardiovascular Outcomes in Renal Atherosclerotic Lesions clinical trial. Copyright © 2016 John Wiley & Sons, Ltd.
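The double robustness mentioned above can be made concrete with the standard AIPW estimator of a mean under missingness at random. This is a textbook sketch with hypothetical argument names, not the empirical likelihood estimator this paper proposes.

```python
import numpy as np

def aipw_mean(y, observed, pi_hat, m_hat):
    """Augmented IPW estimate of E[Y] under missingness at random.

    observed : 1 where y is seen, 0 where it is missing
    pi_hat   : estimated probability of being observed given covariates
    m_hat    : outcome-regression prediction of y given covariates
    Consistent if either pi_hat or m_hat is correctly specified.
    """
    y = np.where(observed == 1, y, 0.0)  # missing slots never enter the first term
    return float(np.mean(observed * y / pi_hat
                         - (observed - pi_hat) / pi_hat * m_hat))

# If the outcome regression is exact, the estimator recovers the full-data mean
# regardless of pi_hat (here an arbitrary 0.5 for every unit).
y = np.array([1.0, 2.0, 3.0, 4.0])
est = aipw_mean(y, np.array([1, 0, 1, 1]), np.full(4, 0.5), m_hat=y)
```

The augmentation term has mean zero whenever pi_hat is correct, which is the other half of the double-robustness property.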

An important goal across the biomedical and social sciences is the quantification of the role of intermediate factors in explaining how an exposure exerts an effect on an outcome. Selection bias has the potential to severely undermine the validity of inferences about direct and indirect causal effects in observational as well as randomized studies. The phenomenon of selection may arise through several mechanisms, and we focus here on instances of missing data. We study the sign and magnitude of selection bias in the estimates of direct and indirect effects when data on any of the factors involved in the analysis are either missing at random or not missing at random. Under some simplifying assumptions, the bias formulae lead to nonparametric sensitivity analyses. These sensitivity analyses can be applied to causal effects on the risk difference and risk ratio scales irrespective of the estimation approach employed. To incorporate parametric assumptions, we also develop a sensitivity analysis for selection bias in mediation analysis in the spirit of the expectation–maximization algorithm. The approaches are applied to data from a health disparities study investigating the role of stage at diagnosis in racial disparities in colorectal cancer survival. Copyright © 2016 John Wiley & Sons, Ltd.

In longitudinal studies, it is sometimes of interest to estimate the distribution of the time a longitudinal process takes to traverse from one threshold to another. For example, the distribution of the time it takes a woman's cervical dilation to progress from 3 to 4 cm can aid obstetricians' decision-making as to whether a stalled labor should be allowed to proceed or stopped in favor of other options. Researchers often treat this type of data structure as interval censored and employ traditional survival analysis methods. However, the traditional interval censoring approaches are inefficient in that they do not use all of the available data. In this paper, we propose utilizing a longitudinal threshold model to estimate the distribution of the elapsed time between two thresholds of the longitudinal process from repeated measurements. We extend this modeling framework to handle multiple thresholds. A Wiener process under the first-hitting-time framework is used to represent the survival distribution. We demonstrate our model through simulation studies and an analysis of data from the Consortium on Safe Labor study. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
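The first-hitting-time idea is easy to illustrate by simulation. This is a generic sketch under assumed drift and barrier values, not the paper's model for cervical dilation: for a Wiener process with positive drift mu and volatility sigma, the time to first cross a barrier a follows an inverse Gaussian law with mean a / mu.

```python
import numpy as np

def simulate_hitting_times(mu, sigma, barrier, n_paths=5000, dt=0.01,
                           t_max=20.0, seed=1):
    """Euler-simulated first times at which drifted Wiener paths cross a barrier.

    With mu > 0 the hitting time is inverse Gaussian with mean barrier / mu;
    the discrete time step introduces a small upward bias.
    """
    rng = np.random.default_rng(seed)
    times = np.full(n_paths, np.nan)       # NaN marks paths that never hit
    x = np.zeros(n_paths)
    alive = np.ones(n_paths, dtype=bool)
    for step in range(1, int(t_max / dt) + 1):
        x[alive] += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(alive.sum())
        hit = alive & (x >= barrier)
        times[hit] = step * dt
        alive &= ~hit
    return times

# Mean hitting time should be close to barrier / mu = 2.0.
t_hit = simulate_hitting_times(mu=1.0, sigma=0.5, barrier=2.0)
```

In the paper's setting, the "barrier" plays the role of the next dilation threshold, and the model is fitted to repeated measurements rather than simulated forward as here.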

Resting-state functional magnetic resonance imaging is a useful technique for investigating brain functional connectivity at rest. In this work, we develop flexible regression models and methods for determining differences in resting-state functional connectivity as a function of age, gender, drug intervention, or neuropsychiatric disorders. We propose two complementary methods for identifying changes in edges and subgraphs. (i) For detecting changes in edges, we select the optimal model at each edge and then conduct contrast tests to identify the effects of the important variables while controlling the familywise error rate. (ii) We adopt the network-based statistics method to improve power by incorporating the graph topological structure. Both methods have wide applications for low signal-to-noise ratio data. We propose stability criteria for the choice of threshold in the network-based statistics procedure and utilize an efficient massively parallel procedure to speed up estimation and inference. Results from our simulation studies show that the thresholds chosen by the proposed stability criteria outperform the Bonferroni threshold. To demonstrate applicability, we use both methods in the context of the Oxytocin and Aging Study to determine the effects of age, gender, and drug treatment on resting-state functional connectivity, as well as in the context of the Autism Brain Imaging Data Exchange Study to determine the effects of autism spectrum disorder on functional connectivity at rest. Copyright © 2016 John Wiley & Sons, Ltd.

This paper proposes a risk prediction model using semi-varying coefficient multinomial logistic regression. We use a penalized local likelihood method to perform model selection and to estimate both the functional and constant coefficients in the selected model. The model can be used to improve predictive modelling when non-linear interactions between predictors are present. We conduct a simulation study to assess our method's performance; the results show that the model selection procedure works well, with small average numbers of incorrect or missed selections. We illustrate the use of our method by applying it to classify patients with early rheumatoid arthritis at baseline into different risk groups for future disease progression. We use leave-one-out cross-validation to assess the correct prediction rate and propose a recalibration framework to evaluate how reliable the predicted risks are. Copyright © 2016 John Wiley & Sons, Ltd.

In this work, we deal with correlated under-reported data through INAR(1)-hidden Markov chain models. These models are very flexible and can be identified through their autocorrelation function, which has a very simple form. A naïve method of parameter estimation is proposed, alongside maximum likelihood estimation based on a revised version of the forward algorithm. The most probable unobserved time series is reconstructed by means of the Viterbi algorithm. Several examples of application in the field of public health are discussed, illustrating the utility of the models. Copyright © 2016 John Wiley & Sons, Ltd.
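The Viterbi step is standard dynamic programming. A self-contained log-space sketch for a generic discrete-emission HMM follows; it is illustrative only, since the paper's models have INAR(1) count emissions rather than the finite alphabet used here.

```python
import numpy as np

def viterbi(obs, log_init, log_trans, log_emit):
    """Most probable hidden-state path of an HMM (log-space Viterbi).

    obs       : observation indices, length T
    log_init  : log initial state probabilities, shape (K,)
    log_trans : log transition matrix, shape (K, K)
    log_emit  : log emission matrix, shape (K, n_symbols)
    """
    T, K = len(obs), len(log_init)
    delta = log_init + log_emit[:, obs[0]]       # best log-prob ending in each state
    psi = np.zeros((T, K), dtype=int)            # best predecessor pointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans      # scores[i, j]: from state i to j
        psi[t] = np.argmax(scores, axis=0)
        delta = scores[psi[t], np.arange(K)] + log_emit[:, obs[t]]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):                # backtrack along the pointers
        path[t - 1] = psi[t, path[t]]
    return path

# Sticky two-state chain with faithful emissions recovers the obvious path.
obs = np.array([0, 0, 0, 1, 1, 1])
log_init = np.log(np.array([0.5, 0.5]))
log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_emit = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
path = viterbi(obs, log_init, log_trans, log_emit)
```

In the under-reporting setting, the hidden states would index the reporting regime and the emission terms would come from the INAR(1) count model; the recursion itself is unchanged.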

Prognostic studies are widely conducted to examine whether biomarkers are associated with patients' prognoses and play important roles in medical decisions. Because findings from a single prognostic study may be very limited, meta-analyses may be useful for obtaining sound evidence. However, prognostic studies are often analyzed using a study-specific cut-off value, which can make it difficult to apply standard meta-analysis techniques. In this paper, we propose two methods to estimate a time-dependent version of the summary receiver operating characteristic curve for meta-analyses of prognostic studies with a right-censored time-to-event outcome. We introduce a bivariate normal model for the pair of time-dependent sensitivity and specificity and propose a method to form inferences based on summary statistics reported in published papers. This method provides asymptotically valid inference. In addition, we consider a bivariate binomial model. To draw inferences from this bivariate binomial model, we introduce a multiple imputation method. The multiple imputation is found to be approximately proper, and thus the standard Rubin's variance formula is justified from a Bayesian viewpoint. Our simulation study and an application to a real dataset show that both methods work well with a moderate or large number of studies, and that the bivariate binomial model coupled with multiple imputation outperforms the bivariate normal model when the number of studies is small. Copyright © 2016 John Wiley & Sons, Ltd.

Heteroscedasticity is commonly encountered when fitting nonlinear regression models in practice. We discuss eight different variance estimation methods for nonlinear regression models with heterogeneous response variances and present a simulation study comparing the performance of the eight methods in estimating the standard errors of the fitted model parameters. The simulation study suggests that when the true variance is a function of the mean model, the power-of-the-mean variance function estimation method and the transform-both-sides method are the best choices for estimating the standard errors of the estimated model parameters. In general, the wild bootstrap estimator and two modified versions of the standard sandwich variance estimator are reasonably accurate with relatively small bias, especially when the heterogeneity is nonsystematic across values of the covariate. Furthermore, we note that the two modified sandwich estimators are appealing choices in practice, given their computational advantage relative to the variance function estimation method and the transform-both-sides approach. Copyright © 2016 John Wiley & Sons, Ltd.

The joint modeling of longitudinal and survival data has recently received much attention. Several extensions of the standard joint model that consists of one longitudinal and one survival outcome have been proposed including the use of different association structures between the longitudinal and the survival outcomes. However, in general, relatively little attention has been given to the selection of the most appropriate functional form to link the two outcomes.

In common practice, it is assumed that the underlying value of the longitudinal outcome is associated with the survival outcome. However, it could be that different characteristics of the patients' longitudinal profiles influence the hazard: for example, not only the current value but also the slope or the area under the curve of the longitudinal outcome. The choice of which functional form to use is an important decision that needs to be investigated because it could influence the results.

In this paper, we use a Bayesian shrinkage approach in order to determine the most appropriate functional forms. We propose a joint model that includes different association structures of different biomarkers and assume informative priors for the regression coefficients that correspond to the terms of the longitudinal process. Specifically, we assume Bayesian lasso, Bayesian ridge, Bayesian elastic net, and horseshoe priors. These methods are applied to a dataset consisting of patients with a chronic liver disease, where it is important to investigate which characteristics of the biomarkers have an influence on survival. Copyright © 2016 John Wiley & Sons, Ltd.

Infant skull deformation is analyzed using the distribution of head normal vector directions computed from a 3D image. Severity of flatness and asymmetry are quantified by functionals of the kernel estimate of the normal vector direction density. Using image data from 99 infants and clinical deformation ratings made by experts, our approach is compared with some recently suggested methods. The results show that the proposed method performs competitively. Copyright © 2016 John Wiley & Sons, Ltd.

Motivated by a study of soft tissue sarcoma, this article considers the analysis of disease recurrence and survival. A multivariate frailty hazard model is established for joint modeling of three correlated time-to-event outcomes: local disease recurrence, distant disease recurrence (metastasis), and death. The goals are to find out (i) the effects of treatments on local and distant disease recurrences, and death, (ii) the effects of local and distant disease recurrences on death, and (iii) the correlation between local and distant recurrences. With our approach, all three of these important questions, which are commonly asked in similar medical research studies, can be answered by a single model. We put the proposed joint frailty model in a Bayesian framework and use a hybrid Monte Carlo algorithm for the computation of posterior distributions. This hybrid algorithm relies on the evaluation of the gradient of the target log density and a guided walk process, and it combines these two strategies to suppress random walk behavior. A further distinction is that the hybrid algorithm can update all the components of a multivariate state vector simultaneously. Simulation studies are conducted to assess the proposed joint frailty model and the computation algorithm. The motivating soft tissue sarcoma data set is analyzed for illustration purposes. Copyright © 2016 John Wiley & Sons, Ltd.

In this paper, we analyze a two-level latent variable model for longitudinal data from the National Growth and Health Study where surrogate outcomes or biomarkers and covariates are subject to missingness at any of the levels. A conventional method for efficient handling of missing data is to re-express the desired model as a joint distribution of variables, including the biomarkers, that are subject to missingness conditional on all of the covariates that are completely observed, and estimate the joint model by maximum likelihood, which is then transformed to the desired model. The joint model, however, identifies more parameters than desired, in general. We show that the over-identified joint model produces biased estimation of the latent variable model and describe how to impose constraints on the joint model so that it has a one-to-one correspondence with the desired model for unbiased estimation. The constrained joint model handles missing data efficiently under the assumption of ignorable missing data and is estimated by a modified application of the expectation-maximization algorithm. Copyright © 2016 John Wiley & Sons, Ltd.

The objective of diagnostic studies or prognostic studies is to evaluate and compare predictive capacities of biomarkers. Suppose we are interested in evaluation and comparison of predictive capacities of continuous biomarkers for a binary outcome based on research synthesis. In analysis of each study, subjects are often classified into two groups of the high-expression and low-expression groups according to a cut-off value, and statistical analysis is based on a 2 × 2 table defined by the response and the high expression or low expression of the biomarker. Because the cut-off is study specific, it is difficult to interpret a combined summary measure such as an odds ratio based on the standard meta-analysis techniques. The summary receiver operating characteristic curve is a useful method for meta-analysis of diagnostic studies in the presence of heterogeneity of cut-off values to examine discriminative capacities of biomarkers. We develop a method to estimate positive or negative predictive curves, which are alternatives to the receiver operating characteristic curve, based on information reported in published papers of each study. These predictive curves provide a useful graphical presentation of pairs of positive and negative predictive values and allow us to compare predictive capacities of biomarkers of different scales in the presence of heterogeneity in cut-off values among studies. Copyright © 2016 John Wiley & Sons, Ltd.
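The predictive values underlying such curves follow from sensitivity, specificity, and disease prevalence via Bayes' theorem; plotting the pairs over a range of cut-offs gives predictive curves. A minimal sketch of the point computation (illustrative only; the paper estimates entire curves across studies with heterogeneous cut-offs):

```python
def predictive_values(sens, spec, prev):
    """Positive and negative predictive values from sensitivity,
    specificity, and prevalence via Bayes' theorem."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

# Example: sens = 0.9, spec = 0.8 at 10% prevalence
ppv, npv = predictive_values(sens=0.9, spec=0.8, prev=0.1)
```

Because both quantities depend on prevalence, predictive curves are especially useful when comparing biomarkers across populations with different baseline risks.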

Inference about the treatment effect in a crossover design has received much attention over time owing to the uncertainty in the existence of the carryover effect and its impact on the estimation of the treatment effect. Adding to this uncertainty is that the existence of the carryover effect and its size may depend on the presence of the treatment effect and its size. We consider estimation and testing hypotheses about the treatment effect in a two-period crossover design, assuming a normally distributed response variable, and use an objective Bayesian approach to test the hypothesis about the treatment effect and to estimate its size when it exists while accounting for the uncertainty about the presence of the carryover effect as well as the treatment and period effects. We evaluate and compare the performance of the proposed approach with a standard frequentist approach using simulated and real data. Copyright © 2016 John Wiley & Sons, Ltd.

Prior research indicates that 10–15 cases or controls, whichever is fewer, are required per parameter to reliably estimate regression coefficients in multivariable logistic regression models. This condition may be difficult to meet even in a well-designed study when the number of potential confounders is large, the outcome is rare, and/or interactions are of interest. Various propensity score approaches have been implemented when the exposure is binary. Recent work on shrinkage approaches such as the lasso was motivated by the critical need to develop methods for the *p* >> *n* situation, where *p* is the number of parameters and *n* is the sample size. Those methods, however, have been less frequently used when *p*≈*n*, and in this situation, there is no guidance on choosing among regular logistic regression models, propensity score methods, and shrinkage approaches. To fill this gap, we conducted extensive simulations mimicking our motivating clinical data, estimating vaccine effectiveness for preventing influenza hospitalizations in the 2011–2012 influenza season. Ridge regression and penalized logistic regression models that penalize all but the coefficient of the exposure may be considered in these types of studies. Copyright © 2016 John Wiley & Sons, Ltd.
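The strategy of penalizing all coefficients except that of the exposure can be sketched as a ridge-penalized logistic regression fit by Newton-Raphson. This is a sketch under assumptions, not the authors' implementation: the simulated data, the penalty value, and the choice to leave the intercept and the exposure column unpenalized are all illustrative.

```python
import numpy as np

def penalized_logistic(X, y, lam, unpenalized=(0, 1)):
    """Ridge logistic regression leaving columns in `unpenalized`
    (here: 0 = intercept, 1 = exposure) unshrunk, via Newton-Raphson."""
    n, p = X.shape
    pen = np.full(p, lam)
    pen[list(unpenalized)] = 0.0          # no shrinkage on the exposure effect
    beta = np.zeros(p)
    for _ in range(50):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1 - mu)
        grad = X.T @ (y - mu) - pen * beta
        hess = X.T @ (X * w[:, None]) + np.diag(pen)
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

# Illustrative data: binary exposure plus five confounders
rng = np.random.default_rng(1)
n = 300
exposure = rng.integers(0, 2, n)
confounders = rng.normal(size=(n, 5))
X = np.column_stack([np.ones(n), exposure, confounders])
eta = -1.0 + 1.0 * exposure + confounders @ np.full(5, 0.3)
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))
beta_ridge = penalized_logistic(X, y, lam=5.0)
```

Leaving the exposure coefficient unpenalized avoids shrinking the effect of primary interest toward zero while stabilizing the confounder coefficients when *p*≈*n*.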

The sample size required for a cluster randomised trial is inflated compared with an individually randomised trial because outcomes of participants from the same cluster are correlated. Sample size calculations for longitudinal cluster randomised trials (including stepped wedge trials) need to take account of at least two levels of clustering: the clusters themselves and times within clusters. We derive formulae for sample size for repeated cross-section and closed cohort cluster randomised trials with normally distributed outcome measures, under a multilevel model allowing for variation between clusters and between times within clusters. Our formulae agree with those previously described for special cases such as crossover and analysis of covariance designs, although simulation suggests that the formulae could underestimate required sample size when the number of clusters is small. Whether using a formula or simulation, a sample size calculation requires estimates of nuisance parameters, which in our model include the intracluster correlation, cluster autocorrelation, and individual autocorrelation. A cluster autocorrelation less than 1 reflects a situation where individuals sampled from the same cluster at different times have less correlated outcomes than individuals sampled from the same cluster at the same time. Nuisance parameters could be estimated from time series obtained in similarly clustered settings with the same outcome measure, using analysis of variance to estimate variance components. Copyright © 2016 John Wiley & Sons, Ltd.
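The textbook special case of this calculation, a parallel cluster randomised trial with a single level of clustering, inflates the individually randomised sample size by the design effect 1 + (*m* − 1)ρ, where *m* is the cluster size and ρ the intracluster correlation. A sketch of that special case (the paper's formulae extend it with cluster and individual autocorrelations for repeated measurements over time):

```python
import math
from statistics import NormalDist

def cluster_trial_n(delta, sd, icc, m, alpha=0.05, power=0.8):
    """Participants per arm for a parallel cluster randomised trial with a
    normally distributed outcome: the individually randomised sample size
    inflated by the design effect 1 + (m-1)*icc."""
    z = NormalDist().inv_cdf
    n_ind = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * (sd / delta) ** 2
    deff = 1 + (m - 1) * icc          # design effect for one level of clustering
    return math.ceil(n_ind * deff)    # per arm

# Detect a 0.5-SD difference with clusters of 20 and ICC = 0.05
n_per_arm = cluster_trial_n(delta=0.5, sd=1.0, icc=0.05, m=20)  # -> 123
```

With ρ = 0.05 and *m* = 20 the design effect is 1.95, nearly doubling the required sample size relative to individual randomisation, which illustrates why the nuisance parameters discussed above matter so much.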

Observational comparative effectiveness and safety studies are often subject to immortal person-time, a period of follow-up during which outcomes cannot occur because of the treatment definition. Common approaches, like excluding immortal time from the analysis or naïvely including immortal time in the analysis, are known to result in biased estimates of treatment effect. Other approaches, such as the Mantel–Byar and landmark methods, have been proposed to handle immortal time. Little is known about the performance of the landmark method in different scenarios. We conducted extensive Monte Carlo simulations to assess the performance of the landmark method compared with other methods in settings that reflect realistic scenarios. We considered four landmark times for the landmark method. We found that the Mantel–Byar method provided unbiased estimates in all scenarios, whereas the exclusion and naïve methods resulted in substantial bias when the hazard of the event was constant or decreased over time. The landmark method performed well in correcting immortal person-time bias in all scenarios when the treatment effect was small, and provided unbiased estimates when there was no treatment effect. The bias associated with the landmark method tended to be small when the treatment rate was higher in the early follow-up period than it was later. These findings were confirmed in a case study of chronic obstructive pulmonary disease. Copyright © 2016 John Wiley & Sons, Ltd.
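The landmark method's data preparation can be sketched as follows: exclude subjects whose event occurs before the landmark time, classify treatment by whether it started before the landmark, and measure follow-up from the landmark onward. The record structure (dictionary keys, a single terminal event per subject) is a hypothetical simplification for illustration.

```python
def landmark_classify(patients, landmark):
    """Landmark-method cohort construction (sketch). Each patient record
    has hypothetical keys 'treat_start' (None if never treated) and
    'event_time' (event or censoring time)."""
    cohort = []
    for p in patients:
        if p["event_time"] <= landmark:
            continue                      # event before landmark: excluded
        treated = p["treat_start"] is not None and p["treat_start"] <= landmark
        cohort.append({"treated": treated,
                       "time": p["event_time"] - landmark})
    return cohort

pts = [
    {"treat_start": 2.0, "event_time": 10.0},  # treated before landmark
    {"treat_start": 8.0, "event_time": 9.0},   # treated after landmark -> 'untreated'
    {"treat_start": None, "event_time": 3.0},  # event before landmark -> excluded
]
cohort = landmark_classify(pts, landmark=6.0)
```

Fixing treatment status at the landmark removes immortal time by construction, but subjects treated shortly after the landmark are misclassified, which is one source of the residual bias noted above.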

A new testing approach is described for improving statistical tests of independence in sets of tables stratified on one or more relevant factors in the case of categorical (nominal or ordinal) variables. Common tests of independence that exploit the ordinality of one of the variables use a restricted-alternative approach. A different, relaxed-null method is presented. Specifically, the M-moment score tests and the correlation tests are introduced. Using multinomial-Poisson homogeneous modeling theory, it is shown that these tests are computationally and conceptually simple, and simulation results suggest that they can perform better than other common tests of conditional independence. To illustrate, the proposed tests are used to better understand the human papillomavirus type-specific infection by exploring the intention to vaccinate. Copyright © 2016 John Wiley & Sons, Ltd.

We describe and evaluate a regression tree algorithm for finding subgroups with differential treatment effects in randomized trials with multivariate outcomes. The data may contain missing values in the outcomes and covariates, and the treatment variable is not limited to two levels. Simulation results show that the regression tree models have unbiased variable selection and the estimates of subgroup treatment effects are approximately unbiased. A bootstrap calibration technique is proposed for constructing confidence intervals for the treatment effects. The method is illustrated with data from a longitudinal study comparing two diabetes drugs and a mammography screening trial comparing two treatments and a control. Copyright © 2016 John Wiley & Sons, Ltd.

Multi-state models generalize survival or duration time analysis to the estimation of transition-specific hazard rate functions for multiple transitions. When each of the transition-specific risk functions is parametrized with several distinct covariate effect coefficients, this leads to a model of potentially high dimension. To decrease the parameter space dimensionality and to obtain a clear picture of the underlying multi-state model structure, one can aim either at setting some coefficients to zero or at making coefficients for the same covariate but two different transitions equal. The first issue can be approached by penalizing the absolute values of the covariate coefficients as in lasso regularization. If, instead, absolute differences between coefficients of the same covariate on different transitions are penalized, this leads to sparse competing risk relations within a multi-state model, that is, equality of covariate effect coefficients. In this paper, a new estimation approach providing sparse multi-state modelling by the aforementioned principles is established, based on the estimation of multi-state models and a simultaneous penalization of the *L*₁-norm of covariate coefficients and their differences in a structured way. The new multi-state modelling approach is illustrated on peritoneal dialysis study data and implemented in the R package penMSM. Copyright © 2016 John Wiley & Sons, Ltd.

The receiver operating characteristic (ROC) curve is the most popular statistical tool for evaluating the discriminatory capability of a given continuous biomarker. The need to compare two correlated ROC curves arises when individuals are measured with two biomarkers, which induces paired and thus correlated measurements. Many researchers have focused on comparing two correlated ROC curves in terms of the area under the curve (AUC), which summarizes the overall performance of the marker. However, particular values of specificity may be of interest. We focus on comparing two correlated ROC curves at a given specificity level. We propose parametric approaches, transformations to normality, and nonparametric kernel-based approaches. Our methods can be straightforwardly extended for inference in terms of *ROC*^{−1}(*t*). This is of particular interest for comparing the accuracy of two correlated biomarkers at a given sensitivity level. Extensions also involve inference for the AUC and accommodating covariates. We evaluate the robustness of our techniques through simulations, compare them with other known approaches, and present a real-data application involving prostate cancer screening. Copyright © 2016 John Wiley & Sons, Ltd.
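A nonparametric version of comparing two paired markers at a fixed specificity can be sketched by thresholding each marker at the corresponding empirical quantile of the non-diseased scores. The simulated marker distributions below are assumptions for illustration; the paper additionally develops parametric, normality-transformed, and kernel-smoothed estimators with formal inference.

```python
import numpy as np

def sens_at_spec(neg, pos, spec):
    """Empirical sensitivity of a continuous marker at a fixed specificity:
    threshold at the `spec` quantile of the non-diseased scores."""
    thresh = np.quantile(neg, spec)
    return np.mean(pos > thresh)

# Illustrative paired setting: two markers measured on the same subjects
rng = np.random.default_rng(2)
neg = rng.normal(0.0, 1.0, 2000)          # non-diseased scores
pos_a = rng.normal(1.5, 1.0, 2000)        # marker A in the diseased
pos_b = rng.normal(1.0, 1.0, 2000)        # marker B in the same diseased subjects
sa = sens_at_spec(neg, pos_a, 0.9)        # sensitivity of A at 90% specificity
sb = sens_at_spec(neg, pos_b, 0.9)        # sensitivity of B at 90% specificity
```

Because the two sensitivities are computed on the same subjects, a comparison of sa and sb must account for their correlation, for example by bootstrapping subjects jointly rather than each marker separately.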

We propose statistical definitions of the individual benefit of a medical or behavioral treatment and of the severity of a chronic illness. These definitions are used to develop a graphical method that can be used by statisticians and clinicians in the data analysis of clinical trials from the perspective of personalized medicine. The method focuses on assessing and comparing individual effects of treatments rather than average effects and can be used with continuous and discrete responses, including dichotomous and count responses. The method is based on new developments in generalized linear mixed-effects models, which are introduced in this article. To illustrate, analyses of data from the Sequenced Treatment Alternatives to Relieve Depression clinical trial of sequences of treatments for depression and data from a clinical trial of respiratory treatments are presented. The estimation of individual benefits is also explained. Copyright © 2016 John Wiley & Sons, Ltd.

The focus of this paper is dietary intervention trials. We explore the statistical issues involved when the response variable, intake of a food or nutrient, is based on self-report data that are subject to inherent measurement error. There has been little work on handling error in this context. A particular feature of self-reported dietary intake data is that the error may be differential by intervention group. Measurement error methods require information on the nature of the errors in the self-report data. We assume that there is a calibration sub-study in which unbiased biomarker data are available. We outline methods for handling measurement error in this setting and use theory and simulations to investigate how self-report and biomarker data may be combined to estimate the intervention effect. Methods are illustrated using data from the Trial of Nonpharmacologic Intervention in the Elderly, in which the intervention was a sodium-lowering diet and the response was sodium intake. Simulations are used to investigate the methods under differential error, differing reliability of self-reports relative to biomarkers and different proportions of individuals in the calibration sub-study. When the reliability of self-report measurements is comparable with that of the biomarker, it is advantageous to use the self-report data in addition to the biomarker to estimate the intervention effect. If, however, the reliability of the self-report data is low compared with that of the biomarker, then there is little to be gained by using the self-report data. Our findings have important implications for the design of dietary intervention trials. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

Understanding the impact of concurrency, defined as overlapping sexual partnerships, on the spread of HIV within various communities has been complicated by difficulties in measuring concurrency. Retrospective sexual history data consisting of first and last dates of sexual intercourse for each previous and ongoing partnership is often obtained through use of cross-sectional surveys. Previous attempts to empirically estimate the magnitude and extent of concurrency among these surveyed populations have inadequately accounted for the dependence between partnerships and used only a snapshot of the available data. We introduce a joint multistate and point process model in which states are defined as the number of ongoing partnerships an individual is engaged in at a given time. Sexual partnerships starting and ending on the same date are referred to as one-offs and modeled as discrete events. The proposed method treats each individual's continuation in and transition through various numbers of ongoing partnerships as a separate stochastic process and allows the occurrence of one-offs to impact subsequent rates of partnership formation and dissolution. Estimators for the concurrent partnership distribution and mean sojourn times during which a person has *k* ongoing partnerships are presented. We demonstrate this modeling approach using epidemiological data collected from a sample of men having sex with men and seeking HIV testing at a Los Angeles clinic. Among this sample, the estimated point prevalence of concurrency was higher among men later diagnosed HIV positive. One-offs were associated with increased rates of subsequent partnership dissolution. Copyright © 2016 John Wiley & Sons, Ltd.

Epidermal nerve fibre (ENF) density and morphology are used to study small fibre involvement in diabetic, HIV, chemotherapy induced and other neuropathies. ENF density and summed length of ENFs per epidermal surface area are reduced, and ENFs may appear more clustered within the epidermis in subjects with small fibre neuropathy than in healthy subjects. Therefore, it is important to understand the spatial structure of ENFs. In this paper, we compare the ENF patterns between healthy subjects and subjects suffering from mild diabetic neuropathy. The study is based on suction skin blister specimens from the right foot of 32 healthy subjects and eight subjects with mild diabetic neuropathy. We regard the ENF entry point (the location where the trunk of a nerve enters the epidermis) and ENF end point (the termination point of a nerve fibre) patterns as realizations of spatial point processes, and develop tools that can be used in the analysis and modelling of ENF patterns. We use spatial summary statistics and shift plots and define a new tool, reactive territory, to study the spatial patterns and to compare the patterns of the two groups. We also introduce a simple model for these data in order to understand the growth process of the nerve fibres. Copyright © 2016 John Wiley & Sons, Ltd.

The stepped wedge design is a unique clinical trial design that allows for a sequential introduction of an intervention. However, the statistical analysis is unclear when this design is applied to survival data. The time-dependent introduction of the intervention in combination with terminal endpoints and interval censoring makes the analysis more complicated. In this paper, a time-on-study scale discrete survival model was constructed. Simulations were conducted primarily to study the performance of our model for different settings of the stepped wedge design. Secondarily, we compared our approach with the continuous-time Cox proportional hazards model. The results show that the discrete survival model estimates the intervention effects unbiasedly. If the length of the censoring interval is increased, the precision of the estimates is decreased. Without left truncation and late entry, the number of steps improves the precision of the estimates, whereas in combination with left truncation and late entry, the number of steps decreases the precision. Given the same number of participants and clusters, a parallel group design has higher precision than a stepped wedge design. Copyright © 2016 John Wiley & Sons, Ltd.

When studies in meta-analysis include different sets of confounders, simple analyses can cause a bias (omitting confounders that are missing in certain studies) or precision loss (omitting studies with incomplete confounders, i.e. a complete-case meta-analysis). To overcome these types of issues, a previous study proposed modelling the high correlation between partially and fully adjusted regression coefficient estimates in a bivariate meta-analysis. When multiple differently adjusted regression coefficient estimates are available, we propose exploiting such correlations in a graphical model. Compared with a previously suggested bivariate meta-analysis method, such a graphical model approach is likely to reduce the number of parameters in complex missing data settings by omitting the direct relationships between some of the estimates. We propose a structure-learning rule whose justification relies on the missingness pattern being monotone. This rule was tested using epidemiological data from a multi-centre survey. In the analysis of risk factors for early retirement, the method showed a smaller difference from a complete data odds ratio and greater precision than a commonly used complete-case meta-analysis. Three real-world applications with monotone missing patterns are provided, namely, the association between (1) the fibrinogen level and coronary heart disease, (2) the intima media thickness and vascular risk and (3) allergic asthma and depressive episodes. The proposed method allows for the inclusion of published summary data, which makes it particularly suitable for applications involving both microdata and summary data. Copyright © 2016 John Wiley & Sons, Ltd.

We propose a flexible cure rate model that accommodates different censoring distributions for the cured and uncured groups and also allows for some individuals to be observed as cured when their survival time exceeds a known threshold. We model the survival times for the uncured group using an accelerated failure time model with errors distributed according to the seminonparametric distribution, potentially truncated at a known threshold. We suggest a straightforward extension of the usual expectation–maximization algorithm approach for obtaining estimates in cure rate models to accommodate the cure threshold and dependent censoring. We additionally suggest a likelihood ratio test for testing for the presence of dependent censoring in the proposed cure rate model. We show through numerical studies that our model has desirable properties and leads to approximately unbiased parameter estimates in a variety of scenarios. To demonstrate how our method performs in practice, we analyze data from a bone marrow transplantation study and a liver transplant study. Copyright © 2016 John Wiley & Sons, Ltd.

We propose a two-step procedure to personalize drug dosage over time under the framework of a log-linear mixed-effects model. We model patients' heterogeneity using subject-specific random effects, which are treated as the realizations of an unspecified stochastic process. We extend the conditional quadratic inference function to estimate both fixed-effect coefficients and individual random effects on a longitudinal training data sample in the first step and propose an adaptive procedure to estimate new patients' random effects and provide dosage recommendations for new patients in the second step. An advantage of our approach is that we do not impose any distribution assumption on estimating random effects. Moreover, the new approach can accommodate more general time-varying covariates corresponding to random effects. We show in theory and numerical studies that the proposed method is more efficient compared with existing approaches, especially when covariates are time varying. In addition, a real data example of a clozapine study confirms that our two-step procedure leads to more accurate drug dosage recommendations. Copyright © 2016 John Wiley & Sons, Ltd.

This paper presents a new goodness-of-fit test for an ordered stereotype model used for an ordinal response variable. The proposed test is based on the well-known Hosmer–Lemeshow test and its version for the proportional odds regression model. The latter test statistic is calculated from a grouping scheme assuming that the levels of the ordinal response are equally spaced, which might not be true. One of the main advantages of the ordered stereotype model is that it allows us to determine a new uneven spacing of the ordinal response categories, dictated by the data. The proposed test makes use of this new adjusted spacing to partition the data. A simulation study shows good performance of the proposed test under a variety of scenarios. Finally, the results of applying the test in two examples are presented. Copyright © 2016 John Wiley & Sons, Ltd.
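For orientation, the classical binary-outcome Hosmer–Lemeshow statistic on which the proposal builds can be sketched as follows; the paper's test instead partitions the data using the category spacing estimated by the stereotype model. The decile grouping and simulated data below are illustrative assumptions.

```python
import numpy as np

def hosmer_lemeshow(y, p_hat, groups=10):
    """Hosmer-Lemeshow chi-square statistic for a binary outcome:
    sort by fitted probability, split into `groups` bins, and compare
    observed with expected event counts in each bin."""
    order = np.argsort(p_hat)
    y, p_hat = y[order], p_hat[order]
    stat = 0.0
    for idx in np.array_split(np.arange(len(y)), groups):
        obs, exp = y[idx].sum(), p_hat[idx].sum()
        var = exp * (1 - exp / len(idx))   # n_g * pbar * (1 - pbar)
        stat += (obs - exp) ** 2 / var
    return stat  # ~ chi-square with groups-2 df under a correct fitted model

# Illustrative check with a correctly specified probability model
rng = np.random.default_rng(3)
x = rng.normal(size=1000)
p = 1 / (1 + np.exp(-(0.5 + x)))
y = rng.binomial(1, p)
stat = hosmer_lemeshow(y, p)
```

Grouping by fitted probability is the step the paper replaces: with an ordinal response, the bin boundaries come from the stereotype model's estimated, data-driven spacing rather than from assumed equal spacing.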

Unmeasured confounding is the fundamental obstacle to drawing causal conclusions about the impact of an intervention from observational data. Typically, covariates are measured to eliminate or ameliorate confounding, but they may be insufficient or unavailable. In the special setting where a transient intervention or exposure varies over time within each individual and confounding is time constant, a different tack is possible. The key idea is to condition on either the overall outcome or the proportion of time in the intervention. These measures can eliminate the unmeasured confounding either by conditioning or by use of a proxy covariate. We evaluate existing methods and develop new models from which causal conclusions can be drawn from such observational data even if no baseline covariates are measured. Our motivation for this work was to determine the causal effect of *Streptococcus* bacteria in the throat on pharyngitis (sore throat) in Indian schoolchildren. Using our models, we show that existing methods can be badly biased and that sick children who are rarely colonized have a high probability that the *Streptococcus* bacteria are causing their disease. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.

Unmeasured confounding remains an important problem in observational studies, including pharmacoepidemiological studies of large administrative databases. Several recently developed methods utilize smaller validation samples, with information on additional confounders, to control for confounders unmeasured in the main, larger database. However, up-to-date applications of these methods to survival analyses seem to be limited to propensity score calibration, which relies on a strong surrogacy assumption. We propose a new method, specifically designed for time-to-event analyses, which uses martingale residuals, in addition to measured covariates, to enhance imputation of the unmeasured confounders in the main database. The method is applicable for analyses with both time-invariant data and time-varying exposure/confounders. In simulations, our method consistently eliminated bias because of unmeasured confounding, regardless of surrogacy violation and other relevant design parameters, and almost always yielded lower mean squared errors than other methods applicable for survival analyses, outperforming propensity score calibration in several scenarios. We apply the method to a real-life pharmacoepidemiological database study of the association between glucocorticoid therapy and risk of type II diabetes mellitus in patients with rheumatoid arthritis, with additional potential confounders available in an external validation sample. Compared with conventional analyses, which adjust only for confounders measured in the main database, our estimates suggest a considerably weaker association. Copyright © 2016 John Wiley & Sons, Ltd.

Typically, clusters and individuals in cluster randomized trials are allocated across treatment conditions in a balanced fashion. This is optimal under homogeneous costs and outcome variances. However, both the costs and the variances may be heterogeneous. Then, an unbalanced allocation is more efficient but impractical as the outcome variance is unknown in the design stage of a study. A practical alternative to the balanced design could be a design optimal for known and possibly heterogeneous costs and homogeneous variances. However, when costs and variances are heterogeneous, both designs suffer from loss of efficiency, compared with the optimal design. Focusing on cluster randomized trials with a 2 × 2 design, the relative efficiency of the balanced design and of the design optimal for heterogeneous costs and homogeneous variances is evaluated, relative to the optimal design. We consider two heterogeneous scenarios (two treatment arms with small and two with large costs or variances, or one with small, two with intermediate, and one with large costs or variances) at each design level (cluster, individual, and both). Within these scenarios, we compute the relative efficiency of the two designs as a function of the extents of heterogeneity of the costs and variances, and the congruence (the cheapest treatment has the smallest variance) and incongruence (the cheapest treatment has the largest variance) between costs and variances. We find that the design optimal for heterogeneous costs and homogeneous variances is generally more efficient than the balanced design and we illustrate this theory on a trial that examines methods to reduce radiological referrals from general practices. Copyright © 2016 John Wiley & Sons, Ltd.
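A simplified, single-level version of the cost-optimal allocation problem admits a closed form: to minimise the variance of estimated arm means under a budget constraint, arm k is sampled in proportion to σ_k/√c_k (a classical Neyman-type rule). The sketch below covers only that simplification, not the paper's full cluster-level optimisation; the example arms are hypothetical.

```python
import math

def optimal_allocation(sigmas, costs, budget):
    """Cost-optimal sample sizes across arms: n_k proportional to
    sigma_k / sqrt(cost_k), scaled so total cost equals `budget`."""
    weights = [s / math.sqrt(c) for s, c in zip(sigmas, costs)]
    cost_per_unit_scale = sum(w * c for w, c in zip(weights, costs))
    scale = budget / cost_per_unit_scale
    return [w * scale for w in weights]

# Arm 2 is twice as variable but four times as costly as arm 1,
# so the two effects cancel and the budget is split into equal n.
n_arms = optimal_allocation([1.0, 2.0], [1.0, 4.0], budget=100.0)
```

The rule makes the congruence/incongruence distinction above concrete: when the cheapest arm also has the smallest variance, balanced and optimal allocations differ most, because both factors push sampling in the same direction.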

Motivated by a genetic application, this paper addresses the problem of fitting regression models when the predictor is a proportion measured with error. While the problem of dealing with additive measurement error in fitting regression models has been extensively studied, the problem where the additive error is of a binomial nature has not been addressed. The measurement errors here are heteroscedastic for two reasons: dependence on the underlying true value, and changing sampling effort over observations. While some of the previously developed methods for treating additive measurement error with heteroscedasticity can be used in this setting, other methods need modification. A new version of simulation extrapolation is developed, and we also explore a variation on the standard regression calibration method that uses a beta-binomial model based on the fact that the true value is a proportion. Although most of the methods introduced here can be used for fitting non-linear models, this paper will focus primarily on their use in fitting a linear model. While previous work has focused mainly on estimation of the coefficients, we will, with motivation from our example, also examine estimation of the variance around the regression line. In addressing these problems, we also discuss the appropriate manner in which to bootstrap for both inferences and bias assessment. The various methods are compared via simulation, and the results are illustrated using our motivating data, for which the goal is to relate the methylation rate of a blood sample to the age of the individual providing the sample. Copyright © 2016 John Wiley & Sons, Ltd.

Optimal timing of initiating antiretroviral therapy has been a controversial topic in HIV research. Two highly publicized studies applied different analytical approaches, a dynamic marginal structural model and a multiple imputation method, to different observational databases and came up with different conclusions. Discrepancies between the two studies' results could be due to differences between patient populations, fundamental differences between statistical methods, or differences between implementation details. For example, the two studies adjusted for different covariates, compared different thresholds, and had different criteria for qualifying measurements. If both analytical approaches were applied to the same cohort holding technical details constant, would their results be similar? In this study, we applied both statistical approaches using observational data from 12,708 HIV-infected persons throughout the USA. We held technical details constant between the two methods and then repeated analyses varying technical details to understand what impact they had on findings. We also present results applying both approaches to simulated data. Results were similar, although not identical, when technical details were held constant between the two statistical methods. Confidence intervals for the dynamic marginal structural model tended to be wider than those from the imputation approach, although this may have been due in part to additional external data used in the imputation analysis. We also consider differences in the estimands, required data, and assumptions of the two statistical methods. Our study provides insights into assessing optimal dynamic treatment regimes in the context of starting antiretroviral therapy and in more general settings. Copyright © 2016 John Wiley & Sons, Ltd.

Concordance measures are frequently used for assessing the discriminative ability of risk prediction models. The interpretation of estimated concordance at external validation is difficult if the case-mix differs from the model development setting. We aimed to develop a concordance measure that provides insight into the influence of case-mix heterogeneity and is robust to censoring of time-to-event data.

We first derived a model-based concordance (*mbc*) measure that allows for quantification of the influence of case-mix heterogeneity on discriminative ability of proportional hazards and logistic regression models. This *mbc* can also be calculated including a regression slope that calibrates the predictions at external validation (*c-mbc*), hence assessing the influence of overall regression coefficient validity on discriminative ability. We derived variance formulas for both *mbc* and *c-mbc*. We compared the *mbc* and the *c-mbc* with commonly used concordance measures in a simulation study and in two external validation settings.

The *mbc* was asymptotically equivalent to a previously proposed resampling-based case-mix corrected c-index. The *c-mbc* remained stable at the true value with increasing proportions of censoring, while Harrell's c-index and, to a lesser extent, Uno's concordance measure increased unfavorably. Variance estimates of *mbc* and *c-mbc* agreed well with the simulated empirical variances.

We conclude that the *mbc* is an attractive closed-form measure that allows for a straightforward quantification of the expected change in a model's discriminative ability due to case-mix heterogeneity. The *c-mbc* also reflects regression coefficient validity and is a censoring-robust alternative for the c-index when the proportional hazards assumption holds. Copyright © 2016 John Wiley & Sons, Ltd.
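Harrell's c-index, used as a comparator in the simulations above, counts concordant usable pairs among right-censored observations: a pair is usable when the earlier time is an observed event. A minimal illustrative implementation:

```python
def harrell_c(time, event, risk):
    """Harrell's concordance for right-censored data: among usable pairs
    (the earlier time is an observed event), count pairs where the subject
    with the higher risk score fails earlier; ties in risk count as 1/2."""
    conc = ties = usable = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] == 1 and time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    conc += 1
                elif risk[i] == risk[j]:
                    ties += 1
    return (conc + 0.5 * ties) / usable

# toy data: the third subject is censored
time  = [1, 2, 3, 4, 5]
event = [1, 1, 0, 1, 1]
risk  = [5, 4, 1, 2, 3]
c = harrell_c(time, event, risk)   # 7 of 8 usable pairs concordant
```

Because censored subjects contribute only as the later member of a pair, heavy censoring shifts which pairs are usable, which is one source of the censoring sensitivity the abstract reports.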

The case-control study is a common design for assessing the association between genetic exposures and a disease phenotype. Though association with a given (case-control) phenotype is always of primary interest, there is often considerable interest in assessing relationships between genetic exposures and other (secondary) phenotypes. However, the case-control sample represents a biased sample from the general population. As a result, if this sampling framework is not correctly taken into account, analyses estimating the effect of exposures on secondary phenotypes can be biased, leading to incorrect inference. In this paper, we address this problem and propose a general approach for estimating and testing the population effect of a genetic variant on a secondary phenotype. Our approach is based on inverse probability weighted estimating equations, where the weights depend on genotype and the secondary phenotype. We show that, though slightly less efficient than a full likelihood-based analysis when the likelihood is correctly specified, it is substantially more robust to model misspecification and can outperform likelihood-based analysis, in terms of both validity and power, when the model is misspecified. We illustrate our approach with an application to a case-control study extracted from the Framingham Heart Study. Copyright © 2016 John Wiley & Sons, Ltd.
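The core idea of inverse probability weighting is to solve the usual estimating equation with each sampled subject weighted by the reciprocal of its sampling probability. A toy sketch with known sampling fractions (all cases kept, 10% of controls); the variable names and parameter values are hypothetical, and for simplicity the weights here depend only on case-control status rather than on genotype and phenotype as in the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
x = rng.binomial(1, 0.3, N)                          # genetic exposure
d = rng.binomial(1, 1 / (1 + np.exp(-(-3 + 1.5 * x))))  # disease status
y = 1.0 + 0.5 * x + 0.8 * d + rng.normal(0, 1, N)    # secondary phenotype

pop_beta = np.polyfit(x, y, 1)[0]    # full-population benchmark slope

# case-control sampling: keep all cases, 10% of controls
p_sel = np.where(d == 1, 1.0, 0.1)
sel = rng.random(N) < p_sel
w = 1.0 / p_sel[sel]                 # inverse probability weights

naive_beta = np.polyfit(x[sel], y[sel], 1)[0]   # ignores the design

# weighted least squares solves sum_i w_i x_i (y_i - x_i' b) = 0
X = np.column_stack([np.ones(sel.sum()), x[sel]])
WX = X * w[:, None]
ipw_beta = np.linalg.solve(X.T @ WX, WX.T @ y[sel])[1]
```

The unweighted slope is inflated because cases (who have higher phenotype values) are oversampled; the weighted estimate tracks the population slope.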

Measures of explained variation are useful in scientific research, as they quantify the amount of variation in an outcome variable of interest that is explained by one or more other variables. We develop such measures for correlated survival data, under the proportional hazards mixed-effects model. Because different approaches have been studied in the literature outside the classical linear regression model, we investigate three measures, among them *R*^{2} and *ρ*^{2}, that quantify three different population coefficients. We show that although the three population measures are not the same, they reflect similar amounts of variation explained by the predictors. Among the three measures, we show that *R*^{2}, which is the simplest to compute, is also consistent for the first population measure under the usual asymptotic scenario when the number of clusters tends to infinity. The other two measures additionally require that the cluster sizes be large. We study the properties of the measures both analytically and through simulation studies. We illustrate their different uses with a multi-center clinical trial and a recurrent events data set. Copyright © 2016 John Wiley & Sons, Ltd.

Recurrent event data are quite common in biomedical and epidemiological studies. A significant portion of these data also contain additional longitudinal information on surrogate markers. Previous studies have shown that popular methods using a Cox model with longitudinal outcomes as time-dependent covariates may lead to biased results, especially when longitudinal outcomes are measured with error. Hence, it is important to incorporate longitudinal information into the analysis properly. To achieve this, we model the correlation between the longitudinal and recurrent event processes using latent random effect terms. We then propose a two-stage conditional estimating equation approach to model the rate function of the recurrent event process conditional on the observed longitudinal information. The performance of our proposed approach is evaluated through simulation. We also apply the approach to analyze cocaine addiction data collected by the University of Connecticut Health Center. The data include recurrent event information on cocaine relapse and longitudinal cocaine craving scores. Copyright © 2016 John Wiley & Sons, Ltd.

In this paper, we propose a Bayesian method to address misclassification errors in both independent and dependent variables. Our work is motivated by a study of women who have experienced new breast cancers on two separate occasions. We call both cancers *primary*, because the second is usually not considered as the result of a metastasis spreading from the first. Hormone receptors (HRs) are important in breast cancer biology, and it is well recognized that the measurement of HR status is subject to errors. This discordance in HR status for two primary breast cancers is of concern and might be an important reason for treatment failure. To sort out the information on the *true* concordance rate from the observed concordance rate, we consider a logistic regression model for the association between the HR status of the two cancers and introduce misclassification parameters (i.e., sensitivity and specificity) accounting for the misclassification in HR status. The prior distribution for sensitivity and specificity is based on how HR status is actually assessed in laboratory procedures. To account for the nonlinear effect of one error-free covariate, we introduce *B*-spline terms in the logistic regression model. Our findings indicate that the true concordance rate of HR status between two primary cancers is greater than the observed value. Copyright © 2016 John Wiley & Sons, Ltd.

We consider a class of semiparametric marginal rate models for analyzing recurrent event data. In these models, both time-varying and time-free effects are present, and the estimation of time-varying effects may result in non-smooth regression functions. A typical approach for avoiding this problem and producing smooth functions is based on kernel methods. The traditional kernel-based approach, however, assumes a common degree of smoothness for all time-varying regression functions, which may result in suboptimal estimators if the functions have different levels of smoothness. In this paper, we extend the traditional approach by introducing different bandwidths for different regression functions. First, we establish the asymptotic properties of the suggested estimators. Next, we demonstrate the superiority of our proposed method using two finite-sample simulation studies. Finally, we illustrate our methodology by analyzing a real-world heart disease dataset. Copyright © 2016 John Wiley & Sons, Ltd.

Natural direct and indirect effects decompose the effect of a treatment into the part that is mediated by a covariate (the mediator) and the part that is not. Their definitions rely on the concept of outcomes under treatment with the mediator ‘set’ to its value without treatment. Typically, the mechanism through which the mediator is set to this value is left unspecified, and in many applications, it may be challenging to fix the mediator to particular values for each unit or patient. Moreover, how one sets the mediator may affect the distribution of the outcome. This article introduces ‘organic’ direct and indirect effects, which can be defined and estimated without relying on setting the mediator to specific values. Organic direct and indirect effects can be applied, for example, to estimate how much of the effect of some treatments for HIV/AIDS on mother-to-child transmission of HIV infection is mediated by the effect of the treatment on the HIV viral load in the blood of the mother. Copyright © 2016 John Wiley & Sons, Ltd.

A key objective of Phase II dose finding studies in clinical drug development is to adequately characterize the dose response relationship of a new drug. An important decision is then the choice of a suitable dose response function to support dose selection for the subsequent Phase III studies. In this paper, we compare different approaches for model selection and model averaging using mathematical properties as well as simulations. We review and illustrate asymptotic properties of model selection criteria and investigate their behavior when changing the sample size but keeping the effect size constant. In a simulation study, we investigate how the various approaches perform in realistically chosen settings. Finally, the different methods are illustrated with a recently conducted Phase II dose finding study in patients with chronic obstructive pulmonary disease. Copyright © 2016 John Wiley & Sons, Ltd.
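One common model-averaging approach in this setting weights the candidate dose response models by their information-criterion values. A minimal sketch using Akaike weights; the AIC values and dose-effect estimates are made up for illustration.

```python
import numpy as np

def akaike_weights(aic):
    """Convert AIC values for competing models into normalized
    model-averaging weights: w_i proportional to exp(-0.5 * delta_i)."""
    aic = np.asarray(aic, float)
    delta = aic - aic.min()          # differences from the best model
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# hypothetical fits of three candidate dose-response models
aic = [100.0, 102.0, 110.0]
w = akaike_weights(aic)

# model-averaged estimate of some dose effect (estimates are made up)
averaged = float(np.dot(w, [1.2, 1.5, 2.0]))
```

Model averaging shades the selected-model estimate toward plausible competitors, which is one way to account for model-selection uncertainty when characterizing the dose response.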

No abstract is available for this article.

When estimating causal effects, unmeasured confounding and model misspecification are both potential sources of bias. We propose a method to simultaneously address both issues in the form of a semi-parametric sensitivity analysis. In particular, our approach incorporates Bayesian Additive Regression Trees into a two-parameter sensitivity analysis strategy that assesses sensitivity of posterior distributions of treatment effects to choices of sensitivity parameters. This results in an easily interpretable framework for testing for the impact of an unmeasured confounder that also limits the number of modeling assumptions. We evaluate our approach in a large-scale simulation setting and with high blood pressure data taken from the Third National Health and Nutrition Examination Survey. The model is implemented as open-source software, integrated into the treatSens package for the R statistical programming language. © 2016 The Authors. *Statistics in Medicine* Published by John Wiley & Sons Ltd.

We consider the non-inferiority (or equivalence) test of the odds ratio (OR) in a crossover study with binary outcomes to evaluate the treatment effects of two drugs. To solve this problem, Lui and Chang (2011) proposed both an asymptotic method and a conditional method based on a random effects logit model. Kenward and Jones (1987) proposed a likelihood ratio test (*LRT*_{M}) based on a log linear model. These existing methods are all subject to model misspecification. In this paper, we propose a likelihood ratio test (*LRT*) and a score test that are independent of model specification. Monte Carlo simulation studies show that, in scenarios considered in this paper, both the *LRT* and the score test have higher power than the asymptotic and conditional methods for the non-inferiority test; the *LRT*, score, and asymptotic methods have similar power, and they all have higher power than the conditional method for the equivalence test. When data can be well described by a log linear model, the *LRT*_{M} has the highest power among all five methods (*LRT*_{M}, *LRT*, score, asymptotic, and conditional) for both non-inferiority and equivalence tests. However, in scenarios for which a log linear model does not describe the data well, the *LRT*_{M} has the lowest power for the non-inferiority test and has inflated type I error rates for the equivalence test. We provide an example from a clinical trial that illustrates our methods. Copyright © 2016 John Wiley & Sons, Ltd.

Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation.

We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents.

We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design.

Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

Most phase I dose-finding methods in oncology aim to find the maximum-tolerated dose from a set of prespecified doses. However, in practice, because of a lack of understanding of the true dose–toxicity relationship, it is likely that none of these prespecified doses are equal or reasonably close to the true maximum-tolerated dose. To handle this issue, we propose an adaptive dose modification (ADM) method that can be coupled with any existing dose-finding method to adaptively modify the dose, when it is needed, during the course of dose finding. To reflect clinical practice, we divide the toxicity probability into three regions: underdosing, acceptable, and overdosing regions. We adaptively add a new dose whenever the observed data suggest that none of the investigational doses are likely to be located in the acceptable region. The new dose is estimated via a nonparametric dose–toxicity model based on local polynomial regression. The simulation study shows that ADM substantially outperforms a similar existing method. We applied ADM to a phase I cancer trial. Copyright © 2016 John Wiley & Sons, Ltd.

We present here an extension of the classic bivariate random effects meta-analysis for the log-transformed sensitivity and specificity that can be applied to two or more diagnostic tests. The advantage of this method is that a closed-form expression is derived for the calculation of the within-studies covariances. The method allows the direct calculation of sensitivity and specificity, as well as the diagnostic odds ratio, the area under the curve, and the parameters of the summary receiver operating characteristic curve, along with the means for a formal comparison of these quantities for different tests. There is no need for individual patient data or the simultaneous evaluation of both diagnostic tests in all studies. The method is simple and fast; it can be extended to several diagnostic tests and can be fitted in nearly all statistical packages. The method was evaluated in simulations and applied in a meta-analysis comparing anti-cyclic citrullinated peptide antibody and rheumatoid factor for discriminating patients with rheumatoid arthritis, with encouraging results. Simulations suggest that the method is robust and more powerful compared with the standard bivariate approach that ignores the correlation between tests. Copyright © 2016 John Wiley & Sons, Ltd.
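In such meta-analyses the study-level inputs are typically obtained in closed form from each study's 2 × 2 counts. The sketch below uses the common logit parameterization with the usual delta-method within-study variances (1/successes + 1/failures); it is an illustration of the standard building blocks, not the paper's exact formulation.

```python
import math

def logit_accuracy(tp, fn, tn, fp):
    """Logit-transformed sensitivity and specificity for one study,
    with delta-method within-study variances."""
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    return {
        "logit_sens": math.log(sens / (1 - sens)),
        "var_logit_sens": 1 / tp + 1 / fn,
        "logit_spec": math.log(spec / (1 - spec)),
        "var_logit_spec": 1 / tn + 1 / fp,
    }

# one hypothetical study: 90/100 diseased positive, 80/100 healthy negative
study = logit_accuracy(tp=90, fn=10, tn=80, fp=20)

# the diagnostic odds ratio follows directly from the two logits
dor = math.exp(study["logit_sens"] + study["logit_spec"])
```

A random effects model then pools these study-level logits across studies, using their within-study variances (and, in the paper's extension, covariances between tests).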

]]>Network meta-analysis (NMA), also known as multiple treatment comparisons, is commonly used to incorporate direct and indirect evidence comparing treatments. With recent advances in methods and software, Bayesian approaches to NMA have become quite popular and allow models of previously unanticipated complexity. However, when direct and indirect evidence differ in an NMA, the model is said to suffer from *inconsistency*. Current inconsistency detection in NMA is usually based on contrast-based (CB) models; however, this approach has certain limitations. In this work, we propose an arm-based random effects model, where we detect discrepancy of direct and indirect evidence for comparing two treatments using the fixed effects in the model while flagging extreme trials using the random effects. We define discrepancy factors to characterize evidence of inconsistency for particular treatment comparisons, which is novel in NMA research. Our approaches permit users to address issues previously tackled via CB models. We compare sources of inconsistency identified by our approach and existing loop-based CB methods using real and simulated datasets and demonstrate that our methods can offer powerful inconsistency detection. Copyright © 2016 John Wiley & Sons, Ltd.

Propensity score (PS) methods have been used extensively to adjust for confounding factors in the statistical analysis of observational data in comparative effectiveness research. There are four major PS-based adjustment approaches: PS matching, PS stratification, covariate adjustment by PS, and PS-based inverse probability weighting. Though covariate adjustment by PS is one of the most frequently used PS-based methods in clinical research, the conventional variance estimation of the treatment effect estimate under covariate adjustment by PS is biased. As Stampf *et al*. have shown, this bias in variance estimation is likely to lead to invalid statistical inference and could result in erroneous public health conclusions (e.g., in food and drug safety and adverse events surveillance). To address this issue, we propose a two-stage analytic procedure to develop a valid variance estimator for the covariate adjustment by PS analysis strategy. We also propose a simple empirical bootstrap resampling scheme. Both proposed procedures are implemented in an R function for public use. Extensive simulation results demonstrate the bias in the conventional variance estimator and show that both proposed variance estimators offer valid estimates of the true variance and are robust to complex confounding structures. The proposed methods are illustrated with a post-surgery pain study. Copyright © 2016 John Wiley & Sons, Ltd.
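The bootstrap idea is straightforward: re-estimate the propensity score and the PS-adjusted outcome model on each resample, and take the standard deviation of the resampled effect estimates. A self-contained sketch with a hand-rolled Newton–Raphson logistic fit on simulated data; this is an illustration of the resampling scheme, not the authors' implementation (which they provide as an R function).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(0, 1, n)                         # confounder
t = (rng.random(n) < 1 / (1 + np.exp(-0.8 * x))).astype(float)
y = 2.0 * t + 1.5 * x + rng.normal(0, 1, n)     # true treatment effect = 2

def fit_ps(x, t, iters=25):
    """Logistic regression of treatment on covariate via Newton-Raphson."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.zeros(2)
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ b))
        H = X.T @ (X * (p * (1 - p))[:, None])   # observed information
        b = b + np.linalg.solve(H, X.T @ (t - p))
    return 1 / (1 + np.exp(-X @ b))

def effect_adjusting_for_ps(x, t, y):
    """Outcome regression on treatment plus the estimated PS."""
    ps = fit_ps(x, t)
    X = np.column_stack([np.ones_like(t), t, ps])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

est = effect_adjusting_for_ps(x, t, y)

# empirical bootstrap: re-fit both stages on each resample
boot = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    boot.append(effect_adjusting_for_ps(x[idx], t[idx], y[idx]))
se_boot = float(np.std(boot, ddof=1))
```

Re-estimating the PS inside each bootstrap replicate is the point: treating the PS as fixed is what biases the conventional variance estimate.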

Markov three-state progressive and illness–death models are often used in biomedicine for describing survival data when an intermediate event of interest may be observed during the follow-up. However, the usual estimators for Markov models (e.g., Aalen–Johansen transition probabilities) may be systematically biased in non-Markovian situations. On the other hand, although non-Markovian estimators for transition probabilities and related curves are available, including the Markov information in the construction of the estimators allows for variance reduction. Therefore, testing for the Markov condition is a relevant issue in practice. In this paper, we discuss several characterizations of the Markov condition, with special focus on its equivalence with the quasi-independence between left truncation and survival times in standard survival analysis. New methods for testing the Markovianity of an illness–death model are proposed and compared with existing ones by means of an intensive simulation study. We illustrate our findings through the analysis of a data set from stem cell transplant in leukemia. Copyright © 2016 John Wiley & Sons, Ltd.

Resecting bone tumors requires good cutting accuracy to reduce the occurrence of local recurrence, a risk that is considerably reduced with navigated technology. The estimation of extreme proportions is challenging, especially with small or moderate sample sizes. When no success is observed, the commonly used binomial proportion confidence interval is not suitable, while the rule of three provides a simple solution. Unfortunately, these approaches are unable to differentiate between different unobserved events. Different delta methods and bootstrap procedures are compared in univariate and linear mixed models with simulations and real data under a normality assumption. The delta method on the z-score and the parametric bootstrap provide similar results, but the delta method requires the estimation of the covariance matrix of the estimates. In mixed models, the observed Fisher information matrix with unbounded variance components should be preferred. The parametric bootstrap, easier to apply, outperforms the delta method for larger sample sizes but may be computationally costly. Copyright © 2016 John Wiley & Sons, Ltd.
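The rule of three mentioned above approximates the exact one-sided upper confidence bound for a proportion when zero events are observed in n trials; the exact bound solves (1 - p)^n = alpha. A minimal sketch:

```python
def upper_bound_zero_events(n, alpha=0.05):
    """Exact one-sided upper confidence bound for a binomial proportion
    when 0 events are observed in n trials: solve (1 - p)^n = alpha."""
    return 1 - alpha ** (1.0 / n)

n = 100
exact = upper_bound_zero_events(n)   # exact upper bound
rule_of_three = 3.0 / n              # the classical approximation
```

For n = 100 the exact bound is about 0.0295 against the rule-of-three value 0.03; the approximation comes from -ln(0.05) being roughly 3.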

Shared parameter joint models provide a framework under which a longitudinal response and a time to event can be modelled simultaneously. A common assumption in shared parameter joint models has been to assume that the longitudinal response is normally distributed. In this paper, we instead propose a joint model that incorporates a two-part ‘hurdle’ model for the longitudinal response, motivated in part by longitudinal response data that is subject to a detection limit. The first part of the hurdle model estimates the probability that the longitudinal response is observed above the detection limit, whilst the second part of the hurdle model estimates the mean of the response conditional on having exceeded the detection limit. The time-to-event outcome is modelled using a parametric proportional hazards model, assuming a Weibull baseline hazard. We propose a novel association structure whereby the current hazard of the event is assumed to be associated with the current combined (expected) outcome from the two parts of the hurdle model. We estimate our joint model under a Bayesian framework and provide code for fitting the model using the Bayesian software Stan. We use our model to estimate the association between HIV RNA viral load, which is subject to a lower detection limit, and the hazard of stopping or modifying treatment in patients with HIV initiating antiretroviral therapy. Copyright © 2016 John Wiley & Sons, Ltd.

Estimating a patient's mortality risk is important in making treatment decisions. Survival trees are a useful tool and employ recursive partitioning to separate patients into different risk groups. Existing ‘loss based’ recursive partitioning procedures that would be used in the absence of censoring have previously been extended to the setting of right censored outcomes using inverse probability censoring weighted estimators of loss functions. In this paper, we propose new ‘doubly robust’ extensions of these loss estimators motivated by semiparametric efficiency theory for missing data that better utilize available data. Simulations and a data analysis demonstrate strong performance of the doubly robust survival trees compared with previously used methods. Copyright © 2016 John Wiley & Sons, Ltd.
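Inverse probability censoring weighting reweights each observed event by the Kaplan–Meier estimate of the censoring survival function just before the event time, so that uncensored subjects stand in for those lost to censoring. A minimal sketch (it assumes untied times for clarity; the doubly robust estimators proposed in the paper add an augmentation term on top of weights like these):

```python
def ipcw_weights(time, event):
    """IPCW: w_i = event_i / G(time_i-), where G is the Kaplan-Meier
    survival curve of the censoring distribution (censorings are the
    'events' for G). Censored subjects receive weight 0."""
    n = len(time)
    order = sorted(range(n), key=lambda i: time[i])
    w = [0.0] * n
    g = 1.0            # running value of G(t-) just before the current time
    at_risk = n
    for i in order:
        w[i] = (1.0 / g) if event[i] else 0.0
        if event[i] == 0:            # a censoring updates G after time_i
            g *= 1.0 - 1.0 / at_risk
        at_risk -= 1
    return w

time  = [1, 2, 3, 4]
event = [1, 0, 1, 1]   # 1 = observed event, 0 = censored
w = ipcw_weights(time, event)   # events after the censoring get upweighted
```

Here the censoring at time 2 leaves G at 2/3 thereafter, so the two later events each carry weight 1.5, compensating for the censored subject.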

Parametric mixed-effects models are useful in longitudinal data analysis when the sampling frequencies of a response variable and the associated covariates are the same. We propose a three-step estimation procedure using local polynomial smoothing and demonstrate it with data where the variables to be assessed are repeatedly sampled with different frequencies within the same time frame. We first insert pseudo data for the less frequently sampled variable based on the observed measurements to create a new dataset. Then, standard simple linear regressions are fitted at each time point to obtain raw estimates of the association between dependent and independent variables. Finally, local polynomial smoothing is applied to smooth the raw estimates. Rather than use a kernel function to assign weights, only analytical weights that reflect the importance of each raw estimate are used. The standard errors of the raw estimates and the distance between the pseudo data and the observed data are considered as the measure of the importance of the raw estimates. We applied the proposed method to a weight loss clinical trial, where it efficiently estimated the correlation between the inconsistently sampled longitudinal variables. Our approach was also evaluated via simulations. The results showed that the proposed method works better when the residual variances of the standard linear regressions are small and the within-subject correlations are high. Also, using analytic weights instead of a kernel function during local polynomial smoothing is important when raw estimates have extreme values, or when the association between the dependent and independent variable is nonlinear. Copyright © 2016 John Wiley & Sons, Ltd.
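The smoothing step above amounts to weighted least squares with analytic, inverse-variance weights: a raw estimate with a large standard error barely influences the smoothed curve. A small global-fit sketch (a locally weighted polynomial would apply the same computation within a moving window; all numbers are made up):

```python
import numpy as np

# raw time-specific estimates of an association with their standard errors;
# the value at t = 3 is unreliable (huge SE) and should barely matter
t   = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
raw = np.array([0.1, 2.0, 4.1, 9.0, 8.0, 9.9])   # roughly 2*t, outlier at t=3
se  = np.array([0.2, 0.2, 0.2, 50.0, 0.2, 0.2])

# analytic weights: inverse variance of each raw estimate
w = 1.0 / se ** 2

# weighted linear least squares: solve X'WX beta = X'W raw
X = np.column_stack([np.ones_like(t), t])
WX = X * w[:, None]
beta = np.linalg.solve(X.T @ WX, WX.T @ raw)
smooth = X @ beta          # smoothed values at the original time points
```

Because the weight at t = 3 is tiny, the fitted slope stays near 2 and the smoothed value at t = 3 is pulled back toward the trend rather than toward the extreme raw estimate.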

Testing whether the mean vector of a multivariate set of biomarkers differs between several populations is an increasingly common problem in medical research. Biomarker data are often left-censored because some measurements fall below the laboratory's detection limit. We investigate how such censoring affects multivariate two-sample and one-way multivariate analysis of variance tests. Type I error rates, power, and robustness to increasing censoring are studied, under both normality and non-normality. Parametric tests are found to perform better than non-parametric alternatives, indicating that the current recommendations for analysis of censored multivariate data may have to be revised. Copyright © 2016 John Wiley & Sons, Ltd.

The incremental life expectancy, defined as the difference in mean survival times between two treatment groups, is a crucial quantity of interest in cost-effectiveness analyses. Usually, this quantity is very difficult to estimate from censored survival data with a limited follow-up period. The paper develops estimation procedures for a time-shift survival model that, provided model assumptions are met, gives a reliable estimate of incremental life expectancy without extrapolation beyond the study period. Methods for inference are developed both for individual patient data and when only published Kaplan–Meier curves are available. Through simulation, the estimators are shown to be close to unbiased, and the constructed confidence intervals are shown to have close-to-nominal coverage for small to moderate sample sizes. Copyright © 2016 John Wiley & Sons, Ltd.
