In this paper, we considered different methods to test the interaction between treatment and a potentially large number (*p*) of covariates in randomized clinical trials. The simplest approach was to fit univariate (marginal) models and to combine the univariate statistics or *p*-values (e.g., minimum *p*-value). Another possibility was to reduce the dimension of the covariates using the principal components (PCs) and to test the interaction between treatment and PCs. Finally, we considered the Goeman global test applied to the high-dimensional interaction matrix, adjusted for the main (treatment and covariates) effects. These tests can be used for personalized medicine to test if a large set of biomarkers can be useful to identify a subset of patients who may be more responsive to treatment. We evaluated the performance of these methods on simulated data and we applied them to data from two early-phase oncology clinical trials.

We discuss group-sequential designs in superiority clinical trials with multiple co-primary endpoints, that is, when trials are designed to evaluate if the test intervention is superior to the control on all primary endpoints. We consider several decision-making frameworks for evaluating efficacy or futility, based on boundaries using group-sequential methodology. We incorporate the correlations among the endpoints into the calculations for futility boundaries and sample sizes as a function of other design parameters, including mean differences, the number of analyses, and efficacy boundaries. We investigate the operating characteristics of the proposed decision-making frameworks in terms of efficacy/futility boundaries, power, the Type I error rate, and sample sizes, while varying the number of analyses, the correlations among the endpoints, and the mean differences. We provide an example to illustrate the methods and discuss practical considerations when designing efficient group-sequential designs in clinical trials with co-primary endpoints.

Random-effects meta-analyses are used to combine evidence of treatment effects from multiple studies. Since treatment effects may vary across trials due to differences in study characteristics, heterogeneity in treatment effects between studies must be accounted for to achieve valid inference. The standard model for random-effects meta-analysis assumes approximately normal effect estimates and a normal random-effects model. However, standard methods based on this model ignore the uncertainty in estimating the between-trial heterogeneity. In the special setting of only two studies and in the presence of heterogeneity, we investigate here alternatives such as the Hartung-Knapp-Sidik-Jonkman method (HKSJ), the modified Knapp-Hartung method (mKH, a variation of the HKSJ method) and Bayesian random-effects meta-analyses with priors covering plausible heterogeneity values; code to reproduce the examples is presented in an appendix. The properties of these methods are assessed by applying them to five examples from various rare diseases and by a simulation study. Whereas the standard method based on normal quantiles has poor coverage, the HKSJ and mKH generally lead to very long, and therefore inconclusive, confidence intervals. The Bayesian intervals on the whole show satisfying properties and offer a reasonable compromise between these two extremes.

Science can be seen as a sequential process where each new study augments evidence to the existing knowledge. To have the best prospects to make an impact in this process, a new study should be designed optimally taking into account the previous studies and other prior information. We propose a formal approach for the covariate prioritization, that is, the decision about the covariates to be measured in a new study. The decision criteria can be based on conditional power, change of the *p*-value, change in lower confidence limit, Kullback–Leibler divergence, Bayes factors, Bayesian false discovery rate or difference between prior and posterior expectation. The criteria can also be used for decisions on the sample size. As an illustration, we consider covariate prioritization based on genome-wide association studies for C-reactive protein levels and make suggestions on the genes to be studied further.

Measurement error in exposure variables is a serious impediment in epidemiological studies that relate exposures to health outcomes. In nutritional studies, interest could be in the association between long-term dietary intake and disease occurrence. Long-term intake is usually assessed with a food frequency questionnaire (FFQ), which is prone to recall bias. Measurement error in FFQ-reported intakes leads to bias in the parameter estimate that quantifies the association. To adjust for bias in the association, a calibration study is required to obtain unbiased intake measurements using a short-term instrument such as 24-hour recall (24HR). The 24HR intakes are used as response in regression calibration to adjust for bias in the association. For foods not consumed daily, 24HR-reported intakes are usually characterized by excess zeroes, right skewness, and heteroscedasticity, posing a serious challenge in regression calibration modeling. We proposed a zero-augmented calibration model to adjust for measurement error in reported intake, while handling excess zeroes, skewness, and heteroscedasticity simultaneously without transforming 24HR intake values. We compared the proposed calibration method with the standard method and with methods that ignore measurement error by estimating long-term intake with 24HR and FFQ-reported intakes. The comparison was done in real and simulated datasets. With the 24HR, the mean increase in mercury level per ounce fish intake was about 0.4; with the FFQ intake, the increase was about 1.2. With both calibration methods, the mean increase was about 2.0. A similar trend was observed in the simulation study. In conclusion, the proposed calibration method performs at least as well as the standard method.

Randomized clinical trials comparing several treatments to a common control are often reported in the medical literature. For example, multiple experimental treatments may be compared with placebo, or in combination therapy trials, a combination therapy may be compared with each of its constituent monotherapies. Such trials are typically designed using a balanced approach in which equal numbers of individuals are randomized to each arm; however, this can result in an inefficient use of resources. We provide a unified framework and new theoretical results for optimal design of such single-control multiple-comparator studies. We consider variance optimal designs based on *D*-, *A*-, and *E*-optimality criteria, using a general model that allows for heteroscedasticity and a range of effect measures that include both continuous and binary outcomes. We demonstrate the sensitivity of these designs to the type of optimality criterion by showing that the optimal allocation ratios are systematically ordered according to the optimality criterion. Given this sensitivity to the optimality criterion, we argue that power optimality is a more suitable approach when designing clinical trials where testing is the objective. Weighted variance optimal designs are also discussed, which, like power optimal designs, allow the treatment difference to play a major role in determining allocation ratios. We illustrate our methods using two real clinical trial examples taken from the medical literature. Some recommendations on the use of optimal designs in single-control multiple-comparator trials are also provided.

This paper presents a novel semiparametric joint model for multivariate longitudinal and survival data (SJMLS) by relaxing the normality assumption of the longitudinal outcomes, leaving the baseline hazard functions unspecified and allowing the history of the longitudinal response to have an effect on the risk of dropout. Using Bayesian penalized splines to approximate the unspecified baseline hazard function and combining the Gibbs sampler and the Metropolis–Hastings algorithm, we propose a Bayesian Lasso (BLasso) method to simultaneously estimate unknown parameters and select important covariates in SJMLS. Simulation studies are conducted to investigate the finite sample performance of the proposed techniques. An example from the International Breast Cancer Study Group (IBCSG) is used to illustrate the proposed methodologies.

We present the one-inflated zero-truncated negative binomial (OIZTNB) model, and propose its use as the truncated count distribution in Horvitz–Thompson estimation of an unknown population size. In the presence of unobserved heterogeneity, the zero-truncated negative binomial (ZTNB) model is a natural choice over the positive Poisson (PP) model; however, when one-inflation is present the ZTNB model either suffers from a boundary problem, or provides extremely biased population size estimates. Monte Carlo evidence suggests that in the presence of one-inflation, the Horvitz–Thompson estimator under the ZTNB model can converge in probability to infinity. The OIZTNB model gives markedly different population size estimates compared to some existing truncated count distributions, when applied to several capture–recapture data sets that exhibit both one-inflation and unobserved heterogeneity.
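As a point of reference for the Horvitz–Thompson estimator mentioned in this abstract, the following minimal sketch uses the simpler positive Poisson (PP) model, not the OIZTNB model. The function names and the toy counts are illustrative assumptions and do not come from the paper.

```python
import math


def fit_positive_poisson(counts, iterations=500):
    """Maximum likelihood estimate of lambda for a zero-truncated Poisson.

    The likelihood equation is mean(counts) = lam / (1 - exp(-lam)),
    solved here by simple fixed-point iteration (slow when the sample
    mean is close to 1, but adequate for a sketch).
    """
    mean = sum(counts) / len(counts)
    if mean <= 1.0:
        raise ValueError("sample mean must exceed 1 for a positive solution")
    lam = mean  # starting value
    for _ in range(iterations):
        lam = mean * (1.0 - math.exp(-lam))
    return lam


def horvitz_thompson_size(counts):
    """Horvitz-Thompson estimate: observed units / P(count > 0)."""
    lam = fit_positive_poisson(counts)
    p_observed = 1.0 - math.exp(-lam)
    return len(counts) / p_observed


# Hypothetical capture counts for 100 observed individuals.
counts = [1] * 50 + [2] * 30 + [3] * 15 + [4] * 5
lam_hat = fit_positive_poisson(counts)
n_hat = horvitz_thompson_size(counts)
```

Replacing the positive Poisson by another truncated count distribution only changes how the probability of being observed at least once is computed; the final division stays the same.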

The use of control charts for monitoring schemes in a medical context should consider adjustments to incorporate the specific risk for each individual. Some authors propose the use of a risk-adjusted survival time cumulative sum (RAST CUSUM) control chart to monitor a time-to-event outcome, possibly right censored, using conventional survival models, which do not contemplate the possibility of cure of a patient. We propose to extend this approach considering a risk-adjusted CUSUM chart, based on a cure rate model. We consider a regression model in which the covariates affect the cure fraction. The CUSUM scores are obtained for Weibull and log-logistic promotion time models to monitor a scale parameter for nonimmune individuals. A simulation study was conducted to evaluate and compare the performance of the proposed chart (RACUF CUSUM) with RAST CUSUM, based on optimal control limits and average run length in different situations. As a result, we note that the RAST CUSUM chart is inappropriate when applied to data with a cure rate, while the proposed RACUF CUSUM chart seems to have similar performance if applied to data without a cure rate. The proposed chart is illustrated with simulated data and with a real data set of patients with heart failure treated at the Heart Institute (InCor), at the University of São Paulo, Brazil.

We use data from an ongoing cohort study of chronic kidney patients at Salford Royal NHS Foundation Trust, Greater Manchester, United Kingdom, to investigate the influence of acute kidney injury (AKI) on the subsequent rate of change of kidney function amongst patients already diagnosed with chronic kidney disease (CKD). We use a linear mixed effects modelling framework to enable estimation of both acute and chronic effects of AKI events on kidney function. We model the fixed effects by a piece-wise linear function with three change-points to capture the acute changes in kidney function that characterise an AKI event, and the random effects by the sum of three components: a random intercept, a stationary stochastic process with Matérn correlation structure, and measurement error. We consider both multivariate Normal and multivariate *t* versions of the random effects. For either specification, we estimate model parameters by maximum likelihood and evaluate the plug-in predictive distributions of the random effects given the data. We find that following an AKI event the average long-term rate of decline in kidney function is almost doubled, regardless of the severity of the event. We also identify and present examples of individual patients whose kidney function trajectories diverge substantially from the population-average.

Standard optimization algorithms for maximizing likelihood may not be applicable to the estimation of those flexible multivariable models that are nonlinear in their parameters. For applications where the model's structure permits separating estimation of mutually exclusive subsets of parameters into distinct steps, we propose the alternating conditional estimation (ACE) algorithm. We validate the algorithm, in simulations, for estimation of two flexible extensions of Cox's proportional hazards model where the standard maximum partial likelihood estimation does not apply, with simultaneous modeling of (1) nonlinear and time-dependent effects of continuous covariates on the hazard, and (2) nonlinear interaction and main effects of the same variable. We also apply the algorithm in real-life analyses to estimate nonlinear and time-dependent effects of prognostic factors for mortality in colon cancer. Analyses of both simulated and real-life data illustrate good statistical properties of the ACE algorithm and its ability to yield new potentially useful insights about the data structure.

The food frequency questionnaire (FFQ) is known to be prone to measurement error. Researchers have suggested excluding implausible energy reporters (IERs) of FFQ total energy when examining the relationship between a health outcome and FFQ-reported intake to obtain less biased estimates of the effect of the error-prone measure of exposure; however, the statistical properties of stratifying by IER status have not been studied. Under certain assumptions, including nondifferential error, we show that when stratifying by IER status, the attenuation of the estimated relative risk in the stratified models will be either greater or less in both strata (implausible and plausible reporters) than for the nonstratified model, contrary to the common belief that the attenuation will be less among plausible reporters and greater among IERs. Whether there is more or less attenuation depends on the pairwise correlations between true exposure, observed exposure, and the stratification variable. Thus exclusion of IERs is inadvisable but stratification by IER status can sometimes help. We also address the case of differential error. Examples from the Observing Protein and Energy Nutrition Study and simulations illustrate these results.

For the calculation of relative measures such as risk ratio (RR) and odds ratio (OR) in a single study, additional approaches are required for the case of zero events. In the case of zero events in one treatment arm, the Peto odds ratio (POR) can be calculated without continuity correction, and is currently the relative effect estimation method of choice for binary data with rare events. The aim of this simulation study is a variegated comparison of the estimated OR and estimated POR with the true OR in a single study with two parallel groups without confounders in data situations where the POR is currently recommended. This comparison was performed by means of several performance measures, that is, the coverage, confidence interval (CI) width, mean squared error (MSE), and mean percentage error (MPE). We demonstrated that the estimator for the POR does not outperform the estimator for the OR for all the performance measures investigated. In the case of rare events, small treatment effects and similar group sizes, we demonstrated that the estimator for the POR performed better than the estimator for the OR only regarding the coverage and MPE, but not the CI width and MSE. For larger effects and unbalanced group size ratios, the coverage and MPE of the estimator for the POR were inappropriate. As in practice the true effect is unknown, the POR method should be applied only with the utmost caution.
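To make the two estimators compared in this abstract concrete, here is a minimal sketch of the standard one-step Peto odds ratio and of the ordinary odds ratio with a 0.5 continuity correction for a single two-arm study. The function names, the choice of 0.5 as correction, and the example counts are assumptions for illustration; they are not taken from the simulation study.

```python
import math


def odds_ratio(a, n1, c, n2, correction=0.5):
    """Odds ratio for a/n1 events (treatment) versus c/n2 events (control).

    `correction` is added to every cell only when at least one cell is zero.
    """
    b, d = n1 - a, n2 - c
    if min(a, b, c, d) == 0:
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)


def peto_odds_ratio(a, n1, c, n2):
    """One-step Peto odds ratio exp((O - E) / V); needs no continuity correction.

    O is the observed number of events in the treatment arm, E its
    expectation and V its hypergeometric variance under the null.
    Undefined (V = 0) when there are no events or only events overall.
    """
    n = n1 + n2
    events = a + c
    expected = n1 * events / n
    variance = n1 * n2 * events * (n - events) / (n ** 2 * (n - 1))
    return math.exp((a - expected) / variance)


# Hypothetical rare-event study: 0/100 events under treatment, 5/100 under control.
or_hat = odds_ratio(0, 100, 5, 100)
por_hat = peto_odds_ratio(0, 100, 5, 100)
```

With balanced arms and rare events the two estimates are of similar magnitude; the abstract's caution concerns larger effects and unbalanced arms, where the one-step approximation behind the POR deteriorates.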

Biomedical researchers are often interested in estimating the effect of an environmental exposure in relation to a chronic disease endpoint. However, the exposure variable of interest may be measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies an additive measurement error model, but it may not have repeated measurements. The subset in which the surrogate variables are available is called a *calibration sample*. In addition to the surrogate variables that are available among the subjects in the calibration sample, we consider the situation when there is an instrumental variable available for all study subjects. An instrumental variable is correlated with the unobserved true exposure variable, and hence can be useful in the estimation of the regression coefficients. In this paper, we propose a nonparametric method for Cox regression using the observed data from the whole cohort. The nonparametric estimator is the best linear combination of a nonparametric correction estimator from the calibration sample and the difference of the naive estimators from the calibration sample and the whole cohort. The asymptotic distribution is derived, and the finite sample performance of the proposed estimator is examined via intensive simulation studies. The methods are applied to the Nutritional Biomarkers Study of the Women's Health Initiative.

A mixture of multivariate contaminated normal distributions is developed for model-based clustering. In addition to the parameters of the classical normal mixture, our contaminated mixture has, for each cluster, a parameter controlling the proportion of mild outliers and one specifying the degree of contamination. Crucially, these parameters do not have to be specified *a priori*, adding flexibility to our approach. Parsimony is introduced via eigen-decomposition of the component covariance matrices, and sufficient conditions for the identifiability of all the members of the resulting family are provided. An expectation-conditional maximization algorithm is outlined for parameter estimation and various implementation issues are discussed. Using a large-scale simulation study, the behavior of the proposed approach is investigated and comparison with well-established finite mixtures is provided. The performance of this novel family of models is also illustrated on artificial and real data.

We investigate rank-based studentized permutation methods for the nonparametric Behrens–Fisher problem, that is, inference methods for the area under the ROC curve. We prove that the studentized permutation distribution of the Brunner-Munzel rank statistic is asymptotically standard normal, even under the alternative, thus incidentally providing the hitherto missing theoretical foundation for the Neubert and Brunner studentized permutation test. In particular, we not only show its consistency, but also that confidence intervals for the underlying treatment effects can be computed by inverting this permutation test. In addition, we derive permutation-based range-preserving confidence intervals. Extensive simulation studies show that the permutation-based confidence intervals appear to maintain the preassigned coverage probability quite accurately (even for rather small sample sizes). For a convenient application of the proposed methods, a freely available software package for the statistical software R has been developed. A real data example illustrates the application.
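The treatment effect targeted in the nonparametric Behrens–Fisher problem is the relative effect p = P(X < Y) + 0.5 P(X = Y), which coincides with the area under the ROC curve. The sketch below only computes its usual plug-in (Mann–Whitney type) estimate; it is not the studentized permutation test or the R package described in the abstract, and the function name is an assumption.

```python
def relative_effect(x, y):
    """Plug-in estimate of p = P(X < Y) + 0.5 * P(X = Y).

    Counts, over all pairs (x_i, y_j), how often the y-value is larger,
    giving half weight to ties.
    """
    total = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                total += 1.0
            elif xi == yj:
                total += 0.5
    return total / (len(x) * len(y))


# Hypothetical samples: complete separation and complete overlap.
separated = relative_effect([1, 2, 3], [4, 5, 6])
overlapping = relative_effect([1, 2], [1, 2])
```

A value of 0.5 corresponds to no tendency of either group toward larger values, which is the null hypothesis of the Brunner-Munzel test.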

Spatiotemporal disease mapping focuses on estimating the spatial pattern in disease risk across a set of nonoverlapping areal units over a fixed period of time. The key aim of such research is to identify areas that have a high average level of disease risk or where disease risk is increasing over time, thus allowing public health interventions to be focused on these areas. Such aims are well suited to the statistical approach of clustering, and while much research has been done in this area in a purely spatial setting, only a handful of approaches have focused on spatiotemporal clustering of disease risk. Therefore, this paper outlines a new modeling approach for clustering spatiotemporal disease risk data, by clustering areas based on both their mean risk levels and the behavior of their temporal trends. The efficacy of the methodology is established by a simulation study, and is illustrated by a study of respiratory disease risk in Glasgow, Scotland.

The negative binomial distribution is a common model for the analysis of count data in biology and ecology. In many applications, we may not observe the complete frequency count in a quadrat but only that a species occurred in the quadrat. If only occurrence data are available then the two parameters of the negative binomial distribution, the aggregation index and the mean, are not identifiable. This can be overcome by data augmentation or through modeling the dependence between quadrat occupancies. Here, we propose to record the (first) detection time while collecting occurrence data in a quadrat. We show that under what we call proportionate sampling, where the time to survey a region is proportional to the area of the region, both negative binomial parameters are estimable. When the mean parameter is larger than two, our proposed approach is more efficient than the data augmentation method developed by Solow and Smith (, *Am. Nat*. **176**, 96–98), and in general is cheaper to conduct. We also investigate the effect of misidentification when collecting negative binomially distributed data, and conclude that, in general, the effect can be simply adjusted for provided that the mean and variance of misidentification probabilities are known. The results are demonstrated in a simulation study and illustrated in several real examples.

We consider models for hierarchical count data, subject to overdispersion and/or excess zeros. Molenberghs et al. () and Molenberghs et al. () extend the Poisson-normal generalized linear-mixed model by including gamma random effects to accommodate overdispersion. Excess zeros are handled using either a zero-inflation or a hurdle component. These models were studied by Kassahun et al. (). While flexible, they are quite elaborate in parametric specification and therefore model assessment is imperative. We derive local influence measures to detect and examine influential subjects, that is, subjects who have undue influence on either the fit of the model as a whole, or on specific important sub-vectors of the parameter vector. The latter include the fixed effects for the Poisson and for the excess-zeros components, the variance components for the normal random effects, and the parameters describing gamma random effects, included to accommodate overdispersion. Interpretable influence components are derived. The method is applied to data from a longitudinal clinical trial involving patients with epileptic seizures. Even though the data were extensively analyzed in earlier work, the insight gained from the proposed diagnostics, statistically and clinically, is considerable. Possibly, a small but important subgroup of patients has been identified.

Characterization of a subpopulation by the difference in marginal means of the outcome under the intervention and control may not be sufficient to provide informative guidance for individual decision and public policy making. Specifically, often we are interested in the treatment benefit rate (TBR), that is, the probability of benefiting from an intervention in a meaningful way. For binary outcomes, TBR is the proportion that has “unfavorable” outcome under the control and “favorable” outcome under the intervention. Identification of subpopulations with distinct TBR by baseline characteristics will have significant implications in clinical settings where a medical intervention with potential negative health impact is under consideration for a given patient. In addition, these subpopulations with unique TBR set the basis for guidance in implementing the intervention toward a more personalized scheme of treatment. In this article, we propose a Bayesian tree-based latent variable model to seek subpopulations with distinct TBR. Our method offers a nonparametric Bayesian framework that accounts for the uncertainty in estimating potential outcomes and allows more exhaustive search of the partitions of the baseline covariates space. The method is evaluated through a simulation study and applied to a randomized clinical trial of implantable cardioverter defibrillators to reduce mortality.

To optimize resources, randomized clinical trials with multiple arms can be an attractive option to simultaneously test various treatment regimens in pharmaceutical drug development. The motivation for this work was the successful conduct and positive final outcome of a three-arm randomized clinical trial primarily assessing whether obinutuzumab plus chlorambucil in patients with chronic lymphocytic leukemia and coexisting conditions is superior to chlorambucil alone based on a time-to-event endpoint. The inference strategy of this trial was based on a closed testing procedure. We compare this strategy to three potential alternatives to run a three-arm clinical trial with a time-to-event endpoint. The primary goal is to quantify the differences between these strategies in terms of the time it takes until the first analysis and thus potential approval of a new drug, number of required events, and power. Operational aspects of implementing the various strategies are discussed. In conclusion, using a closed testing procedure results in the shortest time to the first analysis with a minimal loss in power. Therefore, closed testing procedures should be part of the statistician's standard clinical trials toolbox when planning multiarm clinical trials.

For clinical studies in which two coprimary endpoints are necessary for assuring efficacy of the treatment of interest, it is important to determine the minimal sample size needed to attain a certain conjunctive power (i.e., power to reject the false null hypotheses for both endpoints). The traditional method of assigning the square root of the targeted overall power to each of the two hypothesis tests is optimal only when the standardized treatment effect sizes of the two endpoints are equal. In spite of this limitation, the square root method is applied routinely, resulting in frequent overestimation of the overall sample size. A new, iterative method is presented to find the two individual power values for the two endpoints so that the targeted overall power is attained with the smallest possible overall sample size. The principle is to assign more power to the endpoint for which a larger standardized effect size is likely to occur based on prior information. The main assumption of the new method is the independence of endpoints. However, this is not a serious limitation in case of type II error, thus the method yields a good approximation even if the condition of independence is not fulfilled. The advantages of the new method are (a) a considerable saving (up to 24% in our examples) in the overall sample size, (b) the flexibility as it can be applied to any combination of endpoint types (e.g., normally distributed + binomial, survival + binomial, etc.) and (c) ease of programming.
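The contrast between the square root method and an unequal split of power can be illustrated with a small sketch. It assumes two independent, normally distributed endpoints analysed with one-sided two-sample z-tests and replaces the authors' iterative method with a plain grid search; all function names, the significance level, and the effect sizes are illustrative assumptions.

```python
import math
from statistics import NormalDist

_Z = NormalDist()


def n_per_group(delta, power, alpha=0.025):
    """Per-group sample size of a one-sided two-sample z-test.

    `delta` is the standardized effect size (mean difference / SD).
    """
    return 2.0 * (_Z.inv_cdf(1.0 - alpha) + _Z.inv_cdf(power)) ** 2 / delta ** 2


def sqrt_method(delta1, delta2, target=0.8, alpha=0.025):
    """Square root method: each endpoint gets power sqrt(target)."""
    p = math.sqrt(target)
    return max(n_per_group(delta1, p, alpha), n_per_group(delta2, p, alpha))


def best_split(delta1, delta2, target=0.8, alpha=0.025, steps=2000):
    """Grid search over individual powers p1 * p2 = target (independence assumed).

    Returns (sample size, p1, p2) minimizing the size needed by both endpoints.
    """
    best = None
    for i in range(1, steps):
        p1 = target + (1.0 - target) * i / steps  # p1 lies strictly in (target, 1)
        p2 = target / p1                          # so p2 also lies in (target, 1)
        n = max(n_per_group(delta1, p1, alpha), n_per_group(delta2, p2, alpha))
        if best is None or n < best[0]:
            best = (n, p1, p2)
    return best


# Hypothetical standardized effect sizes 0.5 and 0.3.
n_sqrt = sqrt_method(0.5, 0.3)
n_opt, p1_opt, p2_opt = best_split(0.5, 0.3)
```

In this toy setting the search assigns the higher individual power to the endpoint with the larger effect size, in line with the principle stated in the abstract, and the resulting sample size is smaller than under the square root method.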

In diagnostic medicine, the volume under the receiver operating characteristic (ROC) surface (VUS) is a commonly used index to quantify the ability of a continuous diagnostic test to discriminate between three disease states. In practice, verification of the true disease status may be performed only for a subset of subjects under study since the verification procedure is invasive, risky, or expensive. The selection for disease examination might depend on the results of the diagnostic test and other clinical characteristics of the patients, which in turn can cause bias in estimates of the VUS. This bias is referred to as verification bias. Existing verification bias correction in three-way ROC analysis focuses on ordinal tests. We propose verification bias-correction methods to construct the ROC surface and estimate the VUS for a continuous diagnostic test, based on inverse probability weighting. By applying U-statistics theory, we develop asymptotic properties for the estimator. A jackknife estimator of variance is also derived. Extensive simulation studies are performed to evaluate the performance of the new estimators in terms of bias correction and variance. The proposed methods are used to assess the ability of a biomarker to accurately identify stages of Alzheimer's disease.
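The idea of inverse probability weighting for the VUS can be sketched as a weighted proportion of correctly ordered triples, one verified subject from each disease class, with each subject weighted by the inverse of its verification probability. This is a minimal sketch under the assumption that the verification probabilities are known or already estimated; the function name and inputs are hypothetical and the code is not the authors' implementation.

```python
def vus_ipw(tests, classes, verified, probs):
    """Inverse-probability-weighted estimate of the volume under the ROC surface.

    tests    : continuous test results
    classes  : disease class 1, 2 or 3 (only used for verified subjects)
    verified : 1 if the disease status was verified, else 0
    probs    : verification probabilities (assumed known or pre-estimated)

    Ties between test results are ignored, as for a continuous test.
    """
    groups = {1: [], 2: [], 3: []}
    for t, d, v, p in zip(tests, classes, verified, probs):
        if v:
            groups[d].append((t, 1.0 / p))
    numerator = denominator = 0.0
    for t1, w1 in groups[1]:
        for t2, w2 in groups[2]:
            for t3, w3 in groups[3]:
                weight = w1 * w2 * w3
                denominator += weight
                if t1 < t2 < t3:
                    numerator += weight
    return numerator / denominator


# Hypothetical data: one class-1 subject was verified with probability 0.5,
# and one subject was not verified at all (its class is unknown).
vus_hat = vus_ipw(
    tests=[1.0, 3.5, 3.0, 4.0, 2.0],
    classes=[1, 1, 2, 3, None],
    verified=[1, 1, 1, 1, 0],
    probs=[0.5, 1.0, 1.0, 1.0, 0.8],
)
```

With all subjects verified and all probabilities equal to one, the estimate reduces to the usual empirical VUS.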

We propose a method to plan the number of occasions of recapture experiments for population size estimation. We do so by fixing the smallest number of capture occasions so that the expected length of the profile confidence interval is less than or equal to a fixed threshold. In some cases, we solve the optimization problem in closed form. For more complex models we use numerical optimization. We detail models assuming homogeneous, time-varying, subject-specific capture probabilities, behavioral response to capture, and combining behavioral response with subject-specific effects. The principle we propose can be extended to plan any other model specification. We formally show the validity of the approach by proving distributional convergence. We illustrate with simulations and challenging examples in epidemiology and ecology. We report that in many cases adding as few as two sampling occasions may substantially reduce the length of confidence intervals.

In scientific research, many hypotheses relate to the comparison of two independent groups. Usually, it is of interest to use a design (i.e., the allocation of sample sizes *m* and *n* for fixed total *m* + *n*) that maximizes the power of the applied statistical test. It is known that the two-sample *t*-tests for homogeneous and heterogeneous variances may lose substantial power when variances are unequal but equally large samples are used. We demonstrate that this is not the case for the nonparametric Wilcoxon–Mann–Whitney test, whose application in biometrical research fields is motivated by two examples from cancer research. We prove the optimality of the balanced design (*m* = *n*) in case of symmetric and identically shaped distributions using normal approximations and show that this design generally offers power only negligibly lower than the optimal design for a wide range of distributions.

In high-dimensional omics studies where multiple molecular profiles are obtained for each set of patients, there is often interest in identifying complex multivariate associations, for example, copy number regulated expression levels in a certain pathway or in a genomic region. To detect such associations, we present a novel approach to test for association between two sets of variables. Our approach generalizes the global test, which tests for association between a group of covariates and a single univariate response, to allow a high-dimensional multivariate response. We apply the method to several simulated datasets as well as two publicly available datasets, where we compare the performance of the multivariate global test (G2) with the univariate global test. The method is implemented in R and will be available as a part of the globaltest package in R.

Spontaneous adverse event reports have a high potential for detecting adverse drug reactions. However, due to their dimension, the analysis of such databases requires statistical methods. In this context, disproportionality measures can be used. Their main idea is to project the data onto contingency tables in order to measure the strength of associations between drugs and adverse events. However, due to the data projection, these methods are sensitive to the problem of coprescriptions and masking effects. Recently, logistic regressions have been used with a Lasso type penalty to perform the detection of associations between drugs and adverse events. On different examples, this approach limits the drawbacks of the disproportionality methods, but the choice of the penalty value is open to criticism while it strongly influences the results. In this paper, we propose to use a logistic regression whose sparsity is viewed as a model selection challenge. Since the model space is huge, a Metropolis–Hastings algorithm carries out the model selection by maximizing the BIC criterion. Thus, we avoid the calibration of penalty or threshold. During our application on the French pharmacovigilance database, the proposed method is compared to well-established approaches on a reference dataset, and obtains better rates of positive and negative controls. However, many signals (i.e., specific drug–event associations) are not detected by the proposed method. So, we conclude that this method should be used in parallel to existing measures in pharmacovigilance.

Code implementing the proposed method is available at the following URL: https://github.com/masedki/MHTrajectoryR.
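
To make the contingency-table idea behind disproportionality measures concrete, here is a minimal Python sketch of one standard measure, the reporting odds ratio (ROR), with a Wald-type 95% interval on the log scale. It is not the method proposed in the abstract above and is not taken from the MHTrajectoryR code; the function name, argument names, and the counts in the usage note are hypothetical, chosen only for illustration.

```python
import math


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """Reporting odds ratio for one drug-event pair from a 2x2 table.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports without the drug, with the event
    d: reports with neither
    All four counts are assumed to be strictly positive.
    """
    ror = (a * d) / (b * c)
    # Standard error of log(ROR) from the usual 2x2-table approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se_log)
    upper = math.exp(math.log(ror) + z * se_log)
    return ror, lower, upper
```

For example, `reporting_odds_ratio(20, 80, 10, 890)` returns an ROR of 22.25 together with its interval. Because each drug is collapsed into its own 2x2 table, a coprescribed drug can inherit the signal of its partner, which is the weakness the abstract attributes to these measures.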

We define an adaptive procedure for control of the false discovery rate that is uniformly more powerful than the procedure of Benjamini and Hochberg. The power gain is tiny, however, and only appreciable for small numbers of hypotheses. We illustrate the new method with the case of two hypotheses, for which so far no procedure was known that controls false discovery rate but not also familywise error rate under positive dependence.
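
For reference, the baseline that the abstract above improves on, the Benjamini and Hochberg step-up procedure, can be sketched in a few lines of Python. This is only the standard procedure, not the adaptive one the abstract defines; the function name and the default level `alpha=0.05` are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

With two hypotheses, the case the abstract highlights, `benjamini_hochberg([0.04, 0.045])` rejects both: the smaller p-value misses its own threshold of 0.025, but the larger one meets 0.05, and the step-up rule then rejects everything up to that rank.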

Few articles have been written on analyzing three-way interactions between drugs. It may seem quite straightforward to extend a statistical method from two drugs to three drugs. However, a three-drug study may have a more complex nonlinear response surface of the interaction index (), with more complex local synergy and/or local antagonism interspersed in different regions of drug combinations, than a two-drug study. In addition, it is not possible to obtain a four-dimensional (4D) response surface plot for a three-drug study. We propose an analysis procedure to construct the dose combination regions of interest (say, the synergistic areas with ). First, we use the model robust regression method (MRR), a semiparametric method, to fit the entire response surface of the , which makes it possible to fit a complex response surface with local synergy/antagonism. Second, we run a modified genetic algorithm (MGA), a stochastic optimization method, many times with different random seeds, in order to collect as many feasible points as possible that satisfy the estimated values of . Last, all these feasible points are used to construct the approximate dose regions of interest in 3D. A case study with three anti-cancer drugs in an in vitro experiment is employed to illustrate how to find the dose regions of interest.

The problem of choosing a sample size for a clinical trial is a very common one. In some settings, such as rare diseases or other small populations, the large sample sizes usually associated with the standard frequentist approach may be infeasible, suggesting that the sample size chosen should reflect the size of the population under consideration. Incorporation of the population size is possible in a decision-theoretic approach either explicitly by assuming that the population size is fixed and known, or implicitly through geometric discounting of the gain from future patients reflecting the expected population size. This paper develops such approaches. Building on previous work, an asymptotic expression is derived for the sample size for single and two-arm clinical trials in the general case of a clinical trial with a primary endpoint with a distribution of one-parameter exponential family form that optimizes a utility function that quantifies the cost and gain per patient as a continuous function of this parameter. It is shown that as the size of the population, *N*, or expected size, in the case of geometric discounting, becomes large, the optimal trial size is or . The sample size obtained from the asymptotic expression is also compared with the exact optimal sample size in examples with responses with Bernoulli and Poisson distributions, showing that the asymptotic approximations can also be reasonable in relatively small sample sizes.

In randomized trials with noncompliance, causal effects cannot be identified without strong assumptions. Therefore, several authors have considered bounds on the causal effects. Applying an idea of VanderWeele (), Chiba () gave bounds on the average causal effects in randomized trials with noncompliance using the information on the randomized assignment, the treatment received and the outcome under monotonicity assumptions about covariates, but he did not consider any observed covariates. For trials with observed covariates such as age, gender, and race, we propose new bounds that use the observed covariate information under monotonicity assumptions similar to those of VanderWeele and Chiba. We compare the three bounds in a real example.

Many attempts have been made to formalize ethical requirements for research. Among the most prominent mechanisms are informed consent requirements and data protection regimes. These mechanisms, however, sometimes appear as obstacles to research. In this opinion paper, we critically discuss conventional approaches to research ethics that emphasize consent and data protection. Several recent debates have highlighted other important ethical issues and underlined the need for greater openness in order to uphold the integrity of health-related research. Some of these measures, such as the sharing of individual-level data, pose problems for standard understandings of consent and privacy. Here, we argue that these interpretations tend to be overdemanding: They do not really protect research subjects and they hinder the research process. Accordingly, we suggest another way of framing these requirements. Individual consent must be situated alongside the wider distribution of knowledge created when the actions, commitments, and procedures of researchers and their institutions are opened to scrutiny. And instead of simply emphasizing privacy or data protection, we should understand confidentiality as a principle that facilitates the sharing of information while upholding important safeguards. Consent and confidentiality belong to a broader set of safeguards and procedures to uphold the integrity of the research process.

Pooled study designs, where individual biospecimens are combined prior to measurement via a laboratory assay, can reduce lab costs while maintaining statistical efficiency. Analysis of the resulting pooled measurements, however, often requires specialized techniques. Existing methods can effectively estimate the relation between a binary outcome and a continuous pooled exposure when pools are matched on disease status. When pools are of mixed disease status, however, the existing methods may not be applicable. By exploiting characteristics of the gamma distribution, we propose a flexible method for estimating odds ratios from pooled measurements of mixed and matched status. We use simulation studies to compare consistency and efficiency of risk effect estimates from our proposed methods to existing methods. We then demonstrate the efficacy of our method applied to an analysis of pregnancy outcomes and pooled cytokine concentrations. Our proposed approach contributes to the toolkit of available methods for analyzing odds ratios of a pooled exposure, without restricting pools to be matched on a specific outcome.

In this paper, a new class of models for autoradiographic hot-line data is proposed. The models, for which there is theoretical justification, are a linear combination of generalized Student's *t*-distributions and have as special cases all currently accepted line-spread models. The new models are used to analyse experimental hot-line data and compared with the fit of current models. The data are from a line source labelled with iodine-125 in a resin section of 0.6 μm in thickness. It will be shown that a significant improvement in goodness of fit, over that of previous models, can be achieved by choosing from this new class of models. A single model from this class will be proposed that has a simple form made up of only two components, but which fits experimental data significantly better than previous models. A short sensitivity analysis indicates that estimation is reliable. The modelling approach, although motivated by and applied to autoradiography, is appropriate for any mixture modelling situation.

Biomarkers are subject to censoring whenever some measurements are not quantifiable given a laboratory detection limit. Methods for handling censoring have received less attention in genetic epidemiology, and censored data are still often replaced with a fixed value. We compared different strategies for handling a left-censored continuous biomarker in a family-based study, where the biomarker is tested for association with a genetic variant, , adjusting for a covariate, X. Allowing different correlations between X and , we compared simple substitution of censored observations with the detection limit followed by a linear mixed effect model (LMM), a Bayesian model with noninformative priors, a Tobit model with robust standard errors, and multiple imputation (MI) with and without in the imputation followed by an LMM. Our comparison was based on real and simulated data in which 20% and 40% censoring were artificially induced. The complete data were also analyzed with an LMM. In the MICROS study, the Bayesian model gave results closer to those obtained with the complete data. In the simulations, simple substitution was always the most biased method; the Tobit approach gave the least biased estimates at all censoring levels and correlation values; and the Bayesian model and both MI approaches gave slightly biased estimates but smaller root mean square errors. On the basis of these results the Bayesian approach is highly recommended for candidate gene studies; however, the computationally simpler Tobit and the MI without are both good options for genome-wide studies.

In a linear multilevel model, significance of all fixed effects can be determined using *F* tests under maximum likelihood (ML) or restricted maximum likelihood (REML). In this paper, we demonstrate that in the presence of primary unit sparseness, the performance of the *F* test under both REML and ML is rather poor. Using simulations based on the structure of a data example on ceftriaxone consumption in hospitalized children, we studied variability, type I error rate and power in scenarios with a varying number of secondary units within the primary units. In general, the variability in the estimates for the effect of the primary unit decreased as the number of secondary units increased. In the presence of singletons (i.e., only one secondary unit within a primary unit), REML consistently outperformed ML, although even under REML the performance of the *F* test was found to be inadequate. When modeling the primary unit as a random effect, the power was lower while the type I error rate was unstable. The options of dropping, regrouping, or splitting the singletons could solve either the problem of a high type I error rate or a low power, while worsening the other. The permutation test appeared to be a valid alternative as it outperformed the *F* test, especially under REML. We conclude that in the presence of singletons, one should be careful in using the *F* test to determine the significance of the fixed effects, and propose the permutation test (under REML) as an alternative.
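
The permutation test recommended above operates on a multilevel model fitted under REML, which is not reproduced here. As a much simpler illustration of the general permutation idea only, the following Python sketch tests a difference in means between two groups by reshuffling group labels. The function name, the choice of test statistic, and the defaults `n_perm=2000` and `seed=0` are all assumptions made for this sketch.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)

    def abs_mean_diff(xs, ys):
        return abs(sum(xs) / len(xs) - sum(ys) / len(ys))

    observed = abs_mean_diff(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    count = 0
    for _ in range(n_perm):
        # Reassign group labels at random by shuffling the pooled values.
        rng.shuffle(pooled)
        stat = abs_mean_diff(pooled[:n_a], pooled[n_a:])
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            count += 1
    # Adding 1 to numerator and denominator counts the observed labelling
    # itself, so the estimated p-value is never exactly zero.
    return (count + 1) / (n_perm + 1)
```

The reference distribution is built from the data themselves rather than from an *F* distribution, which is why a permutation approach can remain usable when the asymptotic approximation is doubtful.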

Discrete state-space models are used in ecology to describe the dynamics of wild animal populations, with parameters, such as the probability of survival, being of ecological interest. For a particular parametrization of a model it is not always clear which parameters can be estimated. This inability to estimate all parameters is known as parameter redundancy, and such a model is described as nonidentifiable. In this paper we develop methods that can be used to detect parameter redundancy in discrete state-space models. An exhaustive summary is a combination of parameters that fully specifies a model. To use general methods for detecting parameter redundancy, a suitable exhaustive summary is required. This paper proposes two methods for the derivation of an exhaustive summary for discrete state-space models using discrete analogues of methods for continuous state-space models. We also demonstrate that combining multiple data sets, through the use of an integrated population model, may result in a model in which all parameters are estimable, even though models fitted to the separate data sets may be parameter redundant.

One of the main goals in spatial epidemiology is to study the geographical pattern of disease risks. For this purpose, the convolution model composed of correlated and uncorrelated components is often used. However, one of the two components could be predominant in some regions. To investigate the predominance of the correlated or uncorrelated component for multiple scale data, we propose four different spatial mixture multiscale models by mixing spatially varying probability weights of correlated (CH) and uncorrelated heterogeneities (UH). The first model assumes that there is no linkage between the different scales and, hence, we consider independent mixture convolution models at each scale. The second model introduces linkage between finer and coarser scales via a shared uncorrelated component of the mixture convolution model. The third model is similar to the second model but the linkage between the scales is introduced through the correlated component. Finally, the fourth model accommodates a scale effect by sharing both CH and UH simultaneously. We applied these models to real and simulated data, and found that the fourth model is the best model, followed by the second model.

In this paper, we propose a test procedure to detect change points of multidimensional autoregressive processes. The considered process differs from typical applied spatial autoregressive processes in that it is assumed to evolve from a predefined center into every dimension. Additionally, structural breaks in the process can occur at a certain distance from the predefined center. The main aim of this paper is to detect such spatial changes. In particular, we focus on shifts in the mean and the autoregressive parameter. The proposed test procedure is based on the likelihood-ratio approach. Eventually, the goodness-of-fit values of the estimators are compared for different shifts. Moreover, the empirical distribution of the test statistic of the likelihood-ratio test is obtained via Monte Carlo simulations. We show that the generalized Gumbel distribution seems to be a suitable limiting distribution of the proposed test statistic. Finally, we discuss the detection of lung cancer in computed tomography scans and illustrate the proposed test procedure.

Disease mapping of a single disease has been widely studied in the public health setting. Simultaneous modeling of related diseases can also be a valuable tool both from the epidemiological and from the statistical point of view. In particular, when we have several measurements recorded at each spatial location, we need to consider multivariate models in order to handle the dependence among the multivariate components as well as the spatial dependence between locations. It is then customary to use multivariate spatial models assuming the same distribution through the entire population density. However, in many circumstances, it is a very strong assumption to have the same distribution for all the areas of population density. To overcome this issue, we propose a hierarchical multivariate mixture generalized linear model to simultaneously analyze spatial Normal and non-Normal outcomes. As an application of our proposed approach, esophageal and lung cancer deaths in Minnesota are used to show that assuming different distributions for different counties of Minnesota outperforms assuming a single distribution for the population density. Performance of the proposed approach is also evaluated through a simulation study.

Recently, personalized medicine has received great attention as a way to improve safety and effectiveness in drug development. Personalized medicine aims to provide medical treatment that is tailored to the patient's characteristics such as genomic biomarkers, disease history, etc., so that the benefit of treatment can be optimized. Subpopulation identification aims to divide patients into several different subgroups where each subgroup corresponds to an optimal treatment. For two subgroups, traditionally the multivariate Cox proportional hazards model is fitted and used to calculate the risk score when the outcome is a survival time endpoint. The median is commonly chosen as the cutoff value to separate patients. However, using the median as the cutoff value is quite subjective and sometimes may be inappropriate in situations where data are imbalanced. Here, we propose a novel tree-based method that adopts the algorithm of relative risk trees to identify patient subgroups. After growing a relative risk tree, we apply k-means clustering to group the terminal nodes based on the averaged covariates. We adopt a bagging ensemble method to improve on the performance of a single tree, since it is well known that the performance of a single tree is quite unstable. A simulation study is conducted to compare the performance of our proposed method with that of the multivariate Cox model. Our proposed method is also applied to two public cancer data sets for illustration.

Existing cure-rate survival models are generally not convenient for modeling and estimating the survival quantiles of a patient with specified covariate values. This paper proposes a novel class of cure-rate model, the transform-both-sides cure-rate model (TBSCRM), that can be used to make inferences about both the cure-rate and the survival quantiles. We develop Bayesian inference about the covariate effects on the cure-rate as well as on the survival quantiles via Markov Chain Monte Carlo (MCMC) tools. We also show that the TBSCRM-based Bayesian method outperforms methods based on existing cure-rate models in our simulation studies and in an application to the breast cancer survival data from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) database.

Our present work proposes a new survival model in a Bayesian context to analyze right-censored survival data for populations with a surviving fraction, assuming that the log failure time follows a generalized extreme value distribution. Many applications require a more flexible modeling of covariate information than a simple linear or parametric form for all covariate effects. It is also necessary to include the spatial variation in the model, since it is sometimes unexplained by the covariates considered in the analysis. Therefore, the nonlinear covariate effects and the spatial effects are incorporated into the systematic component of our model. Gaussian processes (GPs) provide a natural framework for modeling potentially nonlinear relationships and have recently become extremely powerful in nonlinear regression. Our proposed model adopts a semiparametric Bayesian approach by imposing a GP prior on the nonlinear structure of continuous covariates. With the consideration of data availability and computational complexity, the conditionally autoregressive distribution is placed on the region-specific frailties to handle spatial correlation. The flexibility and gains of our proposed model are illustrated through analyses of simulated data examples as well as a dataset involving a colon cancer clinical trial from the state of Iowa.

In many studies in medicine, including clinical trials and epidemiological investigations, data are clustered into groups such as health centers or herds in veterinary medicine. Such data are usually analyzed by hierarchical regression models to account for possible variation between groups. When such variation is large, it is of potential interest to explore whether additionally the effect of a within-group predictor varies between groups. In survival analysis, this may be investigated by including two frailty terms at group level in a Cox proportional hazards model. Several estimation methods have been proposed to estimate this type of frailty Cox model. We review four of these methods, apply them to real data from veterinary medicine, and compare them using a simulation study.

The interest in individualized medicines and upcoming or renewed regulatory requests to assess treatment effects in subgroups of confirmatory trials require statistical methods that account for selection uncertainty and selection bias after having performed the search for meaningful subgroups. The challenge is to judge the strength of the apparent findings after mining the same data to discover them. In this paper, we describe a resampling approach that allows the subgroup-finding process to be replicated many times. The replicates are used to adjust the effect estimates for selection bias and to provide variance estimators that account for selection uncertainty. A simulation study provides some evidence of the performance of the method and an example from oncology illustrates its use.

In this work we propose the use of functional data analysis (FDA) to deal with a very large dataset of atmospheric aerosol size distribution resolved in both space and time. Data come from a mobile measurement platform in the town of Perugia (Central Italy). An OPC (Optical Particle Counter) is integrated on a cabin of the Minimetrò, an urban transportation system, that moves along a monorail on a line transect of the town. The OPC takes a sample of air every six seconds, counts the number of particles of urban aerosols with a diameter between 0.28 μm and 10 μm, and classifies such particles into 21 size bins according to their diameter. Here, we adopt a 2D functional data representation for each of the 21 spatiotemporal series. In fact, space is unidimensional since it is measured as the distance on the monorail from the base station of the Minimetrò. FDA allows for a reduction of the dimensionality of each dataset and accounts for the high space-time resolution of the data. Functional cluster analysis is then performed to search for similarities among the 21 size channels in terms of their spatiotemporal pattern. Results provide a good classification of the 21 size bins into a relatively small number of groups (between three and four) according to the season of the year. Groups including coarser particles have more similar patterns, while those including finer particles behave more differently according to the period of the year. Such features are consistent with the physics of atmospheric aerosol and the highlighted patterns provide a very useful ground for prospective model-based studies.

There are several arthropods that can transmit disease to humans. To make inferences about the rate of infection of these arthropods, it is common to collect a large sample of vectors, divide them into groups (called pools), and apply a test to detect infection. This paper presents an approximate likelihood point estimator of the rate of infection for pools of different sizes, when the variability of these sizes is small and the infection rate is low. The performance of this estimator was evaluated in four simulated scenarios, created from real experiments selected in the literature. The new estimator performed well in three of these scenarios. As expected, the new estimator performed poorly in the scenario with great variability in the size of the pools for some values of the parameter space.
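
To illustrate the estimation problem described above, the following Python sketch shows two textbook estimators of the per-vector infection rate from pooled test results: the closed-form maximum likelihood estimator for pools of equal size, and a numerical maximum likelihood estimator for pools of different sizes. Neither is the approximate likelihood estimator proposed in the abstract. Both assume a perfect test (a pool is positive exactly when it contains at least one infected vector) and independent infections; the function names and the bisection settings are hypothetical.

```python
def mle_equal_pools(n_positive, n_pools, pool_size):
    """Closed-form MLE of the infection rate when all pools have the same size."""
    return 1.0 - (1.0 - n_positive / n_pools) ** (1.0 / pool_size)


def mle_unequal_pools(pool_sizes, results, n_iter=200):
    """Numerical MLE for pools of different sizes (results: 1 = positive pool)."""
    if not any(results):
        return 0.0  # no positive pool: the likelihood is maximized at 0
    if all(results):
        return 1.0  # every pool positive: the likelihood is maximized at 1

    def score(p):
        # Derivative of the log-likelihood with respect to p.
        total = 0.0
        for k, y in zip(pool_sizes, results):
            q_k = (1.0 - p) ** k  # probability that a pool of size k is negative
            if y:
                total += k * (1.0 - p) ** (k - 1) / (1.0 - q_k)
            else:
                total -= k / (1.0 - p)
        return total

    # The score is strictly decreasing in p, so bisection finds its unique root.
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(n_iter):
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With 4 positive pools out of 20 pools of 10 vectors each, both functions give the same estimate, `1 - 0.8 ** 0.1`. The numerical version is the exact benchmark against which an approximate estimator for slightly varying pool sizes would naturally be compared.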
