The receiver operating characteristic (ROC) curve is a popular tool for evaluating and comparing the accuracy of diagnostic tests in distinguishing the diseased group from the nondiseased group when test results are continuous or ordinal. A complicated data setting occurs when multiple tests are measured on abnormal and normal locations from the same subject and the measurements are clustered within the subject. Although least squares regression methods can be used to estimate the ROC curve from correlated data, their extension to clustered data has not been studied, and the statistical properties of the least squares methods under the clustering setting are unknown. In this article, we develop least squares ROC methods that allow the baseline and link functions to differ and, more importantly, accommodate clustered data with discrete covariates. The methods generate smooth ROC curves that satisfy the inherent continuity of the true underlying curve. In simulation studies, the least squares methods are shown to be more efficient than existing nonparametric ROC methods under appropriate model assumptions. We apply the methods to a real example in the detection of glaucomatous deterioration, and we also derive the asymptotic properties of the proposed methods.

Many attempts have been made to formalize ethical requirements for research. Among the most prominent mechanisms are informed consent requirements and data protection regimes. These mechanisms, however, sometimes appear as obstacles to research. In this opinion paper, we critically discuss conventional approaches to research ethics that emphasize consent and data protection. Several recent debates have highlighted other important ethical issues and underlined the need for greater openness in order to uphold the integrity of health-related research. Some of these measures, such as the sharing of individual-level data, pose problems for standard understandings of consent and privacy. Here, we argue that these interpretations tend to be overdemanding: They do not really protect research subjects and they hinder the research process. Accordingly, we suggest another way of framing these requirements. Individual consent must be situated alongside the wider distribution of knowledge created when the actions, commitments, and procedures of researchers and their institutions are opened to scrutiny. And instead of simply emphasizing privacy or data protection, we should understand confidentiality as a principle that facilitates the sharing of information while upholding important safeguards. Consent and confidentiality belong to a broader set of safeguards and procedures to uphold the integrity of the research process.

Pooled study designs, where individual biospecimens are combined prior to measurement via a laboratory assay, can reduce lab costs while maintaining statistical efficiency. Analysis of the resulting pooled measurements, however, often requires specialized techniques. Existing methods can effectively estimate the relation between a binary outcome and a continuous pooled exposure when pools are matched on disease status; when pools are of mixed disease status, however, these methods may not be applicable. By exploiting characteristics of the gamma distribution, we propose a flexible method for estimating odds ratios from pooled measurements of both mixed and matched status. We use simulation studies to compare the consistency and efficiency of risk-effect estimates from our proposed method with those from existing methods. We then demonstrate the efficacy of our method in an analysis of pregnancy outcomes and pooled cytokine concentrations. Our proposed approach contributes to the toolkit of available methods for analyzing odds ratios of a pooled exposure without restricting pools to be matched on a specific outcome.

A diagnostic cut-off point of a biomarker measurement is needed to classify a random subject as either diseased or healthy. However, the cut-off point is usually unknown and needs to be estimated by some optimization criterion. One important criterion is the Youden index, which has been widely adopted in practice. The Youden index, defined as the maximum of (sensitivity + specificity − 1), directly measures the largest total diagnostic accuracy a biomarker can achieve. Therefore, it is desirable to estimate the optimal cut-off point associated with the Youden index. Sometimes, taking the actual measurements of a biomarker is very difficult and expensive, while ranking them without the actual measurements can be relatively easy. In such cases, ranked set sampling can yield more precise estimates than simple random sampling, as ranked set samples are more likely to span the full range of the population. In this study, kernel density estimation is utilized to numerically solve for an estimate of the optimal cut-off point. The asymptotic distributions of the kernel estimators based on the two sampling schemes are derived analytically; we prove that both estimators are asymptotically unbiased and that the estimator based on ranked set sampling is more efficient than that based on simple random sampling. Furthermore, the asymptotic confidence intervals are derived. Intensive simulations are carried out to compare the proposed method under ranked set sampling with simple random sampling, with the proposed method outperforming simple random sampling in all cases. A real data set is analyzed to illustrate the proposed method.
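
As a concrete illustration of the criterion itself (not the authors' ranked-set-sampling estimator), the following sketch estimates the Youden-optimal cutoff from a simple random sample by grid search, using a Gaussian-kernel-smoothed CDF for each group. The function names, the bandwidth `h`, and the grid size are all illustrative choices, not the paper's.

```python
import math

def kernel_cdf(x, sample, h):
    """Gaussian-kernel-smoothed estimate of the CDF at x."""
    return sum(0.5 * (1 + math.erf((x - s) / (h * math.sqrt(2))))
               for s in sample) / len(sample)

def youden_cutoff(healthy, diseased, h=0.5, grid=200):
    """Grid-search the cutoff c maximizing J(c) = sens(c) + spec(c) - 1,
    with sensitivity = 1 - F_diseased(c) and specificity = F_healthy(c)."""
    lo, hi = min(healthy + diseased), max(healthy + diseased)
    best_c, best_j = lo, -1.0
    for i in range(grid + 1):
        c = lo + (hi - lo) * i / grid
        sens = 1 - kernel_cdf(c, diseased, h)
        spec = kernel_cdf(c, healthy, h)
        j = sens + spec - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j
```

For two normal populations N(0, 1) and N(2, 1), the estimated cutoff should land near the theoretical optimum of 1, with J around 0.68.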

The multilevel item response theory (MLIRT) models have been increasingly used in longitudinal clinical studies that collect multiple outcomes. The MLIRT models account for all the information from multiple longitudinal outcomes of mixed types (e.g., continuous, binary, and ordinal) and can provide valid inference for the overall treatment effects. However, the continuous outcomes and the random effects in the MLIRT models are often assumed to be normally distributed. The normality assumption can sometimes be unrealistic and thus may produce misleading results. The normal/independent (NI) distributions have been increasingly used to handle outliers and heavy tails in order to produce robust inference. In this article, we developed a Bayesian approach that implemented the NI distributions for both continuous outcomes and random effects in the MLIRT models and discussed different strategies of implementing the NI distributions. Extensive simulation studies were conducted to demonstrate the advantage of our proposed models, which provided parameter estimates with smaller bias and more reasonable coverage probabilities. Our proposed models were applied to a motivating Parkinson's disease study, the DATATOP study, to investigate the effect of deprenyl in slowing down disease progression.

Reproducible research (RR) constitutes the idea that a publication should be accompanied by all relevant material to reproduce the results and findings of a scientific work. Hence, results can be verified and researchers are able to build upon these. Efforts of the *Biometrical Journal* over the last five years have increased the share of reproducible manuscripts by a factor of 4, to almost 50%. Yet, more than half of the code submissions could not be executed in the initial review due to missing code, missing data, or errors in the code. Careful checks of the submitted code as part of the reviewing process are essential to eliminate these issues and to foster RR. In this article, we reviewed recent submissions of code and data to identify common reproducibility issues. Based on these findings, guidelines for structuring code submissions to the *Biometrical Journal* have been established to help authors. These guidelines should help researchers to implement RR in general. Together with the code reviews, this supports the mission of the *Biometrical Journal* in publishing the highest quality, novel, and relevant papers on statistical methods and their applications in life sciences. Source code and data to reproduce the presented data analyses are available as Supplementary Material on the journal's web page.

We propose tests for main and simple treatment effects, time effects, as well as treatment by time interactions in possibly high-dimensional multigroup repeated measures designs. The proposed inference procedures extend the work by Brunner et al. (2012) from two to several treatment groups and remain valid for unbalanced data and under unequal covariance matrices. In addition to showing consistency when sample size and dimension tend to infinity at the same rate, we provide finite sample approximations and evaluate their performance in a simulation study, demonstrating better maintenance of the nominal α-level than the popular Box-Greenhouse–Geisser and Huynh–Feldt methods, and a gain in power for informatively increasing dimension. Application is illustrated using electroencephalography (EEG) data from a neurological study involving patients with Alzheimer's disease and other cognitive impairments.

The purpose of this article is to make the standard promotion cure rate model (Yakovlev and Tsodikov, ) more flexible by assuming that the number of lesions or altered cells after a treatment follows a fractional Poisson distribution (Laskin, ). It is proved that the well-known Mittag-Leffler relaxation function (Berberan-Santos, ) provides a simple way to obtain a new cure rate model that is a compromise between the promotion and geometric cure rate models and allows for superdispersion. Thus, the relaxed cure rate model developed here can be considered a natural and less restrictive extension of the popular Poisson cure rate model, at the cost of an additional parameter, and a competitor to negative-binomial cure rate models (Rodrigues et al., ). Some mathematical properties of a proper relaxed Poisson density are explored. A simulation study and an illustration of the proposed cure rate model from the Bayesian point of view are finally presented.

By starting from the Johnson distribution pioneered by Johnson (), we propose a broad class of distributions with bounded support on the basis of the symmetric family of distributions. The new class provides a rich source of alternative distributions for analyzing univariate bounded data. A comprehensive account of the mathematical properties of the new family is provided. We briefly discuss estimation of the model parameters of the new class of distributions based on two estimation methods. Additionally, a new regression model is introduced by considering the distribution proposed in this article, which is useful for situations where the response is restricted to the standard unit interval and the regression structure involves regressors and unknown parameters. The regression model allows both location and dispersion effects to be modeled. We define two residuals for the proposed regression model to assess departures from model assumptions as well as to detect outlying observations, and we discuss influence methods such as local influence and generalized leverage. Finally, an application to real data is presented to show the usefulness of the new regression model.

We consider the problem of estimating the marginal mean of an incompletely observed variable and develop a multiple imputation approach. Using fully observed predictors, we first establish two working models: one predicts the missing outcome variable, and the other predicts the probability of missingness. The predictive scores from the two models are used to measure the similarity between the incomplete and observed cases. Based on the predictive scores, we construct a set of kernel weights for the observed cases, with higher weights indicating more similarity. Missing data are imputed by sampling from the observed cases with probability proportional to their kernel weights. The proposed approach can produce reasonable estimates for the marginal mean and has a double robustness property, provided that one of the two working models is correctly specified. It also shows some robustness against misspecification of both models. We demonstrate these patterns in a simulation study. In a real-data example, we analyze the total helicopter response time from injury in the Arizona emergency medical service data.
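
The kernel-weighted donor sampling described above might be sketched as follows. This is a hypothetical simplification that takes the predictive scores as given (in the approach above they come from the two working models), uses a Gaussian kernel with an illustrative bandwidth `h`, and draws a single imputation.

```python
import math
import random

def kernel_weights(score_miss, scores_obs, h=1.0):
    """Normalized Gaussian kernel weight for each observed case: observed
    cases whose predictive scores are closer to the incomplete case's
    score receive higher weight."""
    w = [math.exp(-0.5 * ((score_miss - s) / h) ** 2) for s in scores_obs]
    total = sum(w)
    return [wi / total for wi in w]

def impute(score_miss, scores_obs, y_obs, h=1.0, rng=random):
    """Draw one imputed value: sample an observed donor with probability
    proportional to its kernel weight."""
    w = kernel_weights(score_miss, scores_obs, h)
    return rng.choices(y_obs, weights=w, k=1)[0]
```

Repeating the draw M times per missing value would give the multiple imputations from which the marginal mean is then estimated.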

Several intervals have been proposed to quantify the agreement of two methods intended to measure the same quantity when only one measurement per method and subject is available. The limits of agreement are probably the most well known of these intervals, which are all based on the differences between the two measurement methods. The different meanings of the intervals are not always properly recognized in applications, even though, at least for small-to-moderate sample sizes, the differences between them are substantial. This is illustrated both using the width of the intervals and on probabilistic scales related to the definitions of the intervals. In particular, for small-to-moderate sample sizes, it is shown that limits of agreement and prediction intervals should not be used to make statements about the distribution of the differences between the two measurement methods or about a plausible range for all future differences. Care should therefore be taken to ensure the correct choice of interval for the intended interpretation.
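
For reference, the classical limits of agreement themselves are simple to compute; the sketch below is the standard Bland-Altman-style calculation, not one of the alternative intervals contrasted in the discussion above.

```python
import statistics

def limits_of_agreement(x, y, z=1.96):
    """Classical limits of agreement for paired measurements from two
    methods: mean difference +/- z * SD of the differences."""
    d = [a - b for a, b in zip(x, y)]
    m = statistics.mean(d)
    s = statistics.stdev(d)  # sample SD (n - 1 denominator)
    return m - z * s, m + z * s
```

As the abstract stresses, in small samples this interval should not be read as a prediction or tolerance interval for future differences.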

Nonlinear (systems of) ordinary differential equations (ODEs) are common tools in the analysis of complex one-dimensional dynamic systems. We propose a smoothing approach regularized by a quasilinearized ODE-based penalty. Within the quasilinearized spline-based framework, the estimation reduces to a conditionally linear problem for the optimization of the spline coefficients. Furthermore, standard ODE compliance parameter(s) selection criteria are applicable. We evaluate the performance of the proposed strategy through simulated and real data examples. Simulation studies suggest that the proposed procedure yields more accurate estimates than standard nonlinear least squares approaches when the state (initial and/or boundary) conditions are not known.

Generalized linear models (GLM) with a canonical logit link function are the primary modeling technique used to relate a binary outcome to predictor variables. However, noncanonical links can offer more flexibility, producing convenient analytical quantities (e.g., probit GLMs in toxicology) and desired measures of effect (e.g., relative risk from log GLMs). Many summary goodness-of-fit (GOF) statistics exist for logistic GLM. Their properties make the development of GOF statistics relatively straightforward, but it can be more difficult under noncanonical links. Although GOF tests for logistic GLM with continuous covariates (GLMCC) have been applied to GLMCCs with log links, we know of no GOF tests in the literature specifically developed for GLMCCs that can be applied regardless of link function chosen. We generalize the Tsiatis GOF statistic originally developed for logistic GLMCCs, (), so that it can be applied under any link function. Further, we show that the algebraically related Hosmer–Lemeshow () and Pigeon–Heyse (*J*^{2}) statistics can be applied directly. In a simulation study, , , and *J*^{2} were used to evaluate the fit of probit, log–log, complementary log–log, and log models, all calculated with a common grouping method. The statistic consistently maintained Type I error rates, while those of and *J*^{2} were often lower than expected if terms with little influence were included. Generally, the statistics had similar power to detect an incorrect model. An exception occurred when a log GLMCC was incorrectly fit to data generated from a logistic GLMCC. In this case, had more power than or *J*^{2}.
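
As background, a minimal version of the grouped Hosmer–Lemeshow-type statistic referred to above can be sketched as follows. The decile-style grouping and the chi-square reference with g − 2 degrees of freedom are the conventional choices for logistic models; the details of the common grouping method used in the simulation study above may differ.

```python
def hosmer_lemeshow(p, y, g=10):
    """Grouped chi-square GOF statistic: sort cases by fitted probability,
    split into g groups, and compare observed vs. expected event counts."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for k in range(g):
        grp = pairs[k * n // g:(k + 1) * n // g]
        if not grp:
            continue
        ng = len(grp)
        obs = sum(yi for _, yi in grp)       # observed events in group
        exp = sum(pi for pi, _ in grp)       # expected events in group
        pbar = exp / ng
        if 0 < pbar < 1:
            stat += (obs - exp) ** 2 / (ng * pbar * (1 - pbar))
    return stat  # conventionally compared to chi-square with g - 2 df
```

A well-calibrated model gives a statistic near zero; systematic miscalibration inflates it.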

We develop time-varying association analyses for onset ages of two lung infections to address the statistical challenges in utilizing registry data where onset ages are left-truncated by ages of entry and competing-risk censored by deaths. Two types of association estimators are proposed based on conditional cause-specific hazard function and cumulative incidence function that are adapted from unconditional quantities to handle left truncation. Asymptotic properties of the estimators are established by using the empirical process techniques. Our simulation study shows that the estimators perform well with moderate sample sizes. We apply our methods to the Cystic Fibrosis Foundation Registry data to study the relationship between onset ages of *Pseudomonas aeruginosa* and *Staphylococcus aureus* infections.

There have been considerable advances in the methodology for estimating dynamic treatment regimens, and for the design of sequential trials that can be used to collect unconfounded data to inform such regimens. However, relatively little attention has been paid to how such methodology could be used to advance understanding of optimal treatment strategies in a continuous dose setting, even though it is often the case that considerable patient heterogeneity in drug response along with a narrow therapeutic window may necessitate the tailoring of dosing over time. Such is the case with warfarin, a common oral anticoagulant. We propose novel, realistic simulation models based on pharmacokinetic-pharmacodynamic properties of the drug that can be used to evaluate potentially optimal dosing strategies. Our results suggest that this methodology can lead to a dosing strategy that performs well both within and across populations with different pharmacokinetic characteristics, and may assist in the design of randomized trials by narrowing the list of potential dosing strategies to those which are most promising.

We propose criteria for variable selection in the mean model and for the selection of a working correlation structure in longitudinal data with dropout missingness, using weighted generalized estimating equations. The proposed criteria are based on a weighted quasi-likelihood function and a penalty term. Our simulation results show that the proposed criteria frequently select the correct model from among the candidate mean models. The proposed criteria also perform well in selecting the working correlation structure for binary and normal outcomes. We illustrate our approaches using two empirical examples. In the first example, we use data from a randomized double-blind study to test the cancer-preventing effects of beta carotene. In the second example, we use longitudinal CD4 count data from a randomized double-blind study.

The self-controlled case series (SCCS) method, commonly used to investigate the safety of vaccines, requires information on cases only and automatically controls all age-independent multiplicative confounders, while allowing for an age-dependent baseline incidence. Currently, the SCCS method represents the time-varying exposures using step functions with pre-determined cut points. A less prescriptive approach may be beneficial when the shape of the relative risk function associated with exposure is not known a priori, especially when exposure effects can be long-lasting. We therefore propose to model exposure effects using flexible smooth functions. Specifically, we used a linear combination of cubic M-splines which, in addition to giving plausible shapes, avoids the integral in the log-likelihood function of the SCCS model. The methods, though developed specifically for vaccines, are applicable more widely. Simulations showed that the new approach generally performs better than the step function method. We applied the new method to two data sets, on febrile convulsion and exposure to MMR vaccine, and on fractures and thiazolidinedione use.

The aim of dose finding studies is sometimes to estimate parameters in a fitted model. The precision of the parameter estimates should be as high as possible. This can be obtained by increasing the number of subjects in the study, *N*, choosing a good and efficient estimation approach, and by designing the dose finding study in an optimal way. Increasing the number of subjects is not always feasible because of increasing cost, time limitations, etc. In this paper, we assume fixed *N* and consider estimation approaches and study designs for multiresponse dose finding studies. We work with diabetes dose–response data and compare a system estimation approach that fits a multiresponse Emax model to the data to equation-by-equation estimation that fits uniresponse Emax models to the data. We then derive some optimal designs for estimating the parameters in the multi- and uniresponse Emax model and study the efficiency of these designs.

Dropouts are common in longitudinal studies. If the dropout probability depends on the missing observations at or after dropout, this type of dropout is called informative (or nonignorable) dropout (ID). Failure to accommodate such a dropout mechanism in the model will bias the parameter estimates. We propose a conditional autoregressive model for longitudinal binary data with an ID model such that the probabilities of positive outcomes as well as the dropout indicator at each occasion are logit linear in some covariates and outcomes. This model, adopting a marginal model for outcomes and a conditional model for dropouts, is called a selection model. To allow for heterogeneity and clustering effects, the outcome model is extended to incorporate mixture and random effects. Lastly, the model is further extended to a novel model that models the outcome and dropout jointly such that their dependency is formulated through an odds ratio function. Parameters are estimated by a Bayesian approach implemented using the user-friendly Bayesian software WinBUGS. A methadone clinic dataset is analyzed to illustrate the proposed models. Results show that the treatment time effect is still significant but weaker after allowing for an ID process in the data. Finally, the effect of dropout on parameter estimates is evaluated through simulation studies.

Within the field of cytogenetic biodosimetry, Poisson regression is the classical approach for modeling the number of chromosome aberrations as a function of radiation dose. However, it is common to find data that exhibit overdispersion. In practice, the assumption of equidispersion may be violated due to unobserved heterogeneity in the cell population, which will render the variance of observed aberration counts larger than their mean, and/or the frequency of zero counts greater than expected for the Poisson distribution. This phenomenon is observable for both full- and partial-body exposure, but more pronounced for the latter. In this work, different methodologies for analyzing cytogenetic chromosomal aberrations datasets are compared, with special focus on zero-inflated Poisson and zero-inflated negative binomial models. A score test for testing for zero inflation in Poisson regression models under the identity link is also developed.
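
Before fitting zero-inflated models, simple diagnostics of the kind motivating this work can be computed directly. The sketch below reports the variance-to-mean ratio and compares the observed zero fraction with the Poisson-expected one; it is an informal illustration, not the score test developed in the work above.

```python
import math
import statistics

def dispersion_summary(counts):
    """Quick diagnostics for Poisson adequacy of count data:
    - variance-to-mean ratio (> 1 suggests overdispersion),
    - observed vs. Poisson-expected proportion of zeros
      (an excess of observed zeros suggests zero inflation)."""
    m = statistics.mean(counts)
    v = statistics.pvariance(counts)
    zero_obs = sum(1 for c in counts if c == 0) / len(counts)
    zero_exp = math.exp(-m)  # P(X = 0) for Poisson with mean m
    return v / m, zero_obs, zero_exp
```

For equidispersed Poisson-like data the ratio is near 1 and the two zero proportions roughly agree; aberration counts from partial-body exposure would typically violate both.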

In longitudinal studies of disease, patients may experience several events through a follow-up period. In these studies, the sequentially ordered events are often of interest and lead to problems that have received much attention recently. Issues of interest include the estimation of bivariate survival, marginal distributions, and the conditional distribution of gap times. In this work, we consider the estimation of the survival function conditional on a previous event. Different nonparametric approaches are considered for estimating these quantities, all based on the Kaplan–Meier estimator of the survival function. We explore the finite sample behavior of the estimators through simulations. The different methods proposed in this article are applied to a dataset from a German Breast Cancer Study. The methods are used to obtain predictors for the conditional survival probabilities as well as to study the influence of recurrence on overall survival.

Person-time incidence rates are frequently used in medical research. However, standard estimation theory for this measure of event occurrence is based on the assumption of independent and identically distributed (iid) exponential event times, which implies that the hazard function remains constant over time. Under this assumption, and assuming independent censoring, the observed person-time incidence rate is the maximum-likelihood estimator of the constant hazard, and the asymptotic variance of the log rate can be estimated consistently by the inverse of the number of events. In many practical applications, however, the assumption of constant hazard is not very plausible. In the present paper, an average rate parameter is defined as the ratio of the expected event count to the expected total time at risk. This rate parameter equals the hazard function under constant hazard. For inference about the average rate parameter, an asymptotically robust variance estimator of the log rate is proposed. Under some very general conditions, the robust variance estimator is consistent for arbitrary iid event times, and is also consistent or asymptotically conservative when event times are independent but nonidentically distributed. In contrast, the standard maximum-likelihood estimator may become anticonservative under nonconstant hazard, producing confidence intervals with less-than-nominal asymptotic coverage. These results are derived analytically and illustrated with simulations. The two estimators are also compared in five datasets from oncology studies.
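
For context, the standard constant-hazard estimator that the robust variance estimator above generalizes can be sketched as follows. It uses Var(log rate) ≈ 1/D, where D is the event count, and is therefore exactly the estimator shown above to be anticonservative under nonconstant hazard.

```python
import math

def incidence_rate_ci(events, person_time, z=1.96):
    """Person-time incidence rate with the standard (constant-hazard)
    confidence interval: Var(log rate) is estimated by 1/events, so the
    CI is rate * exp(-/+ z / sqrt(events))."""
    rate = events / person_time
    half = z / math.sqrt(events)
    return rate, rate * math.exp(-half), rate * math.exp(half)
```

For example, 100 events over 1000 person-years gives a rate of 0.1 per person-year with a 95% CI of roughly (0.082, 0.122); the robust estimator would widen this when hazards vary over time.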

We evaluate the spatiotemporal changes in the density of a particular species of crustacean known as deep-water rose shrimp, *Parapenaeus longirostris*, based on biological sample data collected during trawl surveys carried out from 1995 to 2006 as part of the international project MEDITS (MEDiterranean International Trawl Surveys). As is the case for many biological variables, density data are continuous and characterized by unusually large numbers of zeros, accompanied by a skewed distribution of the remaining values. Here we analyze the normalized density data by a Bayesian delta-normal semiparametric additive model including the effects of covariates, using penalized regression with low-rank thin-plate splines for nonlinear spatial and temporal effects. Modeling the zero and nonzero values by two joint processes, as we propose in this work, provides great flexibility and easy handling of complex likelihood functions, avoiding inaccurate statistical inferences due to misclassification of the high proportion of exact zeros in the model. Bayesian model estimation is obtained by Markov chain Monte Carlo simulations, suitably specifying the complex likelihood function of the zero-inflated density data. The study highlights relevant nonlinear spatial and temporal effects and the influence of the annual Mediterranean oscillations index and of the sea surface temperature on the distribution of the deep-water rose shrimp density.

The bootstrap method has become a widely used tool applied in diverse areas where results based on asymptotic theory are scarce. It can be applied, for example, to assess the variance of a statistic or a quantile of interest, or for significance testing by resampling from the null hypothesis. Recently, some approaches have been proposed in the biometrical field where hypothesis testing or model selection is performed on a bootstrap sample as if it were the original sample. *P*-values computed from bootstrap samples have been used, for example, in the statistics and bioinformatics literature for ranking genes with respect to their differential expression, for estimating the variability of *p*-values, and for model stability investigations. Procedures which make use of bootstrapped information criteria are often applied in model stability investigations and model averaging approaches, as well as when estimating the error of model selection procedures which involve tuning parameters. From the literature, however, there is evidence that *p*-values and model selection criteria evaluated on bootstrap data sets do not represent what would be obtained on the original data or new data drawn from the overall population. We explain the reasons for this and, through the use of a real data set and simulations, we assess the practical impact on procedures relevant to biometrical applications in cases where it has not yet been studied. Moreover, we investigate the behavior of subsampling (i.e., drawing from a data set without replacement) as a potential alternative solution to the bootstrap for these procedures.
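
The two resampling schemes being contrasted can be sketched side by side. This toy example only illustrates the mechanics (drawing with replacement at the original size n versus without replacement at a smaller size m), not the hypothesis testing or model selection procedures investigated above.

```python
import random
import statistics

def bootstrap_means(data, b, rng):
    """b bootstrap replicates of the mean: resample WITH replacement,
    at the same size n as the original data."""
    n = len(data)
    return [statistics.mean(rng.choices(data, k=n)) for _ in range(b)]

def subsample_means(data, b, m, rng):
    """b subsampling replicates of the mean: draw WITHOUT replacement,
    at size m < n -- the alternative scheme examined for these
    procedures."""
    return [statistics.mean(rng.sample(data, m)) for _ in range(b)]
```

A subsample of size m < n never repeats observations, which is why statistics sensitive to ties (such as p-values computed on the resampled data) can behave quite differently under the two schemes.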

Health researchers are often interested in assessing the direct effect of a treatment or exposure on an outcome variable, as well as its indirect (or mediation) effect through an intermediate variable (or mediator). For an outcome following a nonlinear model, the mediation formula may be used to estimate causally interpretable mediation effects. This method, like others, assumes that the mediator is observed. However, as is common in structural equations modeling, we may wish to consider a latent (unobserved) mediator. We follow a potential outcomes framework and assume a generalized structural equations model (GSEM). We provide maximum-likelihood estimation of GSEM parameters using an approximate Monte Carlo EM algorithm, coupled with a mediation formula approach to estimate natural direct and indirect effects. The method relies on an untestable sequential ignorability assumption; we assess robustness to this assumption by adapting a recently proposed method for sensitivity analysis. Simulation studies show good properties of the proposed estimators in plausible scenarios. Our method is applied to a study of the effect of mother's education on the occurrence of adolescent dental caries, in which we examine possible mediation through latent oral health behavior.

In many biological applications, for example high-dimensional metabolic data, the measurements consist of several continuous measurements of subjects or tissues over multiple attributes or metabolites. Measurement values are put in a matrix with subjects in rows and attributes in columns. The analysis of such data requires grouping subjects and attributes to provide a primitive guide toward data modeling. A common approach is to group subjects and attributes separately and construct a two-dimensional dendrogram tree, once on rows and then on columns. This simple approach provides a grouping visualization through two separate trees, which is difficult to interpret jointly. When a joint grouping of rows and columns is of interest, it is more natural to partition the data matrix directly. Our suggestion is to build a dendrogram on the matrix directly, thus generalizing the two-dimensional dendrogram tree to a three-dimensional forest. The contribution of this research to the statistical analysis of metabolic data is threefold. First, a novel spike-and-slab model in various hierarchies is proposed to identify discriminant rows and columns. Second, an agglomerative approach is suggested to organize joint clusters. Third, a new visualization tool is invented to demonstrate the collection of joint clusters. The new method is motivated by gas chromatography mass spectrometry (GCMS) metabolic data, but can be applied to other continuous measurements with a spike-at-zero property.

In chemical risk assessment, it is important to determine the quantiles of the distribution of concentration data. The selection of an appropriate distribution and the estimation of particular quantiles of interest are largely hindered by the omnipresence of observations below the limit of detection, leading to left-censored data. The log-normal distribution is a common choice, but this distribution is not the only possibility and alternatives should be considered as well. Here, we focus on several distributions that are related to the log-normal distribution or that are seminonparametric extensions of it. Whereas previous work focused on the estimation of the cumulative distribution function, our interest here lies in the estimation of quantiles, particularly in the left tail of the distribution where most of the left-censored data are located. Two different model-averaged quantile estimators are defined and compared for different families of candidate models. The models and the methods of selection and averaging are further investigated through simulations and illustrated on data of cadmium concentration in food products. The approach is extended to include covariates and to deal with uncertainty about the values of the limit of detection. These extensions are illustrated with ^{134}cesium measurements from Fukushima Prefecture, Japan. We conclude that averaged models achieve good performance characteristics when no useful prior knowledge about the true distribution is available; that there is no structural difference in the performance of the direct and indirect methods; and that, not surprisingly, only the true or a closely approximating model can deal with extremely high percentages of censoring.

We describe a mixed-effects model for nonnegative continuous cross-sectional data in a two-part modelling framework. A potentially endogenous binary variable is included in the model specification, and association between the outcomes is modeled through a (discrete) latent structure. We show how model parameters can be estimated in a finite mixture context, allowing for skewness, multivariate association between random effects, and endogeneity. The model behavior is investigated through a large-scale simulation experiment. The proposed model is computationally parsimonious and seems to produce acceptable results even if the underlying random effects structure follows a continuous parametric (e.g. Gaussian) distribution. The proposed approach is motivated by the analysis of a sample taken from the Medical Expenditure Panel Survey. The analyzed outcome, that is, ambulatory health expenditure, is a mixture of zeros and continuous values. The effects of socio-demographic characteristics on health expenditure are investigated and, as a by-product of the estimation procedure, two subpopulations (i.e. high and low users) are identified.
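The two-part structure underlying this kind of model can be sketched in a few lines. This is a minimal illustration on simulated data (the zero proportion and log-normal parameters are invented, not taken from the MEPS sample), with the regression and random-effects machinery stripped away:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated semicontinuous outcome: a point mass at zero plus a
# log-normal distribution for the strictly positive values
# (an illustrative stand-in for ambulatory health expenditure).
n = 10_000
is_user = rng.random(n) < 0.7                    # 70% have positive expenditure
expense = np.where(is_user, rng.lognormal(mean=6.0, sigma=1.2, size=n), 0.0)

# Part 1: probability of a nonzero outcome (a logistic regression with
# covariates would replace this sample proportion in a real analysis).
p_positive = (expense > 0).mean()

# Part 2: model the positive values on the logarithmic scale.
logs = np.log(expense[expense > 0])
mu_hat, sigma_hat = logs.mean(), logs.std(ddof=1)

# Combine: E[Y] = P(Y > 0) * E[Y | Y > 0] for a log-normal positive part.
overall_mean = p_positive * np.exp(mu_hat + sigma_hat**2 / 2)
print(f"P(Y>0) ~ {p_positive:.3f}, E[Y] ~ {overall_mean:.1f}")
```

This mixture-of-zeros-and-continuous-values structure is what makes a single linear mixed model a poor fit for such outcomes; in a full analysis the two parts are then tied together, for instance through correlated or latent random effects.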

The development of methods for dealing with continuous data with a spike at zero has lagged behind those for overdispersed or zero-inflated count data. We consider longitudinal ecological data corresponding to an annual average of 26 weekly maximum counts of birds, which are hence effectively continuous, bounded below by zero but also with a discrete mass at zero. We develop a Bayesian hierarchical Tweedie regression model that can directly accommodate the excess number of zeros common to this type of data, whilst accounting for both spatial and temporal correlation. Implementation of the model is conducted in a Markov chain Monte Carlo (MCMC) framework, using reversible jump MCMC to explore uncertainty across both parameter and model spaces. This regression modelling framework is very flexible and removes the need to make strong assumptions about mean-variance relationships *a priori*. It can also directly account for the spike at zero, whilst being easily applicable to other types of data and other model formulations. Whilst a correlative study such as this cannot prove causation, our results suggest that an increase in an avian predator may have led to an overall decrease in the number of one of its prey species visiting garden feeding stations in the United Kingdom. This may reflect a change in behaviour of house sparrows to avoid feeding stations frequented by sparrowhawks, or a reduction in house sparrow population size as a result of the sparrowhawk increase.
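For power parameters 1 < p < 2, a Tweedie random variable is a compound Poisson-gamma sum, which is exactly why it accommodates a spike at zero together with a continuous positive part. A small simulation sketch (the rate and gamma parameters are arbitrary illustrations, unrelated to the bird data):

```python
import numpy as np

rng = np.random.default_rng(0)

# A Tweedie variable with 1 < p < 2 is a Poisson-random sum of gamma
# variables: zero when the Poisson count is zero (the spike at zero),
# and continuous and positive otherwise.
n, lam = 20_000, 1.5          # illustrative Poisson rate
shape, scale = 2.0, 3.0       # illustrative gamma component
counts = rng.poisson(lam, size=n)
y = np.array([rng.gamma(shape, scale, size=k).sum() for k in counts])

p_zero = (y == 0).mean()      # theoretical spike: exp(-lam)
print(f"P(Y=0) ~ {p_zero:.3f} (theory: {np.exp(-lam):.3f})")
```

The simulated spike matches the theoretical zero probability, while the positive values form a smooth right-skewed distribution, mirroring the weekly-count averages described above.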

We develop an asymptotic likelihood ratio test for multivariate lognormal data with a point mass at zero in each dimension. The test generalizes Wilks' lambda and Hotelling's *T*² test to the case of semicontinuous data. Simulations show that the resulting test statistic attains the nominal Type I error rate and has good power against reasonable alternatives. We conclude with an application to the exploration of ecological niches of trees in South Africa.

Survey data often contain measurements for variables that are semicontinuous in nature, i.e. they either take a single fixed value (we assume this is zero) or they have a continuous, often skewed, distribution on the positive real line. Standard methods for small area estimation (SAE) based on the use of linear mixed models can be inefficient for such variables. We discuss SAE techniques for semicontinuous variables under a two-part random effects model that allows for the presence of excess zeros as well as the skewed nature of the nonzero values of the response variable. In particular, we first model the excess zeros via a generalized linear mixed model fitted to the probability of a nonzero, i.e. strictly positive, value being observed, and then model the response, given that it is strictly positive, using a linear mixed model fitted on the logarithmic scale. Empirical results suggest that the proposed method leads to efficient small area estimates for semicontinuous data of this type. We also propose a parametric bootstrap method to estimate the MSE of the proposed small area estimator. These bootstrap estimates of the MSE are compared to the true MSE in a simulation study.

While benefit-risk assessment is a key component of the drug development and maintenance process, it is often described only in a narrative. In contrast, structured benefit-risk assessment builds on established ideas from decision analysis and comprises a qualitative framework and quantitative methodology. We compare two such approaches: multi-criteria decision analysis (MCDA) within the PrOACT-URL framework, and weighted net clinical benefit (wNCB) within the BRAT framework. These are applied to a case study of natalizumab for the treatment of relapsing-remitting multiple sclerosis. We focus on the practical considerations of applying these methods and give recommendations for the visual presentation of results. In the case study, we found structured benefit-risk analysis to be a useful tool for structuring, quantifying, and communicating the relative benefit and safety profiles of drugs in a transparent, rational, and consistent way. The two frameworks were similar. MCDA is a generic and flexible methodology that can be used to perform a structured benefit-risk assessment in any common context. wNCB is a special case of MCDA and is shown to be equivalent to an extension of the number needed to treat (NNT) principle. It is simpler to apply and understand than MCDA and can be applied when all outcomes are measured on a binary scale.
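The connection between wNCB and the NNT principle comes down to simple risk arithmetic. A sketch with invented numbers (not the natalizumab case-study values): the benefit is an absolute risk reduction, the harm an absolute risk increase, and wNCB weights the two on a common scale:

```python
# Illustrative risks only, not data from the case study.
risk_benefit_ctrl, risk_benefit_trt = 0.30, 0.18   # event risk (e.g. relapse)
risk_harm_ctrl, risk_harm_trt = 0.01, 0.03         # serious adverse event risk
weight_harm = 0.5                                  # relative importance of the harm

arr_benefit = risk_benefit_ctrl - risk_benefit_trt  # absolute risk reduction
ari_harm = risk_harm_trt - risk_harm_ctrl           # absolute risk increase

nnt = 1 / arr_benefit                               # classic number needed to treat
wncb = arr_benefit - weight_harm * ari_harm         # weighted net clinical benefit
print(f"NNT ~ {nnt:.1f}, wNCB = {wncb:.3f}")
```

A positive wNCB indicates that the weighted benefit outweighs the weighted harm; since it is a weighted difference of absolute risks, its reciprocal plays the same role as an NNT adjusted for harm, which is the sense in which wNCB extends the NNT principle.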

Quantitative decision models such as multiple criteria decision analysis (MCDA) can be used in benefit-risk assessment to formalize trade-offs between benefits and risks, providing transparency to the assessment process. There is, however, no well-established method for propagating the uncertainty of treatment effects data through such models to provide a sense of the variability of the benefit-risk balance. Here, we present a Bayesian statistical method that directly models the outcomes observed in randomized placebo-controlled trials and uses this to infer indirect comparisons between competing active treatments. The resulting treatment effect estimates are suitable for use within the MCDA setting, and it is possible to derive the distribution of the overall benefit-risk balance through Markov chain Monte Carlo simulation. The method is illustrated using a case study of natalizumab for relapsing-remitting multiple sclerosis.

At the beginning of 2011, the early benefit assessment of new drugs was introduced in Germany with the Act on the Reform of the Market for Medicinal Products (AMNOG). The Federal Joint Committee (G-BA) generally commissions the Institute for Quality and Efficiency in Health Care (IQWiG) with this type of assessment, which examines whether a new drug shows an added benefit (a positive patient-relevant treatment effect) over the current standard therapy. IQWiG is required to assess the extent of added benefit on the basis of a dossier submitted by the responsible pharmaceutical company. In this context, IQWiG was faced with the task of developing a transparent and plausible approach for operationalizing how to determine the extent of added benefit. In the case of an added benefit, the law specifies three main extent categories (minor, considerable, major). To restrict value judgements to a minimum in the first stage of the assessment process, an explicit and abstract operationalization was needed. The present paper is limited to the situation of binary data (analysis of 2 × 2 tables), using the relative risk as the effect measure. For the treatment effect to be classified as a minor, considerable, or major added benefit, the methodological approach stipulates that the (two-sided) 95% confidence interval of the effect must lie beyond a specified distance from the null (no-effect) value. In summary, we assume that our approach provides a robust, transparent, and thus predictable foundation for determining minor, considerable, and major treatment effects on binary outcomes in the early benefit assessment of new drugs in Germany. After a decision on the added benefit of a new drug by the G-BA, the classification of added benefit is used to inform pricing negotiations between the umbrella organization of the statutory health insurance and the pharmaceutical companies.
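The confidence-interval rule can be illustrated with a worked 2 × 2 example. The counts and the 0.85 threshold below are hypothetical, not IQWiG's actual category boundaries; the calculation shows the standard log-relative-risk interval being checked against a threshold set away from the null value RR = 1:

```python
import math

# Hypothetical 2x2 table: events / patients in each arm.
events_trt, n_trt = 30, 200
events_ctl, n_ctl = 60, 200

rr = (events_trt / n_trt) / (events_ctl / n_ctl)
# Standard error of log(RR) via the delta method.
se_log_rr = math.sqrt(1/events_trt - 1/n_trt + 1/events_ctl - 1/n_ctl)
lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
hi = math.exp(math.log(rr) + 1.96 * se_log_rr)

# The rule asks whether the whole CI clears a threshold away from
# no effect (RR = 1); 0.85 here is purely illustrative.
threshold = 0.85
clears = hi < threshold
print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f}), clears {threshold}: {clears}")
```

Because the upper confidence limit stays below the threshold, the effect would be classified into the corresponding benefit category; with a wider interval crossing the threshold, only a smaller (or no) extent of added benefit could be claimed.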

Recently, the topic of assessing clinical relevance on top of statistical significance in the analysis of randomized controlled trials (RCTs) has received increasing attention, in particular as part of benefit assessments. Several formal criteria to serve this purpose have been published. In this paper, we present a framework to assess the value of applying such criteria. We propose to quantify the need for the assessment of clinical relevance by the actual risk of having accepted a benefit for a treatment with an irrelevant effect in a successful RCT. We then study how this risk can be controlled by two popular criteria based on comparing the effect estimate, or the lower bound of the confidence interval, with a given threshold. We further propose to quantify the impact of using formal criteria by considering the expected costs when specifying error-specific costs for each of the three possible types of errors: a benefit may be accepted for a treatment that is actually inferior, or that is not inferior but implies only an irrelevant improvement, or a benefit may be rejected for a treatment that implies a relevant improvement. This way we can demonstrate that the impact depends on parameters that are typically not explicitly defined in the frame of benefit assessments. Depending on the values of these parameters, formal checks of clinical relevance may imply better decisions on average, but they may also imply more harm than good on average.

In 2010, the Federal Parliament (Bundestag) of Germany passed a new law (Arzneimittelmarktneuordnungsgesetz, AMNOG) on the regulation of medicinal products that applies to all pharmaceutical products with active ingredients launched beginning January 1, 2011. The law describes the process to determine the price at which an approved new product will be reimbursed by the statutory health insurance system. The process consists of two phases. The first phase assesses the additional benefit of the new product versus an appropriate comparator (zweckmäßige Vergleichstherapie, zVT). The second phase involves price negotiation. Focusing on the first phase, this paper investigates the requirements for the benefit assessment of a new product under this law, with special attention to the methods applied by the German authorities on issues such as the choice of the comparator, patient-relevant endpoints, subgroup analyses, extent of benefit, determination of net benefit, primary and secondary endpoints, and the uncertainty of the additional benefit. We propose alternative approaches to address the requirements in some cases and invite other researchers to help develop solutions in other cases.

Since the introduction of benefit assessment to support reimbursement decisions in Germany, the impression seems to have arisen that entirely distinct methodologies and decision-making strategies apply in the fields of drug licensing and reimbursement. In this article, we take the position that, while decisions may differ owing to the differing mandates of drug licensing and reimbursement bodies, the underlying strategies are quite similar. For this purpose, we briefly summarize the legal basis for decision making in both fields from a methodological point of view, and review two recent reimbursement decisions in relation to the grounds for approval. We comment on two examples where decision making was based on the same pivotal studies in both the licensing and the reimbursement process. We conclude that strategies in the field of reimbursement have so far been (from a methodological standpoint) more liberal than the established rules in the field of drug licensing, but apply the same principles. Formal proof of efficacy preceding benefit assessment can thus be understood as a gatekeeper against fundamentally wrong decision making about the efficacy and risks of new drugs, in full recognition that more is needed. We elaborate on the differences between formal proof of efficacy on the one hand and the assessment of benefit/risk or added benefit on the other, because it is important for statisticians to understand the difference between the two approaches.

A surrogate endpoint is intended to replace a clinical endpoint for the evaluation of new treatments when it can be measured more cheaply, more conveniently, more frequently, or earlier than that clinical endpoint. A surrogate endpoint is expected to predict clinical benefit, harm, or lack of these. Besides the biological plausibility of a surrogate, a quantitative assessment of the strength of evidence for surrogacy requires the demonstration of the prognostic value of the surrogate for the clinical outcome, and evidence that treatment effects on the surrogate reliably predict treatment effects on the clinical outcome. We focus on these two conditions, and outline the statistical approaches that have been proposed to assess the extent to which these conditions are fulfilled. When data are available from a single trial, one can assess the “individual level association” between the surrogate and the true endpoint. When data are available from several trials, one can additionally assess the “trial level association” between the treatment effect on the surrogate and the treatment effect on the true endpoint. In the latter case, the “surrogate threshold effect” can be estimated as the minimum effect on the surrogate endpoint that predicts a statistically significant effect on the clinical endpoint. All these concepts are discussed in the context of randomized clinical trials in oncology, and illustrated with two meta-analyses in gastric cancer.

Treatment effect heterogeneity is a well-recognized phenomenon in randomized controlled clinical trials. In this paper, we discuss subgroup analyses with prespecified subgroups of clinical or biological importance. We explore various alternatives to the naive (traditional univariate) subgroup analysis to address the issues of multiplicity and confounding. Specifically, we consider a model-based Bayesian shrinkage approach (Bayes-DS) and a nonparametric, empirical Bayes shrinkage approach (Emp-Bayes) to temper the optimism of traditional univariate subgroup analyses; a standardization approach that accounts for correlation between baseline covariates; and a model-based maximum likelihood estimation (MLE) approach. The Bayes-DS and Emp-Bayes methods model the variation in subgroup-specific treatment effects rather than testing the null hypothesis of no difference between subgroups. The standardization approach addresses the issue of confounding in subgroup analyses. The MLE approach is considered only for comparison in simulation studies, serving as the “truth” since the data were generated from the same model. Using the characteristics of a hypothetical large outcome trial, we perform simulation studies and articulate the utilities and potential limitations of these estimators. Simulation results indicate that Bayes-DS and Emp-Bayes can protect against the optimism present in the naive approach. Due to its simplicity, the naive approach should be the reference for reporting univariate subgroup-specific treatment effect estimates from exploratory subgroup analyses. Standardization, although it tends to have a larger variance, is suggested when it is important to address the confounding of univariate subgroup effects due to correlation between baseline covariates. The Bayes-DS approach is available as an R package (DSBayes).
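The way shrinkage tempers the optimism of univariate subgroup estimates can be sketched with a toy empirical Bayes calculation (the estimates, standard errors, and the moment-based variance estimator below are illustrative simplifications, not the paper's Bayes-DS or Emp-Bayes implementations):

```python
import numpy as np

# Hypothetical subgroup-specific treatment-effect estimates and
# standard errors from a single trial.
est = np.array([0.10, 0.25, 0.40, 0.05])
se = np.array([0.08, 0.10, 0.12, 0.09])

# Empirical-Bayes shrinkage toward the precision-weighted overall effect;
# the between-subgroup variance tau2 is estimated by moment matching.
w = 1 / se**2
grand = np.sum(w * est) / np.sum(w)
tau2 = max(np.mean((est - grand) ** 2 - se**2), 0.0)
shrunk = grand + (tau2 / (tau2 + se**2)) * (est - grand)
print(np.round(shrunk, 3))
```

The most extreme subgroup estimates are pulled toward the overall effect, with the amount of shrinkage driven by how much of the observed spread between subgroups is attributable to sampling noise rather than genuine heterogeneity.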

A method for simultaneously assessing noninferiority with respect to efficacy and superiority with respect to another endpoint in two-arm noninferiority trials is presented. The procedure controls both the average type I error rate for the intersection-union test problem and the frequentist type I error rate for the noninferiority test by α while allowing an increased level for the superiority test. For normally distributed outcomes, two methods are presented to deal with the uncertainty about the correlation between the endpoints which defines the adjusted levels. The operating characteristics of these procedures are investigated. Furthermore, the sample size required when applying the proposed method is compared with that of alternative procedures. Application of the method in the situation of binary endpoints and mixed normal and binary endpoints, respectively, is sketched. An illustrative example is provided demonstrating implementation of the proposed approach in a clinical trial.

The analysis of count data is commonly done using Poisson models. Negative binomial models are a straightforward and readily motivated generalization for the case of overdispersed data, that is, when the observed variance is greater than expected under a Poisson model. Rate and overdispersion parameters then need to be considered jointly, which in general is not trivial. Here, we are concerned with evidence synthesis in the case where the reporting of data is rather heterogeneous, that is, events are reported in terms of mean event counts, the proportion of event-free patients, or rate estimates with standard errors. Each figure carries some information about the relevant parameters, and it is the joint modeling that allows for coherent inference on the parameters of interest. The methods are motivated and illustrated by a systematic review in chronic obstructive pulmonary disease.
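The point that each reporting format constrains the same two negative binomial parameters can be made concrete. With rate lam and dispersion k (the values below are illustrative, not taken from the COPD review), the mean event count, the overdispersed variance, and the probability of remaining event-free are all simple functions of (lam, k):

```python
# Negative binomial with rate lam and dispersion k:
#   mean    = lam
#   var     = lam + lam^2 / k        (> lam, i.e. overdispersed)
#   P(X=0)  = (k / (k + lam))^k      (proportion of event-free patients)
lam, k = 1.2, 0.8

mean_count = lam
var_count = lam + lam**2 / k
p_event_free = (k / (k + lam)) ** k

print(f"mean={mean_count}, var={var_count:.2f}, P(0 events)={p_event_free:.3f}")
```

A study reporting only the proportion of event-free patients therefore still pins down one equation in (lam, k), which is what lets the joint model combine it coherently with studies reporting mean counts or rate estimates.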

In addition to providing a preliminary assessment of efficacy, phase II trials can also help to determine dose(s) that have an acceptable toxicity profile over repeated cycles, as well as identify subgroups with particularly poor toxicity profiles. Correct modeling of the dose-toxicity relationship in patients receiving multiple cycles of the same dose in oncology trials is crucial. A major challenge lies in taking advantage of the conditional nature of data collection, that is, each cycle is observed conditional on no toxicity having occurred in earlier cycles. We develop a novel and parsimonious Markov model for the probability of toxicity during a cycle of therapy, conditional on not seeing toxicity in any of the previous cycles; hereafter we refer to these probabilities as conditional probabilities of toxicity. Our model allows the conditional probability of toxicity to depend on the randomized dose group, the cumulative dose from prior cycles, a measure of how consistently a patient responds to the same dose exposure, and individual risk factors influencing the ability to tolerate the treatment regimen. Simulation studies of the finite-sample properties of the model are presented. Finally, the approach is demonstrated in a phase II trial studying two dose levels of ifosfamide plus doxorubicin and granulocyte colony-stimulating factor in soft tissue sarcoma patients over four cycles. The Markov model provides correct estimates of the probabilities of toxicity in finite-sample simulations. It also correctly models the data from the phase II clinical trial, and identifies particularly high cumulative toxicity in females.
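The Markov structure makes it straightforward to move from conditional to cumulative toxicity probabilities. A sketch with hypothetical per-cycle conditional probabilities (not estimates from the sarcoma trial):

```python
import numpy as np

# Hypothetical conditional probabilities of first toxicity in cycles 1-4,
# i.e. P(toxicity in cycle c | no toxicity in cycles 1..c-1).
p_cond = np.array([0.10, 0.12, 0.15, 0.18])

# Under the Markov structure, the probability of remaining toxicity-free
# is a running product, and its complement is the cumulative probability
# of any toxicity by the end of each cycle.
p_no_tox_so_far = np.cumprod(1 - p_cond)
p_any_tox_by_cycle = 1 - p_no_tox_so_far
print(np.round(p_any_tox_by_cycle, 3))
```

Rising conditional probabilities across cycles (as cumulative dose accrues) translate directly into an accelerating cumulative toxicity curve, which is the quantity clinicians typically want from a repeated-cycles analysis.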

Mixed effects Beta regression models based on Beta distributions have been widely used to analyze longitudinal percentage or proportion data ranging between zero and one. However, Beta distributions are not robust to extreme outliers or excess probability mass in the tail areas, and they cannot accommodate the boundary values zero and one because these values are not in the support of the Beta distribution. To address these issues, we propose a mixed effects model based on the Beta rectangular distribution and augment it with the probabilities of zero and one. We conduct extensive simulation studies to assess the performance of mixed effects models based on both the Beta and Beta rectangular distributions under various scenarios. The simulation studies suggest that the regression models based on Beta rectangular distributions improve the accuracy of parameter estimates in the presence of outliers and heavy tails. The proposed models are applied to the motivating Neuroprotection Exploratory Trials in Parkinson's Disease (PD) Long-term Study-1 (LS-1 study, *n* = 1741), developed by the National Institute of Neurological Disorders and Stroke Exploratory Trials in Parkinson's Disease (NINDS NET-PD) network.

We discuss the semiparametric modeling of mark-recapture-recovery data where the temporal and/or individual variation of model parameters is explained via covariates. Typically, in such analyses a fixed (or mixed) effects parametric model is specified for the relationship between the model parameters and the covariates of interest. In this paper, we discuss modeling this relationship via penalized splines, allowing for considerably more flexible functional forms. The corresponding models can be fitted via numerical maximum penalized likelihood estimation, employing cross-validation to choose the smoothing parameters in a data-driven way. Our contribution builds on and extends the existing literature, providing a unified inferential framework for semiparametric mark-recapture-recovery models for open populations, where the interest typically lies in the estimation of survival probabilities. The approach is applied to two real datasets, corresponding to gray herons (*Ardea cinerea*), where we model the survival probability as a function of environmental condition (a time-varying global covariate), and Soay sheep (*Ovis aries*), where we model the survival probability as a function of individual weight (a time-varying individual-specific covariate). The proposed semiparametric approach is compared to a standard parametric (logistic) regression, and interesting new underlying dynamics are observed in both cases.
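The flexibility gained from penalized splines over a fixed parametric form can be sketched with a minimal P-spline fit in plain numpy (a truncated-line basis with a ridge penalty on the knot coefficients; the simulated data, knot placement, and fixed smoothing parameter are all illustrative, whereas the approach described above chooses smoothing parameters by cross-validation within a penalized likelihood):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated smooth nonlinear covariate effect (standing in for, e.g.,
# the effect of individual weight on the logit survival scale).
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)

# Truncated-line spline basis: intercept, linear term, and knot terms.
knots = np.linspace(0.05, 0.95, 15)
X = np.column_stack([np.ones_like(x), x] + [np.maximum(x - k, 0) for k in knots])

# Penalized least squares: only the knot coefficients are shrunk,
# so the fit stays close to a straight line unless the data demand more.
lam = 1.0
D = np.diag([0.0, 0.0] + [1.0] * len(knots))
beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
fitted = X @ beta

rmse = np.sqrt(np.mean((fitted - np.sin(2 * np.pi * x)) ** 2))
print(f"RMSE of penalized-spline fit to the true curve: {rmse:.3f}")
```

In the mark-recapture-recovery setting, the same basis expansion would enter on the logit scale of the survival probability, with a penalized log-likelihood replacing penalized least squares.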