A fundamental theorem in hypothesis testing is the Neyman–Pearson (N-P) lemma, which yields the most powerful test for simple hypotheses. In this article, we establish a Bayesian framework for hypothesis testing and extend the Neyman–Pearson lemma to yield the Bayesian most powerful test of general hypotheses, thereby providing an optimality theory for determining thresholds of Bayes factors. Unlike conventional Bayes tests, the proposed Bayesian test is able to control the type I error.
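For context, the classical form of the lemma that the article extends can be stated as follows (a standard textbook statement, not the paper's Bayesian version): for simple hypotheses, the likelihood-ratio test of a given size is most powerful.

```latex
% Classical Neyman--Pearson lemma (standard statement): for testing
% H_0: theta = theta_0 against H_1: theta = theta_1, the most powerful
% test of size alpha rejects H_0 for large likelihood ratios.
\text{Reject } H_0 \iff \Lambda(x) = \frac{L(\theta_1 \mid x)}{L(\theta_0 \mid x)} > k,
\qquad \text{with } k \text{ chosen so that } \Pr_{\theta_0}\!\bigl(\Lambda(X) > k\bigr) = \alpha .
```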

In this paper, we consider James–Stein shrinkage and pretest estimation methods for time series following generalized linear models when it is conjectured that some of the regression parameters may be restricted to a subspace. Efficient estimation strategies are developed for the case in which the model contains many covariates, some of which are not statistically significant. Statistical properties of the pretest and shrinkage estimators, including asymptotic distributional bias and risk, are derived. We investigate the relative performance of the shrinkage and pretest estimators with respect to the unrestricted maximum partial likelihood estimator (MPLE), and show that the shrinkage estimators have lower relative mean squared error than the unrestricted MPLE when the number of significant covariates exceeds two. Monte Carlo simulation experiments are conducted for different combinations of inactive covariates, and the performance of each estimator is evaluated in terms of its mean squared error. The practical benefits of the proposed methods are illustrated using two real data sets.
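As background, shrinkage estimators of this kind are commonly written in the following generic form (the notation here is assumed for illustration and is not taken from the paper): the unrestricted estimator is pulled towards the restricted one by an amount driven by the test statistic for the subspace restriction.

```latex
% A common generic form of the James--Stein-type shrinkage estimator
% (notation assumed here): \hat{\beta}_U = unrestricted MPLE,
% \hat{\beta}_R = restricted estimator, T_n = test statistic for the
% subspace restriction, k = number of restrictions.
\hat{\beta}_S = \hat{\beta}_R
  + \Bigl(1 - \frac{k-2}{T_n}\Bigr)\bigl(\hat{\beta}_U - \hat{\beta}_R\bigr),
\qquad k \ge 3,
```

with the positive-part variant replacing the factor $1-(k-2)/T_n$ by its positive part to avoid over-shrinkage; the requirement $k \ge 3$ matches the abstract's condition that the number of restricted coefficients exceed two.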

In the regression setting, dimension reduction allows for complicated regression structures to be detected via visualisation in a low-dimensional framework. However, some popular dimension reduction methodologies fail to achieve this aim when faced with a problem often referred to as symmetric dependency. In this paper we show how vastly superior results can be achieved when carrying out response and predictor transformations for methods such as least squares and sliced inverse regression. These transformations are simple to implement and utilise estimates from other dimension reduction methods that are not faced with the symmetric dependency problem. We highlight the effectiveness of our approach via simulation and an example. Furthermore, we show that ordinary least squares can effectively detect multiple dimension reduction directions. Methods robust to extreme response values are also considered.
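To make the symmetric dependency problem concrete, here is a minimal, self-contained sketch (an illustration under assumed settings, not the paper's method): when the response depends on a predictor only through its square and the predictor is symmetric about zero, the ordinary least squares slope is near zero, so OLS alone misses the direction.

```python
import random

random.seed(1)
n = 20000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
# Symmetric dependency: y depends on x only through x**2.
y = [xi * xi + random.gauss(0.0, 0.1) for xi in x]

mean_x = sum(x) / n
mean_y = sum(y) / n
# Ordinary least squares slope of y on x: close to cov(x, x^2)/var(x),
# which is E[x^3] = 0 for a symmetric predictor.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
print(abs(slope))  # near zero: OLS cannot detect the quadratic direction
```

A response transformation such as regressing on a suitably transformed y (estimated from a method not affected by the symmetry) is one way to recover a usable signal, which is the spirit of the approach described above.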

The book “Modeling to Inform Infectious Disease Control” by Niels G. Becker is reviewed.

The book “Introduction to ecological sampling” is reviewed: a practical and accessible monograph covering sampling techniques for environmental and ecological applications.

An effective method for improving the communication skills of graduate students in statistics and biostatistics is to provide consultations with non-statistical researchers. Unfortunately, such experiences can be difficult to arrange or occur too infrequently to be relied upon. The current study sought to help students develop both written and oral communication skills within an existing graduate biostatistics course by having students take part in role-playing consultations. Though the class size was small, the students felt these activities improved their oral and written communication skills and made them more aware of a biostatistician's role in consulting. There was also modest improvement in the students' perceived function as consulting biostatisticians. Simulated consultations can be an effective educational tool for promoting the development of the soft skills necessary for successful statisticians, can be implemented in existing courses, and do not require reliance upon external collaborators. Embedding these types of exercises within an existing curriculum can also be a cost-effective alternative for programs that do not have formal consulting training.

Penalised likelihood methods, such as the least absolute shrinkage and selection operator (Lasso) and the smoothly clipped absolute deviation penalty, have become widely used for variable selection in recent years. These methods impose penalties on the regression coefficients to shrink a subset of them towards zero, achieving parameter estimation and model selection simultaneously. The amount of shrinkage is controlled by the regularisation parameter. Popular approaches for choosing the regularisation parameter include cross-validation, various information criteria and bootstrapping methods based on mean squared error. In this paper, a new data-driven method for choosing the regularisation parameter is proposed and its consistency is established, not only in the usual fixed-dimensional case but also in settings where the number of parameters diverges. Simulation results show that the new method outperforms other popular approaches. An application of the proposed method to motif discovery in gene expression analysis is included.
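As a minimal illustration of how the regularisation parameter controls shrinkage (a textbook special case, not the method proposed above): for an orthonormal design, the Lasso solution is the soft-thresholding of the least squares coefficients, so increasing the parameter shrinks small coefficients exactly to zero.

```python
def soft_threshold(b, lam):
    """Lasso coefficient under an orthonormal design:
    shrink the least squares coefficient b towards 0 by lam,
    setting it exactly to 0 when |b| <= lam."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

# Hypothetical least squares coefficients for illustration.
ols = [2.5, -1.2, 0.3, 0.05]
for lam in (0.1, 0.5, 1.5):
    # Larger lam -> more coefficients pushed exactly to zero.
    print(lam, [soft_threshold(b, lam) for b in ols])
```

Choosing the regularisation parameter is exactly the choice of `lam` here: too small and noise variables survive, too large and true signals are zeroed out, which is why data-driven selection rules matter.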

Single cohort stage-frequency data are considered, in which the stage reached by each individual is assessed through destructive sampling. For this type of data, when all hazard rates are assumed constant and equal, Laplace transform methods have been applied in the past to estimate the parameters of each stage-duration distribution and the overall hazard rate. If the hazard rates are not all equal, estimating stage-duration parameters by Laplace transform methods becomes complex. In this paper, two new models are proposed for estimating stage-dependent maturation parameters using Laplace transform methods where non-trivial hazard rates apply. The first model allows hazard rates that are constant within each stage but vary between stages; the second allows time-dependent hazard rates within stages. Moreover, this paper introduces a method for estimating the hazard rate in each stage under the stage-wise constant hazard rates model. The methods presented could be used in specific types of laboratory studies, but the main motivation is to explore relationships between stage maturation parameters that, in future work, could be exploited in Bayesian approaches. The application of the methodology under each model is evaluated using simulated data to illustrate the structure of these models.
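For context, the basic distribution theory behind Laplace transform methods in the stage-wise constant hazard case is standard (the notation here is assumed for illustration): each stage duration is exponential, and the time to enter a later stage, being a sum of independent stage durations, has a transform equal to the product of the individual transforms.

```latex
% Standard facts used by Laplace transform methods (notation assumed):
% T_i ~ Exp(lambda_i) is the duration of stage i, independent across stages.
\mathcal{L}_{T_i}(s) = \int_0^\infty e^{-st}\,\lambda_i e^{-\lambda_i t}\,dt
  = \frac{\lambda_i}{\lambda_i + s},
\qquad
\mathcal{L}_{T_1 + \cdots + T_{k-1}}(s) = \prod_{i=1}^{k-1} \frac{\lambda_i}{\lambda_i + s}.
```

When the rates $\lambda_i$ differ between stages, inverting this product is what makes the estimation problem complex, motivating the models described above.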

Missing observations in both responses and covariates arise frequently in longitudinal studies. When data are missing not at random, inference under the likelihood framework often requires joint modelling of the response and covariate processes, as well as of the missing data processes associated with the incompleteness of responses and covariates. Specifying these four joint distributions is nontrivial from the perspectives of both modelling and computation. To get around this problem, we employ pairwise likelihood formulations, which avoid the specification of third- and higher-order association structures. In this paper, we consider three specific missing data mechanisms that lead to further simplified pairwise likelihood (SPL) formulations, and we develop inference methods based on these formulations. The resultant estimators are consistent and enjoy improved robustness and computational convenience. Their performance is evaluated empirically through simulation studies. Longitudinal data from the National Population Health Survey and the Waterloo Smoking Prevention Project are analysed to illustrate the use of our methods.
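In generic form (notation assumed here, not taken from the paper), a pairwise likelihood replaces the full joint distribution of a subject's observations $y_1,\dots,y_m$ by a product of bivariate margins, so no third- or higher-order association structure has to be specified:

```latex
% Generic pairwise (composite) likelihood: only bivariate densities
% f(y_j, y_k; theta) are required, not the full joint distribution.
L_{\mathrm{pair}}(\theta) = \prod_{j < k} f(y_j, y_k; \theta),
\qquad
\ell_{\mathrm{pair}}(\theta) = \sum_{j < k} \log f(y_j, y_k; \theta).
```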

This paper deals with a longitudinal semi-parametric regression model in a generalised linear model setup for repeated count data collected from a large number of independent individuals. To accommodate the longitudinal correlations, we consider a dynamic model for repeated counts in which the auto-correlations decay as the time lag between the repeated responses increases. The semi-parametric regression function involved in the model contains a specified regression function in some suitable time-dependent covariates and a non-parametric function in some other time-dependent covariates. Because the non-parametric function is of secondary interest, we estimate it consistently using the well-known quasi-likelihood approach under an independence assumption. Next, the proposed longitudinal correlation structure and the estimate of the non-parametric function are used to develop a semi-parametric generalised quasi-likelihood approach for consistent and efficient estimation of the regression effects in the parametric regression function. The finite sample performance of the proposed estimation approach is examined through an intensive simulation study based on both large and small samples, incorporating both balanced and unbalanced cluster sizes. The asymptotic properties of the estimators are also given. The estimation methodology is illustrated by reanalysing the well-known health care utilisation data consisting of counts of yearly visits to a physician by 180 individuals over four years, together with several important primary and secondary covariates.
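In generic form (notation assumed here for illustration), a generalised quasi-likelihood estimator of the regression parameter $\beta$ solves an estimating equation that weights the residuals by the inverse of a working covariance matrix built from the assumed decaying autocorrelation structure:

```latex
% Generic GQL estimating equation for K independent individuals:
% mu_i(beta) = E(y_i), Sigma_i = working covariance with decaying
% autocorrelations, e.g. corr(y_{it}, y_{i,t+l}) = rho^l.
\sum_{i=1}^{K} \frac{\partial \mu_i'(\beta)}{\partial \beta}\,
\Sigma_i^{-1}\,\bigl(y_i - \mu_i(\beta)\bigr) = 0 .
```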