We propose a beta spatial linear mixed model with variable dispersion using Monte Carlo maximum likelihood. The proposed method is useful in situations where the response variable is a rate or a proportion. An approach to spatial generalized linear mixed models using the Box–Cox transformation in the precision model is presented. Thus, the parameter optimization process is developed for both the spatial mean model and the spatial variable dispersion model. All the parameters are estimated using Markov chain Monte Carlo maximum likelihood. Statistical inference on the parameters is performed using approximations obtained from the asymptotic normality of the maximum likelihood estimator. Diagnostics and prediction of a new observation are also developed. The method is illustrated with the analysis of a simulated case and two studies: clay and magnesium content. In the clay study, 147 soil profile observations were taken from the research area of the Tropenbos Cameroon Programme, with explanatory variables elevation in metres above sea level, agro-ecological zone, reference soil group and land cover type. In the magnesium study, soil samples were taken from the 0- to 20-cm depth layer at each of 178 locations, and the response variable is related to the spatial locations, altitude and sub-region. Copyright © 2015 John Wiley & Sons, Ltd.
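As an illustration of the mean-precision parameterization underlying such beta regression models, the sketch below evaluates the log-likelihood with a logit link for the mean and a log link for the precision (the λ → 0 limit of the Box–Cox transformation mentioned above). The spatial random effects and Monte Carlo machinery of the actual method are omitted, and all data and variable names are made up.

```python
import numpy as np
from scipy import stats
from scipy.special import expit

def beta_loglik(y, X, Z, beta, gamma):
    """Log-likelihood of a beta regression in the mean-precision
    parameterization: y ~ Beta(mu*phi, (1-mu)*phi), with a logit link
    for the mean mu = expit(X @ beta) and a log link for the precision
    phi = exp(Z @ gamma) (the lambda -> 0 limit of the Box-Cox
    transformation used in the dispersion model)."""
    mu = expit(X @ beta)
    phi = np.exp(Z @ gamma)
    return stats.beta.logpdf(y, mu * phi, (1.0 - mu) * phi).sum()

# Tiny synthetic illustration (values are made up).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(size=n)])  # mean covariates
Z = X.copy()                                            # dispersion covariates
beta_true, gamma_true = np.array([-0.5, 1.0]), np.array([2.0, 0.5])
mu = expit(X @ beta_true)
phi = np.exp(Z @ gamma_true)
y = rng.beta(mu * phi, (1.0 - mu) * phi)
ll = beta_loglik(y, X, Z, beta_true, gamma_true)
```

In the actual model, maximizing this likelihood must additionally integrate over the spatial random effects, which is where the Markov chain Monte Carlo maximum likelihood step comes in.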

In this paper, a new randomized response model is proposed, which is shown to have a Cramér–Rao lower bound of variance that is lower than the Cramér–Rao lower bound of variance suggested by Singh and Sedory, at equal or greater protection of respondents. A new measure of respondent protection in the setup of the efficient use of two decks of cards, due to Odumade and Singh, is also suggested. The developed Cramér–Rao lower bounds of variances are compared under different situations through exact numerical illustrations. Survey data to estimate the proportion of students who have sometimes driven a vehicle after drinking alcohol and feeling over the legal limit are collected by using the proposed randomization device and then analyzed. The proposed randomized response technique is also compared with a black box technique within the same survey. A method to determine the minimum sample size in randomized response sampling based on a small pilot survey is also given.
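For context, the following sketch simulates Warner's classical randomized-response device and its unbiased estimator of a sensitive proportion. This is the textbook single-deck design, not the article's two-deck or newly proposed device, and all values are illustrative.

```python
import numpy as np

# Warner's randomized response: each respondent draws a card that, with
# probability P, asks the sensitive question directly and otherwise asks
# its negation, so Pr(answer "yes") = P*pi + (1-P)*(1-pi).
rng = np.random.default_rng(42)
n, P, pi_true = 20_000, 0.7, 0.30

sensitive = rng.random(n) < pi_true          # true (unobserved) status
direct_card = rng.random(n) < P              # which card was drawn
answer_yes = np.where(direct_card, sensitive, ~sensitive)

lam_hat = answer_yes.mean()                  # observed "yes" proportion
pi_hat = (lam_hat - (1.0 - P)) / (2.0 * P - 1.0)   # Warner's estimator
var_hat = lam_hat * (1.0 - lam_hat) / (n * (2.0 * P - 1.0) ** 2)
```

The interviewer never learns which card a respondent drew, which is the source of the protection that the article's measures quantify.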

In this paper, we examine the estimation of linear models subject to inequality constraints with a special focus on new variance approximations for the estimated parameters. For models with one inequality restriction, the proposed variance formulas are exact. The variance approximations proposed in this paper can be used in regression analysis, Kalman filtering, and balancing national accounts, when inequality constraints are to be incorporated in the estimation procedure.
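A minimal sketch of inequality-constrained least squares, using SciPy's bounded least-squares solver to impose nonnegativity on the coefficients; the paper's variance approximations are not reproduced here, and the data are synthetic.

```python
import numpy as np
from scipy.optimize import lsq_linear

# Linear model with a one-sided inequality constraint on every
# coefficient (nonnegativity); data are made up for illustration.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 3))
x_true = np.array([0.5, 0.0, 2.0])          # one coefficient on the boundary
b = A @ x_true + 0.1 * rng.normal(size=50)

res = lsq_linear(A, b, bounds=(0.0, np.inf))
x_hat = res.x                                # constrained estimate
```

The interesting inferential question, which the paper addresses, is how to approximate the variance of `x_hat` when a constraint like the one on the second coefficient is active.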

In statistical diagnostics and sensitivity analysis, the local influence method plays an important role and has certain advantages over other methods in several situations. In this paper, we use this method to study time series of count data when employing a Poisson autoregressive model. We consider case-weights, scale, data, and additive perturbation schemes to obtain their corresponding vectors and matrices of derivatives for the measures of slope and normal curvatures. Based on the curvature diagnostics, we take a stepwise local influence approach to deal with data with possible masking effects. Finally, the effectiveness of the established results is illustrated by analyzing a stock transactions dataset.

In the case of two independent samples, it turns out that, among the procedures taken into consideration, Boschloo's technique of raising the nominal level in the standard conditional test as far as admissible performs best in terms of power against almost all alternatives. The computational burden entailed in exact sample size calculation is comparatively modest for both the uniformly most powerful unbiased randomized and the conservative non-randomized version of the exact Fisher-type test. Computing these values yields a pair of bounds enclosing the exact sample size required for the Boschloo test, and it seems reasonable to replace the exact value with the middle of the corresponding interval. Comparisons between these mid-N estimates and the fully exact sample sizes lead to the conclusion that the extra computational effort required for obtaining the latter is mostly dispensable. This also holds true in the case of paired binary data (McNemar setting). In the latter, the level-corrected score test turns out to be almost as powerful as the randomized uniformly most powerful unbiased test and should be preferred to the McNemar–Boschloo test. The mid-N rule provides a fairly tight upper bound on the exact sample size for the score test for paired proportions.
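A small illustration of the relationship exploited above: Boschloo's test uses the conditional (Fisher) p-value as its test statistic and maximizes the rejection probability over the nuisance parameter, so its p-value never exceeds the one-sided Fisher p-value. This assumes SciPy ≥ 1.7 (`boschloo_exact`); the 2×2 table is made up.

```python
from scipy.stats import boschloo_exact, fisher_exact

# An illustrative 2x2 table of successes/failures in two independent groups.
table = [[7, 17], [15, 5]]

# One-sided conditional (Fisher) p-value.
_, p_fisher = fisher_exact(table, alternative="less")

# Boschloo's unconditional test: the Fisher p-value is the statistic,
# and the p-value is maximized over the common nuisance proportion.
res = boschloo_exact(table, alternative="less")
```

Because `res.pvalue <= p_fisher` always holds, the Boschloo test is uniformly at least as powerful as the conditional test at the same nominal level.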

In epidemiology and clinical research, there is often a proportion of unexposed individuals, resulting in zero values of exposure: some individuals are not exposed, while the exposed follow some continuous distribution. Examples are smoking and alcohol consumption. We call these variables with a spike at zero (SAZ). In this paper, we perform a systematic investigation of how to model covariates with a SAZ and derive theoretical odds ratio functions for selected bivariate distributions. We consider the bivariate normal and bivariate log-normal distributions with a SAZ. Both confounding and effect modification can be elegantly described by formalizing the covariance matrix given the binary outcome variable *Y*. To model the effect of these variables, we use a procedure based on fractional polynomials, first introduced by Royston and Altman (1994, *Applied Statistics* 43: 429–467) and modified for the SAZ situation (Royston and Sauerbrei, 2008, *Multivariable Model-building: A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables*, Wiley; Becher *et al*., 2012, *Biometrical Journal* 54: 686–700). We aim to contribute to theory, practical procedures and application in epidemiology and clinical research in deriving multivariable models for variables with a SAZ. As an example, we use data from a case–control study on lung cancer.
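One common way to code a SAZ covariate in a regression model is with a binary exposure indicator plus the continuous value among the exposed, as sketched below on synthetic data; the fractional-polynomial transformations of the cited procedure are omitted, and all names and values are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulate a covariate with a spike at zero: unexposed individuals have
# exactly zero, exposed individuals follow a continuous distribution.
rng = np.random.default_rng(7)
n = 1000
exposed = rng.random(n) < 0.6
x = np.where(exposed, rng.lognormal(0.0, 0.5, n), 0.0)   # SAZ covariate

ind = (x > 0).astype(float)                # spike (exposure) indicator
xpos = np.where(x > 0, x, 0.0)             # continuous part among exposed

# Binary outcome generated from both components (coefficients made up).
eta = -1.0 + 0.8 * ind + 0.5 * xpos
y = rng.random(n) < 1.0 / (1.0 + np.exp(-eta))

design = np.column_stack([ind, xpos])
fit = LogisticRegression().fit(design, y)
```

The indicator captures the effect of being exposed at all, while the continuous term captures the dose–response among the exposed; the fractional-polynomial step would replace `xpos` with transformed versions of it.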

We propose a non-parametric test to compare two correlated diagnostic tests for a three-category classification problem. Our development was motivated by a proteomic study where the objectives are to detect glycan biomarkers for liver cancer and to compare the discrimination ability of various markers. Three distinct disease categories need to be identified from this analysis. We therefore chose to use three-dimensional receiver operating characteristic (ROC) surfaces and volumes under the ROC surfaces to describe the overall accuracy for different biomarkers. Each marker in this study might include a cluster of similar individual markers and thus was considered a hierarchically structured sample. Our proposed statistical test incorporates the within-marker correlation as well as the between-marker correlation. We derived asymptotic distributions for three-dimensional ROC surfaces and subsequently implemented bootstrap methods to facilitate the inferences. A simulation study and a real-data analysis are included to illustrate our methods. Our distribution-free test may be simplified for paired and independent two-sample comparisons as well. Previously, only parametric tests were known for clustered and correlated three-category ROC analyses.
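The overall accuracy measure used above, the volume under the ROC surface (VUS), has a simple nonparametric U-statistic estimate, the empirical probability P(X1 < X2 < X3) over all triples; the sketch below shows this basic estimator without the clustering and correlation adjustments of the proposed test.

```python
import numpy as np

def vus(x1, x2, x3):
    """Nonparametric volume under the ROC surface for three ordered
    disease categories: the U-statistic estimate of P(X1 < X2 < X3),
    computed over all triples of observations."""
    a = np.asarray(x1)[:, None, None]
    b = np.asarray(x2)[None, :, None]
    c = np.asarray(x3)[None, None, :]
    return float(np.mean((a < b) & (b < c)))

# Perfectly separated categories give VUS = 1; chance level is 1/6.
perfect = vus([1, 2], [3, 4], [5, 6])   # -> 1.0
```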

Typical data that arise from surveys, experiments, and observational studies include continuous and discrete variables. In this article, we study the interdependence among a mixed (continuous, count, ordered categorical, and binary) set of variables via graphical models. We propose an *ℓ*_{1}-penalized extended rank likelihood with an ascent Monte Carlo expectation maximization approach for the copula Gaussian graphical models and establish near conditional independence relations and zero elements of a precision matrix. In particular, we focus on high-dimensional inference, where the number of observations is of the same order as, or smaller than, the number of variables under consideration. To illustrate how to infer networks for mixed variables through conditional independence, we consider two datasets: one in the area of sports and the other concerning breast cancer.
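A simpler rank-based relative of the copula Gaussian graphical model (not the authors' ℓ1-penalized extended rank likelihood with Monte Carlo EM) transforms each continuous margin to normal scores and then applies the graphical lasso to obtain a sparse precision matrix, whose zero entries encode conditional independence; the data below are synthetic.

```python
import numpy as np
from scipy.stats import norm, rankdata
from sklearn.covariance import GraphicalLasso

# Latent Gaussian variables: 0 and 1 are dependent; 2 and 3 are
# independent of everything else.
rng = np.random.default_rng(3)
n, p = 500, 4
z = rng.normal(size=(n, p))
z[:, 1] += 0.8 * z[:, 0]

# Observed margins are monotone transforms, mimicking mixed-looking data.
x = np.column_stack([np.exp(z[:, 0]), z[:, 1], np.exp(z[:, 2]), z[:, 3] ** 3])

# Normal scores (a nonparanormal-style transform), then graphical lasso.
scores = norm.ppf(rankdata(x, axis=0) / (n + 1))
model = GraphicalLasso(alpha=0.15).fit(scores)
prec = model.precision_
```

The ℓ1 penalty drives precision entries between conditionally independent variables to zero, which is how the network structure is read off the estimate.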

This paper introduces new elements to measure the skewness of a probability distribution, suggesting that a given distribution can have both positive and negative skewness, depending on the centred sub-interval of the support set being observed. A skewness function for positive reals is defined, from which a bivariate index of positive–negative skewness is obtained. Certain interesting properties of this new index are studied, and its values are obtained for some common discrete distributions. We show the advantages of its use as a complement to the information provided by traditional measures of skewness.

A test statistic is developed for making inference about a block-diagonal structure of the covariance matrix when the dimensionality *p* exceeds *n*, where *n* = *N* − 1 and *N* denotes the sample size. The suggested procedure extends the complete independence results. Because the classical hypothesis testing methods based on the likelihood ratio degenerate when *p* > *n*, the main idea is to turn instead to a distance function between the null and alternative hypotheses. The test statistic is then constructed using a consistent estimator of this function, where consistency is considered in an asymptotic framework that allows *p* to grow together with *n*. The suggested statistic is also shown to be asymptotically normal under the null hypothesis. Some auxiliary results on the moments of products of multivariate normal random vectors and higher-order moments of the Wishart matrices, which are important for our evaluation of the test statistic, are derived. We perform empirical power analysis for a number of alternative covariance structures.

In this paper, we propose an automatic selection of the bandwidth of recursive kernel estimators of a regression function defined by a stochastic approximation algorithm. We show that, using the selected bandwidth and the stepsize that minimize the *mean weighted integrated squared error*, the recursive estimator outperforms the non-recursive one in small-sample settings in terms of estimation error and computational cost. We corroborate these theoretical results through a simulation study and a real dataset.
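A recursive Nadaraya–Watson-type estimator can be sketched as follows: the kernel-weighted numerator and denominator are updated by a stochastic-approximation step as each new observation arrives. The stepsize 1/n and the fixed bandwidth below are placeholder choices, not the paper's data-driven selections.

```python
import numpy as np

def recursive_kernel_regression(xs, ys, grid, h):
    """Recursive kernel regression estimator: numerator and denominator
    are updated online with stepsize gamma_n = 1/n, so the estimate can
    be refreshed in O(1) per new observation instead of refitting."""
    num = np.zeros_like(grid, dtype=float)
    den = np.zeros_like(grid, dtype=float)
    for n, (x, y) in enumerate(zip(xs, ys), start=1):
        g = 1.0 / n                                    # stepsize gamma_n
        # Unnormalized Gaussian kernel; constants cancel in the ratio.
        k = np.exp(-0.5 * ((grid - x) / h) ** 2) / h
        num += g * (k * y - num)
        den += g * (k - den)
    return num / den

# Noiseless check: for m(x) = 2x, the estimate should track 2 * grid.
rng = np.random.default_rng(5)
x = rng.uniform(size=20_000)
y = 2.0 * x
grid = np.array([0.25, 0.5, 0.75])
m_hat = recursive_kernel_regression(x, y, grid, h=0.05)
```

With stepsize 1/n this reproduces the running kernel-weighted averages; other stepsize sequences, which the paper optimizes jointly with the bandwidth, trade bias against variance differently.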

This paper concerns a class of model selection criteria based on cross-validation techniques and estimative predictive densities. Both the simple, or leave-one-out, and the multifold, or leave-*m*-out, cross-validation procedures are considered. These cross-validation criteria define suitable estimators for the expected Kullback–Leibler risk, which measures the expected discrepancy between the fitted candidate model and the true one. In particular, we investigate the potential bias of these estimators under alternative asymptotic regimes for *m*. The results are obtained within the general context of independent, but not necessarily identically distributed, observations and by assuming that the candidate model may not contain the true distribution. An application to the class of normal regression models is also presented, and simulation results are obtained in order to gain some further understanding of the behavior of the estimators.
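For a normal regression model, the leave-one-out version of such a criterion can be sketched as the average negative log estimative predictive density, refitting the model without each observation in turn; the leave-*m*-out generalization and the bias analysis are not reproduced here, and the data are synthetic.

```python
import numpy as np
from scipy import stats

def loo_neg_log_predictive(X, y):
    """Leave-one-out cross-validation criterion based on estimative
    predictive densities for a normal linear model: refit OLS without
    observation i, plug the estimates into a normal density, and
    average the negative log predictive density over i."""
    n = len(y)
    vals = []
    for i in range(n):
        mask = np.arange(n) != i
        Xi, yi = X[mask], y[mask]
        beta = np.linalg.lstsq(Xi, yi, rcond=None)[0]
        resid = yi - Xi @ beta
        sigma = np.sqrt(np.mean(resid ** 2))        # ML variance estimate
        vals.append(-stats.norm.logpdf(y[i], X[i] @ beta, sigma))
    return float(np.mean(vals))

# Synthetic comparison: the correctly specified model should attain a
# smaller criterion value than an intercept-only model.
rng = np.random.default_rng(11)
n = 100
x = rng.uniform(size=n)
y = 3.0 * x + 0.1 * rng.normal(size=n)
crit_full = loo_neg_log_predictive(np.column_stack([np.ones(n), x]), y)
crit_null = loo_neg_log_predictive(np.ones((n, 1)), y)
```

Up to constants, this criterion estimates the expected Kullback–Leibler risk of the estimative predictive density, which is why smaller values indicate a better candidate model.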