Mixed models are regularly used in the analysis of clustered data, but have only recently been used for the imputation of missing data. In household surveys where multiple people are selected from each household, imputation of missing values should preserve the structure pertaining to people within households and should not artificially change the apparent intracluster correlation (ICC). This paper focuses on the use of multilevel models for imputation of missing data in household surveys. In particular, the performance of a best linear unbiased predictor for both stochastic and deterministic imputation using a linear mixed model is compared to imputation based on a single-level linear model, both with and without information about household respondents. The evaluation is carried out in the context of imputing the hourly wage rate in the Household, Income and Labour Dynamics in Australia (HILDA) Survey. Nonresponse is generated under various assumptions about the missingness mechanism for persons and households, and with low, moderate and high intra-household correlation, to assess the benefits of the multilevel imputation model under different conditions. The mixed model and the single-level model with information about the household respondent lead to clear improvements when the ICC is moderate or high, and when the missingness is informative.
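To make the random-intercept idea concrete, here is a minimal sketch (not the paper's actual procedure) of BLUP-based deterministic and stochastic imputation for a household member with a missing wage, assuming the overall mean and the variance components are known; in practice they would be estimated, e.g. by REML, and the model would include covariates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical random-intercept model: y_ij = mu + u_j + e_ij, with
# household variance sigma_u2 and individual variance sigma_e2 assumed known.
mu, sigma_u2, sigma_e2 = 10.0, 1.0, 2.0
n_households, n_per = 200, 4
u = rng.normal(0, np.sqrt(sigma_u2), n_households)
y = mu + u[:, None] + rng.normal(0, np.sqrt(sigma_e2), (n_households, n_per))

# Suppose the last member of each household is missing.
observed = y[:, :-1]
n_obs = observed.shape[1]

# BLUP of each household effect: shrink the household mean residual toward zero.
shrink = sigma_u2 / (sigma_u2 + sigma_e2 / n_obs)
u_blup = shrink * (observed.mean(axis=1) - mu)

# Deterministic imputation: overall mean plus household BLUP.
y_det = mu + u_blup
# Stochastic imputation: additionally draw individual-level noise.
y_sto = y_det + rng.normal(0, np.sqrt(sigma_e2), n_households)
```

The shrinkage factor is what preserves the intracluster correlation: a single-level imputation model without household information would set it to zero and flatten the household structure.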

This article is based on a transcript of the Memorial Session held on 14 August 2013 at the 22nd International Workshop on Matrices and Statistics held at the University of Toronto, Toronto, Canada, to honour the life and contributions of Professor Emeritus Shayle Searle. The speakers were Jeffrey J. Hunter, David A. Harville, Jon N. K. Rao, Robert Rodriguez, and Shayle's daughters, Susan Searle and Heather Selvaggio.

The Fay–Herriot model is a standard model for direct survey estimators in which the true quantity of interest, the superpopulation mean, is latent and its estimation is improved through the use of auxiliary covariates. In the context of small area estimation, these estimates can be further improved by borrowing strength across spatial regions or by considering multiple outcomes simultaneously. We provide two formulations of the Fay–Herriot model that include both multivariate outcomes and latent spatial dependence. In the first, the outcome-by-space dependence structure is separable; the second accounts for the cross-dependence through a generalized multivariate conditional autoregressive (GMCAR) structure. In a state-level example, the GMCAR model is shown to produce smaller mean squared prediction errors, relative to equivalent census variables, than the separable model and the state-of-the-art multivariate model with unstructured dependence between outcomes and no spatial dependence. In addition, both the GMCAR and the separable models give smaller mean squared prediction errors than the state-of-the-art model when conducting small area estimation on county-level data from the American Community Survey.
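As background, the basic univariate, non-spatial Fay–Herriot estimator can be sketched in a few lines; the multivariate and GMCAR extensions in the paper replace the independent area effects below with spatially and cross-outcome dependent ones. The variance components here are assumed known for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: m areas, direct estimates y_i = theta_i + e_i with known
# sampling variances D_i, and theta_i = x_i' beta + v_i with model variance sigma_v2.
m = 50
x = np.column_stack([np.ones(m), rng.uniform(0, 1, m)])   # auxiliary covariates
beta = np.array([2.0, 3.0])
sigma_v2 = 0.5                                            # assumed known here
D = rng.uniform(0.2, 1.0, m)                              # known sampling variances
theta = x @ beta + rng.normal(0, np.sqrt(sigma_v2), m)    # true area means
y = theta + rng.normal(0, np.sqrt(D))                     # direct survey estimates

# GLS estimate of beta, then the shrinkage (empirical best) predictor:
w = 1.0 / (sigma_v2 + D)
beta_hat = np.linalg.solve(x.T @ (w[:, None] * x), x.T @ (w * y))
gamma = sigma_v2 / (sigma_v2 + D)
theta_eb = gamma * y + (1 - gamma) * (x @ beta_hat)
```

Areas with noisy direct estimates (large `D`) are pulled harder toward the regression synthetic estimate, which is the "borrowing strength" the abstract refers to.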

This paper deals with statistical inference on the parameters of a stochastic model, based on multivariate autoregressive processes, that describes curved fibrous objects in three dimensions. The model is fitted to experimental data consisting of a large number of short, independently sampled trajectories of multivariate autoregressive processes. We discuss relevant statistical properties (e.g. asymptotic behaviour as the number of trajectories tends to infinity) of the maximum likelihood (ML) estimators for such processes. Numerical studies are also performed to analyse some of the more intractable properties of the ML estimators. Finally, the whole methodology, i.e. the fibre model and its statistical inference, is applied to the tracking of fibres in real materials.
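The asymptotic regime described above, many short trajectories rather than one long one, can be illustrated with a one-dimensional stand-in for the multivariate model: the pooled conditional ML estimator of an AR(1) coefficient is consistent as the number of trajectories grows even though each trajectory is short. The AR(1) reduction and the parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Many short, independently sampled AR(1) trajectories of length T.
phi_true, n_traj, T = 0.7, 500, 10
trajs = np.zeros((n_traj, T))
for t in range(1, T):
    trajs[:, t] = phi_true * trajs[:, t - 1] + rng.normal(size=n_traj)

# Pool all one-step transitions across trajectories; the conditional ML
# estimator is the regression of x_{t+1} on x_t over the pooled pairs.
x, y = trajs[:, :-1].ravel(), trajs[:, 1:].ravel()
phi_hat = (x @ y) / (x @ x)
```

With a single short trajectory this estimator is badly biased; pooling 500 of them recovers the coefficient accurately, which is the asymptotic behaviour the abstract studies.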

This paper develops a Twenty20 cricket simulator for matches between sides belonging to the International Cricket Council. As input, the simulator requires the probabilities of batting outcomes, which depend on the batsman, the bowler, the number of overs consumed and the number of wickets lost. The determination of batting probabilities is based on an amalgam of standard classical estimation techniques and a hierarchical empirical Bayes approach in which the probabilities of batting outcomes borrow information from related scenarios. Initially, the probabilities of batting outcomes are obtained for the first innings. In the second innings, the target score set in the first innings affects the aggressiveness of batting, so we use the target score to modify the batting probabilities in the second-innings simulation. The required modification suggests that teams may not be adjusting their second-innings batting aggressiveness in an optimal way. The adequacy of the simulator is addressed through various goodness-of-fit diagnostics.
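The core simulation loop is a ball-by-ball draw from a multinomial over batting outcomes. Here is a toy sketch with a single fixed outcome distribution; the probabilities below are invented for illustration, whereas in the paper they vary with batsman, bowler, overs consumed and wickets lost, and are modified by the target in the second innings.

```python
import numpy as np

rng = np.random.default_rng(2)

# Ball outcomes: 0,1,2,3,4,6 runs, or a wicket (coded -1). Probabilities
# are hypothetical placeholders, not estimates from the paper.
outcomes = np.array([0, 1, 2, 3, 4, 6, -1])
probs = np.array([0.35, 0.30, 0.08, 0.02, 0.12, 0.05, 0.08])

def simulate_innings(balls=120, wickets=10):
    """Simulate one Twenty20 innings: 120 balls or 10 wickets, whichever first."""
    runs, lost = 0, 0
    for _ in range(balls):
        out = rng.choice(outcomes, p=probs)
        if out == -1:
            lost += 1
            if lost == wickets:
                break
        else:
            runs += out
    return runs

score = simulate_innings()
```

A second-innings version would make `probs` a function of the remaining target, balls and wickets, which is where the aggressiveness adjustment enters.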

One of the most important issues in toxicity studies is the identification of the equivalence of treatments with a placebo. Because it is unacceptable to declare non-equivalent treatments to be equivalent, it is important to adopt a reliable statistical method that properly controls the family-wise error rate (FWER). In dealing with this issue, it is important to keep in mind that overestimating toxicity equivalence is a more serious error than underestimating it. Consequently, asymmetric loss functions are more appropriate than symmetric ones. Recently, Tao, Tang & Shi (2010) developed a new procedure based on an asymmetric loss function. However, their procedure is somewhat unsatisfactory because it assumes that the variances at the various dose levels are known, an assumption that is restrictive in many applications. In this study we propose an improved approach based on asymmetric confidence intervals that does not require known variances. The asymmetry guarantees reliability in the sense that the FWER is well controlled. Although our procedure is developed under the assumption that the variances at the various dose levels are unknown but equal, simulation studies show that our procedure still performs quite well when the variances are unequal.
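The flavour of a confidence-interval-based equivalence decision can be sketched as follows. This is a generic one-sided stand-in, not the paper's exact procedure: a dose is declared equivalent to placebo only if an upper confidence bound on its mean toxicity difference lies below the margin `delta`, so only the error the abstract calls more serious (declaring a toxic dose equivalent) is directly bounded.

```python
# Hedged sketch of an asymmetric, confidence-bound-based equivalence decision.
# All names and values are illustrative; z = 1.645 gives a one-sided 95% bound.
def equivalent(diff_hat, se, delta, z=1.645):
    """Declare equivalence only if the upper bound diff_hat + z*se < delta."""
    return diff_hat + z * se < delta

print(equivalent(0.2, 0.1, 0.5))   # upper bound 0.3645 < 0.5 -> True
print(equivalent(0.4, 0.1, 0.5))   # upper bound 0.5645 >= 0.5 -> False
```

Because equivalence of a dose group is claimed only when every such comparison passes, an intersection-union argument keeps the family-wise error rate at the nominal level without any multiplicity adjustment.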

Consider a linear regression model with independent normally distributed errors. Suppose that the scalar parameter of interest is a specified linear combination of the components of the regression parameter vector. Also suppose that we have uncertain prior information that a distinct specified linear combination of these components takes the value zero. We provide succinct and informative descriptions of interval estimators for the parameter of interest using the new concepts of *scaled offset* and *scaled half-length*. We describe the Bayesian equi-tailed and shortest credible intervals for the parameter of interest that result from a prior density for the parameter about which we have uncertain prior information that is a mixture of a rectangular ‘slab’ and a Dirac delta function ‘spike’, combined with noninformative prior densities for the other parameters of the model. This prior belongs to the class of ‘slab and spike’ priors, which have been used for Bayesian variable selection. We compare these credible intervals with Kabaila and Giri's frequentist confidence interval for the parameter of interest that utilizes this uncertain prior information. We show that these frequentist and Bayesian interval estimators depend on the data in very different ways. We also consider some close variants of this prior distribution that lead to Bayesian and frequentist interval estimators with greater similarity. Nonetheless, as we show, substantial differences between these interval estimators remain.
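A one-dimensional illustration shows how a 'slab and spike' prior acts in the Bayesian analysis above: the posterior puts a point mass on the spike whose weight is the Bayes-factor comparison of the two marginal likelihoods. The model below (a single parameter observed with unit-variance normal error, rectangular slab on `[-c, c]`) is a simplification for illustration, not the paper's full regression setup.

```python
import math

def phi(x):   # standard normal pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def Phi(x):   # standard normal cdf
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Prior: tau = 0 with probability p (Dirac spike), else Uniform(-c, c) (slab).
# Data: t ~ N(tau, 1). All numbers are illustrative assumptions.
p, c = 0.5, 5.0
t = 1.3

# Marginal likelihood of t under the spike and under the slab:
m_spike = phi(t)
m_slab = (Phi(c - t) - Phi(-c - t)) / (2 * c)

# Posterior probability that tau equals zero:
post_spike = p * m_spike / (p * m_spike + (1 - p) * m_slab)
```

Conditional on the spike, inference proceeds as if the uncertain prior information were exactly true; conditional on the slab, it is essentially ignored, and the credible interval mixes the two according to `post_spike`.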

We develop fast mean field variational methodology for Bayesian heteroscedastic semiparametric regression, in which both the mean and variance are smooth, but otherwise arbitrary, functions of the predictors. Our resulting algorithms are purely algebraic, devoid of numerical integration and Monte Carlo sampling. The locality property of mean field variational Bayes implies that the methodology also applies to larger models possessing variance function components. Simulation studies indicate good to excellent accuracy, and considerable time savings compared with Markov chain Monte Carlo. We also provide some illustrations from applications.
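The phrase "purely algebraic" refers to coordinate-ascent updates with closed forms. A minimal sketch for a toy conjugate model (i.i.d. normal data with unknown mean and precision, standing in for the heteroscedastic regression in the paper) shows the pattern: each factor of the mean field approximation is updated from the current moments of the other.

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(2.0, 1.0, 100)
n, ybar = len(y), y.mean()

# Mean field VB for y_i ~ N(mu, 1/tau), mu ~ N(0, 1/lam0), tau ~ Gamma(a0, b0),
# with q(mu, tau) = q(mu) q(tau). Hyperparameters below are illustrative.
lam0, a0, b0 = 1e-2, 0.01, 0.01
E_tau = 1.0
for _ in range(50):                      # purely algebraic coordinate ascent
    # q(mu) = N(m, s2), using the current E[tau]
    s2 = 1.0 / (lam0 + n * E_tau)
    m = n * E_tau * ybar * s2
    # q(tau) = Gamma(a, b), using the current moments of q(mu)
    a = a0 + n / 2
    b = b0 + 0.5 * (np.sum((y - m) ** 2) + n * s2)
    E_tau = a / b
```

No numerical integration or Monte Carlo sampling appears anywhere in the loop, which is why such schemes run orders of magnitude faster than MCMC on the same model.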

Quantile regression methods have been widely used in many research areas in recent years. However, conventional estimation methods for quantile regression models do not guarantee that the estimated quantile curves will be non-crossing. While various methods in the literature deal with this problem, many of them force the model parameters to lie within a subset of the parameter space so that the required monotonicity is satisfied; the subset used differs from method to method. This paper establishes a relationship between the monotonicity of the estimated conditional quantiles and the comonotonicity of the model parameters. We develop a novel quasi-Bayesian method for parameter estimation that can be applied to both time series and independent data. Simulation studies and an application to real financial returns show that the proposed method has the potential to be very useful in practice.
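The crossing problem itself is easy to exhibit. The snippet below (not the paper's quasi-Bayesian method) shows separately estimated conditional quantiles crossing at one value of the predictor, and the simplest classical repair, sorting the estimates across quantile levels (monotone rearrangement); the fitted values are invented for illustration.

```python
import numpy as np

# Quantile levels and hypothetical separately-fitted conditional quantiles
# at a single x; note the 0.9-quantile estimate falls below the 0.75 one.
taus = np.array([0.10, 0.25, 0.50, 0.75, 0.90])
q_hat = np.array([1.0, 1.8, 2.5, 3.4, 3.1])    # crossing at the top

# Monotone rearrangement: sort across quantile levels to restore
# non-crossing while leaving the set of fitted values unchanged.
q_fixed = np.sort(q_hat)
```

Methods such as the one in the paper instead constrain the estimation itself so that monotonicity holds by construction, across all values of x rather than pointwise.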
